<h1 align="center">House Price Dataset</h1>

# Dataset description

Context
Driver behavior is one of the most important aspects in the design, development, and application of Advanced Driving Assistance Systems (ADAS) and Intelligent Transportation Systems (ITS), and it can be affected by many factors. If you can measure the driving style of your staff, there are many actions you can take to improve fleet safety and global road safety, as well as fuel efficiency and emissions.

Content
Dataset for modeling risky driver behaviors based on accelerometer data (X, Y, Z axes in meters per second squared, $m/s^2$) and gyroscope data (X, Y, Z axes in degrees per second, °/s).
* Sampling Rate: Average 2 samples (rows) per second.
* Cars: Ford Fiesta 1.4, Ford Fiesta 1.25, Hyundai i20.
* Drivers: 3 different drivers with the ages of 27, 28 and 37.
* Best Window Size: 14 seconds
* Sensor: MPU6050
* Device: Raspberry Pi 3 Model B
> Driver Behaviors:

1. Sudden Acceleration (Class Label: 1)
2. Sudden Right Turn (Class Label: 2)
3. Sudden Left Turn (Class Label: 3)
4. Sudden Break (Class Label: 4)


> Acknowledgements:
> > Yuksel, Asim; Atmaca, Şerafettin (2020), “Driving Behavior Dataset”, Mendeley Data, V2, doi: 10.17632/jj3tw8kj6h.2

id: A notation for a house

date: Date the house was sold

price: Price, the prediction target

bedrooms: Number of bedrooms per house

bathrooms: Number of bathrooms/bedrooms

sqft_living: Square footage of the home

sqft_lot: Square footage of the lot

floors: Total floors (levels) in the house

waterfront: Whether the house has a view of a waterfront

view: Has been viewed

condition: How good the overall condition is

grade: Overall grade given to the housing unit, based on the King County grading system

sqft_above: Square footage of the house apart from the basement

sqft_basement: Square footage of the basement

yr_built: Year the house was built

yr_renovated: Year the house was renovated

zipcode: ZIP code

lat: Latitude coordinate

long: Longitude coordinate

sqft_living15: Living room area in 2015 (implies some renovations). This might or might not have affected the lot size area.

sqft_lot15: Lot size area in 2015 (implies some renovations)
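The descriptions of `sqft_above` and `sqft_basement` suggest that the two should add up to `sqft_living`. A minimal sketch of that sanity check, using four rows copied from the outputs shown later in this notebook rather than the full file (the `rows` and `mismatch` names are illustrative and not part of the original notebook):

```python
import pandas as pd

# Four rows copied from the notebook's own outputs (not the full dataset).
rows = pd.DataFrame(
    {
        "sqft_living": [1180.0, 2570.0, 1960.0, 1590.0],
        "sqft_above": [1180.0, 2170.0, 1050.0, 1090.0],
        "sqft_basement": [0.0, 400.0, 910.0, 500.0],
    }
)

# Rows where above-ground plus basement area differs from the living area.
area_sum = rows["sqft_above"] + rows["sqft_basement"]
mismatch = rows[area_sum != rows["sqft_living"]]

print(len(mismatch))  # 0 for these four rows
```

On the full data the same comparison could be run against `df`; whether the identity holds for every row there is not something these four rows can confirm.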

# Exploring the dataset

In [2]:
import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.model_selection import GridSearchCV

import matplotlib.pyplot as plt
import seaborn as sns





In [1]:
#put your path here
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/Colab Notebooks/Na GITa/

Mounted at /content/drive
/content/drive/MyDrive/Colab Notebooks/Na GITa


In [3]:
df = pd.read_csv("data/houses.csv")
df.sample(10)

Unnamed: 0,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,price
299,3.0,1.75,1590.0,11914.0,1.0,0.0,2.0,3.0,7.0,1090.0,500.0,1957.0,0.0,98188.0,47.4427,-122.274,1630.0,26.75
708,3.0,1.75,1490.0,7150.0,1.0,0.0,0.0,5.0,6.0,1490.0,0.0,1967.0,0.0,98059.0,47.5015,-122.124,1350.0,31.0
13,3.0,1.75,1370.0,9680.0,1.0,0.0,0.0,4.0,7.0,1370.0,0.0,1977.0,0.0,98074.0,47.6127,-122.045,1370.0,40.0
969,4.0,2.25,2360.0,8616.0,2.0,0.0,0.0,4.0,8.0,2360.0,0.0,1974.0,0.0,98033.0,47.6495,-122.198,2360.0,75.25
73,4.0,2.5,2380.0,5000.0,2.0,0.0,0.0,3.0,8.0,2380.0,0.0,2005.0,0.0,98038.0,47.3608,-122.036,2420.0,36.0
797,4.0,2.0,1740.0,9000.0,1.0,0.0,0.0,5.0,8.0,1740.0,0.0,1958.0,0.0,98033.0,47.6815,-122.198,1850.0,71.0
55,4.0,2.5,2830.0,5000.0,2.0,0.0,0.0,3.0,9.0,2830.0,0.0,1995.0,0.0,98105.0,47.6597,-122.29,1950.0,88.5
332,3.0,2.0,1810.0,10530.0,1.0,0.0,2.0,3.0,8.0,1810.0,0.0,1991.0,0.0,98022.0,47.1913,-122.012,1910.0,29.5
685,5.0,2.75,2990.0,6768.0,2.0,0.0,0.0,3.0,9.0,2990.0,0.0,2014.0,0.0,98006.0,47.5462,-122.182,2870.0,80.2541
232,6.0,2.75,2940.0,7350.0,1.0,0.0,0.0,3.0,8.0,1780.0,1160.0,1978.0,0.0,98023.0,47.3103,-122.339,2120.0,31.5


In [4]:
df

Unnamed: 0,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,price
0,3.0,1.00,1180.0,5650.0,1.0,0.0,0.0,3.0,7.0,1180.0,0.0,1955.0,0.0,98178.0,47.5112,-122.257,1340.0,22.190
1,3.0,2.25,2570.0,7242.0,2.0,0.0,0.0,3.0,7.0,2170.0,400.0,1951.0,1991.0,98125.0,47.7210,-122.319,1690.0,53.800
2,2.0,1.00,770.0,10000.0,1.0,0.0,0.0,3.0,6.0,770.0,0.0,1933.0,0.0,98028.0,47.7379,-122.233,2720.0,18.000
3,4.0,3.00,1960.0,5000.0,1.0,0.0,0.0,5.0,7.0,1050.0,910.0,1965.0,0.0,98136.0,47.5208,-122.393,1360.0,60.400
4,3.0,2.00,1680.0,8080.0,1.0,0.0,0.0,3.0,8.0,1680.0,0.0,1987.0,0.0,98074.0,47.6168,-122.045,1800.0,51.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,4.0,2.50,1860.0,6325.0,2.0,0.0,0.0,4.0,7.0,1860.0,0.0,1991.0,0.0,98038.0,47.3492,-122.030,1860.0,29.100
996,2.0,2.75,1590.0,20917.0,1.5,0.0,0.0,3.0,5.0,1590.0,0.0,1920.0,0.0,98001.0,47.2786,-122.250,1310.0,19.995
997,2.0,1.00,850.0,2340.0,1.0,0.0,0.0,3.0,7.0,850.0,0.0,1922.0,0.0,98105.0,47.6707,-122.328,1300.0,55.350
998,2.0,1.00,1030.0,4188.0,1.0,0.0,0.0,3.0,8.0,1030.0,0.0,1981.0,0.0,98038.0,47.3738,-122.057,1450.0,18.995
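The `Unnamed: 0` column in the outputs above is what pandas typically produces when a CSV was written together with its row index and then read back without `index_col`. A hedged sketch of reading that field back as the index, using an in-memory excerpt of three rows and a subset of columns from the output above instead of `data/houses.csv` (the `csv_excerpt`, `raw`, and `houses` names are illustrative, and the empty leading header field is an assumption about how the file is stored):

```python
import io

import pandas as pd

# Excerpt: three rows and a subset of columns copied from the output above.
# Assumption: the real file starts with an empty header field for the index.
csv_excerpt = """,bedrooms,bathrooms,sqft_living,yr_built,price
0,3.0,1.00,1180.0,1955.0,22.190
1,3.0,2.25,2570.0,1951.0,53.800
2,2.0,1.00,770.0,1933.0,18.000
"""

# Default read: the unnamed first field becomes a column called "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_excerpt))

# index_col=0 uses that field as the row index instead of a data column.
houses = pd.read_csv(io.StringIO(csv_excerpt), index_col=0)

print(raw.columns[0])  # Unnamed: 0
print(houses.shape)    # (3, 5)
houses.info()          # column dtypes and non-null counts
```

Keeping the index out of the feature columns matters later on, since a running row number would otherwise be passed to a model as if it were a property of the house.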


# TO DO: Interactive plots

# TO DO: Elastic Net Regression
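As a possible starting point for this item, here is a minimal sketch of how the scikit-learn imports at the top of the notebook could fit together for Elastic Net regression. It is not the notebook's own solution: the data is a synthetic stand-in rather than the house data, and the parameter grid, split size, and random seeds are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the house data): 200 rows, 5 features, with a
# target that depends linearly on the first two features plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale features first: the Elastic Net penalty is sensitive to feature scale.
pipe = make_pipeline(StandardScaler(), ElasticNet(max_iter=10_000))

# Illustrative grid: alpha sets the penalty strength, l1_ratio the L1/L2 mix.
param_grid = {
    "elasticnet__alpha": [0.01, 0.1, 1.0],
    "elasticnet__l1_ratio": [0.2, 0.5, 0.8],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
print(search.best_params_)
print(metrics.mean_squared_error(y_test, y_pred))
```

For the house data, `X` and `y` would presumably become the feature columns of `df` and its `price` column. Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so no information from the held-out fold leaks into the scaling.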

# TO DO: AdaBoost

# TO DO: XGBoost

In [None]:
# %pip install dtreeviz[xgboost] 
# from dtreeviz.trees import dtreeviz # remember to load the package

#trees.dtreeviz(xgb_model, X_, d[target], features, target, tree_index=1)
# dtreeviz(clf_xgb, X_train, y_train,  list(X_train.columns), "Target",  
#           class_names=['Sudden Acceleration', 'Sudden Right Turn', 'Sudden Left Turn', 'Sudden Break'], 
#           tree_index=0, orientation = 'LR', scale = 2)
