# Decision Tree Regressor

## Introduction
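This notebook builds a first model to predict Melbourne house prices. It walks through the basic steps most machine learning workflows include: loading the data, cleaning it, choosing features and a target, splitting into training and validation sets, training, predicting, and evaluating.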

## Setup

In [None]:
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
import warnings

# Show every occurrence of a warning instead of only the first one
warnings.filterwarnings("always")

# Load the Melbourne housing data set
melbourne_file_path = 'data/mel_housing/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path)

## Cleanup and Parameter Selection
Some rows in this data set have missing values (NaN), which can cause errors when fitting many scikit-learn models. We remove those rows before feeding the data to the model.
We also need to choose which columns are the features (the model inputs, `X`) and which column is the target (the value to predict, `y`).

In [None]:
# Remove every row that has a missing value in any column
melbourne_data = melbourne_data.dropna(axis=0)

# Choose the feature columns (spellings match the CSV headers)
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]
print(X.head())

# Choose the target column: the sale price
y = melbourne_data.Price
y.head()
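The cleanup cell drops a row if any column is missing, including columns that are never used as features, so it can discard more data than necessary. The sketch below uses a small made-up DataFrame (the values and the extra `BuildingArea` column are illustrative assumptions) to compare that with pandas' `subset=` parameter, which only checks the listed columns.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values: two feature columns, the target,
# and one column the model never uses.
df = pd.DataFrame({
    "Rooms": [2, 3, 4, 3],
    "Landsize": [150.0, np.nan, 400.0, 220.0],
    "BuildingArea": [np.nan, 120.0, np.nan, 95.0],  # not used as a feature
    "Price": [850000, 1200000, 1600000, 980000],
})

# dropna(axis=0) removes a row if ANY column is missing, even unused ones.
all_cols = df.dropna(axis=0)

# subset= restricts the check to the columns the model actually needs.
features = ["Rooms", "Landsize"]
needed_cols = df.dropna(axis=0, subset=features + ["Price"])

print(len(df), len(all_cols), len(needed_cols))  # 4 1 3
```

Whether to use `subset=` is a design choice: it keeps more training rows, but only makes sense once the feature list is fixed.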

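## Splitting data into training and validation sets
We validate the model on separate data that it never sees during training. A model can learn patterns that are specific to the training data (overfitting), and those patterns may not hold for real-world data, so the error on the training data alone overstates how well the model will perform.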

In [None]:
from sklearn.model_selection import train_test_split

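# By default 25% of the rows go to validation; random_state makes the split reproducible
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)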
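When `test_size` is not given, `train_test_split` holds out 25% of the rows, and it shuffles randomly unless `random_state` is fixed. This minimal sketch checks both behaviours on made-up NumPy arrays rather than the housing data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 made-up rows with 2 features each, plus a matching target vector.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.arange(100)

# With no test_size, 25% of the rows go to the second (validation) split.
a_X, b_X, a_y, b_y = train_test_split(X_toy, y_toy, random_state=1)
print(a_X.shape, b_X.shape)  # (75, 2) (25, 2)

# The same random_state gives the same shuffle, so the split is reproducible.
c_X, d_X, c_y, d_y = train_test_split(X_toy, y_toy, random_state=1)
print(np.array_equal(b_y, d_y))  # True
```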

## Train the model
We use scikit-learn's `DecisionTreeRegressor` to fit a model to the training data.

In [None]:
# random_state can be any integer; fixing it makes the results the same on every run
melbourne_model = DecisionTreeRegressor(random_state=1)
melbourne_model.fit(train_X, train_y)
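A regression tree predicts by sending a row down its splits to a leaf and returning the mean target of the training rows in that leaf. The sketch below shows this with a one-split tree (`max_depth=1`) on made-up data; the single feature and the prices are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# One made-up feature (think "number of rooms") and a made-up price target.
X_toy = np.array([[1], [2], [3], [10], [11], [12]])
y_toy = np.array([100.0, 110.0, 120.0, 300.0, 310.0, 320.0])

# max_depth=1 allows a single split, which gives two leaves.
stump = DecisionTreeRegressor(max_depth=1, random_state=1)
stump.fit(X_toy, y_toy)

# Each leaf predicts the mean target of the training rows that landed in it.
print(stump.predict([[2], [11]]))  # [110. 310.]
```

The notebook's model uses the default settings, so there is no depth limit and the tree keeps splitting until its leaves are pure.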

## Prediction
We make predictions for both the training data and the validation data so that we can compare the two errors in the evaluation step.

In [None]:
pred_train = melbourne_model.predict(train_X)
pred_val = melbourne_model.predict(val_X)
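The reason for also predicting on the training data: an unpruned tree can reproduce the targets of the rows it was fit on. This sketch uses synthetic data (a noisy sine curve with distinct feature values, an assumption standing in for the housing data) to show that the default `DecisionTreeRegressor` returns the training targets exactly.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data: distinct feature values and a noisy smooth signal.
X_toy = np.linspace(0, 10, 200).reshape(-1, 1)
y_toy = np.sin(X_toy[:, 0]) + rng.normal(0, 0.3, size=200)

# Default settings: the tree keeps splitting until each leaf is pure.
tree = DecisionTreeRegressor(random_state=1).fit(X_toy, y_toy)

# On the rows it was fit on, the unpruned tree returns the stored targets.
pred = tree.predict(X_toy)
print(np.allclose(pred, y_toy))  # True
```

On real data with duplicate feature rows the training error is usually small rather than exactly zero, since identical inputs with different prices end up in the same leaf.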

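## Evaluation
We measure accuracy with mean absolute error (MAE): the average absolute difference between predicted and actual prices, expressed in the same units as `Price`.
The error on the training data is lower than on the validation data, which the model has never seen. A decision tree with no depth limit can closely memorize its training rows, so the training error is optimistic and the validation error is the better estimate of performance on new data.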

In [None]:
from sklearn.metrics import mean_absolute_error

# mean_absolute_error expects (y_true, y_pred)
print('training error', mean_absolute_error(train_y, pred_train))
print('validation error', mean_absolute_error(val_y, pred_val))
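MAE is $\frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$, where $y_i$ is the actual price and $\hat{y}_i$ the prediction. This small sketch computes it by hand with NumPy on made-up prices and checks it against scikit-learn's `mean_absolute_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up actual prices and predictions, in the same currency units.
y_true = np.array([1_000_000, 850_000, 1_200_000])
y_pred = np.array([950_000, 900_000, 1_150_000])

# MAE = mean of |y_true - y_pred|, so it is reported in the target's units.
manual = np.mean(np.abs(y_true - y_pred))
print(manual)  # 50000.0
print(mean_absolute_error(y_true, y_pred))  # 50000.0
```

Because every error counts in proportion to its size, an MAE of 50000 here reads directly as "off by 50,000 on average".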
