# Decision Tree Regression

## Setup

First, import a few common modules.

In [13]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

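# Scikit-Learn ≥0.20 is required
import sklearn
# Compare parsed versions; plain string comparison is lexicographic ("0.9" >= "0.20" is True)
from packaging import version
assert version.parse(sklearn.__version__) >= version.parse("0.20")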

# Common imports
import numpy as np
import os
import pandas as pd

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# to make this notebook's output stable across runs
np.random.seed(42)

## Importing the dataset

Load the training data into a pandas __DataFrame__ using the function _**pd.read_csv**_, then extract the individual columns as NumPy arrays.

In [14]:
traindataset = pd.read_csv('C:/Users/Shien/OneDrive/Documents/GitHub/DataMiningProject/train.csv')

r = traindataset['R'].values
c = traindataset['C'].values
uin = traindataset['u_in'].values
uout = traindataset['u_out'].values
p = traindataset['pressure'].values

traindataset.head(5)



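| | id | breath_id | R | C | time_step | u_in | u_out | pressure |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 20 | 50 | 0.0 | 0.083334 | 0 | 5.837492 |
| 1 | 2 | 1 | 20 | 50 | 0.033652 | 18.383041 | 0 | 5.907794 |
| 2 | 3 | 1 | 20 | 50 | 0.067514 | 22.509278 | 0 | 7.876254 |
| 3 | 4 | 1 | 20 | 50 | 0.101542 | 22.808822 | 0 | 11.742872 |
| 4 | 5 | 1 | 20 | 50 | 0.135756 | 25.35585 | 0 | 12.234987 |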


In [15]:
testdataset = pd.read_csv('C:/Users/Shien/OneDrive/Documents/GitHub/DataMiningProject/test.csv')

R = testdataset['R'].values
C = testdataset['C'].values
Uin = testdataset['u_in'].values
Uout = testdataset['u_out'].values

testdataset.head(5)

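| | id | breath_id | R | C | time_step | u_in | u_out |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 5 | 20 | 0.0 | 0.0 | 0 |
| 1 | 2 | 0 | 5 | 20 | 0.031904 | 7.515046 | 0 |
| 2 | 3 | 0 | 5 | 20 | 0.063827 | 14.651675 | 0 |
| 3 | 4 | 0 | 5 | 20 | 0.095751 | 21.23061 | 0 |
| 4 | 5 | 0 | 5 | 20 | 0.127644 | 26.320956 | 0 |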


In [16]:
traindataset.shape

(6036000, 8)

In [17]:
testdataset.shape

(4024000, 7)
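
With several million rows per file, memory use can become noticeable. The following is a minimal sketch, not part of the original notebook, of passing explicit dtypes to _**pd.read_csv**_. It assumes the CSV columns match the head output shown above; the `dtypes` mapping and the in-memory `csv_text` sample are illustrative choices, and the narrower float type slightly reduces precision.

```python
import io
import pandas as pd

# Small in-memory sample that mimics the first rows of train.csv
# (assumption: the file has exactly these eight columns)
csv_text = """id,breath_id,R,C,time_step,u_in,u_out,pressure
1,1,20,50,0.0,0.083334,0,5.837492
2,1,20,50,0.033652,18.383041,0,5.907794
3,1,20,50,0.067514,22.509278,0,7.876254
"""

# Hypothetical dtype choices: small integers for R, C and u_out, 32-bit elsewhere
dtypes = {
    "id": "int32", "breath_id": "int32",
    "R": "int8", "C": "int8", "u_out": "int8",
    "time_step": "float32", "u_in": "float32", "pressure": "float32",
}

sample = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(sample.dtypes)
print(sample.memory_usage(deep=True).sum(), "bytes")
```

For the real files, the same `dtype` argument could be passed alongside the file path used in the cells above.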

Delete the _**'pressure'**_ column from traindataset so that traindataset and testdataset have the same columns. The pressure values remain available in the variable __p__ extracted earlier.

In [18]:
del traindataset['pressure']

Print the shapes to confirm that the column was deleted.

In [20]:
traindataset.shape

(6036000, 7)

In [21]:
testdataset.shape

(4024000, 7)
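
As an alternative to extracting the target and deleting the column in two separate steps, the sketch below uses `DataFrame.pop`, which removes a column and returns it in one call. The toy frame `train` reuses column names and a few values from the output above; it is only an illustration, not the notebook's actual data flow.

```python
import pandas as pd

# Toy frame with a few of the training columns shown above
train = pd.DataFrame({
    "u_in": [0.083334, 18.383041, 22.509278],
    "u_out": [0, 0, 0],
    "pressure": [5.837492, 5.907794, 7.876254],
})

# pop() removes the column in place and returns it as a Series
target = train.pop("pressure")

print(train.columns.tolist())  # remaining feature columns
print(target.to_numpy())       # target values as a NumPy array
```

After the call, `train` holds only feature columns and `target` holds the values to predict.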

## Training the Decision Tree Regression model on the training set

The __DecisionTreeRegressor__ class is imported from sklearn.tree and an instance is assigned to the variable '__regressor__'. The model is fitted with the __regressor.fit__ function, using every remaining column of traindataset (including __id__ and __breath_id__) as features and the pressure values __p__ as the target. The __reshape(-1,1)__ function reshapes __p__ into a single-column array.

In [22]:
# Fitting Decision Tree Regression to the dataset
from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor()
regressor.fit(traindataset, p.reshape(-1,1))

DecisionTreeRegressor()
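
To illustrate how the target shape affects fitting, here is a small sketch on made-up data. It suggests that a one-dimensional target and its single-column reshaped form give the same fitted tree and the same one-dimensional predictions. The arrays `X` and `y` and the fixed `random_state` are assumptions for the example only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (illustrative values only): 6 samples, 2 features
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0],
              [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]])
y = np.array([5.8, 5.9, 7.9, 11.7, 12.2, 12.9])

# Fit once with a 1-D target and once with a single-column 2-D target
tree_1d = DecisionTreeRegressor(random_state=42).fit(X, y)
tree_col = DecisionTreeRegressor(random_state=42).fit(X, y.reshape(-1, 1))

pred_1d = tree_1d.predict(X)
pred_col = tree_col.predict(X)

print(pred_1d.shape, pred_col.shape)   # both predictions are 1-D
print(np.allclose(pred_1d, pred_col))  # identical predictions
```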

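Check the fitted model by predicting on the training data. Because the tree is grown without a depth limit, it can reproduce the training targets almost exactly, so matching real and predicted values confirm that the model was fitted, not that it generalizes to unseen data.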

In [23]:
P_predict = regressor.predict(traindataset)

In [24]:
resultdf = pd.DataFrame({'Real Values':p.reshape(-1), 'Predicted Values':P_predict.reshape(-1)})
resultdf

| | Real Values | Predicted Values |
|---|---|---|
| 0 | 5.837492 | 5.837492 |
| 1 | 5.907794 | 5.907794 |
| 2 | 7.876254 | 7.876254 |
| 3 | 11.742872 | 11.742872 |
| 4 | 12.234987 | 12.234987 |
| ... | ... | ... |
| 6035995 | 3.869032 | 3.869032 |
| 6035996 | 3.869032 | 3.869032 |
| 6035997 | 3.798729 | 3.798729 |
| 6035998 | 4.079938 | 4.079938 |
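
A comparison on the training rows cannot show how the model behaves on new data. One common way to estimate that is to hold out part of the labeled data, sketched below on synthetic data. The data generation, the 80/20 split, and the choice of mean absolute error are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data with noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.3, size=500)

# Keep 20% of the labeled rows aside for validation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Error on rows the tree has seen versus rows it has not seen
train_mae = mean_absolute_error(y_train, tree.predict(X_train))
val_mae = mean_absolute_error(y_val, tree.predict(X_val))
print(f"train MAE: {train_mae:.4f}, validation MAE: {val_mae:.4f}")
```

The training error of an unrestricted tree is close to zero, while the validation error reflects the noise and any overfitting. The same pattern could be applied to traindataset and __p__ before predicting on testdataset.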


## Predicting the pressure using the test dataset

In [25]:
test_predict = regressor.predict(testdataset)
test_predict_result = pd.DataFrame({'pressure':test_predict.reshape(-1)})
test_predict_result 

| | pressure |
|---|---|
| 0 | 6.329607 |
| 1 | 4.642355 |
| 2 | 5.767190 |
| 3 | 9.844714 |
| 4 | 9.282297 |
| ... | ... |
| 4023995 | 5.134470 |
| 4023996 | 6.259305 |
| 4023997 | 5.064168 |
| 4023998 | 6.048398 |
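
The result frame above contains only the predicted pressure. If the predictions need to be matched back to the test rows, one option is to keep the __id__ column next to them, as in the sketch below. Whether an id column is required in the final output is an assumption, and `toy_test` and `toy_pred` are small stand-ins for testdataset and test_predict.

```python
import io
import numpy as np
import pandas as pd

# Toy stand-ins (illustrative values only) for testdataset and test_predict
toy_test = pd.DataFrame({"id": [1, 2, 3], "u_in": [0.0, 7.515046, 14.651675]})
toy_pred = np.array([6.329607, 4.642355, 5.767190])

# Pair each prediction with the id of the row it was made for
result = pd.DataFrame({"id": toy_test["id"].to_numpy(), "pressure": toy_pred})

# Write to an in-memory buffer here; a file path would work the same way
buffer = io.StringIO()
result.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines[0])  # header row: id,pressure
```

Passing `index=False` keeps the DataFrame index out of the saved file, which avoids an extra unnamed column when the file is read back.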
