# Decision Tree Regression

## Dataset

### Layout

* Columns: 5
	* Engine temperature
	* Exhaust vacuum
	* Ambient pressure
	* Relative humidity
	* Energy output
* Rows: thousands of observations
	* Each row is one observation of metrics captured at the power plant
		* Features:
			* Engine temperature
			* Exhaust vacuum
			* Ambient pressure
			* Relative humidity
		* Dependent variable:
			* Energy output

### Background

* Real-world dataset from the UCI Machine Learning Repository
	* A website that hosts many real-world datasets for practicing ML
* Combined Cycle Power Plant dataset

### Goals

* Build a regression model that predicts energy output from the four features

## Import Libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Import Dataset

In [2]:
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values  # features: every column except the last
y = dataset.iloc[:, -1].values   # dependent variable: the last column (energy output)
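
To see what the two `iloc` slices select, here is a minimal sketch on a tiny in-memory frame. The column names are hypothetical stand-ins, since the header of `Data.csv` is not shown in this notebook; the three rows are the first three observations printed in the cells that follow.

```python
import pandas as pd

# Hypothetical column names; values are the first three rows printed in this notebook.
toy = pd.DataFrame({
    "temperature": [14.96, 25.18, 5.11],
    "vacuum": [41.76, 62.96, 39.40],
    "pressure": [1024.07, 1020.04, 1012.16],
    "humidity": [73.17, 59.08, 92.14],
    "energy_output": [463.26, 444.37, 488.56],
})

X_toy = toy.iloc[:, :-1].values  # all rows, every column except the last -> 2-D array
y_toy = toy.iloc[:, -1].values   # all rows, last column only -> 1-D array

print(X_toy.shape)  # (3, 4)
print(y_toy.shape)  # (3,)
```

The slicing relies on the dependent variable being the last column; if the file were laid out differently, the indices would need to change.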

In [3]:
print(*X[:25], sep='\n')

[  14.96   41.76 1024.07   73.17]
[  25.18   62.96 1020.04   59.08]
[   5.11   39.4  1012.16   92.14]
[  20.86   57.32 1010.24   76.64]
[  10.82   37.5  1009.23   96.62]
[  26.27   59.44 1012.23   58.77]
[  15.89   43.96 1014.02   75.24]
[   9.48   44.71 1019.12   66.43]
[  14.64   45.   1021.78   41.25]
[  11.74   43.56 1015.14   70.72]
[  17.99   43.72 1008.64   75.04]
[  20.14   46.93 1014.66   64.22]
[  24.34   73.5  1011.31   84.15]
[  25.71   58.59 1012.77   61.83]
[  26.19   69.34 1009.48   87.59]
[  21.42   43.79 1015.76   43.08]
[  18.21   45.   1022.86   48.84]
[  11.04   41.74 1022.6    77.51]
[  14.45   52.75 1023.97   63.59]
[  13.97   38.47 1015.15   55.28]
[  17.76   42.42 1009.09   66.26]
[   5.41   40.07 1019.16   64.77]
[   7.76   42.28 1008.52   83.31]
[  27.23   63.9  1014.3    47.19]
[  27.36   48.6  1003.18   54.93]


In [4]:
print(*y[:25], sep='\n')

463.26
444.37
488.56
446.48
473.9
443.67
467.35
478.42
475.98
477.5
453.02
453.99
440.29
451.28
433.99
462.19
467.54
477.2
459.85
464.3
468.27
495.24
483.8
443.61
436.06


## Split Dataset into Training Set and Test Set

In [5]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows as the test set; random_state=0 makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
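
As a quick illustration of what the split returns, the sketch below runs the same call on synthetic stand-in arrays (10 rows, 4 features, not the power plant data). The names `X_demo`, `y_demo`, `X_tr`, and so on are made up for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 10 rows, 4 features (not the power plant data).
X_demo = np.arange(40).reshape(10, 4)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(X_tr.shape, X_te.shape)  # (8, 4) (2, 4)

# Rows and targets stay paired after shuffling: row i of X_demo starts with 4 * i.
assert np.array_equal(X_tr[:, 0], 4 * y_tr)
assert np.array_equal(X_te[:, 0], 4 * y_te)
```

With `test_size=0.2`, two of the ten rows are held out and the remaining eight are used for training; the features and targets are shuffled together, so each row keeps its own target.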

## Train Decision Tree Regression Model on Training Set

In [6]:
from sklearn.tree import DecisionTreeRegressor

# random_state=0 fixes the tree's internal randomness so the fit is reproducible
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X_train, y_train)
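
A regression tree predicts, for each leaf, the mean target of the training rows that fall into it. The sketch below shows this on synthetic one-dimensional data (not the power plant data), using `max_depth=1` so that the tree has a single split; the names `X_demo`, `y_demo`, and `stump` are made up for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D data with two obvious clusters (not the power plant data).
X_demo = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_demo = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])

# max_depth=1 allows a single split, which gives exactly two leaves.
stump = DecisionTreeRegressor(max_depth=1, random_state=0)
stump.fit(X_demo, y_demo)

# Each leaf predicts the mean target of its training rows:
# mean(1, 2, 3) = 2.0 for the low cluster, mean(10, 11, 12) = 11.0 for the high one.
preds = stump.predict([[2.5], [20.0]])
print(preds)
```

The model trained above uses the default `max_depth=None`, so it keeps splitting until the leaves are pure or too small to split further. That fits the training set very closely, which is why performance should be judged on the test set.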

## Predict Test Set Results

In [7]:
y_pred = regressor.predict(X_test)

In [8]:
# Print two decimals; left column is the prediction, right column is the actual value
np.set_printoptions(precision=2)
print(np.concatenate((y_pred.reshape(-1, 1), y_test.reshape(-1, 1)), axis=1))

[[431.37 431.23]
 [459.59 460.01]
 [460.06 461.14]
 ...
 [471.46 473.26]
 [437.76 438.  ]
 [462.74 463.28]]
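
The side-by-side array can also be built with `np.column_stack`, which may read more clearly than reshaping and concatenating. The sketch below checks that the two forms agree, using the first three printed rows above as stand-in values; `pred` and `actual` are names made up for the example.

```python
import numpy as np

# Stand-in values: the first three (predicted, actual) pairs printed above.
pred = np.array([431.37, 459.59, 460.06])
actual = np.array([431.23, 460.01, 461.14])

# column_stack turns each 1-D array into one column of a 2-D array.
side_by_side = np.column_stack((pred, actual))
same = np.concatenate((pred.reshape(-1, 1), actual.reshape(-1, 1)), axis=1)

assert side_by_side.shape == (3, 2)
assert np.array_equal(side_by_side, same)

# Per-row absolute error between prediction and actual value.
abs_err = np.abs(side_by_side[:, 0] - side_by_side[:, 1])
print(abs_err)
```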


## Evaluate Model Performance

In [9]:
from sklearn.metrics import r2_score

# Coefficient of determination (R^2) on the held-out test set; 1.0 would be a perfect fit
r2_score(y_test, y_pred)

0.9228349015829475
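
For reference, `r2_score` computes R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. The sketch below reproduces the calculation by hand on the six (predicted, actual) pairs printed earlier, used here only as stand-in values; the result on six rows is not the score of the full test set.

```python
import numpy as np
from sklearn.metrics import r2_score

# Stand-in values: the six (predicted, actual) pairs printed earlier in this notebook.
y_hat = np.array([431.37, 459.59, 460.06, 471.46, 437.76, 462.74])
y_true = np.array([431.23, 460.01, 461.14, 473.26, 438.00, 463.28])

ss_res = np.sum((y_true - y_hat) ** 2)          # squared error of the model
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of always predicting the mean
r2_manual = 1 - ss_res / ss_tot

# The hand calculation matches scikit-learn's implementation.
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
print(r2_manual)
```

Note the argument order in `r2_score`: actual values first, predictions second. Swapping them changes SS_tot and therefore the score.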