# Decision Tree Regression

## Dataset

### Layout

* Columns: 5
	* Engine temperature
	* Exhaust vacuum
	* Ambient pressure
	* Relative humidity
	* Energy output
* Rows: thousands of observations
	* Each row is one observation of metrics captured at the power plant
		* Features:
			* Engine temperature
			* Exhaust vacuum
			* Ambient pressure
			* Relative humidity
		* Dependent variable:
			* Energy output

### Background

* Real-world dataset from the UCI Machine Learning Repository
	* A website hosting many real-world datasets for practicing ML
* Combined Cycle Power Plant dataset

### Goals

* Build regression models to predict energy output

## Import Libraries

In [11]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Import Dataset

In [12]:
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values  # features: every column except the last
y = dataset.iloc[:, -1].values   # dependent variable: the last column (energy output)

In [13]:
print(X)

[[  14.96   41.76 1024.07   73.17]
 [  25.18   62.96 1020.04   59.08]
 [   5.11   39.4  1012.16   92.14]
 ...
 [  31.32   74.33 1012.92   36.48]
 [  24.48   69.45 1013.86   62.39]
 [  21.6    62.52 1017.23   67.87]]


In [14]:
print(y)

[463.26 444.37 488.56 ... 429.57 435.74 453.28]
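
The column slicing used to build `X` and `y` can be checked on a small frame. This is a minimal sketch: the values are the first three rows printed above, but the column names are assumptions made for illustration, since the headers of `Data.csv` are not shown here.

```python
import pandas as pd

# Hypothetical column names; values copied from the first three printed rows
toy = pd.DataFrame({
    "temperature": [14.96, 25.18, 5.11],
    "exhaust_vacuum": [41.76, 62.96, 39.40],
    "ambient_pressure": [1024.07, 1020.04, 1012.16],
    "relative_humidity": [73.17, 59.08, 92.14],
    "energy_output": [463.26, 444.37, 488.56],
})

X_toy = toy.iloc[:, :-1].values  # all rows, every column except the last
y_toy = toy.iloc[:, -1].values   # all rows, only the last column

print(X_toy.shape)  # (3, 4): a 2-D feature matrix
print(y_toy.shape)  # (3,): a 1-D target vector
```

The split relies on the dependent variable being the last column; if a file stores it elsewhere, the slices need to change accordingly.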


## Split Dataset into Training Set and Test Set

In [15]:
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
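
To see what `test_size = 0.2` does, the same call can be run on a tiny synthetic array. The data below is a stand-in invented for illustration, not the power plant data, and the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10 rows, 4 feature columns; row i starts with the value 4 * i
X_demo = np.arange(40).reshape(10, 4)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(X_tr.shape, X_te.shape)  # (8, 4) (2, 4)

# Rows and targets are shuffled together, so each feature row keeps its own target
print(np.array_equal(X_te[:, 0], 4 * y_te))  # True
```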

## Train Decision Tree Regression Model on Training Set

In [16]:
from sklearn.tree import DecisionTreeRegressor
# random_state fixes the order in which features are considered at each split
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X_train, y_train)
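
With default settings, `DecisionTreeRegressor` keeps splitting until every leaf is pure, so the fitted tree can reproduce its training targets exactly. The sketch below illustrates this on synthetic one-feature data (a noisy sine curve invented for the example, not the power plant data) and contrasts it with a tree limited by `max_depth`. The depth of 2 is an arbitrary choice for illustration, not a recommended setting.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: 50 evenly spaced inputs with a noisy sine target
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 10, 50).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.1, size=50)

# Default tree: grows until every leaf is pure
full_tree = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)
# Depth-limited tree: at most 2 levels of splits, so at most 2**2 = 4 leaves
shallow_tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_demo, y_demo)

print(full_tree.get_depth(), full_tree.get_n_leaves())
print(shallow_tree.get_depth(), shallow_tree.get_n_leaves())

# The unrestricted tree matches its own training targets
print(np.allclose(full_tree.predict(X_demo), y_demo))  # True
```

A perfect fit on training data says nothing about unseen data, which is why the notebook scores the model on the held-out test set.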

## Predict Test Set Results

In [17]:
y_pred = regressor.predict(X_test)

In [18]:
np.set_printoptions(precision=2)
# Show predictions (left column) next to the actual test values (right column)
print(np.concatenate((y_pred.reshape(-1, 1), y_test.reshape(-1, 1)), axis=1))

[[431.37 431.23]
 [459.59 460.01]
 [460.06 461.14]
 ...
 [471.46 473.26]
 [437.76 438.  ]
 [462.74 463.28]]


## Evaluate Model Performance

In [19]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

0.9228349015829475
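
The score returned by `r2_score` is the coefficient of determination, defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. As a rough sketch, the formula can be applied by hand to the six prediction pairs visible in the printed comparison above. The elided rows are not available, so the result is not expected to match the full test-set score.

```python
import numpy as np

# The six (predicted, actual) pairs shown in the printed output; elided rows are omitted
y_pred_shown = np.array([431.37, 459.59, 460.06, 471.46, 437.76, 462.74])
y_true_shown = np.array([431.23, 460.01, 461.14, 473.26, 438.00, 463.28])

ss_res = np.sum((y_true_shown - y_pred_shown) ** 2)         # squared prediction errors
ss_tot = np.sum((y_true_shown - y_true_shown.mean()) ** 2)  # squared deviations from the mean

r2_manual = 1 - ss_res / ss_tot
print(r2_manual)
```

A value of 1.0 means the predictions match the actual values exactly, while 0.0 means the model does no better than always predicting the mean.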