# Support Vector Regression (SVR)

## Dataset

### Layout

* Columns: 5
	* Engine temperature
	* Exhaust vacuum
	* Ambient pressure
	* Relative humidity
	* Energy output
* Rows: thousands of observations
	* Each row is one observation of metrics captured at the power plant
		* Features:
			* Engine temperature
			* Exhaust vacuum
			* Ambient pressure
			* Relative humidity
		* Dependent variable:
			* Energy output
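
The layout above maps onto a feature matrix (first four columns) and a target vector (last column). A minimal sketch, using the first three rows printed further down in this notebook; the column names are hypothetical labels and may not match the actual headers in `Data.csv`:

```python
import pandas as pd

# Hypothetical column names; the real headers in Data.csv may differ.
columns = ["engine_temp", "exhaust_vacuum", "ambient_pressure",
           "relative_humidity", "energy_output"]

# First three observations, copied from the printed X and y in this notebook.
sample = pd.DataFrame(
    [[14.96, 41.76, 1024.07, 73.17, 463.26],
     [25.18, 62.96, 1020.04, 59.08, 444.37],
     [5.11, 39.40, 1012.16, 92.14, 488.56]],
    columns=columns,
)

X_sample = sample.iloc[:, :-1].values  # every column except the last: features
y_sample = sample.iloc[:, -1].values   # last column: dependent variable

print(X_sample.shape)  # (3, 4)
print(y_sample.shape)  # (3,)
```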

### Background

* Real-world dataset from the UCI Machine Learning Repository
	* Website hosting many real-world datasets for practicing ML
* Combined Cycle Power Plant dataset

### Goals

* Build regression models to predict energy output

## Import Libraries

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Import Dataset

In [3]:
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [4]:
# StandardScaler expects a 2-D array, so turn y into a column vector
y = y.reshape(len(y), 1)

In [5]:
print(X)

[[  14.96   41.76 1024.07   73.17]
 [  25.18   62.96 1020.04   59.08]
 [   5.11   39.4  1012.16   92.14]
 ...
 [  31.32   74.33 1012.92   36.48]
 [  24.48   69.45 1013.86   62.39]
 [  21.6    62.52 1017.23   67.87]]


In [6]:
print(y)

[[463.26]
 [444.37]
 [488.56]
 ...
 [429.57]
 [435.74]
 [453.28]]


## Split Dataset into Training Set and Test Set

In [7]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
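
With `test_size = 0.2`, one fifth of the rows are held out for testing, and a fixed `random_state` makes the shuffle reproducible. A small sketch on synthetic data (the array sizes and values are invented for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 100 rows, 4 features, 2-D target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = rng.normal(size=(100, 1))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(X_tr.shape, X_te.shape)  # (80, 4) (20, 4)

# Repeating the call with the same random_state gives the same split.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(np.array_equal(X_te, X_te2))  # True
```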

## Feature Scaling

In [8]:
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X_train = sc_X.fit_transform(X_train)
y_train = sc_y.fit_transform(y_train)

In [9]:
print(X_train)

[[-1.13572795 -0.88685592  0.67357894  0.52070558]
 [-0.80630243 -0.00971567  0.45145467  0.14531044]
 [ 1.77128416  1.84743445  0.24279248 -1.88374143]
 ...
 [-0.38409993 -1.24886277  0.84522042  0.13092486]
 [-0.9232821  -1.04155299  1.54693117  0.8830852 ]
 [ 1.70136528  1.05824381 -1.20438076 -2.42285818]]


In [10]:
print(X_test)

[[  28.66   77.95 1009.56   69.07]
 [  17.48   49.39 1021.51   84.53]
 [  14.86   43.14 1019.21   99.14]
 ...
 [  12.24   44.92 1023.74   88.21]
 [  27.28   47.93 1003.46   59.22]
 [  17.28   39.99 1007.09   74.25]]


In [11]:
print(y_train)

[[ 1.15069786]
 [ 0.79540777]
 [-1.30936356]
 ...
 [ 0.27595724]
 [ 0.49346982]
 [-1.53508417]]


In [12]:
print(y_test)

[[431.23]
 [460.01]
 [461.14]
 ...
 [473.26]
 [438.  ]
 [463.28]]
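
The printed arrays show that only the training data has been standardized at this point: `X_test` and `y_test` are still in their original units, and `X_test` is scaled later with the statistics learned from the training set. A sketch of that pattern on synthetic data (all values are invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(200, 2))  # synthetic training features
test = rng.normal(loc=50.0, scale=10.0, size=(50, 2))    # synthetic test features

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean/std from the training set only
test_scaled = scaler.transform(test)        # reuses the training mean/std

print(np.allclose(train_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(train_scaled.std(axis=0), 1.0))   # True

# inverse_transform maps scaled values back to the original units.
print(np.allclose(scaler.inverse_transform(test_scaled), test))  # True
```

Fitting the scaler on the training set alone keeps information about the test set out of the model.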


## Train SVR Model on Training Set

In [13]:
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
# SVR expects a 1-D target; ravel() flattens the (n, 1) array used for scaling
regressor.fit(X_train, y_train.ravel())
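
Only the kernel is set explicitly here, so the remaining hyperparameters (such as `C`, `epsilon`, and `gamma`) keep scikit-learn's defaults. A sketch for inspecting them and fitting on a 1-D target, using a synthetic sine curve that is invented for illustration:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))  # synthetic single feature
y_demo = np.sin(X_demo).ravel()             # 1-D target, the shape SVR expects

svr_demo = SVR(kernel="rbf")
params = svr_demo.get_params()
# Print the defaults that apply when only the kernel is specified.
print(params["C"], params["epsilon"], params["gamma"])

svr_demo.fit(X_demo, y_demo)
print(svr_demo.predict(X_demo[:3]).shape)  # (3,)
```

Because SVR is sensitive to feature scale, these defaults are usually only reasonable once the inputs have been standardized, as in the cell above.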


## Predict Test Set Results

In [14]:
# Scale X_test with the training statistics, predict, then map predictions back to original units
y_pred = sc_y.inverse_transform(regressor.predict(sc_X.transform(X_test)).reshape(-1, 1))

In [15]:
np.set_printoptions(precision = 2)
# Predicted values (left column) next to actual values (right column)
print(np.concatenate((y_pred, y_test), axis = 1))

[[434.05 431.23]
 [457.93 460.01]
 [461.02 461.14]
 ...
 [470.6  473.26]
 [439.42 438.  ]
 [460.92 463.28]]
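
The manual chain of scaling, predicting, and inverse-transforming can also be expressed with scikit-learn's `Pipeline` and `TransformedTargetRegressor`, which apply the same steps internally. This is an alternative sketch on synthetic data, not the approach used in this notebook; the data-generating formula is invented for illustration:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data with a target far from zero, so scaling y matters.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = (450.0 + 10.0 * X_demo[:, 0] - 5.0 * X_demo[:, 1]
          + rng.normal(scale=0.5, size=300))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf")),  # scales X
    transformer=StandardScaler(),                                  # scales y
)
model.fit(X_tr, y_tr)

# Predictions come back in the original units of y; no manual inverse_transform.
y_hat = model.predict(X_te)
print(y_hat.shape)  # (60,)
```

Wrapping the steps this way keeps the scalers and the model together, which makes it harder to forget a transform at prediction time.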


## Evaluate Model Performance

In [16]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

0.9480795111869857
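
The score is the coefficient of determination, `R^2 = 1 - SS_res / SS_tot`. A sketch that recomputes it by hand on the three predicted/actual pairs printed above and checks the result against `r2_score`; three rows are only a toy sample, so the value differs from the full test-set score:

```python
import numpy as np
from sklearn.metrics import r2_score

# First three (predicted, actual) pairs from the printed comparison above.
y_hat = np.array([434.05, 457.93, 461.02])
y_true = np.array([431.23, 460.01, 461.14])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(np.isclose(r2_manual, r2_score(y_true, y_hat)))  # True
```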