# MANU 465 Assignment 1 - Power Plant Energy

### Author:

Liam Bontkes, 25530163

## Project Description

This project uses a dataset of ambient variables from a Combined Cycle Power Plant to predict the plant's hourly electrical energy output.
Two methods are used to predict the output and are then compared: multiple linear regression and support vector machine (SVM) regression.

## Data Preprocessing

### Import libraries

In [68]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### Import the dataset

In [69]:
dataset = pd.read_csv('../Data/Power_Plant_Data.csv')

### View the data

In [70]:
# read_csv already returns a DataFrame, so it can be displayed directly
dataset

| | Ambient Temperature (C) | Exhaust Vacuum (cm Hg) | Ambient Pressure (milibar) | Relative Humidity (%) | Hourly Electrical Energy output (MW) |
|---|---|---|---|---|---|
| 0 | 14.96 | 41.76 | 1024.07 | 73.17 | 463.26 |
| 1 | 25.18 | 62.96 | 1020.04 | 59.08 | 444.37 |
| 2 | 5.11 | 39.40 | 1012.16 | 92.14 | 488.56 |
| 3 | 20.86 | 57.32 | 1010.24 | 76.64 | 446.48 |
| 4 | 10.82 | 37.50 | 1009.23 | 96.62 | 473.90 |
| ... | ... | ... | ... | ... | ... |
| 9563 | 16.65 | 49.69 | 1014.01 | 91.00 | 460.03 |
| 9564 | 13.19 | 39.18 | 1023.67 | 66.78 | 469.62 |
| 9565 | 31.32 | 74.33 | 1012.92 | 36.48 | 429.57 |
| 9566 | 24.48 | 69.45 | 1013.86 | 62.39 | 435.74 |


### Separate the Input and Output data

In [71]:
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [72]:
# check input data
pd.DataFrame(x)

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 14.96 | 41.76 | 1024.07 | 73.17 |
| 1 | 25.18 | 62.96 | 1020.04 | 59.08 |
| 2 | 5.11 | 39.40 | 1012.16 | 92.14 |
| 3 | 20.86 | 57.32 | 1010.24 | 76.64 |
| 4 | 10.82 | 37.50 | 1009.23 | 96.62 |
| ... | ... | ... | ... | ... |
| 9563 | 16.65 | 49.69 | 1014.01 | 91.00 |
| 9564 | 13.19 | 39.18 | 1023.67 | 66.78 |
| 9565 | 31.32 | 74.33 | 1012.92 | 36.48 |
| 9566 | 24.48 | 69.45 | 1013.86 | 62.39 |


### Take care of missing data

In [73]:
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(
    missing_values=np.nan,
    strategy='mean'
)

x = imputer.fit_transform(x)

In [74]:
# check data
print(pd.DataFrame(x))
print(pd.DataFrame(y))

          0      1        2      3
0     14.96  41.76  1024.07  73.17
1     25.18  62.96  1020.04  59.08
2      5.11  39.40  1012.16  92.14
3     20.86  57.32  1010.24  76.64
4     10.82  37.50  1009.23  96.62
...     ...    ...      ...    ...
9563  16.65  49.69  1014.01  91.00
9564  13.19  39.18  1023.67  66.78
9565  31.32  74.33  1012.92  36.48
9566  24.48  69.45  1013.86  62.39
9567  21.60  62.52  1017.23  67.87

[9568 rows x 4 columns]
           0
0     463.26
1     444.37
2     488.56
3     446.48
4     473.90
...      ...
9563  460.03
9564  469.62
9565  429.57
9566  435.74
9567  453.28

[9568 rows x 1 columns]
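
The imputation step above can be illustrated on a small array. The sketch below is a minimal example, assuming a made-up matrix `toy_x` (its values are illustrative and not taken from the power plant dataset). It counts the missing entries per column first, then fills them with the column mean, which is what `strategy='mean'` does.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix (hypothetical values, not from the dataset)
toy_x = np.array([
    [10.0, 40.0],
    [np.nan, 60.0],
    [30.0, np.nan],
])

# Count missing entries per column before imputing
missing_per_column = np.isnan(toy_x).sum(axis=0)
print(f"Missing values per column = {missing_per_column}")

# Replace each NaN with the mean of the observed values in its column
toy_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
toy_filled = toy_imputer.fit_transform(toy_x)
print(toy_filled)
```

If the count is zero for every column, as may well be the case for this dataset, the imputer leaves the data unchanged.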


### Scale data

In [75]:
from sklearn.preprocessing import StandardScaler

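sc_x = StandardScaler()  # scales the input features
sc_y = StandardScaler()  # scales the output; fitted later for the SVM model

# standardize each input feature to zero mean and unit variance
x = sc_x.fit_transform(x)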

In [76]:
# check data
print(pd.DataFrame(x))

             0         1         2         3
0    -0.629519 -0.987297  1.820488 -0.009519
1     0.741909  0.681045  1.141863 -0.974621
2    -1.951297 -1.173018 -0.185078  1.289840
3     0.162205  0.237203 -0.508393  0.228160
4    -1.185069 -1.322539 -0.678470  1.596699
...        ...       ...       ...       ...
9563 -0.402737 -0.363242  0.126450  1.211755
9564 -0.867037 -1.190331  1.753131 -0.447205
9565  1.565840  1.575811 -0.057099 -2.522618
9566  0.647976  1.191778  0.101191 -0.747901
9567  0.261507  0.646419  0.668677 -0.372545

[9568 rows x 4 columns]


### Split dataset into training and test sets

In [77]:
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2, random_state=1)
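
In this notebook the scaler is fitted on all rows before the split, so the test rows contribute to the mean and variance used for scaling. One common alternative is to split first and let a `Pipeline` fit the scaler on the training rows only. The sketch below is a minimal illustration on synthetic data; the `demo_*` names and the generated values are assumptions made for the example, not part of the original analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical, not the power plant dataset)
rng = np.random.default_rng(0)
demo_x = rng.normal(loc=[20.0, 1013.0], scale=[7.0, 6.0], size=(200, 2))
demo_y = 500.0 - 2.0 * demo_x[:, 0] + 0.05 * demo_x[:, 1] + rng.normal(scale=1.0, size=200)

# Split the raw (unscaled) data first
demo_train_x, demo_test_x, demo_train_y, demo_test_y = train_test_split(
    demo_x, demo_y, test_size=0.2, random_state=1
)

# The pipeline fits the scaler on the training rows only,
# then applies the same scaling when predicting on the test rows
demo_model = make_pipeline(StandardScaler(), LinearRegression())
demo_model.fit(demo_train_x, demo_train_y)
demo_pred = demo_model.predict(demo_test_x)

# The scaler has only seen the training rows
demo_scaler = demo_model.named_steps['standardscaler']
print(f"Rows seen by the scaler = {demo_scaler.n_samples_seen_}")
```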

## Method 1: Multiple Linear Regression

### Train the Linear Regression model

In [78]:
from sklearn.linear_model import LinearRegression

linear_regressor = LinearRegression()
linear_regressor.fit(train_x, train_y)

LinearRegression()

### Test the model

In [79]:
from sklearn.metrics import mean_squared_error

# predict with linear model
linear_y = linear_regressor.predict(test_x)

# evaluate performance (mean_squared_error expects y_true first, then y_pred)
linear_rmse = mean_squared_error(test_y, linear_y) ** 0.5
print(f"Linear Model RMSE = {linear_rmse}")

Linear Model RMSE = 4.508879190536166
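
The RMSE is the square root of the mean of the squared differences between the actual and predicted values, so it is expressed in the same unit as the output (MW). The sketch below checks the calculation by hand against `mean_squared_error`. The `actual` and `predicted` arrays are made-up values for illustration only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted outputs (illustrative values only)
actual = np.array([463.0, 444.0, 488.0, 446.0])
predicted = np.array([460.0, 448.0, 488.0, 441.0])

# RMSE computed step by step: errors -> squared -> mean -> square root
manual_rmse = np.sqrt(np.mean((actual - predicted) ** 2))

# The same quantity via scikit-learn
sklearn_rmse = mean_squared_error(actual, predicted) ** 0.5

print(f"Manual RMSE  = {manual_rmse}")
print(f"sklearn RMSE = {sklearn_rmse}")
```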


In [84]:
# spot-check the prediction for the first test sample
# (test_x[:1] keeps the 2D shape that predict expects)
linear_y_test = linear_regressor.predict(test_x[:1])
print(f"Data point 0 prediction = {linear_y_test}")

Data point 0 prediction = [457.25522108]


## Method 2: SVM Regression

### Train the SVM Regression model

In [81]:
from sklearn.svm import SVR

# reshape y to a 2D column, as StandardScaler expects, and scale it for the SVM model
svm_train_y = sc_y.fit_transform(train_y.reshape(-1, 1))

svm_regressor = SVR(kernel='rbf')
# flatten y back to 1D, as SVR expects, to avoid a DataConversionWarning
svm_regressor.fit(train_x, svm_train_y.ravel())

SVR()
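
The manual steps of scaling `y` before training and inverse-transforming the predictions afterwards can also be delegated to scikit-learn's `TransformedTargetRegressor`. The sketch below is a minimal illustration on synthetic data; the `demo_*` names and the generated values are assumptions for the example, and this is an alternative to the approach used in this notebook, not a description of it.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (hypothetical, not the power plant dataset)
rng = np.random.default_rng(0)
demo_x = rng.normal(size=(300, 2))
demo_y = 450.0 + 15.0 * np.tanh(demo_x[:, 0]) - 5.0 * demo_x[:, 1]

# The wrapper scales y before fitting the SVR and
# inverse-transforms the predictions back to the original unit
wrapped_svr = TransformedTargetRegressor(
    regressor=SVR(kernel='rbf'),
    transformer=StandardScaler(),
)
wrapped_svr.fit(demo_x, demo_y)

# Predictions come back in the original scale of demo_y
demo_pred = wrapped_svr.predict(demo_x[:5])
print(demo_pred)
```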

### Test the model

In [82]:
from sklearn.metrics import mean_squared_error

# predict with SVM model (the output is in scaled units)
svm_y = svm_regressor.predict(test_x)

# inverse transform the result back to MW
svm_y = sc_y.inverse_transform(svm_y.reshape(-1, 1)).ravel()

# evaluate performance (mean_squared_error expects y_true first, then y_pred)
svm_rmse = mean_squared_error(test_y, svm_y) ** 0.5
print(f"SVM Model RMSE = {svm_rmse}")

SVM Model RMSE = 3.9490472947932282


## Comparing Results

In [83]:
print(f"Linear Model RMSE   = {linear_rmse}")
print(f"SVM Model RMSE      = {svm_rmse}")

Linear Model RMSE   = 4.508879190536166
SVM Model RMSE      = 3.9490472947932282


The root mean squared error of the SVM model on the test set (about 3.95 MW) is lower than that of the linear model (about 4.51 MW),
which means that the SVM model predicts the held-out data more accurately.
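
The comparison above rests on a single train/test split. One way to check that a ranking does not depend on the particular split is k-fold cross-validation. The sketch below is a minimal illustration on synthetic data; the `demo_*` names, the `candidates` dictionary, and the generated values are assumptions for the example, and its numbers say nothing about the power plant dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data with a nonlinear relationship (hypothetical)
rng = np.random.default_rng(0)
demo_x = rng.normal(size=(300, 2))
demo_y = np.sin(2.0 * demo_x[:, 0]) + 0.5 * demo_x[:, 1] + rng.normal(scale=0.1, size=300)

# Each candidate scales the inputs inside the pipeline,
# so the scaler is refitted on the training folds only
candidates = {
    'linear': make_pipeline(StandardScaler(), LinearRegression()),
    'svr': make_pipeline(StandardScaler(), SVR(kernel='rbf')),
}

# 5-fold cross-validated RMSE (the scorer returns negated values)
cv_rmse = {}
for name, model in candidates.items():
    scores = cross_val_score(
        model, demo_x, demo_y, cv=5, scoring='neg_root_mean_squared_error'
    )
    cv_rmse[name] = -scores.mean()
    print(f"{name} mean CV RMSE = {cv_rmse[name]}")
```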