# Assignment 1: The Power Plant Energy

## Project Description

The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), while the plant was operating at full load. It consists of the hourly averages of the ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH), and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (EP) of the plant.

The features are the hourly averages of the first four variables below, and EP is the target:

- Temperature (T), in the range 1.81-37.11 °C
- Ambient Pressure (AP), in the range 992.89-1033.30 millibar
- Relative Humidity (RH), in the range 25.56-100.16%
- Exhaust Vacuum (V), in the range 25.36-81.56 cm Hg
- Net hourly electrical energy output (EP), in the range 420.26-495.76 MW

The averages are taken from various sensors located around the plant that record the ambient variables every second. The variables are given without normalization.
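A quick way to check the stated ranges against the loaded file, and to see how differently scaled the raw variables are, is pandas' `agg`. This sketch runs on the five rows that `dataset.head()` displays later in the notebook, so its numbers describe only those rows, not the full dataset; with the real file, `dataset.agg(["min", "max"]).T` would be the equivalent call.

```python
import pandas as pd

# The five rows shown by dataset.head(); with the full file, use
# dataset = pd.read_csv('Power Plant Data.csv') instead.
sample = pd.DataFrame(
    {
        "Ambient Temperature (C)": [14.96, 25.18, 5.11, 20.86, 10.82],
        "Exhaust Vacuum (cm Hg)": [41.76, 62.96, 39.4, 57.32, 37.5],
        "Ambient Pressure (milibar)": [1024.07, 1020.04, 1012.16, 1010.24, 1009.23],
        "Relative Humidity (%)": [73.17, 59.08, 92.14, 76.64, 96.62],
        "Hourly Electrical Energy output (MW)": [463.26, 444.37, 488.56, 446.48, 473.9],
    }
)

# Minimum and maximum of every column, one row per variable
ranges = sample.agg(["min", "max"]).T
print(ranges)

# Column means differ by orders of magnitude (AP is around 1000, T around 15),
# which is what "given without normalization" means in practice.
print(sample.mean().round(2))
```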

Prepare a Python notebook (Jupyter) report that contains a brief description of the project and the features (variables). Then split the dataset into training and test sets (80% and 20%), apply feature scaling if necessary, build a model that predicts EP (electrical energy output) with each of two methods, and compare the results: Method 1, Multiple Regression, and Method 2, SVM Regression.

## Importing Some Basic Libraries

In [210]:
import numpy as np 
import pandas as pd 
from matplotlib import pyplot as plt

## Importing the Dataset

In [211]:
dataset = pd.read_csv('Power Plant Data.csv')

## Showing the Dataset in a Table

In [212]:
dataset.head()

|   | Ambient Temperature (C) | Exhaust Vacuum (cm Hg) | Ambient Pressure (milibar) | Relative Humidity (%) | Hourly Electrical Energy output (MW) |
|---|---|---|---|---|---|
| 0 | 14.96 | 41.76 | 1024.07 | 73.17 | 463.26 |
| 1 | 25.18 | 62.96 | 1020.04 | 59.08 | 444.37 |
| 2 | 5.11 | 39.4 | 1012.16 | 92.14 | 488.56 |
| 3 | 20.86 | 57.32 | 1010.24 | 76.64 | 446.48 |
| 4 | 10.82 | 37.5 | 1009.23 | 96.62 | 473.9 |
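The CSV column names are long. A possible convenience, not part of the original notebook, is to rename them to the abbreviations used in the project description. The `SHORT_NAMES` mapping below is an assumption built from the column names that `dataset.info()` reports.

```python
import pandas as pd

# Hypothetical mapping from the CSV column names to the short names in the description
SHORT_NAMES = {
    "Ambient Temperature (C)": "T",
    "Exhaust Vacuum (cm Hg)": "V",
    "Ambient Pressure (milibar)": "AP",
    "Relative Humidity (%)": "RH",
    "Hourly Electrical Energy output (MW)": "EP",
}

# First row of dataset.head(), used here only to demonstrate the rename
row = pd.DataFrame([[14.96, 41.76, 1024.07, 73.17, 463.26]], columns=list(SHORT_NAMES))
short = row.rename(columns=SHORT_NAMES)
print(short)
```

With the real data this would be `dataset = dataset.rename(columns=SHORT_NAMES)`; later cells that refer to the long names would then need the short ones.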


## A Quick Review of the Data

In [213]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9568 entries, 0 to 9567
Data columns (total 5 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   Ambient Temperature (C)               9568 non-null   float64
 1   Exhaust Vacuum (cm Hg)                9568 non-null   float64
 2   Ambient Pressure (milibar)            9568 non-null   float64
 3   Relative Humidity (%)                 9568 non-null   float64
 4   Hourly Electrical Energy output (MW)  9568 non-null   float64
dtypes: float64(5)
memory usage: 373.9 KB


## Separating the Input and Output

In [214]:
X = dataset.iloc[:, 0:4]  # first four columns: T, V, AP, RH (the features)
y = dataset.iloc[:, -1]   # last column: EP (the target)
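`iloc[:, 0:4]` depends on the column order in the file. Selecting by name is a more robust alternative; the helper `split_features_target` below is hypothetical and is demonstrated on the first two rows of `dataset.head()`.

```python
import pandas as pd

TARGET = "Hourly Electrical Energy output (MW)"

def split_features_target(df: pd.DataFrame, target: str = TARGET):
    """Return (X, y) selected by column name instead of by position."""
    features = df.drop(columns=[target])  # every column except the target
    labels = df[target]                   # the target as a Series
    return features, labels

# Demonstration on the first two rows of dataset.head()
demo = pd.DataFrame(
    [[14.96, 41.76, 1024.07, 73.17, 463.26],
     [25.18, 62.96, 1020.04, 59.08, 444.37]],
    columns=["Ambient Temperature (C)", "Exhaust Vacuum (cm Hg)",
             "Ambient Pressure (milibar)", "Relative Humidity (%)", TARGET],
)
X_demo, y_demo = split_features_target(demo)
print(X_demo.shape, y_demo.tolist())
```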

## Showing the Input Data in a Table Format

In [215]:
X.head()

Unnamed: 0,Ambient Temperature (C),Exhaust Vacuum (cm Hg),Ambient Pressure (milibar),Relative Humidity (%)
0,14.96,41.76,1024.07,73.17
1,25.18,62.96,1020.04,59.08
2,5.11,39.4,1012.16,92.14
3,20.86,57.32,1010.24,76.64
4,10.82,37.5,1009.23,96.62


## A Quick Check of the Output Data

In [216]:
y.head()

0    463.26
1    444.37
2    488.56
3    446.48
4    473.90
Name: Hourly Electrical Energy output (MW), dtype: float64

## Splitting the Dataset into the Training Set and Test Set

In [217]:
from sklearn.metrics import r2_score  # accuracy_score is for class labels; R² suits a continuous target


In [218]:
from sklearn.model_selection import train_test_split

In [219]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

In [220]:
# X_train.shape

In [221]:
# X_test.shape
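The commented-out shape checks can be worked out in advance: `train_test_split` rounds the test share up, so `test_size=0.2` on 9568 rows gives ceil(1913.6) = 1914 test rows and 7654 training rows. This sketch uses placeholder arrays of the same shape as `X` and `y` rather than the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 9568                      # rows reported by dataset.info()
X_dummy = np.zeros((n_samples, 4))    # placeholder features, same shape as X
y_dummy = np.zeros(n_samples)         # placeholder target, same shape as y

X_tr, X_te, y_tr, y_te = train_test_split(X_dummy, y_dummy, test_size=0.2, random_state=1)

# Test share rounded up: ceil(0.2 * 9568) = 1914; the remainder goes to training
print(X_tr.shape, X_te.shape)
```

The 1914 test rows match the columns 0 to 1913 in the prediction tables that follow.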

# Method 1: Training the Multiple Linear Regression Model on the Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
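The fitted multiple regression has the form EP ≈ b0 + b1·T + b2·V + b3·AP + b4·RH, where `intercept_` holds b0 and `coef_` holds b1 to b4 in the column order of `X`. This sketch fits synthetic, noise-free data with made-up coefficients (not estimates from this dataset) to show that `LinearRegression` recovers them and how to label them by feature name.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
cols = ["Ambient Temperature (C)", "Exhaust Vacuum (cm Hg)",
        "Ambient Pressure (milibar)", "Relative Humidity (%)"]

# Synthetic inputs drawn inside the ranges listed in the project description
X_syn = pd.DataFrame({
    cols[0]: rng.uniform(1.81, 37.11, 500),
    cols[1]: rng.uniform(25.36, 81.56, 500),
    cols[2]: rng.uniform(992.89, 1033.30, 500),
    cols[3]: rng.uniform(25.56, 100.16, 500),
})
true_coef = np.array([-2.0, -0.2, 0.05, -0.15])  # made-up values for the demo
y_syn = 450.0 + X_syn.to_numpy() @ true_coef      # exact linear target, no noise

model = LinearRegression().fit(X_syn, y_syn)

# One coefficient per feature, labelled with the column names
print("intercept:", round(model.intercept_, 3))
print(pd.Series(model.coef_, index=cols).round(3))
```

On the notebook's model, `pd.Series(regressor.coef_, index=X.columns)` would list the real coefficients.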

## Checking the Model with the Test Set

In [222]:
y_pred = regressor.predict(X_test)
pd.DataFrame([y_pred,y_test])

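|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 1904 | 1905 | 1906 | 1907 | 1908 | 1909 | 1910 | 1911 | 1912 | 1913 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 457.255221 | 466.719274 | 440.366949 | 482.57801 | 474.880547 | 448.824227 | 440.149749 | 478.061406 | 466.985618 | 479.480139 | ... | 430.63655 | 428.319126 | 487.944282 | 464.437471 | 455.293903 | 445.676877 | 447.891673 | 476.405029 | 424.616097 | 463.911411 |
| 1 | 458.96 | 463.29 | 435.27 | 484.31 | 473.55 | 456.3 | 436.02 | 488.75 | 469.75 | 482.83 | ... | 438.28 | 432.19 | 494.87 | 463.5 | 449.33 | 446.4 | 457.12 | 476.22 | 440.29 | 467.92 |

Row 0 is the predicted EP and row 1 is the actual EP from the test set.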
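A wide two-row frame is hard to read. One alternative, sketched here with a hypothetical helper `comparison_table`, puts actual and predicted values in columns and adds the residual. It converts both inputs with `np.asarray` so the shuffled index of `y_test` cannot misalign the rows; the demo uses the first three test points from the linear-regression table.

```python
import numpy as np
import pandas as pd

def comparison_table(y_true, y_pred) -> pd.DataFrame:
    """Actual and predicted values as columns, plus the residual (actual - predicted)."""
    y_true = np.asarray(y_true, dtype=float)  # drop any pandas index, keep order
    y_pred = np.asarray(y_pred, dtype=float)
    return pd.DataFrame({
        "Actual EP (MW)": y_true,
        "Predicted EP (MW)": y_pred,
        "Residual (MW)": y_true - y_pred,
    })

# First three test points from the linear-regression table
table = comparison_table([458.96, 463.29, 435.27], [457.255221, 466.719274, 440.366949])
print(table.round(2))
```

In the notebook this would be called as `comparison_table(y_test, y_pred)`.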


In [223]:
# print(r2_score(y_test, y_pred))  # accuracy_score only works on class labels and raises an error on continuous targets

## Scaling the Features

In [224]:
# The features are NOT on the same scale: AP is around 1000 mbar, while T spans about 2-37 °C.
# Scaling does not change the predictions of an ordinary LinearRegression, so Method 1 skips it.
# The RBF-kernel SVR in Method 2 is scale-sensitive, so we will come back later,
# add this step, and check the difference.
# from sklearn.preprocessing import StandardScaler
# sc = StandardScaler()
# X_train = sc.fit_transform(X_train)  # fit the scaler on the training set only
# X_test = sc.transform(X_test)        # reuse the training mean and std on the test set
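For the SVR, one common way to add the scaling step without leaking test statistics is to put `StandardScaler` inside a pipeline and, because the default `epsilon=0.1` and `C=1.0` are in target units, to standardize the target as well with `TransformedTargetRegressor`. This is a sketch under those assumptions (the helper name `make_scaled_svr` is hypothetical), demonstrated on synthetic data whose two columns mimic the T and AP ranges.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def make_scaled_svr() -> TransformedTargetRegressor:
    """RBF SVR that standardizes the features and the target internally."""
    # The pipeline fits StandardScaler on the data passed to .fit() (the training
    # set) and reuses those statistics when predicting.
    features_then_svr = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    # The target is standardized for fitting; predictions are mapped back to MW
    return TransformedTargetRegressor(regressor=features_then_svr, transformer=StandardScaler())

# Synthetic demo: features on very different scales, target in the EP range
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.uniform(1.81, 37.11, 300),      # T-like column
    rng.uniform(992.89, 1033.30, 300),  # AP-like column
])
y_demo = 480.0 - 1.5 * X_demo[:, 0] + 0.1 * (X_demo[:, 1] - 1013.0)

model = make_scaled_svr().fit(X_demo[:240], y_demo[:240])
pred = model.predict(X_demo[240:])
rmse = float(np.sqrt(np.mean((pred - y_demo[240:]) ** 2)))
print("scaled SVR RMSE on held-out synthetic rows:", rmse)
```

With the notebook's variables, `make_scaled_svr().fit(X_train, y_train)` could replace the unscaled `SVRmodel`; whether it lowers the test RMSE on this dataset would need to be checked by running it.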

## Measuring the Model Performance

In [225]:
from sklearn.metrics import mean_squared_error
RMSE = mean_squared_error(y_test, y_pred) ** 0.5  # sklearn's argument order is (y_true, y_pred)
print('RMSE: ', RMSE)

RMSE:  4.508879190536165
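RMSE is sqrt((1/n) Σ (y_i - ŷ_i)²), expressed in MW like EP itself, and R² is 1 - SS_res/SS_tot. The sketch below recomputes both with NumPy and checks them against scikit-learn. It uses only the first five actual/predicted pairs from the linear-regression table, so its values are not the notebook's test-set RMSE.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def rmse(y_true, y_pred) -> float:
    """Root mean squared error: sqrt(mean((y - y_hat)^2)), in the target's units."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# First five actual/predicted pairs from the linear-regression test table
actual = [458.96, 463.29, 435.27, 484.31, 473.55]
predicted = [457.255221, 466.719274, 440.366949, 482.57801, 474.880547]

print(rmse(actual, predicted), mean_squared_error(actual, predicted) ** 0.5)
print(r2(actual, predicted), r2_score(actual, predicted))
```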


## Predicting EP for a New Data Point

In [226]:
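# New observation in the training column order: T = 15 °C, V = 40 cm Hg, AP = 1020 mbar, RH = 70 %
NewDataPt = pd.DataFrame([[15, 40, 1020, 70]], columns=X.columns)  # same feature names as in fit
EP_Prediction = regressor.predict(NewDataPt)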
print('The EP Prediction for the New Data Point = ', EP_Prediction)

The EP Prediction for the New Data Point =  [467.83988642]
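A linear model's prediction is simply b0 + Σ b_j·x_j, so the EP for a new point can be reproduced from `intercept_` and `coef_`. Wrapping the point in a DataFrame with the training column names also avoids the feature-name warning that recent scikit-learn versions emit when a model fitted on a DataFrame receives a plain array. This sketch trains on random synthetic data, so its numbers will not match 467.84.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

cols = ["Ambient Temperature (C)", "Exhaust Vacuum (cm Hg)",
        "Ambient Pressure (milibar)", "Relative Humidity (%)"]

# Synthetic training data (random, for illustration only)
rng = np.random.default_rng(1)
X_syn = pd.DataFrame(rng.uniform([2, 25, 993, 26], [37, 81, 1033, 100], size=(200, 4)), columns=cols)
y_syn = 470.0 - 1.8 * X_syn[cols[0]] + rng.normal(0, 3, 200)

model = LinearRegression().fit(X_syn, y_syn)

# The notebook's new data point, with the same column names as the training data
new_point = pd.DataFrame([[15, 40, 1020, 70]], columns=cols)

by_hand = model.intercept_ + new_point.to_numpy() @ model.coef_  # b0 + sum(b_j * x_j)
print(model.predict(new_point), by_hand)
```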


# Method 2: Training the SVM Regression Model on the Training Set

In [227]:
from sklearn.metrics import r2_score  # regression metric; accuracy_score only applies to class labels

In [228]:
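# Support vector regression with an RBF kernel (defaults: C=1.0, epsilon=0.1, gamma='scale')
from sklearn.svm import SVR
SVRmodel = SVR(kernel='rbf')  # trained on the unscaled features and target
SVRmodel.fit(X_train, y_train)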

SVR()
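`SVR(kernel='rbf')` runs with the default `C=1.0` and `epsilon=0.1`, which are rarely suitable for an unscaled target spread over tens of MW. This may be one reason why, in the columns shown in the table below, the SVR predictions stay within roughly 444-459 MW while the actual values range from about 432 to 495 MW. A hedged sketch of tuning these hyperparameters with `GridSearchCV` over a scaled pipeline follows; the grid values are illustrative assumptions and the data is synthetic.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Scale features inside the pipeline so each CV fold fits its own scaler
pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])

# Small illustrative grid; a real search would likely need a wider range
param_grid = {"svr__C": [1, 10, 100], "svr__epsilon": [0.1, 1.0]}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="neg_root_mean_squared_error")

# Synthetic data standing in for X_train / y_train
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(150, 4))
y_demo = 450 + 30 * X_demo[:, 0] - 10 * X_demo[:, 1]
search.fit(X_demo, y_demo)

# best_score_ is a negated RMSE, so flip the sign for display
print(search.best_params_, -search.best_score_)
```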

## Checking the SVM Model with the Test Set

In [229]:
y_pred = SVRmodel.predict(X_test)
pd.DataFrame([y_pred,y_test])

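|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 1904 | 1905 | 1906 | 1907 | 1908 | 1909 | 1910 | 1911 | 1912 | 1913 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 455.160011 | 453.105815 | 446.119499 | 458.334396 | 458.266985 | 453.458997 | 446.056208 | 458.896989 | 456.351999 | 458.361275 | ... | 446.431644 | 446.301358 | 456.922607 | 456.026035 | 454.166115 | 449.596572 | 450.515153 | 457.192443 | 443.79131 | 455.931133 |
| 1 | 458.96 | 463.29 | 435.27 | 484.31 | 473.55 | 456.3 | 436.02 | 488.75 | 469.75 | 482.83 | ... | 438.28 | 432.19 | 494.87 | 463.5 | 449.33 | 446.4 | 457.12 | 476.22 | 440.29 | 467.92 |

Row 0 is the SVR prediction and row 1 is the actual EP from the test set.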
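The assignment asks for a comparison of the two methods, but RMSE is only computed for Method 1 (and `y_pred` now holds the SVR predictions). A sketch of scoring both fitted models the same way, with a hypothetical helper `compare_rmse`, is shown here on a synthetic split; the model names and data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

def compare_rmse(models: dict, X_eval, y_eval) -> dict:
    """Test-set RMSE for each fitted model, keyed by a display name."""
    return {name: mean_squared_error(y_eval, m.predict(X_eval)) ** 0.5
            for name, m in models.items()}

# Synthetic stand-in for the notebook's train/test split
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 1, size=(200, 4))
y_syn = 450 + 30 * X_syn[:, 0] - 10 * X_syn[:, 1]
X_tr, X_te, y_tr, y_te = X_syn[:160], X_syn[160:], y_syn[:160], y_syn[160:]

fitted = {
    "Multiple Regression": LinearRegression().fit(X_tr, y_tr),
    "SVM Regression": SVR(kernel="rbf").fit(X_tr, y_tr),
}
scores = compare_rmse(fitted, X_te, y_te)
for name, score in scores.items():
    print(f"{name}: RMSE = {score:.3f} MW")
```

In the notebook the call would be `compare_rmse({"Multiple Regression": regressor, "SVM Regression": SVRmodel}, X_test, y_test)`.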


## Predicting EP for a New Data Point with the SVM Model

In [230]:
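# New observation in the training column order: T = 15 °C, V = 40 cm Hg, AP = 1020 mbar, RH = 70 %
NewDataPt = pd.DataFrame([[15, 40, 1020, 70]], columns=X.columns)  # same feature names as in fit
EP_Prediction = SVRmodel.predict(NewDataPt)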
print('The EP Prediction for the New Data Point = ', EP_Prediction)

The EP Prediction for the New Data Point =  [456.02608795]


In [231]:
# print(r2_score(y_test, y_pred))  # R² of the SVR on the test set (accuracy_score is for classification)