## Power Plant Energy Output Prediction

## Dataset Information

The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the power plant was set to work with full load. Features consist of hourly average ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V) to predict the net hourly electrical energy output (EP) of the plant

## Attributes Information

Features consist of hourly average ambient variables
- Temperature (T) in the range 1.81°C and 37.11°C,
- Ambient Pressure (AP) in the range 992.89-1033.30 milibar,
- Relative Humidity (RH) in the range 25.56% to 100.16%
- Exhaust Vacuum (V) in teh range 25.36-81.56 cm Hg
- Net hourly electrical energy output (EP) 420.26-495.76 MW

## Objective

Our objective is to predict the net hourly electrical energy output (EP) given the Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vaccum (V) as inputs

## Approach

We shall fit different regression models on the data and choose the one which gives the highest accuracy on the test data

---------------------------------------------------------------

## Required Libraries

In [108]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn import linear_model
from sklearn import svm
from sklearn import tree

from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score

## Data Processing

In [109]:
dataset = pd.read_csv('powerplantdata.csv')
dataset.head()

Unnamed: 0,AT,V,AP,RH,PE
0,8.34,40.77,1010.84,90.01,480.48
1,23.64,58.49,1011.4,74.2,445.75
2,29.74,56.9,1007.15,41.91,438.76
3,19.07,49.69,1007.22,76.79,453.09
4,11.8,40.66,1017.13,97.2,464.43


Out of the 5 columns above, the first 4 columns will be the features and the last column (PE) will be the target

In [110]:
# splitting the dataset into features and target
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [111]:
# splitting features and target into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size = 0.2, random_state = 0)

## Regression Models

### 1. Ordinary Least Squares

In [112]:
def ols(X_train, y_train, X_test):
    
    #fitting model on data
    ols_regressor = linear_model.LinearRegression()
    ols_regressor.fit(X_train, y_train)

    #predicting target for test set features
    y_predicted = ols_regressor.predict(X_test)

    # model performance
    ols_r2 = r2_score(y_test, y_predicted)

    # printing  metrics results
    print('R2 score of OLS: {}'.format(ols_r2))

## 2. Ridge Regression

In [113]:
def ridge_regression(X_train, y_train, X_test):
    
    #fitting model on data
    ridge_regressor = linear_model.Ridge(alpha = .5)
    ridge_regressor.fit(X_train, y_train)

    # predicting target for test set features
    y_predicted = ridge_regressor.predict(X_test)

    # model performance
    ridge_r2 = r2_score(y_test, y_predicted)

    # printing  metrics results
    print('R2 score of Ridge Regression: {}'.format(ridge_r2))

## 3. Lasso Regression

In [114]:
def lasso_regression(X_train, y_train, X_test):
    
    lasso_regressor = linear_model.Lasso(alpha = 0.1)
    lasso_regressor.fit(X_train, y_train)

    y_predicted = lasso_regressor.predict(X_test)

    lasso_r2 = r2_score(y_test, y_predicted)

    # printing  metrics results
    print('R2 score of Lasso Regression: {}'.format(lasso_r2))

## 4. Support Vector Regression

In [115]:
def svr_regression(X_train, y_train, X_test):
    
    svr_regressor = svm.SVR()
    svr_regressor.fit(X_train, y_train)
    
    y_predicted = svr_regressor.predict(X_test)
    
    svr_r2 = r2_score(y_test, y_predicted)
    
    # printing  metrics results
    print('R2 score of SVR: {}'.format(svr_r2))

## 5. Decision Tree Regression


In [116]:
def decision_tree_regression(X_train, y_train, X_test):
    
    decision_tree_regressor = tree.DecisionTreeRegressor()
    decision_tree_regressor.fit(X_train, y_train)
    
    decision_tree_regressor.predict(X_test)
    
    dtr_r2 = r2_score(y_test, y_predicted)
    
    print('R2 score of Decision Tree Regression: {}'.format(dtr_r2))

## Main Block

In [117]:
def main():
    
    ols(X_train, y_train, X_test)
    ridge_regression(X_train, y_train, X_test)
    lasso_regression(X_train, y_train, X_test)
    svr_regression(X_train, y_train, X_test)
    decision_tree_regression(X_train, y_train, X_test)

In [118]:
main()

R2 score of OLS: 0.9298994694436788
R2 score of Ridge Regression: 0.9298994821517521
R2 score of Lasso Regression: 0.9299169910729692
R2 score of SVR: 0.3888654448577421
R2 score of Decision Tree Regression: 0.9298261180304993
