# Power Plant Energy Output Prediction Model

## Project Description

Data Set Information:

The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the power plant was set to work with full load. Features consist of hourly average ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V) to predict the net hourly electrical energy output (EP) of the plant.
A combined cycle power plant (CCPP) is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators. In a CCPP, the electricity is generated by gas and steam turbines, which are combined in one cycle, and is transferred from one turbine to another. While the Vacuum is collected from and has an effect on the steam turbine, the other three ambient variables affect the GT performance.

The hourly average variables and their observed ranges are:
- Temperature (T): 1.81°C to 37.11°C
- Ambient Pressure (AP): 992.89 to 1033.30 millibar
- Relative Humidity (RH): 25.56% to 100.16%
- Exhaust Vacuum (V): 25.36 to 81.56 cm Hg
- Net hourly electrical energy output (EP): 420.26 to 495.76 MW

The averages are taken from various sensors located around the plant that record the ambient variables every second. The variables are given without normalization.
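As a rough illustration of how per-second readings reduce to hourly averages, the sketch below averages synthetic temperature samples in blocks of 3600. The readings are randomly generated stand-ins, not plant data, and the actual aggregation used to build the dataset (for example, how several sensors are combined) is not described here.

```python
import numpy as np

rng = np.random.default_rng(0)
seconds_per_hour = 3600
n_hours = 3

# Synthetic per-second temperature readings in deg C (assumed values, not plant data).
per_second_temperature = 20.0 + rng.normal(0.0, 0.5, size=n_hours * seconds_per_hour)

# Put each hour in its own row, then average the 3600 readings in that row.
hourly_mean_temperature = per_second_temperature.reshape(n_hours, seconds_per_hour).mean(axis=1)

print(hourly_mean_temperature.shape)  # (3,)
```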

All the recorded data is available in "Power Plant Data.csv".

Source: https://archive.ics.uci.edu/ml/datasets/combined+cycle+power+plant 

Objective: Build a model to predict the energy output based on Temperature, Pressure, Humidity, and Exhaust Vacuum.

## Importing the Libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Importing the Dataset

In [2]:
dataset = pd.read_csv('Power Plant Data.csv')
# Every column except the last is a feature; the last column is the target (energy output).
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [3]:
y

array([463.26, 444.37, 488.56, ..., 429.57, 435.74, 453.28])
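Selecting columns by position works only while the column order in the CSV stays the same. A possible alternative is to select by column label. The sketch below assumes the labels in the file match those displayed in the dataset review, and it reads a five-row in-memory sample instead of the real "Power Plant Data.csv".

```python
import io

import pandas as pd

# Five sample rows standing in for 'Power Plant Data.csv'.
# Column labels are assumed to match those shown in the dataset review.
csv_text = """Ambient Temperature (C),Exhaust Vacuum (cm Hg),Ambient Pressure (milibar),Relative Humidity (%),Hourly Electrical Energy output (MW)
14.96,41.76,1024.07,73.17,463.26
25.18,62.96,1020.04,59.08,444.37
5.11,39.40,1012.16,92.14,488.56
20.86,57.32,1010.24,76.64,446.48
10.82,37.50,1009.23,96.62,473.90
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Name the target explicitly; every other column is treated as a feature.
target_column = "Hourly Electrical Energy output (MW)"
feature_columns = [name for name in sample.columns if name != target_column]

X_sample = sample[feature_columns].to_numpy()
y_sample = sample[target_column].to_numpy()

print(X_sample.shape)  # (5, 4)
```

Selecting by label fails loudly with a `KeyError` if a column is renamed, whereas positional slicing would silently pick up the wrong column.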

### A Quick Review of the Dataset

In [4]:
dataset

| | Ambient Temperature (C) | Exhaust Vacuum (cm Hg) | Ambient Pressure (milibar) | Relative Humidity (%) | Hourly Electrical Energy output (MW) |
|---|---|---|---|---|---|
| 0 | 14.96 | 41.76 | 1024.07 | 73.17 | 463.26 |
| 1 | 25.18 | 62.96 | 1020.04 | 59.08 | 444.37 |
| 2 | 5.11 | 39.40 | 1012.16 | 92.14 | 488.56 |
| 3 | 20.86 | 57.32 | 1010.24 | 76.64 | 446.48 |
| 4 | 10.82 | 37.50 | 1009.23 | 96.62 | 473.90 |
| ... | ... | ... | ... | ... | ... |
| 9563 | 16.65 | 49.69 | 1014.01 | 91.00 | 460.03 |
| 9564 | 13.19 | 39.18 | 1023.67 | 66.78 | 469.62 |
| 9565 | 31.32 | 74.33 | 1012.92 | 36.48 | 429.57 |
| 9566 | 24.48 | 69.45 | 1013.86 | 62.39 | 435.74 |


## Splitting the Dataset into the Training set and Test set

In [5]:
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing; random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
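To see what this split produces for a dataset of 9568 rows, the sketch below runs the same call on placeholder arrays of the same shape. The arrays `X_demo` and `y_demo` are stand-ins introduced for illustration, not the plant data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 9568  # number of data points stated in the project description

# Placeholder arrays with the same shape as the real features and target.
X_demo = np.zeros((n_samples, 4))
y_demo = np.arange(n_samples)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
print(X_tr.shape, X_te.shape)  # (7654, 4) (1914, 4)

# Repeating the call with the same random_state reproduces the same split.
_, _, _, y_te_again = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
print(np.array_equal(y_te, y_te_again))  # True
```

The test set size is rounded up (0.2 × 9568 = 1913.6 becomes 1914), and the remaining 7654 rows go to the training set.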

## Multiple Linear Regression

### Step 1. Training the Model

In [6]:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

LinearRegression()
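`LinearRegression` fits an ordinary least squares model of the form y ≈ Xw + b. As a minimal sketch of what that fit computes, the example below solves the same least squares problem with NumPy on synthetic data. The coefficients, intercept, and noise level are made up for illustration and are not the values learned from the plant data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 4

# Synthetic features and made-up "true" parameters (illustrative only).
X_demo = rng.normal(size=(n_samples, n_features))
true_coef = np.array([-2.0, -0.25, 0.05, -0.15])
true_intercept = 450.0
y_demo = X_demo @ true_coef + true_intercept + rng.normal(scale=0.1, size=n_samples)

# Append a column of ones so the intercept is estimated together with the slopes.
design = np.column_stack([X_demo, np.ones(n_samples)])
solution, *_ = np.linalg.lstsq(design, y_demo, rcond=None)
coef, intercept = solution[:-1], solution[-1]

print(coef)
print(intercept)
```

On a fitted scikit-learn model, the corresponding values are available as `regressor.coef_` and `regressor.intercept_`.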

### Step 2. Predicting the Test set results

In [7]:
y_pred = regressor.predict(X_test)

In [8]:
# Spot-check one test sample: compare the prediction with the actual value.
print("The Prediction Energy=",y_pred[13] )
print("The Actual Energy=",y_test[13] )

The Prediction Energy= 440.84294317591593
The Actual Energy= 440.74


### Step 3. Evaluating the Model Performance

In [9]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

0.9325315554761303
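The score above is the coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. The sketch below computes it by hand on four made-up values, together with the root mean squared error, which is expressed in the same unit as the target (MW). The helper names and the sample numbers are illustrative assumptions.

```python
import numpy as np


def r_squared(y_true, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y_true - y_hat) ** 2)          # squared prediction errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared deviations from the mean
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_hat):
    """Root mean squared error, in the unit of the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_hat) ** 2)))


# Made-up actual and predicted outputs in MW (not taken from the test set).
y_true_demo = [440.0, 450.0, 460.0, 470.0]
y_hat_demo = [442.0, 449.0, 461.0, 468.0]

# SS_res = 4 + 1 + 1 + 4 = 10 and SS_tot = 225 + 25 + 25 + 225 = 500, so R^2 = 0.98.
print(r_squared(y_true_demo, y_hat_demo))
print(rmse(y_true_demo, y_hat_demo))
```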

## Using the Model to Predict New Input

In [10]:
# Assume arbitrary values for Temperature (C), Exhaust Vacuum (cm Hg), Ambient Pressure (millibar), and Relative Humidity (%)
X_New=[[15, 40, 1000, 75]]
y_pred_New = regressor.predict(X_New)
print('You would get this much energy under this condition', y_pred_New , 'MW')

You would get this much energy under this condition [465.80771895] MW
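A linear model will return a number for any input, including conditions far outside what the plant experienced, so it can help to check a new input against the ranges listed in the project description before trusting the prediction. The sketch below is one possible check. `FEATURE_RANGES` and `out_of_range_features` are hypothetical names introduced here, and the feature order is assumed to match the one used for `X_New`.

```python
# Observed (min, max) ranges quoted in the project description,
# listed in the same order as the model inputs.
FEATURE_RANGES = {
    "Temperature (C)": (1.81, 37.11),
    "Exhaust Vacuum (cm Hg)": (25.36, 81.56),
    "Ambient Pressure (millibar)": (992.89, 1033.30),
    "Relative Humidity (%)": (25.56, 100.16),
}


def out_of_range_features(row):
    """Hypothetical helper: names of features whose value lies outside the observed range."""
    return [
        name
        for (name, (low, high)), value in zip(FEATURE_RANGES.items(), row)
        if not low <= value <= high
    ]


print(out_of_range_features([15, 40, 1000, 75]))  # []
print(out_of_range_features([45, 40, 1000, 75]))  # ['Temperature (C)']
```

An empty list means every value falls inside the observed range, as is the case for the input used above.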
