### Gradient Descent - Combined Cycle Power Plant

Combined Cycle Power Plant dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the power plant was set to work with full load. 
Features consist of hourly average ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V) to predict the net hourly electrical energy output (EP) of the plant.

You are given:
1. A Readme file for more details on dataset. 
2. A Training dataset csv file with X train and Y train data
3. A X test File and you have to predict and submit predictions for this file.

Your task is to:
1. Code Gradient Descent for N features and come with predictions.
2. Try and test with various combinations of learning rates and number of iterations.
3. Try using Feature Scaling, and see if it helps you in getting better results. 

Instructions
 
1. Use Gradient Descent as a training algorithm and submit results predicted.
2. Files are in csv format, you can use genfromtxt function in numpy to load data from csv file. Similarly you can use savetxt function to save data into a file.
3. Submit a csv file with only predictions for X test data. File should not have any headers and should only have one column i.e. predictions. Also predictions shouldn't be in exponential form.
4. Your score is based on coefficient of determination. So it can be possible that nobody gets full score.


In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [7]:
def step_gradient(points, learning_rate, m , c):
    m_slope = 0
    c_slope = 0
    M = len(points)
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        m_slope += (-2/M)* (y - m * x - c)*x
        c_slope += (-2/M)* (y - m * x - c)
    new_m = m - learning_rate * m_slope
    new_c = c - learning_rate * c_slope
    return new_m, new_c

In [8]:
# This function finds the new cost after each optimisation.
def cost(points, m, c):
    total_cost = 0
    M = len(points)
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        total_cost += (1/M)*((y - m*x - c)**2)
    return total_cost

In [9]:
def gd(points, learning_rate, num_iterations):
    m = 0       # Intial random value taken as 0
    c = 0       # Intial random value taken as 0
    for i in range(num_iterations):
        m, c = step_gradient(points, learning_rate, m , c)
        print(i, " Cost: ", cost(points, m, c))
    return m, c

In [17]:

test_data=pd.read_csv('test_ccpp_x_test.csv' ,delimiter=',')

In [18]:
train_data=pd.read_csv('training_ccpp_x_y_train.csv' ,delimiter=',')

In [19]:
def run():
    data=np.loadtxt('training_ccpp_x_y_train.csv',delimiter=',')
    learning_rate=0.0000000000001
    num_iterations=131
    m,c=gd(data,learning_rate,num_iterations)

In [20]:
run()

0  Cost:  3109.7808856633633
1  Cost:  3109.7808851366403
2  Cost:  3109.7808846099338
3  Cost:  3109.780884083208
4  Cost:  3109.780883556488
5  Cost:  3109.780883029777
6  Cost:  3109.7808825030497
7  Cost:  3109.7808819763245
8  Cost:  3109.780881449606
9  Cost:  3109.780880922901
10  Cost:  3109.7808803961757
11  Cost:  3109.780879869477
12  Cost:  3109.780879342746
13  Cost:  3109.780878816011
14  Cost:  3109.780878289304
15  Cost:  3109.7808777625955
16  Cost:  3109.780877235873
17  Cost:  3109.780876709155
18  Cost:  3109.780876182457
19  Cost:  3109.7808756557324
20  Cost:  3109.780875129
21  Cost:  3109.7808746022783
22  Cost:  3109.7808740755404
23  Cost:  3109.7808735488593
24  Cost:  3109.78087302214
25  Cost:  3109.780872495417
26  Cost:  3109.780871968698
27  Cost:  3109.7808714419775
28  Cost:  3109.780870915258
29  Cost:  3109.7808703885444
30  Cost:  3109.7808698618373
31  Cost:  3109.780869335109
32  Cost:  3109.7808688083887
33  Cost:  3109.7808682816612
34  Cost:  3

In [21]:
x = train_data.iloc[0:,:-1]
y = train_data.iloc[0:,-1]

In [22]:
from sklearn.ensemble import GradientBoostingRegressor
algo=GradientBoostingRegressor()
algo.fit(x,y)

GradientBoostingRegressor()

In [23]:
y_pred=algo.predict(test_data)

Feature names unseen at fit time:
- 1017.58000000
- 11.95000000
- 42.03000000
- 90.89000000
Feature names seen at fit time, yet now missing:
-  AP
-  RH
-  V
- # T



In [24]:
y_pred

array([472.66297342, 435.27765416, 457.97819249, ..., 438.71921598,
       452.8264231 , 445.09956345])

In [25]:
algo.score(test_data,y_pred)
# algo.score(x,y)

Feature names unseen at fit time:
- 1017.58000000
- 11.95000000
- 42.03000000
- 90.89000000
Feature names seen at fit time, yet now missing:
-  AP
-  RH
-  V
- # T



1.0

In [26]:
np.savetxt("finalPredictions.csv",y_pred,delimiter=',', fmt='%.8f')