# Gradient Descent - Boston Dataset

1. Code Gradient Descent for N features and come up with predictions.
2. Experiment with different combinations of learning rate and number of iterations.
3. Try Feature Scaling and see whether it gives you better results.

### Instructions:

1. Use Gradient Descent as the training algorithm and submit the predicted results.
2. The files are in CSV format. You can use NumPy's `genfromtxt` function to load data from a CSV file and `savetxt` to save data to a file.
3. Submit a CSV file containing only the predictions for the X test data. The file name must not contain spaces. The file must have no header and exactly one column, the predictions. Predictions must not be written in exponential notation.
4. Your score is based on the coefficient of determination (R²).
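
Since the score is based on the coefficient of determination, it can help to compute it locally on held-out rows before submitting. The following is a minimal sketch using the standard definition R² = 1 - SS_res / SS_tot; the function name `r2_score_manual` is hypothetical and not part of the assignment.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```

A perfect prediction scores 1, and always predicting the mean of the targets scores 0.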


In [212]:
import numpy as np

In [213]:
training_data = np.genfromtxt("train.csv",delimiter=",")
testing_data = np.genfromtxt("test.csv",delimiter=",")

In [214]:
X = training_data[:,:-1]
Y = training_data[:,-1]
X_test = testing_data

In [215]:
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
X_transform=sc.fit_transform(X)
X_test_transform = sc.transform(X_test)
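
For reference, the scaling step above amounts to subtracting each column's training mean and dividing by its training standard deviation, then reusing those same statistics for the test set. Here is a rough NumPy-only sketch of that idea; the helper name `standardize`, the toy arrays, and the guard for constant columns are assumptions made for illustration.

```python
import numpy as np

def standardize(train, test):
    """Scale both arrays using statistics from the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)       # population standard deviation (ddof=0)
    std[std == 0] = 1.0           # assumed guard: leave constant columns unscaled
    return (train - mean) / std, (test - mean) / std

# Toy data, not taken from the Boston files.
X_demo = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_demo_test = np.array([[2.0, 400.0]])
X_demo_scaled, X_demo_test_scaled = standardize(X_demo, X_demo_test)
```

After scaling, every training column has mean 0 and standard deviation 1, which is what lets a single learning rate work across all features.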

In [216]:
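# Mean squared error of the parameters m over all rows of points.
# The last column of points is the target y; m[-1] is the intercept.
def cost(points, m):
    rows, features = points.shape
    total_cost = 0
    for i in range(rows):
        # Prediction for row i: weighted sum of the inputs plus the intercept.
        mx = 0
        for j in range(features - 1):
            x = points[i, j]
            mx += m[j] * x
        mx += m[features - 1]

        y = points[i, features - 1]
        total_cost += (1 / rows) * ((y - mx) ** 2)
    return total_cost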

In [217]:
# Performs one gradient descent update: accumulates the gradient of the mean
# squared error over all rows and returns the updated parameters.
def step_gradient(points, learning_rate, m ):
    rows , features = points.shape
    m_slope = [ 0 for i in range(features)]
    
    for i in range(rows):
        mx = 0
        for j in range(features-1):
            x = points[i, j]
            mx += m[j]*x
            
        mx+=m[features-1]
        
        y = points[i, features-1]
        
        for j in range(features-1):
            x=points[i,j]
            m_slope[j] += (-2/rows)* (y - mx)*x
            
        m_slope[features-1] += (-2/rows)* (y - mx)
    
    new_m = [ 0 for i in range(features)]
    for i in range(features):
        new_m[i] = m[i] - learning_rate * m_slope[i]
    
    return new_m

In [218]:
# The Gradient Descent Function
def gd(points, learning_rate, num_iterations):
    rows , features = points.shape
    m = [0 for i in range(features)]       # Initial parameter values, all set to 0

    for i in range(num_iterations):
        m = step_gradient(points, learning_rate, m)
        # print(i, " Cost: ", cost(points, m))
    return m
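
The nested loops above can become slow for many iterations. As an illustrative alternative, the same update rule can be written with NumPy matrix operations. This is a sketch under the same conventions (target in the last column, intercept stored as the last parameter); the name `gd_vectorized` is hypothetical.

```python
import numpy as np

def gd_vectorized(points, learning_rate, num_iterations):
    """Gradient descent on mean squared error using matrix operations."""
    X = points[:, :-1]
    y = points[:, -1]
    rows = X.shape[0]
    # Append a column of ones so the last parameter acts as the intercept.
    Xb = np.hstack([X, np.ones((rows, 1))])
    m = np.zeros(Xb.shape[1])
    for _ in range(num_iterations):
        residual = y - Xb @ m
        gradient = (-2.0 / rows) * (Xb.T @ residual)
        m = m - learning_rate * gradient
    return m

# Toy check on y = 3x + 2 (made-up data, not the Boston files).
demo_points = np.column_stack(([-1.0, 0.0, 1.0], [-1.0, 2.0, 5.0]))
demo_m = gd_vectorized(demo_points, learning_rate=0.1, num_iterations=500)
```

On the toy data, `demo_m` should converge to a slope near 3 and an intercept near 2.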

In [219]:
def run(training_data,learning_rate,num_iterations):
    m = gd(training_data, learning_rate, num_iterations)
    intercept = m[-1]
    coeff = m[:-1]
    return intercept,coeff

In [241]:
learning_rate = 0.16043
num_iterations = 5000

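# Train on the standardized features (target in the last column), matching
# the scaling applied to X_test_transform.
scaled_training_data = np.column_stack((X_transform, Y))
intercept, coeff = run(scaled_training_data, learning_rate, num_iterations)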

print(intercept,coeff)

22.677233263862004 [-0.9380807656720777, 0.74103443456485, 0.01169156972062025, 0.7808737210637124, -2.174557498741871, 2.3542965277166976, 0.12333809739942209, -2.9523235489356594, 2.5329681667540758, -1.702903701014967, -2.251519617368038, 0.5883542859831407, -4.263681547801751]


In [221]:
def predict(final_m, final_c, testing_data):
    y_pred = []
    rows , features = testing_data.shape
    
    for i in range(rows):
        mx=0
        for j in range(features):
            mx+=final_m[j]*testing_data[i][j]
        ans = mx + final_c
        y_pred.append(ans)
    return y_pred

In [222]:
y_predict = predict(coeff,intercept,X_test_transform)

In [223]:
# savetxt writes exponential notation by default; use a fixed-point format.
np.savetxt('Predictions.csv', y_predict, fmt='%.6f')

# Fitting the data with scikit-learn

In [224]:
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X_transform, Y)
lin_reg.intercept_, lin_reg.coef_

(22.609498680738785,
 array([-1.00007026,  0.74065794,  0.01188043,  0.81805153, -2.17094041,
         2.35394967,  0.12135345, -3.03040197,  2.57076841, -1.73462464,
        -2.24921247,  0.59685962, -4.32352985]))
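
`LinearRegression` fits ordinary least squares, so its coefficients are a useful reference point for the Gradient Descent result. As a rough cross-check that does not depend on scikit-learn, the same least-squares problem can be solved with `np.linalg.lstsq`. The sketch below uses synthetic, noise-free data, and every name in it is made up for illustration.

```python
import numpy as np

# Synthetic data with known parameters (not the Boston files).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_coeff = np.array([1.5, -2.0, 0.5])
true_intercept = 4.0
y_demo = X_demo @ true_coeff + true_intercept

# Append a column of ones so the last solved value is the intercept.
Xb = np.hstack([X_demo, np.ones((X_demo.shape[0], 1))])
solution, *_ = np.linalg.lstsq(Xb, y_demo, rcond=None)
coeff_ls, intercept_ls = solution[:-1], solution[-1]
```

If Gradient Descent has converged on the same inputs, its parameters should approach this least-squares solution.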

In [242]:
y_predict = lin_reg.predict(X_test_transform)
