The Boston dataset is one of the datasets available in sklearn.
You are given a training dataset as a CSV file containing the X train and Y train data. As studied in the lecture, your task is to implement the Gradient Descent algorithm and use it to generate predictions for the given test dataset.
Your task is to:
1. Code Gradient Descent for N features and come up with predictions.
2. Experiment with various combinations of learning rate and number of iterations.
3. Try feature scaling and see whether it helps you get better results.
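
Tasks 2 and 3 can be explored with a small sweep like the one sketched below. This is only an illustration under stated assumptions: the data is synthetic (not the Boston CSV), `fit_gd` is a hypothetical helper written for this example, and the standardisation is done by hand with NumPy to mirror what `preprocessing.StandardScaler` does by default.

```python
import numpy as np

def fit_gd(X, y, learning_rate, num_iterations):
    """Batch gradient descent for linear regression; returns (coefficients, final MSE)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # last column carries the intercept
    coeff = np.zeros(Xb.shape[1])
    for _ in range(num_iterations):
        residual = y - Xb @ coeff
        gradient = (-2 / len(Xb)) * (Xb.T @ residual)
        coeff = coeff - learning_rate * gradient
    return coeff, float(np.mean((y - Xb @ coeff) ** 2))

# Synthetic stand-in data (an assumption): two features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1000, 200)])
y = 3.0 * X[:, 0] + 0.05 * X[:, 1] + 2.0 + rng.normal(0, 0.1, 200)

# Standardise each column to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Try every combination of learning rate and iteration count on the scaled data.
results = {}
for learning_rate in (0.001, 0.01, 0.1):
    for num_iterations in (50, 500):
        _, mse = fit_gd(X_scaled, y, learning_rate, num_iterations)
        results[(learning_rate, num_iterations)] = mse
        print(f"lr={learning_rate:<6} iterations={num_iterations:<4} MSE={mse:.4f}")

best = min(results, key=results.get)
print("Best (learning_rate, num_iterations):", best)
```

On the unscaled `X`, whose second column reaches values near 1000, these same learning rates would be expected to diverge, which is one way to see why scaling can help.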

### Imports needed

In [1]:
from sklearn import model_selection
from sklearn import preprocessing

import numpy as np

### Gradient descent

In [2]:
def step_gradient(X_train, Y_train, learning_rate, coeff):
    # X_train already contains the trailing column of 1's, so n = num_features + 1
    # and the last coefficient plays the role of the intercept c
    n = len(X_train[0])
    gradient = np.zeros(n) # partial derivatives of the cost w.r.t. [m1, m2, ..., mn, c]
    M = len(X_train)
    
    for i in range(M):
        x = X_train[i]
        y = Y_train[i]
        residual = y - (coeff*x).sum() # prediction error for sample i
        for j in range(n):
            gradient[j] += (-2/M)*residual*x[j]
    new_coeff = coeff - learning_rate*gradient
    return new_coeff
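
The double loop in the step above can also be written as a single matrix product. The sketch below checks that the two forms agree on random data; the names `step_gradient_loop` and `step_gradient_vectorized` are hypothetical and exist only for this comparison.

```python
import numpy as np

def step_gradient_loop(X, y, learning_rate, coeff):
    """Loop form, mirroring the cell above: accumulate the gradient sample by sample."""
    M, n = X.shape
    gradient = np.zeros(n)
    for i in range(M):
        for j in range(n):
            gradient[j] += (-2 / M) * (y[i] - (coeff * X[i]).sum()) * X[i, j]
    return coeff - learning_rate * gradient

def step_gradient_vectorized(X, y, learning_rate, coeff):
    """Same update as a matrix product: gradient = (-2/M) * X^T (y - X coeff)."""
    residual = y - X @ coeff
    gradient = (-2 / len(X)) * (X.T @ residual)
    return coeff - learning_rate * gradient

# Random demo data (an assumption); the last column of 1's stands for the intercept.
rng = np.random.default_rng(1)
X_demo = np.hstack([rng.normal(size=(20, 3)), np.ones((20, 1))])
y_demo = rng.normal(size=20)
coeff_demo = rng.normal(size=4)

loop_result = step_gradient_loop(X_demo, y_demo, 0.05, coeff_demo)
vec_result = step_gradient_vectorized(X_demo, y_demo, 0.05, coeff_demo)
print(np.allclose(loop_result, vec_result))  # True
```

The vectorized form avoids the Python-level loops, which usually matters once the number of samples or features grows.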

In [3]:
'''
Second way: explicit loop over the samples (kept for reference)
def cost(X_train, Y_train, coeff):
    total_cost = 0
    M = len(X_train)
    for i in range(M):
        x = X_train[i]
        y = Y_train[i]
        total_cost += (1/M)*( (y - (coeff*x).sum())**2 )
    return total_cost
'''

# Mean squared error between Y_train and the predictions for X_train
def cost(X_train, Y_train, coeff):
    return ((Y_train - np.sum(coeff*X_train, axis = 1))**2).mean()
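
For reference, the cost computed here and the update used in `step_gradient` can be written out as follows. The notation is an assumption made for this note: $M$ is the number of samples, $n$ the number of features, $x_{i,n+1} = 1$ is the appended column of 1's so that $m_{n+1}$ is the intercept $c$, and $\alpha$ stands for the learning rate.

```latex
\begin{aligned}
J(m) &= \frac{1}{M} \sum_{i=1}^{M} \Bigl( y_i - \sum_{k=1}^{n+1} m_k x_{ik} \Bigr)^2 \\
\frac{\partial J}{\partial m_j} &= -\frac{2}{M} \sum_{i=1}^{M} \Bigl( y_i - \sum_{k=1}^{n+1} m_k x_{ik} \Bigr) x_{ij} \\
m_j &\leftarrow m_j - \alpha \, \frac{\partial J}{\partial m_j}
\end{aligned}
```

The factor $-2/M$ and the residual $y_i - \sum_k m_k x_{ik}$ are the same terms that appear in the inner loop of `step_gradient`.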

In [4]:
def gd(X_train, Y_train, learning_rate, num_iterations):
    # Append a column of 1's to X_train so the intercept is learned as one more coefficient
    ones_col = np.ones(len(X_train)).reshape(-1,1) # reshape because we want a column of 1's
    X_train = np.append(X_train, ones_col, axis=1)
    
    n = len(X_train[0]) # num_features + 1; the extra entry holds the intercept c
    
    # Pick an initial value for the coefficients; here we start from all zeros
    coefficients = np.zeros(n) # [m1, m2, m3, ... mn, m(n+1)] where m(n+1) is c
    
    for i in range(num_iterations):
        coefficients = step_gradient(X_train, Y_train, learning_rate, coefficients)
        
        # Print the cost after every iteration to see after which iteration it stops decreasing noticeably
        print("After iteration ",i+1, "Cost is:", cost(X_train, Y_train, coefficients))
        
    return coefficients
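
Instead of reading the printed costs by eye, the loop can stop on its own once the cost stops improving. The following is a minimal sketch of that idea, not part of the assignment: `gd_with_tolerance`, its `tolerance` parameter, and the synthetic data are all assumptions introduced for illustration.

```python
import numpy as np

def gd_with_tolerance(X, y, learning_rate, max_iterations, tolerance=1e-6):
    """Gradient descent that stops once the cost improves by less than `tolerance`."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append the intercept column
    coeff = np.zeros(Xb.shape[1])
    history = [float(np.mean((y - Xb @ coeff) ** 2))]  # cost before any update
    for _ in range(max_iterations):
        gradient = (-2 / len(Xb)) * (Xb.T @ (y - Xb @ coeff))
        coeff = coeff - learning_rate * gradient
        history.append(float(np.mean((y - Xb @ coeff) ** 2)))
        # Also triggers if the cost went up, e.g. when the learning rate is too large.
        if history[-2] - history[-1] < tolerance:
            break
    return coeff, history

# Synthetic stand-in data (an assumption, not the Boston data).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(0, 0.1, 100)

coeff, history = gd_with_tolerance(X_demo, y_demo, learning_rate=0.05, max_iterations=5000)
print("Stopped after", len(history) - 1, "iterations; final cost", round(history[-1], 4))
```

Keeping the full `history` list also makes it easy to plot the cost curve afterwards.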

In [5]:
'''
def predictions(X_test, m, c):
    M = len(X_test)
    y_pred = np.zeros(M)
    for i in range(M):
        x = X_test[i]
        y_pred[i] += ((m*x).sum()+c)
    return y_pred
'''
def predictions(X_test, m, c):
    return (np.sum(m*X_test, axis = 1)+c)
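
The prediction is a dot product of each row with `m`, plus `c`, so it can equally be written with the `@` operator. A small sketch with made-up numbers (the function names and values are hypothetical):

```python
import numpy as np

def predictions_sum(X_test, m, c):
    # Element-wise multiply, then sum across each row (the form used above).
    return np.sum(m * X_test, axis=1) + c

def predictions_matmul(X_test, m, c):
    # Matrix-vector product; gives the same values.
    return X_test @ m + c

X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
m_demo = np.array([0.5, -1.0])
c_demo = 2.0

print(predictions_sum(X_demo, m_demo, c_demo))     # [ 0.5 -0.5 -1.5]
print(predictions_matmul(X_demo, m_demo, c_demo))  # [ 0.5 -0.5 -1.5]
```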


### run function: loads the data, applies feature scaling, and calls gradient descent

In [54]:
def run():
    training_data = np.genfromtxt('boston_traindata.csv', delimiter=',')
    X_train = training_data[:, :-1]
    Y_train = training_data[:, -1]
    
    X_test = np.genfromtxt('boston_testdata.csv', delimiter=',')
    
    
    # Add degree-2 features to both X_train and X_test, i.e. the square of each column, so the model can fit non-linear relationships and reduce the cost
    num_col = X_train.shape[1]
    for i in range(num_col):
        ith_col = X_train[:, i]
        new_col = (ith_col*ith_col).reshape(-1,1) # square of each column
        X_train = np.append(X_train, new_col, axis = 1)
        
        ith_test_col = X_test[:, i]
        squared_test_col = (ith_test_col*ith_test_col).reshape(-1, 1)
        X_test = np.append(X_test, squared_test_col, axis=1)
    
    # Apply feature scaling
    scaler = preprocessing.StandardScaler() # create scaler object
    scaler.fit(X_train) # fit on the training data only; the same scaling is reused for the test data
    transformed_X_train = scaler.transform(X_train)
    transformed_X_test = scaler.transform(X_test)
    
    learning_rate = 0.035
    num_iterations = 200
    parameters = gd(transformed_X_train, Y_train, learning_rate, num_iterations)
    m = parameters[:-1]
    c = parameters[-1]
    #print(m, c, sep="\n")
    
    # Generate predictions for the test data
    pred = predictions(transformed_X_test, m, c).reshape(-1,1)
    # Optionally round off to 5 decimal places
    #pred = np.round(pred, decimals=5)
    # Save predictions
    np.savetxt('predictions2.csv', pred, delimiter=',')
    print(pred)
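
Because the test file has no Y values, `run()` can only report the training cost. One way to judge a choice of learning rate and iteration count is to hold back part of the training data, as sketched below. This is an assumption-laden illustration: `holdout_mse` is a hypothetical helper, the split is done with a NumPy permutation as a stand-in for `model_selection.train_test_split`, the scaling is done by hand, and the data is synthetic.

```python
import numpy as np

def holdout_mse(X, y, learning_rate, num_iterations, validation_fraction=0.25, seed=0):
    """Fit on a random training split and report the MSE on the held-out rows."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(len(X) * validation_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]

    # Scale with statistics from the training split only, then reuse them for validation.
    mean, std = X[train_idx].mean(axis=0), X[train_idx].std(axis=0)
    X_tr = (X[train_idx] - mean) / std
    X_val = (X[val_idx] - mean) / std

    # Plain batch gradient descent with an appended intercept column.
    Xb = np.hstack([X_tr, np.ones((len(X_tr), 1))])
    coeff = np.zeros(Xb.shape[1])
    for _ in range(num_iterations):
        gradient = (-2 / len(Xb)) * (Xb.T @ (y[train_idx] - Xb @ coeff))
        coeff = coeff - learning_rate * gradient

    val_pred = X_val @ coeff[:-1] + coeff[-1]
    return float(np.mean((y[val_idx] - val_pred) ** 2))

# Synthetic stand-in data (an assumption, not the Boston data).
rng = np.random.default_rng(3)
X_demo = rng.uniform(0, 10, size=(200, 4))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5, 3.0]) + 5.0 + rng.normal(0, 0.5, 200)

val_mse = holdout_mse(X_demo, y_demo, learning_rate=0.035, num_iterations=200)
print("Validation MSE:", round(val_mse, 3))
```

Comparing this validation MSE across settings gives a less optimistic picture than the training cost alone, especially after adding squared features.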
    

### Call the run function

In [55]:
run()

After iteration  1 Cost is: 498.9022789927327
After iteration  2 Cost is: 428.19899202751134
After iteration  3 Cost is: 370.5713724040005
After iteration  4 Cost is: 321.7616606315328
After iteration  5 Cost is: 280.0070774429033
After iteration  6 Cost is: 244.17393591277963
After iteration  7 Cost is: 213.37576705890766
After iteration  8 Cost is: 186.87798071258754
After iteration  9 Cost is: 164.061529178858
After iteration  10 Cost is: 144.4014235688864
After iteration  11 Cost is: 127.45088408232984
After iteration  12 Cost is: 112.82873857549718
After iteration  13 Cost is: 100.20914417671267
After iteration  14 Cost is: 89.3131031645834
After iteration  15 Cost is: 79.90140532512179
After iteration  16 Cost is: 71.76872161291999
After iteration  17 Cost is: 64.73863697138505
After iteration  18 Cost is: 58.65945589249208
After iteration  19 Cost is: 53.40064841761528
After iteration  20 Cost is: 48.84983020444732
After iteration  21 Cost is: 44.91019027475349
...

After iteration  174 Cost is: 16.9367074894935
After iteration  175 Cost is: 16.931835730151246
After iteration  176 Cost is: 16.92701703574453
After iteration  177 Cost is: 16.922250440182417
After iteration  178 Cost is: 16.91753499874433
After iteration  179 Cost is: 16.912869787578497
After iteration  180 Cost is: 16.90825390321258
After iteration  181 Cost is: 16.903686462076067
After iteration  182 Cost is: 16.899166600034263
After iteration  183 Cost is: 16.894693471933472
After iteration  184 Cost is: 16.89026625115716
After iteration  185 Cost is: 16.885884129192835
After iteration  186 Cost is: 16.881546315209313
After iteration  187 Cost is: 16.877252035644133
After iteration  188 Cost is: 16.87300053380101
After iteration  189 Cost is: 16.86879106945676
After iteration  190 Cost is: 16.864622918477817
After iteration  191 Cost is: 16.860495372445897
After iteration  192 Cost is: 16.856407738292578
After iteration  193 Cost is: 16.852359337942744
After iteration  194 Cost is