

# Implementation of linear regression using gradient descent


In [9]:
import numpy as np

class linear_regression:
    def __init__(self, learning_rate, iterations, 
               fit_intercept=True, normalize=False, coef=None):
        self.fit_intercept = fit_intercept
        self.normalize = normalize
        self.learning_rate = learning_rate
        self.iterations = iterations
        self.coef = coef
        self.cost =0
  
    #Normalizing the X values by subracting it with the mean and dividing it with standard deviation.
    def normalize_Data(self,X):
        no_of_features=X.shape[1]
        X_normalized=X
    
        Mean=np.zeros(no_of_features)
        Standard_deviation=np.zeros(no_of_features)
        
        X_normalized= X-np.mean(X,axis=0)/np.std(X,axis=0)
        return X_normalized 

 
    def fit(self, X, y):
       
    #If we put normalize as true then only it will normalize the X values 
        if self.normalize:
            X=self.normalize_Data(X)
            
    #Taking into account the number of columns and rows of the data and also the lenght of them.
        No_of_columns=X.shape[1]
        no_of_rows=X.shape[0]
        length_of_X= len(X)
        length_of_y= len(y)
       
    #If we put intercept to true then add one's column wise; c_ adds values column wise 
    #if we put intercept is false then the X remains the same
        if self.fit_intercept:
            Weights_dimension=No_of_columns + 1
            Modified_X= np.c_[np.ones((length_of_X,1)),X]
        else:
            Weights_dimension=No_of_columns
            Modified_X = X
            
        #M is the weight vector. We are initializing the weight vector by taking zeros. We can also take random values to intialize it.
        self.M=np.zeros(Weights_dimension)
        X_T=np.transpose(Modified_X)
        self.cost=0
       
        
        for i in range(self.iterations):
            #y_hat is the y we predicted which can be obtained by multipying our X and M which is our weight vector
            y_hat=np.dot(Modified_X,self.M)
            error_vector= np.dot(X_T,y_hat-y)
            #implementing the actual formulae weight=weight-(1/number of rows)*learning rate*(summ of ypredicted-actualy)*X
            #the error is the summation of ypred-y actual 
            self.M=self.M-(1/no_of_rows)*self.learning_rate*(error_vector)
            self.cost= np.sum((y_hat-y)**2)/2*no_of_rows
        return self.M,self.cost
    
    


    def predict(self, X):
        length_of_X= len(X)
    
        if self.fit_intercept:
            Modified_X= np.c_[np.ones((length_of_X,1)),X]
        else:
            Modified_X = X
            
            #The predicted value 
            prediction=np.dot(X,self.M)
        return prediction

## Problem 1.2 (10 points)

- Split the Boston Housing dataset into train and test sets (70% and 30%, respectively) (5 points). 
- Fit your linear regression implementation using the training set and print your model's coefficients. Make predictions for the test set using your fitted model (5 points).

In [53]:
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

#loading the boston data set 
dataset = load_boston()
X = dataset.data
y = dataset.target

#Splitting the data set into 70percent train and 30percent test 
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3)

#Calling the linear regression class which i have designed above
regresser=linear_regression(learning_rate=0.000001,iterations=5000,fit_intercept=False, normalize=True, coef=None)

#fitting the model 
regresser.fit(X_train,y_train)

#predicted values 
y_pred=regresser.predict(X_test)
y_pred

array([22.05022161, 20.52910395, 21.64194652, 24.02955968, 21.74400224,
        6.55804271, 27.64541769, 21.40999715, 22.20333231, 20.92341822,
       22.78100979, 31.07811375, 23.53969798, 24.33393075, 20.79586313,
       21.5151255 , 20.22217616,  6.74661551, 23.42710707, 18.30922938,
       32.16900166, 23.54177321, 22.33651012, 18.7086258 , 21.5246299 ,
       21.77785827, 21.91653255, 22.74537039, 23.0071897 , 22.54171012,
       20.23008668, 21.53231062, 22.65394127, 21.23086582, 23.92653525,
       23.68685044, 21.44488765, 21.00939822, 22.49707866, 23.9833305 ,
       19.80955846, 23.23436628, 23.04043995, 21.67085787, 16.59595134,
       22.02292817, 21.91530143, 22.25810157, 20.91945964, 23.41097142,
       22.42321506, 22.28111958, 17.37279039, 22.65224749, 26.2221944 ,
       22.35096623, 24.59564973, 29.12692301, 22.23082348, 22.85632858,
       22.81963338, 28.09093858, 20.33912184, 21.73392885, 21.32218863,
       20.37833627, 22.96939231, 21.80390328, 22.90076625, 19.98

## Questions

1. How do you interpret that a variable causes a model's mean square error to increase? 
  - Answer:
  So basically we plot the best fit line for our linear regression model and check the number of data points which are close to the best fit line.Generally the term used to check the closeness is Rsquare. As we check the number of points close to the best fit line there may be some data points which are very far away from the line these are nothing but outliers. The outliers are the points which cause our models mean square error to increase. 
  
  
2. Why we would want to normalize our variables? 
  - Answer: 
  In some cases few points in our data will have very high values which will dominate the other data points which will lead to a confused bad model. Instead if we normalize the data we wont be facing this domination issue. Usually we apply log function for normalization
  
  
  
3. A model fitted using the exact same split dataset with normalized values will generate the same coefficients as a model that was fitted using values that haven't been normalized. Clearly state whether that statement is true or false and explain your reasoning. 
  - Answer:
   False
   reason:- if we dont normalize there may be some points which may show domination which may influence the model.So by normalizing we wont have many dominant points.So as we are normalizing the coeffcients will be differnt.On the other side if the data is already normalized then coefficients wont change.