# General steps of linear regression


<b>The main steps for linear regression are:</b>
1. Define the model structure (such as number of input features) 
2. Initialize the model's parameters
3. Loop:
    - Calculate current cost (forward propagation)
    - Calculate current gradient (backward propagation)
    - Update parameters (gradient descent)
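
The three steps above can be sketched on a tiny synthetic problem. The data, learning rate, and iteration count below are illustrative assumptions, not values taken from this notebook.

```python
import numpy as np

# Toy data (assumed for illustration): y = 3x + 2, one feature, five examples.
X = np.array([[0.0, 1.0, 2.0, 3.0, 4.0]])   # shape (n_features, m)
Y = 3.0 * X[0] + 2.0                        # shape (m,)

# 1. Model structure: one weight per input feature plus a bias.
n_features, m = X.shape

# 2. Initialize the parameters.
w = np.zeros((1, n_features))
b = 0.0

# 3. Loop: cost, gradient, update.
learning_rate = 0.05
costs = []
for _ in range(2000):
    A = np.dot(w, X) + b                          # forward pass, shape (1, m)
    costs.append(np.sum((Y - A) ** 2) / (2 * m))  # current cost
    dw = np.dot(A - Y, X.T) / m                   # gradient with respect to w
    db = np.sum(A - Y) / m                        # gradient with respect to b
    w = w - learning_rate * dw                    # gradient descent update
    b = b - learning_rate * db

print(w, b)
```

On this noiseless toy data the learned `w` and `b` should end up close to 3 and 2.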

# 1. Importing required packages

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
Booston_housing_price_df = pd.read_csv('Booston_housing_price_prediction.csv')

X = Booston_housing_price_df.drop(["MEDV"],axis=1)
y = Booston_housing_price_df["MEDV"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)





# 2. Initialize weight and bias with zero

In [3]:
def initialization_weight_bias(dim):
    # One weight per input feature, stored as a row vector of shape (1, dim)
    w = np.zeros((1, dim))
    bias = 0
    return w, bias
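
As a quick check, calling the function for 13 features (the number of weights printed at the end of this notebook) should return a row vector of 13 zeros and a scalar bias. The function is repeated here only so that the snippet runs on its own.

```python
import numpy as np

def initialization_weight_bias(dim):
    # Row vector of zeros, one entry per feature, plus a scalar bias
    w = np.zeros((1, dim))
    bias = 0
    return w, bias

w, b = initialization_weight_bias(13)
print(w.shape)  # (1, 13)
print(b)        # 0
```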

# 3. Forward propagation and backward propagation


<b>Forward Propagation:</b>
- You get X
- You compute $A = wX + b = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})$, where $w$ is a row vector with one entry per feature (linear regression applies no activation function)
- You calculate the cost function: $J = \frac{1}{2m}\sum_{i=1}^{m}(y^{(i)}-a^{(i)})^2$


 
 
 
<b>Backward Propagation: </b>

Here are the two formulas you will be using: 

$$ dw =\frac{\partial J}{\partial w} = \frac{1}{m}(A-Y)X^T\tag{7}$$
$$ db = \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{8}$$
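
One way to gain confidence in these two formulas is to compare them with finite-difference estimates of the squared-error cost $\frac{1}{2m}\sum_{i}(y^{(i)}-a^{(i)})^2$. The sketch below does this on random synthetic data; the helper names `cost` and `analytic_grads` are hypothetical and not part of this notebook.

```python
import numpy as np

def cost(w, b, X, Y):
    # Squared-error cost with the 1/(2m) factor
    A = np.dot(w, X) + b
    return np.sum((Y - A) ** 2) / (2 * X.shape[1])

def analytic_grads(w, b, X, Y):
    # Gradients from the two formulas above
    m = X.shape[1]
    A = np.dot(w, X) + b
    dw = np.dot(A - Y, X.T) / m
    db = np.sum(A - Y) / m
    return dw, db

# Synthetic data: 3 features, 20 examples (columns are examples)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 20))
Y = rng.normal(size=20)
w = rng.normal(size=(1, 3))
b = 0.5

dw, db = analytic_grads(w, b, X, Y)

# Central finite differences, one parameter at a time
eps = 1e-6
num_dw = np.zeros_like(w)
for j in range(w.shape[1]):
    step = np.zeros_like(w)
    step[0, j] = eps
    num_dw[0, j] = (cost(w + step, b, X, Y) - cost(w - step, b, X, Y)) / (2 * eps)
num_db = (cost(w, b + eps, X, Y) - cost(w, b - eps, X, Y)) / (2 * eps)

# Both differences should be close to zero
print(np.max(np.abs(dw - num_dw)), abs(db - num_db))
```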





<b> Regularization: </b>

Regularization reduces overfitting by penalizing large coefficients.

L1 (i.e. Lasso regression) and L2 (i.e. Ridge regression) are the most common types of regularization. Both update the general cost function by adding another term, known as the regularization term.

                    Cost function = Loss (here, mean squared error) + Regularization term

Because of this term, minimizing the cost pushes the weights toward smaller values, on the assumption that a model with smaller weights is simpler. This tends to reduce overfitting.

However, the regularization term differs between L1 and L2.

<b>Ridge Regression (L2):</b> <br>
Performs L2 regularization, i.e. adds a penalty proportional to the sum of the squared coefficients.<br>
Minimization objective = least-squares objective + α * (sum of squares of coefficients)<br>
<b>Lasso Regression (L1):</b><br>
Performs L1 regularization, i.e. adds a penalty proportional to the sum of the absolute values of the coefficients.<br>
Minimization objective = least-squares objective + α * (sum of absolute values of coefficients)
</p>
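
To make the two penalty terms concrete, the sketch below evaluates both for a small, made-up coefficient vector. The values of `w` and `alpha` are illustrative assumptions.

```python
import numpy as np

w = np.array([0.5, -2.0, 0.0, 3.0])   # made-up coefficients
alpha = 0.1                           # made-up regularization strength

# L2 (Ridge): alpha * sum of squares = 0.1 * (0.25 + 4 + 0 + 9), about 1.325
l2_penalty = alpha * np.sum(np.square(w))

# L1 (Lasso): alpha * sum of absolute values = 0.1 * (0.5 + 2 + 0 + 3), about 0.55
l1_penalty = alpha * np.sum(np.abs(w))

print(l2_penalty, l1_penalty)
```

The L2 term grows quadratically with each coefficient, so it is dominated by the largest one, while the L1 term grows linearly.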

## 3.1 Forward propagation and backward propagation without regularization

In [4]:
def propogate_without_regularization(w,b,X,Y):
    # Number of training examples (columns of X)
    N = X.shape[1]

    # Forward propagation: predictions and residuals
    A = np.dot(w, X) + b
    error = Y - A

    # Backward propagation: gradients of the cost with respect to w and b
    dw = -(1/N) * np.dot(error, X.T)
    db = -(1/N) * np.sum(error)

    grads = {"dw": dw,
             "db": db}

    return grads

## 3.2 Forward propagation and backward propagation with regularization

In [5]:
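def propogate_with_regularization(w,b,X,Y,alpha,regularization):
    # Number of training examples (columns of X)
    N = X.shape[1]

    # Forward propagation: predictions and residuals
    A = np.dot(w, X) + b
    error = Y - A

    # Backward propagation. The L2 penalty (alpha/(2N)) * sum(w**2) contributes
    # +(alpha/N) * w to the gradient, so it must be added, not subtracted.
    # Note: `regularization` is accepted but unused; only L2 is implemented.
    dw = -(1/N) * np.dot(error, X.T) + (alpha/N) * w
    db = -(1/N) * np.sum(error)

    grads = {"dw": dw,
             "db": db}

    return grads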

# 4. Prediction

In [6]:
def predict(X, w,b):
    predictions = np.dot(w, X) +b
    return predictions
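
A small worked example shows the expected shapes: `X` holds one example per column and `w` is a row vector, so the result is a row of predictions. The numbers below are made up for illustration, and `predict` is repeated so the snippet runs on its own.

```python
import numpy as np

def predict(X, w, b):
    # Linear prediction: one value per column (example) of X
    return np.dot(w, X) + b

# Two features, three examples; columns are examples.
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 2.0]])
w = np.array([[2.0, -1.0]])
b = 0.5

# Example 1: 2*1 - 1*0 + 0.5 = 2.5, example 2: 2*2 - 1*1 + 0.5 = 3.5, and so on
predictions = predict(X, w, b)
print(predictions)  # [[2.5 3.5 4.5]]
```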

# 5. Optimization

## 5.1 Optimization without regularization

In [7]:
def optimize_without_regularization(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    # Number of training examples (columns of X)
    N = X.shape[1]
    for i in range(num_iterations):
        grads = propogate_without_regularization(w,b,X,Y)

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # Gradient descent update
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Squared-error cost (not cross entropy) after the update
        predictions = predict(X, w, b)
        cost = np.sum(np.square(Y - predictions))/(2*N)

        if print_cost and i%10 == 0:
            print ("iter={:d}   cost={:.2}".format(i, cost))

    return w,b


## 5.2 Optimization with regularization

In [8]:
def optimize_with_regularization(w, b, X, Y, num_iterations, learning_rate,alpha,regularization, print_cost = False):
    # Number of training examples (columns of X)
    N = X.shape[1]

    for i in range(num_iterations):
        grads = propogate_with_regularization (w,b,X,Y,alpha,regularization)

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # Gradient descent update
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Squared-error cost (not cross entropy) plus the L2 penalty scaled by alpha
        predictions = predict(X, w, b)
        squared_error_cost = np.sum(np.square(Y - predictions))/(2*N)
        L2_regularization_cost = alpha * np.sum(np.square(w))/(2*N)
        cost = squared_error_cost + L2_regularization_cost

        if print_cost and i%10 == 0:
            print ("iter={:d}   cost={:.2}".format(i, cost))

    return w,b

# 6. Linear regression model

In [9]:
def model(X_train, Y_train, X_test, Y_test, num_iterations = 20, learning_rate = 0.5, alpha=0, regularization="", print_cost = False):

    w, b = initialization_weight_bias(X_train.shape[0])

    # Gradient descent: the L2 variant runs only when regularization == "L2"
    if regularization == "L2":
        w, b = optimize_with_regularization(w, b, X_train, Y_train, num_iterations, learning_rate, alpha,
                                            regularization, print_cost)
    else:
        w, b = optimize_without_regularization(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Print the learned weights and bias
    print(w)
    print(b)

    # Predict test/train set examples
    Y_prediction_test  = predict(X_test, w, b)
    Y_prediction_train = predict(X_train, w, b)

    # Print train/test errors
    print("train RMSE: {} ".format(np.sqrt(np.mean(np.square(Y_prediction_train - Y_train)))))
    print("test  RMSE: {} ".format(np.sqrt(np.mean(np.square(Y_prediction_test - Y_test)))))

    d = {"Y_prediction_test": Y_prediction_test,
         "Y_prediction_train" : Y_prediction_train}

    return d


# 7. Run the model

In [10]:
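# Transpose so that rows are features and columns are examples
X_train_val = X_train.values.T
X_test_val  = X_test.values.T
y_train_val = y_train.values.T
y_test_val  = y_test.values.T


# regularization=" " does not match "L2", so the unregularized optimizer runs and alpha is ignored
d = model(X_train_val, y_train_val, X_test_val, y_test_val, num_iterations = 210, learning_rate = .00001, 
          alpha =.1,regularization=" ",print_cost = True)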

iter=0   cost=1.2e+03
iter=10   cost=4.9e+09
iter=20   cost=2.2e+16
iter=30   cost=9.9e+22
iter=40   cost=4.4e+29
iter=50   cost=2e+36
iter=60   cost=8.8e+42
iter=70   cost=4e+49
iter=80   cost=1.8e+56
iter=90   cost=7.9e+62
iter=100   cost=3.6e+69
iter=110   cost=1.6e+76
iter=120   cost=7.1e+82
iter=130   cost=3.2e+89
iter=140   cost=1.4e+96
iter=150   cost=6.4e+102
iter=160   cost=2.9e+109
iter=170   cost=1.3e+116
iter=180   cost=5.8e+122
iter=190   cost=2.6e+129
iter=200   cost=1.2e+136
[[-1.94145047e+66 -4.81324803e+66 -5.65462641e+66 -3.68590721e+64
  -2.65509505e+65 -2.91764745e+66 -3.31911295e+67 -1.64793732e+66
  -5.09945627e+66 -2.04944492e+68 -8.57533844e+66 -1.66523988e+68
  -6.11015347e+66]]
-4.642257691368154e+65
train RMSE: 1.4960687199328814e+71 
test  RMSE: 1.4813005798811312e+71 
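
The cost in the log above grows by several orders of magnitude every 10 iterations, so gradient descent is diverging rather than converging. A common cause is features on very different scales, and a common remedy is to standardize each feature with statistics from the training set. The sketch below shows this on synthetic data; the helpers `standardize` and `gradient_descent`, the data, and the hyperparameters are assumptions for illustration, and the result on the housing data is not shown here.

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale each feature (row) to zero mean and unit variance using training statistics."""
    mean = X_train.mean(axis=1, keepdims=True)
    std = X_train.std(axis=1, keepdims=True)
    std[std == 0] = 1.0   # avoid division by zero for constant features
    return (X_train - mean) / std, (X_test - mean) / std

def gradient_descent(X, Y, num_iterations, learning_rate):
    """Plain batch gradient descent for linear regression; columns of X are examples."""
    m = X.shape[1]
    w = np.zeros((1, X.shape[0]))
    b = 0.0
    costs = []
    for _ in range(num_iterations):
        A = np.dot(w, X) + b
        costs.append(np.sum((Y - A) ** 2) / (2 * m))
        dw = np.dot(A - Y, X.T) / m
        db = np.sum(A - Y) / m
        w = w - learning_rate * dw
        b = b - learning_rate * db
    return w, b, costs

# Synthetic data with features on very different scales (illustrative only)
rng = np.random.default_rng(0)
X_train = np.vstack([rng.uniform(0, 1, 200), rng.uniform(0, 500, 200)])
X_test = np.vstack([rng.uniform(0, 1, 50), rng.uniform(0, 500, 50)])
Y_train = 4.0 * X_train[0] + 0.02 * X_train[1] + 1.0

# Standardize with training statistics, then train on the scaled features
X_train_s, X_test_s = standardize(X_train, X_test)
w, b, costs = gradient_descent(X_train_s, Y_train, num_iterations=200, learning_rate=0.1)

print(costs[0], costs[-1])   # the cost should now decrease
```

Reusing the training mean and standard deviation for the test set keeps the two sets on the same scale without leaking test statistics into training.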
