# General Architecture of the learning algorithm

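It's time to design a simple algorithm for binary classification. The architecture below is illustrated on the classic task of distinguishing cat images from non-cat images; the implementation in this notebook applies the same algorithm to the breast cancer dataset, classifying tumors as malignant or benign.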

You will build a logistic regression classifier using a neural-network mindset. The following figure shows why **logistic regression is actually a very simple neural network!**

<img src="LogReg_kiank.png" style="width:650px;height:400px;">

**Mathematical expression of the algorithm**:

For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}\tag{2}$$
$$ \mathcal{L}(a^{(i)}, y^{(i)}) =  - y^{(i)}  \log(a^{(i)}) - (1-y^{(i)} )  \log(1-a^{(i)})\tag{3}$$

The cost is then computed by averaging the loss over all $m$ training examples:
$$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{6}$$
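To make equations (1), (2), (3) and (6) concrete, here is a minimal NumPy sketch that evaluates them one example at a time. The weights, bias, and two-example dataset are made-up toy values, and `per_example_cost` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    # Logistic function from equation (2)
    return 1.0 / (1.0 + np.exp(-z))

def per_example_cost(w, b, X, y):
    """Average cross-entropy computed with an explicit loop over examples.

    w: shape (n_x,), b: scalar, X: shape (n_x, m) with one example per column,
    y: shape (m,) of 0/1 labels.
    """
    m = X.shape[1]
    total = 0.0
    for i in range(m):
        z_i = np.dot(w, X[:, i]) + b                                   # equation (1)
        a_i = sigmoid(z_i)                                             # equation (2)
        loss_i = -y[i] * np.log(a_i) - (1 - y[i]) * np.log(1 - a_i)    # equation (3)
        total += loss_i
    return total / m                                                   # equation (6)

# Toy example: two features, two examples (hypothetical values)
w = np.array([0.5, -0.25])
b = 0.1
X = np.array([[1.0, 2.0],
              [0.0, 4.0]])
y = np.array([1, 0])
print(per_example_cost(w, b, X, y))
```

With all-zero parameters every $a^{(i)} = 0.5$, so the cost reduces to $\log 2$ regardless of the labels, which is a handy sanity check.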

**Key steps**:
In this exercise, you will carry out the following steps:
- Initialize the parameters of the model
- Learn the parameters for the model by minimizing the cost
- Use the learned parameters to make predictions (on the test set)
- Analyse the results and conclude
    


<b>The main steps for building a Neural Network are: </b>
1. Define the model structure (such as number of input features) 
2. Initialize the model's parameters
3. Loop:
    - Calculate current loss (forward propagation)
    - Calculate current gradient (backward propagation)
    - Update parameters (gradient descent)

<h1> Logistic regression implementation </h1>

# 1. Importing required packages

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import seaborn as sns
import matplotlib.pyplot as plt
```

```python
# Load the breast cancer dataset; "malignant_benign" is the 0/1 target column
breast_cancer_df = pd.read_csv('Breast_cancer.csv')

X = breast_cancer_df.drop(["malignant_benign"], axis=1)
y = breast_cancer_df["malignant_benign"]

# Hold out 20% of the examples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
```


# 2. Initialize the weights and bias with zeros

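```python
def initialization_weight_bias(dim):
    # w is a row vector of shape (1, dim) and the bias is a scalar. Starting both at
    # zero is fine for logistic regression, since there is no symmetry to break.
    w = np.zeros((1, dim))
    bias = 0
    return w, bias
```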
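A quick shape check, as a sketch with arbitrary toy sizes: the zero-initialized row vector `w` of shape (1, dim) multiplies data stored one example per column, which is the layout produced later by `X_train.values.T`.

```python
import numpy as np

def initialization_weight_bias(dim):
    # Same initialization as above: a (1, dim) row vector of zeros and a zero bias
    return np.zeros((1, dim)), 0

n_features, m = 30, 5                                       # toy sizes (hypothetical)
X = np.random.default_rng(0).normal(size=(n_features, m))   # one example per column
w, b = initialization_weight_bias(n_features)

z = np.dot(w, X) + b        # shape (1, m)
A = 1 / (1 + np.exp(-z))    # with all-zero weights, every probability is 0.5
print(z.shape, A)
```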

# 3. Forward propagation and backward propagation


<b>Forward Propagation:</b>
- You get X
- You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})$
- You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$
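The vectorized forward pass computes every $a^{(i)}$ with one matrix product instead of a loop over examples. This sketch, on random toy data with hypothetical sizes, checks that both routes give the same cost $J$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, m = 3, 6                          # toy sizes
X = rng.normal(size=(n_x, m))          # one example per column
Y = rng.integers(0, 2, size=(1, m))    # 0/1 labels
w = rng.normal(size=(1, n_x))          # row-vector weights, as in the code
b = 0.2

# Vectorized: A = sigma(w X + b), then J from the formula above
A = 1 / (1 + np.exp(-(np.dot(w, X) + b)))
J_vec = -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# Explicit loop over examples, for comparison
J_loop = 0.0
for i in range(m):
    a_i = 1 / (1 + np.exp(-(np.dot(w[0], X[:, i]) + b)))
    y_i = Y[0, i]
    J_loop += -(y_i * np.log(a_i) + (1 - y_i) * np.log(1 - a_i))
J_loop /= m

print(J_vec, J_loop)
```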


<b>Backward Propagation: </b>

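Here are the two formulas you will be using. In the code below, `w` is stored as a row vector of shape $(1, n_x)$, so $w^T X$ is computed as `np.dot(w, X)` and equation (7) produces `dw` with the same $(1, n_x)$ shape as `w`: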

$$ dw =\frac{\partial J}{\partial w} = \frac{1}{m}(A-Y)X^T\tag{7}$$
$$ db = \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{8}$$
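One way to convince yourself that equations (7) and (8) are correct is a finite-difference gradient check: nudge each parameter by a small $\epsilon$ and compare the change in $J$ with the analytic gradient. A minimal sketch on random toy data; `cost` and `analytic_grads` are illustrative helper names.

```python
import numpy as np

def cost(w, b, X, Y):
    # Cross-entropy cost J for a row-vector w of shape (1, n_x)
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w, X) + b)))
    return -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

def analytic_grads(w, b, X, Y):
    # Equations (7) and (8)
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w, X) + b)))
    return (1 / m) * np.dot(A - Y, X.T), (1 / m) * np.sum(A - Y)

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 10))
Y = rng.integers(0, 2, size=(1, 10))
w = rng.normal(size=(1, 4))
b = 0.3
eps = 1e-6

dw, db = analytic_grads(w, b, X, Y)

# Central differences for each weight and for the bias
dw_num = np.zeros_like(w)
for j in range(w.shape[1]):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[0, j] += eps
    w_minus[0, j] -= eps
    dw_num[0, j] = (cost(w_plus, b, X, Y) - cost(w_minus, b, X, Y)) / (2 * eps)
db_num = (cost(w, b + eps, X, Y) - cost(w, b - eps, X, Y)) / (2 * eps)

print(np.max(np.abs(dw - dw_num)), abs(db - db_num))
```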







<b> Regularization: </b>

Regularization reduces overfitting by discouraging large weights. This section introduces the two most common techniques; the implementation below uses L2.

L1 (i.e. Lasso regression) and L2 (i.e. Ridge regression) are the most common types of regularization. Both modify the cost function by adding an extra term, known as the regularization term:

$$\text{Cost function} = \text{Loss (e.g. binary cross-entropy)} + \text{Regularization term}$$

Because the regularization term grows with the magnitude of the weights, minimizing the total cost pushes the weights toward smaller values. Smaller weights generally correspond to simpler models, which reduces overfitting to some extent.

The two techniques differ in the form of this term.

<b>Ridge Regression (L2):</b> <br>
Performs L2 regularization, i.e. adds a penalty equal to the sum of the squared coefficients.<br>
Minimization objective = Loss + α × (sum of squared coefficients)<br>
<b>Lasso Regression (L1):</b><br>
Performs L1 regularization, i.e. adds a penalty equal to the sum of the absolute values of the coefficients.<br>
Minimization objective = Loss + α × (sum of absolute values of coefficients)

In the L2 implementation below, the penalty is scaled as $\frac{\alpha}{2m}\sum_j w_j^2$, so it contributes $\frac{\alpha}{m} w$ to the gradient `dw`; the bias $b$ is not regularized.
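A small sketch computing both penalty terms and their (sub)gradients for one weight vector. It assumes the $\frac{\alpha}{2m}$ scaling for L2 used by the implementation in this notebook and an analogous $\frac{\alpha}{m}$ scaling for L1; the L1 scaling is a hypothetical choice, since L1 is not implemented below.

```python
import numpy as np

def l2_penalty(w, alpha, m):
    # (alpha / (2m)) * sum of squared weights; its gradient is (alpha / m) * w
    return alpha / (2 * m) * np.sum(np.square(w)), (alpha / m) * w

def l1_penalty(w, alpha, m):
    # (alpha / m) * sum of absolute weights; np.sign gives a subgradient (0 at w = 0)
    return alpha / m * np.sum(np.abs(w)), (alpha / m) * np.sign(w)

w = np.array([[0.5, -2.0, 0.0]])
alpha, m = 0.1, 10
print(l2_penalty(w, alpha, m))
print(l1_penalty(w, alpha, m))
```

The L2 gradient shrinks each weight in proportion to its size, while the L1 subgradient pushes every nonzero weight toward zero by the same fixed amount, which is why L1 tends to produce coefficients that are exactly zero.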

## 3.1 Forward propagation and backward propagation without regularization

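```python
def propogate_without_regularization(w, b, X, Y):
    # Number of training examples (X has shape (n_features, m))
    N = X.shape[1]

    # --- Forward propagation: A = sigmoid(w X + b), shape (1, m)
    z = np.dot(w, X) + b
    A = 1 / (1 + np.exp(-z))

    # --- Backward propagation: gradients of the cross-entropy cost, equations (7) and (8)
    dw = (1 / N) * np.dot((A - Y), X.T)
    db = (1 / N) * np.sum(A - Y)

    grads = {"dw": dw,
             "db": db}
    return grads
```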
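The expression `1 / (1 + np.exp(-z))` overflows, with a `RuntimeWarning`, when `z` is a large negative number, which is easy to hit with unscaled features such as the raw columns used here. A minimal sketch of a numerically stable sigmoid, offered as an optional alternative rather than part of the original code:

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that only ever exponentiates non-positive numbers."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1], so it cannot overflow
    # For z >= 0: 1 / (1 + exp(-z)); for z < 0: exp(z) / (1 + exp(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

z = np.array([[-1000.0, -1.0, 0.0, 1.0, 1000.0]])
print(stable_sigmoid(z))
```

Both branches are algebraically equal to $\sigma(z)$; choosing between them by the sign of `z` just keeps the exponent non-positive.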

## 3.2 Forward propagation and backward propagation with regularization

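```python
def propogate_with_regularization(w, b, X, Y, alpha, regularization):
    # Number of training examples (X has shape (n_features, m))
    N = X.shape[1]

    # --- Forward propagation
    z = np.dot(w, X) + b
    A = 1 / (1 + np.exp(-z))

    # --- Backward propagation with L2 regularization: the penalty
    # (alpha / (2N)) * sum(w**2) adds (alpha / N) * w to dw; the bias is not regularized.
    # `regularization` is accepted for symmetry with model(), but only L2 is implemented.
    dw = (1 / N) * (np.dot((A - Y), X.T) + alpha * w)
    db = (1 / N) * np.sum(A - Y)

    grads = {"dw": dw,
             "db": db}
    return grads
```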
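The extra `alpha * w` term in `dw` should be the gradient of the penalty $\frac{\alpha}{2m}\sum_j w_j^2$. A finite-difference sketch on toy data checks that the regularized gradient matches the regularized cost; `reg_cost` and `reg_grad_w` are hypothetical helper names.

```python
import numpy as np

def reg_cost(w, b, X, Y, alpha):
    # Cross-entropy plus (alpha / (2m)) * ||w||^2; the bias is not penalized
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w, X) + b)))
    ce = -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    return ce + alpha / (2 * m) * np.sum(np.square(w))

def reg_grad_w(w, b, X, Y, alpha):
    # Same dw formula as propogate_with_regularization
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w, X) + b)))
    return (1 / m) * (np.dot(A - Y, X.T) + alpha * w)

rng = np.random.default_rng(3)
X = rng.normal(size=(3, 8))
Y = rng.integers(0, 2, size=(1, 8))
w = rng.normal(size=(1, 3))
b, alpha, eps = -0.1, 0.7, 1e-6

dw = reg_grad_w(w, b, X, Y, alpha)
dw_num = np.zeros_like(w)
for j in range(w.shape[1]):
    step = np.zeros_like(w)
    step[0, j] = eps
    dw_num[0, j] = (reg_cost(w + step, b, X, Y, alpha)
                    - reg_cost(w - step, b, X, Y, alpha)) / (2 * eps)

print(np.max(np.abs(dw - dw_num)))
```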


# 4. Prediction

```python
def predict(X, w, b):
    # Predicted probabilities P(y = 1 | x), shape (1, N) for N examples
    z = np.dot(w, X) + b
    predictions = 1 / (1 + np.exp(-z))

    # Threshold at 0.5 to get 0/1 labels (plain `int` dtype; np.int no longer exists)
    Y_prediction = (predictions > 0.5).astype(int)
    return Y_prediction
```
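Because $\sigma(z) > 0.5$ exactly when $z > 0$, the threshold in `predict` can equivalently be applied to the linear score, without evaluating the sigmoid. A small sketch on toy data checking that both rules agree:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 20))    # toy data, one example per column
w = rng.normal(size=(1, 5))
b = 0.05

z = np.dot(w, X) + b
via_sigmoid = (1 / (1 + np.exp(-z)) > 0.5).astype(int)
# Decision boundary is the hyperplane w x + b = 0 (the two rules can only
# disagree through rounding when z is within ~1e-16 of zero)
via_score = (z > 0).astype(int)
print(via_sigmoid)
```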

# 5. Optimization

## 5.1 : Optimization without regularization

```python
def optimize_without_regularization(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    N = X.shape[1]  # number of training examples

    for i in range(num_iterations):
        # Gradients of the cost at the current parameters
        grads = propogate_without_regularization(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]

        # Gradient descent update
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Cross-entropy cost at the updated parameters (for monitoring only)
        z = np.dot(w, X) + b
        A = 1 / (1 + np.exp(-z))
        cross_entropy_cost = -(1 / N) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

        if print_cost and i % 100 == 0:
            print("iter={:d}   cost={:f}".format(i, cross_entropy_cost))

    return w, b
```
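`optimize_without_regularization` prints the cost but does not return it. If you want to plot the learning curve (for example with the `matplotlib.pyplot` imported at the top), a variant that records the cost at every iteration could look like this sketch; the function name `gradient_descent_with_history` and the toy dataset are hypothetical.

```python
import numpy as np

def gradient_descent_with_history(w, b, X, Y, num_iterations, learning_rate):
    """Plain gradient descent on the cross-entropy cost; returns (w, b, costs)."""
    m = X.shape[1]
    costs = []
    for _ in range(num_iterations):
        A = 1 / (1 + np.exp(-(np.dot(w, X) + b)))
        # Cost at the current parameters, recorded before the update
        costs.append(-(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
        # Equations (7) and (8), then the update step
        w = w - learning_rate * (1 / m) * np.dot(A - Y, X.T)
        b = b - learning_rate * (1 / m) * np.sum(A - Y)
    return w, b, costs

# Toy data: the label is 1 when the first feature is positive
rng = np.random.default_rng(5)
X = rng.normal(size=(2, 50))
Y = (X[0:1, :] > 0).astype(int)
w, b, costs = gradient_descent_with_history(np.zeros((1, 2)), 0.0, X, Y, 200, 0.5)
print(costs[0], costs[-1])
```

Starting from zero parameters, the first recorded cost is exactly $\log 2$, which makes a convenient reference point on the plot.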


## 5.2 : Optimization with regularization

```python
def optimize_with_regularization(w, b, X, Y, num_iterations, learning_rate, alpha, regularization, print_cost=False):
    N = X.shape[1]  # number of training examples

    for i in range(num_iterations):
        # Gradients of the L2-regularized cost at the current parameters
        grads = propogate_with_regularization(w, b, X, Y, alpha, regularization)
        dw = grads["dw"]
        db = grads["db"]

        # Gradient descent update
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Regularized cost at the updated parameters (for monitoring only)
        z = np.dot(w, X) + b
        A = 1 / (1 + np.exp(-z))
        cross_entropy_cost = -(1 / N) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
        # The penalty must include alpha to match the alpha * w term in dw
        L2_regularization_cost = alpha * np.sum(np.square(w)) / (2 * N)
        cost = cross_entropy_cost + L2_regularization_cost

        if print_cost and i % 10 == 0:
            print("iter={:d}   cost={:f}".format(i, cost))

    return w, b
```
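To see the shrinkage effect described in the regularization section, the sketch below runs the L2-regularized update on toy data with several values of α and reports the norm of the learned weights. The data and the function name `train_l2` are hypothetical; the update rule matches `propogate_with_regularization`.

```python
import numpy as np

def train_l2(X, Y, alpha, num_iterations=500, learning_rate=0.5):
    # Gradient descent with the L2 gradient (1/m) * ((A - Y) X^T + alpha * w)
    m = X.shape[1]
    w, b = np.zeros((1, X.shape[0])), 0.0
    for _ in range(num_iterations):
        A = 1 / (1 + np.exp(-(np.dot(w, X) + b)))
        w = w - learning_rate * (1 / m) * (np.dot(A - Y, X.T) + alpha * w)
        b = b - learning_rate * (1 / m) * np.sum(A - Y)
    return w, b

rng = np.random.default_rng(6)
X = rng.normal(size=(3, 40))
Y = (X[0:1, :] + 0.5 * X[1:2, :] > 0).astype(int)

for alpha in [0.0, 10.0, 100.0]:
    w, _ = train_l2(X, Y, alpha)
    print(alpha, np.linalg.norm(w))
```

Larger α trades a slightly higher training loss for smaller weights; with α = 0 on separable data like this, the weights would keep growing the longer you train.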


# 6 : Logistic regression model

```python
def model(X_train, Y_train, X_test, Y_test, num_iterations=20, learning_rate=0.5, alpha=0, regularization="", print_cost=False):
    # X_* have shape (n_features, m); Y_* hold the 0/1 labels
    w, b = initialization_weight_bias(X_train.shape[0])

    # Gradient descent, with or without L2 regularization
    if regularization == "L2":
        w, b = optimize_with_regularization(w, b, X_train, Y_train, num_iterations, learning_rate,
                                            alpha, regularization, print_cost)
    else:
        w, b = optimize_without_regularization(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    print(w)
    print(b)

    # Predict labels for the test and train sets
    Y_prediction_test = predict(X_test, w, b)
    Y_prediction_train = predict(X_train, w, b)

    # Print train/test errors. Despite the label, this value is the RMSE between the
    # 0/1 predictions and the labels, i.e. sqrt(misclassification rate), not the cross-entropy cost.
    print("train cross entropy cost: {} ".format(np.sqrt(np.mean(np.square(Y_prediction_train - Y_train)))))
    print("test  cross entropy cost: {} ".format(np.sqrt(np.mean(np.square(Y_prediction_test - Y_test)))))

    d = {"Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train}
    return d
```
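The value printed by `model` is the RMSE between 0/1 predictions and labels. Each squared difference is either 0 or 1, so its mean is the misclassification rate and accuracy equals $1 - \text{RMSE}^2$. A short sketch with made-up labels; `accuracy` is a hypothetical helper.

```python
import numpy as np

def accuracy(Y_pred, Y_true):
    # Fraction of examples whose predicted 0/1 label matches the true label
    return np.mean(Y_pred == Y_true)

Y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])      # hypothetical labels, shape (m,)
Y_pred = np.array([[1, 0, 0, 1, 0, 1, 1, 0]])    # shape (1, m), like predict()'s output

rmse = np.sqrt(np.mean(np.square(Y_pred - Y_true)))
print(accuracy(Y_pred, Y_true), 1 - rmse ** 2)
```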


# 7: Run the model

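```python
# The functions above expect one example per column, so transpose to (n_features, m)
X_train_val = X_train.values.T
X_test_val = X_test.values.T
y_train_val = y_train.values.T
y_test_val = y_test.values.T

# alpha is ignored here because regularization="" selects the unregularized optimizer
d = model(X_train_val, y_train_val, X_test_val, y_test_val, num_iterations=400, learning_rate=.0001,
          alpha=.1, regularization="",
          print_cost=True)
```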

```
iter=0   cost=4.019401
iter=100   cost=inf
iter=200   cost=0.846546
iter=300   cost=0.881552
[[ 2.62453814e-02  4.49796073e-02  1.57780611e-01  1.13174703e-01
   2.66201517e-04  2.82383346e-05 -2.59760622e-04 -1.22292064e-04
   5.02951898e-04  2.05329709e-04  1.38832835e-04  3.38301547e-03
   3.70426510e-04 -6.03123644e-02  1.98997207e-05  1.97940429e-05
   1.16560968e-05  9.78538404e-06  5.61438228e-05  8.84572518e-06
   2.66739676e-02  5.75536412e-02  1.58307458e-01 -1.30651710e-01
   3.47213334e-04 -5.78467119e-05 -4.38762857e-04 -9.15086837e-05
   7.13489690e-04  2.21703473e-04]]
0.003386123509735405
train cross entropy cost: 0.3144854510165755 
test  cross entropy cost: 0.24779731389167603 
```
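Two notes on this output. First, squaring the printed RMSE values gives the misclassification rates: 0.3145² ≈ 9.9% on the training set and 0.2478² ≈ 6.1% on the test set. Second, `cost=inf` at iteration 100 means some predicted probability saturated to exactly 0 or 1 in floating point, so `np.log` returned `-inf`; large, unscaled feature values make this easier to hit. A common remedy, sketched below as an assumption rather than part of the original code, is to clip `A` away from 0 and 1 before taking logs; `eps=1e-15` is an arbitrary choice.

```python
import numpy as np

def cross_entropy_cost(A, Y, eps=1e-15):
    """Average cross-entropy with probabilities clipped to [eps, 1 - eps].

    Clipping keeps np.log finite, so the cost stays a (large) finite number
    instead of inf/nan when the sigmoid saturates.
    """
    m = Y.shape[-1]
    A = np.clip(A, eps, 1 - eps)
    return -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

Y = np.array([[1, 0, 1]])
A_saturated = np.array([[0.0, 0.0, 1.0]])   # first example: confident and wrong
print(cross_entropy_cost(A_saturated, Y))    # finite, unlike the unclipped formula
```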


