# Building a deep neural network


**Notation used in this notebook**:
- Superscript $[l]$ denotes a quantity associated with the $l^{th}$ layer. 
    - Example: $a^{[L]}$ is the $L^{th}$ layer activation. $W^{[L]}$ and $b^{[L]}$ are the $L^{th}$ layer parameters.
- Superscript $(i)$ denotes a quantity associated with the $i^{th}$ example. 
    - Example: $x^{(i)}$ is the $i^{th}$ training example.
- Subscript $i$ denotes the $i^{th}$ entry of a vector.
    - Example: $a^{[l]}_i$ denotes the $i^{th}$ entry of the $l^{th}$ layer's activations.
    
    
    


**Steps involved in building a deep neural network**

- Initialize the parameters for a two-layer network and for an $L$-layer neural network.
- Implement the forward propagation module (shown in purple in the figure below).
     - Complete the LINEAR part of a layer's forward propagation step (resulting in $Z^{[l]}$).
     - Apply the ACTIVATION function (relu/sigmoid).
     - Combine the previous two steps into a new [LINEAR->ACTIVATION] forward function.
     - Stack the [LINEAR->RELU] forward function L-1 times (for layers 1 through L-1) and add a [LINEAR->SIGMOID] at the end (for the final layer $L$). This gives you a new `L_model_forward` function.
- Compute the loss.
- Implement the backward propagation module (denoted in red in the figure below).
    - Complete the LINEAR part of a layer's backward propagation step.
    - Apply the gradient of the ACTIVATION function (relu_backward/sigmoid_backward).
    - Combine the previous two steps into a new [LINEAR->ACTIVATION] backward function.
    - Stack [LINEAR->RELU] backward L-1 times and add [LINEAR->SIGMOID] backward in a new `L_model_backward` function.
- Finally, update the parameters.

<img src="images/final outline.png" style="width:800px;height:500px;">
<caption><center> **Figure 1**</center></caption><br>


**Note** that every forward function has a corresponding backward function. For this reason, each step of the forward module stores some values in a cache, and the backpropagation module then uses those cached values to compute the gradients. This notebook walks through each of these steps.
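
The forward/cache/backward pairing can be sketched for a single sigmoid activation. This is a minimal illustration only; the names `sigmoid_forward` and `sigmoid_backward` are assumptions made for this example and are not defined elsewhere in the notebook.

```python
import numpy as np

def sigmoid_forward(Z):
    """Return the activation A and a cache holding what the backward step needs."""
    A = 1 / (1 + np.exp(-Z))
    cache = Z  # the pre-activation is enough to recompute the derivative later
    return A, cache

def sigmoid_backward(dA, cache):
    """Use the cached Z to turn dA into dZ = dA * s * (1 - s)."""
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

Z = np.array([[0.0, 2.0, -1.0]])
A, cache = sigmoid_forward(Z)
dZ = sigmoid_backward(np.ones_like(A), cache)
print(A, dZ)
```

The backward function never sees the forward inputs directly; everything it needs travels through the cache.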


# 1. Importing required packages

In [3]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import seaborn as sns
import matplotlib.pyplot as plt

In [4]:
breast_cancer_df = pd.read_csv('Breast_cancer.csv')

X = breast_cancer_df.drop(["malignant_benign"],axis=1)
y = breast_cancer_df["malignant_benign"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)


# 2. Initialize weights and biases

Here we will use `He Initialization`; this is named for the first author of He et al., 2015. (If you have heard of "Xavier initialization", this is similar except Xavier initialization uses a scaling factor for the weights $W^{[l]}$ of `sqrt(1./layers_dims[l-1])` where He initialization would use `sqrt(2./layers_dims[l-1])`.) 
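
To make the difference between the two scaling factors concrete, here is a small sketch that draws one weight matrix with each factor and compares the empirical standard deviations. The layer sizes `n_prev` and `n_curr` are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prev, n_curr = 500, 400              # illustrative layer sizes

he_scale = np.sqrt(2.0 / n_prev)       # He et al., 2015
xavier_scale = np.sqrt(1.0 / n_prev)   # the Xavier variant quoted above

# Standard-normal draws are multiplied by the scaling factor
W_he = rng.standard_normal((n_curr, n_prev)) * he_scale
W_xavier = rng.standard_normal((n_curr, n_prev)) * xavier_scale

print("He std:    ", W_he.std(), "target", he_scale)
print("Xavier std:", W_xavier.std(), "target", xavier_scale)
```

The He weights come out larger by a factor of $\sqrt{2}$, which is intended to compensate for ReLU zeroing roughly half of the pre-activations.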

In [6]:
def initialize_parameters_deep(layer_dims):

    """

    Arguments:

    layer_dims -- python array (list) containing the dimensions of each layer in our network

    

    Returns:

    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":

                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])

                    bl -- bias vector of shape (layer_dims[l], 1)

    """
    np.random.seed(1)

    parameters = {}

    L = len(layer_dims)-1           # number of layers in the network
    for l in range(1, L+1):

        # He initialization: multiply standard-normal draws by sqrt(2 / n_prev)
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * np.sqrt(2. / layer_dims[l-1])

        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))

        

        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))

        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters


In [None]:
def initialize_parameters(n_x, n_h, n_y):

    """

    Argument:

    n_x -- size of the input layer

    n_h -- size of the hidden layer

    n_y -- size of the output layer

    

    Returns:

    parameters -- python dictionary containing your parameters:

                    W1 -- weight matrix of shape (n_h, n_x)

                    b1 -- bias vector of shape (n_h, 1)

                    W2 -- weight matrix of shape (n_y, n_h)

                    b2 -- bias vector of shape (n_y, 1)

    """

    

    np.random.seed(1)

    W1 = np.random.randn(n_h, n_x) * 0.01

    b1 = np.zeros(shape=(n_h, 1))

    W2 = np.random.randn(n_y, n_h) * 0.01

    b2 = np.zeros(shape=(n_y, 1))

    assert(W1.shape == (n_h, n_x))

    assert(b1.shape == (n_h, 1))

    assert(W2.shape == (n_y, n_h))

    assert(b2.shape == (n_y, 1))

    

    parameters = {"W1": W1,

                  "b1": b1,

                  "W2": W2,

                  "b2": b2}

    

    return parameters    

In [None]:
def predict(X, y, parameters):

    """

    This function is used to predict the results of a  L-layer neural network.

    

    Arguments:

    X -- data set of examples you would like to label

    y -- true labels for X, used only to report the accuracy

    parameters -- parameters of the trained model

    

    Returns:

    p -- predictions for the given dataset X

    """

    

    m = X.shape[1]

    n = len(parameters) // 2 # number of layers in the neural network

    p = np.zeros((1,m))

    

    # Forward propagation

    probas, caches = L_model_forward(X, parameters)

 

    

    # convert probas to 0/1 predictions

    for i in range(0, probas.shape[1]):

        if probas[0,i] > 0.5:

            p[0,i] = 1

        else:

            p[0,i] = 0

    

    #print results

    #print ("predictions: " + str(p))

    #print ("true labels: " + str(y))

    print("Accuracy: "  + str(np.sum((p == y)/m)))

        

    return p


The initialization for a deeper L-layer neural network is more complicated because there are many more weight matrices and bias vectors. When completing `initialize_parameters_deep`, make sure that the dimensions match between consecutive layers. Recall that $n^{[l]}$ is the number of units in layer $l$. The table below assumes an input $X$ of shape $(12288, 209)$, that is, $m = 209$ examples.

<table style="width:100%">
<tr>
    <td>  </td> 
    <td> **Shape of W** </td> 
    <td> **Shape of b**  </td> 
    <td> **Activation** </td>
    <td> **Shape of Activation** </td> 
</tr>

<tr>
    <td> **Layer 1** </td> 
    <td> $(n^{[1]},12288)$ </td> 
    <td> $(n^{[1]},1)$ </td> 
    <td> $Z^{[1]} = W^{[1]}  X + b^{[1]} $ </td> 
    <td> $(n^{[1]},209)$ </td> 
</tr>

<tr>
    <td> **Layer 2** </td> 
    <td> $(n^{[2]}, n^{[1]})$  </td> 
    <td> $(n^{[2]},1)$ </td> 
    <td> $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$ </td> 
    <td> $(n^{[2]}, 209)$ </td> 
</tr>

<tr>
    <td> $\vdots$ </td> 
    <td> $\vdots$  </td> 
    <td> $\vdots$  </td> 
    <td> $\vdots$ </td> 
    <td> $\vdots$  </td> 
</tr>

<tr>
    <td> **Layer L-1** </td> 
    <td> $(n^{[L-1]}, n^{[L-2]})$ </td> 
    <td> $(n^{[L-1]}, 1)$  </td> 
    <td> $Z^{[L-1]} =  W^{[L-1]} A^{[L-2]} + b^{[L-1]}$ </td> 
    <td> $(n^{[L-1]}, 209)$ </td> 
</tr>

<tr>
    <td> **Layer L** </td> 
    <td> $(n^{[L]}, n^{[L-1]})$ </td> 
    <td> $(n^{[L]}, 1)$ </td>
    <td> $Z^{[L]} =  W^{[L]} A^{[L-1]} + b^{[L]}$ </td>
    <td> $(n^{[L]}, 209)$  </td> 
</tr>

</table>
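
The shapes in the table can be checked mechanically. The sketch below uses zero-filled arrays, the layer sizes of the 4-layer model used later in this notebook, and $m = 209$ as in the table; it only tracks shapes and does not train anything.

```python
import numpy as np

layer_dims = [12288, 20, 7, 5, 1]   # same sizes as the 4-layer model used later
m = 209                              # number of examples, as in the table

A = np.zeros((layer_dims[0], m))
shapes = []
for l in range(1, len(layer_dims)):
    W = np.zeros((layer_dims[l], layer_dims[l - 1]))
    b = np.zeros((layer_dims[l], 1))
    Z = W @ A + b            # b broadcasts across the m columns
    shapes.append((W.shape, b.shape, Z.shape))
    A = Z                    # activations are elementwise, so the shape is unchanged
    print(f"Layer {l}: W {W.shape}, b {b.shape}, Z {Z.shape}")
```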

Remember that when we compute $W X + b$ with NumPy, broadcasting adds the column vector $b$ to every column of $WX$. For example, if: 

$$ W = \begin{bmatrix}
    j  & k  & l\\
    m  & n & o \\
    p  & q & r 
\end{bmatrix}\;\;\; X = \begin{bmatrix}
    a  & b  & c\\
    d  & e & f \\
    g  & h & i 
\end{bmatrix} \;\;\; b =\begin{bmatrix}
    s  \\
    t  \\
    u
\end{bmatrix}\tag{2}$$

Then $WX + b$ will be:

$$ WX + b = \begin{bmatrix}
    (ja + kd + lg) + s  & (jb + ke + lh) + s  & (jc + kf + li)+ s\\
    (ma + nd + og) + t & (mb + ne + oh) + t & (mc + nf + oi) + t\\
    (pa + qd + rg) + u & (pb + qe + rh) + u & (pc + qf + ri)+ u
\end{bmatrix}\tag{3}  $$
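
The same broadcast can be checked numerically. In this small sketch the integer values are placeholders standing in for the symbols $j \dots r$, $a \dots i$ and $s, t, u$.

```python
import numpy as np

W = np.arange(1, 10).reshape(3, 3)      # stands in for j..r
X = np.arange(10, 19).reshape(3, 3)     # stands in for a..i
b = np.array([[100], [200], [300]])     # stands in for s, t, u

Z = W @ X + b                            # b has shape (3, 1) and is broadcast

# Same result with the broadcast made explicit by tiling b across the columns
Z_explicit = W @ X + np.tile(b, (1, X.shape[1]))
print(np.array_equal(Z, Z_explicit))
```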

In [None]:
def linear_forward(A, W, b):

    """

    Implement the linear part of a layer's forward propagation.

 

    Arguments:

    A -- activations from previous layer (or input data): (size of previous layer, number of examples)

    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)

    b -- bias vector, numpy array of shape (size of the current layer, 1)

 

    Returns:

    Z -- the input of the activation function, also called the pre-activation value 

    cache -- a python tuple containing "A", "W" and "b" ; stored for computing the backward pass efficiently

    """

    

    ### START CODE HERE ### (≈ 1 line of code)

    Z = np.dot(W, A) + b

    ### END CODE HERE ###

    

    assert(Z.shape == (W.shape[0], A.shape[1]))

    cache = (A, W, b)

    

    return Z, cache


In [None]:
def linear_activation_forward(A_prev, W, b, activation):

    """

    Implement the forward propagation for the LINEAR->ACTIVATION layer

 

    Arguments:

    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)

    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)

    b -- bias vector, numpy array of shape (size of the current layer, 1)

    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

 

    Returns:

    A -- the output of the activation function, also called the post-activation value 

    cache -- a python tuple containing "linear_cache" and "activation_cache";

             stored for computing the backward pass efficiently

    """

    

    if activation == "sigmoid":

        Z, linear_cache = linear_forward(A_prev, W, b)

        activation_cache = Z

        A  = 1/(1 + np.exp(-Z))


    elif activation == "relu":

        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".


        Z, linear_cache = linear_forward(A_prev, W, b)

        

        activation_cache = Z

        A  = np.maximum(Z,np.zeros(Z.shape))


    assert (A.shape == (W.shape[0], A_prev.shape[1]))

    cache = (linear_cache, activation_cache)

 

    return A, cache


In [None]:
def L_model_forward(X, parameters):

    """

    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    

    Arguments:

    X -- data, numpy array of shape (input size, number of examples)

    parameters -- output of initialize_parameters_deep()

    

    Returns:

    AL -- last post-activation value

    caches -- list of caches containing:

                every cache of linear_activation_forward() with "relu" (there are L-1 of them, indexed from 0 to L-2)

                the cache of linear_activation_forward() with "sigmoid" (there is one, indexed L-1)

    """

 

    caches = []

    A = X

    L = len(parameters) // 2                  # number of layers in the neural network

    

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.

    for l in range(1, L):

        A_prev = A 

        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation = "relu")

        caches.append(cache)

    

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.

    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation = "sigmoid")

    caches.append(cache)

    

    assert(AL.shape == (1,X.shape[1]))

            

    return AL, caches

In [None]:
# GRADED FUNCTION: compute_cost

 

def compute_cost(AL, Y):

    """

    Implement the cross-entropy cost function.

 

    Arguments:

    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)

    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

 

    Returns:

    cost -- cross-entropy cost

    """

    

    m = Y.shape[1]

 

    # Compute loss from aL and y.

    ### START CODE HERE ### (≈ 1 lines of code)

    cost = (-1 / m) * np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1 - Y, np.log(1 - AL)))

    ### END CODE HERE ###

    

    cost = np.squeeze(cost)      # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).

    assert(cost.shape == ())

    

    return cost
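
In `compute_cost`, `np.log` returns `-inf` when a prediction is exactly 0 or 1, which can turn the cost into `inf` or `nan`. One common guard is to clip the predictions first. The sketch below is an optional variant and not part of the original notebook; the name `compute_cost_clipped` and the `eps` parameter are assumptions.

```python
import numpy as np

def compute_cost_clipped(AL, Y, eps=1e-12):
    """Cross-entropy cost with predictions clipped away from 0 and 1."""
    m = Y.shape[1]
    AL = np.clip(AL, eps, 1 - eps)
    cost = (-1 / m) * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
    return float(cost)

AL = np.array([[1.0, 0.0, 0.8]])   # the first two predictions are saturated
Y = np.array([[1, 1, 0]])
cost = compute_cost_clipped(AL, Y)
print(cost)
```

The clipped cost stays finite even for the confidently wrong second prediction, at the price of capping the penalty per example at roughly $-\log(\texttt{eps})$.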


In [None]:
# GRADED FUNCTION: linear_backward

 

def linear_backward(dZ, cache):

    """

    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:

    dZ -- Gradient of the cost with respect to the linear output (of current layer l)

    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:

    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev

    dW -- Gradient of the cost with respect to W (current layer l), same shape as W

    db -- Gradient of the cost with respect to b (current layer l), same shape as b

    """

    A_prev, W, b = cache

    m = A_prev.shape[1]

 

    ### START CODE HERE ### (≈ 3 lines of code)

    dW = 1 / m * (np.dot(dZ,A_prev.T))

    db = 1 / m * (np.sum(dZ,axis = 1,keepdims = True))

    dA_prev = np.dot(W.T,dZ)

    ### END CODE HERE ###

    

    assert (dA_prev.shape == A_prev.shape)

    assert (dW.shape == W.shape)

    assert (db.shape == b.shape)

    

    return dA_prev, dW, db
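
A finite-difference check is a quick way to gain confidence in these formulas. The sketch below restates the `dW`, `db` and `dA_prev` expressions so it runs on its own, then compares `dW` against a numerical gradient of a toy cost. The helper name `linear_backward_sketch`, the toy cost and the array sizes are assumptions made for the example.

```python
import numpy as np

def linear_backward_sketch(dZ, A_prev, W):
    """Same formulas as linear_backward above, restated so this cell runs alone."""
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = dZ.sum(axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

rng = np.random.default_rng(0)
A_prev = rng.standard_normal((4, 5))
W = rng.standard_normal((3, 4))
b = rng.standard_normal((3, 1))
G = rng.standard_normal((3, 5))          # fixed upstream per-example gradient

def cost(W_):
    # Toy cost whose per-example gradient with respect to Z is exactly G
    return np.sum(G * (W_ @ A_prev + b)) / A_prev.shape[1]

_, dW, _ = linear_backward_sketch(G, A_prev, W)

# Central finite differences, one entry of W at a time
eps = 1e-6
dW_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        dW_num[i, j] = (cost(W_plus) - cost(W_minus)) / (2 * eps)

max_diff = np.max(np.abs(dW - dW_num))
print("max |analytic - numeric| =", max_diff)
```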

In [None]:
def linear_activation_backward(dA, cache, activation):

    """

    Implement the backward propagation for the LINEAR->ACTIVATION layer.

    

    Arguments:

    dA -- post-activation gradient for current layer l 

    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently

    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    

    Returns:

    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev

    dW -- Gradient of the cost with respect to W (current layer l), same shape as W

    db -- Gradient of the cost with respect to b (current layer l), same shape as b

    """

    linear_cache, activation_cache = cache

    

    if activation == "relu":

        

        Z = activation_cache

        dZ = np.array(dA, copy=True) # just converting dz to a correct object.

    

    # When z <= 0, you should set dz to 0 as well. 

        

        dZ[Z <= 0] = 0

        ### END CODE HERE ###

        

    elif activation == "sigmoid":

        Z = activation_cache

    

        s = 1/(1+np.exp(-Z))

        dZ = dA * s * (1-s)

    

    # Shorten the code

    dA_prev, dW, db = linear_backward(dZ, linear_cache)

    

    return dA_prev, dW, db


In [None]:
def L_model_backward(AL, Y, caches):

    """

    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group

    

    Arguments:

    AL -- probability vector, output of the forward propagation (L_model_forward())

    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)

    caches -- list of caches containing:

                every cache of linear_activation_forward() with "relu" (there are (L-1) of them, indexed from 0 to L-2)

                the cache of linear_activation_forward() with "sigmoid" (there is one, index L-1)

    

    Returns:

    grads -- A dictionary with the gradients

             grads["dA" + str(l)] = ... 

             grads["dW" + str(l)] = ...

             grads["db" + str(l)] = ... 

    """

    grads = {}

    L = len(caches) # the number of layers

    m = AL.shape[1]

    Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL

    

    # Initializing the backpropagation

    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    

    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "AL, Y, caches". Outputs: "grads["dAL"], grads["dWL"], grads["dbL"]

    current_cache = caches[L-1]

    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, activation = "sigmoid")

    

    for l in reversed(range(L-1)):

        # lth layer: (RELU -> LINEAR) gradients.

        current_cache = caches[l]

        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 2)], current_cache, activation = "relu")

        grads["dA" + str(l + 1)] = dA_prev_temp

        grads["dW" + str(l + 1)] = dW_temp

        grads["db" + str(l + 1)] = db_temp

 

    return grads


In [None]:
# GRADED FUNCTION: update_parameters

 

def update_parameters(parameters, grads, learning_rate):

    """

    Update parameters using gradient descent

    

    Arguments:

    parameters -- python dictionary containing your parameters 

    grads -- python dictionary containing your gradients, output of L_model_backward

    

    Returns:

    parameters -- python dictionary containing your updated parameters 

                  parameters["W" + str(l)] = ... 

                  parameters["b" + str(l)] = ...

    """

    

    L = len(parameters) // 2 # number of layers in the neural network

 

    # Update rule for each parameter. Use a for loop.

   ### START CODE HERE ### (≈ 3 lines of code)

    for l in range(L):

        parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]

        parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]

    ### END CODE HERE ###

    return parameters


In [None]:
# GRADED FUNCTION: two_layer_model

 

def two_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):

    """

    Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.

    

    Arguments:

    X -- input data, of shape (n_x, number of examples)

    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)

    layers_dims -- dimensions of the layers (n_x, n_h, n_y)

    num_iterations -- number of iterations of the optimization loop

    learning_rate -- learning rate of the gradient descent update rule

    print_cost -- If set to True, this will print the cost every 100 iterations 

    

    Returns:

    parameters -- a dictionary containing W1, W2, b1, and b2

    """

    

    np.random.seed(1)

    grads = {}

    costs = []                              # to keep track of the cost

    m = X.shape[1]                           # number of examples

    (n_x, n_h, n_y) = layers_dims

    

    # Initialize parameters dictionary, by calling one of the functions you'd previously implemented

    ### START CODE HERE ### (≈ 1 line of code)

    parameters = initialize_parameters(n_x, n_h, n_y)

    ### END CODE HERE ###

    

    # Get W1, b1, W2 and b2 from the dictionary parameters.

    W1 = parameters["W1"]

    b1 = parameters["b1"]

    W2 = parameters["W2"]

    b2 = parameters["b2"]

    

    # Loop (gradient descent)

 

    for i in range(0, num_iterations):

 

        # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1". Output: "A1, cache1, A2, cache2".

        ### START CODE HERE ### (≈ 2 lines of code)

        A1, cache1 = linear_activation_forward(X, W1, b1, 'relu')

        A2, cache2 = linear_activation_forward(A1, W2, b2, 'sigmoid')

        ### END CODE HERE ###

        

        # Compute cost

        ### START CODE HERE ### (≈ 1 line of code)

        cost = compute_cost(A2, Y)

        ### END CODE HERE ###

        

        # Initializing backward propagation

        dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))

        

        # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".

        ### START CODE HERE ### (≈ 2 lines of code)

        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, 'sigmoid')

        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, 'relu')

        ### END CODE HERE ###

        

        # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2

        grads['dW1'] = dW1

        grads['db1'] = db1

        grads['dW2'] = dW2

        grads['db2'] = db2

        

        # Update parameters.

        ### START CODE HERE ### (approx. 1 line of code)

        parameters = update_parameters(parameters, grads, learning_rate)

        ### END CODE HERE ###

 

        # Retrieve W1, b1, W2, b2 from parameters

        W1 = parameters["W1"]

        b1 = parameters["b1"]

        W2 = parameters["W2"]

        b2 = parameters["b2"]

        

        # Print and record the cost every 100 iterations

        if print_cost and i % 100 == 0:

            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))

            costs.append(cost)

       

    # plot the cost

 

    plt.plot(np.squeeze(costs))

    plt.ylabel('cost')

    plt.xlabel('iterations (per hundreds)')

    plt.title("Learning rate =" + str(learning_rate))

    plt.show()

    

    return parameters


In [None]:
# GRADED FUNCTION: L_layer_model

 

def L_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):#lr was 0.009

    """

    Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    

    Arguments:

    X -- data, numpy array of shape (num_px * num_px * 3, number of examples)

    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)

    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).

    learning_rate -- learning rate of the gradient descent update rule

    num_iterations -- number of iterations of the optimization loop

    print_cost -- if True, it prints the cost every 100 steps

    

    Returns:

    parameters -- parameters learnt by the model. They can then be used to predict.

    """

 

    np.random.seed(1)

    costs = []                         # keep track of cost

    

    # Parameters initialization.

    ### START CODE HERE ###

    parameters = initialize_parameters_deep(layers_dims)

    ### END CODE HERE ###

    

    # Loop (gradient descent)

    for i in range(0, num_iterations):

 

        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.

        ### START CODE HERE ### (≈ 1 line of code)

        AL, caches = L_model_forward(X, parameters)

        ### END CODE HERE ###

        

        # Compute cost.

        ### START CODE HERE ### (≈ 1 line of code)

        cost = compute_cost(AL, Y)

        ### END CODE HERE ###

    

        # Backward propagation.

        ### START CODE HERE ### (≈ 1 line of code)

        grads = L_model_backward(AL, Y, caches)

        ### END CODE HERE ###


        # Update parameters.

        ### START CODE HERE ### (≈ 1 line of code)

        parameters = update_parameters(parameters, grads, learning_rate)

        ### END CODE HERE ###

                

        # Print and record the cost every 100 iterations

        if print_cost and i % 100 == 0:

            print ("Cost after iteration %i: %f" %(i, cost))

            costs.append(cost)

            

    # plot the cost

    plt.plot(np.squeeze(costs))

    plt.ylabel('cost')

    plt.xlabel('iterations (per hundreds)')

    plt.title("Learning rate =" + str(learning_rate))

    plt.show()

    

    return parameters

In [None]:
import h5py  # needed to read the .h5 dataset files

def load_dataset():

    train_dataset = h5py.File('C:\\Users\\sujit.koley\\Desktop\\Regression\\Dataset\\train_catvnoncat.h5', "r")

    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features

    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels

 

    test_dataset = h5py.File('C:\\Users\\sujit.koley\\Desktop\\Regression\\Dataset\\test_catvnoncat.h5', "r")

    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features

    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels

 

    train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0],-1).T

    test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0],-1).T

    

    train_set_y_orig = train_set_y_orig.reshape(1,train_set_y_orig.shape[0])

    test_set_y_orig  = test_set_y_orig.reshape(1,test_set_y_orig.shape[0])

 

    train_set_x = train_set_x_flatten/255

    test_set_x  = test_set_x_flatten/255

    

    return train_set_x , train_set_y_orig , test_set_x , test_set_y_orig

In [None]:
### CONSTANTS DEFINING THE MODEL ####

n_x = 12288     # num_px * num_px * 3

n_h = 7

n_y = 1

layers_dims = (n_x, n_h, n_y)


In [None]:
train_x , train_y , test_x , test_y = load_dataset()

parameters = two_layer_model(train_x, train_y, layers_dims = (n_x, n_h, n_y), num_iterations = 2500, print_cost=True)



In [None]:
predictions_train = predict(train_x, train_y, parameters)



In [None]:
predictions_test = predict(test_x, test_y, parameters)

In [None]:
layers_dims = [12288, 20, 7, 5, 1] #  4-layer model

In [None]:
parameters = L_layer_model(train_x, train_y, layers_dims, num_iterations = 2500, print_cost = True)


In [None]:
pred_train = predict(train_x, train_y, parameters)

In [None]:
pred_test = predict(test_x, test_y, parameters)


# 3. Forward propagation and backward propagation

## 3.1  Forward propagation and backward propagation without regularization

In [47]:
def propogate_without_regularization(w, b, X, Y):
    # Number of examples
    N = X.shape[1]

    # Forward propagation
    z = np.dot(w, X) + b
    A = 1 / (1 + np.exp(-z))

    # Backward propagation
    dw = (1 / N) * np.dot((A - Y), X.T)
    db = (1 / N) * np.sum(A - Y)

    grads = {"dw": dw,
             "db": db}

    return grads

## 3.2 Forward propagation and backward propagation with regularization

In [48]:
def propogate_with_regularization(w, b, X, Y, alpha, regularization):
    # Number of examples
    N = X.shape[1]

    # Forward propagation
    z = np.dot(w, X) + b
    A = 1 / (1 + np.exp(-z))

    # Backward propagation; alpha * w is the contribution of the L2 penalty
    dw = (1 / N) * (np.dot((A - Y), X.T) + alpha * w)
    db = (1 / N) * np.sum(A - Y)

    grads = {"dw": dw,
             "db": db}

    return grads
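
For reference, the gradient `dw` above is the derivative of the following L2-regularized cost, written with this notebook's symbols: $N$ examples, penalty strength $\alpha$, and $a^{(i)} = \sigma(w x^{(i)} + b)$. The bias is left unpenalized, which is why `db` has no extra term.

$$ J(w, b) = -\frac{1}{N}\sum_{i=1}^{N}\left[ y^{(i)}\log a^{(i)} + \left(1-y^{(i)}\right)\log\left(1-a^{(i)}\right) \right] + \frac{\alpha}{2N}\lVert w \rVert_2^2 $$

$$ \frac{\partial J}{\partial w} = \frac{1}{N}\left[(A - Y)X^{T} + \alpha\, w\right], \qquad \frac{\partial J}{\partial b} = \frac{1}{N}\sum_{i=1}^{N}\left(a^{(i)} - y^{(i)}\right) $$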


# 4. Prediction

In [49]:
def predict(X, w,b):

    # Number of examples

    N= X.shape[1]
    Y_prediction = np.zeros((1,N),dtype=int)

    z = np.dot(w, X) +b

    predictions= 1/(1+np.exp(-z))

    for i in range(N):

        if  predictions[0,i] >.5:

            

            Y_prediction[0,i] = 1

        else:

            Y_prediction[0,i] = 0

        

    return Y_prediction
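
The thresholding loop can also be written without an explicit loop. A minimal sketch follows, assuming the same `(features, examples)` layout for `X`; the name `predict_vectorized` and the `threshold` parameter are introduced here for illustration.

```python
import numpy as np

def predict_vectorized(X, w, b, threshold=0.5):
    """Threshold sigmoid(w X + b) for all examples at once."""
    z = np.dot(w, X) + b
    probabilities = 1 / (1 + np.exp(-z))
    return (probabilities > threshold).astype(int)

X = np.array([[0.0, 2.0, -3.0]])   # one feature, three examples
w = np.array([[1.0]])
b = 0.0
print(predict_vectorized(X, w, b))
```

As in the loop version, a probability of exactly 0.5 is mapped to class 0 because the comparison is strict.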

# 5. Optimization

## 5.1 : Optimization without regularization

In [50]:
def optimize_without_regularization(w, b, X, Y, num_iterations, learning_rate, print_cost = False):

    N= X.shape[1]

    for i in range(num_iterations):

        grads = propogate_without_regularization(w,b,X,Y)

         # Retrieve derivatives from grads

        dw = grads["dw"]

        db = grads["db"]

  
        w = w - learning_rate * dw

        b = b - learning_rate * db

        z = np.dot(w, X) +b

        A= 1/(1+np.exp(-z)) 

 
        cross_entropy_cost = -(1/N) * np.sum((Y*np.log(A) +(1-Y)*np.log(1-A)))

        cost = cross_entropy_cost

 
        if print_cost and i%100 == 0:

            print ("iter={:d}   cost={:f}".format(i, cost))

 
    return w,b


## 5.2 : Optimization with regularization

In [56]:
def optimize_with_regularization(w, b, X, Y, num_iterations, learning_rate,alpha,regularization, print_cost = False):

    N= X.shape[1]

    for i in range(num_iterations):

        grads = propogate_with_regularization (w,b,X,Y,alpha,regularization)

  
         # Retrieve derivatives from grads

        dw = grads["dw"]

        db = grads["db"]

 
        w = w - learning_rate * dw

        b = b - learning_rate * db

   
        z = np.dot(w, X) +b

        A= 1/(1+np.exp(-z)) 

        cross_entropy_cost = -(1/N) * np.sum((Y*np.log(A) +(1-Y)*np.log(1-A)))

        # Scale the penalty by alpha so the cost matches the alpha * w term in dw
        L2_regularization_cost = alpha * np.sum(np.square(w))/(2*N)

        cost = cross_entropy_cost + L2_regularization_cost

  
        if print_cost and i%10 == 0:

            print ("iter={:d}   cost={:f}".format(i, cost))

            

    return w,b


# 6 : Logistic regression model

In [52]:
def model(X_train, Y_train, X_test, Y_test, num_iterations = 20, learning_rate = 0.5, alpha=0, regularization="", print_cost = False):

    w, b = initialization_weight_bias(X_train.shape[0])

 

    # Gradient descent (≈ 1 line of code)

    if regularization == "L2":

        w ,b = optimize_with_regularization(w, b, X_train, Y_train, num_iterations, learning_rate,alpha, 

                                            regularization, print_cost)

     

    else:

        w ,b = optimize_without_regularization(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

        

   

    print(w)

    print(b)

    # Predict test/train set examples (≈ 2 lines of code)

    Y_prediction_test  = predict(X_test,w, b)

    Y_prediction_train = predict(X_train,w, b)

 

   

    # Print train/test Errors

    print("train RMSE: {} ".format(np.sqrt(np.mean(np.square(Y_prediction_train - Y_train)))))

    print("test  RMSE: {} ".format(np.sqrt(np.mean(np.square(Y_prediction_test - Y_test)))))

    d = {"Y_prediction_test": Y_prediction_test, 

         "Y_prediction_train" : Y_prediction_train}

    

    return d
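
`model` calls `initialization_weight_bias`, which is not defined in the cells shown here. The following is a minimal sketch of what it might look like, assuming zero initialization and a weight row vector of shape `(1, n_features)` so that `np.dot(w, X)` works with `X` laid out as `(features, examples)`. The body is an assumption, not the author's original code.

```python
import numpy as np

def initialization_weight_bias(n_features):
    """Assumed behaviour: zero weights of shape (1, n_features) and a zero bias."""
    w = np.zeros((1, n_features))
    b = 0.0
    return w, b

w, b = initialization_weight_bias(30)
print(w.shape, b)
```

Zero initialization is acceptable for logistic regression because there are no hidden units whose symmetry needs breaking.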


# 7: Run the model

In [57]:
X_train_val = X_train.values.T
X_test_val  = X_test.values.T
y_train_val = y_train.values.T
y_test_val  = y_test.values.T

 

d = model(X_train_val, y_train_val, X_test_val, y_test_val, num_iterations = 200, learning_rate = .0001, 

          alpha =.1,regularization="L2",

          print_cost = True)

iter=0   cost=4.540692
iter=10   cost=0.687406
iter=20   cost=2.518271
iter=30   cost=inf
iter=40   cost=17.946375
iter=50   cost=17.571959
iter=60   cost=15.026559
iter=70   cost=7.003164
iter=80   cost=13.300819
iter=90   cost=8.590800
iter=100   cost=10.290347
iter=110   cost=7.999504
iter=120   cost=8.621208
iter=130   cost=5.988840
iter=140   cost=6.782729
iter=150   cost=4.781518
iter=160   cost=3.772297
iter=170   cost=2.708169
iter=180   cost=2.334029
iter=190   cost=1.485440
[[ 2.26846810e-02  4.20230304e-02  1.38000898e-01  1.46548969e-01
   2.36418253e-04  6.44486358e-05 -1.63368356e-04 -8.19616193e-05
   4.40294027e-04  1.78969705e-04  1.20039879e-04  3.23187711e-03
   6.05852361e-04 -4.78462979e-02  1.98396917e-05  2.74333563e-05
   2.51214391e-05  1.34834221e-05  5.20489687e-05  8.80253673e-06
   2.21741313e-02  5.31963606e-02  1.34140485e-01 -1.46447524e-01
   3.08376286e-04  4.69913842e-05 -2.47260070e-04 -4.88081920e-05
   6.19371726e-04  1.99185340e-04]]
0.00286437039
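
The cost trace above reaches `inf` and oscillates, which is consistent with raw, unscaled features pushing the sigmoid into saturation. One common remedy, offered here as a sketch rather than as something this notebook ran, is to standardize each feature with training-set statistics before calling `model`. The helper name `standardize` and the toy data are assumptions; the arrays are laid out as `(features, examples)` like `X_train_val`.

```python
import numpy as np

def standardize(X_train, X_test, eps=1e-8):
    """Scale each feature (row) to zero mean and unit variance using training statistics only."""
    mean = X_train.mean(axis=1, keepdims=True)
    std = X_train.std(axis=1, keepdims=True)
    return (X_train - mean) / (std + eps), (X_test - mean) / (std + eps)

# Toy data standing in for X_train_val / X_test_val: two features on very different scales
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=[[10.0], [1000.0]], scale=[[2.0], [300.0]], size=(2, 200))
X_te = rng.normal(loc=[[10.0], [1000.0]], scale=[[2.0], [300.0]], size=(2, 50))

X_tr_s, X_te_s = standardize(X_tr, X_te)
print(X_tr_s.mean(axis=1), X_tr_s.std(axis=1))
```

The test set is scaled with the training mean and standard deviation so that no information from the test data leaks into preprocessing.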

