# Deep Neural Networks



## Set-up

First, import all the packages used throughout the notebook.

In [1]:
from google.colab import drive
drive.mount('/content/gdrive/')
import sys
sys.path.append('/content/gdrive/My Drive/0IPA/Ma512/TP/TP2/') # The location of the .ipynb file.  
import lr_utils
from utils import *

Mounted at /content/gdrive/


In [2]:
import numpy as np
import matplotlib.pyplot as plt
import h5py

%matplotlib inline
%load_ext autoreload
%autoreload 2

np.random.seed(3)

## Initialization

Start by reading the following function, which initializes the parameters of a deep neural network. The number of units in each layer is passed through the `layer_dims` argument.


In [3]:
def initialization(layer_dims):
               
    np.random.seed(4870)
    parameters = {}
    L = len(layer_dims) - 1 # integer representing the number of layers
    
    for l in range(1, L + 1):
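        # Scaled Gaussian initialization with std = sqrt(4 / (n_in + n_out)),
        # i.e. He's rule with fan-in replaced by the average of fan-in and fan-out.
        parameters['W' + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1]) * np.sqrt(4./(layer_dims[l-1]+layer_dims[l]))
        parameters['b' + str(l)] = np.zeros((layer_dims[l],1))

        # Check the shapes of every layer, not only the last one.
        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))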
    
    return parameters
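
As a quick sanity check on the shapes, the sketch below applies the same scaling rule to a hypothetical architecture `[5, 4, 3, 1]`. The helper name `init_sketch`, the layer sizes and the use of `np.random.default_rng` are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

def init_sketch(layer_dims, seed=0):
    """Hypothetical stand-in for `initialization`, using the same scaling rule."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_in, n_out = layer_dims[l - 1], layer_dims[l]
        # Standard deviation sqrt(4 / (n_in + n_out)), as in the cell above.
        params[f"W{l}"] = rng.standard_normal((n_out, n_in)) * np.sqrt(4.0 / (n_in + n_out))
        params[f"b{l}"] = np.zeros((n_out, 1))
    return params

params = init_sketch([5, 4, 3, 1])
shapes = {name: value.shape for name, value in params.items()}
print(shapes)
```

Each `W` has one row per unit of the current layer and one column per unit of the previous layer, so the dictionary holds two entries per layer.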


## Forward propagation

Forward propagation is split into several steps. First, the linear forward module computes the following equation:

$$Z^{[l]} = W^{[l]}A^{[l-1]} +b^{[l]}\tag{4}$$

where $A^{[0]} = X$. 

Define a function that computes $Z^{[l]}$.

In [4]:
def linear_forward(A, W, b):

    ### START CODE HERE ### (≈ 1 lines of code)
    Z = np.dot(W, A) + b
    ### END CODE HERE ###

    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)
    
    return Z, cache

In [5]:
A, W, b = linear_forward_test()

Z, linear_cache = linear_forward(A, W, b)
print("Z = " + str(Z))
print("linear_cache = " + str(linear_cache))

Z = [[-0.67356113  0.67062057]]
linear_cache = (array([[ 1.78862847,  0.43650985],
       [ 0.09649747, -1.8634927 ],
       [-0.2773882 , -0.35475898]]), array([[-0.08274148, -0.62700068, -0.04381817]]), array([[-0.47721803]]))


**Expected output**:

<table style="width:35%">
  
  <tr>
    <td> Z= </td>
    <td> [[ -0.67356113 0.67062057]] </td> 
  </tr>
  
</table>
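
To make the shapes in equation (4) concrete, here is a small hand-checkable sketch. The numbers are made up for illustration; the point is that `b` has shape `(n, 1)` and is broadcast across the columns (examples) of `W @ A_prev`.

```python
import numpy as np

W = np.array([[1.0, 2.0, 3.0],
              [0.0, -1.0, 1.0]])      # shape (2, 3): 2 units, 3 inputs
A_prev = np.array([[1.0, 0.0],
                   [2.0, 1.0],
                   [0.0, 1.0]])        # shape (3, 2): 3 features, 2 examples
b = np.array([[0.5],
              [-0.5]])                # shape (2, 1), broadcast over examples

Z = W @ A_prev + b
print(Z.shape)   # (2, 2)
print(Z)         # [[ 5.5  5.5], [-2.5 -0.5]]
```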

### Activation Functions

In the first notebook you implemented the sigmoid function:

- **Sigmoid**: $\sigma(Z) = \sigma(W A + b) = \frac{1}{ 1 + e^{-(W A + b)}}$.

In this notebook, you will need to implement the ReLU activation defined as:

- **ReLU**: $A = \mathrm{ReLU}(Z) = \max(0, Z)$.

Complete the function below, which computes the ReLU activation function.

In [6]:
def relu(Z):

    ### START CODE HERE ###
    A = np.maximum(0,Z)
    assert(A.shape == Z.shape)
    ### END CODE HERE ###
    cache = Z 
    return A, cache

def sigmoid(Z):

    cache=Z
    ### START CODE HERE ###
    A = 1/(1+np.exp(-Z))
    ### END CODE HERE ###
    return A, cache
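
A side note on numerics: `np.exp(-Z)` can overflow (with a runtime warning) when `Z` contains large negative values. The sketch below shows one common way to avoid this; `stable_sigmoid` is a hypothetical helper, not a function from this notebook or from `utils`.

```python
import numpy as np

def stable_sigmoid(Z):
    """Hypothetical variant of `sigmoid` that never exponentiates a positive number."""
    Z = np.asarray(Z, dtype=float)
    e = np.exp(-np.abs(Z))           # always in [0, 1], so no overflow
    # For Z >= 0: 1 / (1 + e^{-Z}); for Z < 0: e^{Z} / (1 + e^{Z}).
    return np.where(Z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

Z = np.array([[-1000.0, -1.0, 0.0, 1.0, 1000.0]])
A = stable_sigmoid(Z)
print(A)
```

Both branches are algebraically equal to $\frac{1}{1 + e^{-Z}}$; only the order of operations changes.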

You have implemented the linear forward step. You will now combine its output with either a `sigmoid()` or a `relu()` activation function.

In [7]:
def forward_one(A_prev, W, b, activation):
    Z, linear_cache = linear_forward(A_prev, W, b)
    if activation == "sigmoid":
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        A, activation_cache = relu(Z)
    else:
        raise ValueError("activation must be 'sigmoid' or 'relu'")

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    # Keep both caches: they are needed during the backward pass.
    cache = (linear_cache, activation_cache)
    
    return A, cache

### Forward propagation model

The structure you will implement in this exercise consists of $L-1$ layers with a ReLU activation function, followed by a last layer with a sigmoid activation.
Implement the forward propagation of this model.

In [8]:
def forward_all(X, parameters):

    caches = []
    A = X
    L = len(parameters) // 2                
    
    for l in range(1, L):
        A_prev = A 
        ### START CODE HERE ###
        A, cache = forward_one(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation = "relu")
        ### END CODE HERE ###
      
        caches.append(cache)
    ### START CODE HERE ###
    AL, cache = forward_one(A, parameters['W' + str(L)], parameters['b' + str(L)], activation = "sigmoid")
    ### END CODE HERE ###
    caches.append(cache)
    
    assert(AL.shape == (1,X.shape[1]))
            
    return AL, caches
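
The following self-contained sketch traces the activation shapes through such a model. The architecture `[4, 3, 2, 1]`, the random weights and the batch of 6 examples are assumptions chosen only to illustrate the ReLU-then-sigmoid structure; it does not reuse the notebook's functions.

```python
import numpy as np

rng = np.random.default_rng(1)
layer_dims = [4, 3, 2, 1]            # hypothetical architecture
X = rng.standard_normal((4, 6))      # 4 features, 6 examples

A = X
L = len(layer_dims) - 1
for l in range(1, L + 1):
    W = rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
    b = np.zeros((layer_dims[l], 1))
    Z = W @ A + b
    # ReLU on the hidden layers, sigmoid on the output layer.
    A = np.maximum(0, Z) if l < L else 1.0 / (1.0 + np.exp(-Z))
    print(f"layer {l}: A.shape = {A.shape}")

AL = A
```

The number of columns (examples) never changes; only the number of rows follows `layer_dims`.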

###  Cost function

You will now compute the cross-entropy cost $J$ over the whole training set using the following formula: $$J = -\frac{1}{m} \sum\limits_{i = 1}^{m} \left(y^{(i)}\log\left(a^{[L] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right)\right) \tag{7}$$


In [9]:
def cost_function(AL, Y):
    
    m = Y.shape[1]

    ### START CODE HERE ###
    XX = np.multiply(np.log(AL),Y) +  np.multiply(np.log(1-AL), (1-Y))
    cost = (-1/m)*np.sum(XX)
    ### END CODE HERE ###
    cost = np.squeeze(cost)      #  Eliminates useless dimensionality for the variable cost.
    
    return cost

In [10]:
Y, AL = compute_cost()
print("cost = " + str(cost_function(AL, Y)))

cost = 0.2797765635793422


<table>
    <tr>
    <td>**cost** </td>
    <td> 0.2797765635793422</td> 
    </tr>
</table>
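
One practical caveat: if a prediction is exactly 0 or 1, the formula evaluates `log(0)` and the cost becomes `nan` or `inf`. A minimal sketch of one common workaround, clipping the predictions, is shown below; `cost_clipped` and the value of `eps` are assumptions, not part of the notebook.

```python
import numpy as np

def cost_clipped(AL, Y, eps=1e-12):
    """Hypothetical variant of `cost_function` that clips AL away from 0 and 1."""
    m = Y.shape[1]
    AL = np.clip(AL, eps, 1.0 - eps)   # avoids log(0) = -inf
    losses = Y * np.log(AL) + (1 - Y) * np.log(1 - AL)
    return float(-np.sum(losses) / m)

Y = np.array([[1, 0, 1]])
AL = np.array([[0.9, 0.2, 1.0]])       # the last prediction is exactly 1
cost = cost_clipped(AL, Y)
print(cost)
```

The clipped example contributes almost nothing to the cost, so the result is essentially the average of $-\log(0.9)$ and $-\log(0.8)$ over three examples.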

##  Backpropagation 

You will now implement the functions that will help you compute the gradient of the loss function with respect to the different parameters.

To move backward in the computational graph you need to apply the chain rule.

### Linear backward

For each layer $l$, the linear part is: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ (followed by an activation).

Suppose you have already calculated the derivative $dZ^{[l]} = \frac{\partial \mathcal{L} }{\partial Z^{[l]}}$. You want to get $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$.


The three outputs $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$ are computed using the input $dZ^{[l]}$. The formulas you saw in class are:
$$ dW^{[l]} = \frac{\partial \mathcal{J} }{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1] T} \tag{8}$$
$$ db^{[l]} = \frac{\partial \mathcal{J} }{\partial b^{[l]}} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[l](i)}\tag{9}$$
$$ dA^{[l-1]} = \frac{\partial \mathcal{L} }{\partial A^{[l-1]}} = W^{[l] T} dZ^{[l]} \tag{10}$$


In [11]:
def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]

    ### START CODE HERE ###  (≈ 3 lines of code)
    dW = (1./m)*(np.dot(dZ, A_prev.T))
    db = (1./m)*(np.sum(dZ, axis = 1, keepdims=True))
    dA_prev = np.dot(W.T, dZ)
    ### END CODE HERE ###
    
    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)
    
    return dA_prev, dW, db

In [12]:
# Set up some test inputs
dZ, linear_cache = linear_backward_test()

dA_prev, dW, db = linear_backward(dZ, linear_cache)
print ("dA_prev = "+ str(dA_prev))
print ("dW = " + str(dW))
print ("db = " + str(db))

dA_prev = [[ 1.62477986e-01  2.08119187e+00 -1.34890293e+00 -8.08822550e-01]
 [ 1.25651742e-02 -2.21287224e-01 -5.90636554e-01  4.05614891e-03]
 [ 1.98659671e-01  2.39946554e+00 -1.86852905e+00 -9.65910523e-01]
 [ 3.18813678e-01 -9.92645222e-01 -6.57125623e-01 -1.46564901e-01]
 [ 2.48593418e-01 -1.19723579e+00 -4.44132647e-01 -6.09748046e-04]]
dW = [[-1.05705158 -0.98560069 -0.54049797  0.10982291  0.53086144]
 [ 0.71089562  1.01447326 -0.10518156  0.34944625 -0.12867032]
 [ 0.46569162  0.31842359  0.30629837 -0.01104559 -0.19524287]]
db = [[ 0.5722591 ]
 [ 0.04780547]
 [-0.38497696]]


**Expected Output**:
    
```
dA_prev = 
[[  1.62477986e-01   2.08119187e+00  -1.34890293e+00  -8.08822550e-01]
 [  1.25651742e-02  -2.21287224e-01  -5.90636554e-01   4.05614891e-03]
 [  1.98659671e-01   2.39946554e+00  -1.86852905e+00  -9.65910523e-01]
 [  3.18813678e-01  -9.92645222e-01  -6.57125623e-01  -1.46564901e-01]
 [  2.48593418e-01  -1.19723579e+00  -4.44132647e-01  -6.09748046e-04]]
dW = 
[[-1.05705158 -0.98560069 -0.54049797  0.10982291  0.53086144]
 [ 0.71089562  1.01447326 -0.10518156  0.34944625 -0.12867032]
 [ 0.46569162  0.31842359  0.30629837 -0.01104559 -0.19524287]]
db = 
[[ 0.5722591 ]
 [ 0.04780547]
 [-0.38497696]]
```
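
Equations (8) and (9) can be verified numerically with finite differences. The sketch below uses a toy cost that is linear in $Z$, so that a fixed matrix `C` plays the role of $dZ^{[l]}$; the cost, the sizes and all variable names are assumptions made for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prev, n, m = 4, 3, 5
A_prev = rng.standard_normal((n_prev, m))
W = rng.standard_normal((n, n_prev))
b = rng.standard_normal((n, 1))
C = rng.standard_normal((n, m))        # plays the role of dZ

def J(W, b):
    # Toy cost: average over examples of sum_k C[k, i] * Z[k, i].
    Z = W @ A_prev + b
    return np.sum(C * Z) / m

# Analytic gradients from equations (8) and (9).
dW = (C @ A_prev.T) / m
db = np.sum(C, axis=1, keepdims=True) / m

# Central finite differences on every entry of W and b.
eps = 1e-6
dW_num = np.zeros_like(W)
for i in range(n):
    for j in range(n_prev):
        E = np.zeros_like(W)
        E[i, j] = eps
        dW_num[i, j] = (J(W + E, b) - J(W - E, b)) / (2 * eps)

db_num = np.zeros_like(b)
for i in range(n):
    e = np.zeros_like(b)
    e[i, 0] = eps
    db_num[i, 0] = (J(W, b + e) - J(W, b - e)) / (2 * eps)

max_err = max(np.max(np.abs(dW - dW_num)), np.max(np.abs(db - db_num)))
print(max_err)
```

The same idea extends to a full network, at the cost of one forward pass per perturbed parameter.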

### Activation Functions

Now you need to write the code that computes the derivatives of the activation functions. You learned the derivatives of the sigmoid and the ReLU in the theory class.
Complete the two functions below.
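
For reference, with $g$ denoting the activation of layer $l$ and $*$ the element-wise product (notation assumed here), the chain rule gives:

$$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$$
$$\sigma'(Z) = \sigma(Z)\,(1 - \sigma(Z))$$
$$\mathrm{ReLU}'(Z) = \begin{cases} 1 & \text{if } Z > 0 \\ 0 & \text{otherwise} \end{cases}$$

ReLU is not differentiable at $Z = 0$; taking the derivative to be $0$ there is a common convention, and it corresponds to the `dZ[Z <= 0] = 0` line in `relu_backward`.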

In [13]:
def sigmoid_backward(dA, cache):    
    Z = cache
    ### START CODE HERE ###  (≈ 2 lines of code)
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1 - s)    # sigma'(Z) = sigma(Z) * (1 - sigma(Z))
    ### END CODE HERE ###
    return dZ

def relu_backward(dA, cache):
    Z = cache 
    dZ = np.array(dA, copy=True) # copy dA so that the caller's array is not modified in place.
    ### START CODE HERE ###  (≈ 1 line of code)
    dZ[Z <= 0] = 0
    assert (dZ.shape == Z.shape)
    ### END CODE HERE ###
    return dZ

### One backpropagation step

Next, you will create a function that implements one step of backpropagation: the activation backward step followed by the linear backward step.

In [14]:
def backward_one(dA, cache, activation):
    linear_cache, activation_cache = cache  
    if activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
    elif activation == "relu":
        dZ = relu_backward(dA, activation_cache)
    else:
        raise ValueError("activation must be 'sigmoid' or 'relu'")
    # The linear backward step is the same for both activations.
    dA_prev, dW, db = linear_backward(dZ, linear_cache)
    
    return dA_prev, dW, db

In [15]:
dAL, linear_activation_cache = linear_activation_backward_test()

dA_prev, dW, db = backward_one(dAL, linear_activation_cache, activation = "sigmoid")
print ("sigmoid:")
print ("dA_prev = "+ str(dA_prev))
print ("dW = " + str(dW))
print ("db = " + str(db) + "\n")

dA_prev, dW, db = backward_one(dAL, linear_activation_cache, activation = "relu")
print ("relu:")
print ("dA_prev = "+ str(dA_prev))
print ("dW = " + str(dW))
print ("db = " + str(db))

sigmoid:
dA_prev = [[ 0.00401564  0.0404019 ]
 [-0.01386864 -0.13953419]
 [ 0.00747737  0.07523079]]
dW = [[ 0.03615272 -0.09887085  0.03247948]]
db = [[0.06684355]]

relu:
dA_prev = [[ 0.01679913  0.16610885]
 [-0.05801838 -0.57368247]
 [ 0.031281    0.30930474]]
dW = [[ 0.14820532 -0.40668077  0.13325465]]
db = [[0.27525652]]


**Expected output**:
sigmoid:
dA_prev = [[ 0.00401564  0.0404019 ]
 [-0.01386864 -0.13953419]
 [ 0.00747737  0.07523079]]
dW = [[ 0.03615272 -0.09887085  0.03247948]]
db = [[0.06684355]]

relu:
dA_prev = [[ 0.01679913  0.16610885]
 [-0.05801838 -0.57368247]
 [ 0.031281    0.30930474]]
dW = [[ 0.14820532 -0.40668077  0.13325465]]
db = [[0.27525652]]

### Backpropagation model

Now you will put everything together to compute the backward pass for the whole network. 
In the backpropagation step, you will use the values stored in `caches` by the `forward_all` function to compute the gradients. You will iterate from the last layer backwards to layer $1$.

Start by computing the derivative of the loss function with respect to $A^{[L]}$, then propagate this gradient backward through all the layers of the network.

Save each dA, dW and db in the `grads` dictionary.
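
The starting gradient follows from differentiating the per-example cross-entropy loss in equation (7) with respect to the output activation $a^{[L]}$:

$$\frac{\partial \mathcal{L}}{\partial a^{[L]}} = -\left(\frac{y}{a^{[L]}} - \frac{1-y}{1-a^{[L]}}\right)$$

As a consistency check, multiplying by the sigmoid derivative $a^{[L]}(1 - a^{[L]})$ collapses this to a simple expression for the last layer:

$$dZ^{[L]} = -\left(y\,(1-a^{[L]}) - (1-y)\,a^{[L]}\right) = a^{[L]} - y$$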

In [16]:
def backward_all(AL, Y, caches):
    grads = {}
    L = len(caches) 
    m = AL.shape[1]
    Y = Y.reshape(AL.shape) 

    ### START CODE HERE ###  (≈ 1 line of code)
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    ### END CODE HERE ###


    current_cache = caches[L-1]
    grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)] = backward_one(dAL, current_cache,activation="sigmoid")
    
    for l in reversed(range(L-1)):
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = backward_one(grads["dA" + str(l + 1)],current_cache,activation="relu")
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
    return grads

In [17]:
AL, Y_assess, caches = backward_all_test()
grads = backward_all(AL, Y_assess, caches)
print_grads(grads)

dW1 = [[0.41010002 0.07807203 0.13798444 0.10502167]
 [0.         0.         0.         0.        ]
 [0.05283652 0.01005865 0.01777766 0.0135308 ]]
db1 = [[-0.22007063]
 [ 0.        ]
 [-0.02835349]]
dA1 = [[ 0.12913162 -0.44014127]
 [-0.14175655  0.48317296]
 [ 0.01663708 -0.05670698]]


**Expected Output**

<table style="width:60%">
  <tr>
    <td > dW1 </td> 
           <td > [[ 0.41010002  0.07807203  0.13798444  0.10502167]
 [ 0.          0.          0.          0.        ]
 [ 0.05283652  0.01005865  0.01777766  0.0135308 ]] </td> 
  </tr>  
    <tr>
    <td > db1 </td> 
           <td > [[-0.22007063]
 [ 0.        ]
 [-0.02835349]] </td> 
  </tr>   
  <tr>
  <td > dA1 </td> 
           <td > [[ 0.12913162 -0.44014127]
 [-0.14175655  0.48317296]
 [ 0.01663708 -0.05670698]] </td> 
  </tr> 
</table>



### Gradient Descent

Finally, you can update the parameters of the model according to:

$$ W^{[l]} = W^{[l]} - \alpha \text{ } dW^{[l]} $$
$$ b^{[l]} = b^{[l]} - \alpha \text{ } db^{[l]} $$

where $\alpha$ is the learning rate. After computing the updated parameters, store them in the parameters dictionary. 

In [18]:
def gradient_descent(parameters, grads, learning_rate):
    L = len(parameters) // 2 

    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - (learning_rate * grads["dW" + str(l+1)])
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - (learning_rate * grads["db" + str(l+1)])
    return parameters

In [19]:
parameters, grads = gradient_descent_test_case()
parameters = gradient_descent(parameters, grads, 0.1)

print ("W1 = "+ str(parameters["W1"]))
print ("b1 = "+ str(parameters["b1"]))
print ("W2 = "+ str(parameters["W2"]))
print ("b2 = "+ str(parameters["b2"]))

W1 = [[-0.59562069 -0.09991781 -2.14584584  1.82662008]
 [-1.76569676 -0.80627147  0.51115557 -1.18258802]
 [-1.0535704  -0.86128581  0.68284052  2.20374577]]
b1 = [[-0.04659241]
 [-1.28888275]
 [ 0.53405496]]
W2 = [[-0.55569196  0.0354055   1.32964895]]
b2 = [[-0.84610769]]


**Expected Output**:

<table style="width:100%"> 
    <tr>
    <td > W1 </td> 
           <td > [[-0.59562069 -0.09991781 -2.14584584  1.82662008]
 [-1.76569676 -0.80627147  0.51115557 -1.18258802]
 [-1.0535704  -0.86128581  0.68284052  2.20374577]] </td> 
  </tr> 
    <tr>
    <td > b1 </td> 
           <td > [[-0.04659241]
 [-1.28888275]
 [ 0.53405496]] </td> 
  </tr> 
  <tr>
    <td > W2 </td> 
           <td > [[-0.55569196  0.0354055   1.32964895]]</td> 
  </tr> 
    <tr>
    <td > b2 </td> 
           <td > [[-0.84610769]] </td> 
  </tr> 
</table>
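
To see the update rule converge, the sketch below applies it to a toy quadratic cost instead of a network. The cost, the `target` vector and the number of steps are assumptions chosen so that the result is easy to verify.

```python
import numpy as np

# Toy cost J(w) = 0.5 * ||w - target||^2, whose gradient is (w - target).
target = np.array([[1.0], [-2.0]])
w = np.zeros((2, 1))
learning_rate = 0.1

for step in range(200):
    dw = w - target                 # gradient of the toy cost
    w = w - learning_rate * dw      # same update rule as for W and b

print(w.ravel())
```

Each step shrinks the distance to `target` by a factor of $1 - \alpha = 0.9$, so `w` ends up very close to `target`. With a learning rate above 2 the same loop would diverge on this cost.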

You can now create a deep neural network by combining all the functions defined above.

**Loading the dataset**

In [20]:
def load_dataset():
#    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_dataset = h5py.File("/content/gdrive/My Drive/0IPSA/Ma512/TP/TP2/datasets/train_catvnoncat.h5", "r")  #You need to upload the dataset in the right folder.
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels

#    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_dataset = h5py.File("/content/gdrive/My Drive/0IPSA/Ma512/TP/TP2/datasets/test_catvnoncat.h5", "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels

    classes = np.array(test_dataset["list_classes"][:]) # the list of classes
    
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

## Exercise 1: Predictions on a dataset
Create a function that takes the images, the labels and the parameters as arguments, computes predictions with the `forward_all` function, and returns the accuracy.

In [21]:
def predict(X, y, parameters):

    m = X.shape[1]
    n = len(parameters) // 2 # number of layers in the neural network
    p = np.zeros((1,m))
    
    # Forward propagation
    probas, caches = forward_all(X, parameters)
    # convert probas to 0/1 predictions, if probas [0,i] > 0.5-> p[0,i]=1 else p[0,i]=0
    for i in range(0, probas.shape[1]):
      ### START CODE HERE ###  (≈ 2 lines of code)
      if probas[0,i] > 0.5:
        p[0,i] = 1
      else:
        p[0,i] = 0

      ### END CODE HERE ###

    acc = np.sum((p == y)/m)        
    return acc
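
The thresholding loop can also be written without an explicit loop. The sketch below uses made-up probabilities and labels; it is an equivalent formulation under those assumptions, not the notebook's reference solution.

```python
import numpy as np

probas = np.array([[0.1, 0.7, 0.5, 0.93]])   # hypothetical network outputs
y = np.array([[0, 1, 1, 1]])                  # hypothetical labels

p = (probas > 0.5).astype(float)              # 0.5 itself maps to 0, as in the loop
acc = np.mean(p == y)
print(p, acc)
```

Here three of the four predictions match the labels, so the accuracy is 0.75.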