# Assignment 2 - Two-layer neural network

In this assignment, you will learn how to:
- Build a 2-class classification neural network with a single hidden layer
- Compute the cross-entropy loss 
- Implement forward and backward propagation

Let's first import all the packages that you will need during this assignment.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Dataset

Let's get the dataset you will work on. The following code will load a "flower" 2-class dataset into variables `X` and `Y`.

In [None]:
def load_planar_dataset():
    np.random.seed(1)
    m = 400 # number of examples
    N = int(m/2) # number of points per class
    D = 2 # dimensionality
    X = np.zeros((m,D)) # data matrix where each row is a single example
    Y = np.zeros((m,1), dtype='uint8') # labels vector (0 for red, 1 for blue)
    a = 4 # maximum radius of the flower

    for j in range(2):
        ix = range(N*j,N*(j+1))
        t = np.linspace(j*3.12,(j+1)*3.12,N) + np.random.randn(N)*0.2 # theta
        r = a*np.sin(4*t) + np.random.randn(N)*0.2 # radius
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
        Y[ix] = j
        
    X = X.T
    Y = Y.T

    return X, Y

X, Y = load_planar_dataset()

Visualize the data. They look like a "flower" with some red (label y=0) and some blue (y=1) points. Your goal is to build a model to fit this data. 

In [None]:
# Visualize the data:
plt.scatter(X[0, :], X[1, :], c=Y.flatten(), s=40, cmap=plt.cm.Spectral);

You have:
- a numpy-array (matrix) X that contains your features (x1, x2)
- a numpy-array (vector) Y that contains your labels (red:0, blue:1).

Let's first get a better sense of what our data is like. 

---
**Exercise**: How many training examples do you have? In addition, what is the `shape` of the variables `X` and `Y`? 

**Hint**: How do you get the shape of a numpy array? [(help)](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html)

In [None]:
### START CODE HERE ### (≈ 3 lines of code)
shape_X = None
shape_Y = None
N       = None  # training set size
### END CODE HERE ###

print ('The shape of X is: ' + str(shape_X))
print ('The shape of Y is: ' + str(shape_Y))
print ('I have N = %d training examples!' % (N))

**Expected Output**:
       
<table style="width:20%">
  
  <tr>
    <td>**shape of X**</td>
    <td> (2, 400) </td> 
  </tr>
  
  <tr>
    <td>**shape of Y**</td>
    <td>(1, 400) </td> 
  </tr>
  
    <tr>
    <td>**N**</td>
    <td> 400 </td> 
  </tr>
  
</table>

In [None]:
assert shape_X == (2,400)
assert shape_Y == (1,400)
assert N == 400

## Neural Network model

Now, you are going to train a Neural Network with a single hidden layer. The general methodology to build a Neural Network is to:
 1. Define the neural network structure (number of input units, number of hidden units, etc). 
 2. Initialize the model's parameters
 3. Loop:
    - Perform forward propagation
    - Compute the loss function
    - Perform backward propagation to get the gradients
    - Update the parameters (one iteration of gradient descent)

You will build helper functions to compute steps 1-3, and then merge them into one function `nn_model()`.

---

**Two-layer network for binary classification**

For each input vector ${\rm x}$, the network performs the following chain of operations.

- The hidden layer transforms the network's input into a vector of size $M_1$
$$ 
\begin{aligned}
\\
{\rm z}^{[1]} &=  W^{[1]} {\rm x} + {\rm b}^{[1]}\\ 
{\rm a}^{[1]} &= \tanh\big({\rm z}^{[1]}\big)\\
\\
\end{aligned}
$$
where the matrix $W^{[1]}$ and the column vector ${\rm b}^{[1]}$ gather the hidden layer's parameters
$$
W^{[1]} = 
\begin{bmatrix}
\_\!\_\; {\rm w}_1^\top \_\!\_ \\
\vdots\\
\_\!\_\; {\rm w}_{M_1}^\top \_\!\_ \\
\end{bmatrix}
\qquad\qquad
{\rm b}^{[1]} = 
\begin{bmatrix} 
b_1 \\
\vdots\\
b_{M_1}
\end{bmatrix}
$$


- The output layer transforms the hidden representation into the network's output
$$ 
\begin{aligned}
\\
{\rm z}^{[2]} &=  W^{[2]} {\rm a}^{[1]} + {\rm b}^{[2]}\\ 
{\rm a}^{[2]} &= \sigma\big({\rm z}^{[2]}\big)\\
\\
\end{aligned}
$$
where the matrix $W^{[2]}$ and the column vector ${\rm b}^{[2]}$ gather the output layer's parameters
$$
W^{[2]} = 
\begin{bmatrix}
\_\!\_\; {\rm w}_1^\top \_\!\_ \\
\vdots\\
\_\!\_\; {\rm w}_{M_2}^\top \_\!\_ \\
\end{bmatrix}
\qquad\qquad
{\rm b}^{[2]} = 
\begin{bmatrix} 
b_1 \\
\vdots\\
b_{M_2}
\end{bmatrix}
$$

In our specific case of binary classification, the network's output is a scalar. Hence, $M_2=1$ and $\sigma$ denotes the sigmoid function. 

### 1 - Defining the neural network structure ####

**Exercise**: Define three variables:
 - n_x: the size of the input layer
 - n_h: the size of the hidden layer (set this to 4) 
 - n_y: the size of the output layer

**Hint**: Use shapes of X and Y to find n_x and n_y. Also, hard code the hidden layer size to be 4.

In [None]:
# GRADED FUNCTION: layer_sizes

def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)
    
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    n_x = None # size of input layer
    n_h = None
    n_y = None # size of output layer
    ### END CODE HERE ###
    
    return (n_x, n_h, n_y)

In [None]:
np.random.seed(1)
X_assess = np.random.randn(5, 3)
Y_assess = np.random.randn(2, 3)
    
(n_x, n_h, n_y) = layer_sizes(X_assess, Y_assess)

print("The size of the input  layer is: n_x = " + str(n_x))
print("The size of the hidden layer is: n_h = " + str(n_h))
print("The size of the output layer is: n_y = " + str(n_y))

**Expected Output** (these are not the sizes you will use for your network, they are just used to assess the function you've just coded).

<table style="width:20%">
  <tr>
    <td>**n_x**</td>
    <td> 5 </td> 
  </tr>
  
    <tr>
    <td>**n_h**</td>
    <td> 4 </td> 
  </tr>
  
    <tr>
    <td>**n_y**</td>
    <td> 2 </td> 
  </tr>
  
</table>

In [None]:
assert n_x == 5
assert n_h == 4
assert n_y == 2

### 2 - Initialize the model's parameters ####

**Exercise**: Implement the function `initialize_parameters()`.

**Instructions**:
- Make sure your parameters' sizes are right.
- You will initialize the weights matrices with random values. 
    - Use: `np.random.randn(a,b) * 0.01` to randomly initialize a matrix of shape (a,b).
- You will initialize the bias vectors as zeros. 
    - Use: `np.zeros((a,b))` to initialize a matrix of shape (a,b) with zeros.
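
As a quick illustration of the two calls above, here is a minimal sketch for a hypothetical layer with 3 units and 2 inputs. The sizes, the seed and the names `W_demo` and `b_demo` are assumptions chosen for this demo only; they are not the values used by the graded function.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make this demo repeatable

n_out, n_in = 3, 2  # hypothetical layer sizes, for illustration only

# Small random weights: standard normal samples scaled by 0.01
W_demo = np.random.randn(n_out, n_in) * 0.01
# Biases start at zero, stored as a column vector
b_demo = np.zeros((n_out, 1))

print(W_demo.shape, b_demo.shape)
print(np.abs(W_demo).max())  # small magnitude, on the order of 0.01
```

Keeping the initial weights small means the inputs to $\tanh$ start close to zero, where its slope is close to 1.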

In [None]:
# GRADED FUNCTION: initialize_parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    params -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    
    np.random.seed(2) # we set up a seed so that your output matches ours although the initialization is random.
    
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = None
    b1 = None
    W2 = None
    b2 = None
    ### END CODE HERE ###
    
    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [None]:
n_x, n_h, n_y = 2, 4, 1

parameters = initialize_parameters(n_x, n_h, n_y)

print("W1 = \n" + str(parameters["W1"]))
print("b1 = \n" + str(parameters["b1"]))
print("W2 = \n" + str(parameters["W2"]))
print("b2 = \n" + str(parameters["b2"]))

**Expected Output**:

<table style="width:90%">
  <tr>
    <td>**W1**</td>
    <td> [[-0.00416758 -0.00056267]
 [-0.02136196  0.01640271]
 [-0.01793436 -0.00841747]
 [ 0.00502881 -0.01245288]] </td> 
  </tr>
  
  <tr>
    <td>**b1**</td>
    <td> [[ 0.]
 [ 0.]
 [ 0.]
 [ 0.]] </td> 
  </tr>
  
  <tr>
    <td>**W2**</td>
    <td> [[-0.01057952 -0.00909008  0.00551454  0.02292208]]</td> 
  </tr>
  

  <tr>
    <td>**b2**</td>
    <td> [[ 0.]] </td> 
  </tr>
  
</table>



In [None]:
W1 = parameters["W1"]
np.testing.assert_almost_equal(W1[0,0], -0.00416758)
np.testing.assert_almost_equal(W1[0,1], -0.00056267)
np.testing.assert_almost_equal(W1[1,0], -0.02136196)
np.testing.assert_almost_equal(W1[1,1],  0.01640271)
np.testing.assert_almost_equal(W1[2,0], -0.01793436)
np.testing.assert_almost_equal(W1[2,1], -0.00841747)
np.testing.assert_almost_equal(W1[3,0],  0.00502881)
np.testing.assert_almost_equal(W1[3,1], -0.01245288)

b1 = parameters["b1"]
np.testing.assert_almost_equal(b1[0,0], 0.0)
np.testing.assert_almost_equal(b1[1,0], 0.0)
np.testing.assert_almost_equal(b1[2,0], 0.0)
np.testing.assert_almost_equal(b1[3,0], 0.0)

W2 = parameters["W2"]
np.testing.assert_almost_equal(W2[0,0], -0.01057952)
np.testing.assert_almost_equal(W2[0,1], -0.00909008)
np.testing.assert_almost_equal(W2[0,2],  0.00551454)
np.testing.assert_almost_equal(W2[0,3],  0.02292208)

b2 = parameters["b2"]
np.testing.assert_almost_equal(b2[0,0], 0.0)

### 3 - Forward propagation 

Gathering the training examples $\big(x^{(n)},y^{(n)}\big)_{1\le n\le N}$ into a matrix $X$ and a row vector $Y$:

$$
X = 
\begin{bmatrix}
| & & |\\[-1em]
{\rm x}^{(1)} & \dots & {\rm x}^{(N)}\\
| & & |\\
\end{bmatrix}
\qquad\qquad
Y = \left[ y^{(1)} \;\dots\; y^{(N)} \right]
$$

the vectorized implementation of the forward propagation boils down to 

$$
\begin{aligned}
Z^{[1]} &=  W^{[1]} X + {\rm b}^{[1]}\\ 
A^{[1]} &= \tanh(Z^{[1]})\\
Z^{[2]} &=  W^{[2]} A^{[1]} + {\rm b}^{[2]}\\ 
A^{[2]} &= \operatorname{sigmoid}(Z^{[2]})
\end{aligned}
$$
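
In the equations above, the bias is a column vector while $W X$ has one column per example. In NumPy the sum works through broadcasting: the bias column is added to every column. The sketch below checks this on made-up values (the arrays `W`, `b` and `X_demo` are hypothetical, not the assignment's data) by comparing the vectorized expression with an explicit loop over examples.

```python
import numpy as np

# Hypothetical sizes: 2 input features, 3 hidden units, 5 examples
W = np.arange(6, dtype=float).reshape(3, 2)
b = np.array([[10.0], [20.0], [30.0]])   # column vector of shape (3, 1)
X_demo = np.ones((2, 5))                 # each column is one example

Z = W @ X_demo + b   # b is broadcast across the 5 columns

# The same result, computed one example (column) at a time
Z_loop = np.zeros((3, 5))
for n in range(X_demo.shape[1]):
    x_n = X_demo[:, [n]]            # keep the column shape (2, 1)
    Z_loop[:, [n]] = W @ x_n + b

print(Z.shape)
print(np.allclose(Z, Z_loop))
```

This is why the bias vectors should have shape `(n_h, 1)` and `(n_y, 1)` rather than being flat arrays.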


**Question**: Implement `forward_propagation()`.

**Instructions**:
- Look above at the mathematical representation of your classifier.
- You can use the function `sigmoid()` defined below.
- You can use the function `np.tanh()`. It is part of the numpy library.
- The steps you have to implement are:
    1. Retrieve each parameter from the dictionary "parameters" (which is the output of `initialize_parameters()`) by using `parameters[".."]`.
    2. Implement Forward Propagation. Compute $Z^{[1]}, A^{[1]}, Z^{[2]}$ and $A^{[2]}$ (the vector of all your predictions on all the examples in the training set).
- Values needed in the backpropagation are stored in "`cache`". The `cache` will be given as an input to the backpropagation function.

In [None]:
def sigmoid(x):
    s = 1/(1+np.exp(-x))
    return s

In [None]:
# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)
    
    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = None
    b1 = None
    W2 = None
    b2 = None
    ### END CODE HERE ###
    
    # Implement Forward Propagation to calculate A2 (probabilities)
    ### START CODE HERE ### (≈ 4 lines of code)
    Z1 = None
    A1 = None
    Z2 = None
    A2 = None
    ### END CODE HERE ###
    
    assert(A2.shape == (1, X.shape[1]))
    
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    return A2, cache

In [None]:
np.random.seed(1)
X_assess = np.random.randn(2, 3)

parameters = {
    'W1': np.array([[-0.00416758, -0.00056267],
                    [-0.02136196,  0.01640271],
                    [-0.01793436, -0.00841747],
                    [ 0.00502881, -0.01245288]]),
     'W2': np.array([[-0.01057952, -0.00909008,  0.00551454,  0.02292208]]),
     'b1': np.random.randn(4,1),
     'b2': np.array([[ -1.3]])
}

A2, cache = forward_propagation(X_assess, parameters)

# Note: we use the mean here just to make sure that your output matches ours. 
print(np.mean(cache['Z1']) ,np.mean(cache['A1']),np.mean(cache['Z2']),np.mean(cache['A2']))

**Expected Output**:
<table style="width:50%">
  <tr>
    <td> 0.262818640198 0.091999045227 -1.30766601287 0.212877681719 </td> 
  </tr>
</table>

In [None]:
np.testing.assert_almost_equal( np.mean(cache['Z1']),  0.262818640198)
np.testing.assert_almost_equal( np.mean(cache['A1']),  0.091999045227)
np.testing.assert_almost_equal( np.mean(cache['Z2']), -1.30766601287)
np.testing.assert_almost_equal( np.mean(cache['A2']),  0.212877681719)

### 4 - Cost function

Now that you have computed $A^{[2]}$ (in the Python variable "`A2`"), which contains $a^{[2](n)}$ for every example $\big(x^{(n)},y^{(n)}\big)$, you can compute the cost function $J$ as follows:

$$J(\theta) = -\frac{1}{N} \sum_{n = 1}^{N} \Big( \small y^{(n)}\log\left(a^{[2](n)}\right) + (1-y^{(n)})\log\left(1- a^{[2](n)}\right) \Big)$$

where $\theta=(W^{[1]},b^{[1]},W^{[2]},b^{[2]})$ contains all the network parameters.

**Exercise**: Implement `compute_cost()` to compute the value of the cost $J$.

**Instructions**: There are many ways to implement the cross-entropy loss. For example, you could implement the term $-\frac{1}{N}\sum_{n = 1}^{N} \small y^{(n)}\log\left(a^{[2](n)}\right)$ as follows:

```python
logprobs = np.multiply(np.log(A2),Y)
cost = -np.mean(logprobs)                # no need to use a loop!
```

Don't forget to add the other term.
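
To get a feel for the formula before vectorizing it, the sketch below evaluates the loss of a single example with plain Python. The helper name `single_example_loss` is hypothetical and is not part of the graded code.

```python
import math

def single_example_loss(a, y):
    """Cross-entropy loss for one prediction a in (0, 1) and one label y in {0, 1}."""
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

print(single_example_loss(0.9, 1))   # confident and correct: small loss
print(single_example_loss(0.9, 0))   # confident and wrong: large loss
print(single_example_loss(0.5, 1))   # undecided: log(2), about 0.693
```

Only one of the two terms is active for each example, depending on its label. A network whose outputs are all close to 0.5 therefore has a cost close to $\log 2 \approx 0.693$.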

In [None]:
# GRADED FUNCTION: compute_cost

def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost
    
    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2
    
    Returns:
    cost -- cross-entropy cost
    """
    
    # Compute the cross-entropy cost
    ### START CODE HERE ### (≈ 2 lines of code)
    logprobs = None
    cost = None
    ### END CODE HERE ###
    
    cost = np.squeeze(cost)     # makes sure cost is the dimension we expect. 
                                # E.g., turns [[17]] into 17 
    assert(isinstance(cost, float))
    
    return cost

In [None]:
np.random.seed(1)
Y_assess = (np.random.randn(1, 3) > 0)
parameters = {
    'W1': np.array([[-0.00416758, -0.00056267],
                    [-0.02136196,  0.01640271],
                    [-0.01793436, -0.00841747],
                    [ 0.00502881, -0.01245288]]),
     'W2': np.array([[-0.01057952, -0.00909008,  0.00551454,  0.02292208]]),
     'b1': np.array([[ 0.],[ 0.],[ 0.],[ 0.]]),
     'b2': np.array([[ 0.]])
}
a2 = (np.array([[ 0.5002307 ,  0.49985831,  0.50023963]]))
    
cost = compute_cost(a2, Y_assess, parameters)

print("cost = " + str(cost))

**Expected Output**:
<table style="width:20%">
  <tr>
    <td>**cost**</td>
    <td> 0.693058761... </td> 
  </tr>
  
</table>

In [None]:
np.testing.assert_almost_equal(cost, 0.693058761039)

### 5 - Backward propagation

Using the cache computed during forward propagation, you can now implement backward propagation.

**Question**: Implement the function `backward_propagation()`.

**Instructions**:
Backpropagation is usually the hardest (most mathematical) part in deep learning. To help you, you'll find below the six equations describing the vectorized implementation of backpropagation. 

$$
\begin{aligned}
dZ^{[2]} &= A^{[2]} - Y \\
dZ^{[1]} &= \big( {W^{[2]}}^\top dZ^{[2]} \big) \circ \tanh'\big(Z^{[1]}\big)\\
\\
dW^{[2]} &= \frac{1}{N} dZ^{[2]} \, {A^{[1]}}^\top \\
db^{[2]} &= \frac{1}{N} dZ^{[2]} \mathbb{1}_N\\
\\
dW^{[1]} &= \frac{1}{N} dZ^{[1]} \, X^\top\\
db^{[1]} &= \frac{1}{N} dZ^{[1]} \mathbb{1}_N
\end{aligned}
$$

Note that $\mathbb{1}_N=[1\;1\;\dots\;1]^\top$, and $\circ$ denotes the elementwise multiplication.
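
The first equation, $dZ^{[2]} = A^{[2]} - Y$, can be checked numerically on a single example. The sketch below is a sanity check under assumed values (the test point `z = 0.7`, `y = 1` and the helper names are hypothetical): it compares a central finite difference of the per-example loss with respect to $z^{[2]}$ against $a^{[2]} - y$.

```python
import math

def sigmoid_scalar(z):
    """Sigmoid of a single number."""
    return 1.0 / (1.0 + math.exp(-z))

def loss_from_z(z, y):
    """Per-example cross-entropy loss as a function of the pre-activation z."""
    a = sigmoid_scalar(z)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

z, y = 0.7, 1      # arbitrary test point
eps = 1e-6

# Central finite difference approximates d(loss)/dz
numeric = (loss_from_z(z + eps, y) - loss_from_z(z - eps, y)) / (2 * eps)
# Closed form used in the equations above
analytic = sigmoid_scalar(z) - y

print(numeric, analytic)
```

The two numbers should agree to many decimal places. The factor $\frac{1}{N}$ in the other equations comes from the average over examples in $J$.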

**Tips:**
 
 
 - To compute $dZ^{[1]}$, you'll need $\tanh'(Z^{[1]})$, that is, the derivative of $\tanh$ evaluated at $Z^{[1]}$. It can be shown that, if $a = \tanh(z)$, then $\tanh'(z) = 1-a^2$. So you can compute $\tanh'(Z^{[1]})$ using `(1 - np.power(A1, 2))`.
 
 
 - To compute $db^{[2]}$ and $db^{[1]}$, you may find the command `np.mean(..., axis=1, keepdims=True)` useful.
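
Both tips can be verified on small made-up arrays. The sketch below (the values of `z` and `D` are hypothetical) first compares $1 - \tanh^2(z)$ with a finite-difference derivative, then checks that multiplying by $\frac{1}{N}\mathbb{1}_N$ is the same as taking the row-wise mean with `keepdims=True`.

```python
import numpy as np

# 1) Check tanh'(z) = 1 - tanh(z)^2 with a central finite difference
z = np.linspace(-2.0, 2.0, 9)
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)
analytic = 1 - np.power(np.tanh(z), 2)
print(np.allclose(numeric, analytic, atol=1e-8))

# 2) Check that (1/N) * D @ 1_N equals the row-wise mean with keepdims=True
D = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # hypothetical (2, 3) matrix
N = D.shape[1]
ones = np.ones((N, 1))
via_product = (D @ ones) / N
via_mean = np.mean(D, axis=1, keepdims=True)
print(via_product.ravel(), via_mean.ravel(), via_mean.shape)
```

Note that `keepdims=True` keeps the result as a column of shape `(2, 1)`, which matches the shape expected for a bias gradient.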

In [None]:
# GRADED FUNCTION: backward_propagation

def backward_propagation(parameters, cache, X, Y):
    """
    Implement the backward propagation using the instructions above.
    
    Arguments:
    parameters -- python dictionary containing our parameters 
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
    X -- input data of shape (2, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    
    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    
    N = X.shape[1] # number of examples
    
    # First, retrieve W1 and W2 from the dictionary "parameters".
    ### START CODE HERE ### (≈ 2 lines of code)
    W1 = None
    W2 = None
    ### END CODE HERE ###
    
    # Retrieve also A1 and A2 from dictionary "cache".
    ### START CODE HERE ### (≈ 2 lines of code)
    A1 = None
    A2 = None
    ### END CODE HERE ###
    
    # Backward propagation: calculate dW1, db1, dW2, db2. 
    ### START CODE HERE ### (≈ 6 lines of code, corresponding to the 6 equations given above)
    dZ2 = None
    dZ1 = None
    dW2 = None
    db2 = None
    dW1 = None
    db1 = None
    ### END CODE HERE ###
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

In [None]:
np.random.seed(1)
    
X_assess = np.random.randn(2, 3)
Y_assess = (np.random.randn(1, 3) > 0)
    
parameters = {
    'W1': np.array([[-0.00416758, -0.00056267],
                    [-0.02136196,  0.01640271],
                    [-0.01793436, -0.00841747],
                    [ 0.00502881, -0.01245288]]),
    'W2': np.array([[-0.01057952, -0.00909008,  0.00551454,  0.02292208]]),
    'b1': np.array([[ 0.],
                    [ 0.],
                    [ 0.],
                    [ 0.]]),
    'b2': np.array([[ 0.]])
}
cache = {
    'A1': np.array([[-0.00616578,  0.0020626 ,  0.00349619],
                    [-0.05225116,  0.02725659, -0.02646251],
                    [-0.02009721,  0.0036869 ,  0.02883756],
                    [ 0.02152675, -0.01385234,  0.02599885]]),
    'A2': np.array([[ 0.5002307 ,  0.49985831,  0.50023963]]),
    'Z1': np.array([[-0.00616586,  0.0020626 ,  0.0034962 ],
                    [-0.05229879,  0.02726335, -0.02646869],
                    [-0.02009991,  0.00368692,  0.02884556],
                    [ 0.02153007, -0.01385322,  0.02600471]]),
    'Z2': np.array([[ 0.00092281, -0.00056678,  0.00095853]])
}

grads = backward_propagation(parameters, cache, X_assess, Y_assess)

print ("dW1 =\n"+ str(grads["dW1"]))
print ("db1 =\n"+ str(grads["db1"]))
print ("dW2 =\n"+ str(grads["dW2"]))
print ("db2 =\n"+ str(grads["db2"]))

**Expected output**:



<table style="width:80%">
  <tr>
    <td>**dW1**</td>
    <td> [[ 0.00301023 -0.00747267]
 [ 0.00257968 -0.00641288]
 [-0.00156892  0.003893  ]
 [-0.00652037  0.01618243]] </td> 
  </tr>
  
  <tr>
    <td>**db1**</td>
    <td>  [[ 0.00176201]
 [ 0.00150995]
 [-0.00091736]
 [-0.00381422]] </td> 
  </tr>
  
  <tr>
    <td>**dW2**</td>
    <td> [[ 0.00078841  0.01765429 -0.00084166 -0.01022527]] </td> 
  </tr>
  

  <tr>
    <td>**db2**</td>
    <td> [[-0.16655712]] </td> 
  </tr>
  
</table>  

In [None]:
dW1 = grads["dW1"]
np.testing.assert_almost_equal(dW1[0,0],  0.00301023)
np.testing.assert_almost_equal(dW1[0,1], -0.00747267)
np.testing.assert_almost_equal(dW1[1,0],  0.00257968)
np.testing.assert_almost_equal(dW1[1,1], -0.00641288)
np.testing.assert_almost_equal(dW1[2,0], -0.00156892)
np.testing.assert_almost_equal(dW1[2,1],  0.003893)
np.testing.assert_almost_equal(dW1[3,0], -0.00652037)
np.testing.assert_almost_equal(dW1[3,1],  0.01618243)

db1 = grads["db1"]
np.testing.assert_almost_equal(db1[0,0],  0.00176201)
np.testing.assert_almost_equal(db1[1,0],  0.00150995)
np.testing.assert_almost_equal(db1[2,0], -0.00091736)
np.testing.assert_almost_equal(db1[3,0], -0.00381422)

dW2 = grads["dW2"]
np.testing.assert_almost_equal(dW2[0,0],  0.00078841)
np.testing.assert_almost_equal(dW2[0,1],  0.01765429)
np.testing.assert_almost_equal(dW2[0,2], -0.00084166)
np.testing.assert_almost_equal(dW2[0,3], -0.01022527)

db2 = grads["db2"]
np.testing.assert_almost_equal(db2[0,0], -0.16655712)

### 6 - Gradient descent

**Question**: Implement the update rule. Use gradient descent. You have to use (dW1, db1, dW2, db2) in order to update (W1, b1, W2, b2).

**General gradient descent rule**: $ \theta = \theta - \alpha \frac{\partial J }{ \partial \theta }$ where $\alpha$ is the learning rate and $\theta$ represents a parameter.
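
To see the rule in action on something simpler than a network, the sketch below applies it to a one-dimensional cost $J(\theta) = (\theta - 3)^2$, whose gradient is $2(\theta - 3)$. The cost, the starting point and the function name `gradient_descent_1d` are assumptions chosen for illustration.

```python
def gradient_descent_1d(theta0, learning_rate, num_steps):
    """Minimize J(theta) = (theta - 3)^2 with theta = theta - alpha * dJ/dtheta."""
    theta = theta0
    for _ in range(num_steps):
        grad = 2 * (theta - 3)               # dJ/dtheta
        theta = theta - learning_rate * grad
    return theta

theta_final = gradient_descent_1d(theta0=0.0, learning_rate=0.1, num_steps=100)
print(theta_final)   # close to the minimizer theta = 3
```

For this particular cost, each step multiplies the distance to the minimum by $1 - 2\alpha$, so a learning rate above 1 makes the iterates diverge instead of converge. The learning rate has to be chosen with the problem in mind.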

In [None]:
# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate = 1.2):
    """
    Updates parameters using the gradient descent update rule given above
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients 
    
    Returns:
    parameters -- python dictionary containing your updated parameters 
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = None
    b1 = None
    W2 = None
    b2 = None
    ### END CODE HERE ###
    
    # Retrieve each gradient from the dictionary "grads"
    ### START CODE HERE ### (≈ 4 lines of code)
    dW1 = None
    db1 = None
    dW2 = None
    db2 = None
    ### END CODE HERE ###
    
    # Update rule for each parameter
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 
    b1 
    W2  
    b2  
    ### END CODE HERE ###
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [None]:
parameters = {
    'W1': np.array([[-0.00615039,  0.0169021 ],
                    [-0.02311792,  0.03137121],
                    [-0.0169217 , -0.01752545],
                    [ 0.00935436, -0.05018221]]),
    'W2': np.array([[-0.0104319 , -0.04019007,  0.01607211,  0.04440255]]),
    'b1': np.array([[ -8.97523455e-07],
                    [  8.15562092e-06],
                    [  6.04810633e-07],
                    [ -2.54560700e-06]]),
    'b2': np.array([[  9.14954378e-05]])
}
grads = {
    'dW1': np.array([[ 0.00023322, -0.00205423],
                     [ 0.00082222, -0.00700776],
                     [-0.00031831,  0.0028636 ],
                     [-0.00092857,  0.00809933]]),
    'dW2': np.array([[ -1.75740039e-05,   3.70231337e-03,  -1.25683095e-03, -2.55715317e-03]]),
    'db1': np.array([[  1.05570087e-07],
                     [ -3.81814487e-06],
                     [ -1.90155145e-07],
                     [  5.46467802e-07]]),
    'db2': np.array([[ -1.08923140e-05]])}

parameters = update_parameters(parameters, grads)

print("W1 =\n" + str(parameters["W1"]))
print("b1 =\n" + str(parameters["b1"]))
print("W2 =\n" + str(parameters["W2"]))
print("b2 =\n" + str(parameters["b2"]))

**Expected Output**:


<table style="width:80%">
  <tr>
    <td>**W1**</td>
    <td> [[-0.00643025  0.01936718]
 [-0.02410458  0.03978052]
 [-0.01653973 -0.02096177]
 [ 0.01046864 -0.05990141]]</td> 
  </tr>
  
  <tr>
    <td>**b1**</td>
    <td> [[ -1.02420756e-06]
 [  1.27373948e-05]
 [  8.32996807e-07]
 [ -3.20136836e-06]]</td> 
  </tr>
  
  <tr>
    <td>**W2**</td>
    <td> [[-0.01041081 -0.04463285  0.01758031  0.04747113]] </td> 
  </tr>
  

  <tr>
    <td>**b2**</td>
    <td> [[ 0.00010457]] </td> 
  </tr>
  
</table>  

In [None]:
W1 = parameters["W1"]
np.testing.assert_almost_equal(W1[0,0], -0.00643025)
np.testing.assert_almost_equal(W1[0,1],  0.01936718)
np.testing.assert_almost_equal(W1[1,0], -0.02410458)
np.testing.assert_almost_equal(W1[1,1],  0.03978052)
np.testing.assert_almost_equal(W1[2,0], -0.01653973)
np.testing.assert_almost_equal(W1[2,1], -0.02096177)
np.testing.assert_almost_equal(W1[3,0],  0.01046864)
np.testing.assert_almost_equal(W1[3,1], -0.05990141)

b1 = parameters["b1"]
np.testing.assert_almost_equal(b1[0,0], -1.02420756e-06)
np.testing.assert_almost_equal(b1[1,0],  1.27373948e-05)
np.testing.assert_almost_equal(b1[2,0],  8.32996807e-07)
np.testing.assert_almost_equal(b1[3,0], -3.20136836e-06)

W2 = parameters["W2"]
np.testing.assert_almost_equal(W2[0,0], -0.01041081)
np.testing.assert_almost_equal(W2[0,1], -0.04463285)
np.testing.assert_almost_equal(W2[0,2],  0.01758031)
np.testing.assert_almost_equal(W2[0,3],  0.04747113)

b2 = parameters["b2"]
np.testing.assert_almost_equal(b2[0,0], 0.00010457)

### 7 - Integrate all the pieces in nn_model() ####

**Question**: Build your neural network model in `nn_model()`.

**Instructions**: The neural network model has to use the previous functions in the right order.

In [None]:
# GRADED FUNCTION: nn_model

def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (2, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 1000 iterations
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    
    # Initialize parameters, then retrieve W1, b1, W2, b2. Inputs: "n_x, n_h, n_y". Outputs = "W1, b1, W2, b2, parameters".
    ### START CODE HERE ### (≈ 5 lines of code)
    parameters = None
    W1 = None
    b1 = None
    W2 = None
    b2 = None
    ### END CODE HERE ###
    
    # Loop (gradient descent)

    for i in range(0, num_iterations):
         
        ### START CODE HERE ### (≈ 4 lines of code)
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        #A2, cache = None
        
        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        #cost = None
 
        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        #grads = None
 
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        #parameters = None
        
        ### END CODE HERE ###
        
        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    return parameters

In [None]:
np.random.seed(1)
X_assess = np.random.randn(2, 3)
Y_assess = (np.random.randn(1, 3) > 0)

parameters = nn_model(X_assess, Y_assess, 4, num_iterations=10000, print_cost=True)

print("W1 =\n" + str(parameters["W1"]))
print("b1 =\n" + str(parameters["b1"]))
print("W2 =\n" + str(parameters["W2"]))
print("b2 =\n" + str(parameters["b2"]))

**Expected Output**:

<table style="width:90%">

<tr> 
    <td> 
        **cost after iteration 0**
    </td>
    <td> 
        0.692739
    </td>
</tr>

<tr> 
    <td> 
        <center> $\vdots$ </center>
    </td>
    <td> 
        <center> $\vdots$ </center>
    </td>
</tr>

  <tr>
    <td>**W1**</td>
    <td> [[-0.65848169  1.21866811]
 [-0.76204273  1.39377573]
 [ 0.5792005  -1.10397703]
 [ 0.76773391 -1.41477129]]</td> 
  </tr>
  
  <tr>
    <td>**b1**</td>
    <td> [[ 0.287592  ]
 [ 0.3511264 ]
 [-0.2431246 ]
 [-0.35772805]] </td> 
  </tr>
  
  <tr>
    <td>**W2**</td>
    <td> [[-2.45566237 -3.27042274  2.00784958  3.36773273]] </td> 
  </tr>
  

  <tr>
    <td>**b2**</td>
    <td> [[ 0.20459656]] </td> 
  </tr>
  
</table>  

In [None]:
W1 = parameters["W1"]
np.testing.assert_almost_equal(W1[0,0], -0.65848169)
np.testing.assert_almost_equal(W1[0,1],  1.21866811)
np.testing.assert_almost_equal(W1[1,0], -0.76204273)
np.testing.assert_almost_equal(W1[1,1],  1.39377573)
np.testing.assert_almost_equal(W1[2,0],  0.5792005)
np.testing.assert_almost_equal(W1[2,1], -1.10397703)
np.testing.assert_almost_equal(W1[3,0],  0.76773391)
np.testing.assert_almost_equal(W1[3,1], -1.41477129)

b1 = parameters["b1"]
np.testing.assert_almost_equal(b1[0,0],  0.287592)
np.testing.assert_almost_equal(b1[1,0],  0.3511264)
np.testing.assert_almost_equal(b1[2,0], -0.2431246)
np.testing.assert_almost_equal(b1[3,0], -0.35772805)

W2 = parameters["W2"]
np.testing.assert_almost_equal(W2[0,0], -2.45566237)
np.testing.assert_almost_equal(W2[0,1], -3.27042274)
np.testing.assert_almost_equal(W2[0,2],  2.00784958)
np.testing.assert_almost_equal(W2[0,3],  3.36773273)

b2 = parameters["b2"]
np.testing.assert_almost_equal(b2[0,0], 0.20459656)

### 8 - Predictions

**Question**: Build `predict()` so that you can use your model to make predictions.
Use forward propagation to predict results.

**Reminder**: predictions = $y_{prediction} = \mathbb{1}\{\text{activation} > 0.5\} = \begin{cases}
      1 & \text{if}\ \text{activation} > 0.5 \\
      0 & \text{otherwise}
    \end{cases}$  
    
As an example, if you would like to set the entries of a matrix X to 0 and 1 based on a threshold you would do: ```X_new = (X > threshold)```
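
Here is a minimal sketch of that thresholding step on made-up activations (the array `probs` is hypothetical). It also shows how to convert the boolean result to 0/1 integers if you prefer numeric labels.

```python
import numpy as np

probs = np.array([[0.2, 0.5, 0.51, 0.9]])   # hypothetical activations
labels_bool = (probs > 0.5)                 # boolean array, same shape
labels_int = labels_bool.astype(int)        # 0/1 integers, if preferred

print(labels_bool)
print(labels_int)
```

Because the comparison is strict, an activation of exactly 0.5 is mapped to class 0.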

In [None]:
# GRADED FUNCTION: predict

def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    X -- input data of size (n_x, m)
    
    Returns
    predictions -- vector of predictions of our model (red: 0 / blue: 1)
    """
    
    # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
    ### START CODE HERE ### (≈ 2 lines of code)
    #A2, cache = None
    #predictions = None
    ### END CODE HERE ###
    
    return predictions

In [None]:
np.random.seed(1)
X_assess = np.random.randn(2, 3)
parameters = {
    'W1': np.array([[-0.00615039,  0.0169021 ],
                    [-0.02311792,  0.03137121],
                    [-0.0169217 , -0.01752545],
                    [ 0.00935436, -0.05018221]]),
    'W2': np.array([[-0.0104319 , -0.04019007,  0.01607211,  0.04440255]]),
    'b1': np.array([[ -8.97523455e-07],
                    [  8.15562092e-06],
                    [  6.04810633e-07],
                    [ -2.54560700e-06]]),
     'b2': np.array([[  9.14954378e-05]])
}

predictions = predict(parameters, X_assess)

print("predictions mean = " + str(np.mean(predictions)))

**Expected Output**: 


<table style="width:40%">
  <tr>
    <td>**predictions mean**</td>
    <td> 0.666666666667 </td> 
  </tr>
  
</table>

In [None]:
np.testing.assert_almost_equal(np.mean(predictions), 0.666666666667)

## Performance on the dataset

It is time to run the model and see how it performs on a planar dataset. Run the following code to test your model with a single hidden layer of $n_h$ hidden units.

In [None]:
# Build a model with an n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)

**Expected Output**:

<table style="width:40%">
  <tr>
    <td>**Cost after iteration 9000**</td>
    <td> 0.218607 </td> 
  </tr>
  
</table>


Run the following code to plot the decision boundary.

In [None]:
def plot_decision_boundary(model, X, y):
    
    # Set min and max values and give it some padding
    x_min, x_max = X[0, :].min() - 1, X[0, :].max() + 1
    y_min, y_max = X[1, :].min() - 1, X[1, :].max() + 1
    h = 0.01
    
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    
    # Predict the function value for the whole grid
    Z = model(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.5)
    plt.ylabel('x2')
    plt.xlabel('x1')
    plt.scatter(X[0, :], X[1, :], c=y.flatten(), cmap=plt.cm.Spectral)

In [None]:
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))

In [None]:
# Print accuracy
predictions = predict(parameters, X)
accuracy = (np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)).item() / Y.size * 100
print('Accuracy: %d%%' % accuracy)

**Expected Output**: 

<table style="width:15%">
  <tr>
    <td>**Accuracy**</td>
    <td> 90% </td> 
  </tr>
</table>

Accuracy is really high, as the model has learnt the leaf patterns of the flower! Neural networks are able to learn even highly non-linear decision boundaries, unlike logistic regression. 
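
The accuracy expression used above adds the number of true positives (label 1, prediction 1) to the number of true negatives (label 0, prediction 0) and divides by the number of examples. The sketch below, on made-up labels and predictions (`Y_demo` and `pred_demo` are hypothetical), shows that this gives the same value as the fraction of positions where the prediction equals the label.

```python
import numpy as np

Y_demo = np.array([[1, 0, 1, 1, 0]])       # hypothetical true labels
pred_demo = np.array([[1, 0, 0, 1, 1]])    # hypothetical predictions

# Dot-product form: (true positives + true negatives) / total
matches = np.dot(Y_demo, pred_demo.T) + np.dot(1 - Y_demo, 1 - pred_demo.T)
acc_dot = matches.item() / Y_demo.size * 100

# Equivalent form: fraction of positions where prediction equals label
acc_mean = np.mean(pred_demo == Y_demo) * 100

print(acc_dot, acc_mean)
```

Here three of the five predictions match the labels, so both expressions report an accuracy of about 60%.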

Now, let's try out several hidden layer sizes.

### Tuning hidden layer size

Run the following code. It may take 1-2 minutes. You will observe different behaviors of the model for various hidden layer sizes.

In [None]:
# This may take about 2 minutes to run

plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 20, 50]
for i, n_h in enumerate(hidden_layer_sizes):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer of size %d' % n_h)
    parameters = nn_model(X, Y, n_h, num_iterations = 5000)
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
    predictions = predict(parameters, X)
    accuracy = (np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)).item() / Y.size * 100
    print ("Accuracy for {} hidden units: {} %".format(n_h, accuracy))

**Interpretation**:
- The larger models (with more hidden units) are able to fit the training set better, until eventually the largest models overfit the data. 
- The best hidden layer size seems to be around n_h = 5. Indeed, a value around here seems to fit the data well without incurring noticeable overfitting.
- You will also learn later about regularization, which lets you use very large models (such as n_h = 50) without much overfitting. 

**You've learnt to:**
- Build a complete neural network with a hidden layer
- Implement forward propagation and backpropagation, and train a neural network
- See the impact of varying the hidden layer size, including overfitting.

**Further reading:**
- http://scs.ryerson.ca/~aharley/neural-networks/
- http://cs231n.github.io/neural-networks-case-study/

**Credits:** 
- This assignment is partly based on Andrew Ng's course on Coursera.