# Single Hidden Layer Neural Network

**You will learn how to:**
- Implement a 2-class classification neural network with a single hidden layer
- Use units with a non-linear activation function, such as tanh 
- Compute the cross entropy loss 
- Implement forward and backward propagation


## 1 - Packages ##

Let's first import all the packages that you will need during this assignment.
- [numpy](https://www.numpy.org) is the fundamental package for scientific computing with Python.
- [sklearn](http://scikit-learn.org/stable/) provides simple and efficient tools for data mining and data analysis. 
- [matplotlib](http://matplotlib.org) is a library for plotting graphs in Python.

In [6]:
# Package imports
import numpy as np
import matplotlib.pyplot as plt

import sklearn
from sklearn.datasets import load_breast_cancer
import sklearn.linear_model
%matplotlib inline

np.random.seed(1) # set a seed so that the results are consistent

## 2 - Dataset ##

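First, let's get the dataset you will work on. The following cells list the datasets available in `sklearn.datasets`, display the help for `load_breast_cancer`, and then load that 2-class breast cancer dataset into variables `X` and `Y`.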

In [7]:
dir(sklearn.datasets)

['__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_svmlight_format',
 'base',
 'california_housing',
 'clear_data_home',
 'covtype',
 'dump_svmlight_file',
 'fetch_20newsgroups',
 'fetch_20newsgroups_vectorized',
 'fetch_california_housing',
 'fetch_covtype',
 'fetch_kddcup99',
 'fetch_lfw_pairs',
 'fetch_lfw_people',
 'fetch_mldata',
 'fetch_olivetti_faces',
 'fetch_rcv1',
 'fetch_species_distributions',
 'get_data_home',
 'kddcup99',
 'lfw',
 'load_boston',
 'load_breast_cancer',
 'load_diabetes',
 'load_digits',
 'load_files',
 'load_iris',
 'load_linnerud',
 'load_mlcomp',
 'load_sample_image',
 'load_sample_images',
 'load_svmlight_file',
 'load_svmlight_files',
 'load_wine',
 'make_biclusters',
 'make_blobs',
 'make_checkerboard',
 'make_circles',
 'make_classification',
 'make_friedman1',
 'make_friedman2',
 'make_friedman3',
 'make_gaussian_quantiles',
 'make_hastie_10_2',
 'make_low_rank

In [8]:
# For help on the dataset of sklearn library
help(sklearn.datasets.load_breast_cancer)

Help on function load_breast_cancer in module sklearn.datasets.base:

load_breast_cancer(return_X_y=False)
    Load and return the breast cancer wisconsin dataset (classification).
    
    The breast cancer dataset is a classic and very easy binary classification
    dataset.
    
    Classes                          2
    Samples per class    212(M),357(B)
    Samples total                  569
    Dimensionality                  30
    Features            real, positive
    
    Parameters
    ----------
    return_X_y : boolean, default=False
        If True, returns ``(data, target)`` instead of a Bunch object.
        See below for more information about the `data` and `target` object.
    
        .. versionadded:: 0.18
    
    Returns
    -------
    data : Bunch
        Dictionary-like object, the interesting attributes are:
        'data', the data to learn, 'target', the classification labels,
        'target_names', the meaning of the labels, 'feature_names', the
        m

In [9]:
"""
dataset = load_iris()
X = dataset.data[0:100][:]
Y = dataset.target[0:100]
idx = np.random.permutation(X.shape[0])
X, Y = X[idx], Y[idx]

X = (X-np.mean(X, axis=1, keepdims = True))/(np.max(X, axis=1, keepdims = True)-np.min(X, axis = 1, keepdims = True))
print(X)
Y = np.asarray([0 if i == 0 else 1 for i in Y])
print(Y)
X = X.T
"""
dataset = load_breast_cancer()
X = dataset.data
Y = dataset.target

# Generate random permutation of the dataset
idx = np.random.permutation(X.shape[0])
X, Y = X[idx], Y[idx]
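# Transpose so that X has shape (number of features, number of examples)
X = X.T
# Mean-normalize each feature (row): subtract its mean, then divide by its range (max - min)
X = (X-np.mean(X, axis=1, keepdims = True))/(np.max(X, axis=1, keepdims = True)-np.min(X, axis = 1, keepdims = True))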
print(X.shape)


(30, 569)


In [10]:

x = np.asarray([[1, 1.2], [2, 2.2], [3, 3.2], [4, 4.2], [5, 5.2]])
x = x.T
y = np.asarray([10, 20, 30, 40, 50])
print(x.shape, y.shape)
mean = np.mean(x, axis=1, keepdims = True)
print(mean.shape)
print(x)
result = x- mean
print(result)

(2, 5) (5,)
(2, 1)
[[1.  2.  3.  4.  5. ]
 [1.2 2.2 3.2 4.2 5.2]]
[[-2. -1.  0.  1.  2.]
 [-2. -1.  0.  1.  2.]]
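
The cell above shows only the mean-subtraction step. As a minimal sketch, the full normalization applied to `X` earlier (subtract the per-feature mean, then divide by the per-feature range) can be checked on the same toy array. The names `x_toy`, `value_range` and `x_norm` are illustrative and do not appear elsewhere in the notebook.

```python
import numpy as np

# Toy data laid out like X in this notebook: one row per feature, one column per example.
x_toy = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                  [1.2, 2.2, 3.2, 4.2, 5.2]])

# Per-feature statistics; keepdims=True keeps shape (2, 1) so they broadcast across columns.
mean = np.mean(x_toy, axis=1, keepdims=True)
value_range = np.max(x_toy, axis=1, keepdims=True) - np.min(x_toy, axis=1, keepdims=True)

x_norm = (x_toy - mean) / value_range

print(x_norm)
print(np.mean(x_norm, axis=1))                           # close to 0 for each feature
print(np.max(x_norm, axis=1) - np.min(x_norm, axis=1))   # close to 1 for each feature
```

After this step every feature is centered at zero and spans a range of one, so features measured on very different scales contribute comparably to `Z1`.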


In [11]:


shape_X = X.shape
Y = Y.reshape(1,Y.shape[0])
shape_Y = Y.shape
m = X.shape[1]  # training set size

# Type: numpy array, validate using function type(X), type(Y)
print ('The shape of X(Features) is: ' + str(shape_X))
print ('The shape of Y(Target values) is: ' + str(shape_Y))
print ('Number of training examples:', m)

The shape of X(Features) is: (30, 569)
The shape of Y(Target values) is: (1, 569)
Number of training examples: 569


You have:
    - a numpy-array (matrix) X that contains your 30 features
    - a numpy-array (vector) Y that contains your labels (malignant:0, benign:1).

## 4 - Neural Network model

You are going to train a Neural Network with a single hidden layer to classify the breast cancer dataset loaded above.

**Here is our model**:
<img src="images/classification_kiank.png" style="width:600px;height:300px;">

**Mathematically**:

For one example $x^{(i)}$:
$$z^{[1] (i)} =  W^{[1]} x^{(i)} + b^{[1]}\tag{1}$$ 
$$a^{[1] (i)} = \tanh(z^{[1] (i)})\tag{2}$$
$$z^{[2] (i)} = W^{[2]} a^{[1] (i)} + b^{[2]}\tag{3}$$
$$\hat{y}^{(i)} = a^{[2] (i)} = \sigma(z^{ [2] (i)})\tag{4}$$
$$y^{(i)}_{prediction} = \begin{cases} 1 & \mbox{if } a^{[2](i)} > 0.5 \\ 0 & \mbox{otherwise } \end{cases}\tag{5}$$

Given the predictions on all the examples, you can also compute the cost $J$ as follows: 
$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \large\left(\small y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right)  \large  \right) \small \tag{6}$$
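
To make the equations concrete, here is a minimal sketch that evaluates equations (1) to (5) for one example and then that example's term in the sum of equation (6). The layer sizes, the random seed and the names `x_i`, `y_i` and `loss_i` are assumptions chosen for illustration, not values from this notebook.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)   # illustrative seed, only for reproducibility
n_x, n_h = 3, 4                  # illustrative sizes, not the notebook's 30 and 4

x_i = rng.standard_normal((n_x, 1))          # one example as a column vector
y_i = 1                                      # its (made-up) label
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.01
b2 = np.zeros((1, 1))

z1 = W1 @ x_i + b1               # equation (1)
a1 = np.tanh(z1)                 # equation (2)
z2 = W2 @ a1 + b2                # equation (3)
a2 = sigmoid(z2)                 # equation (4)
y_pred = int(a2.item() > 0.5)    # equation (5)

# This example's term inside the sum of equation (6), with the leading minus sign applied
loss_i = -(y_i * np.log(a2) + (1 - y_i) * np.log(1 - a2)).item()

print(z1.shape, a1.shape, z2.shape, a2.shape)
print(y_pred, loss_i)
```

With weights this small, `a2` sits very close to 0.5, so the per-example loss is close to $\log 2$, which is the value you should expect for the cost before any training.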

**Reminder**: The general methodology to build a Neural Network is to:
    1. Define the neural network structure ( # of input units,  # of hidden units, etc). 
    2. Initialize the model's parameters
    3. Loop:
        - Implement forward propagation
        - Compute loss
        - Implement backward propagation to get the gradients
        - Update parameters (gradient descent)

You often build helper functions to compute steps 1-3 and then merge them into one function we call `nn_model()`. Once you've built `nn_model()` and learnt the right parameters, you can make predictions on new data.

### 4.1 - Defining the neural network structure ####

**Exercise**: Define three variables:
    - n_x: the size of the input layer
    - n_h: the size of the hidden layer (set this to 4) 
    - n_y: the size of the output layer

**Hint**: Use shapes of X and Y to find n_x and n_y. Also, hard code the hidden layer size to be 4.

In [12]:

def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)
    
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    n_x = X.shape[0] # size of input layer
    n_h = 4
    n_y = Y.shape[0] # size of output layer
    ### END CODE HERE ###
    return (n_x, n_h, n_y)

(n_x, n_h, n_y) = layer_sizes(X, Y)
print("The size of the input layer is: n_x = " + str(n_x))
print("The size of the hidden layer is: n_h = " + str(n_h))
print("The size of the output layer is: n_y = " + str(n_y))

The size of the input layer is: n_x = 30
The size of the hidden layer is: n_h = 4
The size of the output layer is: n_y = 1


### 4.2 - Initialize the model's parameters ####

**Exercise**: Implement the function `initialize_parameters()`.

**Instructions**:
- You will initialize the weights matrices with random values. 
    - Use: `np.random.randn(a,b) * 0.01` to randomly initialize a matrix of shape (a,b) with small values. (The solution below uses a scale of 0.005 instead of 0.01.)
- You will initialize the bias vectors as zeros. 
    - Use: `np.zeros((a,b))` to initialize a matrix of shape (a,b) with zeros.
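
Why random weights rather than a constant? The sketch below, on made-up data, initializes every weight to the same value (deliberately not what the exercise asks for) and shows that all hidden units then compute the same activation and receive the same gradient, so gradient descent cannot make them differ. The data, sizes and constant `0.5` are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)          # illustrative seed
n_x, n_h, m = 3, 4, 10                  # illustrative sizes
X_toy = rng.standard_normal((n_x, m))
Y_toy = (rng.random((1, m)) > 0.5).astype(float)

# Constant initialization: every hidden unit starts out identical.
W1 = np.full((n_h, n_x), 0.5)
b1 = np.zeros((n_h, 1))
W2 = np.full((1, n_h), 0.5)
b2 = np.zeros((1, 1))

# Forward pass
A1 = np.tanh(W1 @ X_toy + b1)
A2 = sigmoid(W2 @ A1 + b2)

# Gradient of the cost with respect to W1
dZ2 = A2 - Y_toy
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
dW1 = dZ1 @ X_toy.T / m

same_activations = np.allclose(A1, A1[0])    # every row of A1 equals the first row
same_gradients = np.allclose(dW1, dW1[0])    # every row of dW1 equals the first row
print(same_activations, same_gradients)
```

Because the rows of `dW1` are identical, the rows of `W1` stay identical after every update, and the hidden layer behaves like a single unit. Small random values break this symmetry.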

In [13]:
# GRADED FUNCTION: initialize_parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Arguments:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    params -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x) 
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    
    np.random.seed(2) # we set up a seed so that your output matches ours although the initialization is random.
    
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = np.random.randn(n_h,n_x)*0.005
    b1 = np.zeros((n_h,1))
    W2 = np.random.randn(n_y,n_h)*0.005
    b2 = np.zeros((n_y,1))
    ### END CODE HERE ###
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [14]:
parameters = initialize_parameters(n_x, n_h, n_y)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

W1 = [[-2.08378924e-03 -2.81334136e-04 -1.06809805e-02  8.20135404e-03
  -8.96717793e-03 -4.20873683e-03  2.51440709e-03 -6.22644043e-03
  -5.28976109e-03 -4.54503807e-03  2.75727022e-03  1.14610401e-02
   2.07696965e-04 -5.58962723e-03  2.69529160e-03 -2.98079850e-03
  -9.56524826e-05  5.87500610e-03 -3.73935475e-03  4.51262549e-05
  -4.39053947e-03 -7.82170852e-04  1.28285226e-03 -4.94389524e-03
  -1.69410983e-03 -1.18092015e-03 -3.18827506e-03 -5.93806143e-03
  -7.10608614e-03 -7.67475978e-04]
 [-1.34528480e-03  1.11568339e-02 -1.21738379e-02  5.63632524e-04
   1.85222268e-03  6.79816931e-03  2.50928603e-03 -4.22106852e-03
   4.88073580e-08  2.71176286e-03 -1.56754098e-03  3.85505869e-03
  -9.34045327e-03  8.65592333e-03  7.33839005e-03 -1.67838669e-03
   3.05670390e-03  2.39852959e-04 -4.14567645e-03  4.38551092e-04
   5.00182943e-03 -1.90546259e-03 -1.87834712e-03 -3.72353814e-04
   2.16748165e-03  6.39189615e-03 -3.17339653e-03  2.54198121e-03
   1.08058003e-03 -9.29306193e-03]
 

### 4.3 - Forward and Backward Propagation ####

**Question**: Implement `forward_propagation()`.

**Instructions**:
- Look above at the mathematical representation of your classifier.
- You can use the function `np.tanh()`. It is part of the numpy library.
- The steps you have to implement are:
    1. Retrieve each parameter from the dictionary "parameters" (which is the output of `initialize_parameters()`) by using `parameters[".."]`.
    2. Implement Forward Propagation. Compute $Z^{[1]}, A^{[1]}, Z^{[2]}$ and $A^{[2]}$ (the vector of all your predictions on all the examples in the training set).
- Values needed in the backpropagation are stored in "`cache`". The `cache` will be given as an input to the backpropagation function.

In [15]:
# GRADED FUNCTION: forward_propagation
def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1.0/(1+np.exp(-x))
    ### END CODE HERE ###
    
    return s
    
def forward_propagation(X, parameters):
    """
    Arguments:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)
    
    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    ### END CODE HERE ###
    
    # Implement Forward Propagation to calculate A2 (probabilities)
    ### START CODE HERE ### (≈ 4 lines of code)
    Z1 = np.dot(W1,X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2,A1) + b2
    A2 = sigmoid(Z2)
    ### END CODE HERE ###
    
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    return A2, cache

In [16]:
A2, cache = forward_propagation(X, parameters)
print(parameters["W2"].shape, cache['A1'].shape)

# Note: we use the mean here just to make sure that your output matches ours. 
print(np.mean(cache['Z1']) ,np.mean(cache['A1']),np.mean(cache['Z2']),np.mean(cache['A2']))

(1, 4) (4, 569)
-8.175913324605979e-18 2.2013241489508947e-08 -4.6128319782634504e-10 0.4999999998846783


Now that you have computed $A^{[2]}$ (in the Python variable "`A2`"), which contains $a^{[2](i)}$ for every example, you can compute the cost function as follows:

$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \large{(} \small y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \large{)} \small\tag{13}$$

**Exercise**: Implement `compute_cost()` to compute the value of the cost $J$.

**Instructions**:
- There are many ways to implement the cross-entropy loss. To help you, here is one way to implement
$- \sum\limits_{i=1}^{m}  y^{(i)}\log(a^{[2](i)})$:
```python
logprobs = np.multiply(np.log(A2),Y)
cost = - np.sum(logprobs)                # no need to use a for loop!
```

(you can use either `np.multiply()` and then `np.sum()` or directly `np.dot()`).
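
As a small sketch of the two options mentioned above, the following computes the full cost once with `np.multiply()` plus `np.sum()` and once with `np.dot()`, on made-up probabilities and labels (`A2_toy` and `Y_toy` are illustrative names), and confirms that both give the same number.

```python
import numpy as np

A2_toy = np.array([[0.9, 0.2, 0.7, 0.4]])   # made-up predicted probabilities, shape (1, m)
Y_toy = np.array([[1, 0, 1, 0]])            # made-up labels, shape (1, m)
m = Y_toy.shape[1]

# Variant 1: elementwise multiply, then sum over all examples
logprobs = np.multiply(np.log(A2_toy), Y_toy) + np.multiply(np.log(1 - A2_toy), 1 - Y_toy)
cost_multiply = -np.sum(logprobs) / m

# Variant 2: a (1, m) by (m, 1) dot product performs the multiply and the sum in one step
cost_dot = -(np.dot(Y_toy, np.log(A2_toy).T) + np.dot(1 - Y_toy, np.log(1 - A2_toy).T)) / m
cost_dot = cost_dot.item()                  # turn the (1, 1) array into a Python float

print(cost_multiply, cost_dot)
```

The `np.dot()` variant returns a `(1, 1)` array, which is why the result has to be reduced to a scalar before it can be compared or printed as a single number.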


In [17]:
# GRADED FUNCTION: compute_cost

def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost given in equation (13)
    
    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2
    
    Returns:
    cost -- cross-entropy cost given equation (13)
    """
    
    m = Y.shape[1] # number of examples

    # Compute the cross-entropy cost
    ### START CODE HERE ### (≈ 2 lines of code)
    logprobs = np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2))
    cost = -np.sum(logprobs) / m
    ### END CODE HERE ###
    
    cost = np.squeeze(cost)     # makes sure cost is the dimension we expect. 
                                # E.g., turns [[17]] into 17 
    assert(isinstance(cost, float))
    
    return cost

In [18]:
print("cost = " + str(compute_cost(A2, Y, parameters)))

cost = 0.6931607447723531


Using the cache computed during forward propagation, you can now implement backward propagation.

**Question**: Implement the function `backward_propagation()`.

**Instructions**:
Backpropagation is usually the hardest (most mathematical) part in deep learning. To help you, here again is the slide from the lecture on backpropagation. You'll want to use the six equations on the right of this slide, since you are building a vectorized implementation.  

<img src="images/grad_summary.png" style="width:600px;height:300px;">

<!--
$\frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)} } = \frac{1}{m} (a^{[2](i)} - y^{(i)})$

$\frac{\partial \mathcal{J} }{ \partial W_2 } = \frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)} } a^{[1] (i) T} $

$\frac{\partial \mathcal{J} }{ \partial b_2 } = \sum_i{\frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)}}}$

$\frac{\partial \mathcal{J} }{ \partial z_{1}^{(i)} } =  W_2^T \frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)} } * ( 1 - a^{[1] (i) 2}) $

$\frac{\partial \mathcal{J} }{ \partial W_1 } = \frac{\partial \mathcal{J} }{ \partial z_{1}^{(i)} }  X^T $

$\frac{\partial \mathcal{J} _i }{ \partial b_1 } = \sum_i{\frac{\partial \mathcal{J} }{ \partial z_{1}^{(i)}}}$

- Note that $*$ denotes elementwise multiplication.
- The notation you will use is common in deep learning coding:
    - dW1 = $\frac{\partial \mathcal{J} }{ \partial W_1 }$
    - db1 = $\frac{\partial \mathcal{J} }{ \partial b_1 }$
    - dW2 = $\frac{\partial \mathcal{J} }{ \partial W_2 }$
    - db2 = $\frac{\partial \mathcal{J} }{ \partial b_2 }$
    
!-->

- Tips:
    - To compute dZ1 you'll need to compute $g^{[1]'}(Z^{[1]})$. Since $g^{[1]}(.)$ is the tanh activation function, if $a = g^{[1]}(z)$ then $g^{[1]'}(z) = 1-a^2$. So you can compute 
    $g^{[1]'}(Z^{[1]})$ using `(1 - np.power(A1, 2))`.
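
The identity in the tip can be checked numerically. This sketch compares `1 - np.power(a, 2)` with a central finite-difference estimate of the derivative of tanh; the grid of `z` values and the step `eps` are arbitrary choices made for illustration.

```python
import numpy as np

z = np.linspace(-3.0, 3.0, 13)      # arbitrary test points
a = np.tanh(z)

# Analytic derivative from the tip: g'(z) = 1 - a^2
analytic = 1 - np.power(a, 2)

# Central finite difference: (g(z + eps) - g(z - eps)) / (2 * eps)
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

The two estimates agree up to floating-point error, which is why backpropagation can reuse the cached `A1` instead of recomputing anything from `Z1`.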

In [19]:
# GRADED FUNCTION: backward_propagation

def backward_propagation(parameters, cache, X, Y):
    """
    Implement the backward propagation using the instructions above.
    
    Arguments:
    parameters -- python dictionary containing our parameters 
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
    X -- input data of shape (n_x, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    
    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    m = X.shape[1]
    
    # First, retrieve W1 and W2 from the dictionary "parameters".
    ### START CODE HERE ### (≈ 2 lines of code)
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    ### END CODE HERE ###
        
    # Retrieve also A1 and A2 from dictionary "cache".
    ### START CODE HERE ### (≈ 2 lines of code)
    A1 = cache["A1"]
    A2 = cache["A2"]
    ### END CODE HERE ###
    
    # Backward propagation: calculate dW1, db1, dW2, db2. 
    ### START CODE HERE ### (≈ 6 lines of code, corresponding to 6 equations on slide above)
    dZ2 = A2-Y
    dW2 = 1/m*(np.dot(dZ2,A1.T))
    db2 = 1/m*(np.sum(dZ2,axis=1, keepdims=True))
    dZ1 = np.multiply(np.dot(W2.T,dZ2),(1-np.power(A1,2)))
    dW1 = 1/m*(np.dot(dZ1,X.T))
    db1 = 1/m*(np.sum(dZ1,axis=1, keepdims=True))
    ### END CODE HERE ###
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

In [20]:

grads = backward_propagation(parameters, cache, X, Y)
print ("dW1 = "+ str(grads["dW1"]))
print ("db1 = "+ str(grads["db1"]))
print ("dW2 = "+ str(grads["dW2"]))
print ("db2 = "+ str(grads["db2"]))

dW1 = [[-3.00370034e-04 -1.48981225e-04 -3.07622434e-04 -2.61080412e-04
  -1.12300866e-04 -2.38378265e-04 -3.20862934e-04 -3.69477426e-04
  -1.12877277e-04  4.71770543e-06 -1.40495383e-04  2.49285682e-06
  -1.30700597e-04 -1.14912228e-04  1.68742191e-05 -9.72107350e-05
  -4.77112682e-05 -1.17654164e-04  1.87504353e-06 -1.75819720e-05
  -3.29341801e-04 -1.84645037e-04 -3.23218735e-04 -2.53316447e-04
  -1.56762477e-04 -2.22545533e-04 -2.71139754e-04 -4.42198757e-04
  -1.25235899e-04 -9.46445965e-05]
 [ 1.27164784e-04  6.30705258e-05  1.30235566e-04  1.10531350e-04
   4.75471862e-05  1.00925123e-04  1.35844795e-04  1.56425345e-04
   4.77920331e-05 -1.99250503e-06  5.94805339e-05 -1.05744974e-06
   5.53340715e-05  4.86497646e-05 -7.14356232e-06  4.11582710e-05
   2.02002515e-05  4.98112584e-05 -7.91847424e-07  7.44493893e-06
   1.39430422e-04  7.81689592e-05  1.36838645e-04  1.07244560e-04
   6.63699122e-05  9.42209874e-05  1.14793024e-04  1.87212775e-04
   5.30234898e-05  4.00717642e-05]
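
One common way to gain confidence in a backpropagation implementation is a finite-difference gradient check. The sketch below is not part of the original assignment: it re-implements the forward pass, the cost and the six gradient equations in compact form on a tiny made-up problem (`cost_fn`, `backprop`, `X_toy` and `Y_toy` are illustrative names) and compares each analytic gradient entry with a numerical estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_fn(params, X, Y):
    """Forward pass followed by the cross-entropy cost."""
    A1 = np.tanh(params["W1"] @ X + params["b1"])
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return cost, A1, A2

def backprop(params, X, Y):
    """Compact version of the six vectorized gradient equations."""
    m = X.shape[1]
    _, A1, A2 = cost_fn(params, X, Y)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (params["W2"].T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

rng = np.random.default_rng(0)           # illustrative seed
n_x, n_h, m = 3, 4, 6                    # tiny illustrative problem
X_toy = rng.standard_normal((n_x, m))
Y_toy = (rng.random((1, m)) > 0.5).astype(float)
params = {"W1": rng.standard_normal((n_h, n_x)) * 0.5,
          "b1": rng.standard_normal((n_h, 1)) * 0.5,
          "W2": rng.standard_normal((1, n_h)) * 0.5,
          "b2": rng.standard_normal((1, 1)) * 0.5}

grads = backprop(params, X_toy, Y_toy)

# Perturb one parameter entry at a time and estimate dJ/dtheta numerically.
eps = 1e-6
max_diff = 0.0
for name, value in params.items():
    for idx in np.ndindex(value.shape):
        original = value[idx]
        value[idx] = original + eps
        cost_plus, _, _ = cost_fn(params, X_toy, Y_toy)
        value[idx] = original - eps
        cost_minus, _, _ = cost_fn(params, X_toy, Y_toy)
        value[idx] = original            # restore the parameter
        numeric = (cost_plus - cost_minus) / (2 * eps)
        max_diff = max(max_diff, abs(numeric - grads[name][idx]))

print("largest gap between backprop and finite differences:", max_diff)
```

A gap many orders of magnitude smaller than the gradients themselves suggests the equations are implemented consistently. A large gap usually points to a missing `1/m`, a wrong transpose, or a wrong activation derivative.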


In [21]:
# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate = 0.2):
    """
    Updates parameters using the gradient descent update rule:
    parameter = parameter - learning_rate * gradient
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients 
    learning_rate -- step size used in the update (default 0.2)
    
    Returns:
    parameters -- python dictionary containing your updated parameters 
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    ### END CODE HERE ###
    
    # Retrieve each gradient from the dictionary "grads"
    ### START CODE HERE ### (≈ 4 lines of code)
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    ### END CODE HERE ###
    
    # Update rule for each parameter
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = W1 - learning_rate*dW1
    b1 = b1 - learning_rate*db1
    W2 = W2 - learning_rate*dW2
    b2 = b2 - learning_rate*db2
    ### END CODE HERE ###
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [22]:

parameters = update_parameters(parameters, grads)

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

W1 = [[-2.02371523e-03 -2.51537891e-04 -1.06194560e-02  8.25357012e-03
  -8.94471775e-03 -4.16106118e-03  2.57857967e-03 -6.15254495e-03
  -5.26718564e-03 -4.54598162e-03  2.78536930e-03  1.14605415e-02
   2.33837084e-04 -5.56664478e-03  2.69191676e-03 -2.96135635e-03
  -8.61102290e-05  5.89853693e-03 -3.73972976e-03  4.86426493e-05
  -4.32467111e-03 -7.45241845e-04  1.34749601e-03 -4.89323195e-03
  -1.66275733e-03 -1.13641105e-03 -3.13404711e-03 -5.84962168e-03
  -7.08103896e-03 -7.48547059e-04]
 [-1.37071776e-03  1.11442198e-02 -1.21998850e-02  5.41526254e-04
   1.84271325e-03  6.77798429e-03  2.48211707e-03 -4.25235359e-03
  -9.50959927e-06  2.71216136e-03 -1.57943709e-03  3.85527018e-03
  -9.35152009e-03  8.64619338e-03  7.33981877e-03 -1.68661835e-03
   3.05266385e-03  2.29890708e-04 -4.14551808e-03  4.37062104e-04
   4.97394335e-03 -1.92109638e-03 -1.90571484e-03 -3.93802727e-04
   2.15420767e-03  6.37305195e-03 -3.19635513e-03  2.50453866e-03
   1.06997533e-03 -9.30107628e-03]
 

### 4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model() ####

**Question**: Build your neural network model in `nn_model()`.

**Instructions**: The neural network model has to use the previous functions in the right order.

In [23]:
# GRADED FUNCTION: nn_model

def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (n_x, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 1000 iterations
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    
    # Initialize parameters, then retrieve W1, b1, W2, b2. Inputs: "n_x, n_h, n_y". Outputs = "W1, b1, W2, b2, parameters".
    ### START CODE HERE ### (≈ 5 lines of code)
    parameters = initialize_parameters(n_x,n_h,n_y)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    ### END CODE HERE ###
    
    
    # Loop (gradient descent)

    for i in range(0, num_iterations):
         
        ### START CODE HERE ### (≈ 4 lines of code)
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X,parameters)
        
        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2,Y,parameters)
 
        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters,cache,X,Y)
 
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters,grads)
        
        ### END CODE HERE ###
        
        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    return parameters

In [24]:
parameters = nn_model(X, Y, 5, num_iterations=10000, print_cost=True)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

Cost after iteration 0: 0.693138
Cost after iteration 1000: 0.067387
Cost after iteration 2000: 0.057015
Cost after iteration 3000: 0.051639
Cost after iteration 4000: 0.048568
Cost after iteration 5000: 0.046482
Cost after iteration 6000: 0.044684
Cost after iteration 7000: 0.042865
Cost after iteration 8000: 0.040966
Cost after iteration 9000: 0.038963
W1 = [[-5.45022991e-01 -4.62933828e-01 -5.04799273e-01 -5.90138238e-01
  -9.86000770e-02  8.08527998e-01 -1.28054853e+00 -1.30630394e+00
  -1.07138570e-02  4.93154011e-01 -2.02431182e+00  1.11604829e+00
  -1.14870374e+00 -1.31152427e+00 -8.51952385e-01  7.02545953e-01
   7.77182426e-02 -3.06646900e-01  1.36383135e-01  9.60651806e-01
  -1.89347449e+00 -1.89075710e+00 -1.38907472e+00 -1.56262618e+00
  -1.16727723e+00 -7.71593015e-02 -1.34691174e+00 -1.06200983e+00
  -1.69039007e+00 -5.25468565e-01]
 [-2.12929729e-03  1.15472006e-02 -1.30911244e-02  3.45268131e-04
   3.18137224e-03  3.26381692e-03  3.89128360e-03 -1.37099483e-03
  -1.0562

### 4.5 Predictions

**Question**: Build `predict()` to make predictions with your model.
Use forward propagation to compute the predicted probabilities.

**Reminder**: predictions = $y_{prediction} = \mathbb 1 \text{{activation > 0.5}} = \begin{cases}
      1 & \text{if}\ activation > 0.5 \\
      0 & \text{otherwise}
    \end{cases}$  
    
As an example, if you would like to set the entries of a matrix X to 0 and 1 based on a threshold you would do: ```X_new = (X > threshold)```
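
A short sketch of that thresholding step on made-up probabilities (`A2_toy` is an illustrative name). The comparison produces a boolean array, and converting it to integers gives the 0/1 predictions.

```python
import numpy as np

A2_toy = np.array([[0.1, 0.5, 0.51, 0.99]])   # made-up probabilities

mask = A2_toy > 0.5                 # boolean array; exactly 0.5 is NOT above the threshold
predictions = mask.astype(int)      # 0/1 integers, equivalent to 1 * mask

print(mask)          # [[False False  True  True]]
print(predictions)   # [[0 0 1 1]]
```

Note that the comparison is strict, so an activation of exactly 0.5 maps to class 0, matching the "otherwise" branch of the reminder above.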

In [25]:
# GRADED FUNCTION: predict

def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    X -- input data of size (n_x, m)
    
    Returns:
    predictions -- vector of predictions of our model (malignant: 0 / benign: 1)
    """
    
    # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
    ### START CODE HERE ### (≈ 2 lines of code)
    A2, cache = forward_propagation(X,parameters)
    predictions = 1*(A2>0.5)
    ### END CODE HERE ###
    
    return predictions

In [26]:
# Build a model with a n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h = 4, num_iterations = 20000, print_cost=True)

Cost after iteration 0: 0.693161
Cost after iteration 1000: 0.067768
Cost after iteration 2000: 0.057075
Cost after iteration 3000: 0.051400
Cost after iteration 4000: 0.047852
Cost after iteration 5000: 0.045233
Cost after iteration 6000: 0.043007
Cost after iteration 7000: 0.040994
Cost after iteration 8000: 0.039065
Cost after iteration 9000: 0.037014
Cost after iteration 10000: 0.034888
Cost after iteration 11000: 0.032920
Cost after iteration 12000: 0.031132
Cost after iteration 13000: 0.029489
Cost after iteration 14000: 0.027964
Cost after iteration 15000: 0.026537
Cost after iteration 16000: 0.025186
Cost after iteration 17000: 0.023891
Cost after iteration 18000: 0.022635
Cost after iteration 19000: 0.021406


In [27]:
# Print accuracy
predictions = predict(parameters, X)
print(predictions)
print( X.shape, predictions.shape)
print ('Accuracy: %d' % float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')

[[1 0 1 0 0 0 0 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 0 1 0 1 1 0 0 0 0 1 0 0 1 1
  0 1 0 1 1 1 1 1 1 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1
  1 0 1 0 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
  0 0 0 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 0 1 0 0 1 1 1 0 1 0 0 1 1 1 0 0 1 0
  1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1
  1 1 1 1 1 0 0 1 0 1 1 1 1 0 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 0 1 1 1 1 0 1
  1 1 1 1 1 1 0 1 0 1 1 0 0 0 0 1 1 0 1 0 0 0 1 0 1 1 1 1 1 0 1 1 0 0 0 0
  1 0 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 0 1 1 0 1 1 1 1 1 0 1 1
  1 0 1 0 0 0 0 1 0 0 0 1 1 0 1 1 1 1 0 0 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 0
  0 1 0 0 1 0 1 0 0 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 1 0 0 1 0
  1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 0 0 0 1 0 0 0 1 1 1 1 1 1 1 0 1 0
  0 0 0 1 1 0 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1 0 1 1 0 0 0
  0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 1 1 1 0 1 1 0 1 1 0 1 0 1 1 1 1 0 0
  1 1 1 0 1 0 0 0 1 1 1 0 1 1 1 0 0 0 
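
The accuracy expression in the cell above counts matching 1s and matching 0s with two dot products. As a sketch on made-up labels and predictions (`Y_toy` and `pred_toy` are illustrative names), the same number can be obtained by averaging an elementwise comparison, which may be easier to read.

```python
import numpy as np

Y_toy = np.array([[1, 0, 1, 1, 0]])       # made-up true labels
pred_toy = np.array([[1, 0, 0, 1, 1]])    # made-up predictions

# Formula used above: (number of matching 1s) + (number of matching 0s)
matches = np.dot(Y_toy, pred_toy.T) + np.dot(1 - Y_toy, 1 - pred_toy.T)
acc_dot = matches.item() / Y_toy.size * 100

# Equivalent: fraction of positions where prediction and label agree
acc_mean = np.mean(pred_toy == Y_toy) * 100

print(acc_dot, acc_mean)
```

Keep in mind that the accuracy printed in this notebook is computed on the same examples that were used for training, so it measures fit to the training set rather than performance on unseen data.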

**Optional questions**:

Some optional/ungraded questions that you can explore if you wish: 
- What happens when you change the tanh activation for a sigmoid activation or a ReLU activation?
- Play with the learning_rate. What happens?
- What if we change the dataset? (See the Olivetti faces experiment below!)
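
For the learning-rate question, a toy problem can build intuition before you rerun the network. This sketch is not the network itself: it runs the same update rule, `w = w - learning_rate * dw`, on the one-dimensional cost $J(w) = w^2$ (gradient $2w$). The function name `run_gradient_descent` and the three rates are illustrative choices.

```python
def run_gradient_descent(learning_rate, steps=50, w0=1.0):
    """Minimize J(w) = w**2 starting from w0 and return the final w."""
    w = w0
    for _ in range(steps):
        dw = 2 * w                      # gradient of w**2
        w = w - learning_rate * dw      # same update rule as update_parameters()
    return w

for lr in (0.01, 0.2, 1.1):
    print(lr, abs(run_gradient_descent(lr)))
```

On this toy cost each step multiplies `w` by `1 - 2 * learning_rate`. A small rate converges slowly, a moderate rate converges quickly, and a rate above 1 makes `|w|` grow at every step. The exact thresholds differ for the neural network, but the same qualitative behavior (slow, fast, divergent) is what to look for in the printed cost.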

In [40]:
from sklearn.datasets import fetch_olivetti_faces
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import random
import math

data = fetch_olivetti_faces()

X = data.data
Y = data.target
n_train = math.ceil(len(X)*60/100)

train_indices = random.sample(range(0, len(X)), n_train)
test_indices = [ i for i in range(0, len(X)) if i not in train_indices ]

model = LogisticRegression()
model.fit(X[train_indices],Y[train_indices])
print("LogisticRegression", model.score(X[test_indices],Y[test_indices])*100)

classifier = OneVsRestClassifier(LinearSVC(random_state=0))
classifier.fit(X[train_indices],Y[train_indices])
print("OneVsRestClassifier", classifier.score(X[test_indices],Y[test_indices])*100 )

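# Note: nn_model() is a binary classifier (one sigmoid output), while Olivetti faces has 40 classes,
# so predict() can only ever output the labels 0 and 1 here.
nn_parameters = nn_model(X.T, Y.reshape(1, Y.shape[0]), n_h = 4, num_iterations = 20000, print_cost=False)
predictions = predict(nn_parameters, X.T)

# Unlike the two scores above, this is a fraction (not a percentage) computed on the full dataset
print("", accuracy_score(Y, predictions.reshape(Y.shape[0])) )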

LogisticRegression 95.0
OneVsRestClassifier 95.625




 0.025
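
The last number is far below the other two, and the model's structure offers a plausible explanation: `predict()` thresholds a single sigmoid output, so it can only emit the labels 0 and 1, whereas the Olivetti faces dataset has 40 identities with 10 images each. The sketch below uses synthetic labels with that layout (it does not download the dataset or run the network) to show what accuracy a constant 0/1 prediction would reach.

```python
import numpy as np

# Synthetic labels with the Olivetti layout: 40 identities, 10 images each (400 total)
labels = np.repeat(np.arange(40), 10)

# Accuracy when every example is assigned the same label, 0 or 1
constant_accuracy = {}
for constant in (0, 1):
    preds = np.full_like(labels, constant)
    constant_accuracy[constant] = np.mean(preds == labels)
    print(constant, constant_accuracy[constant])

# Best case for any predictor restricted to labels 0 and 1:
# it gets every image of identities 0 and 1 right and everything else wrong.
upper_bound = np.mean(labels <= 1)
print(upper_bound)
```

A constant prediction scores 10/400 = 0.025, which is consistent with the value printed above, and no 0/1 predictor can exceed 20/400 = 0.05 on 40 balanced classes. Whether the trained network actually collapsed to a constant prediction is an assumption that this sketch does not verify. A fair comparison would need a multi-class output layer rather than a single sigmoid unit.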
