# Classification with one hidden layer

Welcome to this case study on implementing a **2-layer neural network (one hidden layer) with the NumPy package.**

--------------------------------------------------------------------------------
You will do the following tasks in this case study.


- Implement a binary classification neural network with a single hidden layer.
- Implement the forward propagation and backward propagation of the network.
- Implement the non-linear activation functions.


------------------------------------------------


**Note:** Don't delete the instructions or any of the cells.



### 1. Import the Packages

In [0]:
## Import the packages



import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model
#from IPython.display import display, Math, Latex

np.random.seed(1) # set a seed so that the results are consistent
  






### 2. Loading and reshaping the Dataset ##

First, let's get the dataset you will work on. The following code will load a "Moon" 2-class dataset into variables `X` and `Y`.
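
`make_moons` returns the features with one example per row, shape `(n_samples, 2)`, and the labels as a flat vector of shape `(n_samples,)`, while the rest of this case study expects one example per column. The sketch below shows that change of layout on stand-in random arrays; the names `X_raw`, `Y_raw`, `X_cols` and `Y_row` are hypothetical and are not part of the assignment.

```python
import numpy as np

# Stand-in arrays with the same layout that make_moons returns
# (assumption: random values instead of the real dataset).
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(400, 2))      # one example per row
Y_raw = rng.integers(0, 2, size=400)   # flat label vector

X_cols = X_raw.T                # transpose: one example per column
Y_row = Y_raw.reshape(1, -1)    # row vector; -1 lets NumPy infer the length

print(X_cols.shape, Y_row.shape)
```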

In [0]:
#Load the Dataset
noisy_moons = sklearn.datasets.make_moons(n_samples=400, noise=.2)
X, Y = noisy_moons

"""
Reshape the dataset such that 
X has shape (2,400)
Y has shape (1,400)
"""
#START YOUR CODE HERE
X=
Y=
#END YOUR CODE HERE







### 3. Neural Network model



The general methodology to build a Neural Network is to:

    1. Define the neural network structure ( # of input units,  # of hidden units, etc). 
    2. Initialize the model's parameters
    3. Loop:
        - Implement forward propagation
        - Compute loss
        - Implement backward propagation to get the gradients
        - Update parameters (gradient descent)

You often build helper functions to compute steps 1-3 and then merge them into one function we call `nn_model()`. Once you've built `nn_model()` and learnt the right parameters, you can make predictions on new data.

### 3.1 Defining the neural network structure


The `layer_sizes` function defines the following three variables:


- **n_x**: The size of the input layer (number of neurons in the input layer, one per feature of a single training example)
- **n_h**: The size of the hidden layer (number of neurons in the hidden layer)
- **n_y**: The size of the output layer (number of neurons representing the output classes)


Note: Refer to the picture in the question for more details.






In [0]:
def layer_sizes(X, Y):
    """
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    # START CODE 
    n_x =
    n_y = 
    # END YOUR CODE
    n_h = 4
   
    return (n_x, n_h, n_y)

### 4. Initialize the weights and biases for the layers (hidden layer and output layer)

In this function you will initialise the weights and biases for hidden layer and output layer.

```
W1,b1 - Weight and bias of the hidden layer
W2,b2 - Weight and bias of the output layer
```
NOTE:

1. Weights have to be initialised with small random values: `np.random.randn(rows, columns) * 0.01`
2. Biases have to be initialised with zeros: `np.zeros((rows, columns))`
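
As a rough illustration of the two rules above, the sketch below initialises a single layer. Random weights give each hidden unit a different starting point, and the small scale keeps the early pre-activations close to zero. The helper `init_layer` and its `seed` argument are hypothetical names used only for this example.

```python
import numpy as np

def init_layer(n_out, n_in, scale=0.01, seed=0):
    """Hypothetical helper: small random weights and zero biases for one layer."""
    rng = np.random.RandomState(seed)
    W = rng.randn(n_out, n_in) * scale   # shape (n_out, n_in)
    b = np.zeros((n_out, 1))             # shape (n_out, 1)
    return W, b

# A layer with 4 units that receives 2 input features
W, b = init_layer(4, 2)
print(W.shape, b.shape)
```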



In [0]:

def initialize_parameters(n_x, n_h, n_y):
    """
    Input:
    Layer sizes

    Output:
    parameters -- dictionary containing W1,b1,W2,b2
                    
    """
    
    #Please don't remove this line of code
    np.random.seed(2) 
    
    

    #Initialise the parameters
    '''
    Hidden Layer 
    W1 -- weight matrix of shape (n_h, n_x)
    b1 -- bias vector of shape (n_h, 1)

    Output Layer
    W2 -- weight matrix of shape (n_y, n_h)
    b2 -- bias vector of shape (n_y, 1)
    '''
    #START CODE HERE
    W1 = 
    b1 = 
    W2 = 
    b2 = 
    #END CODE HERE
    
    #Adding the parameters to dictionary    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

### 5. Define the Non-Linear Activation Function 
```
- Activation function for hidden layer : tanh (NumPy package: np.tanh())
- Activation function for output layer : sigmoid
```
Let's define the sigmoid function.

Given an input `x`, `sigmoid(x)` can be calculated with the formula $\frac{1}{1+e^{-x}}$
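
To make the formula concrete, here is a minimal sketch that evaluates it element-wise on a small array. The name `sigmoid_sketch` is hypothetical and chosen so that it does not clash with the `sigmoid` function you are asked to write.

```python
import numpy as np

def sigmoid_sketch(x):
    """Element-wise 1 / (1 + e^(-x)) for a scalar or an array of any shape."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))

values = sigmoid_sketch(np.array([-2.0, 0.0, 2.0]))
print(values)
```

The output always lies strictly between 0 and 1, equals 0.5 at `x = 0`, and satisfies `sigmoid(-x) = 1 - sigmoid(x)`, which is why it suits a binary output unit.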






In [0]:
#Sigmoid activation function
def sigmoid(x):
    """
    Input:
    x -- An array of any size.
    Return:
    y -- sigmoid(x)
    """

    #START YOUR CODE

    y = 

    #END YOUR CODE


    return y

### 6. Implement the Forward Propagation

In this function you will have to compute the values of 

$Z^{[1]},A^{[1]},Z^{[2]},A^{[2]}$

and return 

- The activation output of the output layer, A2
- A dictionary which stores the values of Z1,A1,Z2,A2



**Formulae:**

1. $Z^{[1]}=W^{[1]}X+b^{[1]}$
2. $A^{[1]}=\tanh(Z^{[1]})$

3. $Z^{[2]}=W^{[2]}A^{[1]}+b^{[2]}$
4. $A^{[2]}=sigmoid(Z^{[2]})$


*Note : The superscript value denotes the layer number*

*Hint : Use the function np.dot() to multiply the matrices*
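
The four formulae can be traced on a toy problem to see how the shapes line up. The sketch below assumes 2 input features, 4 hidden units, 1 output unit and 5 examples; all variable names and the random values are illustrative only.

```python
import numpy as np

rng = np.random.RandomState(0)
n_x, n_h, n_y, m = 2, 4, 1, 5           # toy sizes (assumption)

X_toy = rng.randn(n_x, m)               # one example per column
W1 = rng.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))

Z1 = np.dot(W1, X_toy) + b1             # (n_h, m); b1 broadcasts over the columns
A1 = np.tanh(Z1)                        # hidden activation
Z2 = np.dot(W2, A1) + b2                # (n_y, m)
A2 = 1.0 / (1.0 + np.exp(-Z2))          # sigmoid output in (0, 1)

print(Z1.shape, A1.shape, Z2.shape, A2.shape)
```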



In [0]:
#Implement the Forward Propagation

def forward_propagation(X, parameters):
    """
    Input 
    X : Input of shape(2,400)
    parameters -- Dictionary containing the weights and biases
    
    Output:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing the values of "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve the weights and biases from the dictionary parameters
    # START CODE HERE 
    W1 = 
    b1 = 
    W2 = 
    b2 = 
    #END CODE HERE
    
    # Calculate the following 
    #Hint : Use the function np.dot() to multiply the matrices
    # START CODE HERE
    Z1 = 
    A1 = 
    Z2 = 
    A2 = 
    ### END CODE HERE ###
    
    
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    return A2, cache



### 7. Compute the cost

In this function you will compute the cost with the formula
 $$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[2](i)}\right) + (1-y^{(i)})\log\left(1- a^{[2](i)}\right) \right)$$

 Here 
 
 - $a^{[2](i)}$ denotes the predicted output of training example `i`
 - $y^{(i)}$ denotes the true labels of training example `i`
 - $m$ denotes the total number of training examples


**Note:** Here `i` is the index of a training example, running from 1 to $m$

**Hint**:
Use np.sum() and np.multiply()
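
As a small worked example of the formula, the sketch below evaluates the cost for three made-up predictions and labels. The arrays `A2_toy` and `Y_toy` are invented for illustration and are not results from the model.

```python
import numpy as np

A2_toy = np.array([[0.9, 0.2, 0.7]])   # made-up predicted probabilities
Y_toy = np.array([[1, 0, 1]])          # made-up true labels
m = Y_toy.shape[1]                     # number of examples (columns)

# Per-example log-likelihood: y*log(a) + (1 - y)*log(1 - a)
logprobs = np.multiply(Y_toy, np.log(A2_toy)) + np.multiply(1 - Y_toy, np.log(1 - A2_toy))
cost_toy = float(-np.sum(logprobs) / m)

print(cost_toy)
```

Each example contributes the negative log of the probability assigned to its true class, so here the cost is the mean of $-\log 0.9$, $-\log 0.8$ and $-\log 0.7$.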







In [0]:
# compute_cost

def compute_cost(A2, Y, parameters):
    """
    
    Input:
    A2 : The sigmoid output of the second activation (Predicted labels)
    Y : True labels
    parameters -- Dictionary containing weights and biases W1, b1, W2 and b2
    
    Returns:
    cost 
    """
    
    #-----------------Assign the total number of examples to variable "m"--------------
    #START YOUR CODE
    m = 

    #END YOUR CODE


    
       
    #-------------------Compute the cost-------------------------------
    #START YOUR CODE
    cost = 
    #END CODE HERE
    
    #--------------make sure to check the Datatype-------------  
    cost = float(np.squeeze(cost))     # e.g. turns [[0.69]] into the Python float 0.69
    assert(isinstance(cost, float))
    
    return cost

### 8. Implement the Backward Propagation

In this function you will be finding the derivatives (gradients) to update the parameters



Formulae:

$dZ^{[2]}=A^{[2]}-Y$

$dW^{[2]}=\frac{1}{m}dZ^{[2]}A^{[1]T}$

$db^{[2]} = \frac{1}{m}np.sum(dZ^{[2]}, axis=1, keepdims=True)$

$dZ^{[1]}=W^{[2]T}dZ^{[2]} * (1-(A^{[1]})^{2})$, where $*$ denotes element-wise multiplication


$dW^{[1]}=\frac{1}{m}dZ^{[1]}X^{T}$

$db^{[1]} = \frac{1}{m}np.sum(dZ^{[1]}, axis=1, keepdims=True)$
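
The factor $(1-(A^{[1]})^{2})$ in the formula for $dZ^{[1]}$ is the derivative of tanh, since $\frac{d}{dz}\tanh(z) = 1-\tanh^{2}(z)$. The sketch below checks this identity numerically with a central finite difference; the step size `eps` and the sample points are arbitrary choices made for illustration.

```python
import numpy as np

z = np.linspace(-2.0, 2.0, 9)          # a few sample pre-activations
a = np.tanh(z)

analytic = 1.0 - a ** 2                # derivative used in the dZ1 formula

eps = 1e-6                             # finite-difference step (assumption)
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2.0 * eps)

max_gap = np.max(np.abs(analytic - numeric))
print(max_gap)
```

The same finite-difference idea can be applied to the full cost to sanity-check the gradients returned by your own `backward_propagation`.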



In [0]:
# GRADED FUNCTION: backward_propagation

def backward_propagation(parameters, cache, X, Y):
    """
       
    Input:
    parameters - Dictionary containing weights and biases
    cache - Dictionary containing Z1,A1,Z2,A2
    X -- Input data 
    Y -- true labels 
    
    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    #----------Assign the number of training examples------------
    m = 


    
    #-------Retrieve W1 and W2--------------------
    #START YOUR CODE
    W1 =
    W2 =
    #END YOUR CODE

        
    # ------------Retrieve A1 and A2 --------------
    #START YOUR CODE
    A1 = 
    A2 = 
    #END YOUR CODE
    
    # ----------calculate dW1, db1, dW2, db2-----------------
    #START CODE HERE 
    dZ2= 
    dW2 = 
    db2 = 
    dZ1 = 
    dW1 = 
    db1 = 
    #END CODE HERE
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

### 9. Update the weights and biases


**Update rule:**

If W is the parameter to be updated then

$W=W-\alpha*dW$

where

- $\alpha$ is the learning rate

- $dW$ is the derivative of the cost $J$ with respect to $W$, i.e. $\frac{dJ}{dW}$
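
To see the update rule at work, the sketch below applies it to a one-parameter toy cost $J(w) = (w-3)^2$, whose minimum is at $w = 3$. The cost, the starting point and the learning rate of 0.1 are all assumptions made for this illustration and are unrelated to the network's actual cost.

```python
def grad_J(w):
    """Derivative of the toy cost J(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

alpha = 0.1      # learning rate for the toy problem (assumption)
w = 0.0          # arbitrary starting point

for _ in range(50):
    w = w - alpha * grad_J(w)   # the update rule W = W - alpha * dW

print(w)
```

Each step moves `w` against the gradient, so it approaches 3. In the network, the same rule is applied separately to W1, b1, W2 and b2.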

In [0]:

def update_parameters(parameters, grads, learning_rate=1.2):
    """
  
    
    Input:
    parameters : python dictionary containing your parameters 
    grads: python dictionary containing your gradients 
    
    Output:
    parameters: dictionary of updated parameters
    """
    
    
    # Update 
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = 
    b1 = 
    W2 = 
    b2 = 
    ### END CODE HERE ###
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

### 10. Combine them all in the right order



In [0]:
# GRADED FUNCTION: nn_model

def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False):
    """
        
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    np.random.seed(3)

    #-----------------Call "layer_sizes" function to assign the values----------------------
    n_x = 
    n_y =
    
    # ----------------Call "initialize_parameters" functions to assign the weights and biases---------
    #START CODE
    
    W1 = 
    b1 = 
    W2 = 
    b2 = 
    #END CODE
    

    for i in range(0, num_iterations):
         
        #---------------START CODE HERE ----------------------


        # -----------Call forward_propagation function-------
        A2, cache = 
        
        # ------------Call compute_cost function----------
        cost = 
 
        # -----------Call backward_propagation function-----------
        grads = 
 
        # --------Call update_parameters function---------
        parameters =
        
        ### END CODE HERE ###
        
        # Print the cost every 500 iterations
        if print_cost and i % 500 == 0:
            print ("Cost after iteration %i: %f" % (i, cost))

    return parameters

### Run the cells below to check the accuracy of the predictions



In [0]:

def predict(parameters, X):
       
    
    ### START CODE HERE 
    A2, cache = forward_propagation(X, parameters)
    predictions = np.round(A2)
    ### END CODE HERE 
    
    return predictions


parameters = nn_model(X, Y, n_h = 4, num_iterations=10000, print_cost=True)
predictions = predict(parameters, X)
accuracy = (np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)).item() / Y.size * 100
print('Accuracy: %d%%' % accuracy)
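
The accuracy expression above counts correct predictions with two dot products: `np.dot(Y, predictions.T)` counts the examples where both label and prediction are 1, and `np.dot(1 - Y, 1 - predictions.T)` counts those where both are 0. The sketch below checks this on made-up arrays and compares it with a direct element-wise comparison; `Y_demo` and `pred_demo` are hypothetical values.

```python
import numpy as np

Y_demo = np.array([[1, 0, 1, 1]])         # made-up true labels
pred_demo = np.array([[1, 0, 0, 1]])      # made-up predictions

both_one = np.dot(Y_demo, pred_demo.T).item()            # label 1, predicted 1
both_zero = np.dot(1 - Y_demo, 1 - pred_demo.T).item()   # label 0, predicted 0

acc_dot = (both_one + both_zero) / Y_demo.size * 100
acc_mean = float(np.mean(pred_demo == Y_demo) * 100)     # direct comparison

print(acc_dot, acc_mean)
```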
