# Deep Neural Network Implementation
The following Jupyter Notebook is an implementation of a Deep Neural Network using Python. The architecture that we will use is as follows:

![Neural Network Architecture](nn_full.svg)

The network has two inputs, two hidden layers of 5 neurons each with a ReLU activation function, and a single output neuron with a sigmoid activation function.

## Load library dependencies
First we load our library dependencies. In this case we use three Python libraries:
 * NumPy
 * Scikit-learn
 * Matplotlib

In [None]:
import numpy as np
import matplotlib.pyplot as plt

We will use the [moons dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html), a simple toy dataset for visualizing clustering and classification algorithms. We load it with the `get_dataset` function from the `dataset` module.

In [None]:
from dataset import get_dataset

In [None]:
random_state = 123
# Get training and test data from moons dataset
X, Y, X_val, Y_val, X_test, Y_test = get_dataset(random_state=random_state)

## Forward Pass Functions
In this part we will define the forward pass functions needed for our neural network.

As a first step, we need to create our two activation functions:
* Sigmoid
$$f(x) = \frac{1}{1+e^{-x}}$$
* ReLU
$$f(x) = \max{(0, x)}$$

In [None]:
def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy
    
    Arguments:
    Z -- numpy array of any shape
    
    Returns:
    A -- output of sigmoid(Z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    ## START WRITING CODE HERE

    ## FINISH WRITING CODE
    cache = Z
    
    return A, cache

def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z as well; stored for computing the backward pass efficiently
    """
    ## START WRITING CODE HERE

    ## FINISH WRITING CODE
    
    assert(A.shape == Z.shape)
    
    cache = Z 
    return A, cache
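
As a reference to compare against, the two activation formulas above can be sketched as follows. This is a minimal illustration rather than the expected solution; the names `sigmoid_sketch`, `relu_sketch` and `Z_demo` are hypothetical and are not part of the notebook.

```python
import numpy as np

def sigmoid_sketch(Z):
    """Element-wise sigmoid; returns the activation and Z as cache."""
    A = 1.0 / (1.0 + np.exp(-Z))
    return A, Z

def relu_sketch(Z):
    """Element-wise ReLU; returns the activation and Z as cache."""
    A = np.maximum(0, Z)
    return A, Z

# Hypothetical input used only for illustration.
Z_demo = np.array([[-2.0, 0.0, 3.0]])
A_sig, _ = sigmoid_sketch(Z_demo)
A_relu, _ = relu_sketch(Z_demo)
print(A_sig)
print(A_relu)
```

Both functions operate element-wise, so the output always has the same shape as the input.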

Now we need to define the `linear_forward` function, which performs the following calculation in the forward pass:
$$ Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$$

In [None]:
def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called pre-activation parameter
    cache -- a python tuple containing "A", "W" and "b" ; stored for computing the backward pass efficiently
    """
    
    ## START WRITING CODE HERE

    ## FINISH WRITING CODE
    
    assert (Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)

    return Z, cache
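
The linear step above can be sketched as below. It is only one possible way to write it, and `linear_forward_sketch` and the demo arrays are hypothetical names. The sketch relies on NumPy broadcasting to add the bias column to every example.

```python
import numpy as np

def linear_forward_sketch(A, W, b):
    """Compute Z = W A + b and keep the inputs for the backward pass."""
    Z = W @ A + b  # b has shape (n, 1) and broadcasts across the examples axis
    cache = (A, W, b)
    return Z, cache

# Hypothetical data: 2 features, 3 examples, 1 neuron.
A_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
W_demo = np.array([[1.0, -1.0]])
b_demo = np.array([[0.5]])
Z_lin, cache_lin = linear_forward_sketch(A_demo, W_demo, b_demo)
print(Z_lin.shape)
```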

The `linear_activation_forward` function performs the following two steps:
 * the `linear_forward` calculation and then, 
 * the activation function calculation, which can be a ReLU or sigmoid in our case.

In [None]:
def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value
    cache -- a python tuple containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """

    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ## START WRITING CODE HERE

        
        ## FINISH WRITING CODE

    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ## START WRITING CODE HERE

        
        ## FINISH WRITING CODE

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache
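
A compact sketch of the two steps (linear, then activation) is shown below. To stay self-contained it inlines the linear and activation calculations instead of calling the notebook's functions. The name `layer_forward_sketch` and the `ValueError` for unknown activations are assumptions of this example.

```python
import numpy as np

def layer_forward_sketch(A_prev, W, b, activation):
    """LINEAR -> ACTIVATION for one layer; returns A and (linear_cache, Z)."""
    Z = W @ A_prev + b
    linear_cache = (A_prev, W, b)
    if activation == "sigmoid":
        A = 1.0 / (1.0 + np.exp(-Z))
    elif activation == "relu":
        A = np.maximum(0, Z)
    else:
        # Assumption of this sketch: fail loudly on an unknown activation name.
        raise ValueError("activation must be 'sigmoid' or 'relu'")
    return A, (linear_cache, Z)

# Hypothetical layer with 4 neurons, 2 inputs and 3 examples.
A_prev_demo = np.ones((2, 3))
W_zero = np.zeros((4, 2))
b_zero = np.zeros((4, 1))
A_s, _ = layer_forward_sketch(A_prev_demo, W_zero, b_zero, "sigmoid")
A_r, _ = layer_forward_sketch(A_prev_demo, W_zero, b_zero, "relu")
print(A_s.shape, A_r.shape)
```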

Finally, for the forward pass, we need to define the cost function. Our cost function is the [Cross Entropy Loss](https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_loss_function_and_logistic_regression). Remember to average your result over all examples at the end.

In [None]:
def compute_cost(AL, Y):
    """
    Implement the cost function defined by
    https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_loss_function_and_logistic_regression

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (containing 0 or 1, the class of each example), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """

    m = Y.shape[1]

    # Compute loss from AL and Y.
    ## START WRITING CODE HERE

    ## FINISH WRITING CODE

    return cost
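
The averaged binary cross-entropy can be sketched as follows. The clipping constant `eps` is an assumption added here to avoid `log(0)`. It is not mentioned in the notebook, and `compute_cost_sketch` is a hypothetical name.

```python
import numpy as np

def compute_cost_sketch(AL, Y, eps=1e-12):
    """Binary cross-entropy averaged over the m examples."""
    m = Y.shape[1]
    AL = np.clip(AL, eps, 1 - eps)  # assumption: guard against log(0)
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(cost)

# Hypothetical predictions: the model is maximally unsure on both examples.
AL_demo = np.array([[0.5, 0.5]])
Y_demo = np.array([[1.0, 0.0]])
cost_demo = compute_cost_sketch(AL_demo, Y_demo)
print(cost_demo)
```

With every prediction at 0.5 the cost equals ln(2), which is a handy sanity check for an untrained model.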

## Backward Pass Functions
For the backward propagation part, we first need the derivatives of both the sigmoid and ReLU functions. With them, we can backpropagate the incoming gradient `dA` through each activation to obtain `dZ`.

The equations for the derivatives are as follows:
* Sigmoid
$$ \frac{\partial f(x)}{\partial x} = \frac{1}{1+e^{-x}}\left(1-\frac{1}{1+e^{-x}}\right)$$

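* ReLU
$$ \frac{\partial f(x)}{\partial x} = \begin{cases}
        0 & \text{if } x \leq 0\\
        1 & \text{if } x > 0
    \end{cases}$$

Strictly speaking, the ReLU derivative is undefined at $x = 0$; we use 0 there by convention, which matches the `relu_backward` code below.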

In [None]:
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z', stored during the forward pass for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    
    Z = cache
    
    ## START WRITING CODE HERE

    
    ## FINISH WRITING CODE
    
    assert (dZ.shape == Z.shape)
    
    return dZ


def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z', stored during the forward pass for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    
    Z = cache
    dZ = np.array(dA, copy=True) # just converting dz to a correct object.
    
    # When z <= 0, you should set dz to 0 as well.
    ## START WRITING CODE HERE

    ## FINISH WRITING CODE
    
    assert (dZ.shape == Z.shape)
    
    return dZ
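
Both backward steps apply the chain rule element-wise: `dZ = dA * f'(Z)`. A minimal sketch, with hypothetical function names and demo arrays, could look like this:

```python
import numpy as np

def sigmoid_backward_sketch(dA, Z):
    """dZ = dA * s * (1 - s), where s = sigmoid(Z)."""
    s = 1.0 / (1.0 + np.exp(-Z))
    return dA * s * (1.0 - s)

def relu_backward_sketch(dA, Z):
    """dZ equals dA where Z > 0 and is 0 elsewhere."""
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

# Hypothetical cached pre-activations and an upstream gradient of ones.
Z_back = np.array([[-1.0, 0.0, 2.0]])
dA_back = np.ones_like(Z_back)
dZ_sig = sigmoid_backward_sketch(dA_back, Z_back)
dZ_relu = relu_backward_sketch(dA_back, Z_back)
print(dZ_sig)
print(dZ_relu)
```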



In the `linear_backward` function, you will need to implement the linear portion of backward propagation for a single layer.

In [None]:
def linear_backward(dZ, cache):
    """
    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    ## START WRITING CODE HERE

    
    ## FINISH WRITING CODE

    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)

    return dA_prev, dW, db
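
One common way to write the linear backward step is sketched below. It assumes that the `1/m` averaging is applied here, so the gradient coming from the cost must not already be divided by `m`. The notebook does not state which convention it expects, so treat this as an assumption. `linear_backward_sketch` is a hypothetical name.

```python
import numpy as np

def linear_backward_sketch(dZ, cache):
    """Gradients of the cost with respect to A_prev, W and b for one layer."""
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m                      # same shape as W
    db = np.sum(dZ, axis=1, keepdims=True) / m  # same shape as b
    dA_prev = W.T @ dZ                          # same shape as A_prev
    return dA_prev, dW, db

# Hypothetical layer: 2 inputs, 1 neuron, 3 examples.
A_prev_lb = np.array([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]])
W_lb = np.array([[1.0, -1.0]])
b_lb = np.array([[0.0]])
dZ_lb = np.ones((1, 3))
dA_prev_lb, dW_lb, db_lb = linear_backward_sketch(dZ_lb, (A_prev_lb, W_lb, b_lb))
print(dW_lb, db_lb)
```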

In the `linear_activation_backward` function, you will combine the derivative of the activation function with the backward pass of the linear layer. Depending on the activation function, you will need to implement different pieces of code.

In [None]:
def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.

    Arguments:
    dA -- post-activation gradient for current layer l
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache

    if activation == "relu":
        ## START WRITING CODE HERE

        
        ## FINISH WRITING CODE

    elif activation == "sigmoid":
        ## START WRITING CODE HERE

        
        ## FINISH WRITING CODE

    return dA_prev, dW, db

Finally you will have to create a function called `cost_gradient` that will calculate the gradient of the cost.

In [None]:
def cost_gradient(AL, Y):
    """
    Implement the gradient of the cross-entropy cost function with respect to AL
    
    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (containing 0 or 1, the class of each example), shape (1, number of examples)

    Returns:
    dAL -- gradient of cross-entropy cost
    """
    m = Y.shape[1]
    ## START WRITING CODE HERE

    ## FINISH WRITING CODE
    return dAL
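
Differentiating the cross-entropy loss with respect to `AL` gives `-(Y/AL - (1-Y)/(1-AL))` per example. The sketch below returns this per-example gradient without dividing by `m`, on the assumption that the averaging happens in the linear backward step. If you average here instead, do not average again later. `cost_gradient_sketch` and `eps` are hypothetical names.

```python
import numpy as np

def cost_gradient_sketch(AL, Y, eps=1e-12):
    """Per-example derivative of the cross-entropy loss with respect to AL."""
    AL = np.clip(AL, eps, 1 - eps)  # assumption: guard against division by zero
    return -(Y / AL - (1 - Y) / (1 - AL))

# Hypothetical predictions and labels.
AL_grad = np.array([[0.5, 0.5]])
Y_grad = np.array([[1.0, 0.0]])
dAL_demo = cost_gradient_sketch(AL_grad, Y_grad)
print(dAL_demo)
```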

## Initialize Parameters
We will implement an auxiliary function called `initialize_parameters_deep`, which takes a list containing the dimensions of each layer in our network. This function returns a dictionary of `parameters` where each key names either a weight matrix or a bias vector, initialized either to random values or to zero.

In [None]:
def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """

    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of entries in layer_dims (input layer included)

    ## START WRITING CODE HERE

    
    ## FINISH WRITING CODE

        assert (parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l - 1]))
        assert (parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters
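
A possible initialization is sketched below. The He-style scaling `sqrt(2 / n_prev)` for the weights and the use of a local random generator are assumptions of this example. The notebook only says that parameters are set to random values or to zero. `initialize_parameters_sketch` is a hypothetical name.

```python
import numpy as np

def initialize_parameters_sketch(layer_dims, seed=3):
    """Random weights (assumed He-style scaling) and zero biases per layer."""
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        n_prev, n_curr = layer_dims[l - 1], layer_dims[l]
        parameters["W" + str(l)] = rng.standard_normal((n_curr, n_prev)) * np.sqrt(2.0 / n_prev)
        parameters["b" + str(l)] = np.zeros((n_curr, 1))
    return parameters

# Dimensions of the network described at the top of the notebook.
params_demo = initialize_parameters_sketch([2, 5, 5, 1])
for name, value in params_demo.items():
    print(name, value.shape)
```

Random weights break the symmetry between neurons of the same layer, while the biases can safely start at zero.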

## Putting It All Together
After defining all the necessary blocks to perform the forward and backward pass, we will implement three functions:
* `forward_propagation` which will calculate the forward pass of our neural network model
* `backward_propagation` which will be in charge of performing the backward calculation to obtain the gradients
* `update` which will update the weights and biases according to the gradients

A function `accuracy` is provided to evaluate our model's performance.

In [None]:
def forward_propagation(X, parameters, inference=True):
    ## START WRITING CODE HERE

    
    ## FINISH WRITING CODE
    if inference:
        return AL
    else:
        return AL, cache1, cache2, cache3


def backward_propagation(dAL, cache1, cache2, cache3):
    ## START WRITING CODE HERE

    ## FINISH WRITING CODE
    return gradients


def update(parameters, gradients, learning_rate):
    ## START WRITING CODE HERE

    ## FINISH WRITING CODE
    return parameters


def accuracy(Y_pred, Y, threshold=0.5):
    """
    Function that calculates the accuracy of our model.
    
    Arguments:
    Y_pred -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (containing 0 or 1, the class of each example), shape (1, number of examples)
    threshold -- float number that sets the limit to select between true and false in our prediction. 

    Returns:
    accuracy -- ratio between correct and total samples
    
    """
    
    total_samples = Y.shape[1]
    correct = np.sum((Y_pred > threshold).astype('int') == Y)
    incorrect = total_samples - correct
    accuracy = correct / total_samples
    
    return accuracy
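
The update step is plain gradient descent: each parameter moves against its gradient, scaled by the learning rate. The sketch below assumes the gradients dictionary uses the keys `"dW1"`, `"db1"`, and so on. The notebook does not fix this naming, so it is an assumption, as is the name `update_sketch`.

```python
import numpy as np

def update_sketch(parameters, gradients, learning_rate):
    """Return new parameters after one gradient descent step."""
    updated = {}
    for name, value in parameters.items():
        # Assumed key convention: the gradient of "W1" is stored under "dW1".
        updated[name] = value - learning_rate * gradients["d" + name]
    return updated

# Hypothetical one-parameter example.
params_before = {"W1": np.array([[1.0]]), "b1": np.array([[0.0]])}
grads_demo = {"dW1": np.array([[0.5]]), "db1": np.array([[-1.0]])}
params_after = update_sketch(params_before, grads_demo, learning_rate=0.1)
print(params_after)
```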

## Implement your Neural Network
Now you will implement your neural network. You can try to train it with different `epochs` and `learning_rate` values. You can also try to change the dimensions of your network!
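
To show how the pieces fit together, here is a self-contained sketch of a full training loop for a 2-5-5-1 network. It is not the notebook's solution. It uses hypothetical toy data instead of the moons dataset and an assumed He-style initialization. It also inlines every step and uses the shortcut `dZ3 = AL - Y`, which is what the cross-entropy gradient multiplied by the sigmoid derivative simplifies to.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (not the moons dataset): label is 1 when x1 + x2 > 0.
X_toy = rng.standard_normal((2, 200))
Y_toy = (X_toy[0:1] + X_toy[1:2] > 0).astype(float)
m = X_toy.shape[1]

# Assumed He-style initialization for a 2-5-5-1 network.
dims = [2, 5, 5, 1]
p = {}
for l in range(1, len(dims)):
    p[f"W{l}"] = rng.standard_normal((dims[l], dims[l - 1])) * np.sqrt(2.0 / dims[l - 1])
    p[f"b{l}"] = np.zeros((dims[l], 1))

epochs, learning_rate = 300, 0.1
costs = []
for _ in range(epochs):
    # Forward pass: LINEAR->RELU, LINEAR->RELU, LINEAR->SIGMOID.
    Z1 = p["W1"] @ X_toy + p["b1"]
    A1 = np.maximum(0, Z1)
    Z2 = p["W2"] @ A1 + p["b2"]
    A2 = np.maximum(0, Z2)
    Z3 = p["W3"] @ A2 + p["b3"]
    AL = 1.0 / (1.0 + np.exp(-Z3))

    # Cross-entropy cost, clipped to avoid log(0).
    AL_safe = np.clip(AL, 1e-12, 1 - 1e-12)
    costs.append(float(-np.mean(Y_toy * np.log(AL_safe)
                                + (1 - Y_toy) * np.log(1 - AL_safe))))

    # Backward pass, computed entirely with the current parameters.
    dZ3 = AL - Y_toy
    dZ2 = (p["W3"].T @ dZ3) * (Z2 > 0)
    dZ1 = (p["W2"].T @ dZ2) * (Z1 > 0)
    grads = {
        "W3": dZ3 @ A2.T / m, "b3": dZ3.mean(axis=1, keepdims=True),
        "W2": dZ2 @ A1.T / m, "b2": dZ2.mean(axis=1, keepdims=True),
        "W1": dZ1 @ X_toy.T / m, "b1": dZ1.mean(axis=1, keepdims=True),
    }

    # Gradient descent update.
    for name in p:
        p[name] = p[name] - learning_rate * grads[name]

print("first cost:", costs[0], "last cost:", costs[-1])
```

Your own loop should follow the same order (forward pass, cost, backward pass, update), but call the functions defined earlier and record the training and validation accuracy at every epoch.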

In [None]:
# Hyperparameters


# Initialize parameters


# Keep track of accuracy and cost
epoch_train_accuracy = []
epoch_val_accuracy = []
total_cost = []


for j in range(epochs):

    ####### FORWARD PASS
    # Linear activations

    
    # Compute cost

    
    ####### BACKWARD PASS
    # Gradient of cost with respect to AL

    
    # Backpropagation

    
    ####### UPDATE

    
    
    ####### EVALUATE MODEL        
    # Test results on every iteration on training data

    
    
    # Test results on every iteration on validation data

    

    ####### TRACK COST
    # Calculate cost on every iteration

    

In [None]:
plt.plot(total_cost)
plt.title('Cost vs. epoch')
plt.ylabel('Cost')
plt.xlabel('epoch')
plt.grid(True)
plt.show()

## Analysis of our Results
In this part you will see the error analysis on our training and validation data.

In [None]:
# Error analysis
train_error = 1 - np.array(epoch_train_accuracy)
val_error = 1 - np.array(epoch_val_accuracy)
print('\nTraining information: ')
print('Min Error = {:.2f}%'.format(100 * train_error.min()))
print('Max Error = {:.2f}%'.format(100 * train_error.max()))

plt.plot(train_error * 100, label='train error')
plt.plot(val_error * 100, label='validation error')
plt.title('Error vs Training Epochs')
plt.xlabel('Epoch')
plt.ylabel('Error (%)')
plt.grid(True)
plt.legend()
plt.show()

In [None]:
# Test inference on test data
AL = forward_propagation(X_test, parameters)
test_accuracy = accuracy(AL, Y_test)

#correct_test, incorrect_test, epoch_inference_test, AL_test = inference(X_test, Y_test, parameters)
error_test = 1 - np.array(test_accuracy)
print('\nInference on test data: ')
print('Error = {:.2f}%'.format(100 * error_test))