# (Optional) Colab Setup
If you aren't using Colab, you can delete the following code cell. It only helps students mount Google Drive to access the other .py files and download the data, which is a little trickier on Colab than in Jupyter on your local machine.

# Implement a Neural Network

This notebook contains testing code to help you develop a neural network by implementing the forward pass and backpropagation algorithm in the `models/neural_net.py` file. 

You will implement your network in the `NeuralNetwork` class in the file `models/neural_net.py`. The network parameters are stored in the instance variable `self.params`, a dictionary whose keys are string parameter names and whose values are numpy arrays.
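
As a rough illustration of that layout, the sketch below builds a parameter dictionary for a fully connected network. The function name `init_params_sketch`, the use of the `W1`/`b1` naming pattern for every layer, and the weight scaling are assumptions made for illustration; your `NeuralNetwork.__init__` may be organized differently.

```python
import numpy as np

def init_params_sketch(input_size, hidden_sizes, output_size, seed=0):
    """Hypothetical helper: returns {"W1": ..., "b1": ..., "W2": ..., ...}."""
    rng = np.random.default_rng(seed)
    sizes = [input_size] + list(hidden_sizes) + [output_size]
    params = {}
    for i in range(1, len(sizes)):
        fan_in, fan_out = sizes[i - 1], sizes[i]
        # Scaling by sqrt(fan_in) is an assumed choice, not taken from the assignment.
        params[f"W{i}"] = rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)
        params[f"b{i}"] = np.zeros(fan_out)
    return params

params = init_params_sketch(4, [10], 3)
for name, value in params.items():
    print(name, value.shape)
```

Because Python dictionaries preserve insertion order, looping over such a dictionary visits `W1`, `b1`, `W2`, `b2`, and so on in layer order.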

In [84]:
import numpy as np

# from models.neural_net import NeuralNetwork
# Note: neural_net_solution folder does not exist. This is supposed to be a copy of the models/neural_net.py. This may help you to verify your implementation.
from neural_net import NeuralNetwork


# For auto-reloading external modules
# See http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
    """Returns relative error"""
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


The cell below initializes a toy dataset and corresponding model which will allow you to check your forward and backward pass by using a numeric gradient check. Note that we set a random seed for repeatable experiments.

In [112]:
input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 15
optimizer = 'SGD'


def init_toy_model(num_layers):
    """Initializes a toy model"""
    np.random.seed(9)
    hidden_sizes = [hidden_size] * (num_layers - 1)
    return NeuralNetwork(input_size, hidden_sizes, num_classes, num_layers, optimizer)

def init_toy_data():
    """Initializes a toy dataset"""
    np.random.seed(9)
    X = np.random.randn(num_inputs, input_size)
    y = np.random.randn(num_inputs, num_classes)
    return X, y


# Implement forward and backward pass

The first thing you will do is implement the forward pass of your neural network. The forward pass should be implemented in the `forward` function. You can use helper functions like `linear`, `relu`, and `sigmoid` to help organize your code.
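
To make the structure concrete, here is a minimal sketch of a forward pass built from those helpers. The helper signatures, the choice of ReLU on every hidden layer, and the function name `forward_sketch` are assumptions; only the sigmoid output layer is stated elsewhere in this notebook.

```python
import numpy as np

def linear(W, X, b):
    """Affine layer: X is (N, D), W is (D, H), b is (H,)."""
    return X @ W + b

def relu(X):
    """Elementwise max(0, x)."""
    return np.maximum(0, X)

def sigmoid(X):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-X))

def forward_sketch(X, params, num_layers):
    """ReLU on hidden layers, sigmoid on the output layer (assumed layout)."""
    h = X
    for i in range(1, num_layers):
        h = relu(linear(params[f"W{i}"], h, params[f"b{i}"]))
    last = num_layers
    return sigmoid(linear(params[f"W{last}"], h, params[f"b{last}"]))

# Usage with the toy sizes from this notebook: 4 inputs, 10 hidden units, 3 classes.
rng = np.random.default_rng(0)
params = {
    "W1": rng.standard_normal((4, 10)), "b1": np.zeros(10),
    "W2": rng.standard_normal((10, 3)), "b2": np.zeros(3),
}
scores = forward_sketch(rng.standard_normal((15, 4)), params, num_layers=2)
print(scores.shape)  # (15, 3)
```

A real `forward` would also need to cache each layer's inputs and outputs so that the backward pass can reuse them; this sketch omits that bookkeeping.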

Next, you will implement the backward pass using the backpropagation algorithm. Backpropagation computes the gradient of the loss with respect to the model parameters `W1`, `b1`, and so on. Use a sigmoid function followed by MSE loss for the loss calculation. Fill in the code blocks in `NeuralNetwork.backward`.
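
The output-layer step of that computation can be sketched as follows, assuming the loss is the mean of squared errors over all entries of `y`. The function name `mse_sigmoid_backward` and the returned dictionary keys are hypothetical, and hidden layers are left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_sigmoid_backward(h, W, b, y):
    """Loss and gradients for s = sigmoid(h @ W + b) under mean squared error."""
    z = h @ W + b
    s = sigmoid(z)
    loss = np.mean((s - y) ** 2)
    # Chain rule: d(loss)/ds = 2 (s - y) / y.size, and ds/dz = s (1 - s).
    dz = 2.0 * (s - y) * s * (1.0 - s) / y.size
    grads = {
        "W": h.T @ dz,        # same shape as W
        "b": dz.sum(axis=0),  # same shape as b
        "h": dz @ W.T,        # upstream gradient for the previous layer
    }
    return loss, grads

rng = np.random.default_rng(1)
h = rng.standard_normal((5, 4))
W = rng.standard_normal((4, 3))
b = rng.standard_normal(3)
y = rng.standard_normal((5, 3))
loss, grads = mse_sigmoid_backward(h, W, b, y)
print(loss, grads["W"].shape, grads["b"].shape)
```

For a hidden ReLU layer, the upstream gradient `grads["h"]` would be multiplied elementwise by the mask of positive pre-activations before the same `W` and `b` formulas are applied.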

# Gradient check

If you have implemented your forward pass correctly, you can use the following cell to debug your backward pass with a numeric gradient check. The check assumes that your gradient is divided by `y.shape[0] * y.shape[1]`, where `y` is the first input to the `backward` function. Use MSE loss after the sigmoid layer for this gradient check. If your backward pass is correct, the max relative error between your analytic gradient and the numeric gradient should be around 1e-7 or less for all parameters.
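
For intuition, a numeric gradient check can be sketched with centered differences as below. The function `numeric_gradient` is a hypothetical stand-in, not the code in `utils/gradient_check.py`, and the quadratic test function is chosen only because its gradient is known exactly.

```python
import numpy as np

def numeric_gradient(f, x, h=1e-5):
    """Centered-difference estimate of df/dx for a scalar-valued f.

    x is perturbed in place one entry at a time and restored afterwards.
    """
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

def rel_error(x, y):
    """Max relative error, as defined at the top of this notebook."""
    return np.max(np.abs(x - y) / np.maximum(1e-8, np.abs(x) + np.abs(y)))

# Toy check: f(w) = sum(w ** 2) has the analytic gradient 2 * w.
w = np.random.default_rng(0).standard_normal((3, 2))
num = numeric_gradient(lambda v: np.sum(v ** 2), w)
print("max relative error: %e" % rel_error(num, 2 * w))
```

Perturbing the array in place means the function being checked can read the parameter from wherever it is stored instead of from its argument.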


In [117]:
from copy import deepcopy

from utils.gradient_check import eval_numerical_gradient

X, y = init_toy_data()


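def f(W):
    # W is ignored: the checker perturbs net.params in place, so re-running
    # the forward and backward pass picks up the change and returns the loss.
    net.forward(X)
    return net.backward(y)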

for num in range(2, 11):
    net = init_toy_model(num)
    net.forward(X)
    output = deepcopy(net.outputs)
    net.backward(y)
    gradients = deepcopy(net.gradients)

    # for param_name in ["W1"]:
    for param_name in net.params:
        # print('Running numeric gradient check with %s' % param_name)
        
        param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
        # print("ground truth: ", (param_grad_num - gradients[param_name]) / gradients[param_name])
        # print((X - output["0"]) / output["0"])
        print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, gradients[param_name])))

W1 max relative error: 8.760971e-08
b1 max relative error: 6.866175e-10
W2 max relative error: 3.124941e-09
b2 max relative error: 1.813309e-10
W1 max relative error: 4.001882e-08
b1 max relative error: 9.167005e-09
W2 max relative error: 2.905011e-08
b2 max relative error: 8.754421e-10
W3 max relative error: 2.740952e-09
b3 max relative error: 3.852252e-10
W1 max relative error: 3.576200e-07
b1 max relative error: 2.963421e-09
W2 max relative error: 6.820597e-08
b2 max relative error: 4.624271e-09
W3 max relative error: 2.589016e-08
b3 max relative error: 2.023740e-09
W4 max relative error: 9.619314e-09
b4 max relative error: 2.223423e-10
W1 max relative error: 4.552642e-08
b1 max relative error: 8.146830e-09
W2 max relative error: 1.269329e-07
b2 max relative error: 1.590510e-09
W3 max relative error: 2.194944e-07
b3 max relative error: 3.565703e-09
W4 max relative error: 5.175667e-06
b4 max relative error: 4.836786e-09
W5 max relative error: 4.162575e-07
b5 max relative error: 1.013

In [120]:
X.shape


(15, 4)