
# Building a NN using numpy

A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing the squared Euclidean error.

The model that we want to build has the following structure:
$$\hat{y}(x) = \text{relu}(x.w_1).w_2,$$
where $x$ and $y$ are the input and target output features (of dimension 1000 and 10, respectively), relu is the activation function, and $w_1$ and $w_2$ are the weight matrices.

This implementation uses numpy to manually compute the forward pass, loss, and
backward pass. A numpy array is a generic n-dimensional array; it does not know anything about
deep learning or gradients or computational graphs, and is just a way to perform
generic numeric computations.

This example is adapted from: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

### Forward and backward pass

Given the forward pass: $x \rightarrow h= x.w_1 \rightarrow \hat{y} = \text{relu}(h).w_2$ and the following cost and loss functions:
* Cost: $E(W) = \sum_{i=1}^N (\hat{y}_i - y_i)^2$
* Loss: $\ell(\hat{y},W) = (\hat{y} - y)^2,$

calculate (analytically) the derivatives of the loss function, $\frac{\partial \ell}{\partial w_2}$ and $\frac{\partial \ell}{\partial w_1}$.
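
For checking your result, one possible derivation is sketched below. It assumes the loss is summed over the batch and treats $x$, $h$ and $\hat{y}$ as matrices with one row per sample. Two symbols are introduced here and do not appear in the text above: $a = \text{relu}(h)$, and the indicator $\mathbb{1}[h > 0]$, which is 1 where $h > 0$ and 0 elsewhere ($\odot$ is the element-wise product).

$$\begin{aligned}
\frac{\partial \ell}{\partial \hat{y}} &= 2(\hat{y} - y), \\
\frac{\partial \ell}{\partial w_2} &= a^\top \frac{\partial \ell}{\partial \hat{y}}, \\
\frac{\partial \ell}{\partial a} &= \frac{\partial \ell}{\partial \hat{y}} w_2^\top, \\
\frac{\partial \ell}{\partial h} &= \frac{\partial \ell}{\partial a} \odot \mathbb{1}[h > 0], \\
\frac{\partial \ell}{\partial w_1} &= x^\top \frac{\partial \ell}{\partial h}.
\end{aligned}$$

Each line applies the chain rule one step further back along the forward pass, so the shapes can be checked as you go: $\frac{\partial \ell}{\partial w_2}$ must have the shape of $w_2$, and $\frac{\partial \ell}{\partial w_1}$ the shape of $w_1$.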

### Initialisation of data and weights

In [1]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

N = 64      # N: input batch size
D_in = 1000 # D_in: input dimension
H = 100     # H: hidden layer dimension
D_out = 10  # D_out: output dimension

# Create random input and output (target) training data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights (no bias terms)
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6

### First, let's look at the dimensions of the data and weight matrices
For this, use e.g. print(x.shape).
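
A minimal sketch of this check, re-creating the arrays so the snippet runs on its own. The `shapes` dictionary is just a helper introduced for this example.

```python
import numpy as np

N, D_in, H, D_out = 64, 1000, 100, 10  # same sizes as in the initialisation cell

x = np.random.randn(N, D_in)     # inputs: one row per sample
y = np.random.randn(N, D_out)    # targets: one row per sample
w1 = np.random.randn(D_in, H)    # input -> hidden weights
w2 = np.random.randn(H, D_out)   # hidden -> output weights

# Collect the shapes in a dict (hypothetical helper) and print them
shapes = {"x": x.shape, "y": y.shape, "w1": w1.shape, "w2": w2.shape}
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

The inner dimensions have to agree for each matrix product: the number of columns of x matches the number of rows of w1, and the number of columns of w1 matches the number of rows of w2.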

### Forward pass: compute predicted y

We want to compute: $\hat{y} = \text{relu}(x.w_1).w_2$, where the relu activation function is used.

For this, calculate (being careful with the matrix dimensions):
* h: the matrix product of x and w1 (use the .dot() function)
* h_relu: $\text{relu}(h)$, the relu activation applied element-wise to h (use the np.maximum() function)
* y_pred: the prediction $\hat{y}$

In [3]:
h = x.dot(w1)       # .dot() = matrix multiplication               
h_relu = np.maximum(# FILL HERE #)    
y_pred = h_relu.dot(# FILL HERE #)      
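
One possible way to complete the forward pass is sketched below, for comparison with your own answer. It re-creates the data with a fixed seed (an assumption made only so the snippet is reproducible and self-contained).

```python
import numpy as np

np.random.seed(0)  # fixed seed: an assumption, only for reproducibility
N, D_in, H, D_out = 64, 1000, 100, 10

x = np.random.randn(N, D_in)
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

h = x.dot(w1)              # hidden pre-activations, shape (N, H)
h_relu = np.maximum(h, 0)  # element-wise relu: negative entries become 0
y_pred = h_relu.dot(w2)    # predictions, shape (N, D_out)

print(h.shape, h_relu.shape, y_pred.shape)
```

`np.maximum(h, 0)` compares every entry of h with the scalar 0 through broadcasting, which is what makes it an element-wise relu.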

### Calculate and print the cost function
* Cost: $E(w_1,w_2) = \sum_{i=1}^N (\hat{y}_i - y_i)^2$
* Loss: $\ell(\hat{y},W) = (\hat{y} - y)^2.$
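
As a small worked example of the cost formula, the sketch below uses tiny hand-written arrays (made-up numbers, not the notebook's data) so the result can be checked by hand.

```python
import numpy as np

# Toy predictions and targets: 2 samples, 2 outputs each (illustrative values)
y_pred = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
y = np.array([[0.0, 2.0],
              [1.0, 1.0]])

# Squared differences are 1, 0, 4 and 9; the cost is their sum
cost = np.square(y_pred - y).sum()
print(cost)  # 14.0
```

The same expression, `np.square(y_pred - y).sum()`, works unchanged on the full (N, D_out) arrays.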


### Backward pass
Compute the gradients of the loss with respect to $w_1$ and $w_2$.

Beware of the matrix dimensions!

Hint: use .dot() for matrix multiplication and .T to take the transpose of a matrix

In [None]:
grad_y_pred =   # FILL HERE #
grad_w2 =       # FILL HERE #
grad_h_relu =   # FILL HERE #
grad_h = grad_h_relu.copy()
grad_h[# CONDITION #] = # FILL VALUE #   # Boolean-mask trick: overwrite the grad_h entries where the condition holds
grad_w1 =       # FILL HERE #
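
One possible completion of the backward pass is sketched below, together with a finite-difference check on a single weight. The sizes are deliberately tiny, and the `cost` helper, `eps`, and the `w1_plus`/`w1_minus` copies are names introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2  # tiny sizes (assumption) to keep the check cheap
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def cost(w1, w2):
    """Summed squared error of the network for the given weights."""
    return np.square(np.maximum(x.dot(w1), 0).dot(w2) - y).sum()

# Forward pass
h = x.dot(w1)
h_relu = np.maximum(h, 0)
y_pred = h_relu.dot(w2)

# Backward pass: chain rule, one step at a time
grad_y_pred = 2.0 * (y_pred - y)     # d cost / d y_pred
grad_w2 = h_relu.T.dot(grad_y_pred)  # shape (H, D_out), same as w2
grad_h_relu = grad_y_pred.dot(w2.T)  # shape (N, H)
grad_h = grad_h_relu.copy()
grad_h[h <= 0] = 0                   # relu passes no gradient where h is not positive
grad_w1 = x.T.dot(grad_h)            # shape (D_in, H), same as w1

# Central finite difference on w1[0, 0] as a sanity check
eps = 1e-5
w1_plus, w1_minus = w1.copy(), w1.copy()
w1_plus[0, 0] += eps
w1_minus[0, 0] -= eps
numeric = (cost(w1_plus, w2) - cost(w1_minus, w2)) / (2 * eps)
print(grad_w1[0, 0], numeric)  # the two values should agree closely
```

Comparing an analytic gradient with a finite difference is a cheap way to catch a missing transpose or a wrong mask before running the full training loop.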

### Update weights $w_1$ and $w_2$
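
The update is a plain gradient-descent step: each weight matrix moves against its gradient, scaled by the learning rate. The sketch below uses small stand-in arrays and an illustrative learning rate (both assumptions, chosen so the result is easy to verify), not the notebook's actual weights and gradients.

```python
import numpy as np

learning_rate = 0.1  # illustrative value; the notebook uses 1e-6

# Stand-in weights and gradients (made-up numbers)
w1 = np.ones((2, 2))
w2 = np.ones((2, 1))
grad_w1 = np.full((2, 2), 2.0)
grad_w2 = np.full((2, 1), -1.0)

# Gradient-descent step, done in place
w1 -= learning_rate * grad_w1  # 1 - 0.1 * 2    = 0.8
w2 -= learning_rate * grad_w2  # 1 - 0.1 * (-1) = 1.1

print(w1)
print(w2)
```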

### Repeat procedure 500 times

In [7]:
niteration = 500
for t in range(niteration):
    # COPY ABOVE CODE HERE
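
Putting the steps together, a complete training loop could look like the sketch below. It is one possible solution, not the only one. The fixed seed and the `costs` list are additions made here so that the run is reproducible and the cost history is available afterwards.

```python
import numpy as np

np.random.seed(0)  # fixed seed: an assumption, only for reproducibility
N, D_in, H, D_out = 64, 1000, 100, 10

x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
niteration = 500
costs = []  # cost at each iteration (name introduced for this sketch)

for t in range(niteration):
    # Forward pass
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Cost: summed squared error
    cost = np.square(y_pred - y).sum()
    costs.append(cost)
    if t % 100 == 0:
        print(t, cost)

    # Backward pass
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h <= 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Gradient-descent update
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```

The forward pass has to be recomputed inside the loop, because the weights change at every iteration.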


### Plot model performance (optional)
Using matplotlib, plot the evolution of the cost as a function of the number of iterations.
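
A minimal plotting sketch is given below. It repeats a compact version of the training loop so that it runs on its own. The `costs` list, the fixed seed, the logarithmic y-axis and the Agg backend are choices made for this example, not requirements of the exercise.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt

np.random.seed(0)  # fixed seed: an assumption, only for reproducibility
N, D_in, H, D_out = 64, 1000, 100, 10
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)
learning_rate = 1e-6

costs = []
for t in range(500):
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    costs.append(np.square(y_pred - y).sum())

    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h <= 0] = 0
    grad_w1 = x.T.dot(grad_h)
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

# The cost spans many orders of magnitude, so a log-scaled y-axis is easier to read
fig, ax = plt.subplots()
ax.semilogy(range(len(costs)), costs)
ax.set_xlabel("iteration")
ax.set_ylabel("cost")
ax.set_title("Training cost")
fig.tight_layout()
```

In a notebook with `%matplotlib inline`, the `matplotlib.use("Agg")` line can be dropped and the figure is displayed when the cell finishes.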