# Neural Networks

In this part of the exercise, you will implement a neural network to recognize handwritten digits using the MNIST dataset. The neural network will be able to represent complex models that form non-linear hypotheses.

## Dataset

You are given a data set in [`data_nn.npy`](https://goo.gl/GhGLAQ) that contains 5000 training examples of handwritten digits. This is a subset of the [MNIST](http://yann.lecun.com/exdb/mnist/) handwritten digit dataset. The `.npy` format means that the data has been saved in NumPy's native binary array format instead of a text (ASCII) format like a CSV file. These matrices can be read directly into your program with `np.load`. After loading, matrices of the correct dimensions and values will be available in your program's memory.
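The sketch below illustrates the save/load round trip on a toy dictionary. It assumes, based on the loading code in this notebook, that the file stores a dictionary with keys such as `'X'` and `'y'`; the toy values and the temporary file path are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Toy stand-in for the real file contents (values are invented)
toy = {'X': np.zeros((3, 4)), 'y': np.array([[1], [2], [3]])}

path = os.path.join(tempfile.mkdtemp(), 'toy.npy')
np.save(path, toy)  # the dict is wrapped in a 0-d object array and pickled

# Pickled objects are only loaded when allow_pickle=True is passed
loaded = np.load(path, allow_pickle=True)
data = loaded.item()  # unwrap the 0-d array to get the dict back

print(data['X'].shape)  # (3, 4)
```

Only pass `allow_pickle=True` for files you trust, since unpickling can execute arbitrary code.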

There are 5000 training examples in `data_nn.npy`, where each training example is a 20 pixel by 20 pixel grayscale image of the digit. Each pixel is represented by a floating point number indicating the grayscale intensity at that location. The 20 by 20 grid of pixels is "unrolled" into a 400-dimensional vector. Each of these training examples becomes a single row in our data matrix `X`. This gives us a 5000 by 400 matrix `X` where every row is a training example for a handwritten digit image.
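As a small illustration of the unrolling described above, the following sketch flattens a 20 by 20 array into a 400-element row and recovers the grid again. The array `image` is synthetic stand-in data, not taken from `data_nn.npy`. The row-major order used by NumPy's default `reshape` is an assumption; if the dataset was unrolled column by column, `order='F'` would be needed instead.

```python
import numpy as np

# Synthetic stand-in for one 20x20 grayscale image (not real dataset values)
image = np.arange(400, dtype=float).reshape(20, 20)

# "Unroll" the grid into a single 400-dimensional row vector
row = image.reshape(-1)

# Reverse the operation to recover the 20x20 grid
restored = row.reshape(20, 20)

print(row.shape)       # (400,)
print(restored.shape)  # (20, 20)
```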

The second part of the training set is a 5000-dimensional vector `y` that contains labels for the training set.

In [None]:
# Load dataset into matrices X and y
import numpy as np

def load_dataset(filedata):
    """ Load the content from data_nn.npy """
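    # The file stores a pickled dict, so allow_pickle=True is required on NumPy >= 1.16.3
    data = np.load(filedata, allow_pickle=True).item()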
    X = data['X']
    y = data['y']
    return X, y

X, y = load_dataset('data/data_nn.npy')
X = np.matrix(X)
y = np.matrix(y)
print(X.shape)
print(y.shape)

## Preprocessing labels

Because the network outputs one score per class, we first convert each label into a one-hot vector: a vector with one entry per class that is 1 at the position assigned to that label and 0 everywhere else. For example, class `0` in our dataset is represented as the one-hot vector:

```
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Class `1` in our dataset is represented as:


```
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

and so on.
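To make the encoding concrete, here is a minimal NumPy sketch. The helper `one_hot` and the toy labels are hypothetical and not part of the exercise; the sketch assigns one column per distinct label in sorted order, which is also how `OneHotEncoder` orders its categories by default.

```python
import numpy as np

def one_hot(labels):
    """Hypothetical helper: one-hot encode labels, one column per distinct value."""
    labels = np.asarray(labels).ravel()
    # classes holds the sorted distinct labels; inverse is each label's column index
    classes, inverse = np.unique(labels, return_inverse=True)
    encoded = np.zeros((labels.size, classes.size))
    encoded[np.arange(labels.size), inverse] = 1.0
    return classes, encoded

toy_labels = [10, 1, 3, 10]  # made-up labels, not read from the dataset
classes, encoded = one_hot(toy_labels)
print(classes)
print(encoded)
```

Each row of `encoded` contains exactly one 1, placed in the column that belongs to that row's label.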

In [None]:
# Convert labels into one hot encoding
from sklearn.preprocessing import OneHotEncoder

# sparse_output=False returns a dense array (on scikit-learn < 1.2 use sparse=False)
encoder = OneHotEncoder(sparse_output=False)
y_onehot = encoder.fit_transform(y)
y_onehot = np.matrix(y_onehot)
y_onehot

## Visualizing the data

The first thing you should do is visualize the data loaded into `X`. To do so, randomly select 100 rows from `X`, map each row to a 20 pixel by 20 pixel grayscale image, and display the images together. The resulting image should look like this:

<img src='data/mnist.png' width="40%"/>
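One possible way to arrange the sampled rows into a single picture is sketched below on random stand-in data. The names `X_toy`, `idx`, and `mosaic` are hypothetical, and the sketch assumes each row was unrolled in row-major order; it only builds the 200 by 200 array, which could then be passed to a plotting call such as `plt.imshow(mosaic, cmap='gray')`.

```python
import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.random((500, 400))  # random stand-in for the data matrix

# Pick 100 distinct rows at random
idx = rng.choice(X_toy.shape[0], size=100, replace=False)

# Arrange the 100 images on a 10x10 grid of 20x20 tiles
tiles = X_toy[idx].reshape(10, 10, 20, 20)   # (grid row, grid col, pixel row, pixel col)
mosaic = tiles.transpose(0, 2, 1, 3).reshape(200, 200)

print(mosaic.shape)  # (200, 200)
```

If the digits appear rotated when plotted with real data, the images were probably unrolled column by column and each tile needs to be transposed.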

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

def display_image(X):
    """ From matrix X, display 100 random images into a single figure"""
    # YOUR CODE HERE
    pass

display_image(X)

## Model representation

Our neural network is shown in the image below. It has 3 layers: an input layer, a hidden layer, and an output layer. Recall that our inputs are pixel values of digit images. Since the images are of size 20×20, this gives us 400 input layer units (excluding the extra bias unit, which always outputs +1).

You have been provided with a set of network parameters ($\theta^{(1)}$, $\theta^{(2)}$) that have already been trained. These are stored in [`weights_nn.npy`](https://goo.gl/grDJAS) and will be loaded into `theta1` and `theta2`. The parameters have dimensions that are sized for a neural network with 25 units in the second layer and 10 output units (corresponding to the 10 digit classes).

<img src="data/neuralnet.png" width="40%"/>

In [None]:
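def load_weights(filedata):
    """ Load the content from weights_nn.npy """
    # The file stores a pickled dict, so allow_pickle=True is required on NumPy >= 1.16.3
    data = np.load(filedata, allow_pickle=True).item()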
    theta1 = data['theta1']
    theta2 = data['theta2']
    return theta1, theta2

theta1, theta2 = load_weights('data/weights_nn.npy')
theta1 = np.matrix(theta1)
theta2 = np.matrix(theta2)
print(theta1.shape)
print(theta2.shape)

## Feedforward Propagation and Prediction

Now you will implement feedforward propagation for the neural network. You should implement the feedforward computation that computes $h_\theta(x^{(i)})$ for every example `i` and returns the associated predictions, where the prediction from the neural network will be the label that has the largest output $(h_\theta(x))_k$.

**Implementation Note**: The matrix `X` contains the examples in rows, so you will need to add a column of 1's to it for the bias unit. The matrices `theta1` and `theta2` contain the parameters for each unit in rows. Specifically, the first row of `theta1` corresponds to the first hidden unit in the second layer. When you compute $z^{(2)} = \theta^{(1)}a^{(1)}$, be sure that you index (and if necessary, transpose) `X` correctly so that you get $a^{(l)}$ as a column vector.

If your predict function is correct, using the loaded parameters `theta1` and `theta2` should give an accuracy of about 97.5%.
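The shapes involved can be checked on a tiny network before working with the real data. The sketch below is one possible arrangement, not the reference solution: it keeps examples in rows (so it multiplies by the transposed weight matrices instead of building column vectors), uses plain arrays rather than `np.matrix`, and runs on random toy weights. All names ending in `_sketch` or `_toy` are hypothetical.

```python
import numpy as np

def sigmoid_sketch(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_sketch(X, theta1, theta2):
    """Forward pass for a 3-layer network with examples stored in rows."""
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])                    # add bias column: (m, n + 1)
    z2 = a1 @ theta1.T                                      # (m, hidden)
    a2 = np.hstack([np.ones((m, 1)), sigmoid_sketch(z2)])   # add bias column: (m, hidden + 1)
    z3 = a2 @ theta2.T                                      # (m, outputs)
    h = sigmoid_sketch(z3)
    return a1, z2, a2, z3, h

# Toy network: 4 inputs, 3 hidden units, 2 outputs, 5 examples
rng = np.random.default_rng(0)
X_toy = rng.random((5, 4))
theta1_toy = rng.normal(size=(3, 5))   # 3 hidden units, 4 inputs + bias
theta2_toy = rng.normal(size=(2, 4))   # 2 outputs, 3 hidden units + bias

a1, z2, a2, z3, h = forward_sketch(X_toy, theta1_toy, theta2_toy)
pred = np.argmax(h, axis=1) + 1        # 1-based index of the largest output
print(h.shape, pred.shape)
```

By the same reasoning, the real parameters would be expected to have shapes 25 by 401 and 10 by 26, since each layer's weight matrix has one extra column for the bias unit.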

In [None]:
def sigmoid(z):
    """ Activation function """
    # YOUR CODE HERE
    pass

def forward(X, theta1, theta2):
    """ Apply the forward propagation """
    # YOUR CODE HERE
    pass

In [None]:
# Test the forward function
a1, z2, a2, z3, h = forward(X, theta1, theta2)
h.shape

In [None]:
# argmax returns a 0-based column index; +1 shifts it to match the labels in y
y_pred = np.array(np.argmax(h, axis=1)+1)
correct = [1 if a == b else 0 for (a, b) in zip(y_pred, y)]
accuracy = (sum(map(int, correct)) / float(len(correct)))

print('Accuracy with loaded weights must be 97.52%')
print('Predicted accuracy = %.2f' % (accuracy * 100))

## Cost Function

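You should write a vectorized version of the cost function. With $K = 10$ output units and one-hot labels $y^{(i)}$, the cost function is

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[-y_k^{(i)} \log\left((h_\theta(x^{(i)}))_k\right) - (1 - y_k^{(i)}) \log\left(1 - (h_\theta(x^{(i)}))_k\right)\right]$$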
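The formula can be tried out on a handful of made-up numbers before plugging in the network. The sketch below takes already computed outputs `h_toy` and one-hot labels `y_toy` (both invented for illustration) and evaluates the cross-entropy; summing over every output unit as well as every example is an assumption about how the formula applies to one-hot labels. The helper name `cross_entropy_cost` is hypothetical.

```python
import numpy as np

def cross_entropy_cost(h, y_onehot):
    """Average cross-entropy over m examples, summed over all output units."""
    # Convert to plain arrays so that * is element-wise;
    # on np.matrix objects * would be a matrix product instead
    h = np.asarray(h, dtype=float)
    y = np.asarray(y_onehot, dtype=float)
    m = y.shape[0]
    per_term = -y * np.log(h) - (1.0 - y) * np.log(1.0 - h)
    return per_term.sum() / m

# Two examples, two classes (toy values)
h_toy = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
y_toy = np.array([[1, 0],
                  [0, 1]])

J_toy = cross_entropy_cost(h_toy, y_toy)
print('Toy cost: %f' % J_toy)
```

Because this notebook stores `X`, `y_onehot`, and the weights as `np.matrix` objects, an implementation that works on them directly would need `np.multiply` for the element-wise products.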


In [None]:
def cost_function(X, y, theta1, theta2):
    """ Compute the cost for the parameters theta1 and theta2 """
    # YOUR CODE HERE
    pass

In [None]:
# Test cost function
J = cost_function(X, y_onehot, theta1, theta2)
print('Your current cost should be: 0.287629')
print('Current cost : %f' % J)

In [1]:
# initial setup :: set constants
NB_HIDDEN = 25
LEARNING_RATE = 1.0