# Neural Networks
In this exercise, we implement a neural network to recognize handwritten digits using the MNIST training set. 
The neural network will be able to represent complex models that form non-linear hypotheses. 
For this week, we will be using parameters from a neural network that we have already trained. 
The goal is to implement the feedforward propagation algorithm to use our weights for prediction.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.io  # provides scipy.io.loadmat for reading .mat files

In [3]:
data1 = scipy.io.loadmat('./data/ex3data1.mat')
X1 = data1['X']
y1 = data1['y']
y1 = y1.ravel()   # Flatten the (5000, 1) column vector into a 1-D array
X1.shape, y1.shape

((5000, 400), (5000,))

## Model representation
This neural network has 3 layers – an input layer, a hidden layer and an output layer. 
Recall that our inputs are pixel values of digit images. Since the images are of size 20×20, this gives us 400 input layer units (excluding the extra bias unit which always outputs +1). 
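
As a rough sketch of how one 400-value row maps back to a 20×20 image: the `.mat` files come from MATLAB/Octave, which unrolls matrices column by column, so the assumption here is that a column-major reshape (`order='F'`) recovers the upright digit. The helper name `row_to_image` is hypothetical, and the input is synthetic rather than a real digit.

```python
import numpy as np

def row_to_image(row, side=20):
    """Reshape a flat vector of side*side pixels into a (side, side) image.

    Assumes column-major (MATLAB-style) unrolling, hence order='F'.
    """
    row = np.asarray(row)
    if row.size != side * side:
        raise ValueError(f"expected {side * side} values, got {row.size}")
    return row.reshape((side, side), order='F')

# Synthetic example: pixel k of the flat row simply holds the value k.
demo = row_to_image(np.arange(400))
print(demo.shape)    # (20, 20)
print(demo[1, 0])    # 1  -> consecutive values run down a column
print(demo[0, 1])    # 20 -> the next column starts 20 values later
```

With real data, something like `plt.imshow(row_to_image(X1[0]), cmap='gray')` would display the first example, if the column-major assumption holds for this file.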

We have been provided with a set of network parameters $(\Theta^{(1)}, \Theta^{(2)})$
that have already been trained. These are stored in `ex3weights.mat` and will be
loaded into `Theta1` and `Theta2`.

The parameters are sized for a neural network with 25 units in the second (hidden)
layer and 10 output units (corresponding to the 10 digit classes).

![Neural Network Model](./data/neuralnetwork.png)

In [4]:
weights = scipy.io.loadmat('./data/ex3weights.mat')
Theta1 = weights['Theta1']
Theta2 = weights['Theta2']
Theta1.shape, Theta2.shape

((25, 401), (10, 26))
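
The shapes above fit together because each weight matrix has one row per unit in the next layer and one column per unit in the previous layer, plus one column for the bias. A minimal sketch of that consistency check follows; `layer_sizes` is a hypothetical helper, and zero matrices stand in for the trained weights.

```python
import numpy as np

def layer_sizes(n_inputs, thetas):
    """Return the unit count of every layer (bias units excluded).

    Raises ValueError if a weight matrix does not have exactly one column
    per incoming unit plus one for the bias.
    """
    sizes = [n_inputs]
    for idx, theta in enumerate(thetas, start=1):
        n_out, n_in_plus_bias = theta.shape
        if n_in_plus_bias != sizes[-1] + 1:
            raise ValueError(
                f"Theta{idx} has {n_in_plus_bias} columns, "
                f"expected {sizes[-1] + 1}"
            )
        sizes.append(n_out)
    return sizes

# Placeholder matrices with the same shapes as Theta1 and Theta2.
sizes = layer_sizes(400, [np.zeros((25, 401)), np.zeros((10, 26))])
print(sizes)  # [400, 25, 10]
```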

## Feedforward Propagation and Prediction
Now we implement feedforward propagation for the neural network. 
The `predict` function returns the network's output activations for one example, one value per digit class.

We implement the feedforward computation that computes $h_\theta(x^{(i)})$ for every example $i$ and returns the associated predictions. Similar to the one-vs-all classification strategy, the prediction from the neural network will be the label that has the largest output, $(h_\theta(x))_k$.
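
Written out for this three-layer network, the computation can be sketched as follows. Here $g$ is the sigmoid function, and $a^{(l)}$ and $z^{(l)}$ are notation assumed for this sketch (the activations and weighted inputs of layer $l$); they do not appear elsewhere in this notebook.

$$
\begin{aligned}
a^{(1)} &= \begin{bmatrix} 1 \\ x \end{bmatrix}, \\
z^{(2)} &= \Theta^{(1)} a^{(1)}, \qquad a^{(2)} = \begin{bmatrix} 1 \\ g(z^{(2)}) \end{bmatrix}, \\
z^{(3)} &= \Theta^{(2)} a^{(2)}, \qquad h_\theta(x) = g(z^{(3)}), \\
g(z) &= \frac{1}{1 + e^{-z}}, \qquad \text{prediction} = \arg\max_k \, (h_\theta(x))_k .
\end{aligned}
$$

With the weight shapes in this exercise, $a^{(1)}$ has 401 entries, $a^{(2)}$ has 26, and $h_\theta(x)$ has 10, one per digit class.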

Finally, we call the `predict` function with the loaded parameters `Theta1` and `Theta2` on all the data and calculate the accuracy of the model. We should see an accuracy of about 97.5%. 

In [5]:
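def activation(x, Theta):
    # Sigmoid of the weighted input: g(Theta @ x)
    return 1 / (1 + np.exp(-Theta @ x))

def predict(x, Thetas=(Theta1, Theta2)):
    # Feed one example forward through the network, layer by layer
    layer = x
    for Theta in Thetas:
        layer = np.hstack((1.0, layer))   # prepend the bias unit (+1)
        layer = activation(layer, Theta)
    return layer                          # output activations, one per class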
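
The same feedforward pass can also be applied to all examples at once with matrix products, instead of one row at a time. The sketch below is one possible way to do that, not the notebook's own method. `predict_batch` and `sigmoid` are hypothetical names, and random matrices stand in for the trained weights, so the outputs carry no meaning beyond their shape.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_batch(X, thetas):
    """Feed every row of X through the network in one pass.

    X has shape (m, n_inputs); the result has shape (m, n_outputs).
    """
    A = np.asarray(X, dtype=float)
    for theta in thetas:
        ones = np.ones((A.shape[0], 1))           # bias column, one per example
        A = sigmoid(np.hstack((ones, A)) @ theta.T)
    return A

# Placeholder weights and inputs with the shapes used in this exercise.
rng = np.random.default_rng(0)
thetas_demo = [rng.normal(size=(25, 401)), rng.normal(size=(10, 26))]
X_demo = rng.random((5, 400))

H = predict_batch(X_demo, thetas_demo)
print(H.shape)  # (5, 10)
```

Each row of `H` should match what the per-example loop produces for the corresponding input, since the only change is that the bias is prepended as a column and the weights are applied as `theta.T`.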

In [6]:
n = 1000 # let's predict the example at index 1000
print(predict(X1[n]))
print(np.round(predict(X1[n])))
y1[n], np.argmax(predict(X1[n])) + 1

[3.85613838e-04 9.68544083e-01 1.92134752e-03 1.38526834e-04
 3.20810992e-03 7.01713717e-04 6.45235792e-04 1.66993752e-02
 1.00700486e-01 3.25396474e-03]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]


(2, 2)
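
The `+ 1` after `np.argmax` is needed because the labels are 1-indexed and the digit 0 is stored as label 10. A small sketch of that mapping, using made-up output activations and hypothetical helper names (`outputs_to_labels`, `labels_to_digits`):

```python
import numpy as np

def outputs_to_labels(H):
    """Pick the most active output unit and shift to 1-based labels (1..10)."""
    return np.argmax(H, axis=-1) + 1

def labels_to_digits(labels):
    """Map label 10 back to the digit 0; labels 1..9 are unchanged."""
    return np.asarray(labels) % 10

# Made-up activations for two examples (10 output units each).
H_demo = np.array([
    [0.1, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # unit 1 (0-based) wins
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8],  # unit 9 (0-based) wins
])

labels = outputs_to_labels(H_demo)
print(labels.tolist())                    # [2, 10]
print(labels_to_digits(labels).tolist())  # [2, 0]
```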

In [7]:
# Note: the labels are 1-indexed, with the digit 0 mapped to label 10.
# Output unit k (0-based) corresponds to label k + 1, so we add one to argmax.
y_true = y1
y_hat = np.array([np.argmax(predict(X1[i]))+1 for i in range(len(X1))])

In [8]:
accuracy = np.mean(y_hat == y_true)   # fraction of correctly classified examples
accuracy

0.9752