# Neural Networks 
In the first part of this exercise, we will implement feedforward propagation, as we did in the second part of the previous exercise. Let's first load the data set.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat
from scipy import optimize

In [2]:
data = loadmat('ex4data1.mat')
X = data['X']
X = np.insert(X, 0, 1, axis=1) # insert column of 1s into X to account for bias
y = data['y']
y[y == 10] = 0 # replace all of the 10s with 0s
display(X.shape, y.shape)

(5000, 401)

(5000, 1)
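To make the two preprocessing steps above concrete, here is a minimal sketch on a toy array. The values in `X_toy` and `y_toy` are made up for illustration and do not come from `ex4data1.mat`.

```python
import numpy as np

# Toy stand-ins for data['X'] and data['y'] (hypothetical values)
X_toy = np.array([[0.5, 0.2],
                  [0.1, 0.9],
                  [0.3, 0.3]])
y_toy = np.array([[10], [1], [2]])

X_toy = np.insert(X_toy, 0, 1, axis=1)  # prepend a column of 1s as the bias feature
y_toy[y_toy == 10] = 0                  # the label 10 stands for the digit 0

print(X_toy.shape)    # (3, 3)
print(X_toy[:, 0])    # [1. 1. 1.]
print(y_toy.ravel())  # [0 1 2]
```

The same logic explains the shapes printed above: 400 pixel features plus one bias column give 401 columns.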

As in the previous exercise, we are provided with the initial weights. There are two weight matrices, so our neural network has three layers in total: an input layer, one hidden layer, and an output layer. Let's load these weights.

In [3]:
weights = loadmat('ex4weights.mat')
theta1 = weights['Theta1']
theta2 = weights['Theta2']
display(theta1.shape, theta2.shape)

(25, 401)

(10, 26)

Now, we will implement the cost function and gradient for the neural network. The function `feedForward` is used as a helper function to get the predicted output.
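For reference, the cost computed by the `cost` function below can be written as follows. The notation is an assumption on my part rather than something fixed by this notebook: $m$ is the number of examples, $K$ is `num_labels`, $\lambda$ is `regParam`, $h_\Theta(x^{(i)})_k$ is the $k$-th output of `feedForward` for example $i$, and $y_k^{(i)}$ is 1 when example $i$ belongs to class $k$ and 0 otherwise. The sums over $j, k$ in the regularization term run over the non-bias entries of each weight matrix only.

$$
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)} \log h_\Theta(x^{(i)})_k + \left(1 - y_k^{(i)}\right)\log\left(1 - h_\Theta(x^{(i)})_k\right)\right] + \frac{\lambda}{2m}\left[\sum_{j,k}\left(\Theta^{(1)}_{j,k}\right)^2 + \sum_{j,k}\left(\Theta^{(2)}_{j,k}\right)^2\right]
$$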

In [4]:
def sigmoid(x):
    return 1/(1 + np.exp(-x)) # logistic function, applied elementwise
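A quick sanity check of the sigmoid, as a sketch: it should return 0.5 at zero and satisfy the symmetry sigmoid(-z) = 1 - sigmoid(z). The test vector `z` is an arbitrary choice for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 2.0])  # arbitrary test inputs

print(sigmoid(0.0))                               # 0.5
print(np.allclose(sigmoid(-z), 1 - sigmoid(z)))   # True
```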

In [27]:
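def feedForward(theta1, theta2, X):
    input_layer = X.T # X already contains the bias column; avoid shadowing the built-in `input`
    second_layer = sigmoid(theta1 @ input_layer) # hidden layer activations
    second_layer = np.insert(second_layer, 0, 1, axis=0) # add the bias unit as the first row
    output = sigmoid(theta2 @ second_layer)
    # The weights follow the MATLAB labels 1..10, where 10 stands for the digit 0.
    # Rolling by one moves that last column to index 0, so column k matches label k.
    return np.roll(output.T, 1, axis=1)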
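The effect of `np.roll` on the output columns is easy to miss, so here is a small sketch with a made-up output matrix. It uses 3 classes instead of 10, with the last column playing the role of the wrapped-around label for the digit 0.

```python
import numpy as np

# Hypothetical network outputs for 2 examples, columns in MATLAB label order:
# label 1, label 2, and the last label (which stands for the digit 0).
output = np.array([[0.1, 0.2, 0.7],
                   [0.8, 0.1, 0.1]])

rolled = np.roll(output, 1, axis=1)  # the last column wraps around to the front
print(rolled)
# [[0.7 0.1 0.2]
#  [0.1 0.8 0.1]]
print(rolled.argmax(axis=1))  # [0 1]
```

After rolling, the column index with the highest score is the predicted digit itself, which matches the labels once the 10s have been replaced with 0s.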

In [44]:
def cost(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, regParam):
    m = np.size(X, 0)
    theta1 = np.reshape(nn_params[:hidden_layer_size*(input_layer_size+1)], (hidden_layer_size, input_layer_size+1))
    theta2 = np.reshape(nn_params[hidden_layer_size*(input_layer_size+1):], (num_labels, hidden_layer_size+1))
    
    h = feedForward(theta1, theta2, X)
    
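    # Accumulate the cross-entropy terms one class at a time;
    # column k of h holds the predicted probability for label k.
    J = 0
    for k in range(num_labels):
        y_class = ((y == k).astype(int)).T # (1, m) indicator: 1 where the label is k
        J += (y_class*np.log(h[:,k])+(1-y_class)*np.log(1-h[:,k])).sum()
    
    # Bias weights are not regularized, so drop the first column of each matrix.
    theta1 = np.delete(theta1, 0, 1)
    theta2 = np.delete(theta2, 0, 1)
    reg = (regParam/(2*m))*((theta1**2).sum()+(theta2**2).sum())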

    return -J/m + reg
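The per-class loop in `cost` can also be written without a loop by one-hot encoding the labels. The sketch below is an alternative, not the approach used in this notebook. The function names and the random toy data are hypothetical, and it covers only the unregularized part of the cost.

```python
import numpy as np

def unregularized_cost_vectorized(h, y, num_labels):
    """Cross-entropy cost from predictions h (m x num_labels) and integer labels y (m x 1)."""
    m = h.shape[0]
    Y = np.eye(num_labels)[y.ravel()]  # one-hot rows: Y[i, k] = 1 if y[i] == k
    return -(Y * np.log(h) + (1 - Y) * np.log(1 - h)).sum() / m

def unregularized_cost_loop(h, y, num_labels):
    """The same quantity, accumulated one class at a time."""
    m = h.shape[0]
    J = 0
    for k in range(num_labels):
        y_class = (y == k).astype(int).ravel()
        J += (y_class * np.log(h[:, k]) + (1 - y_class) * np.log(1 - h[:, k])).sum()
    return -J / m

rng = np.random.default_rng(0)
h_toy = rng.uniform(0.05, 0.95, size=(6, 3))  # made-up predictions, kept away from 0 and 1
y_toy = rng.integers(0, 3, size=(6, 1))       # made-up labels in {0, 1, 2}

print(np.isclose(unregularized_cost_vectorized(h_toy, y_toy, 3),
                 unregularized_cost_loop(h_toy, y_toy, 3)))  # True
```

The one-hot version does the same arithmetic in a single array expression, which tends to be easier to reuse later when computing gradients.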

In order to test out our cost function, we need to do a little bit of initialization first.

In [49]:
nn_params = np.concatenate([theta1.flatten(), theta2.flatten()])
input_layer_size = 400 # 20x20 matrix of pixels 
hidden_layer_size = 25 # 25 hidden layer units 
num_labels = 10 # 10 output units
regParam = 0
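The `cost` function expects both weight matrices unrolled into a single vector, which it then splits and reshapes internally. Here is a minimal round-trip sketch of that convention, using toy layer sizes and random weights (all hypothetical) in place of the real 400/25/10 network.

```python
import numpy as np

input_layer_size, hidden_layer_size, num_labels = 4, 3, 2  # toy sizes for illustration
rng = np.random.default_rng(0)
t1 = rng.normal(size=(hidden_layer_size, input_layer_size + 1))
t2 = rng.normal(size=(num_labels, hidden_layer_size + 1))

params = np.concatenate([t1.flatten(), t2.flatten()])  # unroll into one vector

split = hidden_layer_size * (input_layer_size + 1)     # where theta1 ends
t1_back = params[:split].reshape(hidden_layer_size, input_layer_size + 1)
t2_back = params[split:].reshape(num_labels, hidden_layer_size + 1)

print(params.shape)  # (23,)
print(np.array_equal(t1, t1_back) and np.array_equal(t2, t2_back))  # True
```

Keeping the parameters in one flat vector is convenient because optimizers such as those in `scipy.optimize` work on a single 1-D array.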

In [50]:
cost(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, regParam)

0.2876291651613189

Our cost function correctly gives a 