# Neural Networks 
In the first part of this exercise, we will implement feedforward propagation, as we did in the second part of the previous exercise. Let's first load the data set.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat
from scipy import optimize

In [60]:
data = loadmat('ex4data1.mat')
X = data['X']
X = np.insert(X, 0, 1, axis=1) # insert column of 1s into X to account for bias
y = data['y']
y[y == 10] = 0 # replace all of the 10s with 0s
display(X.shape, y.shape)
y

(5000, 401)

(5000, 1)

array([[0],
       [0],
       [0],
       ...,
       [9],
       [9],
       [9]], dtype=uint8)

As in the previous exercise, we are provided with the initial weights. There are two weight matrices, so our neural network has three layers in total: an input layer, one hidden layer, and an output layer. Let's load these weights in.

In [3]:
weights = loadmat('ex4weights.mat')
theta1 = weights['Theta1']
theta2 = weights['Theta2']
display(theta1.shape, theta2.shape)

(25, 401)

(10, 26)

Now we will implement the cost function for the neural network; the gradient follows in the backpropagation section. The function `feedForward` is a helper that returns the predicted output for every example.

In [4]:
def sigmoid(x):
    return 1/(1 + np.exp(-x))

In [5]:
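def feedForward(theta1, theta2, X):
    a1 = X.T # one example per column; the bias row is already part of X
    a2 = sigmoid(theta1 @ a1)
    a2 = np.insert(a2, 0, 1, axis=0) # add the bias unit to the hidden layer
    output = sigmoid(theta2 @ a2)
    # the provided weights order the output units as labels 1..10, where MATLAB's 10 means digit 0;
    # rolling by one column moves that last unit to index 0 so that column k matches digit k
    return np.roll(output.T, 1, axis=1)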
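
The column shift at the end of `feedForward` can be checked in isolation. This is a minimal sketch, assuming the provided weights order the output units as labels 1, 2, ..., 9, 10 (with 10 standing for the digit 0); the `scores` row is made up for illustration.

```python
import numpy as np

# Hypothetical output row: column j holds the score for MATLAB label j+1,
# so the last column (label 10) is the score for digit 0.
scores = np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]])

# Shift every column one place to the right; the last column wraps to index 0.
rolled = np.roll(scores, 1, axis=1)

# After the shift, column k holds the score for digit k.
predicted_digit = int(np.argmax(rolled, axis=1)[0])
print(rolled[0, 0], predicted_digit)
```

Here the largest score sat in the last column before the shift, so the predicted digit after the shift is 0.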

In [6]:
def cost(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, regParam):
    m = np.size(X, 0)
    theta1 = np.reshape(nn_params[:hidden_layer_size*(input_layer_size+1)], (hidden_layer_size, input_layer_size+1))
    theta2 = np.reshape(nn_params[hidden_layer_size*(input_layer_size+1):], (num_labels, hidden_layer_size+1))
    
    h = feedForward(theta1, theta2, X)
    
    J = 0
    for k in range(num_labels):
        y_class = ((y == k).astype(int)).T
        J += (y_class*np.log(h[:,k])+(1-y_class)*np.log(1-h[:,k])).sum()
    
    theta1 = np.delete(theta1, 0, 1) # remove bias term
    theta2 = np.delete(theta2, 0, 1) # remove bias term
    reg = (regParam/(2*m))*((theta1**2).sum()+(theta2**2).sum())

    return -J/m + reg
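
For reference, the quantity returned by `cost` can be written out as below. This is read directly off the code rather than quoted from the exercise text, and the notation is an assumption: $h_k(x^{(i)})$ is column $k$ of `h` for example $i$, $y_k^{(i)}$ is 1 when the label of example $i$ equals $k$ and 0 otherwise, $K$ is `num_labels`, $\lambda$ is `regParam`, $s_1$ and $s_2$ are `input_layer_size` and `hidden_layer_size`, and column index 0 of each weight matrix is the bias column, which is left out of the regularization term.

$$ J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=0}^{K-1}\left[ y_k^{(i)}\log h_k(x^{(i)}) + \left(1-y_k^{(i)}\right)\log\left(1-h_k(x^{(i)})\right)\right] + \frac{\lambda}{2m}\left[\sum_{j=1}^{s_2}\sum_{l=1}^{s_1}\left(\Theta^{(1)}_{jl}\right)^2 + \sum_{j=1}^{K}\sum_{l=1}^{s_2}\left(\Theta^{(2)}_{jl}\right)^2\right] $$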

In order to test out our cost function, we need to do a little bit of initialization first.

In [7]:
nn_params = np.concatenate([theta1.flatten(), theta2.flatten()])
input_layer_size = 400 # 20x20 matrix of pixels 
hidden_layer_size = 25 # 25 hidden layer units 
num_labels = 10 # 10 output units
regParam = 0

In [8]:
cost(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, regParam)

0.2876291651613189

Our cost function gives the expected cost of about 0.287629.

Time to move on to implementing the steps necessary for backpropagation!

# Backpropagation

The backpropagation algorithm computes the gradient of the cost function for the neural network. Once we have the gradient, we will be able to train the neural network by minimizing the cost function $J(\Theta)$ using an advanced optimizer.
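
To make the role of the gradient concrete, here is a minimal sketch of how a cost and its gradient are handed to an optimizer. It uses a toy quadratic in place of $J(\Theta)$, and the choice of `scipy.optimize.minimize` with `method="CG"` is an assumption for illustration, not necessarily the optimizer this notebook ends up using; `target` and `cost_and_grad` are hypothetical names.

```python
import numpy as np
from scipy import optimize

# Toy objective standing in for J(Theta): a quadratic bowl centred on `target`.
target = np.array([1.0, -2.0, 3.0])

def cost_and_grad(params):
    diff = params - target
    cost = 0.5 * np.sum(diff ** 2)  # scalar cost
    grad = diff                     # gradient, same shape as params
    return cost, grad

# jac=True tells minimize that the function returns (cost, gradient) together.
result = optimize.minimize(cost_and_grad, x0=np.zeros(3), jac=True, method="CG")
print(result.x)
```

The optimizer only ever sees a flat parameter vector, which is why the weight matrices in this notebook are unrolled into `nn_params`.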

To get started, we will first have to implement the sigmoid gradient function. This function will be used when we are computing the "error" at each layer.

The gradient for the sigmoid function is as follows: 

$$ g'(z) = \frac{d}{dz}g(z) = g(z)(1-g(z)) $$

In [9]:
def sigmoidGrad(z):
    return sigmoid(z)*(1-sigmoid(z))
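
A quick sanity check of the formula, as a sketch: the analytic gradient should agree with a central finite difference, and at $z = 0$ it should equal $0.5 \cdot 0.5 = 0.25$. The functions are redefined here (under the hypothetical name `sigmoid_grad`) so the snippet runs on its own; the test points in `z` are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
eps = 1e-5

# Central difference approximation of g'(z).
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
max_diff = np.max(np.abs(numeric - sigmoid_grad(z)))

print(sigmoid_grad(0.0))  # 0.25
print(max_diff)
```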

One last step before we start implementing the backpropagation algorithm is some initialization. In particular, we need to randomly initialize the weights $\Theta$ so that they lie in the range $[-\epsilon_{init}, \epsilon_{init}]$. Random initialization is important for symmetry breaking: if all of the initial values are the same, every hidden unit computes the same function and receives the same update, so the units never become different from one another.

In [10]:
epsilon_init = 0.12

# L_in is the number of incoming connections
# L_out is the number of outgoing connections
def randInitWeights(L_in, L_out): 
    W = np.random.rand(L_out, L_in+1)
    W = W * 2 * epsilon_init
    W = W - epsilon_init
    return W

theta1 = randInitWeights(input_layer_size, hidden_layer_size)
theta2 = randInitWeights(hidden_layer_size, num_labels)
nn_params = np.concatenate([theta1.flatten(), theta2.flatten()]) # unroll the parameters
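
The symmetry problem can be seen on a tiny example. This is a sketch under made-up sizes (4 inputs, 3 hidden units) with a seeded generator so it is repeatable; it only compares hidden activations and does not train anything.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)      # seeded only to make the sketch repeatable
epsilon_init = 0.12
x = np.insert(rng.random(4), 0, 1)  # one made-up example with a leading bias 1

# Constant weights: every hidden unit computes the same activation.
W_const = np.full((3, 5), 0.5)
a_const = sigmoid(W_const @ x)

# Uniform weights in [-epsilon_init, epsilon_init): the units start out different.
W_rand = rng.random((3, 5)) * 2 * epsilon_init - epsilon_init
a_rand = sigmoid(W_rand @ x)

spread_const = a_const.max() - a_const.min()
spread_rand = a_rand.max() - a_rand.min()
print(spread_const, spread_rand)
```

With constant weights the spread across hidden units is zero, so each unit would also receive an identical gradient; with random weights the units differ from the first forward pass.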

Alright, the time has finally come to implement the backpropagation algorithm!

In [108]:
def backProp(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, regParam):
    m = np.size(X, 0)
    theta1 = np.reshape(nn_params[:hidden_layer_size*(input_layer_size+1)], (hidden_layer_size, input_layer_size+1))
    theta2 = np.reshape(nn_params[hidden_layer_size*(input_layer_size+1):], (num_labels, hidden_layer_size+1))
    D1 = np.zeros((hidden_layer_size, input_layer_size+1)) # np.zeros takes the shape as a tuple
    D2 = np.zeros((num_labels, hidden_layer_size+1))
    
    # loop through every training example
    for t in range(m): 
        # forward pass for example t
        a1 = X[t,:]
        z2 = theta1 @ a1
        a2 = np.insert(sigmoid(z2), 0, 1, axis=0) # add the bias unit
        z3 = theta2 @ a2
        a3 = sigmoid(z3)
        
        # one-hot encode the label; no np.roll here, so output unit k corresponds directly to digit k
        yt = np.zeros(num_labels)
        yt[y[t][0]] = 1
        
        # "error" at the output layer, then at the hidden layer (the bias entry is dropped)
        d3 = a3 - yt
        d2 = (theta2.T @ d3)[1:]*sigmoidGrad(z2)
        
        # accumulate the gradients; np.outer is needed because all of these vectors are 1-D
        D1 += np.outer(d2, a1)
        D2 += np.outer(d3, a2)
    
    # average over the examples, then regularize every column except the bias column
    grad1 = D1/m
    grad2 = D2/m
    grad1[:,1:] += (regParam/m)*theta1[:,1:]
    grad2[:,1:] += (regParam/m)*theta2[:,1:]
    return np.concatenate([grad1.flatten(), grad2.flatten()]) # unroll the gradients

In [109]:
result = backProp(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, regParam)
result.shape

(10285,)
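
A common way to validate a backpropagation implementation is to compare its gradient against a numerical estimate on a very small network. The sketch below is self-contained and does not call the notebook's functions: `unpack`, `tiny_cost`, `tiny_grad`, and `numerical_grad` are hypothetical helpers, the network sizes and data are made up, and regularization is left out to keep it short.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def unpack(params, n_in, n_hid, n_out):
    # Split the flat parameter vector back into the two weight matrices.
    split = n_hid * (n_in + 1)
    t1 = params[:split].reshape(n_hid, n_in + 1)
    t2 = params[split:].reshape(n_out, n_hid + 1)
    return t1, t2

def tiny_cost(params, n_in, n_hid, n_out, X, Y):
    # Unregularized cross-entropy cost; X has a bias column, Y is one-hot.
    t1, t2 = unpack(params, n_in, n_hid, n_out)
    a2 = np.insert(sigmoid(X @ t1.T), 0, 1, axis=1)
    h = sigmoid(a2 @ t2.T)
    return -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / X.shape[0]

def tiny_grad(params, n_in, n_hid, n_out, X, Y):
    # Vectorized backpropagation for the same network.
    t1, t2 = unpack(params, n_in, n_hid, n_out)
    m = X.shape[0]
    a2 = sigmoid(X @ t1.T)                 # (m, n_hid)
    a2b = np.insert(a2, 0, 1, axis=1)      # add the bias unit
    h = sigmoid(a2b @ t2.T)                # (m, n_out)
    d3 = h - Y                             # output-layer error
    d2 = (d3 @ t2[:, 1:]) * a2 * (1 - a2)  # hidden-layer error, bias dropped
    g1 = d2.T @ X / m
    g2 = d3.T @ a2b / m
    return np.concatenate([g1.ravel(), g2.ravel()])

def numerical_grad(f, params, eps=1e-4):
    # Central difference on one parameter at a time.
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        grad[i] = (f(params + step) - f(params - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n_in, n_hid, n_out, m = 3, 5, 3, 5
X = np.insert(rng.random((m, n_in)), 0, 1, axis=1)
Y = np.eye(n_out)[rng.integers(0, n_out, size=m)]
params = rng.uniform(-0.12, 0.12, n_hid * (n_in + 1) + n_out * (n_hid + 1))

analytic = tiny_grad(params, n_in, n_hid, n_out, X, Y)
numeric = numerical_grad(lambda p: tiny_cost(p, n_in, n_hid, n_out, X, Y), params)
rel_diff = np.linalg.norm(numeric - analytic) / np.linalg.norm(numeric + analytic)
print(rel_diff)
```

A very small relative difference suggests the analytic gradient matches the cost function. The check is kept to a tiny network because the numerical estimate needs two cost evaluations per parameter.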