# 10/1: Neural Networks

Hi everyone! In this notebook we'll be looking at neural networks and their applications in machine learning and natural language processing.

To complete this notebook, fill in the following methods:

1. `sigmoid()`
2. `sigmoid_derivative()`
3. `forward_prop()`
4. `back_prop()`


We'll start by building a neural network that decides whether two boolean inputs are equivalent, and then extend it to recognize handwritten digits!

```python
import numpy as np
```

## Part 1: Sigmoid and the Derivative of Sigmoid

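The sigmoid function and its derivative are essential to how our neural network works: `forward_prop()` applies `sigmoid()` at every layer, and `back_prop()` uses `sigmoid_derivative()` to compute gradients. To start, complete the function `sigmoid()`.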

Recall from a couple of weeks ago that the formula for `sigmoid()` is:

<img src="https://latex.codecogs.com/gif.latex?\dpi{150}&space;\large&space;\frac{1}{1&space;&plus;&space;e^{-x}}" title="\large \frac{1}{1 + e^{-x}}" />

```python
def sigmoid(x):
    # calculate the output of the sigmoid function and return it
    sigmoid_val = 1 / (1 + np.exp(-x))
    return sigmoid_val
```

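```python
# run this cell to test sigmoid
# np.isclose compares floats with a small tolerance instead of exact equality
if np.isclose(sigmoid(0), 0.5) and np.isclose(sigmoid(3.1415), 0.9585724885979936):
    print("Correct value for sigmoid!")
else:
    print("I think you gotta fix that pal")
```

```
Correct value for sigmoid!
```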
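Because `sigmoid()` is built from `np.exp`, it also works element-wise on NumPy arrays, which `forward_prop()` relies on later. The sketch below uses a standalone copy of `sigmoid()` and arbitrary test points to check that every output lies strictly between 0 and 1 and that sigmoid(-x) = 1 - sigmoid(x).

```python
import numpy as np

def sigmoid(x):
    # same formula as the sigmoid() cell above: 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 11)        # arbitrary test points: -5, -4, ..., 5
s = sigmoid(x)                    # applied element-wise, same shape as x

print(s.shape)                                   # (11,)
print(np.all((s > 0) & (s < 1)))                 # True: outputs stay inside (0, 1)
print(np.allclose(sigmoid(-x), 1 - sigmoid(x)))  # True: sigmoid(-x) = 1 - sigmoid(x)
```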


Next, we'll calculate the derivative of `sigmoid()` in the function `sigmoid_derivative()`. Applying the chain rule, the derivative of `sigmoid()` is:

$$
\begin{aligned}
\frac{d}{dx}\,\text{sigmoid}(x) &= \frac{d}{dx}\frac{1}{1 + e^{-x}} \\
&= -1\,(1 + e^{-x})^{-2} \cdot (-e^{-x}) \\
&= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\
&= \text{sigmoid}(x)\,(1 - \text{sigmoid}(x))
\end{aligned}
$$

```python
def sigmoid_derivative(x):
    # calculate the derivative and return the value
    # compute sigmoid(x) once and reuse it: sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    sigmoid_deriv = s * (1 - s)
    return sigmoid_deriv
```

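```python
# run this cell to check your derivative of sigmoid
# np.isclose compares floats with a small tolerance instead of exact equality
if np.isclose(sigmoid_derivative(0), 0.25) and np.isclose(sigmoid_derivative(3.1415), 0.03971127270104303):
    print("Good job!")
else:
    print("bad bad job >:(")
```

```
Good job!
```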
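A quick way to double-check the closed-form derivative is to compare it against a numerical slope. The sketch below uses a central difference, (f(x + h) - f(x - h)) / 2h, with a hypothetical helper `central_difference()` and standalone copies of `sigmoid()` and `sigmoid_derivative()`; the test points and step size `h` are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # closed form derived above: sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1 - s)

def central_difference(f, x, h=1e-5):
    # hypothetical helper: numerical slope of f at x
    return (f(x + h) - f(x - h)) / (2 * h)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.1415])
analytic = sigmoid_derivative(x)
numeric = central_difference(sigmoid, x)
max_error = np.max(np.abs(analytic - numeric))
print(max_error < 1e-8)  # True: the formula matches the numerical slope
```

If the two disagreed by much more than that, the derivation or the implementation would be the first place to look.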


## Part 2: Creating our Neural Network Methods

Our neural network system depends on two main functions:
1. Forward propagation (`forward_prop()`)
2. Backpropagation (`back_prop()`)

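We'll start by implementing `forward_prop()` for our neural network. For each layer, it prepends a column of ones (the bias term) to the current inputs, multiplies the result by that layer's `theta`, and passes the `sigmoid()` of that product on to the next layer. Note that the list it returns holds each layer's values before `sigmoid()` is applied, which is why later cells call `sigmoid()` on `outputs[-1]`.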

You will need to complete the following steps in the method:

1. Set `ones_col` equal to a matrix of ones with `m` rows and 1 column (the bias column)
2. Calculate `pred_val` from `formatted_inputs` and `theta` (check that the matrix dimensions line up)
3. Set `curr_inputs` to the `sigmoid()` of `pred_val`

```python
def forward_prop(inputs, thetas, m):
    # declare the values we need
    outputs = []
    curr_inputs = inputs
    ones_col = np.ones((m, 1))
    for theta in thetas:
        # format the inputs by adding the column of ones
        formatted_inputs = np.hstack((ones_col, curr_inputs))
        # calculate the predicted value, and append it to the list of outputs
        pred_val = formatted_inputs @ theta.T
        outputs.append(pred_val)
        # set curr_inputs to the sigmoid of our predicted value
        curr_inputs = sigmoid(pred_val)
    # return our list of outputs (pre-sigmoid values for each layer)
    return outputs
```

```python
# run this cell to test your implementation of forward_prop
np.random.seed(123456789)
test_thetas = [np.random.random((2, 3)), np.random.random((1, 3))]
test_inputs = np.random.random((4, 2))

test_forward_prop = forward_prop(test_inputs, test_thetas, 4)
correct_forward_prop_val = [np.array([[1.24524706, 1.42359831], [1.34984787, 1.52210102], [0.58694045, 0.74140142], [0.66236153, 0.78210622]]), np.array([[1.54454925], [1.55730176], [1.43758967], [1.44616478]])]

# np.allclose checks every entry within a small tolerance (the expected values are rounded)
if not np.allclose(test_forward_prop[0], correct_forward_prop_val[0]):
    print("Your first layer values are incorrect, sport")
elif not np.allclose(test_forward_prop[1], correct_forward_prop_val[1]):
    print("Your second layer values are incorrect, ace")
else:
    print("You did it, champ!")
```

```
You did it, champ!
```
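It can help to see the shapes that flow through `forward_prop()`. The sketch below runs a standalone copy of it on random data (arbitrary seed, 4 samples, the same theta shapes as the test) and prints the shape of each layer's pre-activation output; the final `sigmoid()` call turns the last layer into values between 0 and 1.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_prop(inputs, thetas, m):
    # standalone copy of forward_prop() above
    outputs = []
    curr_inputs = inputs
    ones_col = np.ones((m, 1))
    for theta in thetas:
        formatted_inputs = np.hstack((ones_col, curr_inputs))  # m x (units + 1)
        pred_val = formatted_inputs @ theta.T                   # m x (next layer's units)
        outputs.append(pred_val)
        curr_inputs = sigmoid(pred_val)
    return outputs

rng = np.random.default_rng(0)           # arbitrary seed for this sketch
inputs = rng.random((4, 2))              # m = 4 samples, 2 features
thetas = [rng.random((2, 3)), rng.random((1, 3))]

outputs = forward_prop(inputs, thetas, 4)
for layer, z in enumerate(outputs, start=1):
    print(f"layer {layer}: pre-activation shape {z.shape}")
# layer 1: pre-activation shape (4, 2)
# layer 2: pre-activation shape (4, 1)

predictions = sigmoid(outputs[-1])       # squash the final layer into (0, 1)
```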


The next method, `back_prop()`, is an absolute banger, and easily the most important one for getting a neural network to learn. Working backward from the output layer, it computes an error term (a "difference") for each layer and then uses those error terms to compute the gradients for each set of thetas.

You will need to complete the following steps:
1. Calculate `diff3`, the difference between the `sigmoid()` of the last entry of `y_predictions` and `y_actual`
2. Calculate `diff2`, the element-wise product of `diff2_unadjusted` and the `sigmoid_derivative()` of the first entry of `y_predictions`
3. Calculate `delta_one`, the matrix product of `diff2` and `format_partial_one` (note the dimensions!)
4. Calculate `delta_two`, the matrix product of `diff3` and `format_partial_two` (note the dimensions!)

Note: Unlike `forward_prop()`, this method is hard to write so that it works for a neural network of any size. Hence, we hard-code it under the assumption that our network has two layers of thetas (one hidden layer).
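For reference, here are the quantities `back_prop()` computes, written as equations for the two-layer case. This is a sketch in our own notation: $X$ is `inputs`, $y$ is `y_actual`, $\Theta^{(1)}$ and $\Theta^{(2)}$ are `thetas[0]` and `thetas[1]`, $z^{(2)}$ and $z^{(3)}$ are `y_predictions[0]` and `y_predictions[-1]`, $\sigma$ is `sigmoid()`, and $\odot$ is element-wise multiplication. The notebook doesn't name a cost function; we assume the usual mean cross-entropy cost $J$, which is the cost the `diff3` formula corresponds to.

$$
\begin{aligned}
J &= -\frac{1}{m}\sum_{i}\left[y_i \log h_i + (1 - y_i)\log(1 - h_i)\right], \quad h = \sigma\left(z^{(3)}\right) \\
\delta^{(3)} &= \sigma\left(z^{(3)}\right) - y \\
\delta^{(2)} &= \left(\delta^{(3)}\,\Theta^{(2)}_{:,\,1:}\right) \odot \sigma'\left(z^{(2)}\right) \\
\frac{\partial J}{\partial \Theta^{(1)}} &= \frac{1}{m}\left(\delta^{(2)}\right)^{\top} \begin{bmatrix} \mathbf{1} & X \end{bmatrix} \\
\frac{\partial J}{\partial \Theta^{(2)}} &= \frac{1}{m}\left(\delta^{(3)}\right)^{\top} \begin{bmatrix} \mathbf{1} & \sigma\left(z^{(2)}\right) \end{bmatrix}
\end{aligned}
$$

Here $\delta^{(3)}$ is `diff3`, $\delta^{(2)}$ is `diff2`, and the two bracketed matrices are `format_partial_one` and `format_partial_two`.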

```python
def back_prop(y_predictions, y_actual, inputs, thetas, m, num_classifications):
    # sets the constant for ones_col
    ones_col = np.ones((m, 1))
    
    # converts y_actual to one-hot rows if there are more than 2 classifications
    # (labels are flattened and cast to int so they can index rows of the identity matrix)
    if num_classifications > 2:
        y_actual = np.eye(num_classifications)[np.asarray(y_actual).ravel().astype(int)]
    
    # calculates the "difference" for the final layer
    diff3 = sigmoid(y_predictions[-1]) - y_actual
    
    # calculates the "difference" for the penultimate layer
    # (thetas[1][:, 1:] drops the bias column, since the bias unit has no inputs to pass error back to)
    diff2_unadjusted = diff3 @ thetas[1][:, 1:]
    diff2 = np.multiply(diff2_unadjusted, sigmoid_derivative(y_predictions[0]))
    
    # formats the partial derivatives
    format_partial_one = np.hstack((ones_col, np.asarray(inputs)))
    format_partial_two = np.hstack((ones_col, np.asarray(sigmoid(y_predictions[0]))))
    
    # calculates the unadjusted partial derivatives
    delta_one = diff2.T @ format_partial_one
    delta_two = diff3.T @ format_partial_two
    
    # returns our partial derivatives, averaged over the m samples
    return [delta_one / m, delta_two / m]
```
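For more than two classes, `back_prop()` is meant to convert `y_actual` into one-hot rows (one row per sample, with a single 1 in the column of the correct class) by indexing into an identity matrix. The sketch below shows that trick on its own with made-up labels for a hypothetical 3-class problem. The labels are flattened and cast to integers first: indexing with a float array raises an error, and indexing with an `m` × 1 column would add an extra dimension.

```python
import numpy as np

num_classifications = 3
labels = np.array([[2], [0], [1], [2]])  # hypothetical labels, shape m x 1

# flatten to shape (m,) and cast to int so each label selects one row of the identity
one_hot = np.eye(num_classifications)[labels.ravel().astype(int)]
print(one_hot)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```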

```python
# run this cell to test back_prop
np.random.seed(123456789)

test_thetas = [np.random.random((2, 3)), np.random.random((1, 3))]
test_inputs = np.random.random((4, 2))
test_y_actual = np.random.random((4, 1))

test_back_prop = back_prop(test_forward_prop, test_y_actual, test_inputs, test_thetas, 4, 2)
correct_back_prop = [np.array([[0.00907003, 0.0054814 , 0.00563069], [0.03581553, 0.02146392, 0.021947  ]]), np.array([[0.33095398, 0.25353556, 0.26271239]])]

# np.allclose checks every entry within a small tolerance (the expected values are rounded)
if not np.allclose(test_back_prop[0], correct_back_prop[0]):
    print("Homie, your first partial derivative is wrong")
elif not np.allclose(test_back_prop[1], correct_back_prop[1]):
    print("Oh no no your second partial derivative is wrong")
else:
    print("I can't believe you actually did it you're insane!")
```

```
I can't believe you actually did it you're insane!
```
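Matching hard-coded numbers only shows the code reproduces one run. A stronger check is a numerical gradient check: nudge each theta entry up and down, measure how the cost changes, and compare with what `back_prop()` returns. The sketch below assumes the mean binary cross-entropy cost (the notebook doesn't define a cost function), uses standalone copies of the notebook's functions with `back_prop()` reduced to the two-layer, binary-output case, and introduces hypothetical helpers `cost()` and `numerical_gradients()`; the seed and data are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

def forward_prop(inputs, thetas, m):
    # standalone copy of forward_prop(): returns pre-sigmoid values per layer
    outputs = []
    curr_inputs = inputs
    ones_col = np.ones((m, 1))
    for theta in thetas:
        pred_val = np.hstack((ones_col, curr_inputs)) @ theta.T
        outputs.append(pred_val)
        curr_inputs = sigmoid(pred_val)
    return outputs

def back_prop(y_predictions, y_actual, inputs, thetas, m):
    # standalone copy of the two-layer, binary-output case of back_prop()
    ones_col = np.ones((m, 1))
    diff3 = sigmoid(y_predictions[-1]) - y_actual
    diff2 = (diff3 @ thetas[1][:, 1:]) * sigmoid_derivative(y_predictions[0])
    delta_one = diff2.T @ np.hstack((ones_col, inputs))
    delta_two = diff3.T @ np.hstack((ones_col, sigmoid(y_predictions[0])))
    return [delta_one / m, delta_two / m]

def cost(inputs, y_actual, thetas, m):
    # assumed cost: mean binary cross-entropy of the network's sigmoid output
    h = sigmoid(forward_prop(inputs, thetas, m)[-1])
    return -np.sum(y_actual * np.log(h) + (1 - y_actual) * np.log(1 - h)) / m

def numerical_gradients(inputs, y_actual, thetas, m, eps=1e-5):
    # central difference on every theta entry
    grads = []
    for layer, theta in enumerate(thetas):
        grad = np.zeros_like(theta)
        for idx in np.ndindex(*theta.shape):
            plus = [t.copy() for t in thetas]
            minus = [t.copy() for t in thetas]
            plus[layer][idx] += eps
            minus[layer][idx] -= eps
            grad[idx] = (cost(inputs, y_actual, plus, m)
                         - cost(inputs, y_actual, minus, m)) / (2 * eps)
        grads.append(grad)
    return grads

rng = np.random.default_rng(0)  # arbitrary seed and data for this sketch
m = 4
inputs = rng.random((m, 2))
y_actual = rng.random((m, 1))
thetas = [rng.random((2, 3)), rng.random((1, 3))]

analytic = back_prop(forward_prop(inputs, thetas, m), y_actual, inputs, thetas, m)
numeric = numerical_gradients(inputs, y_actual, thetas, m)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(f"largest difference: {max_diff:.2e}")
```

A tiny largest difference means the backpropagated gradients agree with the assumed cost; a large one usually points at a transposed matrix or a missing bias column.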


## Part 3: Using our Neural Network Architecture

### Note: You don't have to code for the rest of the notebook! You can relax and watch your hard work pay off

Now that we have the two needed functions for neural networks, we can start predicting stuff!

We'll start by learning the `xnor` operator, which checks whether two boolean inputs are equivalent.

We can use the following truth table to describe `xnor`:


| a | b | out |
| - | - | --- |
| 1 | 1 |  1  |
| 1 | 0 |  0  |
| 0 | 1 |  0  |
| 0 | 0 |  1  |
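Since `xnor` outputs 1 exactly when its inputs are equal, the `out` column of the table can be reproduced with a single NumPy comparison. This short sketch uses the same `a` and `b` columns as the table:

```python
import numpy as np

# same inputs as the truth table: columns are a and b
a = np.array([1, 1, 0, 0])
b = np.array([1, 0, 1, 0])

# xnor is 1 exactly when a and b are equal
xnor_out = (a == b).astype(int)
print(xnor_out)  # [1 0 0 1]
```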

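Furthermore, the picture below describes our neural network architecture: 2 input units, a hidden layer of 2 units, and a single output unit, with a bias unit added to the input and hidden layers.

First, we need to define our inputs and outputs, as well as our values for theta.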

```python
# defining inputs and outputs
xnor_inputs = np.array([[1, 1, 0, 0], [1, 0, 1, 0]]).T
xnor_outputs = np.array([[1, 0, 0, 1]]).T
```

```python
# defining theta values with correct dimensions
xnor_thetas = [np.random.random((2, 3)), np.random.random((1, 3))]
```
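The theta shapes follow from the layer sizes: each theta has one row per unit in the layer it feeds and one column per unit in the layer it reads from, plus one column for the bias. The sketch below makes that rule explicit with a hypothetical `init_thetas()` helper (not part of the notebook) that draws from the same uniform [0, 1) range as `np.random.random`.

```python
import numpy as np

def init_thetas(layer_sizes, rng):
    # hypothetical helper: one theta per pair of adjacent layers,
    # shaped (units in next layer) x (units in previous layer + 1 bias column)
    return [rng.random((n_out, n_in + 1))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

rng = np.random.default_rng(0)            # arbitrary seed for this sketch
thetas = init_thetas([2, 2, 1], rng)      # 2 inputs -> 2 hidden units -> 1 output
print([theta.shape for theta in thetas])  # [(2, 3), (1, 3)]
```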

Next, I'll define constants for our gradient descent algorithm.

You might notice that `learning_rate` $> 1$. Why?

Honestly, I'm not completely sure. It could be that I messed up my code somewhere, but all I know is that this is the value that makes gradient descent work here.

That said, nothing requires the learning rate to be below 1. A likely factor is that the gradients here are small: the hidden-layer gradients are scaled by `sigmoid_derivative()`, which never exceeds 0.25, so a larger step size helps gradient descent converge within `num_iterations`.
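One way to see why the updates stay small: the hidden-layer error `diff2` in `back_prop()` is multiplied by `sigmoid_derivative()`, which peaks at 0.25 at x = 0 and shrinks quickly away from it. The short check below evaluates that bound numerically on an arbitrary grid; it illustrates one contributing factor, not a full account of why `learning_rate = 5` works.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.linspace(-10, 10, 2001)   # arbitrary grid that includes x = 0
slopes = sigmoid_derivative(x)
print(slopes.max())              # peak value, reached at x = 0
print(sigmoid_derivative(5.0))   # already much smaller away from 0
```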

```python
# defining constants
sample_size = xnor_inputs.shape[0]
num_classifications = 2
learning_rate = 5
num_iterations = 10000
```

Finally, we can run gradient descent for our algorithm.

```python
# gradient descent
for iteration in range(num_iterations):
    # calculate the outputs for the iteration
    outputs = forward_prop(xnor_inputs, xnor_thetas, sample_size)
    # calculate the gradients for the iteration
    gradients = back_prop(outputs, xnor_outputs, xnor_inputs, xnor_thetas, sample_size, num_classifications)
    # adjust both of our thetas, taking a step towards the minimum
    xnor_thetas[0] = xnor_thetas[0] - learning_rate * gradients[0]
    xnor_thetas[1] = xnor_thetas[1] - learning_rate * gradients[1]
```
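The gradient descent loop updates `xnor_thetas[0]` and `xnor_thetas[1]` one at a time. A sketch of the same update rule, theta ← theta − learning_rate × gradient, written for any number of layers is below; `gradient_step()` is a hypothetical helper and the values are made up.

```python
import numpy as np

def gradient_step(thetas, gradients, learning_rate):
    # hypothetical helper: theta <- theta - learning_rate * gradient, for every layer
    return [theta - learning_rate * grad for theta, grad in zip(thetas, gradients)]

thetas = [np.ones((2, 3)), np.ones((1, 3))]
gradients = [np.full((2, 3), 0.1), np.full((1, 3), -0.2)]
new_thetas = gradient_step(thetas, gradients, learning_rate=5)
print(new_thetas[0][0, 0], new_thetas[1][0, 0])
```

Returning a new list rather than modifying `thetas` in place keeps the old values around, which is handy when comparing costs before and after a step.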

Hopefully, you'll see that the outputs read

$\begin{bmatrix}1 \\ 0 \\ 0 \\ 1\end{bmatrix}$

when you run the cell below.

```python
# run this cell to check your code!
print("Inputs:")
print(xnor_inputs)
print("--------")
print("Outputs (rounded):")
print(np.round(sigmoid(outputs[-1]), 3))
```

```
Inputs:
[[1 1]
 [1 0]
 [0 1]
 [0 0]]
--------
Outputs (rounded):
[[1.]
 [0.]
 [0.]
 [1.]]
```
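The printed outputs are probabilities from `sigmoid()`; they only round to exactly 0 and 1 because training pushed them close to the targets. A common way to turn such probabilities into hard predictions is a 0.5 cutoff. The sketch below applies one to made-up outputs (not the notebook's actual values) and scores them against targets shaped like `xnor_outputs`.

```python
import numpy as np

# hypothetical sigmoid outputs for the four xnor inputs (made-up values)
probabilities = np.array([[0.998], [0.002], [0.003], [0.997]])
targets = np.array([[1], [0], [0], [1]])

# turn each probability into a 0/1 prediction with a 0.5 cutoff
predictions = (probabilities >= 0.5).astype(int)
accuracy = np.mean(predictions == targets)
print(predictions.ravel())  # [1 0 0 1]
print(accuracy)             # 1.0
```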
