# The Hypothesis function of a Neural Network

### Introduction

Now let's try to build the weight and bias matrices of a neural network for the mnist dataset.  The mnist dataset is a classic dataset for practicing with a neural network.  We'll see that it is a dataset of images of handwritten digits.  The task of a neural network, is to train a neural network that can predict the associated digit for each handwritten image.

In this lesson, we won't be training the network, but will focus on constructing the weight matrices and bias vecotrs for a two layer neural network.

### Architecting our Network

In this lesson, we'll build weight matrices and bias vectors for a network that looks like the following:

$$
\begin{aligned}
z_1 & = xW_1 + b_1 \\
a_1 & = \sigma(z_1) \\
z_2 & = a_1W_2 + b_2 \\
\end{aligned}
$$

In [5]:
from numpy import random

import numpy as np
random.seed(3)
x = np.random.randn(784)
# x

1. Let's start with the initial layer 
* $z_1 = xW_1 + b_1$

Build the layer with 25 neurons, and have each neuron in the first layer take in the 784 features of an observation from the MNIST dataset. 

In [6]:
import numpy as np

W1 = random.randn(784, 25)
b1 = random.randn(25)

In [7]:
W1.shape, b1.shape

# ((784, 25), (25,))

((784, 25), (25,))

2. Define the sigmoid function

Next define the sigmoid function.  That is, our non-linear layer so that the outputs from our first layer are between $0$ and $1$.

In [19]:
def sigmoid(z):
    return 1/(1 + np.exp(-z))

In [22]:
z1 = (x.dot(W1) + b1)
a1 = sigmoid(z1)
# a1.shape
# z1
# (25,)
# x.shape, W1.shape
a1

array([1.44631072e-15, 1.00000000e+00, 5.33569122e-06, 1.00000000e+00,
       1.02691168e-10, 2.54148468e-10, 2.31156000e-12, 2.13338573e-12,
       9.99948431e-01, 8.16353635e-01, 1.63316773e-19, 9.99999971e-01,
       1.93166217e-10, 5.03752390e-08, 3.01203801e-19, 3.73025522e-08,
       2.22675450e-02, 1.00000000e+00, 1.59492818e-11, 9.99606207e-01,
       9.99999994e-01, 9.99999998e-01, 9.99569598e-01, 4.39663915e-15,
       9.76029457e-01])

In [24]:
W2 = random.randn(25, 5)
b = random.randn(5)

In [26]:
a1.dot(W2)

array([-1.84065415, -0.56606298, -1.25128396, -8.52773275,  3.87730373])

2. The second linear layer

Next let's feed the inputs from the previous layer into a second linear layer.  This linear layer should have 10 outputs, one for each digit.  

> And, as *it's inputs*, it should take in the outputs from the previous layer.

In [49]:
W2 = None
b2 = None

In [62]:
l2 = l1.dot(W2) + b2

array([ 121.5086, -108.1936,   -4.2547,   23.0828,  -58.7143])

3. The output layer

Finally, we need to return an output from our neural network.  There should be ten outputs, each between 0 and 1, and we want our neural network to predict the probability of one of the classes significantly higher than the others.

For this we'll use the softmax function.  Remember that the softmax function applies the exponent to each of it's (here 10) inputs and then normalizes by the sum of these exponents. 

In [27]:
z3 = np.array([ 121.5086, 108.1936,   4.2547,   23.0828,  58.7143])

In [None]:
[1, 0, 0, 0]

[.99999, .000001, .000001, .00001]

In [30]:
(z3/z3.sum())

array([0.38482046, 0.34265156, 0.01347473, 0.07310375, 0.1859495 ])

In [38]:
softmax = np.exp(z3)/np.sum(np.exp(z3))

In [42]:
np.log(softmax)

array([-1.64956189e-06, -1.33150016e+01, -1.17253902e+02, -9.84258016e+01,
       -6.27943016e+01])

In [37]:
# 148.4131591/22026.46579481

$softmax(x) = \frac{e^x}{\sum e^x}$

In [72]:
np.set_printoptions(precision=4, suppress=True) # turn off scientific notation

def softmax(layer):
    return np.exp(layer)/np.sum(np.exp(layer))

In [63]:
softmax(l2)

array([1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [73]:
def init_model():
    W1 = random.randn(784, 25)
    b = random.randn(25)
    W2 = random.randn(25, 10)
    b2 = random.randn(10) 
    return {'W1': W1, 'W2': W2, 'b1':b1, 'b2': b2}

In [74]:
def forward(model, x):
    W1, W2, b1, b2 = model.values()
    a1 = sigmoid(x.dot(W1) + b1)
    return softmax(a1.dot(W2) + b2)

In [75]:
forward(model, x)

array([0.0015, 0.0193, 0.9699, 0.0015, 0.0003, 0.0051, 0.0001, 0.0001,
       0.001 , 0.0012])

In [67]:
x.shape

(784,)

In [59]:
model = init_model()

In [60]:
W1, W2, b1, b2 = model.values()

In [63]:
x.shape, W1.shape, b

((784,), (784, 25))

In [61]:
x.dot(W1) + b

ValueError: operands could not be broadcast together with shapes (25,) (5,) 

In [55]:

# model.values()


ValueError: operands could not be broadcast together with shapes (25,) (5,) 

<center>
<a href="https://www.jigsawlabs.io/free" style="position: center"><img src="https://storage.cloud.google.com/curriculum-assets/curriculum-assets.nosync/mom-files/jigsaw-labs.png" width="15%" style="text-align: center"></a>
</center>

### Answers

In [55]:
W1 = np.random.randn(784, 25)
b1 = np.random.randn(25)

In [56]:
W2 = np.random.randn(25, 10)
b2 = np.random.randn(10)

In [57]:
def sigmoid(z):
    return 1/(1 + np.exp(-z))

In [58]:
def softmax(layer):
    return np.exp(layer)/np.sum(np.exp(layer))