# Building Layers Lab

### Introduction

In this lab, we'll reconstruct the hypothesis and forward functions for a neural network, all using PyTorch. By the end, we'll have functions that initialize a neural network and use it to make predictions for our MNIST dataset. Let's get started.

### Loading our Data

In [1]:
from sklearn.datasets import fetch_openml

# as_frame=False returns NumPy arrays rather than pandas objects
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)

In [6]:
X.shape

(70000, 784)

In [4]:
first_observation = X[:1]
first_observation.shape

(1, 784)

And its label shows that this observation is an image of the digit 5.

In [5]:
y[0]

'5'

### Working with a Linear Layer

Now let's initialize our first linear layer. Initialize a `Linear` layer with 8 neurons, each of which takes in all 784 features of an observation from our MNIST dataset.

In [2]:
import torch
from torch.nn import Linear
torch.manual_seed(123)

W1 = None
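One way to fill in `W1` is sketched below, assuming the layer should match the expected outputs that follow (784 input features and 8 output features). The constructor is `torch.nn.Linear(in_features, out_features)`; seeding first keeps the random initialization repeatable.

```python
import torch
from torch.nn import Linear

torch.manual_seed(123)

# Each of the 8 neurons takes in all 784 pixel features of one MNIST image
W1 = Linear(in_features=784, out_features=8)

print(W1.in_features)   # 784
print(W1.out_features)  # 8
```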

In [11]:
W1.in_features

# 784

In [12]:
W1.out_features

# 8

Now let's take a look at the shape of the weight matrix and the bias vector initialized in our layer.

> First look at the shape of the weight matrix.

In [14]:


# torch.Size([8, 784])

> And then let's return the shape of the bias vector.

In [15]:


# torch.Size([8])

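The weight matrix and bias vector live in the layer's `weight` and `bias` attributes. The sketch below re-creates the same 784-to-8 layer so it runs on its own, and shows that PyTorch stores the weight with shape `(out_features, in_features)`, one row per neuron, and the bias with one entry per neuron.

```python
import torch
from torch.nn import Linear

torch.manual_seed(123)
W1 = Linear(784, 8)

# The weight is stored as (out_features, in_features): one row per neuron
print(W1.weight.shape)  # torch.Size([8, 784])

# One bias term per neuron
print(W1.bias.shape)    # torch.Size([8])
```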

Now let's pass some data through the linear layer. To do so, we'll first have to convert our NumPy array into a tensor.

In [37]:
import torch
X_tensor = None

> We also need to convert the tensor to type `float` (32-bit floating point), which is the data type of the linear layer's weights.

In [35]:
first_two_observations = X_tensor[:2].float()
first_two_observations

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
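A minimal sketch of both conversion steps, using a small all-zero stand-in array instead of the real MNIST data (an assumption so the example runs without downloading anything). `torch.from_numpy` shares memory with the array and keeps its dtype, whereas `torch.tensor` would make a copy; either works here. `.float()` then casts to `float32`, matching the default dtype of a `Linear` layer's weights.

```python
import numpy as np
import torch

# Stand-in for the MNIST feature matrix: two rows of 784 pixel values.
# float64 is assumed here; the dtype returned by fetch_openml may vary by version.
X = np.zeros((2, 784), dtype=np.float64)

# from_numpy shares memory with X and keeps its dtype (float64)
X_tensor = torch.from_numpy(X)
print(X_tensor.dtype)  # torch.float64

# Cast to float32 so the data matches the weights of a default Linear layer
first_two_observations = X_tensor[:2].float()
print(first_two_observations.dtype)  # torch.float32
```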

Now let's pass these two observations through the linear layer.

In [43]:
W1(first_two_observations)

# tensor([[   2.5655,   44.3726,  -50.8704,  -31.6235,  -75.0541,  -56.1966,
#           -14.1126,  -45.3884],
#         [  86.7214,  -24.9196,   96.3119,  -63.9667,  -69.7478, -144.3202,
#            10.7071,  -31.4456]], grad_fn=<AddmmBackward>)

Notice that we have 8 outputs for each observation, one per neuron. Now reproduce the same numbers using matrix multiplication with the layer's `weight` and `bias`.

In [53]:


# tensor([[   2.5655,   44.3726,  -50.8704,  -31.6235,  -75.0541,  -56.1966,
#           -14.1126,  -45.3884],
#         [  86.7214,  -24.9196,   96.3119,  -63.9667,  -69.7478, -144.3202,
#            10.7071,  -31.4456]], grad_fn=<PermuteBackward>)

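A `Linear` layer computes `x @ weight.T + bias`. The sketch below checks this by comparing the layer's output with the hand-written matrix multiplication. The random input is a hypothetical stand-in for two MNIST rows, and `torch.allclose` with a small tolerance allows for floating-point rounding differences between the two computation paths.

```python
import torch
from torch.nn import Linear

torch.manual_seed(123)
W1 = Linear(784, 8)

# Hypothetical stand-in for two MNIST rows
x = torch.rand(2, 784)

# What the layer computes internally
layer_output = W1(x)

# The same computation by hand: weight is (8, 784), so transpose it to (784, 8)
manual_output = x @ W1.weight.T + W1.bias

print(layer_output.shape)  # torch.Size([2, 8])
print(torch.allclose(layer_output, manual_output, atol=1e-5))  # True
```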

### Initializing a Model

Ok, enough playing around.  Now it's time to initialize our linear layers.

> Write a function called `init_model`. The function should return a dictionary with keys `W1` and `W2` representing our two layers. The first layer should take in observations with 784 features each and have 64 neurons; the second layer should return a vector of length 10 for each observation.

In [3]:
import torch.nn as nn

def init_model():
    pass

In [77]:
model = init_model()
model
# {'W1': Linear(in_features=784, out_features=64, bias=True),
#  'W2': Linear(in_features=64, out_features=10, bias=True)}

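A minimal sketch of `init_model` that satisfies the description above. Keeping the layers in a plain dictionary follows this lab's design; in larger projects the same layers would more commonly be attributes of an `nn.Module` subclass, which also registers their parameters for an optimizer.

```python
import torch.nn as nn

def init_model():
    """Return a dict holding the two linear layers of our network."""
    return {
        # Hidden layer: 784 pixel features in, 64 neurons out
        'W1': nn.Linear(784, 64),
        # Output layer: 64 hidden activations in, one score per digit (10) out
        'W2': nn.Linear(64, 10),
    }

model = init_model()
print(model)
```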

We could pass our data through these linear layers as they are, but a linear layer alone does not make a complete neuron: each neuron also applies an activation function to its output. We'll use two activation functions: the sigmoid function for the hidden layer and the softmax function for the output layer.
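For reference, the sigmoid function maps each input `z` to `1 / (1 + exp(-z))`, squashing it into the range (0, 1). A quick sketch comparing that formula with the built-in `torch.sigmoid`, on an arbitrary example tensor:

```python
import torch

# Arbitrary example inputs
z = torch.tensor([-2.0, 0.0, 2.0])

# Sigmoid by hand: 1 / (1 + e^(-z))
manual = 1 / (1 + torch.exp(-z))
builtin = torch.sigmoid(z)

print(builtin)                          # tensor([0.1192, 0.5000, 0.8808])
print(torch.allclose(manual, builtin))  # True
```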

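> As we saw previously, the softmax function can be used to produce the output of our last layer. It exponentiates each output of the linear layer and divides by the sum of those exponentials, which exaggerates the preference for the largest outputs and returns a set of probabilities that add up to one. Below, apply softmax across each row of the output of `W1` for our first two observations.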

In [73]:
import torch.nn.functional as F

predictions = None

predictions

# tensor([[6.9724e-19, 1.0000e+00, 4.3300e-42, 9.8924e-34, 0.0000e+00, 2.1019e-44,
#          3.9829e-26, 1.0406e-39],
#         [6.8365e-05, 0.0000e+00, 9.9993e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00,
#          6.6412e-38, 0.0000e+00]], grad_fn=<SoftmaxBackward>)

And we can see that the probabilities of each row add up to one.

In [74]:
predictions.sum(dim=1)

# tensor([1.0000, 1.0000], grad_fn=<SumBackward1>)

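Softmax exponentiates each value and divides by its row's total, so for logits `[1, 2, 3]` the largest entry gets about 0.67 of the probability and the smallest about 0.09. The sketch below checks that formula against `F.softmax` on arbitrary example logits. One practical reason to prefer the built-in: in `float32`, `torch.exp` overflows to infinity for inputs above roughly 88, so the naive formula would return `nan` for a logit such as the 96.3 in the layer outputs above, whereas `F.softmax` handles such values, as the finite predictions above show.

```python
import torch
import torch.nn.functional as F

# Arbitrary example outputs of a linear layer for two observations
logits = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])

# Naive softmax: exponentiate, then divide by each row's sum
manual = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)
builtin = F.softmax(logits, dim=1)

print(builtin)
print(builtin.sum(dim=1))               # each row sums to 1
print(torch.allclose(manual, builtin))  # True
```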

Ok, we should now be ready to write a `forward` function that takes in our model and our data `X`, and returns 10 probabilities for each observation, using our two linear layers together with the sigmoid and softmax activations.

$$
\begin{aligned}
z_1 & = xW_1 + b_1 \\
a_1 & = \sigma(z_1) \\
z_2 & = a_1W_2 + b_2 \\
\text{predictions} & = \text{softmax}(z_2)
\end{aligned}
$$
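To see how the shapes line up, here is the same computation annotated with dimensions for a batch of $n$ observations. This assumes the convention of the equations above, where $W_1$ is $784 \times 64$; PyTorch's `Linear` stores the transpose (shape `(64, 784)`), so $xW_1$ corresponds to `x @ model['W1'].weight.T`, and each bias vector is added to every row by broadcasting.

$$
\begin{aligned}
x &\in \mathbb{R}^{n \times 784}, & W_1 &\in \mathbb{R}^{784 \times 64}, & b_1 &\in \mathbb{R}^{64} \\
z_1, a_1 &\in \mathbb{R}^{n \times 64}, & W_2 &\in \mathbb{R}^{64 \times 10}, & b_2 &\in \mathbb{R}^{10} \\
z_2, \text{predictions} &\in \mathbb{R}^{n \times 10}
\end{aligned}
$$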

In [4]:
def forward(model, X):
    pass

In [81]:
forward(model, first_two_observations)

# tensor([[0.0739, 0.0943, 0.0598, 0.0721, 0.1460, 0.0820, 0.2049, 0.0732, 0.1394,
#          0.0546],
#         [0.1017, 0.0406, 0.0614, 0.0767, 0.2207, 0.0680, 0.1502, 0.0989, 0.1457,
#          0.0360]], grad_fn=<SoftmaxBackward>)

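A minimal sketch of `forward` that follows the four equations above, one line per equation. It repeats a compact `init_model` so the block runs on its own, and uses a random tensor as a hypothetical stand-in for MNIST rows; the exact probabilities depend on the random weights and inputs, so they will differ from the numbers above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def init_model():
    # Same layer sizes as in the lab: 784 -> 64 -> 10
    return {'W1': nn.Linear(784, 64), 'W2': nn.Linear(64, 10)}

def forward(model, X):
    """Run X of shape (n, 784) through both layers; return (n, 10) probabilities."""
    z1 = model['W1'](X)          # z_1 = x W_1 + b_1 (Linear applies X @ weight.T + bias)
    a1 = torch.sigmoid(z1)       # a_1 = sigma(z_1)
    z2 = model['W2'](a1)         # z_2 = a_1 W_2 + b_2
    return F.softmax(z2, dim=1)  # one probability distribution per row

torch.manual_seed(123)
model = init_model()

# Hypothetical stand-in for two MNIST rows (pixel intensities in [0, 255])
X = torch.rand(2, 784) * 255
predictions = forward(model, X)

print(predictions.shape)       # torch.Size([2, 10])
print(predictions.sum(dim=1))  # each row sums to 1
```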

And with that, we've built the hypothesis function of a neural network using PyTorch.
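To turn these probabilities into digit predictions, one common approach is to take the index of the largest probability in each row. The sketch below uses made-up probabilities and made-up labels (with an untrained network like ours, real predictions would be essentially arbitrary). Note that `fetch_openml` returns the MNIST labels as strings such as `'5'`, so they would need converting to integers before a comparison like this one.

```python
import torch

# Hypothetical probabilities for two observations over the 10 digit classes
probabilities = torch.tensor([
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.55, 0.05, 0.05, 0.05, 0.05],
    [0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.20, 0.05, 0.05],
])

# The predicted digit is the column with the highest probability
predicted_digits = probabilities.argmax(dim=1)
print(predicted_digits)  # tensor([5, 7])

# Hypothetical true digits, already converted to integers
labels = torch.tensor([5, 2])
accuracy = (predicted_digits == labels).float().mean()
print(accuracy)  # tensor(0.5000)
```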

### Summary

In this lesson, we saw how to use the PyTorch library to initialize our linear layers and then write a `forward` function that takes in our data and returns a set of predictions for each observation.

### Resources

[Towards Data Science: Understanding PyTorch with an Example (a step-by-step tutorial)](https://towardsdatascience.com/understanding-pytorch-with-an-example-a-step-by-step-tutorial-81fc5f8c4e8e)

[PyTorchViz](https://github.com/szagoruyko/pytorchviz)