# Building Layers Lab

### Introduction

In this lab, we'll reconstruct the hypothesis and forward functions for a neural network, all using PyTorch.  At the end, we'll have functions that initialize a neural network and use it to make predictions on our MNIST dataset.  Let's get started.

### Loading our Data

In [1]:
from sklearn.datasets import fetch_openml

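# as_frame=False returns NumPy arrays (newer scikit-learn versions return a DataFrame by default)
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)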

In [6]:
X.shape

(70000, 784)

In [4]:
first_observation = X[:1]
first_observation.shape

(1, 784)

And we can see that this represents the number 5.

In [5]:
y[0]

'5'

### Working with a Linear Layer

Now let's initialize our first linear layer.  Initialize a `Linear` layer with 8 neurons, each of which takes in the 784 features of an observation from our MNIST dataset.

In [42]:
import torch
from torch.nn import Linear
torch.manual_seed(123)

W1 = Linear(784, 8)

In [11]:
W1.in_features

# 784

784

In [12]:
W1.out_features

# 8

8

Now let's take a look at the shape of the weight matrix and the bias vector initialized in our layer.

> First look at the shape of the weight matrix.

In [14]:
W1.weight.shape

# torch.Size([8, 784])

torch.Size([8, 784])

> And then let's return the shape of the bias vector.

In [15]:
W1.bias.shape

# torch.Size([8])

torch.Size([8])

Now let's pass some data through the linear layer.  To do so, we'll first have to convert our NumPy array into a tensor.

In [37]:
import torch
X_tensor = torch.tensor(X)

> We need to convert the tensor to be of type float.

In [35]:
first_two_observations = X_tensor[:2].float()
first_two_observations

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
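The conversion to float matters because `nn.Linear` stores its weights as 32-bit floats by default, and passing a tensor of a different dtype raises a dtype mismatch error. Here is a minimal sketch of the two steps, using a small zero-filled array named `X_demo` as a stand-in for the MNIST data (the name and the float64 dtype are assumptions for illustration):

```python
import numpy as np
import torch

# Stand-in for two MNIST rows: 784 features each, assumed to be float64.
X_demo = np.zeros((2, 784), dtype=np.float64)

X_demo_tensor = torch.tensor(X_demo)  # copies the data and keeps the NumPy dtype
print(X_demo_tensor.dtype)            # torch.float64

X_demo_float = X_demo_tensor.float()  # cast to float32 to match nn.Linear's weights
print(X_demo_float.dtype)             # torch.float32
```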

Now let's pass these two observations through the linear layer.

In [43]:
W1(first_two_observations)

# tensor([[   2.5655,   44.3726,  -50.8704,  -31.6235,  -75.0541,  -56.1966,
#           -14.1126,  -45.3884],
#         [  86.7214,  -24.9196,   96.3119,  -63.9667,  -69.7478, -144.3202,
#            10.7071,  -31.4456]], grad_fn=<AddmmBackward>)

tensor([[   2.5655,   44.3726,  -50.8704,  -31.6235,  -75.0541,  -56.1966,
          -14.1126,  -45.3884],
        [  86.7214,  -24.9196,   96.3119,  -63.9667,  -69.7478, -144.3202,
           10.7071,  -31.4456]], grad_fn=<AddmmBackward>)

Notice that we have 8 outputs for each observation.  Now, reproduce the same numbers using matrix multiplication.

In [53]:
(W1.weight @ first_two_observations.T + W1.bias.view(-1, 1)).T

tensor([[   2.5655,   44.3726,  -50.8704,  -31.6235,  -75.0541,  -56.1966,
          -14.1126,  -45.3884],
        [  86.7214,  -24.9196,   96.3119,  -63.9667,  -69.7478, -144.3202,
           10.7071,  -31.4456]], grad_fn=<PermuteBackward>)
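The same check can be written without the two transposes, since `x @ W.T + b` produces one row per observation directly. The sketch below uses a freshly initialized layer and random inputs (both assumptions, not the lab's `W1` or MNIST rows) and compares the two results with `torch.allclose`:

```python
import torch
from torch.nn import Linear

torch.manual_seed(0)
layer = Linear(784, 8)    # weight has shape (8, 784), bias has shape (8,)
x = torch.rand(2, 784)    # random stand-in for two observations

by_layer = layer(x)
by_hand = x @ layer.weight.T + layer.bias  # the bias broadcasts across both rows

print(by_hand.shape)                                   # torch.Size([2, 8])
print(torch.allclose(by_layer, by_hand, atol=1e-5))    # True
```

A small tolerance is used because the two computations may differ in the last few floating point digits.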

### Initializing a Model

Ok, enough playing around.  Now it's time to initialize our linear layers.

> Write a function called `init_model`.  The function should return a dictionary with keys of `W1` and `W2`, to represent our two layers.  The layers should take in data observations each with 784 features.  The first layer should have 64 neurons, and the second layer should return a vector of length 10 for each observation. 

In [59]:
import torch.nn as nn

def init_model():
    W1 = nn.Linear(28*28, 64)
    W2 = nn.Linear(64, 10)
    return {'W1': W1, 'W2': W2}

In [77]:
model = init_model()
model
# {'W1': Linear(in_features=784, out_features=64, bias=True),
#  'W2': Linear(in_features=64, out_features=10, bias=True)}

{'W1': Linear(in_features=784, out_features=64, bias=True),
 'W2': Linear(in_features=64, out_features=10, bias=True)}

Now we could take our data and pass it through these linear layers, but that would not yet be a valid neural network.  We also need our activation functions.  We'll use two: the sigmoid function and the softmax function.

> As we saw previously, the softmax function can be used to return an output from our last layer.  The function exaggerates the preference from the linear layer, and returns a set of probabilities that add up to one.  Here we apply it to the output of the 8-neuron `W1` layer from earlier, just to see its effect.

In [73]:
import torch.nn.functional as F

predictions = F.softmax(W1(first_two_observations), dim = 1)

predictions

tensor([[6.9724e-19, 1.0000e+00, 4.3300e-42, 9.8924e-34, 0.0000e+00, 2.1019e-44,
         3.9829e-26, 1.0406e-39],
        [6.8365e-05, 0.0000e+00, 9.9993e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         6.6412e-38, 0.0000e+00]], grad_fn=<SoftmaxBackward>)

And we can see that the probabilities of each row add up to one.

In [74]:
predictions.sum(axis = 1)

tensor([1.0000, 1.0000], grad_fn=<SumBackward1>)
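To see where the "exaggeration" comes from, it can help to compute softmax by hand: exponentiate each score, then divide by the row total. This is a small sketch on made-up scores (the `scores` tensor is an assumption for illustration), compared against `F.softmax`:

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([[1.0, 2.0, 3.0]])  # made-up outputs of a linear layer

exps = torch.exp(scores)                         # exponentiate each score
by_hand = exps / exps.sum(dim=1, keepdim=True)   # normalize each row to sum to one

print(by_hand)              # roughly [[0.09, 0.24, 0.67]]
print(by_hand.sum(dim=1))   # each row sums to one
print(torch.allclose(by_hand, F.softmax(scores, dim=1)))  # True
```

The score 3 is only 1.5 times the score 2, yet it receives about 2.7 times the probability, because the gap between scores is exponentiated.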

Ok, we should now be ready to write a `forward` function that takes in our data X, and returns 10 probabilities for each observation (one per digit) using our two linear layers and the sigmoid and softmax activation functions.

$$
\begin{aligned}
z_1 & = xW_1 + b_1 \\
a_1 & = \sigma(z_1) \\
z_2 & = a_1W_2 + b_2 \\
\text{predictions} & = \text{softmax}(z_2)
\end{aligned}
$$

In [80]:
def forward(model, X):
    W1, W2 = model.values()
    Z1 = W1(X)
    A1 = torch.sigmoid(Z1)
    Z2 = W2(A1)
    return F.softmax(Z2, dim = 1)    

In [81]:
forward(model, first_two_observations)

# tensor([[0.0739, 0.0943, 0.0598, 0.0721, 0.1460, 0.0820, 0.2049, 0.0732, 0.1394,
#          0.0546],
#         [0.1017, 0.0406, 0.0614, 0.0767, 0.2207, 0.0680, 0.1502, 0.0989, 0.1457,
#          0.0360]], grad_fn=<SoftmaxBackward>)

tensor([[0.0739, 0.0943, 0.0598, 0.0721, 0.1460, 0.0820, 0.2049, 0.0732, 0.1394,
         0.0546],
        [0.1017, 0.0406, 0.0614, 0.0767, 0.2207, 0.0680, 0.1502, 0.0989, 0.1457,
         0.0360]], grad_fn=<SoftmaxBackward>)

And with that, we've built the hypothesis function of a neural network using PyTorch.
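To turn the probabilities into a predicted digit, one option is to take the index of the largest probability in each row with `argmax`. The sketch below assumes freshly initialized layers and random inputs (`X_demo` is a stand-in, not MNIST data), so the predicted digits themselves are meaningless until the network is trained:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
W1, W2 = nn.Linear(784, 64), nn.Linear(64, 10)
X_demo = torch.rand(2, 784)  # random stand-in for two flattened images

# Same steps as the forward function: linear, sigmoid, linear, softmax.
probs = F.softmax(W2(torch.sigmoid(W1(X_demo))), dim=1)
predicted_digits = probs.argmax(dim=1)  # index of the largest probability per row

print(probs.shape)        # torch.Size([2, 10])
print(predicted_digits)   # one integer between 0 and 9 per observation
```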

### Summary

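In this lesson, we initialized linear layers with PyTorch, reproduced a layer's output with matrix multiplication, and combined two linear layers with sigmoid and softmax activations into a `forward` function that returns a set of probabilities for each observation.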

In [1]:
import torch
import torchvision

from torchvision import transforms, datasets

In [78]:
train = datasets.MNIST("", train = True, download = True, 
                       transform = transforms.Compose([transforms.ToTensor()]))
test = datasets.MNIST("", train = False, download = True, 
                       transform = transforms.Compose([transforms.ToTensor()]))


Using downloaded and verified file: MNIST/raw/train-images-idx3-ubyte.gz
Extracting MNIST/raw/train-images-idx3-ubyte.gz to MNIST/raw
Using downloaded and verified file: MNIST/raw/train-labels-idx1-ubyte.gz
Extracting MNIST/raw/train-labels-idx1-ubyte.gz to MNIST/raw
Using downloaded and verified file: MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting MNIST/raw/t10k-images-idx3-ubyte.gz to MNIST/raw
Using downloaded and verified file: MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting MNIST/raw/t10k-labels-idx1-ubyte.gz to MNIST/raw
Processing...
Done!


In [79]:
trainset = torch.utils.data.DataLoader(train, batch_size = 50, shuffle = True)
testset = torch.utils.data.DataLoader(test, batch_size = 50, shuffle = True)
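A `DataLoader` yields the data one batch at a time, as a pair of features and targets. The sketch below avoids the MNIST download by wrapping random tensors in a `TensorDataset` (the `images` and `labels` tensors are assumptions with MNIST-like shapes), and shows the batch shape before and after flattening:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

images = torch.rand(200, 1, 28, 28)       # random stand-in for 200 grayscale images
labels = torch.randint(0, 10, (200,))     # random stand-in for digit labels
loader = DataLoader(TensorDataset(images, labels), batch_size=50, shuffle=True)

X_batch, y_batch = next(iter(loader))     # grab the first batch
print(X_batch.shape)                      # torch.Size([50, 1, 28, 28])

flat = X_batch.view(-1, 28 * 28)          # flatten each image into 784 features
print(flat.shape)                         # torch.Size([50, 784])
print(len(loader))                        # 4 batches of 50 cover the 200 images
```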

### Initialize the model

In [4]:
X_1  = trainset.dataset.data[0]
y_1 = trainset.dataset.targets[0]

In [5]:
import torch.nn as nn
import torch.nn.functional as F
def build_model():
    W1 = nn.Linear(28*28, 64)
    W2 = nn.Linear(64, 10)
    return {'W1': W1, 'W2': W2} 

In [6]:
model = build_model()

model

{'W1': Linear(in_features=784, out_features=64, bias=True),
 'W2': Linear(in_features=64, out_features=10, bias=True)}

In [7]:
def forward(X, model):
    W1, W2 = tuple(model.values())
    Z1 = W1(X)
    A1 = torch.sigmoid(Z1)  # F.sigmoid is deprecated in favor of torch.sigmoid
    Z2 = W2(A1)
    A2 = F.softmax(Z2, dim = 1)
    return (Z1, A1, Z2, A2)

### Combine Into Class

In [80]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64)
        self.fc2 = nn.Linear(64, 10)
        
    def forward(self, x):
        A1 = torch.sigmoid(self.fc1(x))
        return F.softmax(self.fc2(A1), dim = 1)

In [81]:
net = Net()

In [82]:
net(X_1.float().view(-1,784)).shape

torch.Size([1, 10])

In [64]:
for param in net.parameters():
    print(param.shape)

torch.Size([64, 784])
torch.Size([64])
torch.Size([10, 64])
torch.Size([10])
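These four shapes are the weight matrix and bias vector of each layer. A short sketch of how they could be listed by name and counted, using a hypothetical `TwoLayerNet` class with the same layer sizes as `Net`:

```python
import torch.nn as nn

class TwoLayerNet(nn.Module):  # hypothetical name; same layer sizes as the lab's Net
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 64)
        self.fc2 = nn.Linear(64, 10)

demo_net = TwoLayerNet()

# named_parameters() pairs each parameter with the attribute it belongs to.
for name, param in demo_net.named_parameters():
    print(name, tuple(param.shape))

total = sum(p.numel() for p in demo_net.parameters())
print(total)  # 784*64 + 64 + 64*10 + 10 = 50890
```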


In [65]:
net.parameters()

<generator object Module.parameters at 0x185f05ed0>

### Training A Network

In [83]:
import torch.optim as optim

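# Note: nn.CrossEntropyLoss expects raw (pre-softmax) outputs.
# It is defined here but not used by the training loop below.
loss_function = nn.CrossEntropyLoss()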
optimizer = optim.Adam(net.parameters(), lr=0.0005)

In [84]:
for epoch in range(10):  # 10 full passes over the data
    for data in trainset:  # `data` is one batch
        X, y = data  # X is the batch of features, y is the batch of targets
        net.zero_grad()  # reset gradients to zero before each step
        output = net(X.view(-1, 28*28))  # flatten each 28x28 image into 784 features
        # Note: F.nll_loss expects log-probabilities. `output` holds softmax
        # probabilities, so the values printed here are negative.
        loss = F.nll_loss(output, y)
        loss.backward()  # backpropagate to compute gradients for the parameters
        optimizer.step()  # update the weights using those gradients
    print(loss)  # loss of the last batch in each epoch; we hope it declines

tensor(-0.7409, grad_fn=<NllLossBackward>)
tensor(-0.8884, grad_fn=<NllLossBackward>)
tensor(-0.9072, grad_fn=<NllLossBackward>)
tensor(-0.9295, grad_fn=<NllLossBackward>)
tensor(-0.8825, grad_fn=<NllLossBackward>)
tensor(-0.9556, grad_fn=<NllLossBackward>)
tensor(-0.9786, grad_fn=<NllLossBackward>)
tensor(-0.9443, grad_fn=<NllLossBackward>)
tensor(-0.9728, grad_fn=<NllLossBackward>)
tensor(-0.9612, grad_fn=<NllLossBackward>)
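The negative values come from how `F.nll_loss` works: it returns the negative mean of the entries selected by the targets, and it expects those entries to be log-probabilities. Given plain softmax probabilities, it returns the negative of the average probability assigned to the correct class, so a value near -1 still indicates a good fit. The sketch below, on made-up `logits` and `targets` (both assumptions for illustration), compares this with the usual cross-entropy:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # made-up raw outputs of a last linear layer
targets = torch.tensor([0, 2])             # made-up correct classes

probs = F.softmax(logits, dim=1)

on_probs = F.nll_loss(probs, targets)                  # negative mean probability, in [-1, 0]
on_log_probs = F.nll_loss(torch.log(probs), targets)   # standard cross-entropy, >= 0
on_logits = F.cross_entropy(logits, targets)           # applies log-softmax internally

print(on_probs, on_log_probs, on_logits)
```

The last two values agree, which is why `F.cross_entropy` (or `nn.CrossEntropyLoss`) is typically applied to the raw outputs rather than to softmax probabilities.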


In [69]:

# net(trainset.dataset.data.view(-1, 784).float())[:10]

In [56]:
torch.argmax(net(trainset.dataset.data.view(-1, 784).float()), axis = 1)[:100]

tensor([5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9, 4, 0, 9, 1,
        1, 2, 4, 3, 2, 7, 3, 8, 6, 9, 0, 5, 6, 0, 7, 6, 1, 8, 7, 9, 3, 9, 8, 5,
        9, 3, 3, 0, 7, 4, 9, 8, 0, 9, 4, 1, 4, 4, 6, 0, 4, 5, 6, 1, 0, 0, 1, 7,
        1, 6, 3, 0, 2, 1, 1, 7, 0, 0, 2, 6, 7, 8, 3, 9, 0, 4, 6, 7, 4, 6, 8, 0,
        7, 8, 3, 1])

In [57]:
trainset.dataset.targets[:100]

tensor([5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9, 4, 0, 9, 1,
        1, 2, 4, 3, 2, 7, 3, 8, 6, 9, 0, 5, 6, 0, 7, 6, 1, 8, 7, 9, 3, 9, 8, 5,
        9, 3, 3, 0, 7, 4, 9, 8, 0, 9, 4, 1, 4, 4, 6, 0, 4, 5, 6, 1, 0, 0, 1, 7,
        1, 6, 3, 0, 2, 1, 1, 7, 9, 0, 2, 6, 7, 8, 3, 9, 0, 4, 6, 7, 4, 6, 8, 0,
        7, 8, 3, 1])

In [58]:
torch.argmax(net(testset.dataset.data.view(-1, 784).float()), axis = 1)[:100]

tensor([7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1, 5, 9, 7, 8, 4, 9, 6, 6, 5,
        4, 0, 7, 4, 0, 1, 3, 1, 3, 4, 7, 2, 7, 1, 2, 1, 1, 7, 4, 2, 3, 5, 1, 2,
        4, 4, 6, 3, 5, 5, 6, 0, 4, 1, 9, 5, 7, 8, 9, 3, 7, 4, 6, 4, 3, 0, 7, 0,
        2, 9, 1, 7, 3, 2, 9, 7, 7, 6, 2, 7, 8, 4, 7, 3, 6, 1, 3, 6, 9, 3, 1, 4,
        1, 7, 6, 9])

In [59]:
testset.dataset.targets[:100]

tensor([7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1, 5, 9, 7, 3, 4, 9, 6, 6, 5,
        4, 0, 7, 4, 0, 1, 3, 1, 3, 4, 7, 2, 7, 1, 2, 1, 1, 7, 4, 2, 3, 5, 1, 2,
        4, 4, 6, 3, 5, 5, 6, 0, 4, 1, 9, 5, 7, 8, 9, 3, 7, 4, 6, 4, 3, 0, 7, 0,
        2, 9, 1, 7, 3, 2, 9, 7, 7, 6, 2, 7, 8, 4, 7, 3, 6, 1, 3, 6, 9, 3, 1, 4,
        1, 7, 6, 9])
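Rather than comparing the two printouts by eye, the predictions and targets can be compared element by element. This sketch copies the first 20 test predictions and targets shown above into two tensors and computes the accuracy and the positions that disagree:

```python
import torch

# First 20 predictions and targets copied from the test set outputs above.
predictions = torch.tensor([7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1, 5, 9, 7, 8, 4])
targets = torch.tensor([7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1, 5, 9, 7, 3, 4])

correct = predictions == targets                  # boolean tensor, True where they match
accuracy = correct.float().mean().item()          # fraction of matches
mismatched = torch.nonzero(~correct).flatten()    # indices where they differ

print(accuracy)      # about 0.95 (19 of 20 correct)
print(mismatched)    # tensor([18])
```

The same comparison applied to the full prediction and target tensors would give the accuracy over the whole dataset.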

### Understanding Data

In [77]:
# trainset.dataset.data[0]

### Resources

[Towards data science Pytorch Gradients](https://towardsdatascience.com/understanding-pytorch-with-an-example-a-step-by-step-tutorial-81fc5f8c4e8e)

[Pytorch viz](https://github.com/szagoruyko/pytorchviz)