# PyTorch Class Lab

### Introduction 

In this lesson, we'll become more familiar with initializing a neural network in PyTorch using classes.  Let's get started.

### Defining a Class

Begin by defining a PyTorch class with three layers.  The class should have the attributes $W1$, $W2$, and $W3$, each of which points to a different linear layer.

> The first layer should take in `28*28` features and have 64 neurons.  The second layer should have 16 neurons, and the last layer should make 10 predictions for each observation.

In [3]:
import torch  # needed for torch.sigmoid in forward
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.W1 = nn.Linear(28*28, 64)
        self.W2 = nn.Linear(64, 16)
        self.W3 = nn.Linear(16, 10)
        
    def forward(self, X):
        A1 = torch.sigmoid(self.W1(X))
        A2 = torch.sigmoid(self.W2(A1))
        Z3 = torch.sigmoid(self.W3(A2))
        return F.log_softmax(Z3, dim = 1)

In [4]:
import torch
torch.manual_seed(12)

neural_net = Net()

neural_net


Net(
  (W1): Linear(in_features=784, out_features=64, bias=True)
  (W2): Linear(in_features=64, out_features=16, bias=True)
  (W3): Linear(in_features=16, out_features=10, bias=True)
)
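As a quick sanity check on the layer sizes, we can count the parameters in each layer. The sketch below redefines only the `__init__` part of `Net` so that it runs on its own; the `counts` and `total` names are just for illustration.

```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.W1 = nn.Linear(28 * 28, 64)
        self.W2 = nn.Linear(64, 16)
        self.W3 = nn.Linear(16, 10)

net = Net()

# Each linear layer holds a weight matrix (out_features x in_features)
# and a bias vector (out_features).
counts = {name: p.numel() for name, p in net.named_parameters()}
total = sum(counts.values())

print(counts)
print(total)  # 784*64 + 64 + 64*16 + 16 + 16*10 + 10 = 51450
```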

Next, write the `forward` function.  It should pass the output of each of the first two layers through the sigmoid function.  The output of the last layer should then go through the `log_softmax` activation function.  Set `dim = 1` in the `log_softmax` function.

> The `log_softmax` function applies the softmax function and then takes the log of the result.  It is commonly used as the output of a classification network because log probabilities are more numerically stable to work with than the probabilities themselves.
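To see this relationship concretely, here is a minimal sketch that compares `F.log_softmax` with taking the log of `F.softmax` by hand. The tensor `z` is made-up data standing in for the raw outputs of a final layer.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
z = torch.randn(2, 10)  # made-up raw scores: 2 observations, 10 classes

log_probs = F.log_softmax(z, dim=1)       # log applied to softmax, in one call
manual = torch.log(F.softmax(z, dim=1))   # the same thing in two steps

same = torch.allclose(log_probs, manual, atol=1e-6)
row_sums = log_probs.exp().sum(dim=1)     # each row of probabilities sums to 1

print(same)
print(row_sums)
```

Setting `dim=1` normalizes across the 10 classes of each observation, rather than across observations.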

Ok, here is an observation.

In [6]:
import torch
torch.manual_seed(12)
x = torch.randn(1, 784)

In [7]:
x.shape

torch.Size([1, 784])

In [8]:
predictions = neural_net(x)

predictions

tensor([[-2.2135, -2.4482, -2.2330, -2.3596, -2.3348, -2.2169, -2.3197, -2.3751,
         -2.2690, -2.2820]], grad_fn=<LogSoftmaxBackward>)

We can translate these predictions into probabilities by passing the outputs to the exponential function, `torch.exp`, which undoes the log in `log_softmax`.

In [9]:
torch.exp(predictions)

tensor([[0.1093, 0.0864, 0.1072, 0.0945, 0.0968, 0.1089, 0.0983, 0.0930, 0.1034,
         0.1021]], grad_fn=<ExpBackward>)
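As a rough check, the probabilities should sum to about one, and the largest one tells us which class the network currently favors. The sketch below hard-codes the rounded log probabilities printed earlier, so the sum is only approximately 1.

```python
import torch

# Log probabilities copied from the printed predictions (rounded to 4 decimals).
log_probs = torch.tensor([[-2.2135, -2.4482, -2.2330, -2.3596, -2.3348,
                           -2.2169, -2.3197, -2.3751, -2.2690, -2.2820]])

probs = torch.exp(log_probs)
total = probs.sum(dim=1)               # close to 1, up to rounding
predicted_class = probs.argmax(dim=1)  # index of the largest probability

print(total)
print(predicted_class)  # tensor([0])
```

Because the exponential is increasing, taking `argmax` of the log probabilities gives the same class.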

### Exploring Backpropagation

Notice that our prediction tensor has a gradient function (`grad_fn`) associated with it.  This is because the predictions are computed from the parameters of our linear layers, which have `requires_grad = True` set by default, so PyTorch tracks the operations applied to them.

In [107]:
predictions

tensor([[-2.4225, -2.2919, -2.1338, -2.3123, -2.1192, -2.4466, -2.2972, -2.4385,
         -2.2620, -2.3632]], grad_fn=<LogSoftmaxBackward>)
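To see where the `grad_fn` comes from, here is a small sketch using a single stand-in layer (`nn.Linear(4, 3)`, chosen only to keep the example short). The input tensor does not require gradients, but the layer's parameters do, so the output is tracked.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 3)  # small stand-in for one of the network's layers
x = torch.randn(1, 4)    # a plain input tensor

input_tracked = x.requires_grad                                  # False
params_tracked = [p.requires_grad for p in layer.parameters()]   # [True, True]

out = layer(x)
has_grad_fn = out.grad_fn is not None                            # True

# Inside torch.no_grad(), PyTorch does not record operations.
with torch.no_grad():
    out_untracked = layer(x)

print(input_tracked, params_tracked, has_grad_fn, out_untracked.grad_fn)
```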

Ultimately, we'll need to pass these predictions to a loss function.

> Let's initialize the cross entropy loss from PyTorch's `nn` module.

In [108]:
loss = nn.CrossEntropyLoss()

In [109]:
y_actual = torch.tensor(0).view(1)
y_actual

tensor([0])

Our cross entropy `loss` function initialized above takes predictions and the actual value.  

In [110]:
loss(predictions, torch.tensor(0).view(1))

tensor(2.4225, grad_fn=<NllLossBackward>)
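One detail worth knowing: `nn.CrossEntropyLoss` applies `log_softmax` to its input itself. Our network already returns log probabilities, and applying `log_softmax` a second time leaves them unchanged, so the loss is simply the negative log probability of the correct class. The sketch below checks this on made-up scores; `nn.NLLLoss` is shown only for comparison and is not used in the lab.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 10)               # made-up raw scores
log_probs = F.log_softmax(logits, dim=1)  # what our network returns
target = torch.tensor([0])                # the actual class

ce_on_log_probs = nn.CrossEntropyLoss()(log_probs, target)
ce_on_logits = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(log_probs, target)     # expects log probabilities directly
by_hand = -log_probs[0, 0]                # negative log probability of class 0

print(ce_on_log_probs, ce_on_logits, nll, by_hand)
```

This matches the output above, where the loss `2.4225` is the negative of the first prediction, `-2.4225`.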

Notice that it returns a tensor with a gradient function.  So if we wish to calculate the gradient of the cost function with respect to the parameters in our linear layers, we should be able to do so.

In [111]:
cost = loss(predictions, torch.tensor(0).view(1))
cost

tensor(2.4225, grad_fn=<NllLossBackward>)

First, have PyTorch perform backpropagation.

In [112]:
cost.backward()

We can now look at the parameters and see that a gradient has been calculated for each of them.

> Uncomment the lines below and run them.

In [114]:
# for i in neural_net.parameters():
#     print(i.grad)
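Each parameter's `.grad` has the same shape as the parameter itself, and it is `None` until `backward()` has been called. Here is a minimal sketch on a single stand-in layer with made-up data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)     # small stand-in for the network
x = torch.randn(1, 4)       # made-up observation
target = torch.tensor([0])  # made-up label

grad_before = layer.weight.grad  # None: nothing has been backpropagated yet

cost = nn.CrossEntropyLoss()(layer(x), target)
cost.backward()

for name, p in layer.named_parameters():
    # The gradient holds one entry per parameter value.
    print(name, tuple(p.shape), tuple(p.grad.shape))
```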

Now let's take a look at the parameters themselves.  Use similar code to print the parameters of the neural network.

> Copy the first few weights of the first linear layer below for safe-keeping.

In [117]:

# #

Now to actually make a change and update our parameters, we need to initialize an optimizer.

In [118]:
import torch.optim as optim
adam = optim.Adam(neural_net.parameters(), lr=0.0005)
adam

Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0005
    weight_decay: 0
)

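And call the `step` function on the `adam` optimizer.  This will update each parameter by moving it against its gradient, with the size of the step controlled by the learning rate.  (Adam also rescales each step using running averages of the gradient and its square, so the update is not simply the gradient times the learning rate.)

Then take another look at the parameters.  We should see that the parameters have been updated.

> Uncomment the lines below when done.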

In [121]:
# for i in neural_net.parameters():
#     print(i)
    
# -0.0290, -0.0104, -0.0239
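Rather than comparing printed weights by eye, we can save a copy of a parameter before calling `step` and measure how much it moved. This is a minimal sketch on a single stand-in layer with made-up data; on Adam's very first step each weight moves by at most about the learning rate.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
layer = nn.Linear(4, 3)  # small stand-in for the network
adam = optim.Adam(layer.parameters(), lr=0.0005)

x = torch.randn(1, 4)       # made-up observation
target = torch.tensor([0])  # made-up label
cost = nn.CrossEntropyLoss()(layer(x), target)
cost.backward()

before = layer.weight.detach().clone()  # copy, so it is not updated in place
adam.step()
change = (layer.weight.detach() - before).abs()

print(change.max())  # largest absolute change in any weight
```

The `clone()` matters here: without it, `before` would point at the same storage that `step` updates in place.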

If we want to clear these gradients, we should call `zero_grad` on the neural network.

In [122]:
neural_net.zero_grad()

Now, let's take another look at the gradients of our neural net parameters.  Print them out below.

In [124]:
# for i in neural_net.parameters():
#     print(i.grad)

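We can see that they are all zero.  (In recent PyTorch versions, `zero_grad` sets the gradients to `None` by default instead of filling them with zeros.)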
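The reason for clearing gradients is that `backward()` adds to whatever is already stored in `.grad`. The sketch below, on a stand-in layer with made-up data, shows the gradient doubling after a second `backward()` call and then being cleared. The check at the end accepts either zeros or `None`, since which one you get depends on the PyTorch version.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)     # small stand-in for the network
x = torch.randn(1, 4)       # made-up observation
target = torch.tensor([0])  # made-up label
loss = nn.CrossEntropyLoss()

loss(layer(x), target).backward()
first = layer.weight.grad.clone()

loss(layer(x), target).backward()  # the second call accumulates into .grad
doubled = torch.allclose(layer.weight.grad, 2 * first)

layer.zero_grad()
cleared = all(
    p.grad is None or (p.grad == 0).all().item()
    for p in layer.parameters()
)

print(doubled, cleared)
```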

### Summary

In this lesson, we practiced creating an object-oriented neural network in PyTorch.  We saw that creating the neural network meant defining two functions:

* The `__init__` function, where we define our linear layers, and
* The `forward` function, where we specify a forward pass of the data through those layers

Then we explored backpropagation, where PyTorch calculates the gradient of the cost with respect to the parameters of our linear layers.  Once we calculated the cost, we called `backward()` to perform backpropagation.  Then we saw we could update our parameters through the `adam.step()` function.  Finally, we cleared the gradients through the `neural_net.zero_grad()` function.
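Putting these pieces together, one update step might look like the sketch below. It mirrors the `Net` class from this lab; the observation `x` and the label `y_actual` are made up, and clearing the gradients at the start of the step (rather than at the end) is just one common ordering.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.W1 = nn.Linear(28 * 28, 64)
        self.W2 = nn.Linear(64, 16)
        self.W3 = nn.Linear(16, 10)

    def forward(self, X):
        A1 = torch.sigmoid(self.W1(X))
        A2 = torch.sigmoid(self.W2(A1))
        Z3 = torch.sigmoid(self.W3(A2))
        return F.log_softmax(Z3, dim=1)

torch.manual_seed(12)
net = Net()
adam = optim.Adam(net.parameters(), lr=0.0005)
loss = nn.CrossEntropyLoss()

x = torch.randn(1, 784)       # made-up observation
y_actual = torch.tensor([0])  # made-up label

adam.zero_grad()                    # 1. clear any old gradients
predictions = net(x)                # 2. forward pass
cost = loss(predictions, y_actual)  # 3. compute the cost
cost.backward()                     # 4. backpropagation fills in .grad
before = net.W1.weight.detach().clone()
adam.step()                         # 5. update the parameters

changed = not torch.equal(before, net.W1.weight.detach())
print(predictions.shape, cost.item(), changed)
```

In training, these five steps would be repeated over many observations.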