In [0]:
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

Reference: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py

## 1. Neural Network Class with PyTorch

**Convolution operation**

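- Input = single-channel (grayscale) image of 32 × 32 pixels
- Layer 1 = conv2d with 6 filters, kernel (5, 5), followed by ReLU → output 6 × 28 × 28
- Layer 2 = max pooling with kernel (2, 2) → output 6 × 14 × 14
- Layer 3 = conv2d with 16 filters, kernel (5, 5), followed by ReLU → output 16 × 10 × 10
- Layer 4 = max pooling with kernel (2, 2) → output 16 × 5 × 5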

**Fully Connected Neural Network**

- Flatten the output of the convolution stage (16 × 5 × 5 = 400 features)
- Feed forward to 120 fully connected neurons (ReLU)
- Feed forward to 84 neurons (ReLU)
- Output 10 neurons (no activation)
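The shapes above follow from the output-size rule for an unpadded, stride-1 convolution (out = in - kernel + 1) and from 2 × 2 max pooling halving each spatial dimension. One way to confirm that the flattened size is 16 × 5 × 5 = 400 is to push a dummy tensor through the same layer sequence. This sketch rebuilds only the convolutional part, with the layer sizes taken from the list:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same convolution settings as the list above: 1 -> 6 -> 16 channels, 5x5 kernels
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 1, 32, 32)          # (batch, channels, height, width)
shapes = [tuple(x.shape)]

x = conv1(x)                           # 32 - 5 + 1 = 28
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)                 # 28 / 2 = 14
shapes.append(tuple(x.shape))
x = conv2(x)                           # 14 - 5 + 1 = 10
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)                 # 10 / 2 = 5
shapes.append(tuple(x.shape))

flat = x.view(x.size(0), -1)           # 16 * 5 * 5 = 400 features
shapes.append(tuple(flat.shape))

for s in shapes:
    print(s)
```

The last printed shape, (1, 400), is why fc1 is declared as nn.Linear(400, 120).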



In [0]:
class LeNet(nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 6, 5)        # (N,1,32,32) -> (N,6,28,28)
        self.conv2 = nn.Conv2d(6, 16, (5, 5))  # (N,6,14,14) -> (N,16,10,10)

        # Fully connected layers
        self.fc1 = nn.Linear(400, 120)         # 400 = 16 * 5 * 5
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))  # -> (N,6,14,14)
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))  # -> (N,16,5,5)
        # flatten the output of the convolution stage: (N,16,5,5) -> (N,400)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                                   # raw outputs, no activation
        return x

In [99]:
net = LeNet()
print(net)

LeNet(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


## 2. Backpropagation and Weight Update

### Backpropagation

In [100]:
params = list(net.parameters())
print(len(params))
print(params[0].size())

10
torch.Size([6, 1, 5, 5])
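The 10 parameter tensors are one weight and one bias for each of the five layers. A sketch that lists them by name and adds up the total, assuming the same LeNet layer sizes as above (the class is repeated so the snippet runs on its own):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    # Same layer sizes as the LeNet class defined earlier
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(400, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

net = LeNet()
for name, p in net.named_parameters():
    print(f"{name:12s} {tuple(p.shape)}")   # first line: conv1.weight (6, 1, 5, 5)

total = sum(p.numel() for p in net.parameters())
print("total:", total)                       # total: 61706
```

params[0] in the cell above is therefore conv1.weight, and most of the 61,706 parameters (48,120) sit in fc1.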


In [101]:
# try a random input
x_input = torch.randn(1,1,32,32)
out = net(x_input)
print(out)

tensor([[-0.0100,  0.0623, -0.0195, -0.0446,  0.0623,  0.1433,  0.0183, -0.0838,
          0.1426, -0.1487]], grad_fn=<AddmmBackward>)


In [0]:
# computing the loss
loss_fn = nn.MSELoss()
true = torch.randn(1,10)
loss = loss_fn(out,true)
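With its default reduction='mean', nn.MSELoss averages the squared element-wise differences between output and target. A small check with random tensors (the seed and shapes are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
out = torch.randn(1, 10)
true = torch.randn(1, 10)

loss_fn = nn.MSELoss()                 # default reduction='mean'
loss = loss_fn(out, true)
manual = ((out - true) ** 2).mean()    # average of the squared differences
print(loss.item(), manual.item())
```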

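When we call loss.backward(), autograd differentiates the loss through the whole graph, and every tensor in the graph that has requires_grad=True (here, the network's parameters) has the resulting gradient accumulated into its .grad attribute. The graph itself can be inspected by following loss.grad_fn backwards through its next_functions: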

In [103]:
print(loss.grad_fn)
print(loss.grad_fn.next_functions[0][0])
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])

<MseLossBackward object at 0x7f90796cf470>
<AddmmBackward object at 0x7f90796cf4a8>
<AccumulateGrad object at 0x7f90796cf470>
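Each next_functions entry is a (node, index) pair, so the chain printed above can be followed all the way to the leaves. A minimal recursive sketch; the helper walk is hypothetical, and a single nn.Linear layer stands in for LeNet to keep the graph small:

```python
import torch
import torch.nn as nn

def walk(fn, depth=0, names=None):
    """Print a grad_fn node, then recurse into the nodes it feeds from."""
    if names is None:
        names = []
    if fn is None:                     # an input that does not require grad
        return names
    names.append(type(fn).__name__)
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names

layer = nn.Linear(4, 2)
loss = nn.MSELoss()(layer(torch.randn(3, 4)), torch.randn(3, 2))
names = walk(loss.grad_fn)
```

The AccumulateGrad nodes are the leaves: one for layer.weight and one for layer.bias. Exact node names (for example MseLossBackward versus MseLossBackward0) vary between PyTorch versions.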


**To backpropagate the error, all we have to do is call loss.backward(). Clear the existing gradients first, though; otherwise the new gradients are added to the ones already stored.**

In [106]:
# Recent PyTorch versions (2.0+) reset .grad to None here instead of to zeros
net.zero_grad()

print('conv1.bias.grad before backward function')
print(net.conv1.bias.grad)

loss.backward(retain_graph=True)
print('conv1.bias.grad after backward function')
print(net.conv1.bias.grad)

conv1.bias.grad before backward function
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward function
tensor([ 0.0210,  0.0023,  0.0020, -0.0207, -0.0004,  0.0015])
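The warning about clearing gradients can be checked directly: running backward twice on the same data without zeroing in between leaves exactly twice the single-pass gradient in .grad. A sketch with a small nn.Linear layer standing in for the network (data and seed are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 2)
x, target = torch.randn(3, 4), torch.randn(3, 2)
loss_fn = nn.MSELoss()

loss_fn(layer(x), target).backward()
g1 = layer.bias.grad.clone()           # gradient after one backward pass

loss_fn(layer(x), target).backward()   # no zero_grad in between
g2 = layer.bias.grad.clone()           # now holds g1 + g1

layer.zero_grad()                      # .grad becomes None (PyTorch 2.0+) or zeros
print(g1, g2, sep="\n")
```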


**To reduce memory usage, the .backward() call frees the intermediate results saved during the forward pass** as soon as they are no longer needed. If you call .backward() again on the same graph, those results no longer exist, the backward pass cannot be performed, and PyTorch raises a RuntimeError about trying to backward through the graph a second time.
Calling .backward(retain_graph=True) performs a backward pass that keeps the intermediate results, so .backward() can be called again. Every call except the last one should pass retain_graph=True.

*For more information:* https://discuss.pytorch.org/t/runtimeerror-trying-to-backward-through-the-graph-a-second-time-but-the-buffers-have-already-been-freed-specify-retain-graph-true-when-calling-backward-the-first-time/6795
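A minimal reproduction of that behaviour, using a tiny graph whose multiplication node saves its inputs for the backward pass:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x * x).sum()                      # the multiply node saves x for the backward pass

y.backward(retain_graph=True)          # keep the saved tensors alive
y.backward()                           # second pass works, then frees the graph
grad_after_two = x.grad.clone()        # 2*x accumulated twice = 4*x
print(grad_after_two)

try:
    y.backward()                       # saved tensors are gone now
    failed = False
except RuntimeError as err:
    failed = True
    print("RuntimeError:", str(err).splitlines()[0])
```

Note that the two successful calls also accumulate into x.grad, which is the same accumulation behaviour that zero_grad exists to reset.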

### Updating the Weights

In [0]:
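opt = optim.SGD(net.parameters(), lr=0.01)
opt.zero_grad()                # zero the gradient buffers
output = net(x_input)          # forward pass
loss = loss_fn(output, true)   # compute the loss
loss.backward()                # backpropagate: fill each parameter's .grad
opt.step()                     # update the weights using the stored gradients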
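With no momentum or weight decay, as in the cell above, opt.step() applies the plain gradient-descent rule p <- p - lr * p.grad to every parameter. The sketch below checks that by performing the same update by hand on a copy of a small nn.Linear model (the model, data and seed are arbitrary stand-ins for LeNet):

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 2)
reference = copy.deepcopy(model)       # identical starting weights
x, target = torch.randn(3, 4), torch.randn(3, 2)
loss_fn = nn.MSELoss()
lr = 0.01

# 1) Let optim.SGD perform the update
opt = optim.SGD(model.parameters(), lr=lr)
opt.zero_grad()
loss_fn(model(x), target).backward()
opt.step()

# 2) Same update by hand: p <- p - lr * p.grad
reference.zero_grad()
loss_fn(reference(x), target).backward()
with torch.no_grad():                  # the update itself must not be tracked
    for p in reference.parameters():
        p -= lr * p.grad

for p_opt, p_ref in zip(model.parameters(), reference.parameters()):
    print(torch.allclose(p_opt, p_ref))
```

The torch.no_grad() block matters: without it, autograd would refuse the in-place update of a leaf tensor that requires grad.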

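**The use of zero_grad in PyTorch**

Gradients in PyTorch accumulate: every backward() call adds its result to whatever is already stored in each parameter's .grad. Calling opt.zero_grad() (or net.zero_grad()) before each backward pass ensures that the update uses only the gradient of the current iteration.

*For more information:* https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html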
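In a training loop this means calling opt.zero_grad() once per iteration, before backward(). A minimal sketch on synthetic data; the linear target, learning rate and step count are arbitrary choices for illustration, not values from the tutorial:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 1)
x = torch.randn(64, 4)
y = x @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]])   # synthetic linear target
loss_fn = nn.MSELoss()
opt = optim.SGD(model.parameters(), lr=0.05)

losses = []
for step in range(100):
    opt.zero_grad()                    # drop gradients from the previous step
    loss = loss_fn(model(x), y)        # forward pass
    loss.backward()                    # .grad now holds this step's gradient only
    opt.step()                         # p <- p - lr * p.grad
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Moving opt.zero_grad() out of the loop would make each step use the running sum of all previous gradients instead.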