# Neural Networks Introduction

A typical training procedure for a neural network is as follows:

- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far the output is from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule: `weight = weight - learning_rate * gradient`
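
To make the update rule in the last step concrete, here is a minimal sketch in plain Python that applies it to a single weight. The toy loss `(w - 3)^2`, the starting value, and the learning rate are assumptions chosen for illustration; they are not part of the network defined below.

```python
# Toy problem (assumed for illustration): minimize loss(w) = (w - 3)^2.
def gradient(w):
    # Analytic derivative: d/dw (w - 3)^2 = 2 * (w - 3)
    return 2.0 * (w - 3.0)

weight = 0.0          # arbitrary starting point
learning_rate = 0.1   # arbitrary step size

for step in range(100):
    # The update rule from the list above
    weight = weight - learning_rate * gradient(weight)

print(weight)  # ends up very close to the minimizer w = 3
```

Each step moves the weight a fraction of the way against the gradient, which is the same thing an optimizer does for every parameter tensor in a network.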

## Define the Network

In [15]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [16]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()  # Calls the parent's (nn.Module's) constructor (__init__ method)
        # input grayscale image (single channel), 6 output channels, 5x5 conv kernel
        self.conv1 = nn.Conv2d(1, 6, 5) # in_channels, out_channels, kernel_size
        self.conv2 = nn.Conv2d(6, 16, 5)
        
        self.fc1 = nn.Linear(16 * 5 * 5, 120) # in_features = 16 channels * 5x5 feature map, out_features = 120
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    
    def forward(self, input): # backward is automatically defined using autograd
        # Original image is 32x32
        c1 = F.relu(self.conv1(input)) # Feature maps are 28x28
        s2 = F.max_pool2d(c1, (2, 2)) # Feature maps become 14x14
        c3 = F.relu(self.conv2(s2)) # Feature maps become 10x10
        s4 = F.max_pool2d(c3, (2, 2)) # Feature maps become 5x5
        s4 = torch.flatten(s4, 1) # Flatten every dimension except the batch dimension (start_dim=1): (N, 16, 5, 5) -> (N, 400)
        f5 = F.relu(self.fc1(s4))
        f6 = F.relu(self.fc2(f5))
        output = self.fc3(f6) # No activation here: these are raw logits, and nn.CrossEntropyLoss applies log-softmax internally
        return output

net = Net()
print(net)
        

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
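
The feature-map sizes noted in the comments of `forward` (32, 28, 14, 10, 5) follow from simple arithmetic. The sketch below reproduces them; the helper names `conv_out` and `pool_out` are hypothetical, and it assumes stride 1, no padding, and no dilation for the convolutions, and a pooling stride equal to the pooling window.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    # Output length of a convolution along one spatial dimension (dilation 1).
    return (size + 2 * padding - kernel_size) // stride + 1

def pool_out(size, kernel_size):
    # Max pooling with stride equal to the kernel size and no padding.
    return size // kernel_size

size = 32                      # input image is 32x32
sizes = [size]
size = conv_out(size, 5)       # conv1 with a 5x5 kernel
sizes.append(size)
size = pool_out(size, 2)       # 2x2 max pooling
sizes.append(size)
size = conv_out(size, 5)       # conv2 with a 5x5 kernel
sizes.append(size)
size = pool_out(size, 2)       # 2x2 max pooling
sizes.append(size)

flat_features = 16 * size * size   # 16 channels from conv2

print(sizes)           # [32, 28, 14, 10, 5]
print(flat_features)   # 400
```

The final value matches `in_features=400` reported for `fc1` in the printed model.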


Learnable Parameters:

In [17]:
params = list(net.parameters())
print(len(params)) # 2 conv and 3 fc layers, each has 1 weight and 1 bias tensor; (2+3)*2=10 distinct parameter tensors
print(params[0].size()) # conv1's weights: out_channels, in_channels, kernel_height, kernel_width
# height comes before width, matching the rows-then-columns order of a matrix

10
torch.Size([6, 1, 5, 5])
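
To see all ten parameter tensors rather than only the first one, a sketch like the following lists each tensor's name, shape, and element count. It rebuilds the same five layers in an `nn.ModuleDict` (an assumption made only so the example is self-contained); with the `net` defined above, `net.named_parameters()` could be used in the same way.

```python
import torch.nn as nn

# Same layer configuration as Net, rebuilt here so the example runs on its own.
layers = nn.ModuleDict({
    "conv1": nn.Conv2d(1, 6, 5),
    "conv2": nn.Conv2d(6, 16, 5),
    "fc1": nn.Linear(16 * 5 * 5, 120),
    "fc2": nn.Linear(120, 84),
    "fc3": nn.Linear(84, 10),
})

# Each layer contributes one weight tensor and one bias tensor.
for name, p in layers.named_parameters():
    print(name, tuple(p.shape), p.numel())

num_tensors = len(list(layers.parameters()))
total = sum(p.numel() for p in layers.parameters() if p.requires_grad)
print(num_tensors)  # 10
print(total)        # 61706
```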


Test the network:

In [18]:
input = torch.randn(1, 1, 32, 32) # values drawn from a standard normal distribution; shape is (batch_size, channels, height, width)
out = net(input)
print(out)

tensor([[ 0.1370,  0.1243, -0.1146, -0.0983, -0.1272,  0.0617, -0.0869,  0.0115,
         -0.1168, -0.0069]], grad_fn=<AddmmBackward0>)


Zero the gradient buffers of all parameters and backpropagate with random gradients:

In [19]:
net.zero_grad()
out.backward(torch.rand(1, 10))

**torch.nn expects mini-batches of samples rather than a single sample (e.g. nn.Conv2d takes a 4D tensor of nSamples x nChannels x Height x Width).**

**For a single sample, use input.unsqueeze(0) to add a fake batch dimension.**
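
As a small illustration of the batch-dimension note, the sketch below takes one image without a batch dimension and adds it with `unsqueeze(0)`. The standalone `conv` layer and the variable names are assumptions for the example; it mirrors `conv1` from the network above.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)          # same configuration as conv1

sample = torch.randn(1, 32, 32)    # one image: channels, height, width
batch = sample.unsqueeze(0)        # add a batch dimension of size 1 at the front
out = conv(batch)

print(sample.shape)  # torch.Size([1, 32, 32])
print(batch.shape)   # torch.Size([1, 1, 32, 32])
print(out.shape)     # torch.Size([1, 6, 28, 28])
```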

## Loss Function

In [20]:
output = net(input)
target = torch.randn(10) # dummy target with 10 values
target = target.view(1, -1) # make same shape as output (same as unsqueeze(0))
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(1.1390, grad_fn=<MseLossBackward0>)
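
With its default settings, `nn.MSELoss` is the mean of the squared element-wise differences. The following sketch checks that against a manual computation; the random tensors stand in for the network output and target and are assumptions for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(1, 10)   # stand-in for the network output
target = torch.randn(1, 10)   # dummy target of the same shape

loss = nn.MSELoss()(output, target)        # default reduction is the mean
manual = ((output - target) ** 2).mean()   # same quantity computed by hand

print(torch.allclose(loss, manual))  # True
```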


In [21]:
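print(loss.grad_fn) # MSELoss
print(loss.grad_fn.next_functions[0][0]) # Linear (fc3), recorded as an addmm operation
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # AccumulateGrad: a leaf parameter of fc3 (its bias, as the first input to addmm), not the ReLU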

<MseLossBackward0 object at 0x000001ED8630BEB0>
<AddmmBackward0 object at 0x000001ED8630AC80>
<AccumulateGrad object at 0x000001ED8630BEB0>
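
Following `next_functions` by hand only shows one path through the graph. As a rough sketch, a small recursive helper can list every node reachable from `loss.grad_fn`. The helper name `walk` and the tiny stand-in model are assumptions for illustration, and the exact node names can differ between PyTorch versions.

```python
import torch
import torch.nn as nn

def walk(fn, depth=0, names=None):
    """Depth-first walk over the autograd graph, recording node type names."""
    if names is None:
        names = []
    if fn is None:
        # Inputs that do not require gradients (e.g. the target) appear as None.
        return names
    names.append("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names

torch.manual_seed(0)
# Small stand-in model: Linear -> ReLU -> Linear
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
loss = nn.MSELoss()(model(torch.randn(1, 4)), torch.randn(1, 2))

names = walk(loss.grad_fn)
print("\n".join(names))
```

Leaf parameters show up as `AccumulateGrad` nodes, which is where gradients are written into each parameter's `.grad` during `backward()`.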


## Backpropagation

In [22]:
net.zero_grad() # resets the gradients of all parameters (to None by default in recent PyTorch versions, hence the None printed below)

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
None
conv1.bias.grad after backward
tensor([ 0.0043, -0.0076,  0.0021,  0.0031, -0.0076, -0.0027])
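
The reason for resetting gradients before `backward()` is that gradients accumulate across calls. The sketch below demonstrates this on a single linear layer; the layer, data, and variable names are assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
x = torch.randn(4, 3)
y = torch.randn(4, 1)
criterion = nn.MSELoss()

# First backward pass: gradients are written into .grad
criterion(layer(x), y).backward()
first = layer.bias.grad.clone()

# Second backward pass without resetting: gradients are added to the existing ones
criterion(layer(x), y).backward()
accumulated = layer.bias.grad.clone()

# Resetting first gives a fresh gradient again
layer.zero_grad()
criterion(layer(x), y).backward()
fresh = layer.bias.grad.clone()

print(torch.allclose(accumulated, 2 * first))  # True
print(torch.allclose(fresh, first))            # True
```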


## Update the Weights

Manual Implementation:

In [23]:
# Left commented out; the optimizer in the next cell performs the same update.
# learning_rate = 0.01
# for f in net.parameters():
#     f.data.sub_(f.grad.data * learning_rate) # basic SGD
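
A runnable variant of the manual update is sketched below. It is an alternative formulation, not the notebook's own code: instead of going through `.data`, it wraps the in-place update in `torch.no_grad()` so autograd does not track it. The single linear layer and random data are assumptions to keep the example self-contained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
loss = nn.MSELoss()(layer(torch.randn(4, 3)), torch.randn(4, 1))
loss.backward()

learning_rate = 0.01
before = layer.weight.detach().clone()   # snapshot for comparison
grad = layer.weight.grad.clone()

with torch.no_grad():                    # the update itself must not be recorded
    for p in layer.parameters():
        p -= learning_rate * p.grad      # weight = weight - learning_rate * gradient

expected = before - learning_rate * grad
print(torch.allclose(layer.weight, expected))  # True
```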

Using the torch.optim package:

In [24]:
import torch.optim as optim

# Create optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# In training loop:
optimizer.zero_grad() # zeroes gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # optimizer performs the param updates
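
The cell above performs a single optimization step. In practice the same four calls are repeated many times; the sketch below runs them in a loop on a tiny regression problem. The model, data, and number of steps are assumptions for illustration, not the notebook's network or dataset.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 2)            # small stand-in model
inputs = torch.randn(8, 4)         # dummy batch of 8 samples
targets = torch.randn(8, 2)        # dummy targets
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

losses = []
for step in range(50):
    optimizer.zero_grad()                      # reset gradients from the previous step
    loss = criterion(model(inputs), targets)   # forward pass and loss
    loss.backward()                            # compute gradients
    optimizer.step()                           # apply the SGD update
    losses.append(loss.item())

print(losses[0], losses[-1])  # the loss should go down over the steps
```

Swapping `optim.SGD` for another optimizer from `torch.optim` changes only the line that constructs it; the loop body stays the same.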