# Training using Backpropagation

Let's use the ```autograd``` feature to train a network with backpropagation. The general training flow is:
- Repeat for the desired number of epochs:
    - Repeat until every batch in the training set has been seen once (one epoch):
        - Extract a batch from the training set,
        - Execute a **forward pass** to obtain predictions on the batch,
        - Compute the **loss**,
        - Compute the **gradient** of the loss function w.r.t. the weights using ```autograd```,
        - **Update the weights** using the gradients computed by backpropagation.
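
The steps above can be sketched on a toy problem before moving to MNIST. The following is a minimal illustration, not the notebook's actual model: it assumes hypothetical synthetic data generated from $y = 3x + 1$ and updates two scalar parameters by hand with plain gradient descent instead of an optimiser.

```python
import torch

torch.manual_seed(0)

# Hypothetical synthetic data: y = 3x + 1 (noise-free)
x = torch.linspace(-1, 1, 200).unsqueeze(1)
y = 3 * x + 1

# Parameters tracked by autograd
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr = 0.1
batch_size = 20
num_epochs = 100

for epoch in range(num_epochs):              # repeat for several epochs
    perm = torch.randperm(len(x))            # reshuffle every epoch
    for start in range(0, len(x), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = x[idx], y[idx]              # extract a batch

        pred = xb * w + b                    # forward pass
        loss = ((pred - yb) ** 2).mean()     # mean squared error loss

        loss.backward()                      # gradients via autograd

        with torch.no_grad():                # update the weights
            w -= lr * w.grad
            b -= lr * b.grad

        # Clear the gradients so they do not accumulate across batches
        w.grad.zero_()
        b.grad.zero_()

print(w.item(), b.item())  # should be close to 3 and 1
```

The rest of this notebook follows the same structure, with a convolutional network in place of `w` and `b` and an optimiser in place of the manual update.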


In [42]:
import torch
import torchvision
import numpy as np

from scipy import io
from torch import nn
from torch import optim 
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from torch.nn import functional as F

torch.set_printoptions(linewidth=120)

Let's set the following:
- Training dataset - MNIST with batch size $100$
- Convolutional Neural Network

In [43]:
class MNIST(Dataset):
    def __init__(self, transform=None):
        data = io.loadmat('./../datasets/MNIST/mnist_training_data.mat')
        labels = io.loadmat('./../datasets/MNIST/mnist_training_label.mat')
        
        self.num_samples = len(data['training_data'])
        self.samples = data['training_data'].reshape(self.num_samples, 28, 28)[:,None].astype(np.float32)
        self.targets = labels['training_label'][:,0].astype(np.int64)
        self.transform = transform
    
    def __getitem__(self, index):
        if self.transform:
            return self.transform(self.samples[index]), self.targets[index]
        else:
            return self.samples[index], self.targets[index]
    
    def __len__(self):
        return self.num_samples

In [44]:
train_set = MNIST()
print(len(train_set))
print(train_set.targets)
# print(np.bincount(train_set.targets))
# TODO: use the class counts to comment on the balance of the dataset

50000
[0 0 0 ... 9 9 9]


In [45]:
batch_size = 100
train_loader = DataLoader(
    train_set,
    batch_size=batch_size,
    shuffle=True
)
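
To see what a ```DataLoader``` yields, here is a small sketch that assumes a stand-in dataset of 1000 blank images (a hypothetical ```TensorDataset```, not the MNIST files used above). With a batch size of 100, each iteration should return a tensor of images with shape `(100, 1, 28, 28)` and a tensor of 100 labels.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data (assumption): 1000 blank 1x28x28 images, all labelled 0
samples = torch.zeros(1000, 1, 28, 28)
targets = torch.zeros(1000, dtype=torch.int64)

loader = DataLoader(TensorDataset(samples, targets), batch_size=100, shuffle=True)

images, labels = next(iter(loader))
print(images.shape)   # torch.Size([100, 1, 28, 28])
print(labels.shape)   # torch.Size([100])
print(len(loader))    # 10 batches per epoch
```

By the same arithmetic, the 50000 training samples above give 500 batches per epoch.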

Let's define a convolutional neural network and instantiate it as an object called ```network```.

In [46]:
class ConvNetwork(nn.Module):
    
    def __init__(self):
        super(ConvNetwork, self).__init__()
        
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.dense1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.dense2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)
        
    def forward(self, x):
        # (1) hidden conv layer
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        
        # (2) hidden conv layer
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        
        # (3) hidden linear layer
        x = x.reshape(-1, 12*4*4)
        x = self.dense1(x)
        x = F.relu(x)
        
        # (4) hidden linear layer
        x = self.dense2(x)
        x = F.relu(x)
        
        # (5) output layer
        x = self.out(x)
        # x = F.softmax(x, dim=1) # This will be implemented within the loss function
        
        return x

In [47]:
network = ConvNetwork()
network

ConvNetwork(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (dense1): Linear(in_features=192, out_features=120, bias=True)
  (dense2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)
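
The value `in_features=12*4*4` in ```dense1``` comes from how the spatial size shrinks through the network. The sketch below traces a single blank 28x28 input through two freshly created layers with the same settings as ```conv1``` and ```conv2``` (the variable names here are illustrative only).

```python
import torch
from torch import nn
from torch.nn import functional as F

# Layers with the same hyperparameters as conv1 and conv2 above
conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

x = torch.zeros(1, 1, 28, 28)                  # one blank MNIST-sized image
shapes = [tuple(x.shape)]

x = conv1(x)                                   # 5x5 kernel, no padding: 28 -> 24
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, kernel_size=2, stride=2)   # 24 -> 12
shapes.append(tuple(x.shape))
x = conv2(x)                                   # 12 -> 8
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, kernel_size=2, stride=2)   # 8 -> 4
shapes.append(tuple(x.shape))

for s in shapes:
    print(s)
# (1, 1, 28, 28)
# (1, 6, 24, 24)
# (1, 6, 12, 12)
# (1, 12, 8, 8)
# (1, 12, 4, 4)

flat = x.reshape(-1, 12 * 4 * 4)
print(flat.shape)   # torch.Size([1, 192])
```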

Let's compute the cross-entropy loss on one batch, and also, see how many predictions it gets right.

In [48]:
def get_num_correct(predictions, labels):
    return F.softmax(predictions, dim=1).argmax(dim=1).eq(labels).sum().item()

In [49]:
batch = next(iter(train_loader))
images, labels = batch

In [50]:
predictions = network(images)
loss = F.cross_entropy(predictions, labels)
loss.item()

2.3098042011260986

In [51]:
get_num_correct(predictions, labels)

7
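
The network returns raw scores (logits) because ```F.cross_entropy``` applies the log-softmax internally. The following sketch, using hypothetical random logits instead of real predictions, checks this against a manual computation. It also shows that a model with no preference between 10 classes has a loss of $\ln 10 \approx 2.3026$, which is close to the loss of the untrained network above.

```python
import math
import torch
from torch.nn import functional as F

torch.manual_seed(0)

# Hypothetical logits for a batch of 4 samples and 10 classes
logits = torch.randn(4, 10)
labels = torch.tensor([3, 0, 9, 1])

# Built-in loss: log-softmax followed by negative log-likelihood
builtin = F.cross_entropy(logits, labels)

# The same computation written out by hand
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), labels].mean()
print(torch.allclose(builtin, manual))   # True

# Equal logits for every class give a loss of ln(10)
uniform_loss = F.cross_entropy(torch.zeros(4, 10), labels)
print(uniform_loss.item(), math.log(10))

# Softmax preserves the ordering of the logits, so argmax is unchanged
same_argmax = torch.equal(F.softmax(logits, dim=1).argmax(dim=1), logits.argmax(dim=1))
print(same_argmax)   # True
```

Because of the last point, the ```F.softmax``` call in ```get_num_correct``` does not change the result; it is harmless but not required.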

Let's call the ```backward()``` method of the ```loss``` tensor to compute the gradients using backpropagation. This works because ```loss``` carries the computational graph that was built during the forward pass, when ```predictions``` was computed from the network weights.
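
As a small illustration of this mechanism, the sketch below uses a hypothetical two-element weight vector instead of the network. The gradient is `None` before ```backward()``` is called, and afterwards it matches the value obtained by hand.

```python
import torch

w = torch.tensor([2.0, -1.0], requires_grad=True)
x = torch.tensor([0.5, 3.0])

pred = (w * x).sum()       # 2*0.5 + (-1)*3 = -2
loss = (pred - 1.0) ** 2   # (-2 - 1)^2 = 9

print(loss.grad_fn is not None)   # True: loss remembers how it was computed
print(w.grad)                     # None: nothing has been backpropagated yet

loss.backward()

# By hand: d(loss)/dw = 2 * (pred - 1) * x = 2 * (-3) * [0.5, 3.0]
print(w.grad)                     # tensor([ -3., -18.])
```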

In [52]:
print(network.conv1.weight.grad)

None


In [53]:
loss.backward()

In [54]:
print(network.conv1.weight.grad)
print(network.conv1.weight.grad.shape)

tensor([[[[ 4.0024e-04,  6.7776e-04,  6.6709e-04,  8.9709e-04,  1.0035e-03],
          [ 1.3444e-03,  1.1197e-03,  6.9950e-04,  3.4294e-04,  8.9420e-04],
          [ 1.0179e-03,  1.9818e-04,  2.7285e-04,  8.6131e-04,  1.9530e-03],
          [ 8.2348e-04,  2.1270e-04,  4.4001e-04,  1.5023e-03,  2.4396e-03],
          [ 5.9178e-04,  2.3150e-04,  1.0605e-03,  1.8462e-03,  2.3944e-03]]],


        [[[-5.1652e-04, -1.4356e-04, -3.0494e-04, -7.2880e-04, -1.3777e-03],
          [ 1.3309e-04,  5.5074e-04,  1.5907e-04,  2.3878e-04, -1.3154e-03],
          [ 1.2052e-03,  1.1874e-03,  6.3362e-04,  6.8915e-04, -1.2664e-03],
          [ 1.4353e-03,  1.2919e-03,  4.9581e-04,  1.1869e-04, -1.9263e-03],
          [ 1.2097e-03,  1.0014e-03,  5.0995e-04, -3.8542e-04, -1.8888e-03]]],


        [[[ 1.0951e-03,  9.5744e-04,  3.5992e-04, -2.0244e-05, -3.6566e-05],
          [ 5.6199e-04,  7.0101e-04,  6.5007e-04,  2.5849e-04,  3.0237e-05],
          [ 7.0865e-04,  9.5106e-04,  7.1071e-04,  1.1685e-04, -4.90

Notice that the gradient of the loss w.r.t. the weights of ```conv1``` has the same shape as the weights themselves. Given these gradients, we use an optimiser to update all the parameters of the network.
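
The same holds for every parameter, not only ```conv1```. A quick way to check this is sketched below on a small stand-in model (a hypothetical two-layer ```nn.Sequential``` with random inputs, chosen only to keep the example short).

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)

# Small stand-in model and data (assumptions for illustration)
net = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))
inputs = torch.randn(5, 8)
targets = torch.tensor([0, 2, 1, 1, 0])

loss = F.cross_entropy(net(inputs), targets)
loss.backward()

# Each parameter's gradient has the shape of the parameter itself
for name, p in net.named_parameters():
    print(name, tuple(p.shape), tuple(p.grad.shape))

shapes_match = all(p.grad.shape == p.shape for p in net.parameters())
print(shapes_match)   # True
```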

In [55]:
optimiser = optim.Adam(network.parameters(), lr=0.01)
optimiser.step()

In [56]:
predictions = network(images)
loss = F.cross_entropy(predictions, labels)
loss.item()

2.2640018463134766

In [57]:
get_num_correct(predictions, labels)

20

As expected, the loss on the same batch decreases and the number of correct predictions increases after a single update.

We can now put all this together: iterate over every batch in the training set to complete one epoch, then repeat for multiple epochs.

In [59]:
model = ConvNetwork()
optimiser = optim.Adam(model.parameters(), lr=0.01)

num_epochs = 10
for epoch in range(num_epochs):
    total_loss = 0
    accuracy = 0
    for batch in train_loader:
        # Load a batch
        images, labels = batch
        
        # Execute a forward pass
        predictions = model(images)
        
        # Compute the loss
        loss = F.cross_entropy(predictions, labels)
        
        # Clear the gradients left over from the previous batch, then compute new ones
        optimiser.zero_grad()
        loss.backward()
        
        # Update the weights
        optimiser.step()
        
        total_loss += loss.item()
        accuracy += get_num_correct(predictions, labels) / len(train_set)
        
    print("epoch: ", epoch, "accuracy: ", accuracy, "loss: ", total_loss)

epoch:  0 accuracy:  0.9321600000000004 loss:  106.51637787092477
epoch:  1 accuracy:  0.9765 loss:  39.997753218747675
epoch:  2 accuracy:  0.9806399999999998 loss:  33.283312194165774
epoch:  3 accuracy:  0.9811799999999992 loss:  31.91149481764296
epoch:  4 accuracy:  0.9820799999999991 loss:  32.08094493061071
epoch:  5 accuracy:  0.9832399999999979 loss:  29.699549744153046
epoch:  6 accuracy:  0.9842999999999988 loss:  28.690777756128227
epoch:  7 accuracy:  0.9860999999999983 loss:  25.455984299827833
epoch:  8 accuracy:  0.9841999999999984 loss:  29.483688041975256
epoch:  9 accuracy:  0.9858999999999979 loss:  27.84778618204291
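
The loop above calls ```optimiser.zero_grad()``` before ```loss.backward()``` because PyTorch adds new gradients to whatever is already stored in ```.grad```. The sketch below shows this accumulation on a single hypothetical parameter, resetting the gradient by hand in place of the optimiser call.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).sum().backward()
first = w.grad.clone()
print(first)          # tensor([3.])

# Second backward pass without a reset: the gradients add up
(3 * w).sum().backward()
accumulated = w.grad.clone()
print(accumulated)    # tensor([6.])

# Reset, then backpropagate again: back to the correct value
w.grad.zero_()
(3 * w).sum().backward()
after_reset = w.grad.clone()
print(after_reset)    # tensor([3.])
```

Without the reset, each update in the training loop would use the sum of the gradients from all previous batches.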
