### Training a CNN

In this notebook, we show how to train a toy CNN with just two convolutional layers (followed by a single fully-connected layer) to perform image classification on the MNIST dataset.

#### Importing the required libraries

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
import torchvision.datasets as Datasets
import torchvision.transforms as transforms

#### Convolution layer mechanics...

For any conv layer with parameters (num_filters=k, filter_size=f, stride=s, padding=p) that takes an input volume of size Ch_in x H_in x W_in, the output volume has size Ch_out x H_out x W_out, where
- Ch_out = k
- H_out = floor((H_in + 2*p - f) / s) + 1
- W_out = floor((W_in + 2*p - f) / s) + 1
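
As a quick check of these formulas, the small helper below (the name `conv_output_size` is our own, purely for illustration) evaluates them in plain Python. The same expression also gives the output size of a max-pooling layer, since `nn.MaxPool2d` uses the same floor-based rule with its default `ceil_mode=False`.

```python
def conv_output_size(size_in: int, f: int, s: int = 1, p: int = 0) -> int:
    """Spatial output size along one dimension for a conv (or pooling) layer."""
    # floor((size_in + 2p - f) / s) + 1, using Python's floor division
    return (size_in + 2 * p - f) // s + 1

# 3x3 filter, stride 1, padding 1 keeps 28 -> 28 ("SAME" convolution)
print(conv_output_size(28, f=3, s=1, p=1))  # 28
# 2x2 max-pool with stride 2 and no padding halves 28 -> 14
print(conv_output_size(28, f=2, s=2))       # 14
# without padding, a 3x3 filter shrinks 28 -> 26
print(conv_output_size(28, f=3))            # 26
```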

#### Defining the CNN architecture

- Input: 28 x 28 grayscale images (1 channel)
- Conv1: 8 filters of size 3x3, with stride = 1 and padding = 1, followed by ReLU (this choice of parameters preserves the spatial resolution of the input, which is sometimes referred to as **SAME** convolution in CNN parlance)
- Pool1: a max-pooling layer that downsamples by a factor of 2 (this layer has no learnable parameters)
- Conv2: 16 filters of size 3x3, with stride = 1 and padding = 1, followed by ReLU
- Pool2: a max-pooling layer that downsamples by a factor of 2
- FC1: a fully-connected (aka Linear) layer that maps the flattened 16x7x7 = 784-dimensional feature vector to a 10-dimensional vector of class scores (one per digit class in MNIST)
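
Chaining the output-size formula through the layers listed above gives a hand-computed trace of the activation shapes and learnable-parameter counts. This is a plain-Python sketch (the helper names are hypothetical); the parameter counts assume every conv and linear layer has a bias, which is the PyTorch default, so the total should match `sum(p.numel() for p in model.parameters())` for the `ConvNeuralNet` defined in the next cell.

```python
def conv_output_size(size_in, f, s=1, p=0):
    # floor((size_in + 2p - f) / s) + 1
    return (size_in + 2 * p - f) // s + 1

def trace_architecture(h=28, w=28, c_in=1, num_classes=10):
    """Return (layer name, output shape, learnable params) for each layer of the toy CNN."""
    layers = []
    # Conv1: 8 filters of size 3x3, stride 1, padding 1 (weights + biases)
    c, h, w = 8, conv_output_size(h, 3, 1, 1), conv_output_size(w, 3, 1, 1)
    layers.append(("conv1", (c, h, w), 8 * c_in * 3 * 3 + 8))
    # Pool1: 2x2 max-pool with stride 2, no parameters
    h, w = conv_output_size(h, 2, 2), conv_output_size(w, 2, 2)
    layers.append(("pool1", (c, h, w), 0))
    # Conv2: 16 filters of size 3x3 over the 8 channels produced by conv1
    c_prev, c = c, 16
    h, w = conv_output_size(h, 3, 1, 1), conv_output_size(w, 3, 1, 1)
    layers.append(("conv2", (c, h, w), 16 * c_prev * 3 * 3 + 16))
    # Pool2: 2x2 max-pool with stride 2, no parameters
    h, w = conv_output_size(h, 2, 2), conv_output_size(w, 2, 2)
    layers.append(("pool2", (c, h, w), 0))
    # FC1 acts on the flattened c*h*w volume
    flat = c * h * w
    layers.append(("fc1", (num_classes,), flat * num_classes + num_classes))
    return layers

layers = trace_architecture()
for name, shape, n_params in layers:
    print(f"{name:6s} -> {shape}, params = {n_params}")
print("total params:", sum(n for _, _, n in layers))  # total params: 9098
```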



In [2]:
class ConvNeuralNet(nn.Module):
  def __init__(self, in_channels=1, num_classes=10):
    super().__init__()
    self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=8, kernel_size=(3,3), stride=(1,1), padding=(1,1))
    self.pool = nn.MaxPool2d(kernel_size=(2,2), stride=(2,2)) # reused for both pooling stages (no learnable parameters)
    self.conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=(3,3), stride=(1,1), padding=(1,1))
    self.fc1 = nn.Linear(16*7*7, num_classes) # two 2x2 poolings shrink 28x28 down to 7x7 (see forward)

  def forward(self, x):
    x = F.relu(self.conv1(x)) # (N, 8, 28, 28)
    x = self.pool(x)          # (N, 8, 14, 14)
    x = F.relu(self.conv2(x)) # (N, 16, 14, 14)
    x = self.pool(x)          # (N, 16, 7, 7)
    x = x.reshape(x.shape[0], -1) # flatten all dimensions except the batch dimension -> (N, 784)
    # equivalently: x = torch.flatten(x, start_dim=1)
    x = self.fc1(x)           # (N, num_classes) raw class scores (logits)

    return x
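
For intuition about what `self.pool` does to each channel, here is a plain-Python sketch of 2x2 max-pooling with stride 2 on a single 4x4 feature map (`max_pool_2x2` is a hypothetical helper, not a PyTorch function). It only takes the maximum over each non-overlapping 2x2 window, which is why the layer has no learnable parameters. The sketch assumes even height and width; `nn.MaxPool2d` with its default `ceil_mode=False` would simply drop a trailing odd row or column.

```python
def max_pool_2x2(grid):
    """2x2 max-pooling with stride 2 over a 2D list (height and width assumed even)."""
    return [
        [max(grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1])
         for j in range(0, len(grid[0]), 2)]
        for i in range(0, len(grid), 2)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]
print(max_pool_2x2(feature_map))  # [[4, 5], [3, 7]]
```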


#### Just a sanity check...

In [3]:
model = ConvNeuralNet()
x = torch.randn(64,1,28,28)
print(model(x).shape) # We expect to see a tensor of shape batch_size x num_classes

torch.Size([64, 10])


#### Selecting the device and setting the hyperparameters

In [4]:
dev = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(dev)

cuda


In [5]:
#hyperparams...
in_channels = 1
num_classes = 10
learning_rate = 0.01
batch_size = 64
num_epochs = 1


#### Downloading the MNIST dataset and configuring the dataloader

In [6]:
#Loading MNIST data
train_set = Datasets.MNIST(root='MNIST/', train=True, transform=transforms.ToTensor(), download=True)
train_loader = DataLoader(dataset=train_set, batch_size=batch_size, shuffle=True)

test_set = Datasets.MNIST(root='MNIST/', train=False, transform=transforms.ToTensor(), download=True)
test_loader = DataLoader(dataset=test_set, batch_size=batch_size, shuffle=False) # shuffling is unnecessary for evaluation

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to MNIST/MNIST\raw\train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:01<00:00, 7058731.76it/s]
Extracting MNIST/MNIST\raw\train-images-idx3-ubyte.gz to MNIST/MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to MNIST/MNIST\raw\train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 3619448.24it/s]
Extracting MNIST/MNIST\raw\train-labels-idx1-ubyte.gz to MNIST/MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to MNIST/MNIST\raw\t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 2926317.25it/s]
Extracting MNIST/MNIST\raw\t10k-images-idx3-ubyte.gz to MNIST/MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to MNIST/MNIST\raw\t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 4540164.15it/s]
Extracting MNIST/MNIST\raw\t10k-labels-idx1-ubyte.gz to MNIST/MNIST\raw
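
With the default `drop_last=False`, a `DataLoader` yields ceil(num_samples / batch_size) batches, the last of which may be smaller than the rest. The sketch below (`loader_stats` is a hypothetical helper, not part of PyTorch) works this out for the two MNIST splits: with `batch_size = 64`, `len(train_loader)` should be 938, so the single training epoch (`num_epochs = 1`) performs 938 optimizer updates.

```python
import math

def loader_stats(num_samples: int, batch_size: int):
    """Number of batches and size of the last batch when drop_last=False."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

print(loader_stats(60000, 64))  # (938, 32): MNIST training split
print(loader_stats(10000, 64))  # (157, 16): MNIST test split
```
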

#### Initializing the model and configuring loss function and optimizer

In [7]:
#Initialize the model..
model = ConvNeuralNet(in_channels=in_channels, num_classes=num_classes).to(dev)

In [8]:
#Loss and optimizer...
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
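
`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits), which is why `forward` returns the output of `fc1` without a softmax: the loss applies log-softmax internally, then takes the negative log-likelihood of the target class, averaging over the batch by default. A single-sample version in plain Python (our own sketch with a hypothetical name, not PyTorch's implementation) looks like this:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target]).

    Subtracts the max logit before exponentiating (log-sum-exp trick) for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

print(cross_entropy([2.0, 1.0, 0.1], target=0))  # ~0.417
# Uniform scores over 10 classes give exactly ln(10) ~ 2.3026
print(cross_entropy([0.0] * 10, target=3))
```

The second call suggests a handy sanity check: if an untrained network's scores are close to uniform, the initial training loss on MNIST should be near ln(10) ≈ 2.3.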

#### Training the model...

In [9]:
#Training loop...
for epoch in range(num_epochs):
  for batch_id, (data,gt) in enumerate(train_loader):
    data = data.to(device=dev) #already in correct shape..
    gt = gt.to(device=dev)

    #forward pass...
    scores = model(data)
    loss = criterion(scores, gt)

    #backward pass...
    optimizer.zero_grad()
    loss.backward()

    optimizer.step()
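
Each iteration of the loop follows the usual PyTorch pattern: `optimizer.zero_grad()` clears the gradients left over from the previous batch (PyTorch accumulates into `.grad` by default), `loss.backward()` computes fresh gradients, and `optimizer.step()` updates the parameters. For Adam, the update of a single scalar parameter can be sketched in plain Python as below, using PyTorch's default `betas=(0.9, 0.999)` and `eps=1e-8`. This illustrates the standard Adam rule only; it is not PyTorch's actual (vectorized) implementation, and `adam_step` is a hypothetical name.

```python
import math

def adam_step(p, g, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of scalar parameter p with gradient g at step t (1-based)."""
    m = beta1 * m + (1 - beta1) * g          # running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero-initialized m
    v_hat = v / (1 - beta2 ** t)             # bias correction for zero-initialized v
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

# First step from zero-initialized moment estimates
p, m, v = adam_step(p=1.0, g=0.5, m=0.0, v=0.0, t=1)
print(round(p, 6))  # 0.99
```

Because of the bias correction, the very first step moves each parameter with a nonzero gradient by roughly `lr` against the sign of its gradient, so with `learning_rate = 0.01` each weight changes by about 0.01 on the first update regardless of the gradient's magnitude.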


#### Model evaluation

In [10]:
def check_accuracy(loader, model):
  if loader.dataset.train:
    print("Checking accuracy on training partition")
  else:
    print("Checking accuracy on test partition")

  num_correct = 0
  num_samples = 0

  model.eval() # switch to evaluation mode

  with torch.no_grad(): # no gradients are needed for evaluation
    for x,y in loader:
      x = x.to(device=dev)
      y = y.to(device=dev)

      scores = model(x)
      predictions = scores.argmax(dim=1) # predicted class = index of the highest score
      num_correct += (predictions == y).sum().item()
      num_samples += predictions.size(0)

    print(f'Got {num_correct} / {num_samples} correct with accuracy {num_correct / num_samples * 100:.2f}')

  model.train()
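
The accuracy reported by `check_accuracy` is top-1 accuracy: the predicted class is the index of the largest score, and a sample counts as correct when that index equals its label. A minimal plain-Python version (a hypothetical helper operating on nested lists rather than tensors):

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    correct = 0
    for row, label in zip(scores, labels):
        prediction = max(range(len(row)), key=lambda k: row[k])  # argmax over classes
        correct += int(prediction == label)
    return correct / len(labels)

scores = [[0.1, 2.0, -1.0],   # predicts class 1
          [3.0, 1.0, 0.5],    # predicts class 0
          [0.2, 0.1, 0.9]]    # predicts class 2
labels = [1, 1, 2]
print(top1_accuracy(scores, labels))  # 0.6666666666666666 (2 of 3 correct)
```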

In [11]:
check_accuracy(train_loader, model)
check_accuracy(test_loader, model)

Checking accuracy on training partition
Got 58695 / 60000 correct with accuracy 97.82
Checking accuracy on test partition
Got 9796 / 10000 correct with accuracy 97.96
