#### Convolutional Neural Network

A convolutional neural network (CNN) is different from a regular neural network in that it applies filters or filter kernels and they work best on image data.

In this notebook, we are going to apply a CNN to the CIFAR10 dataset, which is an image dataset with 10 classes each with 6000 32x32 coloured images (60000 images total). This task is an image multiclass classification task. The dataset was published by the Canadian Institution for Advanced Research.

In [1]:
import torch
import torchvision
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Configure device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cuda


In [3]:
# Hyperparameters
num_epochs = 4
batch_size = 4
alpha = 1e-3

In [4]:
# Composite transform to convert CIFAR10 dataset
# (PIL (Python Imaging Library) images of range[0, 1]) 
# into normalized tensors of range[-1, 1]
transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) # Normalize R, G and B channels with mean 0.5 and standard deviation 0.5
])

In [5]:
# Load data
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms) # Download training version of CIFAR10 data applying the transforms
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms) # Download testing version of CIFAR10 data applying the transforms

Files already downloaded and verified
Files already downloaded and verified


In [6]:
# Load data into DataLoader
train_dataloader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

In [7]:
# Declare classes
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

The output of a convolutional layer is as follows:<br>
Given a square image of IxI pixels, a filter kernel of size FxF, P padding and S strides:<br>
Output is OxO matrix where:<br>
$O = \frac{I-F+2P}{S} + 1$

The output of a pooling layer is as follows:<br>
Given a square image of IxI pixels, a filter kernel of size FxF and S strides:<br>
Output is OxO matrix where:<br>
$O = \frac{I-F}{S} + 1$

In [8]:
# Custom CNN
class CNN(nn.Module):
    def __init__(self): # Kinda hard to figure out the parameters beforehand, better to go layer by layer within the __init__ block
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5) # 3 input channels, 6 output channels, 5x5 filter kernel, output => (32-5+2*0)/1 + 1 = 28
        self.conv2 = nn.Conv2d(6, 16, 5) # 6 input channels (from first conv layer), 16 output channels, 5x5 filter kernel, output => (((28-2)/2 [This portion is from the first pooling] + 1)-5+2*0)/1 + 1 = 10
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2) # 2x2 kernel, 2 stride
        self.fc1 = nn.Linear(16*5*5, 120) # 16 channel input where each channel is (10-2)/2 + 1 = 5x5 matrix; all values flattened out
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10) # Input size of L linear layer = Output size of L-1 linear layer; 10 classes require 10 outputs in the final linear layer  
        
    def forward(self, x):
        # Conv + relu + pool block 1 
        out = self.conv1(x)
        out = self.relu(out)
        out = self.pool(out)
        # Conv + relu + pool block 2
        out = self.conv2(out)
        out = self.relu(out)
        out = self.pool(out)
        # Flatten the output so far
        out = out.view(-1, 16*5*5) # Infer the number of rows automatically, but have 16*5*5 columns since the last output is of size [16, 5, 5]
        # FC + relu block 1
        out = self.fc1(out)
        out = self.relu(out)
        # FC + relu block 2
        out = self.fc2(out)
        out = self.relu(out)
        # FC block 3
        out = self.fc3(out)
        
        return out

In [9]:
model = CNN().to(device)

In [10]:
# Criterion and optimizer
criterion = nn.CrossEntropyLoss() # Since this is a multiclass classification problem, CrossEntropyLoss is used; the output does not have to go through a softmax layer and the labels do not have to be one-hot encoded
optimizer = torch.optim.SGD(model.parameters(), lr=alpha)

In [11]:
# Training loop
num_iters = len(train_dataloader)

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_dataloader):
        # Transfer inputs and labels to device
        images = images.to(device)
        labels = labels.to(device)
        
        # Forward pass
        predicted_outputs = model(images)
        
        # Loss calculation
        loss = criterion(predicted_outputs, labels)
        
        # Backpropagation
        loss.backward()
        
        # Parameter update
        optimizer.step()
        
        # Zero out gradients
        optimizer.zero_grad()
        
        if (i+1) % 200 == 0:
            print(f"Epoch: {epoch+1}/{num_epochs}, step: {i+1}/{num_iters}, loss: {loss.item():.3f}")

Epoch: 1/4, step: 200/12500, loss: 2.299
Epoch: 1/4, step: 400/12500, loss: 2.254
Epoch: 1/4, step: 600/12500, loss: 2.285
Epoch: 1/4, step: 800/12500, loss: 2.354
Epoch: 1/4, step: 1000/12500, loss: 2.320
Epoch: 1/4, step: 1200/12500, loss: 2.279
Epoch: 1/4, step: 1400/12500, loss: 2.334
Epoch: 1/4, step: 1600/12500, loss: 2.270
Epoch: 1/4, step: 1800/12500, loss: 2.324
Epoch: 1/4, step: 2000/12500, loss: 2.321
Epoch: 1/4, step: 2200/12500, loss: 2.305
Epoch: 1/4, step: 2400/12500, loss: 2.328
Epoch: 1/4, step: 2600/12500, loss: 2.351
Epoch: 1/4, step: 2800/12500, loss: 2.289
Epoch: 1/4, step: 3000/12500, loss: 2.285
Epoch: 1/4, step: 3200/12500, loss: 2.274
Epoch: 1/4, step: 3400/12500, loss: 2.297
Epoch: 1/4, step: 3600/12500, loss: 2.329
Epoch: 1/4, step: 3800/12500, loss: 2.283
Epoch: 1/4, step: 4000/12500, loss: 2.284
Epoch: 1/4, step: 4200/12500, loss: 2.310
Epoch: 1/4, step: 4400/12500, loss: 2.344
Epoch: 1/4, step: 4600/12500, loss: 2.305
Epoch: 1/4, step: 4800/12500, loss: 2.

Epoch: 4/4, step: 2000/12500, loss: 2.348
Epoch: 4/4, step: 2200/12500, loss: 1.323
Epoch: 4/4, step: 2400/12500, loss: 1.445
Epoch: 4/4, step: 2600/12500, loss: 1.573
Epoch: 4/4, step: 2800/12500, loss: 2.082
Epoch: 4/4, step: 3000/12500, loss: 0.785
Epoch: 4/4, step: 3200/12500, loss: 1.319
Epoch: 4/4, step: 3400/12500, loss: 1.214
Epoch: 4/4, step: 3600/12500, loss: 1.817
Epoch: 4/4, step: 3800/12500, loss: 1.578
Epoch: 4/4, step: 4000/12500, loss: 1.904
Epoch: 4/4, step: 4200/12500, loss: 1.794
Epoch: 4/4, step: 4400/12500, loss: 1.651
Epoch: 4/4, step: 4600/12500, loss: 1.111
Epoch: 4/4, step: 4800/12500, loss: 1.136
Epoch: 4/4, step: 5000/12500, loss: 1.122
Epoch: 4/4, step: 5200/12500, loss: 0.961
Epoch: 4/4, step: 5400/12500, loss: 1.262
Epoch: 4/4, step: 5600/12500, loss: 1.330
Epoch: 4/4, step: 5800/12500, loss: 0.641
Epoch: 4/4, step: 6000/12500, loss: 0.662
Epoch: 4/4, step: 6200/12500, loss: 2.218
Epoch: 4/4, step: 6400/12500, loss: 0.792
Epoch: 4/4, step: 6600/12500, loss

In [13]:
# Testing and evaluation
with torch.no_grad():
    n_correct = 0
    n_samples = 0
    n_class_correct = [0 for i in range(10)] # Array to store number of correctly classified samples for each of the 10 classes
    n_class_samples = [0 for i in range(10)] # Array to store number of samples for each of the 10 classes
    
    for i, (images, labels) in enumerate(test_dataloader):
        # Transfer inputs and labels to device
        images = images.to(device)
        labels = labels.to(device)
        
        # Predictions
        predicted_outputs = model(images)
        
        # Overall accuracy stats
        _, predicted_output = torch.max(predicted_outputs, 1) # Get the index of the highest value (this is the predicted class)
        n_samples += labels[0] # Increase the number of samples predicted by the number of samples in this batch
        n_correct += (predicted_output==labels).sum().item() # Get the number of correctly classified samples in this batch
        
        # Classwise accuracy stats
        for i in range(batch_size):
            label = labels[i] # Get the true label of the sample
            pred = predicted_output[i] # Get the predicted label of the sample
            
            if (label==pred):
                n_class_correct[label] += 1 # If correctly classified, increase count by 1
            
            n_class_samples[label] += 1 # Increase the count of samples for the true label class
            
    overall_accuracy = n_correct / n_samples * 100.0
    print(f"Overall accuracy: {overall_accuracy:.3f}%")
    print(f"Classwise accuracy")
    for i in range(10):
        class_accuracy = n_class_correct[i] / n_class_samples[i] * 100.0
        print(f"Accuracy for {classes[i]}: {class_accuracy:.3f}%")

Overall accuracy: 40.379%
Classwise accuracy
Accuracy for plane: 43.000%
Accuracy for car: 52.600%
Accuracy for bird: 43.600%
Accuracy for cat: 27.200%
Accuracy for deer: 47.600%
Accuracy for dog: 41.700%
Accuracy for frog: 40.900%
Accuracy for horse: 53.800%
Accuracy for ship: 51.100%
Accuracy for truck: 54.500%
