# 4. DNNs vs. CNNs on the MNIST dataset

### About this notebook

This notebook was used in the 50.039 Deep Learning course at the Singapore University of Technology and Design.

**Author:** Matthieu DE MARI (matthieu_demari@sutd.edu.sg)

**Version:** 1.1 (06/07/2023)

**Requirements:**
- Python 3 (tested on v3.11.4)
- Matplotlib (tested on v3.7.1)
- Torch (tested on v2.0.1+cu118)
- Torchvision (tested on v0.15.2+cu118)

### Imports and CUDA

In [1]:
# Matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
# Torch
import torch
import torchvision
from torch.utils.data import Dataset
from torchvision import datasets
import torch.optim as optim
from torchvision.transforms import ToTensor, Compose, Normalize
from torchvision.datasets import MNIST
import torch.nn.functional as F
import torch.nn as nn

In [2]:
# Use GPU if available, else use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


### MNIST Dataset

As before, we load the MNIST dataset, convert the images to tensors and normalize them.

In [3]:
# Define transform to convert images to tensors and normalize them
transform_data = Compose([ToTensor(),
                          Normalize((0.1307,), (0.3081,))])

# Load the data
batch_size = 256
train_dataset = MNIST(root='./mnist/', train = True, download = True, transform = transform_data)
test_dataset = MNIST(root='./mnist/', train = False, download = True, transform = transform_data)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = batch_size, shuffle = True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = batch_size, shuffle = False)
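
As a small illustration of what the `Normalize` transform above does, the sketch below applies the same `(x - mean) / std` formula by hand to three pixel intensities. It uses plain tensor arithmetic rather than torchvision, and the three sample intensities are chosen for illustration only.

```python
import torch

# Mean and standard deviation passed to Normalize in the cell above
mean, std = 0.1307, 0.3081

# Three sample pixel intensities after ToTensor(), which scales pixels to [0, 1]:
# a black pixel, a pixel at the mean intensity, and a white pixel
pixels = torch.tensor([0.0, mean, 1.0])

# Normalize applies (x - mean) / std to every pixel of the single grayscale channel
normalized = (pixels - mean) / std
print(normalized)
```

A pixel at the mean intensity maps to 0, darker pixels become negative and brighter pixels positive, which keeps the inputs of the network roughly centered.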

### Our CNN model

We reuse the CNN from the previous notebook: two convolutional layers followed by two fully connected layers.

In [4]:
class MNIST_CNN(nn.Module):
    def __init__(self):
        super(MNIST_CNN, self).__init__()
        # Two convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size = 3, stride = 1, padding = 1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size = 3, stride = 1, padding = 1)
        # Two fully connected layers
        self.fc1 = nn.Linear(64*28*28, 128) # 64*28*28 = 50176
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # Pass input through first convolutional layer
        x = self.conv1(x)
        x = F.relu(x)
        # Pass output of first conv layer through second convolutional layer
        x = self.conv2(x)
        x = F.relu(x)
        # Flatten output of second conv layer
        x = x.view(-1, 64*28*28)
        # Pass flattened output through first fully connected layer
        x = self.fc1(x)
        x = F.relu(x)
        # Pass output of first fully connected layer through second fully connected layer
        x = self.fc2(x)
        return x
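
To see where the `64*28*28` input size of `fc1` comes from, the sketch below pushes a dummy batch through two convolutions with the same settings as `conv1` and `conv2`. The batch is random noise rather than real MNIST data, and the variable names are chosen for illustration only.

```python
import torch
import torch.nn as nn

# Same layer settings as conv1 and conv2 in MNIST_CNN
conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)

# Dummy batch of 4 grayscale 28x28 images (random values, not real MNIST data)
x = torch.randn(4, 1, 28, 28)

with torch.no_grad():
    h1 = conv1(x)
    h2 = conv2(h1)
    flat = h2.view(-1, 64 * 28 * 28)

# Output size of a convolution: (in + 2*padding - kernel) // stride + 1
out_size = (28 + 2 * 1 - 3) // 1 + 1

print(h1.shape)    # torch.Size([4, 32, 28, 28])
print(h2.shape)    # torch.Size([4, 64, 28, 28])
print(flat.shape)  # torch.Size([4, 50176])
print(out_size)    # 28
```

With a 3x3 kernel, stride 1 and padding 1, each convolution preserves the 28x28 spatial size, so only the number of channels changes before flattening.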

### Writing a trainer function like before

Below is a trainer function for our CNN model.
- We will be using the Adam optimizer, like before.
- Loss is cross entropy, like before.
- We will keep track of the train losses, test losses, train accuracies and test accuracies, and display them in training performance curves later.
- In each epoch, we iterate over shuffled mini-batches of the train dataset. The **backward()** PyTorch method computes the gradients for all the Conv2d and fully connected layers automatically, and the optimizer's **step()** method then updates the parameters. We also update the losses and accuracies on the fly.
- At the end of each epoch, we set the model to eval mode and compute losses and accuracies on the testing set.

In [5]:
def train(model, train_loader, test_loader, epochs = 10, lr = 0.001):
    # Use Adam optimizer to update model weights
    optimizer = optim.Adam(model.parameters(), lr = lr)
    # Use cross-entropy loss function
    criterion = nn.CrossEntropyLoss()
    # Performance curves data
    train_losses = []
    train_accuracies = []
    test_losses = []
    test_accuracies = []
    
    for epoch in range(epochs):
        # Set model to training mode
        model.train()
        # Initialize epoch loss and accuracy
        epoch_loss = 0.0
        correct = 0
        total = 0
        # Iterate over training data
        for batch_number, (inputs, labels) in enumerate(train_loader):
            # Get from dataloader and send to device
            inputs = inputs.to(device)
            labels = labels.to(device)
            # Zero out gradients
            optimizer.zero_grad()
            # Compute model output and loss
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            loss = criterion(outputs, labels)
            # Backpropagate loss and update model weights
            loss.backward()
            optimizer.step()
            # Accumulate loss and correct predictions for epoch
            epoch_loss += loss.item()
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            print(f'Epoch {epoch+1}/{epochs}, Batch number: {batch_number}, Cumulated accuracy: {correct/total}')
        # Calculate epoch loss and accuracy
        epoch_loss /= len(train_loader)
        epoch_acc = correct/total
        train_losses.append(epoch_loss)
        train_accuracies.append(epoch_acc)
        print(f'--- Epoch {epoch+1}/{epochs}: Train loss: {epoch_loss:.4f}, Train accuracy: {epoch_acc:.4f}')
        
        # Set model to evaluation mode
        model.eval()
        # Initialize epoch loss and accuracy
        epoch_loss = 0.0
        correct = 0
        total = 0
        # Iterate over test data
        for inputs, labels in test_loader:
            # Get from dataloader and send to device
            inputs = inputs.to(device)
            labels = labels.to(device)
            # Compute model output and loss
            # (No grad computation here, as it is the test data)
            with torch.no_grad():
                outputs = model(inputs)
                _, predicted = torch.max(outputs.data, 1)
                loss = criterion(outputs, labels)
            # Accumulate loss and correct predictions for epoch
            epoch_loss += loss.item()
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
        # Calculate epoch loss and accuracy
        epoch_loss /= len(test_loader)
        epoch_acc = correct/total
        test_losses.append(epoch_loss)
        test_accuracies.append(epoch_acc)
        print(f'--- Epoch {epoch+1}/{epochs}: Test loss: {epoch_loss:.4f}, Test accuracy: {epoch_acc:.4f}')
    
    return train_losses, train_accuracies, test_losses, test_accuracies
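
The accuracy bookkeeping inside `train` can be illustrated on a tiny hand-made example. The logits and labels below are made-up values for a batch of 3 samples and 4 classes, not outputs of the actual model.

```python
import torch

# Hypothetical logits for a batch of 3 samples and 4 classes (made-up values)
outputs = torch.tensor([[2.0, 0.5, 0.1, -1.0],
                        [0.2, 0.1, 3.0, 0.0],
                        [1.0, 1.5, 0.3, 0.2]])
labels = torch.tensor([0, 2, 3])

# torch.max along dim 1 returns (max values, indices); the indices are the predicted classes
_, predicted = torch.max(outputs, 1)

# Same counting as in the training loop
correct = (predicted == labels).sum().item()
total = labels.size(0)

print(predicted)        # tensor([0, 2, 1])
print(correct / total)  # 2 of the 3 predictions match the labels
```

Summing `correct` and `total` over all batches, as `train` does, gives the accuracy over the whole epoch rather than an average of per-batch accuracies.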

We will train for only 3 epochs, which should already be enough for the model to converge.

In [6]:
model = MNIST_CNN().to(device)
train_losses, train_accuracies, test_losses, test_accuracies = train(model,
                                                                     train_loader,
                                                                     test_loader,
                                                                     epochs = 3,
                                                                     lr = 1e-3)

Epoch 1/3, Batch number: 0, Cumulated accuracy: 0.06640625
Epoch 1/3, Batch number: 1, Cumulated accuracy: 0.10546875
Epoch 1/3, Batch number: 2, Cumulated accuracy: 0.15494791666666666
Epoch 1/3, Batch number: 3, Cumulated accuracy: 0.2421875
Epoch 1/3, Batch number: 4, Cumulated accuracy: 0.29296875
Epoch 1/3, Batch number: 5, Cumulated accuracy: 0.3502604166666667
Epoch 1/3, Batch number: 6, Cumulated accuracy: 0.40457589285714285
Epoch 1/3, Batch number: 7, Cumulated accuracy: 0.44580078125
Epoch 1/3, Batch number: 8, Cumulated accuracy: 0.4830729166666667
Epoch 1/3, Batch number: 9, Cumulated accuracy: 0.518359375
Epoch 1/3, Batch number: 10, Cumulated accuracy: 0.5461647727272727
Epoch 1/3, Batch number: 11, Cumulated accuracy: 0.5738932291666666
Epoch 1/3, Batch number: 12, Cumulated accuracy: 0.5964543269230769
Epoch 1/3, Batch number: 13, Cumulated accuracy: 0.6177455357142857
Epoch 1/3, Batch number: 14, Cumulated accuracy: 0.63671875
...

Epoch 1/3, Batch number: 125, Cumulated accuracy: 0.9125124007936508
Epoch 1/3, Batch number: 126, Cumulated accuracy: 0.9130474901574803
Epoch 1/3, Batch number: 127, Cumulated accuracy: 0.913604736328125
Epoch 1/3, Batch number: 128, Cumulated accuracy: 0.9139716569767442
Epoch 1/3, Batch number: 129, Cumulated accuracy: 0.9143629807692307
Epoch 1/3, Batch number: 130, Cumulated accuracy: 0.9148676049618321
Epoch 1/3, Batch number: 131, Cumulated accuracy: 0.9153053977272727
Epoch 1/3, Batch number: 132, Cumulated accuracy: 0.9157953477443609
Epoch 1/3, Batch number: 133, Cumulated accuracy: 0.9162196828358209
Epoch 1/3, Batch number: 134, Cumulated accuracy: 0.9166666666666666
Epoch 1/3, Batch number: 135, Cumulated accuracy: 0.9171357996323529
Epoch 1/3, Batch number: 136, Cumulated accuracy: 0.9175980839416058
Epoch 1/3, Batch number: 137, Cumulated accuracy: 0.9180253623188406
Epoch 1/3, Batch number: 138, Cumulated accuracy: 0.9185026978417267
...

Epoch 2/3, Batch number: 10, Cumulated accuracy: 0.9850852272727273
Epoch 2/3, Batch number: 11, Cumulated accuracy: 0.984375
Epoch 2/3, Batch number: 12, Cumulated accuracy: 0.9840745192307693
Epoch 2/3, Batch number: 13, Cumulated accuracy: 0.9838169642857143
Epoch 2/3, Batch number: 14, Cumulated accuracy: 0.98359375
Epoch 2/3, Batch number: 15, Cumulated accuracy: 0.984375
Epoch 2/3, Batch number: 16, Cumulated accuracy: 0.9841452205882353
Epoch 2/3, Batch number: 17, Cumulated accuracy: 0.9839409722222222
Epoch 2/3, Batch number: 18, Cumulated accuracy: 0.9841694078947368
Epoch 2/3, Batch number: 19, Cumulated accuracy: 0.9837890625
Epoch 2/3, Batch number: 20, Cumulated accuracy: 0.9834449404761905
Epoch 2/3, Batch number: 21, Cumulated accuracy: 0.9834872159090909
Epoch 2/3, Batch number: 22, Cumulated accuracy: 0.9842051630434783
Epoch 2/3, Batch number: 23, Cumulated accuracy: 0.98388671875
Epoch 2/3, Batch number: 24, Cumulated accuracy: 0.98390625
...

Epoch 2/3, Batch number: 132, Cumulated accuracy: 0.9867246240601504
Epoch 2/3, Batch number: 133, Cumulated accuracy: 0.9867945429104478
Epoch 2/3, Batch number: 134, Cumulated accuracy: 0.98671875
Epoch 2/3, Batch number: 135, Cumulated accuracy: 0.9867015165441176
Epoch 2/3, Batch number: 136, Cumulated accuracy: 0.9866845346715328
Epoch 2/3, Batch number: 137, Cumulated accuracy: 0.9867527173913043
Epoch 2/3, Batch number: 138, Cumulated accuracy: 0.9866513039568345
Epoch 2/3, Batch number: 139, Cumulated accuracy: 0.9866629464285714
Epoch 2/3, Batch number: 140, Cumulated accuracy: 0.9867021276595744
Epoch 2/3, Batch number: 141, Cumulated accuracy: 0.9867132482394366
Epoch 2/3, Batch number: 142, Cumulated accuracy: 0.9866968968531469
Epoch 2/3, Batch number: 143, Cumulated accuracy: 0.9867078993055556
Epoch 2/3, Batch number: 144, Cumulated accuracy: 0.9866648706896551
Epoch 2/3, Batch number: 145, Cumulated accuracy: 0.9867562071917808
...

Epoch 3/3, Batch number: 18, Cumulated accuracy: 0.9915707236842105
Epoch 3/3, Batch number: 19, Cumulated accuracy: 0.991796875
Epoch 3/3, Batch number: 20, Cumulated accuracy: 0.9921875
Epoch 3/3, Batch number: 21, Cumulated accuracy: 0.9923650568181818
Epoch 3/3, Batch number: 22, Cumulated accuracy: 0.9926970108695652
Epoch 3/3, Batch number: 23, Cumulated accuracy: 0.9928385416666666
Epoch 3/3, Batch number: 24, Cumulated accuracy: 0.99296875
Epoch 3/3, Batch number: 25, Cumulated accuracy: 0.9929387019230769
Epoch 3/3, Batch number: 26, Cumulated accuracy: 0.9929108796296297
Epoch 3/3, Batch number: 27, Cumulated accuracy: 0.9930245535714286
Epoch 3/3, Batch number: 28, Cumulated accuracy: 0.9931303879310345
Epoch 3/3, Batch number: 29, Cumulated accuracy: 0.993359375
Epoch 3/3, Batch number: 30, Cumulated accuracy: 0.9935735887096774
Epoch 3/3, Batch number: 31, Cumulated accuracy: 0.9935302734375
Epoch 3/3, Batch number: 32, Cumulated accuracy: 0.9933712121212122
...

Epoch 3/3, Batch number: 142, Cumulated accuracy: 0.9930889423076923
Epoch 3/3, Batch number: 143, Cumulated accuracy: 0.9931369357638888
Epoch 3/3, Batch number: 144, Cumulated accuracy: 0.9931573275862069
Epoch 3/3, Batch number: 145, Cumulated accuracy: 0.9931506849315068
Epoch 3/3, Batch number: 146, Cumulated accuracy: 0.9931972789115646
Epoch 3/3, Batch number: 147, Cumulated accuracy: 0.993190456081081
Epoch 3/3, Batch number: 148, Cumulated accuracy: 0.9932099412751678
Epoch 3/3, Batch number: 149, Cumulated accuracy: 0.9932552083333334
Epoch 3/3, Batch number: 150, Cumulated accuracy: 0.9932222682119205
Epoch 3/3, Batch number: 151, Cumulated accuracy: 0.9932411595394737
Epoch 3/3, Batch number: 152, Cumulated accuracy: 0.9931832107843137
Epoch 3/3, Batch number: 153, Cumulated accuracy: 0.9931006493506493
Epoch 3/3, Batch number: 154, Cumulated accuracy: 0.9931199596774194
Epoch 3/3, Batch number: 155, Cumulated accuracy: 0.9931139823717948
...

### Training curves
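
The four lists returned by `train` can be turned into performance curves with Matplotlib. Below is a minimal sketch; `plot_training_curves` is a hypothetical helper name introduced here for illustration, not a function from the original notebook.

```python
import matplotlib.pyplot as plt

def plot_training_curves(train_losses, train_accuracies, test_losses, test_accuracies):
    """Plot per-epoch losses and accuracies side by side and return the figure."""
    epochs = range(1, len(train_losses) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    # Left panel: cross-entropy loss per epoch
    ax_loss.plot(epochs, train_losses, marker="o", label="Train")
    ax_loss.plot(epochs, test_losses, marker="o", label="Test")
    ax_loss.set_xlabel("Epoch")
    ax_loss.set_ylabel("Loss")
    ax_loss.legend()

    # Right panel: accuracy per epoch
    ax_acc.plot(epochs, train_accuracies, marker="o", label="Train")
    ax_acc.plot(epochs, test_accuracies, marker="o", label="Test")
    ax_acc.set_xlabel("Epoch")
    ax_acc.set_ylabel("Accuracy")
    ax_acc.legend()

    fig.tight_layout()
    return fig
```

After training, calling `plot_training_curves(train_losses, train_accuracies, test_losses, test_accuracies)` followed by `plt.show()` displays both panels.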

The training curves show that the model trained just fine. It approaches 100% accuracy on the test set, while our DNN was stuck around 96-97%. This illustrates the advantage of CNN models on this task and why the Conv2d operation is typically preferable when processing images.

The reason for that is rather simple.

Convolutional layers (Conv2d) are often preferred in image processing tasks because they extract spatial information from the image in a way that is translation equivariant: the same kernel is applied at every location, so a feature is detected in the same way wherever the object appears in the image. Linear layers, on the other hand, do not have this property. They assign a separate weight to every pixel position, so a pattern learned at one location does not carry over to another, which is a poor fit for images, where objects can appear anywhere.
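
This property can be checked numerically: shifting the input of a convolution shifts its feature maps by the same amount. The sketch below is an illustration under simplifying assumptions. It uses a small random 8x8 input and circular padding so that the shift wraps around the border and the comparison is exact, whereas `MNIST_CNN` uses zero padding, for which the relation only holds away from the borders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small convolution with circular padding, so that shifts wrap around the border
conv = nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode="circular")

# Random 8x8 "image" and a copy shifted 2 pixels along the width (with wrap-around)
x = torch.randn(1, 1, 8, 8)
x_shifted = torch.roll(x, shifts=2, dims=3)

with torch.no_grad():
    y = conv(x)
    y_shifted = conv(x_shifted)

# Shifting the input shifts the feature maps by the same amount
conv_matches = torch.allclose(y_shifted, torch.roll(y, shifts=2, dims=3), atol=1e-6)
print(conv_matches)  # True

# A linear layer has one weight per pixel position, so its output changes
# in an unrelated way when the input is shifted
linear = nn.Linear(64, 4)
with torch.no_grad():
    z = linear(x.flatten(1))
    z_shifted = linear(x_shifted.flatten(1))
linear_matches = torch.allclose(z, z_shifted, atol=1e-6)
print(linear_matches)
```

For the linear layer, the two outputs differ in general, because each output depends on which exact pixel positions carry the pattern.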

Additionally, convolutional layers can learn a large number of features with a small number of parameters, since each kernel's weights are shared across all positions of the image. This is computationally efficient and helps to prevent overfitting.
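
To put numbers on this, the sketch below counts the parameters of a layer identical to `conv1` and compares them with the parameters a fully connected layer would need to produce an output of the same size (32 x 28 x 28) from a 28 x 28 image. The fully connected count is computed by formula rather than by instantiating the layer, and `count_parameters` is a helper name introduced here for illustration.

```python
import torch.nn as nn

def count_parameters(layer):
    """Total number of trainable values (weights and biases) in a layer."""
    return sum(p.numel() for p in layer.parameters())

# Same settings as conv1 in MNIST_CNN: 32 kernels of size 1 x 3 x 3, plus 32 biases
conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
conv_params = count_parameters(conv1)

# A fully connected layer mapping the 28 x 28 input to the same 32 x 28 x 28 output
# needs one weight per (input pixel, output value) pair, plus one bias per output
in_features = 28 * 28
out_features = 32 * 28 * 28
linear_params = in_features * out_features + out_features

print(conv_params)    # 320
print(linear_params)  # 19694080
```

The convolution reuses the same 3x3 kernels at every position, which is why its parameter count does not depend on the image size.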

### What's next?

In the next notebook, we will investigate three additional operations typically used in CNNs alongside Conv2d: pooling, dropout and batchnorm.