<a href="https://colab.research.google.com/github/fchang-smith/CSC370Planning/blob/main/Class_Activity_Solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Credit: This lab was inspired by https://github.com/aamini/introtodeeplearning/ from MIT 6.S191 Intro to Deep Learning, borrowed parts of the code and model architecture from https://nextjournal.com/gkoehler/pytorch-mnist, and was compiled and modified by Feiran Chang.

# **Part 1: MNIST Digit Classification**
In the first portion of this lab, we will build and train a convolutional neural network (CNN) for classification of handwritten digits from the famous MNIST dataset. The MNIST dataset consists of 60,000 training images and 10,000 test images. Our classes are the digits 0-9.

First, let's import the relevant packages we'll need for this lab.

In [None]:
# Import PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import random_split, DataLoader

import matplotlib.pyplot as plt
import numpy as np
import random
from tqdm import tqdm


Let's check that we are using a GPU.

In [None]:
# Check that we are using a GPU. If not, switch runtimes
#   using Runtime > Change Runtime Type > GPU
# The program should print "cuda"
device = 'cuda' if torch.cuda.is_available() else 'cpu'
assert device == 'cuda', 'No GPU found: switch the runtime type to GPU'
print(device)
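
Note that the cells in this lab create the models and batches on the CPU by default, so the GPU is only used if both are moved there explicitly. A minimal sketch of that pattern, assuming a stand-in model and a random batch (`toy_model` and `toy_batch` are hypothetical names, not part of the lab), might look like this:

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs without a GPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Hypothetical stand-ins for a model and one batch of MNIST-shaped images
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10)).to(device)
toy_batch = torch.rand(64, 1, 28, 28).to(device)

# Model parameters and inputs must live on the same device
toy_output = toy_model(toy_batch)
print(toy_output.shape, toy_output.device.type)
```

In a training loop, the same `.to(device)` call would be applied to the model once and to each batch of images and labels.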

# 1.1 MNIST dataset
Let's download and load the dataset using PyTorch's data utilities: `datasets.MNIST` downloads the images and labels, and `DataLoader` handles batching and shuffling. We also set the batch size (64) in this step.


In [None]:
train_loader = torch.utils.data.DataLoader(
  datasets.MNIST('/files/', train=True, download=True,
                             transform=transforms.Compose([
                               transforms.ToTensor()
                             ])),
  batch_size=64, shuffle=True)

test_loader = torch.utils.data.DataLoader(
  datasets.MNIST('/files/', train=False, download=True,
                             transform=transforms.Compose([
                               transforms.ToTensor(),
                             ])),
  batch_size=64, shuffle=True)


# Access the first batch of training data
example_iter = iter(train_loader)
example_images, example_labels = next(example_iter)

# You can print out the shape of a tensor by using variable.shape
print(f"example_images: {example_images.shape}")
print(f"example_labels: {example_labels.shape}")
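
With 60,000 training images and a batch size of 64, the number of batches per epoch follows from simple arithmetic. The sketch below works it out in plain Python; the variable names are illustrative only.

```python
import math

num_train_images = 60000  # size of the MNIST training set
batch_size = 64           # batch size passed to the DataLoader above

# The DataLoader keeps the final, smaller batch by default (drop_last=False)
batches_per_epoch = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (batches_per_epoch - 1) * batch_size

print(batches_per_epoch)  # 938
print(last_batch_size)    # 32
```

Each full batch of images has shape `[64, 1, 28, 28]` (batch, channel, height, width), and the matching labels have shape `[64]`.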

Our training set is made up of 28x28 grayscale images of handwritten digits.

Let's visualize what some of these images and their corresponding training labels look like.

In [None]:
plt.figure(figsize=(10,10))
# Show the first 36 images of the example batch (already shuffled by the DataLoader)
for i in range(36):
  plt.subplot(6,6,i+1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  plt.imshow(np.squeeze(example_images[i]), cmap=plt.cm.binary)
  plt.xlabel(example_labels[i].item())

# 1.2 Neural Network for Handwritten Digit Classification
We'll first build a simple neural network consisting of two fully connected layers and apply this to the digit classification task. Our network will ultimately output a probability distribution over the 10 digit classes (0-9). This first architecture we will be building is depicted below:
![alt_text](https://raw.githubusercontent.com/aamini/introtodeeplearning/master/lab2/img/mnist_2layers_arch.png "CNN Architecture for MNIST Classification")


### Fully connected neural network architecture
To define the architecture of this first fully connected neural network, we'll define the model using the `nn.Linear` layer. Note how we first use a `nn.Flatten` layer, which flattens the input so that it can be fed into the model.

In this next block, you'll define the fully connected layers of this simple network.

In [None]:
class mlp_model(nn.Module):
    def __init__(self):
        super(mlp_model, self).__init__()
        self.flatten = nn.Flatten()
        self.mlp1 = nn.Linear(784, 128)  # 28 * 28 = 784 input pixels -> 128 hidden units
        self.mlp2 = nn.Linear(128, 10)  # 128 hidden units -> 10 digit classes

    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.mlp1(x))
        x = F.softmax(self.mlp2(x), dim=1)
        return x

# Instantiate the model
model_1 = mlp_model()


Let's take a step back and think about the network we've just created. The first layer in this network, `nn.Flatten`, transforms the format of the images from a 2d-array (28 x 28 pixels), to a 1d-array of 28 * 28 = 784 pixels. You can think of this layer as unstacking rows of pixels in the image and lining them up. There are no learned parameters in this layer; it only reformats the data.
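
The "unstacking rows" idea can be sketched in plain Python on a tiny 3 x 3 image (a hypothetical stand-in for a 28 x 28 digit):

```python
# A tiny 3 x 3 "image" stands in for a 28 x 28 MNIST digit
tiny_image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Unstack the rows and line them up one after another (row-major order)
flat = [pixel for row in tiny_image for pixel in row]

print(flat)       # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(flat))  # 9, just as 28 * 28 gives 784
```

`nn.Flatten` does the same for every image in a batch while leaving the batch dimension untouched.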

After the pixels are flattened, the network consists of a sequence of two `nn.Linear` layers. These are fully-connected neural layers. The first `Linear` layer has 128 nodes (or neurons). The second (and last) layer (which you've defined!) should return an array of probability scores that sum to 1. Each node contains a score that indicates the probability that the current image belongs to one of the handwritten digit classes.
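
To see how raw scores become probabilities that sum to 1, here is a minimal softmax sketch in plain Python. The `raw_scores` values are made up for illustration and are not outputs of the model above.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged
    shifted = [s - max(scores) for s in scores]
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the 10 digit classes
raw_scores = [0.5, -1.2, 3.0, 0.0, 0.1, -0.7, 2.2, 0.3, -2.0, 1.0]
probs = softmax(raw_scores)

print(round(sum(probs), 6))     # 1.0
print(probs.index(max(probs)))  # 2, the class with the largest raw score
```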

That defines our fully connected model!

### Train the model

We're now ready to train our model, which will involve feeding the training data (`train_images` and `train_labels`) into the model, and then asking it to learn the associations between images and labels. We'll also need to define the number of epochs, or iterations over the MNIST dataset, to use during training.

We'll use a stochastic gradient descent (SGD) optimizer initialized with a learning rate of 0.1. Since we are performing a categorical classification task, we'll want to use the cross entropy loss.
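
For a single example, the cross entropy loss is the negative log of the probability assigned to the correct class. The plain-Python sketch below computes it directly from raw scores (logits), which is also the input that `nn.CrossEntropyLoss` expects, since it applies log-softmax internally. The three-class scores are hypothetical.

```python
import math

def cross_entropy_from_logits(logits, target):
    """Cross entropy for one example: -log(softmax(logits)[target])."""
    max_logit = max(logits)
    # log-sum-exp, shifted by the maximum for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]

# Hypothetical scores for a 3-class problem
confident_right = cross_entropy_from_logits([5.0, 0.0, 0.0], target=0)
confident_wrong = cross_entropy_from_logits([5.0, 0.0, 0.0], target=1)
uniform = cross_entropy_from_logits([0.0, 0.0, 0.0], target=2)

print(confident_right < uniform < confident_wrong)  # True
print(round(uniform, 4))  # 1.0986, i.e. log(3)
```

With that in mind, a model trained with `nn.CrossEntropyLoss` is usually written to return the raw outputs of its last layer. Applying `F.softmax` in `forward` as well normalizes the scores twice, which still trains but may slow learning.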

In [None]:
# Set up key parameters
num_epoch = 5
learning_rate = 0.1
momentum = 0.5
log_interval = 100
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_1.parameters(), lr=learning_rate, momentum = momentum)

# Use a list to keep track of the loss
loss_array = []

# Training loop
# During each epoch, the model is trained on the entire dataset
for epoch in range(num_epoch):
  model_1.train() # Set the model to training mode

  # Loop over all batches in the dataloader
  for train_idx, (train_images, train_labels) in enumerate(train_loader):
    pred = model_1(train_images) # Use the model to predict the labels
    loss = loss_function(pred, train_labels) # Compute the loss
    loss.backward() # Backpropagation: compute the gradients
    optimizer.step() # Update the model parameters using the gradients
    optimizer.zero_grad() # Reset the gradients before the next batch
    loss_array.append(loss.item())

    # Print out metrics every 100 batches
    if train_idx % log_interval == 0: # Reminder: log_interval = 100
      print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
        epoch+1, train_idx * len(train_images), len(train_loader.dataset),
        100. * train_idx / len(train_loader), loss.item()))
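
The roles of `loss.backward()`, `optimizer.step()` and `optimizer.zero_grad()` can be pictured on a one-parameter problem. The sketch below mimics SGD with momentum in plain Python on the toy loss (w - 3)^2. The update rule follows the form documented for `torch.optim.SGD` with its default settings; the toy loss and starting value are assumptions made for illustration.

```python
learning_rate = 0.1
momentum = 0.5

w = 0.0         # single trainable parameter (hypothetical starting value)
velocity = 0.0  # momentum buffer kept by the optimizer

for step in range(50):
    # "backward": gradient of the toy loss (w - 3)^2 with respect to w
    grad = 2.0 * (w - 3.0)
    # "step": blend the new gradient into the running velocity, then move w
    velocity = momentum * velocity + grad
    w = w - learning_rate * velocity
    # "zero_grad": nothing to clear here because grad is recomputed each step

print(round(w, 4))  # 3.0
```

The parameter settles at 3, the minimum of the toy loss. In the real loop the same update is applied to every weight in the model.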

### Evaluate accuracy on the test dataset

Now that we've trained the model, we can ask it to make predictions about a test set that it hasn't seen before. In this example, `test_loader` yields batches of `test_images` and `test_labels` from our test dataset. To evaluate accuracy, we check whether the model's predictions match the labels in `test_labels`.


In [None]:
model_1.eval() # Set the model to evaluation mode
test_loss = 0
correct = 0
with torch.no_grad():
  # Loop over the test dataset only once
  for test_images, test_labels in test_loader:
    pred = model_1(test_images)
    # Sum the same cross entropy loss used in training over the batch
    # (F.nll_loss would expect log-probabilities, which model_1 does not output)
    test_loss += F.cross_entropy(pred, test_labels, reduction='sum').item()
    pred = pred.argmax(dim=1, keepdim=True) # Index of the highest score = predicted digit
    correct += pred.eq(test_labels.view_as(pred)).sum().item() # Compare labels and predictions
test_loss /= len(test_loader.dataset)
print('\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
  test_loss, correct, len(test_loader.dataset),
  100. * correct / len(test_loader.dataset)))
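
The accuracy computation boils down to taking the index of the largest score per image and counting matches with the labels. A plain-Python sketch on made-up scores for 4 images and 3 classes:

```python
# Hypothetical model outputs for 4 images and 3 classes (one row per image)
scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [1, 0, 1, 1]  # true class of each image

# The predicted class is the index of the largest score in each row
predictions = [row.index(max(row)) for row in scores]

correct = sum(p == t for p, t in zip(predictions, labels))
accuracy = 100.0 * correct / len(labels)

print(predictions)  # [1, 0, 2, 1]
print(accuracy)     # 75.0
```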

What is the highest accuracy you can achieve with this first fully connected model? Since the handwritten digit classification task is pretty straightforward, you may be wondering how we can do better...

# 1.3 Convolutional Neural Network (CNN) for Handwritten Digit Classification

Convolutional neural networks (CNNs) are particularly well-suited for a variety of tasks in computer vision, and have achieved near-perfect accuracies on the MNIST dataset. We will now build a CNN composed of two convolutional layers and pooling layers, followed by two fully connected layers, and ultimately output a probability distribution over the 10 digit classes (0-9). The CNN we will be building is depicted below:

![CNN Architecture for MNIST Classification](https://raw.githubusercontent.com/fchang-smith/CSC370Planning/main/cnn_structure.jpg "CNN Architecture for MNIST Classification")
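
To build some intuition for what a convolutional layer and a pooling layer compute, here is a minimal plain-Python sketch on a 5 x 5 image. It assumes a square image, a single hand-picked 2 x 2 filter, stride 1 and no padding. `nn.Conv2d` additionally learns its filters, adds a bias and handles many channels at once. The helper names `conv2d_valid` and `max_pool_2x2` are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2 x 2 window."""
    size = len(feature_map) // 2
    return [
        [
            max(feature_map[2 * r + i][2 * c + j]
                for i in range(2) for j in range(2))
            for c in range(size)
        ]
        for r in range(size)
    ]

# Hypothetical 5 x 5 image with a bright vertical stripe, and a 2 x 2 edge filter
image = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
kernel = [
    [-1, 1],
    [-1, 1],
]

features = conv2d_valid(image, kernel)  # 4 x 4 feature map
pooled = max_pool_2x2(features)         # 2 x 2 after pooling

print(features[0])  # [0, 2, -2, 0]
print(pooled)       # [[2, 0], [2, 0]]
```

The filter responds strongly where the image goes from dark to bright, and pooling keeps that strongest response while halving the resolution.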


### Define the CNN model

We'll use the same training and test datasets as before, and define and train our new CNN model much as we did the fully connected network. To do this we will explore two layers we have not encountered before: `nn.Conv2d` defines convolutional layers and `nn.MaxPool2d` defines pooling layers. Use the parameters shown in the network architecture above to define these layers and build the CNN model.

In [None]:
class cnn_model(nn.Module):
  def __init__(self):
    super(cnn_model, self).__init__()
    # First convolutional layer: 1 input channel (grayscale), 10 output channels
    self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
    self.pool1 = nn.MaxPool2d(kernel_size=(2, 2), stride=2, padding=0)
    self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
    self.pool2 = nn.MaxPool2d(kernel_size=(2, 2), stride=2, padding=0)
    self.flatten = nn.Flatten()
    self.mlp1 = nn.Linear(320, 50)
    self.mlp2 = nn.Linear(50, 10)

  def forward(self, x):
    x = self.conv1(x)
    x = F.relu(x)
    x = self.pool1(x)
    x = self.conv2(x)
    x = F.relu(x)
    x = self.pool2(x)
    x = self.flatten(x)
    x = self.mlp1(x)
    x = F.relu(x)
    x = self.mlp2(x)
    x = F.log_softmax(x, dim=1) # Log-probabilities over the 10 classes
    return x
    # You can also write it as:
    # x = self.pool1(F.relu(self.conv1(x)))
    # x = self.pool2(F.relu(self.conv2(x)))
    # x = self.flatten(x)
    # x = F.relu(self.mlp1(x))
    # x = F.log_softmax(self.mlp2(x), dim=1)

# Create an instance of the CNN model
model_2 = cnn_model()

# Print the summary of the layers in the model
print(model_2)
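
The input size of 320 for the first `nn.Linear` layer follows from how each layer shrinks the image. A small sketch of that arithmetic, using the standard output-size formula for convolution and pooling layers (the helper name `conv_out` is hypothetical):

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                       # MNIST images are 28 x 28
size = conv_out(size, kernel_size=5)            # conv1 -> 24
size = conv_out(size, kernel_size=2, stride=2)  # pool1 -> 12
size = conv_out(size, kernel_size=5)            # conv2 -> 8
size = conv_out(size, kernel_size=2, stride=2)  # pool2 -> 4

flattened = 20 * size * size  # 20 channels come out of conv2
print(size, flattened)        # 4 320
```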

### Train and test the CNN model

Now, as before, we can define the loss function and optimizer, and calculate accuracy.
The main difference between this training loop and the multilayer perceptron (MLP) training loop is that `model_1` is replaced with `model_2`. We also record the loss only at each logging step so that we can plot it later.



In [None]:
# Set up key parameters
num_epoch = 5
learning_rate = 0.1
momentum = 0.5
log_interval = 100
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_2.parameters(), lr=learning_rate, momentum = momentum)

# Use lists to keep track of the loss and the number of examples seen
train_losses = []
train_counter = []

# Training loop
# During each epoch, the model is trained on the entire dataset
for epoch in range(num_epoch):
  model_2.train() # Set the model to training mode
  # Loop over all batches in the dataloader
  for train_idx, (train_images, train_labels) in enumerate(train_loader):
    pred = model_2(train_images) # Use the model to predict the labels
    loss = loss_function(pred, train_labels) # Compute the loss
    loss.backward() # Backpropagation: compute the gradients
    optimizer.step() # Update the model parameters using the gradients
    optimizer.zero_grad() # Reset the gradients before the next batch

    if train_idx % log_interval == 0:
      print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
        epoch+1, train_idx * len(train_images), len(train_loader.dataset),
        100. * train_idx / len(train_loader), loss.item()))
      train_losses.append(loss.item())
      train_counter.append((train_idx * train_loader.batch_size) + (epoch * len(train_loader.dataset)))

As before, let's test the model and calculate the accuracy.

In [None]:
model_2.eval() # Set the model to evaluation mode
test_loss = 0
correct = 0
with torch.no_grad():
  for test_images, test_labels in test_loader:
    pred = model_2(test_images)
    # model_2 outputs log-probabilities, so F.nll_loss applies; sum over the batch
    test_loss += F.nll_loss(pred, test_labels, reduction='sum').item()
    pred = pred.argmax(dim=1, keepdim=True) # Index of the highest score = predicted digit
    correct += pred.eq(test_labels.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
  test_loss, correct, len(test_loader.dataset),
  100. * correct / len(test_loader.dataset)))

We can visualize how the training loss decreases as the model sees more examples.

In [None]:
fig = plt.figure()
plt.plot(train_counter, train_losses, color='blue')
plt.legend(['Train Loss'], loc='upper right') # Only the training loss is plotted
plt.xlabel('number of training examples seen')
plt.ylabel('negative log likelihood loss')

# A failed example
The model below is too complex to perform well here. You can try it yourself by running the cell below, which redefines `cnn_model`, and then re-creating `model_2` and repeating the training and test cells above.

![alt_text](https://raw.githubusercontent.com/aamini/introtodeeplearning/master/lab2/img/convnet_fig.png "CNN Architecture for MNIST Classification")

In [None]:
class cnn_model(nn.Module):
  def __init__(self):
    super(cnn_model, self).__init__()
    self.conv1 = nn.Conv2d(1, 24, kernel_size=(3, 3), stride=1, padding=0)
    self.pool1 = nn.MaxPool2d(kernel_size=(2, 2), stride=2, padding=0)
    self.conv2 = nn.Conv2d(24, 36, kernel_size=(3, 3), stride=1, padding=0)
    self.pool2 = nn.MaxPool2d(kernel_size=(2, 2), stride=2, padding=0)
    self.flatten = nn.Flatten()
    self.mlp1 = nn.Linear(36 * 5 * 5, 128)
    self.mlp2 = nn.Linear(128, 10)

  def forward(self, x):
    x = F.relu(self.conv1(x)) # output size: 24 * 26 * 26
    x = self.pool1(x) # output size: 24 * 13 * 13
    x = F.relu(self.conv2(x)) # output size: 36 * 11 * 11
    x = self.pool2(x) # output size: 36 * 5 * 5
    x = self.flatten(x) # output size: 900
    x = F.relu(self.mlp1(x))
    x = self.mlp2(x)
    return F.softmax(x, dim=0)
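
One detail in the cell above is worth checking before blaming model size alone: `F.softmax(x, dim=0)` normalizes along the batch dimension, whereas the earlier `mlp_model` used `dim=1`, the class dimension. The plain-Python sketch below shows the difference on a hypothetical batch of 2 images and 3 classes.

```python
import math

def softmax(values):
    """Softmax over a flat list of scores."""
    exps = [math.exp(v - max(values)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores: 2 images (rows) x 3 classes (columns)
batch = [
    [2.0, 1.0, 0.0],
    [2.0, 3.0, 0.0],
]

# dim=1: normalize each row, so every image gets its own distribution
over_classes = [softmax(row) for row in batch]

# dim=0: normalize each column, so images in a batch compete with each other
columns = [softmax([row[c] for row in batch]) for c in range(3)]
over_batch = [[columns[c][r] for c in range(3)] for r in range(2)]

print([round(sum(row), 6) for row in over_classes])  # [1.0, 1.0]
print([round(p, 2) for p in over_batch[0]])          # [0.5, 0.12, 0.5]
```

With `dim=0`, the scores for a single image no longer form a probability distribution over the digit classes, which may account for some of the poor performance.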