For this notebook, please insert where there is `_FILL_` either code or logic to make this work.



# MNIST CNN Digit Recognition Network

For this problem, you will code a basic digit recognition network. The data are images which specify the digits 1 to 10 as (1, 28, 28) data - this data is black and white images. Each pixed of the image is an intensity between 0 and 255, and together the (1, 28, 28) pixel image can be visualized as a picture of a digit. The data is given to you as $\{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$ where $y$ is the given label and x is the (1, 28, 28) data. This data will be gotten from `torchvision`, a repository of computer vision data and models.

Highlevel, the model and notebook goes as follows:
*   You first download the data and specify the batch size of B = 16. Each image will need to be turned from a (1, 28, 28) volume into a serious of other volumes either via convolutional layers or max pooling layers.
*   You will pass the data through several layers to built a CNN classfier. Use the hints below to get the right dimensions and figure out what the layers should be. Be careful with the loss function. Add regularization (L1 and L2) manually.

See the comments below and fill in the analysis where there is `_FILL_` specified. All asserts should pass and Test accuracy should be about 95%.






In [17]:
!pip uninstall --yes torchtune torchaudio torchvision
!pip install torch==2.3.0 torchtext==0.18.0 torchdata==0.8.0 portalocker>=2.0.0 torchvision==0.18

[0mFound existing installation: torchvision 0.18.0
Uninstalling torchvision-0.18.0:
  Successfully uninstalled torchvision-0.18.0


In [18]:
import torchvision
from torchvision import transforms
import torch
from torch.utils.data import DataLoader, TensorDataset
import torch.nn as nn

In [19]:
SEED = 1
torch.manual_seed(SEED)
_FILL_ = '_FILL_'
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [20]:
image_path = './'

# Use ToTensor
transform = transforms.Compose([transforms.ToTensor()])

mnist_train_dataset = torchvision.datasets.MNIST(
  root=image_path,
  train=True,
  transform=transform,
  download=True
)

mnist_test_dataset = torchvision.datasets.MNIST(
   root=image_path,
   train=False,
   transform=transform,
   download=True
)

In [21]:
print(f"mnist train {mnist_train_dataset}  mnist_test_dataset {mnist_test_dataset}")

mnist train Dataset MNIST
    Number of datapoints: 60000
    Root location: ./
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
           )  mnist_test_dataset Dataset MNIST
    Number of datapoints: 10000
    Root location: ./
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
           )


In [30]:
BATCH_SIZE = 16
LR = 0.1
L1_WEIGHT = 1e-10
L2_WEIGHT = 1e-12
EPOCHS = 20
# Get the dataloader for train and test
train_dl = DataLoader(mnist_train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_dl = DataLoader(mnist_test_dataset, batch_size=BATCH_SIZE, shuffle=True)

In [40]:
class CNNClassifier(nn.Module):

  def __init__(self):
    super().__init__()
    self.cnn1 = nn.Conv2d(1, 32, 3, stride=1)
    self.cnn2 = nn.Conv2d(32, 16, 3, stride=1)
    self.cnn3 = nn.Conv2d(16, 1, 3, stride=1, padding=1)
    self.linear = nn.Linear(25, 10)

  def forward(self, x):
    # Flatten x to be of last dimension 784
    assert(x.shape == (BATCH_SIZE, 1, 28, 28))

    # Pass through cnn layer 1
    # (28, 28, 1) -> (26, 26, 32)
    x = self.cnn1(x)
    assert(x.shape == (BATCH_SIZE, 32, 26, 26))

    # Pass through max pooling to give the result shape below
    # (26, 26, 32) -> (13, 13, 32)
    x = nn.MaxPool2d(2, stride=2)(x)
    assert(x.shape == (BATCH_SIZE, 32, 13, 13))

    # Apply ReLU
    x = nn.functional.relu(x)

    # Pass through cnn layer 2 to give the result below
    # (13, 13, 32) -> (11, 11, 16)
    x = self.cnn2(x)
    assert(x.shape == (BATCH_SIZE, 16, 11, 11))

    # Pass through max pooling pool to give the result below
    # (11, 11, 16) -> (5, 5, 16)
    x = nn.MaxPool2d(2, stride=2)(x)
    assert(x.shape == (BATCH_SIZE, 16, 5, 5))

    # Apply rely
    x = nn.functional.relu(x)

    # Pass through cnn layer 3 to give the result below
    # (5, 5, 16) -> (5, 5, 1)
    x = self.cnn3(x)
    assert(x.shape == (BATCH_SIZE, 1, 5, 5))

    # Apply rely
    x = nn.functional.relu(x)

    # Flatten to get the result below
    # (5, 5, 1) - > (25, )
    x = torch.flatten(x, start_dim=1)
    assert(x.shape == (BATCH_SIZE, 25))

    # Pass through linear layer to get the result below
    # (25, ) -> (10, )
    x = self.linear(x)
    assert(x.shape == (BATCH_SIZE, 10))

    # Return the logits
    return x

model = CNNClassifier().to(DEVICE)

In [None]:
# Get the loss function; remember you are outputting the logits
loss_fn = nn.CrossEntropyLoss().to(DEVICE)

# Set the optimizer to SGD and let the learning rate be LR
# Do not add L2 regularization; add it manually below ...
optimizer = torch.optim.SGD(model.parameters(), lr=LR)

model.train()
torch.manual_seed(SEED)
for epoch in range(EPOCHS):
    accuracy_hist_train = 0.0
    loss_hist_train = 0.0
    # Loop through the x and y pairs of data
    for x_batch, y_batch in train_dl:
        # Get he the model predictions
        y_pred = model(x_batch)
        # print("logits mean/std:", y_pred.mean().item(), y_pred.std().item())

        # print(f" model predict {y_pred} label {y_batch}")
        # Get the loss
        loss = loss_fn(y_pred, y_batch)
        # print(f"y_pred {y_pred[0:2]} y_batch: {y_batch[0:2]} loss: {loss}")

        # Add an L1 regularizaton with a weight of L1_WEIGHT to the objective
        l1_reg = L1_WEIGHT * sum(p.abs().sum() for p in model.parameters())

        # Add an L2 regularization with a weight of L2_WEIGHT to the objective
        l2_reg = L2_WEIGHT * sum((p**2).sum() for p in model.parameters())

        # Add the regularizers to the objective
        loss += l1_reg + l2_reg

        # Get the gradients
        loss.backward()

        # Add to the loss
        # Remember loss: is a mean over the batch size and we need the total sum over the number of samples in the dataset
        loss_hist_train += loss

        # Update the parameters
        optimizer.step()

        # Zero out the gradient
        optimizer.zero_grad()

        # Get the number of correct predictions, do this directly
        # print(f" prediction {torch.argmax(y_pred, dim=1)} truth {y_batch}")
        is_correct = (torch.argmax(y_pred, dim=1) == y_batch).sum()

        accuracy_hist_train += is_correct

        # print(f"loss in epoch {loss} {accuracy_hist_train}")

    accuracy_hist_train /= len(train_dl.dataset)
    loss_hist_train /= len(train_dl.dataset)
    print(f'Train Metrics Epoch {epoch} Loss {loss_hist_train:.4f} Accuracy {accuracy_hist_train:.4f}')

    # Get the average value of each metric across the test batches
    with torch.no_grad():
      loss_hist_test = 0.0
      accuracy_hist_test = 0.0
      # Loop through the x and y pairs of data
      for x_batch, y_batch in test_dl:
          # Get he the model predictions
          y_batch_pred = model(x_batch)

          # Get the loss
          loss = loss_fn(y_batch_pred, y_batch)

          # Add an L1 regularizaton with a weight of L1_WEIGHT to the objective
          l1_reg = L1_WEIGHT * sum(p.abs().sum() for p in model.parameters())

          # Add an L2 regularization with a weight of L2_WEIGHT to the objective
          l2_reg = L2_WEIGHT * sum((p**2).sum() for p in model.parameters())

          # Add the regularizers to the objective
          loss += l1_reg + l2_reg

          # Add to the loss
          # Remember loss: is a mean over the batch size and we need the total sum over the number of samples in the dataset
          loss_hist_test += loss

          # Get the number of correct predictions
          is_correct = (torch.argmax(y_batch_pred, dim=1) == y_batch).sum()

          # Get the accuracy
          accuracy_hist_test += is_correct

      # Normalize the metrics by the right number
      accuracy_hist_test /= len(test_dl.dataset)
      loss_hist_test /= len(test_dl.dataset)
      print(f'Test Metrics Epoch {epoch} Loss {loss_hist_test:.4f} Accuracy {accuracy_hist_test:.4f}')

Train Metrics Epoch 0 Loss 0.0184 Accuracy 0.9076
Test Metrics Epoch 0 Loss 0.0083 Accuracy 0.9593
Train Metrics Epoch 1 Loss 0.0093 Accuracy 0.9539
Test Metrics Epoch 1 Loss 0.0066 Accuracy 0.9675
Train Metrics Epoch 2 Loss 0.0078 Accuracy 0.9622
Test Metrics Epoch 2 Loss 0.0068 Accuracy 0.9674
Train Metrics Epoch 3 Loss 0.0070 Accuracy 0.9664
Test Metrics Epoch 3 Loss 0.0065 Accuracy 0.9698
Train Metrics Epoch 4 Loss 0.0064 Accuracy 0.9679
Test Metrics Epoch 4 Loss 0.0060 Accuracy 0.9717
Train Metrics Epoch 5 Loss 0.0060 Accuracy 0.9705
Test Metrics Epoch 5 Loss 0.0059 Accuracy 0.9704
Train Metrics Epoch 6 Loss 0.0058 Accuracy 0.9716
Test Metrics Epoch 6 Loss 0.0051 Accuracy 0.9753
Train Metrics Epoch 7 Loss 0.0056 Accuracy 0.9724
Test Metrics Epoch 7 Loss 0.0052 Accuracy 0.9754
Train Metrics Epoch 8 Loss 0.0054 Accuracy 0.9728
Test Metrics Epoch 8 Loss 0.0054 Accuracy 0.9729
Train Metrics Epoch 9 Loss 0.0051 Accuracy 0.9751
Test Metrics Epoch 9 Loss 0.0044 Accuracy 0.9795
Train Metr