# About: Deep Learning '23 Assignment 1


We will perform an image classification task on the MNIST dataset, which has 70,000 images of size 28*28 labelled into 10 classes.

**Total Marks: 60**


**Fill these**

Name: `Sidharth Vishwakarma`

Roll Number: `20CS10082`

**Instructions:**

1. We have left code cells blank for you to fill up with appropriate code. Do not add any extra code cells. Strictly follow the format and fill up the cells with the correct code. Refer to cell comments for what to fill in that cell.

2. *Do not* use any training frameworks like PyTorch Lightning. This assignment will test your ability to write custom training loops.

3. Save the notebook with the outputs of all cells. The cell outputs will be used for evaluating your submission.




In [1]:
import torch
import torch.nn
import random
import numpy as np

from torchvision import datasets, transforms
from torch.utils.data import random_split, DataLoader


## Add any other imports here
import matplotlib.pyplot as plt

In [2]:
SEED=42
torch.manual_seed(SEED)
random.seed(SEED)
np.random.seed(SEED)

## Getting the data

In [3]:
train_data = datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor())
test_data = datasets.MNIST('data', train=False, download=True, transform=transforms.ToTensor())
train, val = random_split(train_data, [50000, 10000], generator=torch.Generator().manual_seed(SEED))

train_loader = DataLoader(train, batch_size=64, shuffle=True)
val_loader = DataLoader(val, batch_size=64, shuffle=False)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)

print(len(train), len(val), len(test_data))

50000 10000 10000
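The fixed-seed generator passed to `random_split` is what keeps the 50,000/10,000 train/validation split identical across runs. A minimal sketch on a toy dataset illustrates this; the 10-element `range`, the 7/3 sizes and the helper name `split_indices` are illustrative assumptions, not part of the assignment.

```python
import torch
from torch.utils.data import random_split

SEED = 42

def split_indices(seed):
    # Toy stand-in for the MNIST training set: anything with a length and indexing works.
    toy_data = range(10)
    train_part, val_part = random_split(
        toy_data, [7, 3], generator=torch.Generator().manual_seed(seed)
    )
    # Subset.indices lists which items of the original dataset landed in each part.
    return list(train_part.indices), list(val_part.indices)

first = split_indices(SEED)
second = split_indices(SEED)
print(first == second)  # same seed, same split
```

Because the generator is local to the call, the split does not depend on how many random numbers were drawn elsewhere in the notebook.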


## Defining the Model [18 marks]

You will define 3 models, with 2, 3 and 4 hidden layers respectively. Let's call these models A, B and C. We will be studying the comparative performance of these 3 models on this task.

Use ReLU as the activation function for all three models. Later we will experiment with other activation functions as well.

### Model A

Architecture:

1. Input Layer 
2. Hidden Layer (Dimension Size - 64)
3. Activation Function
4. Hidden Layer (Dimension Size - 128)
5. Activation Function
6. Output Layer (Dimension Size = Number of Classes = 10)

In [4]:
# Model A Definition 
input_size = 28 * 28
num_classes = 10

class ModelA(torch.nn.Module):
    def __init__(self, input_size, num_classes):
        super(ModelA, self).__init__()
        self.fc1 = torch.nn.Linear(input_size, 64)
        self.fc2 = torch.nn.Linear(64, 128)
        self.fc3 = torch.nn.Linear(128, num_classes)
        self.relu = torch.nn.ReLU()
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x


# Fill in appropriately while maintaining the name of the variable
modelA = ModelA(input_size,num_classes)

### Model B


Architecture:

1. Input Layer 
2. Hidden Layer (Dimension Size - 64)
3. Activation Function
4. Hidden Layer (Dimension Size - 128)
5. Activation Function
6. Hidden Layer (Dimension Size - 256)
7. Activation Function
8. Output Layer (Dimension Size = Number of Classes = 10)

In [5]:
# Model B Definition
input_size = 28 * 28
num_classes = 10

class ModelB(torch.nn.Module):
    def __init__(self, input_size, num_classes):
        super(ModelB, self).__init__()
        self.fc1 = torch.nn.Linear(input_size, 64)
        self.fc2 = torch.nn.Linear(64, 128)
        self.fc3 = torch.nn.Linear(128, 256)
        self.fc4 = torch.nn.Linear(256, num_classes)
        self.relu = torch.nn.ReLU()
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        x = self.relu(x)
        x = self.fc4(x)
        return x

# Use the same variable name
modelB = ModelB(input_size, num_classes)

### Model C


Architecture

1. Input Layer 
2. Hidden Layer (Dimension Size - 64)
3. Activation Function
4. Hidden Layer (Dimension Size - 128)
5. Activation Function
6. Hidden Layer (Dimension Size - 256)
7. Activation Function
8. Hidden Layer (Dimension Size - 512)
9. Activation Function
10. Output Layer (Dimension Size = Number of Classes = 10)

In [7]:
# Model C Definition
input_size = 28 * 28
num_classes = 10

class ModelC(torch.nn.Module):
    def __init__(self, input_size, num_classes):
        super(ModelC, self).__init__()
        self.fc1 = torch.nn.Linear(input_size, 64)
        self.fc2 = torch.nn.Linear(64, 128)
        self.fc3 = torch.nn.Linear(128, 256)
        self.fc4 = torch.nn.Linear(256, 512)
        self.fc5 = torch.nn.Linear(512, num_classes)
        self.relu = torch.nn.ReLU()
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        x = self.relu(x)
        x = self.fc4(x)
        x = self.relu(x)
        x = self.fc5(x)
        return x

# Use the same variable name
modelC = ModelC(input_size, num_classes)
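With all three architectures defined, it can help to see how much they differ in capacity. The number of trainable parameters follows from the layer sizes alone. The helper below is a hypothetical illustration (the name `count_mlp_params` is not part of the assignment) and assumes fully connected layers with a bias term and a flattened 28*28 input.

```python
def count_mlp_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input and output included.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total

input_size, num_classes = 28 * 28, 10
params_a = count_mlp_params([input_size, 64, 128, num_classes])
params_b = count_mlp_params([input_size, 64, 128, 256, num_classes])
params_c = count_mlp_params([input_size, 64, 128, 256, 512, num_classes])
print(params_a, params_b, params_c)  # 59850 94154 228298
```

Model C has roughly four times as many parameters as Model A, which is worth keeping in mind when comparing their validation curves.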

## Loss Function & Optimizer [2 marks]

* Loss Function: Cross Entropy Loss
* Optimizer : Adam

Use PyTorch Library versions for these two.
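As a quick sanity check of how these two pieces behave, the sketch below feeds `torch.nn.CrossEntropyLoss` a batch of uniform logits and builds an Adam optimizer for a toy layer. The tensors, the labels and `toy_model` are made-up stand-ins, not the assignment's models.

```python
import math
import torch

# CrossEntropyLoss expects raw logits (no softmax applied) and integer class labels.
criterion = torch.nn.CrossEntropyLoss()
logits = torch.zeros(4, 10)            # batch of 4 samples, 10 classes, all logits equal
targets = torch.tensor([0, 3, 7, 9])   # made-up labels
loss = criterion(logits, targets)

# Equal logits give a uniform prediction over 10 classes, so the loss is ln(10).
print(abs(loss.item() - math.log(10)) < 1e-5)

# Adam is constructed from an iterable of parameters; the Linear layer is a toy stand-in.
toy_model = torch.nn.Linear(28 * 28, 10)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
```

Since the loss applies log-softmax internally, the models' output layers should return raw scores, as the definitions above do.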

In [8]:
# Use the same variable names
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam

## Training Loop [30 marks]

We give you the freedom to choose hyperparameters such as the learning rate and the number of epochs, but take care to use the **same** hyperparameters for all 3 models. Also clearly state the hyperparameters you have chosen.

For each model, you need to report these metrics at the end of each epoch: Train Loss, Val Loss, Train Accuracy, Val Accuracy.

Also plot the following graphs (in separate cells):
1. Train Loss & Val Loss vs. Epoch
2. Train Accuracy & Val Accuracy vs. Epoch
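The per-epoch bookkeeping described above (loss and accuracy for both splits) can be sketched as one helper that either trains or evaluates. This is only one possible way to organise a custom loop. The name `run_epoch` is hypothetical, and random tensors stand in for MNIST so the snippet runs without a download.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def run_epoch(model, loader, criterion, optimizer=None):
    """Run one pass over loader; train if an optimizer is given, else evaluate.

    Returns (mean loss per sample, accuracy in percent).
    """
    training = optimizer is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for data, target in loader:
            data = data.view(data.size(0), -1)  # flatten images to vectors
            output = model(data)
            loss = criterion(output, target)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * data.size(0)  # undo the batch mean
            correct += (output.argmax(dim=1) == target).sum().item()
            seen += data.size(0)
    return total_loss / seen, 100.0 * correct / seen

# Synthetic stand-in for MNIST: 256 random "images" with random labels.
torch.manual_seed(0)
toy = TensorDataset(torch.rand(256, 1, 28, 28), torch.randint(0, 10, (256,)))
toy_loader = DataLoader(toy, batch_size=64)
toy_model = torch.nn.Sequential(
    torch.nn.Linear(28 * 28, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10)
)
toy_opt = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
toy_criterion = torch.nn.CrossEntropyLoss()

train_loss, train_acc = run_epoch(toy_model, toy_loader, toy_criterion, toy_opt)
val_loss, val_acc = run_epoch(toy_model, toy_loader, toy_criterion)
print(f"train loss {train_loss:.4f}, val acc {val_acc:.2f}%")
```

Weighting each batch loss by its batch size gives a true per-sample mean even when the last batch is smaller than the rest.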

In [9]:
# Define the hyperparameters (same for all 3 models) here
EPOCHS = 10
LEARNING_RATE = 0.001

optimizerA = optimizer(modelA.parameters(), lr=LEARNING_RATE)
optimizerB = optimizer(modelB.parameters(), lr=LEARNING_RATE)
optimizerC = optimizer(modelC.parameters(), lr=LEARNING_RATE)

### Model A 



In [None]:
# Training Loop for model A
train_lossesA = []
val_lossesA = []
train_accA = []
val_accA = []

for epoch in range(EPOCHS):
    train_loss = 0.0
    val_loss = 0.0
    train_correct = 0
    val_correct = 0
    
    modelA.train()
    for data, target in train_loader:
        data = data.view(-1, 28*28)
        optimizerA.zero_grad()
        output = modelA(data)
        loss = criterion(output, target)
        loss.backward()
        optimizerA.step()
        # loss.item() is the batch mean, so weight it by the batch size before summing
        train_loss += loss.item() * data.size(0)
        pred = output.argmax(dim=1)
        train_correct += (pred == target).sum().item()
    
    modelA.eval()
    with torch.no_grad():  # gradients are not needed during validation
        for data, target in val_loader:
            data = data.view(-1, 28*28)
            output = modelA(data)
            loss = criterion(output, target)
            val_loss += loss.item() * data.size(0)
            pred = output.argmax(dim=1)
            val_correct += (pred == target).sum().item()
    
    train_loss /= len(train_loader.dataset)
    val_loss /= len(val_loader.dataset)
    train_lossesA.append(train_loss)
    val_lossesA.append(val_loss)
    train_accA.append(100. * train_correct / len(train_loader.dataset))
    val_accA.append(100. * val_correct / len(val_loader.dataset))
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f} \tTraining Accuracy: {:.2f}% \tValidation Accuracy: {:.2f}%'.format(
        epoch, train_loss, val_loss, 100. * train_correct / len(train_loader.dataset), 100. * val_correct / len(val_loader.dataset)))

In [None]:
# Plot Graph of Train & Val Loss vs Epoch (together in same plot) for model A
plt.plot(train_lossesA, label='Training Loss')
plt.plot(val_lossesA, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Model A: Loss vs Epoch')
plt.legend()
plt.show()

In [None]:
# Plot Graph of Train & Val Accuracy vs Epoch (together in same plot) for model A
plt.plot(train_accA, label='Training Accuracy')
plt.plot(val_accA, label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.title('Model A: Accuracy vs Epoch')
plt.legend()
plt.show()

### Model B


In [None]:
# Training Loop for model B
train_lossesB = []
val_lossesB = []
train_accB = []
val_accB = []

for epoch in range(EPOCHS):
    train_loss = 0.0
    val_loss = 0.0
    train_correct = 0
    val_correct = 0
    
    modelB.train()
    for data, target in train_loader:
        data = data.view(-1, 28*28)
        optimizerB.zero_grad()
        output = modelB(data)
        loss = criterion(output, target)
        loss.backward()
        optimizerB.step()
        # loss.item() is the batch mean, so weight it by the batch size before summing
        train_loss += loss.item() * data.size(0)
        pred = output.argmax(dim=1)
        train_correct += (pred == target).sum().item()
    
    modelB.eval()
    with torch.no_grad():  # gradients are not needed during validation
        for data, target in val_loader:
            data = data.view(-1, 28*28)
            output = modelB(data)
            loss = criterion(output, target)
            val_loss += loss.item() * data.size(0)
            pred = output.argmax(dim=1)
            val_correct += (pred == target).sum().item()
    
    train_loss /= len(train_loader.dataset)
    val_loss /= len(val_loader.dataset)
    train_lossesB.append(train_loss)
    val_lossesB.append(val_loss)
    train_accB.append(100. * train_correct / len(train_loader.dataset))
    val_accB.append(100. * val_correct / len(val_loader.dataset))
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f} \tTraining Accuracy: {:.2f}% \tValidation Accuracy: {:.2f}%'.format(
        epoch, train_loss, val_loss, 100. * train_correct / len(train_loader.dataset), 100. * val_correct / len(val_loader.dataset)))

In [None]:
# Plot Graph of Train & Val Loss vs Epoch (together in same plot) for model B
plt.plot(train_lossesB, label='Training Loss')
plt.plot(val_lossesB, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Model B: Loss vs Epoch')
plt.legend()
plt.show()

In [None]:
# Plot Graph of Train & Val Accuracy vs Epoch (together in same plot) for model B
plt.plot(train_accB, label='Training Accuracy')
plt.plot(val_accB, label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.title('Model B: Accuracy vs Epoch')
plt.legend()
plt.show()

### Model C


In [None]:
# Training Loop for model C
train_lossesC = []
val_lossesC = []
train_accC = []
val_accC = []

for epoch in range(EPOCHS):
    train_loss = 0.0
    val_loss = 0.0
    train_correct = 0
    val_correct = 0
    
    modelC.train()
    for data, target in train_loader:
        data = data.view(-1, 28*28)
        optimizerC.zero_grad()
        output = modelC(data)
        loss = criterion(output, target)
        loss.backward()
        optimizerC.step()
        # loss.item() is the batch mean, so weight it by the batch size before summing
        train_loss += loss.item() * data.size(0)
        pred = output.argmax(dim=1)
        train_correct += (pred == target).sum().item()
    
    modelC.eval()
    with torch.no_grad():  # gradients are not needed during validation
        for data, target in val_loader:
            data = data.view(-1, 28*28)
            output = modelC(data)
            loss = criterion(output, target)
            val_loss += loss.item() * data.size(0)
            pred = output.argmax(dim=1)
            val_correct += (pred == target).sum().item()
    
    train_loss /= len(train_loader.dataset)
    val_loss /= len(val_loader.dataset)
    train_lossesC.append(train_loss)
    val_lossesC.append(val_loss)
    train_accC.append(100. * train_correct / len(train_loader.dataset))
    val_accC.append(100. * val_correct / len(val_loader.dataset))
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f} \tTraining Accuracy: {:.2f}% \tValidation Accuracy: {:.2f}%'.format(
        epoch, train_loss, val_loss, 100. * train_correct / len(train_loader.dataset), 100. * val_correct / len(val_loader.dataset)))

In [None]:
# Plot Graph of Train & Val Loss vs Epoch (together in same plot) for model C
plt.plot(train_lossesC, label='Training Loss')
plt.plot(val_lossesC, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Model C: Loss vs Epoch')
plt.legend()
plt.show()

In [None]:
# Plot Graph of Train & Val Accuracy vs Epoch (together in same plot) for model C
plt.plot(train_accC, label='Training Accuracy')
plt.plot(val_accC, label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.title('Model C: Accuracy vs Epoch')
plt.legend()
plt.show()

## Choosing an Activation Function [10 marks]

Based on the best-performing model you found above, define 2 more models, each using one of the following activation functions throughout its definition:


*   Tanh
*   LeakyReLU
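Swapping the activation while keeping the architecture fixed can be sketched with a small builder that takes the activation class as an argument. The builder `make_mlp` and the `sketch_*` names are hypothetical. The hidden sizes below are those of Model A, chosen only for illustration, and should be replaced by the sizes of whichever model performed best.

```python
import torch

def make_mlp(hidden_sizes, activation, input_size=28 * 28, num_classes=10):
    """Build a fully connected classifier with one activation type throughout."""
    layers = []
    previous = input_size
    for width in hidden_sizes:
        layers.append(torch.nn.Linear(previous, width))
        layers.append(activation())  # fresh activation module after each hidden layer
        previous = width
    layers.append(torch.nn.Linear(previous, num_classes))  # output layer, raw logits
    return torch.nn.Sequential(*layers)

# Assumption for illustration only: hidden sizes of Model A.
hidden = [64, 128]
sketch_lrelu = make_mlp(hidden, torch.nn.LeakyReLU)
sketch_tanh = make_mlp(hidden, torch.nn.Tanh)

dummy = torch.rand(2, 28 * 28)  # two flattened fake images
print(sketch_lrelu(dummy).shape, sketch_tanh(dummy).shape)
```

Passing the class rather than an instance keeps the two models structurally identical apart from the non-linearity, which is the comparison this section asks for.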

In [None]:
# Leaky ReLU model definition

# Tanh model definition


# Maintain these variable names
model_lrelu = ...
model_tanh = ...

### Training 

Train these two models with the same hyperparameters. Train them in the separate cells given below, and report the same metrics described previously (train_loss, val_loss, train_acc, val_acc).

In [None]:
# Training Loop for LRELU

In [None]:
# Training Loop for TanH

### Results on Test Set

Report the test set classification accuracy for the three activation functions (ReLU, LeakyReLU & TanH) and state which activation function gave the best performance on the test set.
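Test accuracy is the fraction of samples whose highest-scoring class matches the label. A minimal sketch of such a measurement follows; the function name `accuracy_percent` is hypothetical, and the identity "model" with one-hot inputs is a toy check rather than one of the trained networks.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def accuracy_percent(model, loader):
    """Return the classification accuracy (in percent) of model over loader."""
    model.eval()
    correct, seen = 0, 0
    with torch.no_grad():
        for data, target in loader:
            output = model(data.view(data.size(0), -1))  # flatten each sample
            correct += (output.argmax(dim=1) == target).sum().item()
            seen += target.size(0)
    return 100.0 * correct / seen

# Toy check: an identity "model" fed one-hot rows predicts the row index.
inputs = torch.eye(10)
labels = torch.arange(10)
labels[0] = 5  # deliberately mislabel one sample
toy_loader = DataLoader(TensorDataset(inputs, labels), batch_size=4)
toy_acc = accuracy_percent(torch.nn.Identity(), toy_loader)
print(toy_acc)  # 90.0
```

Counting samples with `target.size(0)` rather than assuming a fixed batch size keeps the result correct when the last batch is smaller.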

In [None]:
# Define how to calculate Accuracy on Test Set

In [None]:
# Accuracy of RELU model

In [None]:
# Accuracy of TanH model

In [None]:
# Accuracy of LeakyReLU model

Fill in these with the values you obtained from training.

* ReLU model Test Set Accuracy: `....` %
* TanH model Test Set Accuracy: `....` %
* LeakyReLU model Test Set Accuracy: `....` %