---
# Homework 2: Convolutional Neural Networks (CNNs) (100 points)


In this homework, we're going to build a specific convolutional neural network (CNN) architecture called LeNet and then train it on an image classification dataset (the same dataset as in Homework 1).

In [1]:
import torch
from torch import nn
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
from tqdm import tqdm
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import time
import copy
torch.set_num_threads(4)
torch.set_num_interop_threads(4)

# Loading Data (0 points)

This is an example of building a PyTorch data loader for a dataset; here it is used to load the Fashion MNIST image dataset. No code is needed here.

The raw data is retrieved from: [Kaggle - Fashion MNIST Dataset](https://www.kaggle.com/datasets/zalando-research/fashionmnist?resource=download)

We have already split the data into training, validation, and test sets for this homework.
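The `FashionDataset` class below assumes each CSV row holds the class label in the first column, followed by 784 pixel values (28 x 28). A minimal NumPy sketch of that per-row transformation, using a synthetic row (an assumption, not real data from the CSV files):

```python
import numpy as np

# Hypothetical stand-in for one CSV row: a label followed by 784 pixel values in [0, 255].
rng = np.random.default_rng(0)
row = np.concatenate([[3], rng.integers(0, 256, size=784)])

label = row[0]  # first column is the class label
# The remaining 784 values become one 28 x 28 image with a single channel (H, W, C),
# scaled by 256 as FashionDataset.__init__ does, so values land in [0, 1).
image = row[1:].reshape(28, 28, 1).astype('float32') / 256

print(label, image.shape, image.min(), image.max())
```

`transforms.ToTensor()` then turns each (28, 28, 1) float array into a (1, 28, 28) tensor, the channel-first layout `nn.Conv2d` expects.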

In [2]:
train_csv=pd.read_csv('../../assets/fashion_mnist/train.csv')
test_csv=pd.read_csv('../../assets/fashion_mnist/test.csv')
valid_csv=pd.read_csv('../../assets/fashion_mnist/val.csv')

In [3]:
class FashionDataset(Dataset):
    """User-defined dataset built on PyTorch's Dataset class."""
    
    def __init__(self, data, transform = None):
        """Split each row into its label and its 28 x 28 image, scaled to [0, 1)."""
        self.fashion_MNIST = list(data.values)
        self.transform = transform
        
        label = []
        image = []
        
        for i in self.fashion_MNIST:
            # The first column holds the labels.
            label.append(i[0])
            image.append(i[1:])
        self.labels = np.asarray(label)
        # Images are 28 x 28 x 1 (height = width = 28, one color channel).
        self.images = np.asarray(image).reshape(-1, 28, 28, 1).astype('float32')
        self.images = self.images/256

    def __getitem__(self, index):
        label = self.labels[index]
        image = self.images[index]
        
        if self.transform is not None:
            image = self.transform(image)

        return image, label

    def __len__(self):
        return len(self.images)

In [4]:
batch_size=256

train_set = FashionDataset(train_csv, transform=transforms.Compose([transforms.ToTensor()]))
val_set = FashionDataset(valid_csv, transform=transforms.Compose([transforms.ToTensor()]))
test_set = FashionDataset(test_csv, transform=transforms.Compose([transforms.ToTensor()]))

train_loader = DataLoader(train_set, batch_size=batch_size)
val_loader = DataLoader(val_set, batch_size=batch_size)
test_loader = DataLoader(test_set, batch_size=batch_size)

# Question 1: Build LeNet (50 pts)

In this part, you will build a LeNet model.

The model's constructor should take as arguments the sizes of the two hidden (fully connected) layers and the numbers of output channels of the two convolutional layers. You can refer to the original LeNet paper ([Gradient-based learning applied to document recognition](https://ieeexplore.ieee.org/document/726791)) for more details on the model. 

To improve the performance, you may consider adding batch normalization ([Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)) after each convolutional layer.
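To make the normalization step concrete, here is a NumPy sketch of training-mode batch normalization for an (N, C, H, W) batch. It is an illustration, not PyTorch's implementation: `nn.BatchNorm2d` normalizes the same way during training, but it also keeps running statistics for evaluation mode, which this sketch omits. The function name `batch_norm_2d` is hypothetical.

```python
import numpy as np

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for an (N, C, H, W) array.

    Each channel is normalized with the mean and (biased) variance computed over
    the batch and spatial dimensions (N, H, W), then scaled by gamma and shifted by beta.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 6, 6))  # batch of 8 maps with 4 channels
y = batch_norm_2d(x, gamma=np.ones(4), beta=np.zeros(4))

# With gamma = 1 and beta = 0, every channel ends up with mean close to 0 and std close to 1.
print(y.mean(axis=(0, 2, 3)).round(6), y.std(axis=(0, 2, 3)).round(3))
```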

You may put all of your modules inside an `nn.Sequential` container, so the forward pass reduces to a single call. 

We have prepared the weight initialization for you: the weights of the linear and convolutional layers use Xavier uniform initialization.
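Xavier (Glorot) uniform initialization draws each weight from U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out)), which is the formula documented for `torch.nn.init.xavier_uniform_` (default gain 1). For a convolution, fan_in and fan_out are the input and output channel counts multiplied by the kernel area. A small sketch computing the bound for two layers of the default model; the helper `xavier_uniform_bound` is hypothetical:

```python
import math

def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Half-width a of the U(-a, a) distribution used by Xavier uniform initialization."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

# First conv layer with output_channel_1=4: weight shape (4, 1, 5, 5)
k = 5 * 5  # receptive field size of a 5x5 kernel
conv1_bound = xavier_uniform_bound(fan_in=1 * k, fan_out=4 * k)

# First linear layer with output_channel_2=12 and hidden_1=80: weight shape (80, 12*4*4)
fc1_bound = xavier_uniform_bound(fan_in=12 * 4 * 4, fan_out=80)

print(round(conv1_bound, 4), round(fc1_bound, 4))
```

Note that the initialization loop in the model touches only `m.weight`; the biases keep PyTorch's default initialization.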

In [5]:
class leNet(nn.Module):
    def __init__(self,hidden_1=80,hidden_2=84,output_channel_1=4,output_channel_2=12):
        super(leNet, self).__init__()
        self.net=nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=output_channel_1, kernel_size=5, stride=1, padding=0),
            nn.BatchNorm2d(output_channel_1),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2, padding=0),
            nn.Conv2d(in_channels=output_channel_1, out_channels=output_channel_2, kernel_size=5, stride=1, padding=0),
            nn.BatchNorm2d(output_channel_2),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2, padding=0),
            nn.Flatten(),
            nn.Linear(in_features=output_channel_2*4*4, out_features=hidden_1),
            nn.BatchNorm1d(hidden_1),
            nn.Tanh(),
            nn.Linear(in_features=hidden_1, out_features=hidden_2),
            nn.BatchNorm1d(hidden_2),
            nn.Tanh(),
            nn.Linear(in_features=hidden_2, out_features=10)
        )
        
        # Xavier uniform initialization for the weights of every linear and convolutional layer.
        for m in self.net:
            if isinstance(m, (nn.Linear, nn.Conv2d)):
                nn.init.xavier_uniform_(m.weight)

    def forward(self, X):
        return self.net(X)
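The `in_features=output_channel_2*4*4` of the first linear layer comes from tracking the spatial size of a 28 x 28 input through the two unpadded 5x5 convolutions and the two 2x2 average pools. A short sketch of that arithmetic, using the standard output-size formula floor((n + 2p - k) / s) + 1; the helper `out_size` is hypothetical:

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a Conv2d / AvgPool2d layer (dilation 1, floor rounding)."""
    return (n + 2 * padding - kernel_size) // stride + 1

n = 28                                      # Fashion-MNIST images are 28 x 28
n = out_size(n, kernel_size=5)              # conv1, no padding -> 24
n = out_size(n, kernel_size=2, stride=2)    # avg pool          -> 12
n = out_size(n, kernel_size=5)              # conv2, no padding -> 8
n = out_size(n, kernel_size=2, stride=2)    # avg pool          -> 4

output_channel_2 = 12
flat_features = output_channel_2 * n * n    # in_features of the first nn.Linear
print(n, flat_features)
```

The original LeNet-5 took 32 x 32 inputs, which is why it ends with 5 x 5 feature maps instead of 4 x 4.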

In [6]:
# Model hyperparameters: the hidden sizes are fixed; we tune only the learning rate and the output channels in this homework
lrs=[0.9,1.5]
hidden_1=80
hidden_2=84
output_channel_1s=[4,6]
output_channel_2s=[12,16]
# loss function
loss_function = nn.CrossEntropyLoss()

In [7]:
# function for evaluating a model's accuracy on a data loader
def eval_model(model,data_loader):
    model.eval()  # use the running BatchNorm statistics instead of batch statistics
    y_true_list=[]
    y_pred_list=[]
    with torch.no_grad():  # gradients are not needed for evaluation
        for x,y in data_loader:
            outputs=model(x)
            _, y_pred = torch.max(outputs, 1)
            y_pred_list.extend(y_pred.tolist())
            y_true_list.extend(y.tolist())
    acc=classification_report(y_true_list, y_pred_list,output_dict=True)['accuracy']
    return acc

In [8]:
# Hidden tests in this cell

# Question 2: Train the LeNet CNN model (50 pts)

In this question, you will write a few lines of code to train the LeNet you just built.

This is similar to Homework 1. 

Here's the list of things you need to implement. Each of them can (and should) be done in a single line of code.

1. Initialize the model with a set of hyperparameters.
2. Initialize the optimizer with the model's trainable parameters.
3. Set the model to training mode.
4. For every batch of data:
    - zero the gradients in the optimizer;
    - feed the input into the model;
    - compute the loss;
    - back-propagate the loss;
    - take an optimizer step to update the model's parameters.
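The steps above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the homework solution: the synthetic 2-D data, the single `nn.Linear` model, the batch size of 64, and the SGD learning rate are all hypothetical stand-ins for Fashion-MNIST and LeNet.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy data: 2-D points labeled 1 when x0 + x1 > 0, else 0.
x = torch.randn(512, 2)
y = (x.sum(dim=1) > 0).long()

model = nn.Linear(2, 2)                                   # 1. initialize the model
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)   # 2. optimizer over trainable parameters
loss_function = nn.CrossEntropyLoss()

losses = []
model.train()                                             # 3. training mode
for epoch in range(20):
    for xb, yb in zip(x.split(64), y.split(64)):          # 4. mini-batches of 64
        optimizer.zero_grad()                             #    zero the gradients
        output = model(xb)                                #    forward pass
        loss = loss_function(output, yb)                  #    compute the loss
        loss.backward()                                   #    back-propagate
        optimizer.step()                                  #    update the parameters
    losses.append(loss.item())  # loss on the last batch of each epoch

model.eval()
with torch.no_grad():
    accuracy = (model(x).argmax(dim=1) == y).float().mean().item()
print(losses[0], losses[-1], accuracy)
```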
        
Your model should obtain a test set accuracy of at least 0.85 in order to secure full points. A training procedure that is correct but yields an accuracy lower than 0.85 will receive 25 points. 

In [9]:
random_seed = 3407
torch.manual_seed(random_seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

In [None]:
import torch.optim as optim

start_time=time.time()
current_best=0      # best validation accuracy seen so far
best_model=None     # deep copy of the model that achieved current_best
for output_channel_1 in output_channel_1s:
    for output_channel_2 in output_channel_2s:
        for lr in lrs:
            # initialize the model with this set of hyperparameters
            model = leNet(hidden_1=hidden_1,hidden_2=hidden_2,output_channel_1=output_channel_1,output_channel_2=output_channel_2)
            # initialize the optimizer with the model's trainable parameters
            optimizer = optim.Adagrad(model.parameters(), lr=lr)
            for i in range(20):
                model.train()  # eval_model switches to eval mode, so switch back every epoch
                for x, y in train_loader:
                    optimizer.zero_grad()
                    output = model(x)
                    loss = loss_function(output, y)
                    loss.backward()
                    optimizer.step()
                # validate every 5 epochs, including after the final epoch
                if (i + 1) % 5 == 0:
                    accuracy = eval_model(model, val_loader)
                    if accuracy > current_best:
                        current_best = accuracy
                        best_model = copy.deepcopy(model)
print(f"Best validation accuracy: {current_best:.4f} ({time.time() - start_time:.1f} s)")
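The training loop keeps the best model with `copy.deepcopy`. Storing `model.state_dict()` directly would not work as a snapshot: its tensors share storage with the live parameters, so later in-place optimizer updates show up in the "saved" weights too. A small sketch of that behavior, with a hypothetical `nn.Linear(3, 1)` model and a manual in-place update standing in for `optimizer.step()`:

```python
import copy
import torch
from torch import nn

model = nn.Linear(3, 1)

snapshot_ref = model.state_dict()                   # tensors share storage with the parameters
snapshot_copy = copy.deepcopy(model.state_dict())   # independent copy of the current values

with torch.no_grad():
    model.weight.add_(1.0)                          # simulate an in-place optimizer update

print(torch.equal(snapshot_ref['weight'], model.weight))   # the reference sees the update
print(torch.equal(snapshot_copy['weight'], model.weight))  # the deep copy does not
```

Either `copy.deepcopy(model)` or `copy.deepcopy(model.state_dict())` gives a stable snapshot; the latter stores only the weights and is restored with `model.load_state_dict(...)`.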

In [None]:
# Hidden tests in this cell

In [None]:
# Hidden tests in this cell