# Convolutional neural network

### Libraries and Variables

In [None]:

import torchvision
from torch.utils.data import DataLoader
import torch
import matplotlib.pyplot as plt
import torch.nn as nn
from tqdm import tqdm
import torch.nn.functional as nnf
import numpy as np
import os
from torchinfo import summary
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

home_dir = os.path.expanduser('~')
raw_data_dir = os.path.join(home_dir, 'repos/DaNuMa2024/data/raw_data')
output_data_dir = os.path.join(home_dir, 'repos/DaNuMa2024/data/output_data')

### Overview

In this notebook, you will enhance the MLP architecture from the last exercise with a well-known regularization technique, namely "dropout". Furthermore, you will implement a convolutional neural network and demonstrate its superiority over the MLP when it comes to image processing.

### Data

The CIFAR10 dataset is a well-known dataset for image classification. The torchvision library directly implements functionality for creating a torch dataset for this and many other standard datasets. For faster training, we only consider every 5th image by subsetting the original dataset. The inputs to a neural network are usually normalized. We use the training dataset to calculate the mean and standard deviation of each RGB channel across all images and pixels. Then we define a transformation that normalizes input tensors with these values: $x_{norm} = (x - mean_x) / std_x$. \
We use torchvision.transforms, a popular module that implements many image transformations. What other transformations / augmentations could be added? Feel free to read the torchvision.transforms documentation and add further components to the pipeline!

In [14]:
# calculate normalization values
data_root = os.path.join(raw_data_dir, '4_convnet')
trainset = torchvision.datasets.CIFAR10(root=data_root, train=True, transform=torchvision.transforms.ToTensor())
every_fifth_idx = list(range(0, len(trainset), 5))
trainset = torch.utils.data.Subset(trainset, every_fifth_idx)

# load the whole subset as a single batch
num_samples = len(trainset)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=num_samples,
                                          num_workers=2)
imgs, _ = next(iter(trainloader))
# the argument "dim" defines over which dimensions the mean and standard deviation of a tensor are calculated
# In this case: dim 0 = batch and dim (2, 3) = (height, width) --> we get the mean RGB value since the channel dimension is dim 1
dataset_mean = torch.mean(imgs, dim=(0, 2, 3))
dataset_std = torch.std(imgs, dim=(0, 2, 3))

normalized_transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(dataset_mean, dataset_std)
])
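To make the reduction over `dim=(0, 2, 3)` more concrete, here is a minimal sketch on a random batch. The tensor `fake_imgs` and its size are made up for illustration. It shows that the reduction leaves one value per channel, and that these values have to be reshaped to `(1, 3, 1, 1)` before they broadcast against a batch of images.

```python
import torch

torch.manual_seed(0)
# Fake batch of 8 RGB images: (batch, channel, height, width)
fake_imgs = torch.rand(8, 3, 32, 32)

# Reduce over batch, height and width -> one value per channel
channel_mean = torch.mean(fake_imgs, dim=(0, 2, 3))
channel_std = torch.std(fake_imgs, dim=(0, 2, 3))
print(channel_mean.shape)  # torch.Size([3])

# Reshape to (1, 3, 1, 1) so the statistics broadcast against (8, 3, 32, 32)
fake_normalized = (fake_imgs - channel_mean[None, :, None, None]) / channel_std[None, :, None, None]

# After normalization every channel has (approximately) mean 0 and std 1
print(torch.mean(fake_normalized, dim=(0, 2, 3)))
print(torch.std(fake_normalized, dim=(0, 2, 3)))
```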

With the normalize transform we just defined we can now instantiate the datasets and dataloaders.

In [15]:
# get train and validation set and dataloaders
trainset = torchvision.datasets.CIFAR10(root=data_root, train=True, transform=normalized_transform)
every_fifth_idx = list(range(0, len(trainset), 5))
trainset = torch.utils.data.Subset(trainset, every_fifth_idx)
valset = torchvision.datasets.CIFAR10(root=data_root, train=False, transform=normalized_transform)
every_fifth_idx = list(range(0, len(valset), 5))
valset = torch.utils.data.Subset(valset, every_fifth_idx)

trainloader = DataLoader(trainset, batch_size=1000, shuffle=True, num_workers=2)
valloader = DataLoader(valset, batch_size=1000, shuffle=False, num_workers=2)

We use the same normalize transform for both the training and validation set. Why do we not use the validation set for the calculation of the normalization values?

######### YOUR ANSWER HERE:

Let's visualize 10 of the images to get an idea about what kind of dataset we are dealing with. When running the code as it is, you will get a warning that the pixels have unexpected values and the colors of the images look a little bit off. 
* Why is that the case? Transform the images to look normal again! Hint: Broadcasting might come in handy for the back-transformation.
* What is the size of the images?

In [None]:
# visualize images
classes = trainset.dataset.classes
print(classes)
fig, axs = plt.subplots(1, 10, figsize=(20, 5))
torch.manual_seed(0)

for ax in axs:
    # select random image from the batch
    batch = next(iter(trainloader))
    images, labels = batch
    random_idx = torch.randint(0, len(labels), (1,))
    image = images[random_idx].squeeze()
    label = labels[random_idx]

    ######### YOUR CODE HERE:
    
    # plot the image together with the class name
    class_name = classes[label]
    ax.set_title(class_name)
    ax.axis('off')
    ax.imshow(image.permute(1,2,0))

### MLP models

First, we will investigate how well MLPs are able to deal with the class prediction task. Since images are high-dimensional inputs, we will naturally need bigger models compared to the MLP exercise where we had a single input value. However, using bigger models also increases the risk of overfitting because there are more neurons available with which the model can learn the inputs by heart! There are various ways to counteract overfitting, such as *data augmentation*, *l2 regularization*, *early stopping based on a validation set* or *dropout*. In this exercise, we will specifically explore the use of dropout. Your task is to modify the MLP code from the last exercise to meet the following requirements:

* Linear layers take one-dimensional data as input. Images are multi-dimensional. Modify the network in a way that images can be processed by these linear layers. Hint: you might have encountered helpful functionality in the pytorch introduction :)
* Read sections 1, 2, 3 and 5 of this short Medium article about dropout: https://towardsdatascience.com/dropout-in-neural-networks-47a162d621d9
* Modify the network to include dropout after each linear layer except the last one. The last layer makes the prediction so we do not want too much uncertainty there.
* When instantiating the model, you should be able to define the dropout ratio. Add it as an argument to the constructor!
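Before adding dropout to the network, it can help to look at a single `nn.Dropout` layer in isolation. The following sketch (the input tensor and the ratio of 0.5 are arbitrary choices for illustration) shows that dropout behaves differently in training and evaluation mode, which is why the `model.train()` and `model.eval()` calls in the training and validation functions matter.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

# Training mode: each element is zeroed with probability p,
# the surviving elements are scaled by 1 / (1 - p) = 2
dropout.train()
out_train = dropout(x)
print(out_train)

# Evaluation mode: dropout does nothing and returns the input unchanged
dropout.eval()
out_eval = dropout(x)
print(out_eval)
```

The scaling during training keeps the expected value of each activation the same in both modes, so no correction is needed at evaluation time.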

In [2]:
# Modify the code!
class MLP(nn.Module):
    def __init__(self, layer_sizes):
        super().__init__()

        modules = []
        for i in range(len(layer_sizes)-1):
            modules.append(nn.Linear(layer_sizes[i], layer_sizes[i+1]))

            if not i == len(layer_sizes)-2:
                modules.append(nn.ReLU())

        self.layers = nn.Sequential(*modules)


    def forward(self, x):
        return self.layers(x)

Use the summary function from the torchinfo package (already imported) to visualize the model structure. It takes the model as its first argument (you need to instantiate it first) and the input size, including the batch dimension, as its second argument.

In [None]:
######### YOUR CODE HERE:

### Training loop

In principle, we can use the same train and validation functions as in the last exercise (see cell below). However, you have to modify them a little bit for the task at hand:

* Images are already multi-dimensional. So adding an empty dimension is not necessary anymore.
* In addition to the loss, the functions should also return the accuracy (fraction of correctly predicted classes). Hint: You can use the argmax function to obtain the prediction and compare it with the ground truth labels. The average accuracy across the whole dataset can be calculated in the same way as the average loss.
* We no longer use the MSE loss. Instead, we need the cross entropy loss (nnf.cross_entropy) to compare the model outputs with the ground truth classes. Note that nnf.cross_entropy expects the raw model outputs (logits) and integer class labels; it applies the softmax internally.
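The loss and accuracy computation for a single batch can be sketched on a toy example. The logits and labels below are made up for illustration and are not the output of any model in this notebook; how to accumulate the values over all batches is left to the exercise.

```python
import torch
import torch.nn.functional as nnf

# Toy batch: 4 samples, 3 classes; the model output is interpreted as logits
toy_logits = torch.tensor([[2.0, 0.0, 0.0],
                           [0.0, 3.0, 0.0],
                           [0.0, 0.0, 1.0],
                           [1.0, 0.0, 0.0]])
# Integer class indices with shape (batch,), no extra dimension needed
toy_labels = torch.tensor([0, 1, 2, 1])

# cross_entropy applies log-softmax internally, so it expects raw logits
toy_loss = nnf.cross_entropy(toy_logits, toy_labels)

# Predicted class = index of the largest logit
toy_preds = toy_logits.argmax(dim=1)
toy_accuracy = (toy_preds == toy_labels).float().mean().item()

print(toy_loss.item())
print(toy_accuracy)  # 0.75, since only the last sample is misclassified
```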



In [None]:
# Modify the code!
def train_one_epoch(model, trainloader, optimizer, device):
    model.train()
    total_loss = 0
    for x_batch, y_batch in trainloader:
        x_batch = x_batch[:, None].float().to(device)
        y_batch = y_batch[:, None].float().to(device)
        y_pred = model(x_batch)

        optimizer.zero_grad()
        loss = nnf.mse_loss(y_pred, y_batch)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(trainloader)

def validate(model, valloader, device):
    model.eval()
    total_loss = 0
    with torch.no_grad():
        for x_batch, y_batch in valloader:
            x_batch = x_batch[:, None].float().to(device)
            y_batch = y_batch[:, None].float().to(device)
            y_pred = model(x_batch)
            loss = nnf.mse_loss(y_pred, y_batch)
            total_loss += loss.item()
    return total_loss / len(valloader)

Once the train_one_epoch and validate functions are defined, you should be able to run the code block below to train the model. Try out different dropout ratios. How do they affect the training and validation accuracy?

######### YOUR ANSWER HERE:

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"The model is running on {device}.")

# training parameters
epochs = 25
lr = 0.001
val_interval = 1

# save best model state dict
save_dir_state_dict = os.path.join(output_data_dir, '4_convnet')
os.makedirs(save_dir_state_dict, exist_ok=True)
save_path_state_dict = os.path.join(save_dir_state_dict, 'best_mlp.pth')

# instantiate model and optimizer
layer_sizes = [3*32*32, 384, 384, 384, 10]
dropout_ratio = 0.3
model = MLP(layer_sizes, dropout_ratio).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

# training loop
train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []
min_val_loss = float('inf')

for epoch in tqdm(range(epochs)):
    train_loss, train_accuracy = train_one_epoch(model, trainloader, optimizer, device)
    train_losses.append(train_loss)
    train_accuracies.append(train_accuracy)

    if epoch % val_interval == 0:
        val_loss, val_accuracy = validate(model, valloader, device)
        val_losses.append(val_loss)
        val_accuracies.append(val_accuracy)

        # only check for a new best model when the validation loss was just updated
        if val_loss < min_val_loss:
            torch.save(model.state_dict(), save_path_state_dict)
            min_val_loss = val_loss

In [None]:
####################### plot losses
plt.plot(np.linspace(1, epochs, epochs), train_losses, c='blue', label='Training Loss')
plt.plot(np.linspace(1, epochs, epochs), val_losses, c='red', label='Validation Loss')

# Mark the minimum validation loss
index = np.argmin(val_losses)
plt.plot(index+1, val_losses[index], 'kx', label='Min Validation Loss')

# Adding labels and legend
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
####################### plot accuracies
plt.plot(np.linspace(1, epochs, epochs), train_accuracies, c='blue', label='Training accuracy')
plt.plot(np.linspace(1, epochs, epochs), val_accuracies, c='red', label='Validation accuracy')

# Mark the maximum validation accuracy
index = np.argmax(val_accuracies)
plt.plot(index+1, val_accuracies[index], 'kx', label='max Validation accuracy')

# Adding labels and legend
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# print maximum accuracy
print(f'maximum validation accuracy: {np.max(val_accuracies)}')

### ConvNet

Using MLPs for image classification tasks works to some extent. However, as we already discussed, it is not optimal since an MLP does not make use of inherent image characteristics, such as recurring features at different positions. To learn such features, convolutional neural networks are the model of choice:

* Implement a simple convolutional neural network with 5 convolutional layers (kernel_size=3, stride=1, padding='valid', channel sizes [3, 32, 32, 64, 64, 64])
* Each convolutional layer should be followed by a batch norm and relu layer
* The feature map after these 5 layers should be pooled (e.g. with nn.AdaptiveAvgPool2d) and then mapped onto 10 classes with a final linear layer.
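With `padding='valid'` no padding is added, so every 3x3 convolution shrinks the feature map by 2 pixels per spatial dimension. The small helper below (a hypothetical function written for this illustration, not part of PyTorch) applies the standard output size formula to track the spatial size through the 5 layers, starting from the 32x32 input.

```python
def conv_output_size(size, kernel_size=3, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


spatial_size = 32  # height and width of the input images
feature_map_sizes = []
for _ in range(5):
    spatial_size = conv_output_size(spatial_size)
    feature_map_sizes.append(spatial_size)

print(feature_map_sizes)  # [30, 28, 26, 24, 22]
```

The final feature map therefore still has a spatial extent of 22x22, which is why it needs to be pooled before the linear layer.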

In [15]:
class ConvNet(nn.Module):
    ######### YOUR CODE HERE:
    pass

Once again, you can use the summary function from the torchinfo package (already imported) to visualize the model structure.

In [None]:
######### YOUR CODE HERE:

We can use exactly the same training loop as before and only need to change the model! This is the beauty of code modularity. Since the model has exactly the same input and output shape we can just plug it in and leave the rest unchanged. 
* Train the model with the standard training parameters provided in the above training loop.
* Visualize the loss and accuracy, again using the code from above.

In [None]:
# training loop
######### YOUR CODE HERE:

In [None]:
# loss curves
######### YOUR CODE HERE:

In [None]:
# accuracies
######### YOUR CODE HERE:

### Confusion matrix

The accuracy is a rough measure of model performance. To get a more detailed picture of the model's strengths and weaknesses, we can look at the confusion matrix. The confusion matrix requires as input a list or numpy array of all predicted and ground truth labels.
* Obtain all predictions and ground truth labels using your model with the best validation loss. Hint: The code is very similar to the one in the validate function.
* Look at the resulting confusion matrix. Which classes are confused most often?
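To help with reading the result, here is a toy example with three classes. The label arrays are made up for illustration. In the matrix returned by scikit-learn, rows correspond to the true class and columns to the predicted class, so off-diagonal entries show which classes are confused with each other.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predictions for 6 samples and 3 classes
toy_true = np.array([0, 0, 1, 1, 2, 2])
toy_pred = np.array([0, 1, 1, 1, 2, 0])

# Rows correspond to the true class, columns to the predicted class
toy_cm = confusion_matrix(toy_true, toy_pred)
print(toy_cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

# Fraction of correctly classified samples per true class (diagonal / row sum)
per_class_recall = toy_cm.diagonal() / toy_cm.sum(axis=1)
print(per_class_recall)
```

Here, for example, one sample of class 0 was predicted as class 1 (row 0, column 1), while class 1 was always predicted correctly.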

In [None]:
# get valset and classes
valset = torchvision.datasets.CIFAR10(root=data_root, train=False, transform=normalized_transform)
classes = valset.classes
every_fifth_idx = list(range(0, len(valset), 5))
valset = torch.utils.data.Subset(valset, every_fifth_idx)
valloader = DataLoader(valset, batch_size=1000, shuffle=False, num_workers=2)


######### YOUR CODE HERE:
# load the best model
# get all predictions and ground truth labels
all_labels = None
all_preds = None






# Compute the confusion matrix
cm = confusion_matrix(all_labels, all_preds)

# Display the confusion matrix with class names
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classes)
disp.plot(cmap=plt.cm.Blues, xticks_rotation='vertical')
plt.show()

### Bonus:

If you still have time, you can try to tune some of the hyperparameters, modernize the architecture a little bit (e.g. strided convolutions), add data augmentations etc. and try to nudge the validation performance even higher. Anything above 60% accuracy is certainly good :)

### Further reading:

These resources require some understanding of mathematical notation. \
original dropout paper: https://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf \
There are many articles on regularization in neural networks in general. Here is one of them: https://www.pinecone.io/learn/regularization-in-neural-networks/