# Exercise 03 Notebook - Convolutional Neural Networks

In this exercise we'll implement two convolutional neural networks, namely AlexNet and ResNet18. We'll then use a pretrained model and finetune it to increase the performance further.

1. Load Data
2. Implement AlexNet from Scratch
3. Implement ResNet18 from Scratch
4. Implement pretrained ResNet18
5. Implement and finetune a pretrained ResNet18

## 1. Load Data

We'll use the Imagenette data set during this exercise, more specifically its 320 px version (Imagenette-320). Imagenette is a very small subset of the ImageNet data set that consists of 9469 training samples and 3925 validation samples. Each image has been resized so that its shorter side is 320 pixels, and each is associated with a label from 10 different classes. More information can be found [here](https://github.com/fastai/imagenette).

You can download the dataset [here](https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-320.tgz).

Labels:</br>
* 0=n01440764='tench'
* 1=n02102040='English springer'
* 2=n02979186='cassette player'
* 3=n03000684='chain saw'
* 4=n03028079='church'
* 5=n03394916='French horn'
* 6=n03417042='garbage truck'
* 7=n03425413='gas pump'
* 8=n03445777='golf ball'
* 9=n03888257='parachute'
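ImageFolder numbers the classes by sorting the class subfolder names alphabetically (its `classes` and `class_to_idx` attributes expose the result). Because the synset ids listed above are already in sorted order, integer label i matches entry i of the list. The pure-Python sketch below mirrors that rule; `SYNSET_TO_NAME` and `idx_to_name` are hypothetical helper names, handy for turning integer labels into readable names.

```python
# Hypothetical helper: synset ids (folder names) -> readable names, copied from the list above
SYNSET_TO_NAME = {
    "n01440764": "tench",
    "n02102040": "English springer",
    "n02979186": "cassette player",
    "n03000684": "chain saw",
    "n03028079": "church",
    "n03394916": "French horn",
    "n03417042": "garbage truck",
    "n03425413": "gas pump",
    "n03445777": "golf ball",
    "n03888257": "parachute",
}

# Mirror ImageFolder's rule: sort the folder names, then number them 0..N-1
classes = sorted(SYNSET_TO_NAME)
class_to_idx = {synset: idx for idx, synset in enumerate(classes)}

# Readable name for each integer label
idx_to_name = [SYNSET_TO_NAME[synset] for synset in classes]

print(class_to_idx["n03000684"])  # 3
print(idx_to_name[3])             # chain saw
```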

Task 1: Create a *torch.utils.data.Dataset* for the training and the validation set, respectively. </br>
*Hint 1: You can use PyTorch's [ImageFolder](https://pytorch.org/vision/stable/datasets.html#imagefolder) class to create a torch.utils.data.Dataset.*</br>
*Hint 2: You need to transform each sample such that each image has size 224x224 and is a tensor. Use PyTorch's [transforms module](https://pytorch.org/vision/stable/transforms.html#torchvision-transforms).* </br>
*Hint 3: Enable the GPU runtime as follows: Runtime > Change runtime type > Hardware accelerator > GPU*

Task 2: Create a DataLoader object with batch size = 8 for the training and the validation set, respectively.<br>
* *Hint: Don't forget to shuffle the training DataLoader!* </br>
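With batch_size=8 and the default drop_last=False, a DataLoader yields ceil(N / 8) batches and keeps a smaller final batch; this is the number of iterations the progress bar of the training loop counts. A quick check of the arithmetic for the sample counts quoted above (`batch_stats` is a hypothetical helper):

```python
import math


def batch_stats(n_samples, batch_size):
    """Number of batches and size of the last batch when drop_last=False."""
    n_batches = math.ceil(n_samples / batch_size)
    last_batch = n_samples - (n_batches - 1) * batch_size
    return n_batches, last_batch


print(batch_stats(9469, 8))  # (1184, 5)  training set
print(batch_stats(3925, 8))  # (491, 5)   validation set
```

The incomplete last batch is one reason why per-example averages should be computed from the number of examples actually seen, not from batch_size times the number of batches.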

Task 3: Print the shape of one batch (images and labels) and plot one example with image and label

Download and unzip data

In [None]:
!wget -q https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-320.tgz
!tar zxf /content/imagenette2-320.tgz

In [None]:
import torch

print(torch.cuda.is_available())
if torch.cuda.is_available():
  print(torch.cuda.get_device_name(0))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

### 1.1 Create Data Sets (Task 1)

In [None]:
from torchvision.datasets import ImageFolder
from torchvision import transforms
from tqdm.auto import tqdm

transformer = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_ds = ImageFolder("/content/imagenette2-320/train", transform=transformer)
valid_ds = ImageFolder("/content/imagenette2-320/val", transform=transformer)

### 1.2 Create DataLoader (Task 2)

In [None]:
from torch.utils.data import DataLoader


train_dl = DataLoader(train_ds, batch_size=8, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=8, shuffle=False)

### 1.3 One Example (Task 3)

In [None]:
images, labels = next(iter(train_dl))
print(f"Shape of the image batch: {images.shape}")
print(f"Shape of the label batch: {labels.shape}")
print(f"Number of images/labels in the batch: {images.shape[0]}")
print(f"Number of channels each image has: {images.shape[1]}")
print(f"Size of each image is: {images.shape[2]}x{images.shape[3]}")

In [None]:
import matplotlib.pyplot as plt

index = 3

fig, ax = plt.subplots(1, 1, figsize=(5,5))
ax.set_xticks([])
ax.set_yticks([])
ax.imshow(images[index].permute(1, 2, 0))  # CHW -> HWC, as matplotlib expects
label = labels[index].item()
ax.set_title(f"{label} ({train_ds.classes[label]})", fontsize=14)

## 2. AlexNet Model from Scratch

We introduced [AlexNet](https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf) in the lecture. We now want to implement AlexNet from scratch, using almost the same architecture as the authors proposed in the original paper for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

Task 4: Create an AlexNetFashionCNN class with 5 convolutional layers, 3 max pooling layers and 2 fully-connected layers.
* First conv layer: kernel size 11x11, stride=4, padding=2; determine the number of input/output channels yourself
* First max pooling layer: window size 3x3, stride=2
* Second conv layer: kernel size 5x5, stride=1, padding=2; determine the number of input/output channels yourself
* Second max pooling layer: window size 3x3, stride=2
* Third conv layer: kernel size 3x3, stride=1, padding=1; determine the number of input/output channels yourself
* Fourth conv layer: kernel size 3x3, stride=1, padding=1; determine the number of input/output channels yourself
* Fifth conv layer: kernel size 3x3, stride=1, padding=1; determine the number of input/output channels yourself
* Third max pooling layer: window size 3x3, stride=2
* The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer (except for the last fully-connected layer!)
* *Hint 1: Think about the output of the last layer. Should we apply an activation function? If yes, which one is most appropriate?*
* *Hint 2: What is the number of neurons in the first fully-connected layer? Try to figure it out yourself!*
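To check your answer to Hint 2, the spatial output size of nn.Conv2d and nn.MaxPool2d (dilation 1, ceil_mode=False) is floor((n + 2p - k) / s) + 1 for input size n, kernel k, stride s and padding p. Applying it layer by layer for a 224x224 input (`conv_out` is a hypothetical helper; comments use the layer names of the class below):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of nn.Conv2d / nn.MaxPool2d with dilation=1, ceil_mode=False."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
size = conv_out(size, 11, stride=4, padding=2)  # conv_1 -> 55
size = conv_out(size, 3, stride=2)              # pooling_1 -> 27
size = conv_out(size, 5, stride=1, padding=2)   # conv_2 -> 27
size = conv_out(size, 3, stride=2)              # pooling_2 -> 13
size = conv_out(size, 3, stride=1, padding=1)   # conv_3 to conv_5 keep 13
size = conv_out(size, 3, stride=2)              # pooling_3 -> 6

fc1_in = 256 * size * size  # conv_5 has 256 output channels
print(size, fc1_in)  # 6 9216
```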

Task 5: Forward propagate the first batch to test whether the network is working as expected.
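Besides forwarding a real batch, a quick way to double-check the in_features of the first fully-connected layer is to push a dummy tensor through the convolutional part alone. The sketch below rebuilds the conv/pool stack of AlexNetFashionCNN as a hypothetical nn.Sequential named `features`; it is an illustration, not part of the required solution.

```python
import torch
from torch import nn

# Hypothetical feature extractor mirroring the conv/pool stack of AlexNetFashionCNN
features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

# Push one dummy image through the stack without tracking gradients
with torch.no_grad():
    feature_map = features(torch.zeros(1, 3, 224, 224))

n_flat = feature_map.flatten(1).shape[1]  # flatten everything except the batch dimension
print(feature_map.shape)  # torch.Size([1, 256, 6, 6])
print(n_flat)             # 9216
```

PyTorch also offers nn.LazyLinear, which infers in_features on the first forward pass; computing the number explicitly keeps the architecture easier to read.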

Task 6: Implement a function which serves as a training loop to train the network
* The function should expect the following parameters: the model to be trained, the number of epochs, the optimizer, and the loss function
* The function should return the training and validation accuracy of each epoch. Optionally, print the training and validation accuracy for each epoch during training
* This is actually the most difficult part; have a look at the other notebooks we have already provided
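One detail worth getting right in the loop: nn.CrossEntropyLoss averages over the batch by default (reduction='mean'), so loss.item() is a per-batch mean. To report a per-example loss for the whole epoch, weight each batch loss by its batch size before dividing by the number of examples. A small check on random, made-up logits split into batches of 8, 8 and 5:

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(21, 10)           # 21 fake examples, 10 classes
targets = torch.randint(0, 10, (21,))
loss_fn = nn.CrossEntropyLoss()        # default reduction='mean'

# Batches of 8, 8 and 5, like an incomplete last DataLoader batch
total, count = 0.0, 0
for start in range(0, 21, 8):
    xb, yb = logits[start:start + 8], targets[start:start + 8]
    total += loss_fn(xb, yb).item() * xb.shape[0]  # undo the per-batch mean
    count += xb.shape[0]
epoch_loss = total / count

full_loss = loss_fn(logits, targets).item()  # mean over all 21 examples at once
print(abs(epoch_loss - full_loss) < 1e-5)    # True
```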

Task 7: Create an optimizer object (such as stochastic gradient descent), define the number of epochs and train the AlexNetFashionCNN
* Play with the number of epochs, and the learning rate of the optimizer until you are satisfied with the results

### 2.1 AlexNetFashionCNN (Task 4)

In [None]:
import torch.nn.functional as F
from torch import nn
import torch
from tqdm.auto import tqdm


In [None]:
class AlexNetFashionCNN(nn.Module):
    def __init__(self):
        super(AlexNetFashionCNN, self).__init__()
        # Create Convolutional Layer
        self.conv_1 = nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride = 4, padding = 2)
        self.conv_2 = nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, stride = 1, padding = 2)
        self.conv_3 = nn.Conv2d(in_channels=256, out_channels=384, kernel_size=3, stride = 1, padding = 1)
        self.conv_4 = nn.Conv2d(in_channels=384, out_channels=384, kernel_size=3, stride = 1, padding = 1)
        self.conv_5 = nn.Conv2d(in_channels=384, out_channels=256, kernel_size=3, stride = 1, padding = 1)
        # Create max pooling layer
        self.pooling_1 = nn.MaxPool2d(kernel_size=(3, 3), stride=(2, 2))
        self.pooling_2 = nn.MaxPool2d(kernel_size=(3, 3), stride=(2, 2))
        self.pooling_3 = nn.MaxPool2d(kernel_size=(3, 3), stride=(2, 2))
        # Fully-connected Layer
        self.fc_1 = nn.Linear(in_features=9216, out_features=4096)
        self.fc_2 = nn.Linear(in_features=4096, out_features=10)

    def forward(self, X):

        X = F.relu(self.conv_1(X))    # 96x55x55
        X = self.pooling_1(X)         # 96x27x27
        X = F.relu(self.conv_2(X))    # 256x27x27
        X = self.pooling_2(X)         # 256x13x13
        X = F.relu(self.conv_3(X))    # 384x13x13
        X = F.relu(self.conv_4(X))    # 384x13x13
        X = F.relu(self.conv_5(X))    # 256x13x13
        X = self.pooling_3(X)         # 256x6x6
        X = X.flatten(1)              # 9216 (all dims except the batch dim are flattened)
        X = F.relu(self.fc_1(X))      # 4096
        X = self.fc_2(X)              # 10

        return X

In [None]:
alexnet_model = AlexNetFashionCNN()

### 2.2 Forward Propagation (first batch) (Task 5)

In [None]:
out = alexnet_model(images)
print(out.shape)  # expected: torch.Size([8, 10]), one row of 10 logits per image

### 2.3 Training Loop (Task 6)

In [None]:
def training(net, n_epochs, optimizer, loss_function, verbose=True):
  # Store the losses for each epoch
  loss_train_list = []
  loss_valid_list = []

  # Store the accuracy for each epoch
  acc_train_list = []
  acc_valid_list = []

  # Iterate over the dataset n_epochs times
  for epoch in range(n_epochs):
    net.train()  # net.train() will notify all your layers that you are in training mode

    train_loss = 0  # Training loss in epoch
    num_train_correct  = 0
    num_train_examples = 0

    # For each batch, pass the training examples, calculate loss and gradients and optimize the parameters
    for xb, yb in tqdm(train_dl, total=len(train_dl)):
      optimizer.zero_grad()  # zero_grad clears old gradients from the last step

      xb = xb.to(device)
      yb = yb.to(device)

      y_hat = net(xb)  # Forward pass
      loss = loss_function(y_hat, yb)  # Calculate Loss

      loss.backward()  # Calculate the gradients (using backpropagation)
      optimizer.step()  # Update the parameters based on their gradients

      train_loss += loss.item() * xb.shape[0]  # loss.item() is the batch mean, so weight it by the batch size
      num_train_correct += (torch.max(y_hat, 1)[1] == yb).sum().item()
      num_train_examples += xb.shape[0]

    train_acc = num_train_correct / num_train_examples
    train_loss /= num_train_examples
    valid_loss = 0  # Validation loss in epoch
    num_val_correct  = 0
    num_val_examples = 0

    net.eval()  # net.eval() will notify all your layers that you are in evaluation mode
    # torch.no_grad() disables gradient tracking in the autograd engine. This reduces memory usage and speeds
    # up computations, but you can't backpropagate (which you don't need during evaluation).
    with torch.no_grad():
      # Perform a prediction on the validation set
      for xb_valid, yb_valid in tqdm(valid_dl, total=len(valid_dl)):
        xb_valid = xb_valid.to(device)
        yb_valid = yb_valid.to(device)

        y_hat = net(xb_valid)  # Forward pass
        loss = loss_function(y_hat, yb_valid)  # Calculate Loss

        valid_loss += loss.item() * xb_valid.shape[0]  # weight the batch-mean loss by the batch size
        num_val_correct += (torch.max(y_hat, 1)[1] == yb_valid).sum().item()
        num_val_examples += xb_valid.shape[0]

    val_acc = num_val_correct / num_val_examples
    valid_loss /= num_val_examples

    if verbose:
      print(f"Train Loss in epoch {epoch}: {train_loss:.2f}")
      print(f"Validation Loss in epoch {epoch}: {valid_loss:.2f}")
      print(f"Train Accuracy in epoch {epoch}: {100 * (train_acc):.2f}")
      print(f"Validation Accuracy in epoch {epoch}: {100 * (val_acc):.2f}\n")

    loss_train_list.append(train_loss)
    loss_valid_list.append(valid_loss)
    acc_train_list.append(100 * (train_acc))
    acc_valid_list.append(100 * (val_acc))

  return acc_train_list, acc_valid_list, loss_train_list, loss_valid_list

### 2.4 Training (Task 7)

In [None]:
from torch import optim

optimizer = optim.SGD(alexnet_model.parameters(), lr=0.01)
EPOCHS = 10

# Create Loss Function
loss_function = nn.CrossEntropyLoss()

alexnet_model.to(device)

acc_train, acc_valid, loss_train, loss_valid = training(net=alexnet_model, n_epochs=EPOCHS, optimizer=optimizer, loss_function=loss_function)

## 3. ResNet18 from Scratch

We introduced [ResNet18](https://arxiv.org/abs/1512.03385) in the lecture. Now, we want to build the ResNet18 model from scratch using the same architecture as in the original paper!

Task 8: Instead of implementing the entire architecture yourself, you can use PyTorch's [torchvision-models](https://pytorch.org/vision/stable/models.html#torchvision-models) to create the architecture. Please don't use the pretrained model: explicitly set pretrained=False (weights=None in torchvision 0.13 and later) when using PyTorch's torchvision models.

Task 9: Use the *training* function defined above to train the ResNet18 model

Task 9.1 (optional): Why is this approach strange?
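To see what "from scratch" would involve: ResNet18 stacks eight basic residual blocks (two per stage, with 64, 128, 256 and 512 channels) after a 7x7 convolution and a max pooling layer. The sketch below is a minimal, hypothetical `BasicBlock`, not torchvision's own class: two 3x3 convolutions with batch normalization, plus a shortcut that is the identity when shapes match and a 1x1 projection otherwise (torchvision's implementation also uses a projection in that case).

```python
import torch
import torch.nn.functional as F
from torch import nn


class BasicBlock(nn.Module):
    """Minimal sketch of a ResNet basic block: two 3x3 convs plus a shortcut."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Identity shortcut when shapes match, otherwise a 1x1 projection
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # add the residual, then apply ReLU


block = BasicBlock(64, 128, stride=2)
y = block(torch.randn(2, 64, 56, 56))
print(y.shape)  # torch.Size([2, 128, 28, 28])
```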

In [None]:
from torchvision.models import resnet18
from tqdm.auto import tqdm


### 3.1 Create ResNet18 Model (Task 8)

In [None]:
resnet18_model = resnet18(weights=None)  # random initialization; equivalent to the deprecated pretrained=False

### 3.2 ResNet18 Training (Task 9)

In [None]:
from torch import optim

optimizer = optim.Adam(resnet18_model.parameters(), lr=0.01)
EPOCHS = 5

# Create Loss Function
loss_function = nn.CrossEntropyLoss()

resnet18_model.to(device)

acc_train, acc_valid, loss_train, loss_valid = training(net=resnet18_model, n_epochs=EPOCHS, optimizer=optimizer, loss_function=loss_function)

### 3.2.1 ResNet18 CrossEntropy (optional)

Suppose we are given 8 images, each belonging to exactly one of 10 different classes. For example:

In [None]:
print(labels)
print(labels.shape)

#### 3.2.1.1 Using ResNet18 without our own fully-connected layer

For each image, the model outputs logits for 1000 different classes (torchvision's ResNet18 has 1000 outputs by default because the architecture was designed for ImageNet, which has 1000 classes).

In [None]:
images.shape

In [None]:
resnet_output = resnet18_model(images.to(device))
resnet_output.shape

In [None]:
nn.CrossEntropyLoss()(resnet_output, labels.to(device))

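Although the shapes of the predicted values (resnet_output, 8x1000) and the ground truth (labels, 8) differ, we can calculate the CrossEntropyLoss. Why? Because nn.CrossEntropyLoss expects raw logits of shape (N, C) and targets given as class indices of shape (N,). For each sample it applies log-softmax over the C logits and takes the negative log-probability at the target index, then averages over the batch. Since our labels 0 to 9 are valid indices among the 1000 outputs, the computation works, even though 990 of the outputs correspond to no class in our data set.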
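The computation can be reproduced by hand. The sketch uses random stand-ins (`fake_logits`, `fake_labels`, hypothetical names chosen so they don't overwrite the notebook's `labels`) with the same shapes as resnet_output and labels:

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
fake_logits = torch.randn(8, 1000)        # like resnet_output: 8 images, 1000 logits each
fake_labels = torch.randint(0, 10, (8,))  # like labels: class indices 0..9

# Log-probabilities over all 1000 logits, then the entry at each target index
log_probs = F.log_softmax(fake_logits, dim=1)
picked = log_probs[torch.arange(8), fake_labels]  # shape (8,)
manual = -picked.mean()

builtin = nn.CrossEntropyLoss()(fake_logits, fake_labels)
print(torch.allclose(manual, builtin))  # True
```

No argmax is involved: the loss uses the full softmax distribution, which is why it stays differentiable.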

#### 3.2.1.2 Using ResNet18 with our own fully-connected layer

After replacing the final fully-connected layer (fc, which has 512 input features), the model outputs logits for 10 different classes for each image.

In [None]:
resnet18_model.fc = nn.Linear(512, 10)

In [None]:
resnet18_model.to(device)

In [None]:
resnet_output = resnet18_model(images.to(device))
resnet_output.shape

In [None]:
nn.CrossEntropyLoss()(resnet_output, labels.to(device))

## 4. Pretrained ResNet18

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a CNN on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.

Hence, we'll now use a ResNet18 model pretrained on ImageNet.

Task 10: Download the ResNet18 architecture and weights from PyTorch, create a pretrained ResNet18 CNN, and replace the last fully-connected layer
* Again, you can use PyTorch's [torchvision-models](https://pytorch.org/vision/stable/models.html#torchvision-models). But this time, set pretrained=True (weights=ResNet18_Weights.IMAGENET1K_V1 in torchvision 0.13 and later) in order to download the pretrained weights
* *Hint: Note that the ResNet18 was pretrained on the ImageNet data set where the task was to classify 1000 different labels. Here we will only need 10 classes. Thus, you need to **replace** the last fully-connected layer with one which only produces 10 outputs instead of 1000.*

Task 11: Use the *training* function defined above to train the pretrained ResNet18 model
* Again, play with the optimizer and the number of epochs until you are satisfied with the results (in terms of accuracy on training and validation set)

In [None]:
from torchvision.models import resnet18
from tqdm.auto import tqdm


### 4.1 Create Pretrained ResNet18 Model (Task 10)

In [None]:
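from torchvision.models import ResNet18_Weights

# ImageNet weights; equivalent to the deprecated pretrained=True
resnet18_pre_model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
resnet18_pre_model.fc = nn.Linear(512, 10)  # new head for the 10 Imagenette classes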

### 4.2 Pretrained ResNet18 Training (Task 11)

In [None]:
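# Print the first parameter tensor (conv1 weights) to check that the pretrained weights were loaded
for p in resnet18_pre_model.parameters():
    print(p)
    break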

In [None]:
from torch import optim

optimizer = optim.SGD(resnet18_pre_model.parameters(), lr=0.1)
EPOCHS = 5

# Create Loss Function
loss_function = nn.CrossEntropyLoss()

resnet18_pre_model.to(device)

acc_train, acc_valid, loss_train, loss_valid = training(net=resnet18_pre_model, n_epochs=EPOCHS, optimizer=optimizer, loss_function=loss_function)

## 5. Finetuning Pretrained ResNet18

In the previous section (Section 4), we retrained the entire pretrained ResNet18 model on the Imagenette data set. Although the results were already pretty good, it might be better to train only the custom final fully-connected layer. Remember that every layer in the pretrained model has already been trained on ImageNet, except for the custom final fully-connected layer, which is randomly initialized. Thus, in this section we want to **only train the custom fully-connected layer** (i.e., only update its weights) and keep the other layers and their weights frozen.

Task 12: Download the ResNet18 architecture and weights from PyTorch, create a pretrained ResNet18 CNN, freeze all layers, and replace the last fully-connected layer
* *Hint 1: In order to "freeze" a layer, set the requires_grad attribute of its parameters to False*
* *Hint 2: First freeze all layers, then add the custom final fully-connected layer*
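Why the order in Hint 2 matters: a freshly created layer has requires_grad=True by default, so freezing first and replacing the head afterwards leaves exactly the new head trainable. A sketch on a small, hypothetical stand-in model (`toy_model`) rather than ResNet18:

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained network: a small "backbone" plus a head
toy_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 5))

# 1) Freeze every existing parameter
for param in toy_model.parameters():
    param.requires_grad = False

# 2) Then replace the head; the new layer is trainable by default
toy_model[2] = nn.Linear(8, 3)

n_trainable = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in toy_model.parameters())
print(n_trainable, n_total)  # 27 67

# Only the new head receives gradients
toy_model(torch.randn(2, 4)).sum().backward()
print(toy_model[0].weight.grad is None, toy_model[2].weight.grad is not None)  # True True
```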

Task 13: Use the *training* function defined above to train only the last layer of the pretrained ResNet18 model
* Note how training is faster, as no gradients need to be computed for the frozen layers (the forward pass still runs through the whole network)
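One caveat when "keeping the other layers": requires_grad=False only freezes learnable weights. ResNet18 contains BatchNorm layers, whose running_mean and running_var buffers are still updated whenever the model is in train mode, which the training function sets via net.train(). A small demonstration on a single BatchNorm layer:

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)
for param in bn.parameters():
    param.requires_grad = False  # freezes weight and bias only

x = torch.randn(4, 3, 8, 8) + 2.0  # batch mean far from the initial running_mean of 0

before = bn.running_mean.clone()
bn.train()
bn(x)  # train mode: running statistics are updated anyway
changed_in_train = not torch.equal(before, bn.running_mean)

before = bn.running_mean.clone()
bn.eval()
bn(x)  # eval mode: running statistics are left untouched
changed_in_eval = not torch.equal(before, bn.running_mean)

print(changed_in_train, changed_in_eval)  # True False
```

If the backbone statistics should stay exactly as pretrained, one option is to switch the BatchNorm modules back to eval mode after calling net.train(); whether this helps in practice depends on how similar the new data is to ImageNet.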

In [None]:
from torchvision.models import resnet18

### 5.1 Create Pretrained ResNet18 Model (Task 12)

In [None]:
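from torchvision.models import ResNet18_Weights

resnet18_fine_model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)  # equivalent to the deprecated pretrained=True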

for param in resnet18_fine_model.parameters():
    param.requires_grad = False

resnet18_fine_model.fc = nn.Linear(512, 10)

In [None]:
for p in resnet18_fine_model.parameters():
    print(p.requires_grad)
    break

### 5.2 Pretrained ResNet18 Fine Tuning (Task 13)

In [None]:
from torch import optim

optimizer = optim.SGD(resnet18_fine_model.fc.parameters(), lr=0.1)  # only the new head is trainable
EPOCHS = 5

# Create Loss Function
loss_function = nn.CrossEntropyLoss()

resnet18_fine_model.to(device)

acc_train, acc_valid, loss_train, loss_valid = training(net=resnet18_fine_model, n_epochs=EPOCHS, optimizer=optimizer, loss_function=loss_function)

Done