# Machine Vision - Assignment 7: Deep Learning

In this exercise you will apply different concepts of deep learning to classify images of traffic signs. Throughout the notebook, links to official websites and blog posts are provided for additional information.
This exercise uses the PyTorch framework, which is one of the most popular deep learning frameworks.
If you are new to PyTorch, please follow this introduction: [PyTorch Introduction](https://pytorch.org/tutorials/beginner/basics/intro.html)

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay
import torch
from torch import nn
from torch.utils.data import DataLoader
from tqdm import tqdm
import torchvision
from torchvision.transforms import v2

##### Preparation

##### German Traffic Sign Recognition Benchmark

The German Traffic Sign Recognition Benchmark [(GTSRB)](https://benchmark.ini.rub.de/) is a competition that was held at IJCNN 2011. The task in this competition was to classify images of traffic signs.
You will implement your own neural network to classify a subset of the GTSRB dataset. This subset consists of `12` different classes, which are shown in the figures below. However, you are free to extend your solution to the full dataset.


| Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 | Class 8 | Class 9 | Class 10 | Class 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ![Class 0](res/images/0.png) | ![Class 1](res/images/6.png) | ![Class 2](res/images/16.png) | ![Class 3](res/images/17.jpg) | ![Class 4](res/images/19.png) | ![Class 5](res/images/22.jpg) | ![Class 6](res/images/28.png) | ![Class 7](res/images/29.png) | ![Class 8](res/images/32.png) | ![Class 9](res/images/33.png) | ![Class 10](res/images/38.png) | ![Class 11](res/images/40.png) |
<br></br>

To simplify this exercise, the raw GTSRB images have already been converted into a dataset in which each image has the shape `[C,H,W]` (Channels x Height x Width) with values ranging from `0` to `1`.
Furthermore, the dataset is split into training, validation and test datasets, of which only the training and validation datasets are provided with this notebook.
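
The `[C,H,W]` layout matters when you want to display an image, because `matplotlib` expects the channel axis last. The following is a minimal sketch of the conversion; the random tensor `img_chw` is only a stand-in for a real dataset image and is not part of the assignment data.

```python
import torch

# Stand-in for one dataset image in [C, H, W] layout with values in [0, 1]
img_chw = torch.rand(3, 32, 32)

# Reorder the axes to [H, W, C], the layout that plt.imshow expects for RGB images
img_hwc = img_chw.permute(1, 2, 0)

print(img_chw.shape)  # channels first
print(img_hwc.shape)  # channels last
# plt.imshow(img_hwc) would now display the image
```

`permute` only changes how the data is indexed, so the pixel values stay the same.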

In [None]:
NUM_CLASSES = 12

train_ds = torchvision.datasets.ImageFolder("data_train", transform=v2.Compose([
    v2.PILToTensor(),
    v2.ToDtype(torch.float32, scale=True),
]))

val_ds = torchvision.datasets.ImageFolder("data_val", transform=v2.Compose([
    v2.PILToTensor(),
    v2.ToDtype(torch.float32, scale=True),
]))

train_dl = DataLoader(dataset=train_ds, batch_size=16, shuffle=True)
val_dl = DataLoader(dataset=val_ds, batch_size=16, shuffle=False)

`torchvision.datasets.ImageFolder` is a simple way to represent classification datasets. For more information, see the documentation: [ImageFolder](https://pytorch.org/vision/stable/generated/torchvision.datasets.ImageFolder.html)

Furthermore, the standard PyTorch `DataLoader` class is used to create an iterator based on a `torch.utils.data.Dataset`.
In each iteration, the dataloader returns a batch of `x = [Bx3x32x32]` images and `y = [B]` class labels, where B is the batch size.

There are different approaches to encoding the class label. For further information you can read this blog entry: [integer- or one-hot-encoding](https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/).
In this exercise the labels are encoded in the integer format, so each label is a single class index between `0` and `11`. With one-hot encoding, each label would instead be a vector of 12 entries, where only the entry of the class has the value $1$ and all other values are $0$.
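
To make the two encodings concrete, here is a small sketch that converts integer labels to one-hot vectors and back. The label values and the all-zero `example_logits` are made up for illustration. It also shows that `nn.CrossEntropyLoss` accepts integer class indices directly, which is why no one-hot conversion is needed in this exercise.

```python
import math

import torch
import torch.nn.functional as F
from torch import nn

NUM_CLASSES = 12

# Integer encoding: one class index per sample, shape [B]
int_labels = torch.tensor([0, 3, 11])

# One-hot encoding: one vector of NUM_CLASSES entries per sample, shape [B, 12]
one_hot_labels = F.one_hot(int_labels, num_classes=NUM_CLASSES)

# argmax over the class axis recovers the integer encoding
recovered = one_hot_labels.argmax(dim=1)

# CrossEntropyLoss takes raw logits [B, 12] and integer labels [B].
# All-zero logits mean a uniform prediction, so the loss equals ln(12).
example_logits = torch.zeros(3, NUM_CLASSES)
loss = nn.CrossEntropyLoss()(example_logits, int_labels)

print(one_hot_labels.shape, recovered)
print(loss.item(), math.log(NUM_CLASSES))
```

The value `ln(12)` is a useful reference point: a training loss that stays near it suggests the model has not learned anything beyond guessing.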

In [None]:
x_batch, y_batch = next(iter(train_dl))

# @student print the image and label shape of a batch

# @student show one image of the batch and its label

##### Execution

In order to compare models against each other, metrics are calculated on an unseen test dataset.
(It will be uploaded during the exam session.)

In this exercise you should try to develop your own model.
If you are new to PyTorch, this section of the introduction covers how to create a model: [PyTorch Create A Model](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html).
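
As a starting point, the sketch below shows the general structure of an `nn.Module` for `3x32x32` inputs and 12 output classes. The class name `TinySignNet` and all layer sizes are illustrative assumptions, not a reference solution; you are expected to design and tune your own architecture.

```python
import torch
from torch import nn


class TinySignNet(nn.Module):
    """Hypothetical small CNN, shown only to illustrate the nn.Module structure."""

    def __init__(self, num_classes: int = 12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        # 32 channels with an 8x8 spatial size are flattened into one vector
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)  # keep the batch axis
        return self.classifier(x)  # raw logits, one per class


# Sanity check with a random batch of 4 images
sketch_logits = TinySignNet()(torch.rand(4, 3, 32, 32))
print(sketch_logits.shape)
```

Passing a random batch through the model before training is a quick way to catch shape mismatches between the convolutional part and the linear layer.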


In [None]:
# @student implement your model here (nn.Module)

For inspiration you can take a look at these ground-breaking publications:
- [LeNet](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf)
- [AlexNet](https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)
- [GoogLeNet](https://arxiv.org/pdf/1409.4842.pdf)
- [ResNet](https://arxiv.org/pdf/1512.03385.pdf)
- [ViT](https://arxiv.org/abs/2010.11929)

Additionally, you can also add data augmentation to the training data in order to improve the generalization of your model.
[Torchvision Augmentation](https://pytorch.org/vision/stable/transforms.html)

In [None]:
model = ...
print(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
loss_fn = torch.nn.CrossEntropyLoss()

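Below is the training loop for the model.
If your machine supports GPU acceleration, uncomment the marked line in the loop that moves each batch to the GPU, and make sure the model is moved to the same device (for example with `model.to("cuda")`).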
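
Instead of hard-coding `"cuda"`, the device can be selected at runtime so that the same code runs with and without a GPU. This is a minimal sketch; `toy_model` and `toy_batch` are placeholders and not part of the assignment.

```python
import torch
from torch import nn

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Model and data must live on the same device
toy_model = nn.Linear(8, 12).to(device)
toy_batch = torch.rand(4, 8).to(device)

toy_logits = toy_model(toy_batch)

# .item() copies a scalar to a Python float, regardless of the device
loss_value = toy_logits.sum().item()
print(device, toy_logits.shape, loss_value)
```

Tensors on the GPU cannot be converted with `.numpy()` directly; use `.item()` for scalars or `.cpu()` before the conversion.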

In [None]:
for epoch in range(10):
    print(f"epoch {epoch+1}")
    model.train()  # eval() below switches to evaluation mode, so re-enable training mode
    pbar = tqdm(train_dl)  # tqdm reads the total from len(train_dl) and advances by itself
    running_loss = []
    for imgs, labels in pbar:
        # @student: uncomment if your machine has GPU support (the model must be on the same device)
        #imgs, labels = imgs.to("cuda"), labels.to("cuda")

        optimizer.zero_grad()

        preds = model(imgs)
        loss = loss_fn(preds, labels)
        loss.backward()
        optimizer.step()

        running_loss.append(loss.item())  # .item() also works for tensors on the GPU
        pbar.set_description(f"loss {np.mean(running_loss):.3f}")

#### From Logits to Labels

Your network outputs logits (one unnormalized score per class), not the final predictions.
Hence, you need to compute the predicted label from the logits.
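
One common way to do this, shown here as a sketch on made-up logits for three classes, is to pick the class with the highest score. Softmax turns the logits into probabilities but does not change which class has the highest score, so it is optional if only the label is needed.

```python
import torch

# Made-up logits for a batch of 2 samples and 3 classes
toy_logits = torch.tensor([[2.0, 0.5, -1.0],
                           [0.1, 0.2, 3.0]])

# Optional: convert the logits to probabilities that sum to 1 per sample
toy_probs = torch.softmax(toy_logits, dim=1)

# The predicted label is the index of the largest entry along the class axis
labels_from_logits = toy_logits.argmax(dim=1)
labels_from_probs = toy_probs.argmax(dim=1)

print(labels_from_logits)  # tensor([0, 2])
```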

In [None]:
def eval(model, dl):
    y_labels = []
    pred_labels = []
    model.eval()
    with torch.no_grad():
        for i, (imgs, labels) in enumerate(dl):
            logits = model(imgs)

            # @student: calculate class predictions based on the logits
            preds = ...

            # move to the CPU before converting, so this also works for tensors on the GPU
            pred_labels.extend(preds.cpu().numpy())
            y_labels.extend(labels.cpu().numpy())

    from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
    cm = confusion_matrix(y_labels, pred_labels)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm)
    disp.plot()
    plt.show()

    print(f"f1: {f1_score(y_labels, pred_labels, average='macro')}, p: {precision_score(y_labels, pred_labels, average='macro')}, r: {recall_score(y_labels, pred_labels, average='macro')}")

In [None]:
eval(model, val_dl)

Once you feel confident with your model, evaluate it one last time on the test set.
The test set will be uploaded during the exercise session.

In [None]:
# @student: final check of your fully trained model
test_ds = torchvision.datasets.ImageFolder("data_test", transform=v2.Compose([
    v2.PILToTensor(),
    v2.ToDtype(torch.float32, scale=True),
]))
test_dl = DataLoader(dataset=test_ds, batch_size=16, shuffle=False)

eval(model, test_dl)