# Machine Vision 2022/23 - Assignment 7: Deep Learning

In this exercise you will apply different deep learning concepts to classify images of traffic signs. Throughout the notebook, links to official websites and blog posts provide additional information.
This exercise uses TensorFlow, one of the most popular deep learning frameworks.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay
import tensorflow as tf
import logging
tf.get_logger().setLevel(logging.ERROR)

##### Preparation

##### German Traffic Sign Recognition Benchmark

The German Traffic Sign Recognition Benchmark [(GTSRB)](https://benchmark.ini.rub.de/) is a competition that was held at IJCNN 2011. The task in this competition is to classify images of traffic signs.
You will implement your own neural network to classify a subset of the GTSRB dataset. This subset consists of `12` different classes, which are shown in the figures below. However, you are free to extend your solution to the full dataset.


| ![Class 0](res/images/0.png) | ![Class 1](res/images/6.png) | ![Class 2](res/images/16.png) | ![Class 3](res/images/17.jpg) | ![Class 4](res/images/19.png) | ![Class 5](res/images/22.jpg) | ![Class 6](res/images/28.png) | ![Class 7](res/images/29.png) | ![Class 8](res/images/32.png) | ![Class 9](res/images/33.png) | ![Class 10](res/images/38.png) | ![Class 11](res/images/40.png) |
|---|---|---|---|---|---|---|---|---|---|---|---|
<br></br>

To simplify this exercise, the raw GTSRB images have already been converted into a dataset in which each image has the shape `[H,W,C]` (Height x Width x Channels) with values ranging from `0-1`.
Furthermore, the dataset is split into a training, a validation and a test dataset, of which the training and validation datasets are provided.

In [None]:
NUM_CLASSES = 12

def get_datasets(*, images, labels, shuffle=True):
    ds = tf.data.Dataset.from_tensor_slices((np.load(images), np.load(labels)))

    if shuffle:
        ds = ds.shuffle(1024)

    return ds.batch(32)

# Create a dataset for training
train_ds = get_datasets(
    images="res/images_train.npy",
    labels="res/labels_train.npy",
)
# Create a dataset for validation
val_ds = get_datasets(
    images="res/images_val.npy",
    labels="res/labels_val.npy",
    shuffle=False
)

The `get_datasets` function creates a [TensorFlow dataset](https://blog.tensorflow.org/2019/02/introducing-tensorflow-datasets.html) from two NumPy arrays containing the images and the corresponding ground truth class labels.

A TensorFlow dataset is a Python class that provides an easy way to iterate over a dataset. In each iteration, the dataset returns a batch of `x = [Bx32x32x3]` images and `y = [Bx12]` class labels, where B is the batch size.
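
The batching behaviour described above can be tried out without the real data. The following is a minimal sketch that assumes random stand-in arrays (100 images, 12 classes) instead of the provided `.npy` files; the name `demo_ds` is hypothetical.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-ins for the real arrays (assumption: 100 RGB images of 32x32 pixels).
images = np.random.rand(100, 32, 32, 3).astype("float32")
# One-hot labels for 12 classes, built from random integer class ids.
labels = np.eye(12, dtype="float32")[np.random.randint(0, 12, size=100)]

# Same construction as in get_datasets: slice, shuffle, batch.
demo_ds = tf.data.Dataset.from_tensor_slices((images, labels)).shuffle(1024).batch(32)

# Iterating the dataset yields (image batch, label batch) pairs.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in demo_ds]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

With 100 samples and a batch size of 32, the last batch holds only the remaining 4 samples, so B is not necessarily the same for every batch.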

There are different approaches to encoding the class label. For further information, see this blog entry on [integer- or one-hot-encoding](https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/).
In this exercise the labels are one-hot encoded, which means that each label is a vector of 12 entries in which only the entry of the true class has the value $1$ and all other values are $0$.
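
To make the encoding concrete, here is a small NumPy sketch that converts integer class ids to one-hot vectors and back. The ids in `int_labels` are made-up illustration values, not taken from the dataset.

```python
import numpy as np

NUM_CLASSES = 12

# Hypothetical integer class ids for three images.
int_labels = np.array([0, 3, 11])

# Row i of the identity matrix is the one-hot vector of class i.
one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[int_labels]

# argmax inverts the encoding and recovers the integer ids.
recovered = one_hot.argmax(axis=1)

print(one_hot.shape)  # (3, 12)
print(recovered)
```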

In [None]:
x_batch, y_batch = next(iter(train_ds))

# @student: print the image and label shape of a batch
...

In [None]:
# @student: show one image of the batch and its label
...

##### Execution

To compare models against each other, metrics are calculated on an unseen test dataset. In this exercise you should develop your own model and upload the `final_model` directory to ILIAS.
All submitted models are compared with each other using the accuracy metric.

**The student who submits the best model will be given the chance to ride in the passenger seat while our autonomous car drives around the Adenauer Ring**.

In [None]:
def create_model(image_shape):
    # Named `inputs` to avoid shadowing the Python built-in `input`
    inputs = tf.keras.layers.Input(shape=image_shape)
    # @student: implement your neural network

    outputs = ...

    return tf.keras.Model(inputs=inputs, outputs=outputs)

In the `create_model` function you can define your own neural network using the [Keras functional API](https://www.tensorflow.org/guide/keras/functional).
You can also use other TensorFlow APIs such as the [Sequential model](https://www.tensorflow.org/guide/keras/sequential_model). However, when moving to more complex architectures, the functional API provides more flexibility.


You are free to use any layer and activation provided by TensorFlow, for example:
- [tf.keras.layers.Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D)
- [tf.keras.layers.Dropout](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout)
- [tf.keras.layers.AveragePooling2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling2D)
- [tf.keras.layers.Flatten](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten)
- [tf.keras.layers.Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)
- [tf.keras.activations.relu](https://www.tensorflow.org/api_docs/python/tf/keras/activations/relu)
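
As an illustration of how these layers chain together in the functional API, here is a small LeNet-style sketch. It is only a possible starting point, not a reference solution: the function name `create_sketch_model`, the filter counts, the dropout rate and the size of the hidden layer are all arbitrary assumptions.

```python
import numpy as np
import tensorflow as tf


def create_sketch_model(image_shape, num_classes=12):
    """Small LeNet-style CNN; hypothetical example, not a reference solution."""
    inputs = tf.keras.layers.Input(shape=image_shape)
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.AveragePooling2D()(x)  # halves height and width
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.AveragePooling2D()(x)  # halves height and width again
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    # No softmax on the last layer, so the model outputs raw logits.
    outputs = tf.keras.layers.Dense(num_classes)(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)


sketch_model = create_sketch_model((32, 32, 3))
# Forward pass on two dummy images to check the output shape.
logits = sketch_model(np.zeros((2, 32, 32, 3), dtype="float32"))
print(logits.shape)
```

Each layer is called on the output of the previous one, which is what allows branches and skip connections later on if you move towards architectures such as ResNet.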

For inspiration, you can take a look at these ground-breaking publications:
- [LeNet](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf)
- [AlexNet](https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)
- [GoogLeNet](https://arxiv.org/pdf/1409.4842.pdf)
- [ResNet](https://arxiv.org/pdf/1512.03385.pdf)
- [ViT](https://arxiv.org/abs/2010.11929)
- [Swin Transformer](https://arxiv.org/abs/2103.14030)
- [ConvNeXt](https://arxiv.org/abs/2201.03545)

You can also add data augmentation to the training data to improve the generalization of your model.
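
One way to add augmentation is to map a function over the dataset. The sketch below assumes random brightness and contrast changes with arbitrarily chosen strengths; the function name `augment` and the synthetic `demo_ds` (standing in for `train_ds`) are hypothetical.

```python
import tensorflow as tf


def augment(images, labels):
    """Randomly perturb brightness and contrast; labels pass through unchanged."""
    images = tf.image.random_brightness(images, max_delta=0.2)
    images = tf.image.random_contrast(images, lower=0.8, upper=1.2)
    # Keep pixel values in the [0, 1] range the dataset uses.
    images = tf.clip_by_value(images, 0.0, 1.0)
    return images, labels


# Synthetic batched dataset standing in for train_ds (assumption).
demo_ds = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform((8, 32, 32, 3)), tf.one_hot(tf.range(8), depth=12))
).batch(4)

# The augmentation is re-drawn every time the dataset is iterated.
augmented_ds = demo_ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
aug_images, aug_labels = next(iter(augmented_ds))
print(aug_images.shape, aug_labels.shape)
```

Because this sketch maps over an already batched dataset, all images in a batch share the same random factors; mapping before `batch` would give per-image variation. Horizontal flips are a common augmentation elsewhere, but they can change the meaning of traffic signs that depend on direction, so they should be used with care here.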

In [None]:
model = create_model(image_shape=(32,32,3))
model.summary()

In [None]:
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-3)
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
acc_metric = tf.keras.metrics.CategoricalAccuracy(name="accuracy")
model.compile(optimizer, loss=loss_fn, metrics=[acc_metric])
model.fit(
    train_ds,
    epochs=25,
    validation_data=val_ds,
    callbacks=[
        tf.keras.callbacks.ModelCheckpoint(filepath="final_model", monitor='val_accuracy', mode='max', save_best_only=True)
    ]
)

In [None]:
model = tf.keras.models.load_model("final_model/")
loss, acc = model.evaluate(val_ds)

#### From Logits to Labels

Because the loss function `tf.keras.losses.CategoricalCrossentropy` is configured with `from_logits=True`, your network is expected to output raw logits, i.e. the last layer has no softmax activation.
If you want to run inference manually, you therefore have to calculate the predicted label from the logits yourself.
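
The relation between logits, probabilities and labels can be illustrated on toy numbers. This is a minimal NumPy sketch; `toy_logits` holds made-up values for 2 images and 3 classes, not output of your model.

```python
import numpy as np

# Hypothetical logits for 2 images and 3 classes (toy values, not model output).
toy_logits = np.array([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

# Softmax turns logits into probabilities; subtracting the row maximum avoids overflow.
shifted = toy_logits - toy_logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Softmax is monotonic, so the argmax of the logits equals the argmax of the probabilities.
toy_labels = toy_logits.argmax(axis=1)
print(toy_labels)  # [0 2]
```

The softmax step is only needed if you want class probabilities; for the predicted label alone, the index of the largest logit is sufficient.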

In [None]:
pred_logit = model.predict(val_ds)
y_labels = [labels for _, labels in val_ds.unbatch()]
y_labels = tf.argmax(y_labels, axis=1)


# @student: pred_logit are the predicted logits of your model, while y_labels are the ground truth labels.
# calculate the labels from the logits
...
pred_labels = ...


confusion_matrix = tf.math.confusion_matrix(y_labels, pred_labels, num_classes=NUM_CLASSES)
disp = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix.numpy())
disp.plot()
plt.show()