# Training an Image Classification Network on CIFAR-100 using PyTorch

This is the second practical exercise of our course [Applied Edge AI](https://learn.ki-campus.org/courses/edgeai-hpi2022).
In this exercise, we will introduce basic PyTorch functionality for the training of image classification models.
We will use the CIFAR-100 dataset as an example dataset because training on this dataset does not require too many resources and should easily be doable as an exercise.

This notebook contains code that is incomplete.
Please fill all gaps with your code and train a model.
In the graded quiz at the end of the week, we might ask some questions that deal with this exercise, so make sure to do the exercise (and have your output handy) **before** taking the quiz!

# The CIFAR-100 Dataset

![cifar100 image](https://web.stanford.edu/~hastie/CASI_files/DATA/cifar100.jpg)

The CIFAR-100 database is a dataset containing `100` classes, where each class contains `600` images, which are divided into `500` images for training and `100` images for testing.
The images have a resolution of `32x32` pixels.
If you want to know more about the CIFAR-100 dataset, please [click here](https://www.cs.toronto.edu/~kriz/cifar.html).

We will use this dataset to train an image classification model with 100 classes.  
To do this, we will build a deep learning network with PyTorch by following these steps:

1. write code to load and prepare the data for training
1. write code that defines the neural network we wish to use
1. write code that trains and updates our network using the data

## Loading the Dataset

We already added the CIFAR-100 dataset as input to this notebook.
You can access the data via `/kaggle/input/cifar100`.
There, you can find the files `meta`, `test`, and `train`.

Let us first have a look at the `meta` file:

In [1]:
import pickle

from pathlib import Path
from pprint import pprint  # pprint is a function to pretty print dictionaries and lists

labels = pickle.load(Path('/kaggle/input/cifar100/meta').open('rb'))
pprint(labels, compact=True)

As we can see, this file contains a dictionary with all label names:
the coarse label names of CIFAR-100/20 on the one hand and the fine label names of CIFAR-100 on the other.  
We are interested in the fine labels of CIFAR-100.
The list of fine labels gives us not only the name of each class but also its id, which is the index of the label name in the list.

Alright let's now have a look at the training and testing data:

In [2]:
train_data = pickle.load(Path('/kaggle/input/cifar100/train').open('rb'), encoding='bytes')
print(f'Train Data Keys: {list(train_data.keys())}')

test_data = pickle.load(Path('/kaggle/input/cifar100/test').open('rb'), encoding='bytes')
print(f'Test Data Keys: {list(test_data.keys())}')

### Axis Order

Note that the meaning of the array axes is quite important when working with deep learning frameworks, datasets, and image libraries.
The libraries we use expect a certain order, since they need to "know" whether an array with a shape of 3x32x32 is one RGB image or three grayscale images.
Thus, you will often see a particular code, such as NCHW or NHWC, used to describe the expected order of axes.
In this code, the letters usually stand for the following:

- N: *batch axis* (selecting one item along this axis selects one complete image)
- C: *color/channel axis*
- H: *height axis*
- W: *width axis*
- D: *depth axis* (usually only used when working with 3D images or tensors)
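
As a small illustration of these axis codes, the following sketch converts a batch from NCHW to NHWC with NumPy. The array `batch_nchw` is a hypothetical placeholder filled with zeros, not data from the notebook.

```python
import numpy as np

# A hypothetical batch of 4 RGB images, 32x32 pixels each, stored as NCHW.
batch_nchw = np.zeros((4, 3, 32, 32), dtype=np.uint8)

# Move the channel axis to the end: (N, C, H, W) -> (N, H, W, C).
batch_nhwc = batch_nchw.transpose(0, 2, 3, 1)

# Selecting along the batch axis yields one complete image in HWC format.
single_image_hwc = batch_nhwc[0]

print(batch_nchw.shape)        # (4, 3, 32, 32)
print(batch_nhwc.shape)        # (4, 32, 32, 3)
print(single_image_hwc.shape)  # (32, 32, 3)
```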

### Reordering the Axes

Since the data we load from CIFAR is not yet shaped and ordered correctly, we need to figure out which format its axes are in.
Our data loader should return the data of a single image in HWC format, so it makes sense to store the data in NHWC format first (we can then simply select one image with the `[]` operator in Python).

Therefore, we should do some more data exploration to figure out how the data is stored in the given files.
Here, it makes sense to examine the `shape` of the stored arrays and visualize some images, because we will need this knowledge to load the images correctly in our data loader.
Also remember that the CIFAR data consists of RGB images (3 color channels) with a size of 32x32 pixels.

There are two options: visualize a single image (easier), or visualize a set of images (harder).
In the following we provide a few hints for both approaches to this task:

- Load the image data from either `train_data` or `test_data` (already done below)
- Determine the correct shape of the array and reshape it accordingly
- (A) Single Image:
    - Transpose the array of all images to NHWC with [transpose](https://numpy.org/doc/stable/reference/generated/numpy.transpose.html)
    - Show one of the images with [plt.imshow](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html)
- (B) Set of Images:
    - Create a figure with [plt.subplots](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
    - Add an empty list of ticks for x and y
    - Create a torch tensor from a few images (e.g. 20) and use [make_grid](https://pytorch.org/vision/stable/utils.html)
    - Reorder the axes of the resulting grid from CHW to HWC with [permute](https://pytorch.org/docs/stable/generated/torch.permute.html)
    - Show the images with [plt.imshow](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html)
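
The difference between reshaping and transposing can be sketched on synthetic data. The example below assumes the row layout that the solution further down relies on (1024 red values, then 1024 green, then 1024 blue per image). The array `flat_row` and its constant values are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for one flattened CIFAR row: 1024 red values,
# then 1024 green values, then 1024 blue values (here 10, 20 and 30).
flat_row = np.concatenate([
    np.full(1024, 10, dtype=np.uint8),
    np.full(1024, 20, dtype=np.uint8),
    np.full(1024, 30, dtype=np.uint8),
])
flat_images = flat_row[np.newaxis, :]  # shape (1, 3072), i.e. N = 1

# Restore the stored channel-first layout, then move the channels last.
images_nhwc = flat_images.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)

# Reshaping straight to (N, 32, 32, 3) would mix up the color planes.
images_wrong = flat_images.reshape(-1, 32, 32, 3)

print(images_nhwc[0, 0, 0])   # [10 20 30] -> one (R, G, B) pixel
print(images_wrong[0, 0, 0])  # [10 10 10] -> three red values
```

Both results have the shape `(1, 32, 32, 3)`, so checking the shape alone does not reveal the mistake. Visualizing an image does.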
    
# Task 1: Getting to Know the CIFAR-100 Dataset

In the following cell, you can write a few lines of code on your own to visualize a few images of the dataset with one (or both) of the approaches described above.

In [3]:
import matplotlib.pyplot as plt
import torch
from torchvision.utils import make_grid
%matplotlib inline

images = train_data[b'data']
print("initial shape:" ,images.shape)

# TODO:
# determine the correct shape of the data
# reshape the data accordingly
# display some images (transpose or permute the axes as necessary)
images_t = images.reshape(-1, 3, 32, 32)     # each row holds 1024 red, then 1024 green, then 1024 blue values
print('reshaped shape: ', images_t.shape)    # Format NCHW

# Selecting some images
images_plot = torch.from_numpy(images_t[0:20, :]) # Format NCHW

# Concatenating the images in images_plot 
grid = make_grid(images_plot)         # Format CHW
grid = torch.permute(grid, (1, 2, 0)) # Reordering to HWC

_, ax = plt.subplots(figsize=(16,10))
ax.set_xticks([])
ax.set_yticks([])
_ = ax.imshow(grid)

As we can see, both files contain a dictionary with data.
We are mostly interested in the keys `fine_labels` and `data`, because we need their contents to train our model.

**You should play around with the data a bit more to get a feeling of the way it is saved.
Such information is important because otherwise, you will not be able to correctly load the data.**

# Task 2: Building the Data Loading

Before we can train anything, we need to explore the data a bit and create code for data loading.
In PyTorch, we can simply create a new subclass of the class `Dataset`.
The task of this class is to provide the training loop with the correct image and label if asked.
The dataset will also perform necessary image transformations, such as normalization and even augmentation.
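
To illustrate the interface such a subclass has to provide, here is a minimal sketch of a dataset over random tensors. The class `ToyDataset` and its contents are hypothetical and only show the two methods the training loop relies on, `__len__` and `__getitem__`.

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset holding random 'images' and integer labels."""

    def __init__(self, num_samples: int = 8):
        super().__init__()
        self.images = torch.rand(num_samples, 3, 32, 32)
        self.labels = torch.arange(num_samples) % 100

    def __len__(self) -> int:
        # number of available samples
        return len(self.images)

    def __getitem__(self, index: int) -> tuple:
        # return one (image, label) pair for the requested index
        return self.images[index], int(self.labels[index])


toy_dataset = ToyDataset()
toy_image, toy_label = toy_dataset[3]
print(len(toy_dataset), toy_image.shape, toy_label)
```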

Your first task is to complete the code of our dataset class using the knowledge you gathered about our input data:

In [4]:
import numpy as np
import torchvision.transforms as tt

from typing import Union, Type

from imgaug import augmenters as iaa
from PIL import Image
from torch.utils.data import Dataset


class CIFAR100(Dataset):
    
    def __init__(self, dataset_path: Path, image_transforms: tt.Compose, image_augmentations: Union[None, iaa.Augmenter] = None):
        super().__init__()
        data = pickle.load(dataset_path.open('rb'), encoding='bytes')
        # TODO: store the images and the (fine) labels in class variables to be able to easily access them later on
        self.images = data[b'data'].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)  # Format NHWC
        self.labels = data[b'fine_labels']
        
        self.image_transforms = image_transforms
        self.image_augmentations = image_augmentations
        
        assert len(self.images) == len(self.labels), "Number of images and labels is not equal!"
        
    def __len__(self) -> int:
        # TODO: return the length of the dataset, i.e., the number of available images 
        return self.images.shape[0]
        
    def __getitem__(self, index: int) -> tuple:
        # TODO: write the data loading code:
        # 1. get image and label for the corresponding index
        # 2. reshape the image array into the correct shape
        # 3. apply image augmentations *if* necessary (we should not augment the test data) - check if `self.image_augmentations` is not None
        #    HINT: use something like the following line to augment a single image (within the if clause):
        #              image = self.image_augmentations.augment_image(image)
        # 4. transform the image to a tensor using the image_transforms
        # 5. return a tuple of image and label
        image = self.images[index]
        label = self.labels[index]
        
        if self.image_augmentations is not None:
            image = self.image_augmentations.augment_image(image)
            
        tensor_image = self.image_transforms(image)
        return tensor_image, label

Now that we have defined our dataset class, we need to define two more things:

1. the transformations to apply to our images
1. some code that can be used to iterate over all images in an epoch

### Image Transformations

We will now investigate how we can define image transformations.
PyTorch includes a set of predefined image [transformations](https://pytorch.org/vision/stable/transforms.html#) that we can use to ease image loading.
In this example, we use the tensor transformation, which scales the values of the tensor into the range [0, 1] and also transposes the tensor to the shape [`num_channels`, `height`, `width`], followed by a normalization with a per-channel mean and standard deviation.
We also add some image augmentations from the great [imgaug](https://imgaug.readthedocs.io/en/latest/) library.
Here, we add random horizontal flips and also random cropping and padding:

In [5]:
image_transformations = tt.Compose([
    tt.ToTensor(),
    tt.Normalize((0.5074,0.4867,0.4411),(0.2011,0.1987,0.2025))
])

train_augmentations = iaa.Sequential([
    iaa.Fliplr(0.5),
    iaa.CropAndPad(percent=(-0.25, 0.25))
])
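
To make the effect of the two transformations concrete, the following sketch reproduces by hand, in NumPy, what `tt.ToTensor()` followed by `tt.Normalize(...)` computes for a `uint8` image in HWC layout. The 2x2 image `image_hwc` is a made-up example. The mean and standard deviation values are the ones used in the cell above.

```python
import numpy as np

# Hypothetical 2x2 RGB image in HWC layout with uint8 values.
image_hwc = np.array([[[255, 0, 0], [0, 255, 0]],
                      [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

mean = np.array([0.5074, 0.4867, 0.4411], dtype=np.float32)
std = np.array([0.2011, 0.1987, 0.2025], dtype=np.float32)

# Step 1 (ToTensor for uint8 input): scale to [0, 1] and go from HWC to CHW.
scaled_chw = (image_hwc.astype(np.float32) / 255.0).transpose(2, 0, 1)

# Step 2 (Normalize): subtract the mean and divide by the std of each channel.
normalized_chw = (scaled_chw - mean[:, None, None]) / std[:, None, None]

print(scaled_chw.shape)                    # (3, 2, 2)
print(scaled_chw.min(), scaled_chw.max())  # 0.0 1.0
print(normalized_chw[0, 0, 0])             # (1.0 - 0.5074) / 0.2011, roughly 2.45
```

Note that the values are no longer limited to the range [0, 1] after the normalization step.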

Now we should probably test our data loading based on these transform functions.
To do this, we can load the first image of the training dataset with the following code and check the result:

In [6]:
example_train_data = CIFAR100(Path('/kaggle/input/cifar100/train'), image_transformations, train_augmentations)
example_image, example_label = example_train_data[0]
print(f"Image shape: {example_image.shape}")                                    # Image shape: torch.Size([3, 32, 32])
print(f"First pixel: {example_image[:,0,0]}")                                   # First pixel: tensor([...])
print(f"Class: {example_label} - {labels['fine_label_names'][example_label]}")  # Class: 19 - cattle

plt.imshow(torch.permute(example_image, (1,2,0)))

Following the preparation of the transforms, we build our dataset objects and put them into PyTorch [DataLoader](https://pytorch.org/docs/stable/data.html) objects.
A `DataLoader` is used to iterate over the given dataset. The dataloader prepares batches of a given batch size for us and can also help with shuffling the data.
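
As a rough sketch of this batching behavior, the example below wraps ten random tensors in a `DataLoader`. The data and the names `toy_images`, `toy_labels`, and `toy_loader` are hypothetical and unrelated to the CIFAR-100 loaders you are asked to build.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 random "images" with integer labels.
toy_images = torch.rand(10, 3, 32, 32)
toy_labels = torch.arange(10)
toy_loader = DataLoader(TensorDataset(toy_images, toy_labels), batch_size=4, shuffle=True)

batch_sizes = []
for image_batch, label_batch in toy_loader:
    # image_batch has the shape [batch, 3, 32, 32], label_batch has the shape [batch]
    batch_sizes.append(image_batch.shape[0])

print(len(toy_loader))  # 3 batches per epoch
print(batch_sizes)      # [4, 4, 2] - the last batch is smaller
```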

Have a look at the documentation of the data loader and create a meaningful data loader for train and test data:
* Both loaders should use a batch size of `64`.
* The *train data loader* should shuffle the data. (The test loader typically does not shuffle the data, but doing it there as well does not matter.)
* Both loaders should use `num_workers=2` which uses multiple processes for preprocessing the data.

In [7]:
from torch.utils.data import DataLoader

train_dataset = CIFAR100(Path('/kaggle/input/cifar100/train'), image_transformations, train_augmentations)
# TODO: build the `train_data_loader` using the `DataLoader` class of PyTorch
train_data_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)

test_dataset = CIFAR100(Path('/kaggle/input/cifar100/test'), image_transformations)
# TODO: build the `test_data_loader` using the `DataLoader` class of PyTorch
test_data_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=2)

# Task 3: Defining the Neural Network

Now that we have the data ready, we need to define the neural network we wish to use for training.
Here, we can either define it completely by ourselves, or we can use a set of pre-defined methods for this.

Either way, we should always define at least the base network ourselves.
To do this, PyTorch provides us with a base class that we can subclass, the [Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html).
A module contains all code that is necessary to perform a forward pass through our model.
It makes sense to create a new subclass of the `Module` class because that way we can distinguish between different tasks more easily.

Your task is to write a module that takes as input an image tensor and returns a tensor with the shape `[batch_size, 100]` (100 represents all possible classes).
You can either define your own network (**not recommended**) or use a pre-defined feature extractor (**recommended**).
You can find a list of pre-defined feature extractors that work nicely with the small images of CIFAR-100 [here](https://www.kaggle.com/bartzi/cifar100-resnets). Please use a ResNet-20 feature extractor here!
You can use another feature extractor, once you are done with all tasks of this notebook!

In [8]:
import torch
import torch.nn as nn
import cifar100_resnets as models


class CIFAR100Net(nn.Module):
    
    def __init__(self, num_classes=100):
        super().__init__()
        # TODO:
        # create the model by instantiating the recommended resnet 20 (save it as `self.feature_extractor`)
        # set `num_classes` to the number of classes of CIFAR-100
        self.feature_extractor = models.resnet20(num_classes=num_classes)
        
    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # TODO: pass the images through the saved `self.feature_extractor` and return the result
        return self.feature_extractor(images)  # call the module itself instead of `.forward` so that registered hooks are run
        

To do a very basic test of our implementation, we can instantiate our network, and try to pass randomized data through it.
Running the following cell does that:

In [9]:
test_net = CIFAR100Net()
test_batch_size = 2
dummy_data = torch.rand((test_batch_size,3,32,32))
result = test_net(dummy_data)

print("Network output:", result.shape)
# Expected Output (otherwise something is wrong): Network output: torch.Size([2, 100])

assert result.shape[0] == test_batch_size, "the network should output one prediction for each sample in the batch"
assert result.shape[1] == 100, "the network should output predictions for 100 classes"

Okay that was simple, wasn't it?
If you want to build more complex models it is really helpful to have a separate class for such a model!

# Task 4: Training the network

We have now implemented the data loading and the neural network.
Next, we need to implement our training code to be able to train our neural network.
The training of a neural network (as you know) involves the following components:

1. Loss Function
1. Backward Propagation
1. Optimizer
1. Iterative Process

We will now have a look at all of these missing parts.


## Task 4a: Running one Iteration of the Train Process

One iteration includes a forward pass through our network and the calculation of the loss using a loss function. We are going to use softmax cross entropy, which is suited for a multi-class classification problem ([PyTorch Documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)). After the calculation of the loss, we need to prepare the network for backpropagation. For this, we use the optimizer to zero the gradients stored in the network ([PyTorch Documentation](https://pytorch.org/tutorials/recipes/recipes/zeroing_out_gradients.html#zero-the-gradients-while-training-the-network)).
Once we have cleared the gradients, we can run the backpropagation by calling `backward` on the obtained `loss`.
After the gradient calculation, we run the update rule of the optimizer.
That's it, we ran one training iteration \O/

At the end of the function, we can return the calculated loss for logging of our current process.
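
The sequence of steps described above can be sketched on a toy problem. The tiny linear model, the SGD optimizer, and the random batch below are hypothetical stand-ins chosen to keep the example small. They are not the network or optimizer used in this exercise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear classifier and one random batch.
toy_network = nn.Linear(8, 3)
toy_optimizer = torch.optim.SGD(toy_network.parameters(), lr=0.1)
toy_loss_function = nn.CrossEntropyLoss()
toy_inputs = torch.rand(4, 8)
toy_labels = torch.tensor([0, 1, 2, 1])

weights_before = toy_network.weight.detach().clone()

toy_outputs = toy_network(toy_inputs)                   # forward pass
toy_loss = toy_loss_function(toy_outputs, toy_labels)   # loss on the raw outputs
toy_optimizer.zero_grad()                               # clear old gradients
toy_loss.backward()                                     # compute new gradients
toy_optimizer.step()                                    # apply the update rule

print(f"loss: {toy_loss.item():.4f}")
print("weights changed:", not torch.equal(weights_before, toy_network.weight.detach()))
```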

In [10]:
from typing import Type
from torch.optim import Optimizer

loss_function = nn.CrossEntropyLoss()

def train_for_one_iteration(network: nn.Module, batch: tuple, optimizer: Optimizer) -> float:
    images, labels = batch
    # TODO:
    # pass the images through the network (store the predictions in a variable)
    # then calculate the loss using the loss_function based on the predictions and the labels
    outputs = network(images)
    loss = loss_function(outputs, labels)
    
    # Here come the real weight adjustments, first zero gradients, then calculate derivatives, followed by the actual update of the optimizer
    optimizer.zero_grad()  # this sets gradients to zero (e.g. to clean up from any previous backward passes)
    loss.backward()        # calculate gradients for our network
    optimizer.step()       # update all weights in our network according to the computed gradients
    
    return float(loss.item())
    

## Task 4b: Implement The Network Testing

A full training run consists of going through the entire training dataset multiple times (so-called epochs).
In each iteration of an epoch, we forward a batch of our training data through the network and update the network (as implemented above).
At the end of each epoch, we should run our trained network on the test dataset to see how it performs on unseen data.
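
To get a feeling for the numbers involved, the following sketch computes how many iterations one epoch has. It assumes the training set size given in the dataset description, the batch size of `64` requested above, and a data loader that keeps the last, smaller batch (the PyTorch default).

```python
import math

num_train_images = 100 * 500   # 100 classes with 500 training images each
batch_size = 64

# The last batch is kept even though it is not full.
iterations_per_epoch = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (iterations_per_epoch - 1) * batch_size

print(iterations_per_epoch)  # 782
print(last_batch_size)       # 16
```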

For testing, we run the data of the full test dataset through the network and calculate the prediction accuracy of our network:

Here, you'll need to forward the images through the network, run the softmax function on the outputs to get a probability distribution, get the most probable class from the output of the network, and then calculate the number of `correct_predictions`.
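
These steps can be tried out on a handful of made-up values first. The tensors `toy_logits` and `toy_labels` below are hypothetical network outputs and labels for four samples and three classes.

```python
import torch
import torch.nn.functional as F

# Hypothetical network outputs for 4 samples and 3 classes.
toy_logits = torch.tensor([[2.0, 1.0, 0.0],
                           [0.0, 3.0, 1.0],
                           [1.0, 0.0, 5.0],
                           [4.0, 0.0, 0.0]])
toy_labels = torch.tensor([0, 1, 1, 0])

toy_probabilities = F.softmax(toy_logits, dim=1)        # each row sums to 1
toy_predicted = torch.argmax(toy_probabilities, dim=1)  # most probable class per sample
toy_num_correct = (toy_predicted == toy_labels).sum().item()
toy_accuracy = toy_num_correct / len(toy_labels)

print(toy_predicted.tolist())  # [0, 1, 2, 0]
print(toy_accuracy)            # 0.75
```

Since softmax preserves the order of the values within a row, taking the argmax of the raw outputs would select the same classes.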

In [11]:
from tqdm.notebook import tqdm  # import a library for displaying a nice progressbar

import torch.nn.functional as F  # import torch functions, this contains softmax

def test_model(network: nn.Module, data_loader: DataLoader) -> float:
    num_correct_predictions = 0
    device = get_device()

    was_training = network.training
    network.eval()  # switch layers such as batch norm or dropout to their evaluation behavior
    with torch.no_grad():  # we do not need gradients for testing
        for images, labels in tqdm(data_loader, desc="Testing...", leave=False):
            images = to_device(images, device)
            labels = to_device(labels, device)
            # TODO:
            # 1. get the network predictions, by passing the images through our networks
            # 2. calculate the output of the softmax function on our predictions (hint: do it on the dimension 1, which is the dimension of our 100 classes)
            # 3. store the predicted class, by finding the argument with the maximum value for each sample (thus working on dimension 1 again)
            # 4. count the cases where the predicted class is equal to the labels 
            predictions = network(images)
            predictions_softmax = F.softmax(predictions, dim=1)
            predicted_classes = torch.argmax(predictions_softmax, dim=1)

            correct_predictions = (predicted_classes == labels).sum().item()
            num_correct_predictions += correct_predictions
    network.train(was_training)  # restore the previous mode so that training can continue

    accuracy = num_correct_predictions / len(data_loader.dataset)
    return float(accuracy)
        

## Data on CPU vs. GPU

Another thing we need to take care of is where our data should live (either on the CPU or on the GPU).
In PyTorch, we have to handle this manually.
However, this is a simple task.
In this exercise, we assume that we train on only one GPU.
Thus, we just need to check whether a GPU is available. If it is, we can transfer our data to the GPU by using a simple function right after loading:

In [12]:
def get_device() -> torch.device:
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


def to_device(data: torch.Tensor, device: torch.device) -> torch.Tensor:
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device)

Now, it is time to build our training loop code.
Here, we go through our training dataset for a given number of epochs.
We also manage where our training will run (either on the CPU or on the GPU).
We already prepared all code for you, enjoy!

In [13]:
import statistics

from collections import defaultdict

from tqdm.notebook import trange

def train(train_data: DataLoader, test_data: DataLoader, network: nn.Module, optimizer: Optimizer, num_epochs: int) -> dict:
    device = get_device()
    metrics = defaultdict(list)
    for epoch in trange(num_epochs, desc="Epoch: "):
        losses = []
        with tqdm(total=len(train_data), desc="Iteration: ") as progress_bar:
            for iteration, batch in enumerate(train_data):
                batch = to_device(batch, device)
                loss = train_for_one_iteration(network, batch, optimizer)
                losses.append(loss)
                progress_bar.update()
                metrics["losses"].append({"iteration": epoch * len(train_data) + iteration, "value": loss})

            accuracy = test_model(network, test_data)
            metrics['accuracy'].append({"iteration": (epoch + 1) * (len(train_data)), "value": accuracy})

            progress_bar.set_postfix_str(f"Epoch {epoch}, Mean Loss: {statistics.mean(losses):.2f}, Test Accuracy: {accuracy:.2f}")
    
    return metrics

We are done with most of the coding.
The last thing we need to do is to set up all parts that can be changed without problems: setting the necessary hyperparameters, creating the network, instantiating the optimizer, and transferring the model to the GPU (if a GPU is enabled).

### Setting of Hyperparameters

First, we are going to set some hyperparameters: the learning rate and the number of epochs we want to train the model for.

In [14]:
learning_rate = 0.001
num_epochs = 50

Before starting the training below, you should enable the GPU accelerator in the sidebar on the right (you can open the sidebar by clicking on the |< symbol in the top right, then select *Settings*, *Accelerator*, *GPU*).

If you have not done so at the beginning of working on this exercise (which is fine), this means the other cells need to be run again.
To do so, you can select *Run All* in the top toolbar.
The notebook should run most of the previous cells very quickly until the training below is executed.

In [15]:
# create the network and make sure to transfer it to the GPU if a GPU is available
network = CIFAR100Net()
network = network.to(get_device())

# create an optimizer for training, here we use Adam. However, you can also try other optimizers later on.
optimizer = torch.optim.Adam(network.parameters(), lr=learning_rate)

# we are done with all setup and can start the training
logged_metrics = train(train_data_loader, test_data_loader, network, optimizer, num_epochs)

# Optional Task: Rendering of Metrics

Looking only at the output of the train metrics such as loss and accuracy provides us with some information about the progress and success of the training.
To gain more insights, it makes sense to also plot the training metrics to get a better visualization.

This is our final but **optional** task for this exercise.
If you want to skip this task, please continue to "What now?" below.
We are going to provide a solution for this task at the end of the week.

We can use the matplotlib library (that we already used above) to plot our logged metrics.
Since we are looking at a progression of values, it makes sense to use a [line-chart](https://pythonbasics.org/matplotlib-line-chart/) for such a plot.
Keep in mind that the `train` function returns all logged metrics as a dictionary. Each key holds a list of dictionaries, and each of these dictionaries has one key for the iteration at which the value was logged (`iteration`) and one key for the logged value (`value`).
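
To make this structure easier to picture, here is a small sketch with a hand-written excerpt. The dictionary `example_metrics` uses the same keys as the `train` function above, but the numbers are made up purely for illustration and are not results of an actual training run.

```python
# Hypothetical excerpt of the returned metrics; all numbers are made up.
example_metrics = {
    "losses": [
        {"iteration": 0, "value": 4.61},
        {"iteration": 1, "value": 4.58},
        {"iteration": 2, "value": 4.52},
    ],
    "accuracy": [
        {"iteration": 3, "value": 0.02},
    ],
}

for name, entries in example_metrics.items():
    # split the list of dictionaries into x values (iterations) and y values
    x_values = [entry["iteration"] for entry in entries]
    y_values = [entry["value"] for entry in entries]
    print(name, x_values, y_values)
```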

In [16]:
def plot_metrics(metrics: dict):
    # we prepare the plotting by creating a set of axes for plotting, we want to put each metric in its own plot in a separate row
    # furthermore, all plots should share the same x-axis values
    fig, axes = plt.subplots(len(metrics), 1, sharex=True, figsize=(10, 10))

    # we want to have a set of distinct colors for each logged metric
    colors = iter(plt.cm.rainbow(np.linspace(0, 1, len(metrics))))
    
    # TODO: (optional task)
    for axis, key in zip(axes, metrics.keys()):
        # use the `metrics` argument of this function instead of the global `logged_metrics`
        line_plot = np.array([(item['iteration'], item['value']) for item in metrics[key]])
        axis.plot(line_plot[:,0], line_plot[:,1], label=key, color=next(colors))
        axis.legend()
        
    plt.show()
    
plot_metrics(logged_metrics)

# What now?

We are done with the second practical exercise.
Make sure to remember the results you got using the parameters and default settings provided in this notebook.
We will have a question in the graded exercise about the value range you received (hint: it should be better than 30% accuracy).

But what can you do now?
Well, if you want to play around with the network and make it perform even better, you can try to change some things here and there!
Some suggestions for you:
- add more data augmentations (have a look at the possible data augmentation strategies the [imgaug](https://imgaug.readthedocs.io/en/latest/index.html) library provides)
- add learning [rate scheduling](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate)!
- you can also try to use a different network architecture!
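
As one possible starting point for the learning rate scheduling suggestion, the following sketch uses `StepLR` from `torch.optim.lr_scheduler`. The toy network, the dummy loss, and the values for `step_size` and `gamma` are assumptions chosen for illustration, not tuned settings for this exercise.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the network and optimizer used above.
toy_network = nn.Linear(8, 3)
toy_optimizer = torch.optim.Adam(toy_network.parameters(), lr=0.001)

# StepLR multiplies the learning rate by `gamma` every `step_size` epochs.
scheduler = torch.optim.lr_scheduler.StepLR(toy_optimizer, step_size=2, gamma=0.1)

learning_rates = []
for epoch in range(5):
    learning_rates.append(scheduler.get_last_lr()[0])
    # a single dummy update stands in for one full epoch of training
    toy_optimizer.zero_grad()
    toy_network(torch.rand(4, 8)).sum().backward()
    toy_optimizer.step()
    scheduler.step()  # advance the schedule once per epoch

print(learning_rates)  # roughly [0.001, 0.001, 0.0001, 0.0001, 0.00001]
```

In the training loop of this notebook, the call to `scheduler.step()` would go at the end of each epoch, after all iterations of that epoch.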

Whatever happens, have fun!