# Image Classification Notebook

# Table of Contents
  - [Image Classification Notebook](#Image-Classification-Notebook)
    - [References](#References)
    - [Libraries](#Libraries)
    - [Introduction](#Introduction)
    - [Classes](#Classes)
    - [Functions](#Functions)
    - [Dataset](#Dataset)
      - [Load data](#Load-data)
    - [Explore image processing](#Explore-image-processing)
      - [Example image](#Example-image)
      - [Geometric transformation](#Geometric-transformation)
        - [Scaling](#Scaling)
        - [Cropping](#Cropping)
        - [Horizontal Flip](#Horizontal-Flip)
        - [Vertical Flip](#Vertical-Flip)
        - [Rotation](#Rotation)
      - [Image filtering](#Image-filtering)
        - [Average filter](#Average-filter)
        - [Median filter](#Median-filter)
        - [Gaussian filter](#Gaussian-filter)
      - [Photometric transformation](#Photometric-transformation)
        - [Adjust brightness](#Adjust-brightness)
        - [Adjust contrast](#Adjust-contrast)
        - [Adjust saturation](#Adjust-saturation)
    - [Image classifier development using CNNs](#Image-classifier-development-using-CNNs)
      - [Dataset preprocessing](#Dataset-preprocessing)
        - [Train, validation, and test sets](#Train,-validation,-and-test-sets)
        - [Data Augmentation](#Data-Augmentation)
        - [PyTorch Datasets](#PyTorch-Datasets)
        - [PyTorch Dataloaders](#PyTorch-Dataloaders)
      - [Model training](#Model-training)
      - [Model Training Overview](#Model-Training-Overview)
        - [Check which device is used for training](#Check-which-device-is-used-for-training)
        - [Define training hyperparameters](#Define-training-hyperparameters)
        - [Loss function](#Loss-function)
        - [Initialise model architecture](#Initialise-model-architecture)
        - [Optimiser function](#Optimiser-function)
        - [Train model](#Train-model)
        - [Learning curves](#Learning-curves)
      - [Model testing](#Model-testing)
      - [Explore results](#Explore-results)
        - [Compute average accuracy](#Compute-average-accuracy)
        - [Compute confusion matrix](#Compute-confusion-matrix)
      - [Explain image classifier predictions](#Explain-image-classifier-predictions)
        - [Prepare image for Grad-CAM](#Prepare-image-for-Grad-CAM)
        - [Compute GradCAM heatmap](#Compute-GradCAM-heatmap)
        - [Visualise Grad-CAM heatmap with the image](#Visualise-Grad-CAM-heatmap-with-the-image)

## References

Here are some additional references to guide you while self-learning:
- Official documentation for [OpenCV](https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html).
- Official documentation for [PIL library](https://pillow.readthedocs.io/en/stable/).
- Official documentation for [PyTorch](https://pytorch.org/).
- Official documentation for [Albumentations](https://albumentations.ai/).
- Official documentation for [PyTorch GradCAM](https://jacobgil.github.io/pytorch-gradcam-book/introduction.html).
- [A Microsoft tutorial on training an image classification model with PyTorch](https://learn.microsoft.com/en-us/windows/ai/windows-ml/tutorials/pytorch-train-model).

## Libraries

- [Matplotlib](./20_library_matplotlib.ipynb)
- [NumPy](./21_library_numpy.ipynb)
- [scikit-learn](./22_library_sklearn.ipynb)
- OpenCV-Python
- PyTorch
- Albumentations
- PyTorch Grad-CAM

## Introduction

Image classification is a foundational task in computer vision and machine learning.
This notebook aims to provide practical experience in image processing and in building and evaluating image classification models. 

It begins by demonstrating how to load and preprocess image data using Matplotlib and OpenCV-Python.
Then, it shows how to build a basic image classification pipeline based on Convolutional Neural Networks (CNNs) using PyTorch, Albumentations, and Scikit-learn.
Next, it covers how to evaluate model performance using Scikit-learn and NumPy, and finally, it introduces model explainability using Grad-CAM.

The goal of this notebook is not to teach the underlying algorithms and procedures used in this field, but rather to give the user an idea of what can be done with these Python libraries.

## Classes

The following three classes keep the code modular and readable:

- **ImageDataset** is used to load images along with their labels and to perform image augmentation.
- **ImageClassifier** is responsible for building the image classification model, which in this case is based on Convolutional Neural Networks (CNNs).
- **Trainer** handles the training and evaluation processes using batches of data.

By organizing the code in this way, we simplify debugging and future extensions.

The classes below are incomplete. Use the following code to complete them:

```ImageDataset```: 

In ```__init__```, initialise the following attributes:
```python
self.images = images # Input images
self.labels = labels # Output classes
self.transform = transform # Transformations applied to the data when calling them
```

Complete function ```__len__``` - this method tells PyTorch (e.g. the `DataLoader`) how many samples the dataset contains:
```python
return len(self.images)
```

Complete function ```__getitem__``` - this method returns the sample (image and label) at index `idx`, after applying any transformations:
```python
image = self.images[idx]
label = self.labels[idx]

# Convert the image from (C, H, W) to (H, W, C), the layout expected by Albumentations (the library used for image augmentation)
image = np.transpose(image, (1, 2, 0))

# Apply transformations on the images
if self.transform:
    augmented = self.transform(image=image)
    image = augmented['image']

return image, label
```
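
To see what the `np.transpose` call in `__getitem__` does, here is a minimal NumPy sketch on a randomly generated array standing in for one CIFAR-10 image (the array and its values are illustrative, not taken from the dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy image stored channels-first, (C, H, W) = (3, 32, 32), like the CIFAR-10 arrays in this notebook
chw_image = rng.integers(0, 256, size=(3, 32, 32), dtype=np.uint8)

# Move the channel axis last: (C, H, W) -> (H, W, C), the layout Albumentations and Matplotlib expect
hwc_image = np.transpose(chw_image, (1, 2, 0))

print(chw_image.shape, "->", hwc_image.shape)  # (3, 32, 32) -> (32, 32, 3)

# Transposing only reorders the axes: the green value at row 5, column 7 is unchanged
assert hwc_image[5, 7, 1] == chw_image[1, 5, 7]
```

No pixel values are modified; only the order in which the axes are indexed changes.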

In [None]:
from torch.utils.data import Dataset
import numpy as np

class ImageDataset(Dataset):
    def __init__(self, images, labels, transform=None):
        pass
 
    def __len__(self):
        return
 
    def __getitem__(self, idx):
        return

```ImageClassifier```: 

`__init__` function:

The first thing to do is to build the `__init__` function, which defines the layers needed to build the neural network.
Let's start by defining the number of feature maps in the first convolutional layer (the value is chosen empirically):

```python
self.feature_maps = 64
```

To help a computer understand and classify images, we build a model made up of layers, kind of like stacking Lego blocks.
Each block does a specific task — detecting patterns, reducing size, or making decisions. Here's what each component does:

```python
self.conv1 = nn.Conv2d(in_channels, self.feature_maps, kernel_size = 3)
```

This layer scans the image for small patterns (like edges or colors).
`in_channels` is the number of input image channels (e.g. 3 for RGB images).
`self.feature_maps` is how many different patterns we want the model to learn at this layer.
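`kernel_size = 3` means the scanning window is 3x3 pixels (the value is chosen empirically). Since no padding is used, each 3x3 convolution shrinks the height and width of its input by 2 pixels.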

```python
self.pool1 = nn.MaxPool2d(kernel_size = 2)
```
This layer downsamples the feature maps by keeping only the maximum value in each 2x2 window, which halves their height and width (rounding down).
It helps the model focus on the strongest responses and reduces computation.
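
A minimal NumPy sketch of what 2x2 max pooling with stride 2 computes on a single 4x4 feature map. The helper `max_pool_2x2` is hypothetical and only mimics `nn.MaxPool2d(kernel_size = 2)` for one channel:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D array (an odd trailing row/column is dropped)."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[: h2 * 2, : w2 * 2]
    # Group pixels into non-overlapping 2x2 windows and keep the maximum of each window
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

x = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 1],
])
print(max_pool_2x2(x))
# [[4 5]
#  [3 7]]
```

Dropping the odd trailing row/column matches PyTorch's default floor rounding, which is why a 13x13 map becomes 6x6 rather than 7x7.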

```python
self.bn1 = nn.BatchNorm2d(self.feature_maps)
```

This layer normalizes its inputs using batch statistics, which usually makes training faster and more stable.
Together, the layers above form what is usually called a convolutional block.
After defining the first convolutional block, let's define the second one:

```python
self.conv2 = nn.Conv2d(self.feature_maps, self.feature_maps * 2, kernel_size = 3)
self.pool2 = nn.MaxPool2d(kernel_size = 2)
self.bn2 = nn.BatchNorm2d(self.feature_maps * 2)
```
The second block is very similar to the first one, but it looks for more complex patterns by doubling the number of feature maps (i.e. learning more features).
Having defined the second convolutional block, let's define the third and last one:

```python
self.conv3 = nn.Conv2d(self.feature_maps * 2, self.feature_maps * 4, kernel_size = 3)
self.pool3 = nn.MaxPool2d(kernel_size = 2)
self.bn3 = nn.BatchNorm2d(self.feature_maps * 4)
```

This block explores even higher-level patterns, such as shapes or textures.
The deeper the layer, the more abstract the patterns it can capture.
Then, we define the activation function used inside each of these blocks:

```python
self.relu = nn.ReLU()
```

Inside each block, the activation acts like a switch that keeps only the useful (positive) responses.
ReLU (Rectified Linear Unit) sets negative values to zero, adding the non-linearity that lets the network learn more complex patterns.
Next, we define the layer that reshapes the stack of 2D feature maps into a 1D vector per image (like stretching a grid of pixels into a line):

```python
self.flatten = nn.Flatten(start_dim=1)
```

Now, we define the dropout layer:

```python
self.dropout = nn.Dropout(p = 0.3)
```
This layer randomly zeroes a predefined fraction of the activations (`p = 0.3`, i.e. 30%) during training to reduce overfitting, so the model does not memorize the training data too closely.
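
As a rough illustration, the sketch below reimplements "inverted" dropout in NumPy, which is the scheme `nn.Dropout` uses: during training, the surviving values are scaled by 1 / (1 - p) so the expected activation is unchanged, and in evaluation mode the layer does nothing. The function `inverted_dropout` is a hypothetical helper, not part of PyTorch:

```python
import numpy as np

def inverted_dropout(x, p, rng):
    """Zero each element with probability p and rescale the survivors by 1 / (1 - p)."""
    keep_mask = rng.random(x.shape) >= p
    return x * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
activations = np.ones(100_000)
dropped = inverted_dropout(activations, p=0.3, rng=rng)

print(f"fraction zeroed: {np.mean(dropped == 0):.3f}")  # close to 0.3
print(f"mean activation: {dropped.mean():.3f}")         # close to 1.0, the mean before dropout
```

This rescaling is why no extra correction is needed when `self.model.eval()` switches dropout off for validation and testing.
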
Finally, we define the classifier:

```python
self.out_classes = out_classes
self.fc = nn.Linear(1024, self.out_classes)
```

This final layer is like the decision-maker.
It takes all the features the model has learned and decides which class (e.g. cat, dog, airplane) the input image belongs to.

1024 is the number of features coming into this layer. It depends on the input image size and on the previous layers: for 32x32 input images, the last convolutional block outputs 256 feature maps of 2x2 pixels. `out_classes` is the number of classes we want to predict.
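
The value 1024 can be checked with a little arithmetic. The sketch below uses two hypothetical helpers, `conv_out` and `pool_out`, which apply the output-size formulas from the PyTorch documentation for `nn.Conv2d` and `nn.MaxPool2d` (no padding, dilation 1, default pooling stride), and tracks the feature-map size through the three blocks for a 32x32 input. ReLU and batch normalization do not change the shape:

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of nn.Conv2d with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

def pool_out(size, kernel_size=2):
    """Spatial output size of nn.MaxPool2d with stride = kernel_size and floor rounding."""
    return (size - kernel_size) // kernel_size + 1

feature_maps = 64
size = 32  # CIFAR-10 images are 32x32

for block, channels in enumerate([feature_maps, feature_maps * 2, feature_maps * 4], start=1):
    size = conv_out(size, kernel_size=3)
    size = pool_out(size, kernel_size=2)
    print(f"after block {block}: {channels} x {size} x {size}")
# after block 1: 64 x 15 x 15
# after block 2: 128 x 6 x 6
# after block 3: 256 x 2 x 2

flattened = channels * size * size
print("flattened features:", flattened)  # flattened features: 1024
```

If `TARGET_SIZE`, the kernel sizes, or the padding change, this number changes too and the first argument of `nn.Linear` must be updated. Alternatively, `nn.LazyLinear(out_classes)` infers its input size on the first forward pass.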

`forward` function:

After defining the function `__init__`, we need to define the function `forward`.
It combines the layers defined in `__init__` into the neural network model.
In other words, it describes how an input image flows through the network, one layer at a time, to become a prediction.

```python
# Convolutional block 1
x = self.conv1(x)
x = self.pool1(x)
x = self.relu(x)
x = self.bn1(x)

# Convolutional block 2
x = self.conv2(x)
x = self.pool2(x)
x = self.relu(x)
x = self.bn2(x)

# Convolutional block 3
x = self.conv3(x)
x = self.pool3(x)
x = self.relu(x)
x = self.bn3(x)

# Classifier
x = self.flatten(x)
x = self.dropout(x)
x = self.fc(x)
return x
```

In [None]:
import torch.nn as nn

class ImageClassifier(nn.Module):
    def __init__(self, in_channels = 1, out_classes = 1):
        super(ImageClassifier, self).__init__()
    
    def forward(self, x):
        return

```Trainer```: 

`__init__` function:

This function is used to initialise variables used in the other functions of the class.
Start by initialising the following attributes:

```python
self.model = model
self.train_losses = []
self.val_losses = []
self.best_model_weights = None
```

`self.model` – this is the neural network we're training.
`self.train_losses` and `self.val_losses` – these lists keep track of how well the model is doing on the training and validation sets over time (used to plot learning curves).

`self.best_model_weights` – this will store a copy of the model weights from the epoch with the lowest validation loss, so they can be restored after training (useful together with early stopping).

`fit` function:

This function goes through the data multiple times (epochs) to optimize the model’s performance.
It also applies early stopping, which ends training once the validation loss has not improved for `early_stopping_limit` consecutive epochs.
Let's start by initialising the following variables:

```python
early_stopping_count = 0
best_val_loss = float("inf")
best_epoch = 0
```

`early_stopping_count` tracks the number of consecutive epochs without an improvement in validation loss (used for early stopping).
`best_val_loss` tracks the lowest validation loss seen so far. It starts at infinity so that the first epoch always counts as an improvement, whatever the scale of the loss.
`best_epoch` tracks the epoch that achieved the best validation loss.
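
To make the bookkeeping concrete, the following sketch replays the same early-stopping rule on a plain list of validation losses, without any model. The function `early_stopping_epoch` and the loss values are made up for illustration:

```python
def early_stopping_epoch(val_losses, patience):
    """Replay the early-stopping rule of `Trainer.fit` on a list of validation losses.

    Returns (best_epoch, last_epoch_run). A patience <= 0 disables early stopping,
    mirroring `early_stopping_limit = 0` in `fit`.
    """
    best_val_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, val_loss in enumerate(val_losses):
        epochs_without_improvement += 1
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_epoch = epoch
            epochs_without_improvement = 0
        if patience > 0 and epochs_without_improvement == patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Hypothetical validation-loss curve: improves until epoch 3, then the model starts overfitting
losses = [1.9, 1.5, 1.3, 1.2, 1.25, 1.3, 1.4, 1.5, 1.6]
print(early_stopping_epoch(losses, patience=3))  # (3, 6)
print(early_stopping_epoch(losses, patience=0))  # (3, 8)
```

In both cases the best weights come from epoch 3; early stopping only saves the epochs after the model has stopped improving.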

Then, we initialise the file that is going to store the training statistics:

```python
# Training log file
log_filename = "training_log.txt"
with open(log_filename, "w") as log_file:
    log_file.write("Epoch,Train Loss,Val Loss,Best Val Loss,Best Epoch\n")
```

Now comes the training phase.
It includes a main loop that runs for the predefined number of epochs (or until early stopping ends it), and two inner loops: one that optimises the model's weights on the training set and another that evaluates the model on the validation set after each epoch.

```python
for epoch in range(epochs):
    # Set the model to training mode. This is important because some layers (e.g. dropout and batch normalization) behave differently during training than during evaluation.
    self.model.train()
    
    # Loop over the training set
    train_loss = 0.0
    train_batches_count = 0
    for i, data in enumerate(train_dataloader):
        # Get the data and send it to the training device
        inputs, labels = data
        inputs = inputs.to(device)
        labels = labels.long().to(device)
        
        # Clear old gradients
        optimizer.zero_grad()

        # Perform the forward step to get the predictions for the inputs
        outputs = self.model(inputs)

        # Compute the loss of the predictions
        loss = criterion(outputs, labels)

        # Perform the backward step which is responsible for computing the gradients
        loss.backward()
        
        # Update the model weights using the new gradients
        optimizer.step()

        # Accumulate the (batch-averaged) loss and count the batches
        train_loss += loss.item()
        train_batches_count += 1
        
    # Set the model to evaluation mode
    self.model.eval()
    
    # Loop over the validation set. Here we only evaluate the model, so there is no
    # weight optimisation and gradient tracking can be disabled with torch.no_grad().
    val_loss = 0.0
    val_batches_count = 0
    with torch.no_grad():
        for i, data in enumerate(val_dataloader):
            inputs, labels = data
            inputs = inputs.to(device)
            labels = labels.long().to(device)
            
            outputs = self.model(inputs)
            loss = criterion(outputs, labels)
            
            val_loss += loss.item()
            val_batches_count += 1
    
    # Divide the total train and validation losses by the number of batches to get the average loss per batch.
    train_loss /= train_batches_count
    val_loss /= val_batches_count
    
    # Average training and validation losses for the epoch are stored.
    self.train_losses.append(train_loss)
    self.val_losses.append(val_loss)
    
    # Increase early stopping count
    early_stopping_count += 1
    
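    # If the new validation loss is better than the best seen so far,
    # save the current epoch index, the new validation loss and a copy of the
    # current model weights, and reset the early stopping counter.
    if val_loss < best_val_loss:
        best_epoch = epoch
        best_val_loss = val_loss
        early_stopping_count = 0
        # state_dict() returns references to the live parameters, so clone them;
        # otherwise the "best" weights would keep changing as training continues.
        self.best_model_weights = {k: v.detach().clone() for k, v in self.model.state_dict().items()}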
    
    print(f'Epoch: {epoch}, Loss: {train_loss}, Val Loss: {val_loss}. The best val loss is {best_val_loss} in epoch {best_epoch}.')
    
    # Append the current epoch statistics to the training log file
    with open(log_filename, "a") as log_file:
        log_file.write(f"{epoch},{train_loss},{val_loss},{best_val_loss},{best_epoch}\n")
    
    # If the number of consecutive epochs without improvement in validation loss
    # reaches the predefined limit, stop training early to avoid overfitting.
    if early_stopping_count == early_stopping_limit and early_stopping_limit > 0:
        break
```
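
The loop above averages the per-batch losses. Assuming `criterion` uses the default `reduction="mean"` (as `nn.CrossEntropyLoss` does), each `loss.item()` is already a mean over its batch, so a smaller final batch gets the same weight as a full one. The hypothetical numbers below show how this can differ from a true per-sample average:

```python
# Hypothetical per-batch mean losses from one epoch: two full batches of 64 and a final partial batch of 16
batch_losses = [1.0, 1.0, 2.0]
batch_sizes = [64, 64, 16]

# What the training loop computes: the mean of the per-batch means
per_batch_average = sum(batch_losses) / len(batch_losses)

# True per-sample average: weight each batch mean by its batch size
per_sample_average = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

print(round(per_batch_average, 4))   # 1.3333
print(round(per_sample_average, 4))  # 1.1111
```

The difference is usually small when there are many batches per epoch. If an exact per-sample average is preferred, one option is to accumulate `loss.item() * inputs.size(0)` and divide by the total number of samples.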

`predict` function:

Once training is done, this method is used to predict labels for new data.
Because training continues for several epochs after the best validation loss is reached (until early stopping triggers or the last epoch ends), the final model weights are not necessarily the best ones. Therefore, load the best-performing weights first.

```python
# Load best weights
if self.best_model_weights is not None:
    self.model.load_state_dict(self.best_model_weights)
```

Set the model to evaluation mode:
```python
# Test mode
self.model.eval()
```

Loop through the test set to get the model predictions.
Besides the predictions, the original images and the true labels are also stored for later use.

```python
original_images = []
true_labels = []
predicted_labels = []

# Gradients are not needed for inference, so disable gradient tracking
with torch.no_grad():
    for data in test_dataloader:
        # Load data and send it to device
        images, labels = data
        images = images.to(device)

        # Get model predictions
        outputs = self.model(images)

        # The model outputs a vector of scores (one per class) for each image; take
        # the index of the maximum score, which corresponds to the predicted class.
        _, predicted = torch.max(outputs, 1)

        # .cpu() moves the data to the CPU and .numpy() converts it to a NumPy array
        images = images.cpu().numpy()
        labels = labels.numpy()
        predicted = predicted.cpu().numpy()

        original_images.append(images)
        true_labels.append(labels)
        predicted_labels.append(predicted)

# Convert the list of NumPy arrays into only one NumPy array
original_images = np.concatenate(original_images)
true_labels = np.concatenate(true_labels)
predicted_labels = np.concatenate(predicted_labels)

return original_images, true_labels, predicted_labels
```
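
`torch.max(outputs, 1)` returns both the maximum values and their indices along dimension 1 (the class dimension); only the indices are kept. The NumPy sketch below does the same with `np.argmax` on a made-up batch of scores, and checks that applying softmax first would not change the prediction:

```python
import numpy as np

# Hypothetical raw scores (logits) for a batch of 3 images and 4 classes
outputs = np.array([
    [ 2.1, -0.3, 0.5, 1.0],
    [-1.2,  0.4, 3.3, 0.0],
    [ 0.2,  0.9, 0.1, 0.8],
])

# Index of the highest score in each row, i.e. along the class axis (axis 1)
predicted = np.argmax(outputs, axis=1)
print(predicted)  # [0 2 1]

# Softmax turns scores into probabilities but preserves their order, so the argmax is unchanged
probs = np.exp(outputs - outputs.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
assert (np.argmax(probs, axis=1) == predicted).all()
```

This is why the classifier can output raw scores: probabilities are only needed when the confidence of a prediction matters, not to pick the class.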

In [None]:
import numpy as np
import torch

class Trainer():
    def __init__(self, model):
        pass
        
    def fit(self, epochs, train_dataloader, val_dataloader, optimizer, criterion, device, early_stopping_limit = 0):
        return
    
    def predict(self, test_dataloader, device):
        return


## Functions

The following three functions are used throughout the notebook.
They cover loading binary files with Pickle (**load_pickle_file**), plotting a single image (**plot_image**), and plotting multiple images side by side (**plot_multiple_images**).

In [None]:
from matplotlib import pyplot as plt
import pickle

def load_pickle_file(filepath):
    with open(filepath, "rb") as f:
        return pickle.load(f)

def plot_image(img, figsize = (2,3)):
    plt.figure(figsize = figsize)
    plt.imshow(img)
    plt.axis("off")
    
def plot_multiple_images(*images_titles, figsize = (2, 3)):
    num_images = len(images_titles)
    # squeeze=False always returns a 2D array of axes, even when only one image is passed
    fig, axs = plt.subplots(1, num_images, figsize = figsize, squeeze = False)
    for i, (img, title) in enumerate(images_titles):
        axs[0, i].imshow(img)
        axs[0, i].set_title(title)
        axs[0, i].axis("off")

## Dataset

In this section, we load the CIFAR-10 dataset, which consists of 60,000 32x32 color images across 10 different classes, with 6,000 images per class.
The dataset is divided into 50,000 training images and 10,000 test images. It has already been preprocessed and is ready to use once the binary files *train_set.pkl* and *test_set.pkl* are loaded.

### Load data

The training and test sets are loaded using the Pickle library. If you do not have the dataset yet, download it from this [link](https://www.dropbox.com/scl/fo/p7gfb0kpgkbrrjup340pi/AAkX2u1g-W7290-Aq7gHHvo?rlkey=vdxaj6npfy09ywh17nl8f9v6e&st=8hfq9z20&dl=0).
Place the files inside the `data/CIFAR10` folder, which is where the next cell looks for them.

In [None]:
import os

# Set file paths
dataset_folder = os.path.join("data", "CIFAR10")
train_set_file = os.path.join(dataset_folder, "train_set.pkl")
test_set_file = os.path.join(dataset_folder, "test_set.pkl")

# Load sets
train_set = load_pickle_file(train_set_file)
test_set = load_pickle_file(test_set_file)

# CIFAR10 classes
CIFAR_10_CLASSES = [
    "Airplane", "Automobile", "Bird", "Cat", "Deer",
    "Dog", "Frog", "Horse","Ship","Truck"
]

## Explore image processing

Image processing is fundamental to computer vision, forming the basis for interpreting and analyzing visual information.
By applying techniques such as resizing, filtering, color adjustments, and data augmentation, image processing enhances input quality, minimizes noise, and corrects distortions.
These methods can also simulate real-world variability, helping models generalize better. 

In this notebook, we explore three categories of image transformations: **geometric transformations**, **image filtering**, and **photometric transformations**.
The following cells contain a series of exercises designed to help you explore the OpenCV-Python library.

If you are unfamiliar with a particular method, refer to the [Image Processing in OpenCV](https://docs.opencv.org/4.x/d2/d96/tutorial_py_table_of_contents_imgproc.html) documentation.
There you can find the description of the functions needed for [Geometric transformations](https://docs.opencv.org/4.x/da/d6e/tutorial_py_geometric_transformations.html) and [image filtering](https://docs.opencv.org/4.x/d4/d13/tutorial_py_filtering.html).
The OpenCV documentation does not have a dedicated page for photometric transformations.
To adjust brightness and contrast, you can read [Changing the contrast and brightness of an image!](https://docs.opencv.org/4.x/d3/dc1/tutorial_basic_linear_transform.html).
To adjust saturation, first convert the image to the HSV color space using [`cv2.cvtColor`](https://docs.opencv.org/4.x/d8/d01/group__imgproc__color__conversions.html#gaf86c09fe702ed037c03c2bc603ceab14).
Then, split the image into Hue, Saturation, and Value channels with [`cv2.split`](https://docs.opencv.org/4.x/df/df2/group__core__hal__interface__split.html).
Modify the Saturation channel as needed, merge the channels back together using [`cv2.merge`](https://docs.opencv.org/4.x/df/d2e/group__core__hal__interface__merge.html), and finally convert the image back to the RGB color space.

### Example image

In [None]:
import numpy as np

# Select image
image = train_set[0][9]

# Convert image from (C, H, W) to (H, W, C)
image = np.transpose(image, (1,2,0))

# Plot image
plot_image(image)

### Geometric transformation

Geometric transformations alter the spatial structure of the image while preserving its semantic content.
They help the model become invariant to different orientations and scales:

- **Scaling**: Resizes the image to a specific size, often required to match input dimensions for image classifiers.
  It uses interpolation to compute the new pixel values.
- **Cropping**: Extracts a subregion of the image; useful for focusing on important parts or adding variability.
- **Horizontal and vertical flip**: Mirrors the image left-to-right (horizontal flip) or top-to-bottom (vertical flip); helps the model become invariant to mirrored views.
- **Rotation**: Rotates the image by a small angle to simulate different orientations of the objects.
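
One common pitfall when implementing scaling with OpenCV: NumPy image shapes are (height, width, channels), whereas the `dsize` argument of `cv2.resize` is ordered (width, height). The hypothetical helper below computes that pair for a given scale factor; it is plain Python and does not call OpenCV:

```python
def scaled_dsize(image_shape, scale_factor):
    """Return the (width, height) pair that cv2.resize expects as `dsize`.

    NumPy image shapes are (height, width, channels), but `dsize` is (width, height).
    """
    height, width = image_shape[:2]
    return (int(round(width * scale_factor)), int(round(height * scale_factor)))

# A non-square image makes the axis order visible: 20 rows x 40 columns
print(scaled_dsize((20, 40, 3), 0.5))  # (20, 10)
```

For square CIFAR-10 images the mix-up goes unnoticed, which is exactly why it is worth checking on a non-square shape. When shrinking, the OpenCV documentation suggests `cv2.INTER_AREA` interpolation; the default is `cv2.INTER_LINEAR`.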

#### Scaling

In [None]:
%reload_ext tutorial.tests.testsuite

In [None]:
%%ipytest

import cv2
def solution_scale_image(img, scale_factor: float):
    # Start your code here
    return
    # End your code here

In [None]:
# Scale image by half
scaled_image = solution_scale_image(image, 0.5)

if scaled_image is not None:
    # Use this function to plot images side by side
    plot_multiple_images((image, "Original"), (scaled_image, "Scaled"), figsize = (4, 5))

#### Cropping

In [None]:
%reload_ext tutorial.tests.testsuite

In [None]:
%%ipytest

import cv2
def solution_crop_image(img, x: int, y: int, width: int, height: int):
    # Start your code here
    return
    # End your code here

In [None]:
# Crop a 15-by-15 region whose top-left corner is at (x, y) = (2, 2)
cropped_image = solution_crop_image(image, 2, 2, 15, 15)

if cropped_image is not None:
    # Use this function to plot images side by side
    plot_multiple_images((image, "Original"), (cropped_image, "Cropped"), figsize = (4, 5))

#### Horizontal Flip

In [None]:
%reload_ext tutorial.tests.testsuite

In [None]:
%%ipytest

import cv2
def solution_horizontal_flip_image(img):
    # Start your code here
    return
    # End your code here

In [None]:
# Flip image horizontally
flip_image_horizontal = solution_horizontal_flip_image(image)

if flip_image_horizontal is not None:
    # Use this function to plot images side by side
    plot_multiple_images((image, "Original"), (flip_image_horizontal, "Horizontal Flip"), figsize = (4, 5))

#### Vertical Flip

In [None]:
%reload_ext tutorial.tests.testsuite

In [None]:
%%ipytest

import cv2
def solution_vertical_flip_image(img):
    # Start your code here
    return
    # End your code here

In [None]:
# Flip image vertically
flip_image_vertical = solution_vertical_flip_image(image)

if flip_image_vertical is not None:
    # Use this function to plot images side by side
    plot_multiple_images((image, "Original"), (flip_image_vertical, "Vertical Flip"), figsize = (4, 5))

#### Rotation

In [None]:
%reload_ext tutorial.tests.testsuite

In [None]:
%%ipytest

import cv2
def solution_rotate_image(img, angle: float):
    # Start your code here
    return
    # End your code here

In [None]:
# Rotate image by 20 degrees
rotated_image = solution_rotate_image(image, 20)

if rotated_image is not None:
    # Use this function to plot images side by side
    plot_multiple_images((image, "Original"), (rotated_image, "Rotated"), figsize = (4, 5))

### Image filtering

Filtering helps reduce noise and enhance specific image features.
These are often used as a form of preprocessing before feeding images into a model:

- **Average filter**: Applies a smoothing effect by replacing each pixel with the average of its neighborhood.
- **Median filter**: Reduces salt-and-pepper noise by replacing each pixel with the median of neighboring pixels.
- **Gaussian filter**: Applies a Gaussian blur to smooth the image, often used to reduce high-frequency noise.
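
To see why the median filter is the usual choice for salt-and-pepper noise, the sketch below applies a 3-sample mean and a 3-sample median to a made-up 1D row of pixels containing one white and one black outlier. It is a 1D NumPy illustration (the helper `sliding_filter` is hypothetical), not the 2D OpenCV filters used in the exercises:

```python
import numpy as np

# A flat row of intensity 100 with one "salt" (255) and one "pepper" (0) pixel
signal = np.array([100, 100, 255, 100, 100, 100, 0, 100, 100], dtype=float)

def sliding_filter(x, size, reducer):
    """Apply `reducer` over a sliding window of `size` samples (edges padded by replication)."""
    pad = size // 2
    padded = np.pad(x, pad, mode="edge")
    return np.array([reducer(padded[i:i + size]) for i in range(len(x))])

mean_filtered = sliding_filter(signal, 3, np.mean)
median_filtered = sliding_filter(signal, 3, np.median)

# The mean spreads each outlier over three pixels (about 151.7 and 66.7)
print(np.round(mean_filtered, 1))
# The median discards both outliers: every value is back to 100
print(median_filtered)
```

An average or Gaussian filter blurs the noise into its neighbours, whereas the median simply ignores isolated extreme values.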

#### Average filter

In [None]:
%reload_ext tutorial.tests.testsuite

In [None]:
%%ipytest

import cv2
def solution_average_filter(img, kernel_size = (5, 5)):
    # Start your code here
    return
    # End your code here

In [None]:
# Filter image using average filter
average_filter_image = solution_average_filter(image, (3, 3))

if average_filter_image is not None:
    # Use this function to plot images side by side
    plot_multiple_images((image, "Original"), (average_filter_image, "Average filter"), figsize = (4, 5))

#### Median filter

In [None]:
%reload_ext tutorial.tests.testsuite

In [None]:
%%ipytest

import cv2
def solution_median_filter(img, ksize):
    # Start your code here
    return
    # End your code here

In [None]:
# Filter image using median filter
median_filter_image = solution_median_filter(image, 3)

if median_filter_image is not None:
    # Use this function to plot images side by side
    plot_multiple_images((image, "Original"), (median_filter_image, "Median filter"), figsize = (4, 5))

#### Gaussian filter
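
When `sigma` is 0 (as in the example call below), OpenCV does not use a zero-width Gaussian. According to the `cv2.getGaussianKernel` documentation, a non-positive sigma is derived from the kernel size, which must be positive and odd. A small sketch of that rule (the function name is hypothetical):

```python
def opencv_default_sigma(ksize):
    """Sigma that OpenCV derives from an odd kernel size when sigma <= 0."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8

for ksize in (3, 5, 7):
    print(ksize, round(opencv_default_sigma(ksize), 2))
# 3 0.8
# 5 1.1
# 7 1.4
```

So a (7, 7) kernel with sigma 0 blurs more than a (3, 3) kernel with sigma 0, because both the window and the derived sigma grow.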

In [None]:
%reload_ext tutorial.tests.testsuite

In [None]:
%%ipytest

import cv2
def solution_gaussian_filter(img, kernel_size = (5, 5), sigma = 0):
    # Start your code here
    return
    # End your code here

In [None]:
# Filter image using Gaussian filter
gaussian_filter_image = solution_gaussian_filter(image, (7, 7), 0)

if gaussian_filter_image is not None:
    # Use this function to plot images side by side
    plot_multiple_images((image, "Original"), (gaussian_filter_image, "Gaussian filter"), figsize = (4, 5))

### Photometric transformation

Photometric transformations modify the color properties of an image to simulate different lighting conditions and improve model robustness to brightness and contrast changes:

- **Brightness**: Increases or decreases the overall brightness of the image.
- **Contrast**: Alters the difference between light and dark regions in the image.
- **Saturation**: Modifies the intensity of the colors in the image.
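
A detail worth keeping in mind when adjusting brightness: images are usually stored as `uint8`, and plain NumPy arithmetic on `uint8` wraps around modulo 256, whereas OpenCV arithmetic (e.g. `cv2.add`, `cv2.convertScaleAbs`) saturates at 0 and 255. The made-up pixel values below show the difference using NumPy only:

```python
import numpy as np

pixels = np.array([10, 120, 200, 250], dtype=np.uint8)

# Plain NumPy addition on uint8 wraps around modulo 256: 200 + 100 becomes 44
wrapped = pixels + np.uint8(100)

# Saturating version: compute in a wider type, clip to [0, 255], convert back
saturated = np.clip(pixels.astype(np.int16) + 100, 0, 255).astype(np.uint8)

print(wrapped)    # [110 220  44  94]
print(saturated)  # [110 220 255 255]
```

Contrast is usually adjusted with the same idea, g(x) = alpha * f(x) + beta (the form used in the OpenCV tutorial linked above), followed by the same clipping to the valid pixel range.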

#### Adjust brightness

In [None]:
%reload_ext tutorial.tests.testsuite

In [None]:
%%ipytest

import cv2
def solution_adjust_brightness(img, brightness_value):
    # Start your code here
    return
    # End your code here

In [None]:
# Brighter image (positive brightness value)
brighter_image = solution_adjust_brightness(image, 100)

# Darker image (negative brightness value)
darker_image = solution_adjust_brightness(image, -100)

if brighter_image is not None and darker_image is not None:
    # Use this function to plot images side by side
    plot_multiple_images((image, "Original"), (brighter_image, "Brighter image"), (darker_image, "Darker image"), figsize = (7, 8))

#### Adjust contrast

In [None]:
%reload_ext tutorial.tests.testsuite

In [None]:
%%ipytest

import cv2
def solution_adjust_contrast(img, contrast_value):
    # Start your code here
    return
    # End your code here

In [None]:
# Increase contrast (Value > 1.0)
high_contrast_image = solution_adjust_contrast(image, 2.0)

# Reduce contrast (Value < 1.0)
low_contrast_image = solution_adjust_contrast(image, 0.5)

if high_contrast_image is not None and low_contrast_image is not None:
    # Use this function to plot images side by side
    plot_multiple_images((image, "Original"), (high_contrast_image, "High contrast image"), (low_contrast_image, "Low contrast image"), figsize = (7, 8))

#### Adjust saturation

In [None]:
%reload_ext tutorial.tests.testsuite

In [None]:
%%ipytest

import cv2
def solution_adjust_saturation(img, saturation_factor):
    # Start your code here
    return
    # End your code here

In [None]:
# Decrease saturation
low_saturation_image = solution_adjust_saturation(image, 0.2)

# Increase saturation
high_saturation_image = solution_adjust_saturation(image, 2.5)

if low_saturation_image is not None and high_saturation_image is not None:
    # Use this function to plot images side by side
    plot_multiple_images((image, "Original"), (low_saturation_image, "Low saturation image"), (high_saturation_image, "High saturation image"), figsize = (7, 8))

## Image classifier development using CNNs

Image classification is the task of assigning a label or category to an input image from a predefined set of classes.
It is a fundamental problem in computer vision with widespread applications, including facial recognition, medical imaging, quality control, and autonomous driving. 
This section outlines the key steps involved in developing an image classification model using PyTorch:

- It begins with data preprocessing, which includes splitting the dataset into training, validation, and test sets. 
- Afterwards, it defines data augmentation strategies using the Albumentations library, loads the data as PyTorch datasets, and initialises PyTorch dataloaders to efficiently feed data during training. 
- The next step is model training, where a CNN-based model is initialized and optimised using the training and validation data. 
- After training, the model is evaluated on the test set to assess its performance.
  The evaluation includes metrics such as accuracy and the confusion matrix, which help interpret the model's predictive behavior. 
- Finally, the PyTorch Grad-CAM library is used to visualize the regions of input images that contribute most to the model’s decisions, providing insights into model explainability using representative examples.

### Dataset preprocessing

#### Train, validation, and test sets

```train_test_split``` from Scikit-learn can be used to split the original training set into training and validation sets.
The test set is already defined by the dataset's authors.
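
The split below is purely random. Since the CIFAR-10 training set has 5,000 images per class, a random 70/30 split is already close to balanced; passing `stratify` to `train_test_split` would make the class proportions identical in both sets. A sketch on dummy labels (this option is not used in the notebook):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the CIFAR-10 training labels: 10 classes, 5,000 images each
y = np.repeat(np.arange(10), 5000)
X = np.zeros((len(y), 1))  # dummy features; only the labels matter here

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

print(len(y_tr), len(y_va))  # 35000 15000
print(np.bincount(y_va))     # ten classes, 1500 validation images each
```

With `test_size = 0.3`, 35,000 images remain for training and 15,000 go to validation.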

In [None]:
from sklearn.model_selection import train_test_split

# Train and validation sets
X_train, y_train = train_set[0], train_set[1]
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size = 0.3, random_state = 42)

# Test set
X_test, y_test = test_set[0], test_set[1]

#### Data Augmentation

Data augmentation is a crucial technique in image classification that helps improve the performance and robustness of machine learning models.
It involves generating new training samples by applying random transformations — such as rotation, flipping, cropping, scaling, or color jittering — to the original images. 

Albumentations is one of the most widely used libraries for performing data augmentation in image classification tasks.
It includes augmentation techniques that replicate operations commonly used in image processing, such as:

- ```A.Affine``` for scaling;

- ```A.Rotate``` for rotation;

- ```A.HorizontalFlip``` for horizontal flipping;

- ```A.VerticalFlip``` for vertical flipping;

- ```A.ColorJitter``` for color jittering.

Albumentations can also be used for image normalization (```A.Normalize```), resizing (```A.Resize```), and converting images to PyTorch tensors in (Channel, Height, Width) format with ```A.ToTensorV2```, which is required for model training.
Apply the following random augmentations only to the training set. The validation set should remain as close as possible to the test set, so it only receives the deterministic normalization, resizing, and tensor conversion.

```python
A.Affine(scale = (0.2, 1.5), p = 0.1),
A.Rotate(limit = 45, p = 0.1),
A.HorizontalFlip(p = 0.1),
A.VerticalFlip(p = 0.1),
A.ColorJitter(brightness = (0.5, 1.5), contrast = (0.5, 1.5), saturation = (0.5, 1.5), hue = (0,0), p = 0.1)
```
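Because each transform above fires independently with probability `p = 0.1` (this is how `A.Compose` treats its members), it can help to estimate how often a training image is actually augmented. The short calculation below is a sketch of that arithmetic only; it does not call Albumentations.

```python
from math import prod

# Probability with which each augmentation in the list above is applied
augmentation_probs = {
    "Affine": 0.1,
    "Rotate": 0.1,
    "HorizontalFlip": 0.1,
    "VerticalFlip": 0.1,
    "ColorJitter": 0.1,
}

# Independent transforms: the image is untouched only if every transform is skipped
p_unchanged = prod(1 - p for p in augmentation_probs.values())
p_augmented = 1 - p_unchanged

print(f"P(no augmentation)        = {p_unchanged:.3f}")
print(f"P(at least one transform) = {p_augmented:.3f}")
```

With these settings roughly 41% of the training images receive at least one augmentation per epoch; raising the individual `p` values increases that share.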

In [None]:
import albumentations as A

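# Transformations performed on train set: random augmentations first, while the image
# is still in its original value range (as ColorJitter expects), then normalisation,
# resizing and conversion to a (C, H, W) tensor
TARGET_SIZE = 32
train_transform = A.Compose([
    A.Affine(scale = (0.2, 1.5), p = 0.1),
    A.Rotate(limit = 45, p = 0.1),
    A.HorizontalFlip(p = 0.1),
    A.VerticalFlip(p = 0.1),
    A.ColorJitter(brightness = (0.5, 1.5), contrast = (0.5, 1.5), saturation = (0.5, 1.5), hue = (0, 0), p = 0.1),
    A.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2470, 0.2435, 0.2616)),
    A.Resize(height = TARGET_SIZE, width = TARGET_SIZE),
    A.ToTensorV2()
])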

# Transformations performed on validation and test sets
val_transform = A.Compose([
    A.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2470, 0.2435, 0.2616)),
    A.Resize(height = TARGET_SIZE, width = TARGET_SIZE),
    A.ToTensorV2()
])

#### PyTorch Datasets

The ```ImageDataset``` class extends the PyTorch ```Dataset``` class. It loads the images and their corresponding labels, applies transformations (such as data augmentation), and returns them in a format suitable for model training, validation, and testing.

In [None]:
# Dataset classes necessary for the data loaders
train_dataset = ImageDataset(X_train, y_train, transform = train_transform)
val_dataset = ImageDataset(X_val, y_val, transform = val_transform)
test_dataset = ImageDataset(X_test, y_test, transform = val_transform)
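The notebook's own `ImageDataset` is defined in an earlier section that is not shown here, so the following is only a hypothetical sketch of what a map-style dataset of this kind typically looks like. It assumes images stored as a `(N, H, W, C)` uint8 array (the layout Albumentations works on) and an Albumentations-style transform called as `transform(image=...)` that returns a dict with an `"image"` key. `DataLoader` only needs `__len__` and `__getitem__`; the real class subclasses `torch.utils.data.Dataset`, which this sketch omits to stay dependency-light.

```python
import numpy as np


class ImageDatasetSketch:
    """Hypothetical stand-in for the notebook's ImageDataset (map-style dataset)."""

    def __init__(self, images, labels, transform=None):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]          # (H, W, C) uint8 array
        label = int(self.labels[idx])
        if self.transform is not None:
            # Albumentations pipelines take keyword arguments and return a dict
            image = self.transform(image=image)["image"]
        return image, label


# Hypothetical transform mimicking the Albumentations call convention:
# scale to [0, 1] and move channels first, like A.ToTensorV2 would
def to_chw_float(image):
    return {"image": np.transpose(image.astype(np.float32) / 255.0, (2, 0, 1))}


images = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
labels = np.array([3, 1, 4, 1])
dataset = ImageDatasetSketch(images, labels, transform=to_chw_float)
x, y = dataset[2]
print(len(dataset), x.shape, y)
```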

#### PyTorch Dataloaders

```DataLoader``` is essential for training efficiency and performance.
It abstracts the complexity of batching, shuffling, and parallel data loading, allowing you to focus on building and training your models.
```batch_size``` specifies the number of samples processed together in each training iteration, i.e., per gradient update.
It is typically treated as a hyperparameter, as its optimal value depends on hardware constraints (e.g., GPU memory) and on its interaction with training dynamics.
In particular, it interacts with the learning rate: a common heuristic, the linear scaling rule, increases the learning rate proportionally when the batch size is increased in order to keep convergence stable and efficient.
```shuffle``` controls whether the dataset is randomly permuted at the start of each epoch.
Enabling ```shuffle = True``` for the training set is typically beneficial, as it prevents the model from learning misleading patterns caused by class-wise ordering in the dataset, which could hinder generalization and convergence.

In [None]:
from torch.utils.data import DataLoader

# Data loaders needed for the model training
BATCH_SIZE = 64
train_dataloader = DataLoader(train_dataset, batch_size = BATCH_SIZE, shuffle = True)
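# Evaluation sets are not shuffled: order does not change the metrics, and a fixed
# order makes results reproducible (e.g. the test image picked by index for Grad-CAM)
val_dataloader = DataLoader(val_dataset, batch_size = BATCH_SIZE, shuffle = False)
test_dataloader = DataLoader(test_dataset, batch_size = BATCH_SIZE, shuffle = False)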
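To make the batching arithmetic concrete, here is a small sketch assuming the standard CIFAR-10 training split of 50,000 images (the loading code is not part of this section). It reproduces how `train_test_split(test_size = 0.3)` sizes the split (scikit-learn rounds the test fraction up), how many batches a `DataLoader` yields per epoch with `BATCH_SIZE = 64` (the final, smaller batch is kept unless `drop_last=True`), and what the linear scaling heuristic would suggest for other batch sizes, starting from `LR = 0.001`.

```python
import math

# Assumption: standard CIFAR-10 training split, divided 70/30 as above
n_total = 50_000
n_val = math.ceil(0.3 * n_total)   # scikit-learn rounds the test fraction up
n_train = n_total - n_val

BATCH_SIZE = 64
# DataLoader keeps the final partial batch by default (drop_last=False)
batches_per_epoch = math.ceil(n_train / BATCH_SIZE)
last_batch_size = n_train - (batches_per_epoch - 1) * BATCH_SIZE
print(n_train, n_val, batches_per_epoch, last_batch_size)

# Linear scaling heuristic: learning rate grows in proportion to the batch size
base_lr, base_batch = 0.001, 64
for batch in (32, 64, 128, 256):
    print(batch, base_lr * batch / base_batch)
```

Under these assumptions each epoch performs 547 optimiser steps, the last one on 56 images. The scaled learning rates are starting points only; in practice they still need validation.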

### Model training

Model training comprises a series of steps:

1. First, we must check which devices are available for training the model.
   If a GPU with CUDA cores is available, it should be used, as it greatly speeds up training.
   Otherwise, we use the CPU.
1. Then, the model and training hyperparameters should be defined, such as the number of output classes, the number of training epochs, the number of consecutive non-improving epochs after which training stops (when early stopping regularisation is used), and the learning rate.
   Other hyperparameters can also be defined, depending on what the user wants to do during training.
   In this notebook, we define the number of epochs, i.e., the number of times the model sees the training set.
   Early stopping is a way of mitigating overfitting in which the model is evaluated on a validation set after every epoch.
   If the validation loss does not decrease for a pre-defined number of consecutive epochs, the optimisation stops and the checkpoint with the lowest validation loss is restored (see [Early Stopping](https://paperswithcode.com/method/early-stopping)).
   The learning rate defines how quickly the model's weights change during training.
   If it is too high, the weights change so much at each step that they may overshoot minima, jumping from one side to the other.
   If it is too low, training may converge very slowly or the weights may get stuck in a local minimum.
   Although this is not done in this notebook, this parameter should be tuned in order to choose the best value (see [What is learning rate in machine learning?](https://www.ibm.com/think/topics/learning-rate)).
1. After defining the hyperparameters, we define the loss function used to evaluate the model and send it to the device used for training.
1. Afterwards, the model is defined using the ```ImageClassifier``` class and sent to the training device.
1. Next, we define the optimiser, passing it the parameters of the model, which already reside on the training device.
1. Finally, we train the model (unless optimised weights are already available) and explore the learning curves.

### Model Training Overview

Model training involves a sequence of key steps.
The first step is to check which computational devices are available.
If a GPU with CUDA cores is accessible, it should be used, as it significantly accelerates training (```DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")```).
Otherwise, the model will be trained on the CPU. Next, we define the model and training hyperparameters.
These typically include:

- The number of output classes  (```NUMBER_CLASSES = len(CIFAR_10_CLASSES)```);
- The number of training epochs (i.e., how many times the model sees the full training set)  (```EPOCHS = 500```);
- The patience for early stopping (i.e., how many consecutive epochs without improvement are allowed before stopping training)  (```EARLY_STOPPING_LIMIT = EPOCHS // 10```);
- The learning rate (```LR = 0.001```).

Additional hyperparameters may also be configured depending on the training strategy or specific use case.

In this notebook, we focus on setting the number of training epochs. We also discuss **early stopping**, a regularization technique used to prevent overfitting.
During training, the model's performance is evaluated on a validation set at the end of each epoch.
If the validation loss does not improve for a predefined number of consecutive epochs, training is stopped, and the model reverts to the best-performing checkpoint (see [Early Stopping](https://paperswithcode.com/method/early-stopping)).
The **learning rate** controls how quickly the model updates its weights during training.
If it's too high, the model may overshoot optimal loss values, leading to instability.
If it's too low, the model may converge very slowly or get stuck in a local minimum.
Although learning rate tuning is not performed in this notebook, it is an essential hyperparameter that should be carefully selected (see [What is learning rate in machine learning?](https://www.ibm.com/think/topics/learning-rate)).
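The patience logic behind early stopping can be sketched in a few lines. The class below is hypothetical and is not the notebook's `Trainer` implementation, whose internals are not shown here; `patience` plays the role of `EARLY_STOPPING_LIMIT`. With a real model one would pass `model.state_dict()` as `weights`; the `copy.deepcopy` matters because a state dict holds references to tensors that keep changing as training continues.

```python
import copy


class EarlyStoppingSketch:
    """Hypothetical patience-based early stopping.

    Tracks the lowest validation loss seen so far and a snapshot of the weights
    that produced it; signals a stop once `patience` consecutive epochs pass
    without improvement.
    """

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.best_weights = None
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss, weights):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.best_weights = copy.deepcopy(weights)  # snapshot, not a reference
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Toy validation-loss curve: improves until epoch 3, then starts to overfit
val_losses = [1.00, 0.80, 0.70, 0.65, 0.66, 0.68, 0.70, 0.75]
stopper = EarlyStoppingSketch(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    fake_weights = {"epoch": epoch}  # stands in for model.state_dict()
    if stopper.step(epoch, loss, fake_weights):
        stopped_at = epoch
        break

print(f"Stopped at epoch {stopped_at}; restoring weights from epoch {stopper.best_epoch}")
```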

After setting the hyperparameters, we define the **loss function** used to evaluate model performance (```criterion = nn.CrossEntropyLoss()```).
Both the model and loss function should be moved to the selected training device (```criterion = criterion.to(DEVICE)```).
The model is then instantiated using the `ImageClassifier` class (```model = ImageClassifier(in_channels = 3, out_classes = NUMBER_CLASSES)```) and transferred to the training device (```model = model.to(DEVICE)```).
The **optimizer** is then created from the model's parameters, which already reside on the training device (```optimizer = optim.Adam(model.parameters(), lr = LR)```).

Finally, if no pre-trained weights are available, the training process begins, and we monitor the learning curves to assess the model’s performance over time.

#### Check which device is used for training

In [None]:
import torch

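# Check which device is available for training the model: the first CUDA GPU if present, otherwise the CPU
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("Training device:", DEVICE)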


#### Define training hyperparameters

In [None]:
# Get number of output classes
NUMBER_CLASSES = len(CIFAR_10_CLASSES)

# Set the number of training epochs
EPOCHS = 500

# Set the number of consecutive not improving epochs needed for stopping the training
EARLY_STOPPING_LIMIT = EPOCHS // 10

# Set the learning rate
LR = 0.001

#### Loss function

For a single sample, the cross-entropy loss function is defined by:

$$
\mathcal{L} = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)
$$

Where:

$\mathcal{L}$: Cross-entropy loss

$C$: Total number of classes

$y_i$: Ground truth indicator for class $i$, where $y_i = 1$ if class $i$ is the correct class, otherwise $y_i = 0$

$\hat{y}_i$: Predicted probability for class $i$, typically from the softmax output, where $0 \leq \hat{y}_i \leq 1$ and $\sum_{i=1}^{C} \hat{y}_i = 1$
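A worked example helps connect the formula to what PyTorch computes. With a one-hot $y$, only the true-class term survives, so $\mathcal{L} = -\log(\hat{y}_{\text{target}})$. Note that `nn.CrossEntropyLoss` expects raw, unnormalised scores (logits): it applies log-softmax internally and, by default, averages the per-sample losses over the batch, so the model should not apply a softmax itself. The sketch below uses illustrative logits for a 3-class problem and the numerically stable log-sum-exp form.

```python
import math


def softmax(logits):
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]   # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target):
    """Cross-entropy for one sample: log(sum_j exp(z_j)) - z_target."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


logits = [2.0, 1.0, 0.1]   # illustrative scores for a 3-class problem
target = 0                 # index of the true class

probs = softmax(logits)
loss_direct = -math.log(probs[target])          # formula above, true-class term only
loss_stable = cross_entropy_from_logits(logits, target)
print(probs, loss_direct, loss_stable)
```

A more confident correct prediction (a larger true-class logit) drives the loss towards 0, while a confident wrong prediction makes it grow without bound.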

In [None]:
import torch.nn as nn

# Initialise the Cross Entropy Loss and send it to the training device
criterion = nn.CrossEntropyLoss()
criterion = criterion.to(DEVICE)


#### Initialise model architecture

In [None]:
# Initialise image classifier and send it to the training device
model = ImageClassifier(in_channels = 3, out_classes = NUMBER_CLASSES)
model = model.to(DEVICE)

#### Optimiser function

In this notebook, we use the Adam optimiser (```optimizer = optim.Adam(model.parameters(), lr = LR)```), one of the most widely used optimisers for training deep neural networks (see [Gentle Introduction to the Adam Optimisation Algorithm for Deep Learning](https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/)).

The parameter update at each step is given by:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1) g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t} \\
\hat{v}_t &= \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

Where:

$\theta_t$: Parameters at time step $t$

$g_t$: Gradient of the loss with respect to parameters at step $t$

$m_t$: Exponentially decaying average of past gradients (1st moment)

$v_t$: Exponentially decaying average of past squared gradients (2nd moment)

$\hat{m}_t$, $\hat{v}_t$: Bias-corrected estimates of $m_t$ and $v_t$

$\alpha$: Learning rate

$\beta_1$: Decay rate for the first moment estimate (typically 0.9)

$\beta_2$: Decay rate for the second moment estimate (typically 0.999)

$\epsilon$: Small constant to prevent division by zero (e.g., 1e-8)
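The update equations can be traced on a toy problem. The sketch below is a scalar re-implementation written directly from the equations above (not PyTorch's `optim.Adam` code) and minimises $f(\theta) = (\theta - 3)^2$, whose gradient is $2(\theta - 3)$. The learning rate of 0.1 is chosen for the toy problem only.

```python
import math


def adam_minimise(grad_fn, theta0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Scalar Adam following the update equations above (illustrative sketch)."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):           # t starts at 1 so the bias correction is defined
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g     # 1st moment estimate
        v = beta2 * v + (1 - beta2) * g * g # 2nd moment estimate
        m_hat = m / (1 - beta1 ** t)        # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def grad_f(theta):
    return 2.0 * (theta - 3.0)


theta_one_step = adam_minimise(grad_f, theta0=0.0, steps=1)
theta_star = adam_minimise(grad_f, theta0=0.0)
print(theta_one_step, theta_star)
```

The first step moves $\theta$ by almost exactly $\alpha$, because $\hat{m}_1 / \sqrt{\hat{v}_1} = g_1 / |g_1|$. This illustrates why Adam's step size is governed mainly by the learning rate rather than by the raw gradient magnitude.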

In [None]:
import torch.optim as optim

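# Initialise the Adam optimiser with the model parameters (already on the training device)
optimizer = optim.Adam(model.parameters(), lr = LR)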


#### Train model

Here, we train the model.
First, we initialise the ```Trainer``` class, which we use to train and evaluate the model with the PyTorch ```DataLoader```s defined before (```trainer = Trainer(model)```).
If trained model weights are already available, we can skip training and use them instead.

In [None]:
# Initialise the Trainer instance, which is going to be used to train the image classifier
trainer = Trainer(model)

After initialising the trainer instance, check whether a trained model already exists.
If so, load the weights using ```model_weights = torch.load(model_path, weights_only=True)```.
Then, load them into the model (```model.load_state_dict(model_weights)```).
Finally, set the model to evaluation mode (```model.eval()```).
This step is essential because certain layers, such as batch normalization and dropout, behave differently during training and evaluation.
Setting the model to evaluation mode ensures they operate correctly during validation or testing. 

If no pre-trained model is available, train a new model using the training and validation sets along with the predefined hyperparameters (```trainer.fit(EPOCHS, train_dataloader, val_dataloader, optimizer, criterion, DEVICE, EARLY_STOPPING_LIMIT)```).
After training, save the best model weights (```torch.save(trainer.model.best_model_weights, model_path)```).

In [None]:
import os
import torch

# Model filename
model_path = "cnn_weights.pt"

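if os.path.exists(model_path):
    # map_location lets weights saved on a GPU load on a CPU-only machine
    model_weights = torch.load(model_path, map_location = DEVICE, weights_only = True)
    model.load_state_dict(model_weights)
    model.eval()  # batch norm and dropout switch to inference behaviour
else:
    trainer.fit(EPOCHS, train_dataloader, val_dataloader, optimizer, criterion, DEVICE, EARLY_STOPPING_LIMIT)
    torch.save(trainer.model.best_model_weights, model_path)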

#### Learning curves

After training the model, we can analyse the learning curves to assess the training process.
These curves, which typically display the loss over epochs for both the training and validation sets, are crucial for improving model performance.
They can help identify issues like overfitting or underfitting.
Overfitting occurs when the model performs well on the training data but poorly on the validation data, usually indicated by a widening gap between the two curves.
Underfitting, on the other hand, is suggested when both the training and validation losses remain high and fail to decrease.
By monitoring these curves, we can adjust hyperparameters or modify the model architecture to address such issues.

First, load the log file using ```pandas``` (```training_log = pd.read_csv("training_log.txt")```).
Then, use the ```matplotlib``` library to plot the learning curves.

In [None]:
import pandas as pd
from matplotlib import pyplot as plt

# Load the training log file
training_log = pd.read_csv("training_log.txt")

plt.figure()
plt.plot(training_log["Train Loss"])
plt.plot(training_log["Val Loss"])
plt.legend(["Train loss", "Val loss"])
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Learning curve of image classification model")
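Reading the curves can be partly automated. The function below is a hypothetical heuristic, not a formal test: it reports the epoch with the lowest validation loss, the final train/validation gap, and whether the validation loss has risen for several consecutive epochs while the training loss kept falling, which is the overfitting signature described above. It works on plain lists, for example `training_log["Train Loss"].tolist()` and `training_log["Val Loss"].tolist()`. The loss values in the example are made up for illustration.

```python
def summarise_learning_curves(train_losses, val_losses, window=3):
    """Heuristic summary of train/validation loss curves (illustrative sketch)."""
    if len(val_losses) <= window or len(train_losses) != len(val_losses):
        raise ValueError("need equal-length curves longer than the window")
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    final_gap = val_losses[-1] - train_losses[-1]
    recent = range(len(val_losses) - window, len(val_losses))
    val_rising = all(val_losses[i] > val_losses[i - 1] for i in recent)
    train_falling = all(train_losses[i] < train_losses[i - 1] for i in recent)
    return {
        "best_epoch": best_epoch,
        "final_gap": final_gap,
        "overfitting_signal": val_rising and train_falling,
    }


# Made-up curves: validation loss bottoms out at epoch 4, then climbs
train = [1.20, 0.90, 0.70, 0.55, 0.45, 0.38, 0.32, 0.27]
val = [1.25, 0.95, 0.80, 0.72, 0.70, 0.73, 0.77, 0.82]
overfit_summary = summarise_learning_curves(train, val)

# Both curves still decreasing: no overfitting signal yet
healthy_summary = summarise_learning_curves(train, [1.3, 1.0, 0.85, 0.75, 0.7, 0.66, 0.63, 0.61])
print(overfit_summary, healthy_summary)
```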

### Model testing

Once the model has been trained and optimized, we can evaluate its performance using the ```Trainer``` class by calling the ```predict``` method with the test dataloader and the device:

```python
original_images, true_labels, predicted_labels = trainer.predict(test_dataloader, DEVICE)
```

This method returns three NumPy arrays:

- ```original_images```: the input images from the test set
- ```true_labels```: the corresponding ground truth labels
- ```predicted_labels```: the model's predicted classes

In [None]:
# Predict the labels for the test set
original_images, true_labels, predicted_labels = trainer.predict(test_dataloader, DEVICE)

### Explore results

To evaluate the results, we display the model's accuracy along with the confusion matrix.
The confusion matrix is a powerful evaluation tool that helps us understand the model’s performance across multiple classes.
It maps the relationship between true and predicted labels, showing the number of instances for each possible prediction-outcome pair.

#### Compute average accuracy

To compute the accuracy, get the number of test samples (```num_test_samples = len(original_images)```), check how many samples were correctly classified (```correct = (true_labels == predicted_labels).sum()```), and get the ratio (```accuracy = correct/num_test_samples```).

In [None]:
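# Compute average accuracy: correctly classified samples / total test samples
num_test_samples = len(original_images)
correct = (true_labels == predicted_labels).sum()
accuracy = correct / num_test_samples
print("Accuracy:", accuracy)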
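The same three steps can be checked on small, made-up label arrays. One pitfall worth knowing: the element-wise comparison assumes both arrays are 1-D with the same shape. If one were shaped `(N, 1)` and the other `(N,)`, NumPy broadcasting would silently compare every pair and produce an `N x N` result, giving a meaningless accuracy.

```python
import numpy as np

# Made-up labels for 8 test samples (class indices)
true_labels_demo = np.array([0, 1, 2, 2, 1, 0, 3, 3])
predicted_labels_demo = np.array([0, 1, 2, 1, 1, 0, 3, 2])

num_test_samples_demo = len(true_labels_demo)
correct_demo = (true_labels_demo == predicted_labels_demo).sum()  # count of True values
accuracy_demo = correct_demo / num_test_samples_demo

# np.mean on the boolean array gives the same ratio in one step
accuracy_mean = np.mean(true_labels_demo == predicted_labels_demo)
print(f"{correct_demo}/{num_test_samples_demo} correct -> accuracy {accuracy_demo:.3f}")
```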

#### Compute confusion matrix

To compute the confusion matrix, use ```confusion_matrix``` from scikit-learn library:

```python
cm = confusion_matrix(true_labels, predicted_labels)
```

In [None]:
from sklearn.metrics import confusion_matrix
import numpy as np
from matplotlib import pyplot as plt

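# Compute confusion matrix (labels= fixes its size to all classes, even if a class never appears)
cm = confusion_matrix(true_labels, predicted_labels, labels = np.arange(NUMBER_CLASSES))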

# Plot confusion matrix
fig, ax = plt.subplots(figsize=(10, 8))
cax = ax.matshow(cm, cmap='Greens')

# Add labels, title, and ticks
ax.set_xticks(np.arange(NUMBER_CLASSES))
ax.set_yticks(np.arange(NUMBER_CLASSES))
ax.set_xticklabels(CIFAR_10_CLASSES)
ax.set_yticklabels(CIFAR_10_CLASSES)
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix for test set of CIFAR10')

# Annotate each cell with the numeric value
for (i, j), val in np.ndenumerate(cm):
    ax.text(j, i, f'{val}', ha='center', va='center', color='black')

# Rotate class names on x-axis
plt.xticks(rotation=45)
plt.show()
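To see what the plotted matrix contains, here is a minimal NumPy sketch that builds a confusion matrix by hand with the same convention as scikit-learn (rows are true classes, columns are predicted classes). It also shows how accuracy and per-class recall can be read from it. The labels are made up for illustration.

```python
import numpy as np


def confusion_matrix_sketch(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (row, col) pair repeats
    np.add.at(cm, (true_labels, predicted_labels), 1)
    return cm


true_demo = np.array([0, 1, 2, 2, 1, 0, 2, 2])
pred_demo = np.array([0, 1, 2, 1, 1, 0, 2, 0])
cm_demo = confusion_matrix_sketch(true_demo, pred_demo, num_classes=3)
print(cm_demo)

# Correct predictions sit on the diagonal
accuracy_from_cm = np.trace(cm_demo) / cm_demo.sum()
# Fraction of each true class that was recovered (row-wise)
per_class_recall = np.diag(cm_demo) / cm_demo.sum(axis=1)
print(accuracy_from_cm, per_class_recall)
```

In the CIFAR-10 plot, a bright off-diagonal cell therefore reads as "images of the row class that were predicted as the column class".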

### Explain image classifier predictions

Deep neural networks are often described as "black boxes" because their decision-making processes are difficult to understand and interpret.
To address this, researchers have developed various methods to make these models more explainable.
One such method is Grad-CAM (Gradient-weighted Class Activation Mapping).
Grad-CAM computes the gradients of the target class score with respect to the feature maps of a convolutional layer (typically the last one), uses them to weight those feature maps, and generates a heatmap that highlights the regions of the input image most influential in the model's prediction for that class.

#### Prepare image for Grad-CAM

To prepare the image for Grad-CAM visualization:

- First, convert it to (Height, Width, Channels) format using ```img_np = np.transpose(img, (1, 2, 0))  # shape: (H, W, C)```, and normalize its values to the [0, 1] range with ```img_np = (img_np - img_np.min()) / (img_np.max() - img_np.min())```.
  This processed image is used only for visualization: ```show_cam_on_image``` from the PyTorch-GradCAM library expects an RGB image with float values in the [0, 1] range.
- Next, modify the original image for model inference by adding a batch dimension: ```img = np.expand_dims(img, axis=0)```, then convert it to a PyTorch tensor: ```img = torch.from_numpy(img)```, and move it to the appropriate computation device using ```img = img.to(DEVICE)```.
- Finally, retrieve the predicted and true labels, as both are required for computing and visualizing the Grad-CAM output.

In [None]:
import numpy as np
import torch

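# Select a single test image (C, H, W) and its labels
idx = 1
img = original_images[idx]
pred_label = predicted_labels[idx]
true_label = true_labels[idx]

# Visualisation copy: (H, W, C) layout rescaled to [0, 1]
img_np = np.transpose(img, (1, 2, 0))  # shape: (H, W, C)
img_np = (img_np - img_np.min()) / (img_np.max() - img_np.min())

# Model-input copy: add a batch dimension, convert to a tensor and move it to the device
img = np.expand_dims(img, axis=0)
img = torch.from_numpy(img).to(DEVICE)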
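The array manipulations above can be checked in isolation. The sketch below assumes a normalised test image in `(C, H, W)` layout, as produced by `A.ToTensorV2` (random values stand in for a real image), and verifies the two copies: the `(H, W, C)` visualisation copy in [0, 1] and the `(1, C, H, W)` model-input copy. Min-max rescaling is only an approximation of the original colours. Undoing `A.Normalize`, by multiplying each channel by its `std` and adding its `mean`, would restore them more faithfully.

```python
import numpy as np

# Assumption: a normalised image in (Channels, Height, Width) layout (random stand-in)
rng = np.random.default_rng(0)
img_chw = rng.normal(size=(3, 32, 32)).astype(np.float32)

# Visualisation copy: (H, W, C) with values rescaled to [0, 1]
img_vis = np.transpose(img_chw, (1, 2, 0))
img_vis = (img_vis - img_vis.min()) / (img_vis.max() - img_vis.min())

# Model-input copy: add a leading batch dimension -> (1, C, H, W)
img_batch = np.expand_dims(img_chw, axis=0)

print(img_vis.shape, img_vis.min(), img_vis.max(), img_batch.shape)
```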

#### Compute GradCAM heatmap

To compute the Grad-CAM heatmap:

- First, ensure that the `requires_grad` attribute of the input image tensor is set to `True` by using `img.requires_grad = True`.
  This makes autograd track the input as well, so the backward pass from the class score down to the image can be computed; Grad-CAM itself uses the gradients that reach the selected layer's activations.
- Next, specify the layer to inspect using ```target_layers = [model.conv3]```.
  Typically, the last convolutional layer of the image classifier is chosen because it preserves spatial information, which is crucial for identifying the regions of the input image that most strongly influence the model's prediction.
- Then, define the target class to be explained with ```targets = [ClassifierOutputTarget(pred_label)]```, where ```pred_label``` is the class index corresponding to the model’s predicted output (or any other class of interest).
- Finally, compute the Grad-CAM heatmap using the activations and gradients from the selected layer:
```python
# Create CAM object
with GradCAM(model=model, target_layers=target_layers) as cam:
    grad_cam_matrix = cam(input_tensor=img, targets=targets)
    grad_cam_matrix = grad_cam_matrix[0, :]
```

In [None]:
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

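# Make sure input requires grad
img.requires_grad = True

# Define the layer(s) to inspect
target_layers = [model.conv3]

# Define the target class you want to explain
targets = [ClassifierOutputTarget(pred_label)]

# Compute CAM object and keep the heatmap of the single image in the batch
with GradCAM(model = model, target_layers = target_layers) as cam:
    grad_cam_matrix = cam(input_tensor = img, targets = targets)
    grad_cam_matrix = grad_cam_matrix[0, :]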
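What `GradCAM` does with the activations and gradients it captures (via hooks on `target_layers`) can be sketched with plain arrays. This follows the Grad-CAM formulation rather than the library's exact code. Each feature map is weighted by the spatial average of its gradient, the weighted maps are summed, a ReLU keeps only evidence in favour of the class, and the result is rescaled to [0, 1]. The library additionally upsamples the map to the input resolution, which this sketch omits. The arrays below are synthetic.

```python
import numpy as np


def grad_cam_from_arrays(activations, gradients):
    """Core Grad-CAM computation for one image (illustrative sketch).

    activations, gradients: shape (K, h, w), the K feature maps of the target layer
    and the gradients of the target class score with respect to them.
    """
    weights = gradients.mean(axis=(1, 2))                     # (K,) channel importances
    cam = np.tensordot(weights, activations, axes=1)          # (h, w) weighted sum of maps
    cam = np.maximum(cam, 0.0)                                # ReLU: positive evidence only
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-7)                           # rescale to [0, 1]


# Random activations/gradients, e.g. 4 feature maps of size 7x7
rng = np.random.default_rng(0)
cam_demo = grad_cam_from_arrays(rng.random((4, 7, 7)), rng.normal(size=(4, 7, 7)))

# Toy case: channel 0 fires top-left and supports the class; channel 1 fires
# bottom-right and opposes it, so the heatmap should peak top-left only
acts = np.zeros((2, 4, 4))
acts[0, 0, 0] = 1.0
acts[1, 3, 3] = 1.0
grads = np.stack([np.ones((4, 4)), -np.ones((4, 4))])
cam_toy = grad_cam_from_arrays(acts, grads)
print(np.round(cam_toy, 3))
```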

#### Visualise Grad-CAM heatmap with the image

After obtaining the Grad-CAM heatmap, we overlay it on the input image to visualise the regions that contributed most to the model’s prediction (```visualisation = show_cam_on_image(img_np, grad_cam_matrix, use_rgb=True)```).
This helps identify which pixels the model focused on when predicting the class.

In [None]:
from pytorch_grad_cam.utils.image import show_cam_on_image

# Combine CAM with image
visualisation = show_cam_on_image(img_np, grad_cam_matrix, use_rgb = True)

# Plot image with GradCAM output
true_class = CIFAR_10_CLASSES[true_label]
pred_class = CIFAR_10_CLASSES[pred_label]
plot_multiple_images((img_np, f"Original - {true_class}"), (visualisation, f"Grad-CAM - {pred_class}"), figsize = (5,6))