***Deep Learning Applications 2023** course, held by Professor **Andrew David Bagdanov** - University of Florence, Italy*

*Notebook and code created by **Giovanni Colombo** - Mat. 7092745*

Check the dedicated [Repository on GitHub](https://github.com/giovancombo/DLA-Labs/tree/main/lab1).

# Deep Learning Applications: Laboratory #1 - CNNs

In this first laboratory we will work with relatively simple architectures to get a feel for working with Deep Models. This notebook is designed to work with PyTorch.

## Exercise 1: Warming Up
In this series of exercises I will duplicate (on a small scale) the results of the ResNet paper:

> [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385), Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, CVPR 2016.

I will do this in steps, firstly using a Multilayer Perceptron on MNIST.

What's important to recall is that the main message of the ResNet paper is that **deeper networks do not guarantee** a lower training loss (or a higher validation accuracy).
Below, I will incrementally build a sequence of experiments to verify this for different architectures, starting with an *MLP*.

The Laboratory requires me to compare multiple training runs, so I took this as a great opportunity to learn to use [Weights and Biases](https://wandb.ai/site) for performance monitoring.

### Exercise 1.1: A baseline MLP

I will now implement a *simple Multilayer Perceptron* to classify the 10 digits of MNIST, and (hopefully) train it to convergence, monitoring Training and Validation losses and accuracies with W&B.

The exercise wants me to think in an *abstract* way: I'll have to instantiate multiple models, each with a different hyperparameter configuration, and train them on different datasets.
It seemed a good idea to generalize, as much as possible, the instantiation of every object in the training workflow. That's why I decided to build a single `config.yaml` file, where I put almost every variable I need to build any model I want.

I then define a `load` function that receives the relevant settings from the `config` dictionary (obtained from my `.yaml` file) and loads the chosen dataset (MNIST or CIFAR10), transformed accordingly and split into *Train*, *Validation* and *Test* sets.
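As a rough illustration of the split step, the sketch below partitions the 60,000 MNIST training indices into 50,000 training and 10,000 validation indices, which matches the sizes printed in the run output of this exercise. The name `split_indices`, the seed and the use of Python's `random` module are assumptions made for illustration; the actual `load` function is not shown here and may well rely on a PyTorch utility instead.

```python
import random

def split_indices(n_samples, n_val, seed=0):
    """Shuffle indices 0..n_samples-1 and hold out n_val of them for validation."""
    if not 0 <= n_val <= n_samples:
        raise ValueError("n_val must be between 0 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(60_000, 10_000)
print(len(train_idx), len(val_idx))  # 50000 10000
```

Fixing the seed keeps the validation set identical across runs, which makes comparisons between models of different depth fairer.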

The script file `models.py` contains all the model classes used for this Laboratory:
+ **MLP**, for instantiating a *Multilayer Perceptron*
+ **ResidualMLP**, for instantiating an MLP that implements *Residual Connections*
+ **CNN**, for instantiating a *Convolutional Network*, with the possibility of tuning almost every possible parameter
+ **ResidualCNN**, for instantiating a ConvNet that implements *Residual Connections*
+ **ResNet**, for instantiating an actual *ResNet* as defined in the [Paper](https://arxiv.org/abs/1512.03385), in its *[9, 18, 34, 50, 101, 152]* versions.

The `build_model` function instantiates the Model, Loss Function and Optimizer chosen in the `config` file, and sends the model to `device`, which can be `cuda` (in my case, an *NVIDIA GeForce RTX 3060 Laptop* GPU) or `cpu`.

A set of helper functions periodically logs Loss and Accuracy during the Training and Evaluation phases.
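One possible shape for such a logging helper is a running average that is reset every time a value is logged. The sketch below is only an illustration: the class `RunningAverage` and the function `log_every` are hypothetical names, not the helpers actually used in this repository, and the real code sends the values to W&B rather than collecting them in a list.

```python
class RunningAverage:
    """Accumulates a sample-weighted mean of a metric between two log events."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # `value` is a per-batch mean, so weight it by the batch size `n`
        self.total += value * n
        self.count += n

    def pop(self):
        """Return the mean since the last pop and reset the accumulator."""
        mean = self.total / self.count if self.count else 0.0
        self.total, self.count = 0.0, 0
        return mean

def log_every(batch_losses, batch_size, log_interval):
    """Collect one averaged value every `log_interval` batches."""
    meter, logged = RunningAverage(), []
    for step, loss in enumerate(batch_losses, start=1):
        meter.update(loss, batch_size)
        if step % log_interval == 0:
            logged.append(meter.pop())
    return logged

print(log_every([2.0, 4.0, 1.0, 3.0], batch_size=64, log_interval=2))  # [3.0, 2.0]
```

Averaging over the interval, instead of logging the last batch only, gives smoother curves when the batch size is small.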

The training loop lies in the `train` function, that takes all the objects instantiated in the previous steps and uses them to train the model.

The *forward* and *backward* passes are performed batch-wise through the `train_batch` function, which includes a tweak to reshape the input images according to the model used. The same is done in the `validation` and `test` functions.

The `load`, `build_model`, `train` and `test` functions are all contained in a single function, `model_pipeline`, that allows me to wrap all my workflow into a *Weights & Biases* run more efficiently.

In [4]:
import torch
import yaml
import wandb

from pipeline import *
import utils

def main():

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    wandb.login()
    print("Initializing Weights & Biases run...")

    # Loading the configuration file
    with open("config.yaml") as f:
        config = yaml.safe_load(f)

    # Initializing a wandb run for logging losses, accuracies and gradients
    with wandb.init(project = config['project_name'], config = config):
        config = wandb.config

        # 1. Loading the data
        train_loader, val_loader, test_loader = load(config.dataset, config.batch_size)

        # 2. Building the model
        model, criterion, optimizer = build_model(device, config)

        # 3. Training the model
        wandb.watch(model, criterion, log="all", log_freq=config.log_interval)
        train(model, train_loader, val_loader, criterion, optimizer, device, config)

        # 4. Evaluate the model on the test set
        test_loss, test_accuracy = test(model, test_loader, device, config)

        print(f"Testing completed! | Test Loss: {test_loss:.4f}; Test Accuracy = {test_accuracy:.2f}%")
        wandb.log({"Test Loss": test_loss,
                "Test Accuracy": test_accuracy})
        wandb.unwatch(model)
        
        # 5. Saving the model, assigning it a name based on the hyperparameters used
        if config['save_model']:
            utils.save_model(config, model)


if __name__ == "__main__":
    main()

Initializing Weights & Biases run...


Dataset MNIST loaded with 50000 Train samples, 10000 Validation samples, 10000 Test samples.

Model instantiated: MLP
Number of parameters: 332938

MLP(
  (mlp): Sequential(
    (0): Linear(in_features=784, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=128, bias=True)
    (3): ReLU()
    (4): Linear(in_features=128, out_features=128, bias=True)
    (5): ReLU()
    (6): Linear(in_features=128, out_features=128, bias=True)
    (7): ReLU()
    (8): Linear(in_features=128, out_features=128, bias=True)
    (9): ReLU()
    (10): Linear(in_features=128, out_features=128, bias=True)
    (11): ReLU()
    (12): Linear(in_features=128, out_features=128, bias=True)
    (13): ReLU()
    (14): Linear(in_features=128, out_features=128, bias=True)
    (15): ReLU()
    (16): Linear(in_features=128, out_features=128, bias=True)
    (17): ReLU()
    (18): Linear(in_features=128, out_features=128, bias=True)
    (19): ReLU()
    (20): Linear(in_features=128, ou

Training Epochs:   0%|                                                       | 0/20 [00:00<?, ?it/s]

Epoch 1/20 | Train Loss = 2.3033; Train Accuracy = 10.36%
Epoch 1/20 | Train Loss = 2.3028; Train Accuracy = 10.29%
Epoch 1/20 | Train Loss = 2.3022; Train Accuracy = 10.69%
Epoch 1/20 | Train Loss = 2.3024; Train Accuracy = 10.98%


Training Epochs:   5%|██▎                                            | 1/20 [00:34<11:02, 34.84s/it]


End of epoch 1 | Validation Loss: 2.3019; Validation Accuracy: 11.64%

Epoch 2/20 | Train Loss = 2.3021; Train Accuracy = 11.33%
Epoch 2/20 | Train Loss = 2.3023; Train Accuracy = 10.70%
Epoch 2/20 | Train Loss = 2.3022; Train Accuracy = 10.71%
Epoch 2/20 | Train Loss = 2.3015; Train Accuracy = 11.43%


Training Epochs:  10%|████▋                                          | 2/20 [01:11<10:41, 35.66s/it]


End of epoch 2 | Validation Loss: 2.3012; Validation Accuracy: 11.64%

Epoch 3/20 | Train Loss = 2.3017; Train Accuracy = 10.93%
Epoch 3/20 | Train Loss = 2.3014; Train Accuracy = 10.98%
Epoch 3/20 | Train Loss = 2.3016; Train Accuracy = 11.60%
Epoch 3/20 | Train Loss = 2.3022; Train Accuracy = 11.20%


Training Epochs:  15%|███████                                        | 3/20 [01:52<10:50, 38.27s/it]


End of epoch 3 | Validation Loss: 2.3010; Validation Accuracy: 11.64%

Epoch 4/20 | Train Loss = 2.3019; Train Accuracy = 11.08%
Epoch 4/20 | Train Loss = 2.3013; Train Accuracy = 11.12%
Epoch 4/20 | Train Loss = 2.3014; Train Accuracy = 11.02%
Epoch 4/20 | Train Loss = 2.3010; Train Accuracy = 11.43%


Training Epochs:  20%|█████████▍                                     | 4/20 [02:24<09:35, 35.94s/it]


End of epoch 4 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 5/20 | Train Loss = 2.3010; Train Accuracy = 11.18%
Epoch 5/20 | Train Loss = 2.3018; Train Accuracy = 10.91%
Epoch 5/20 | Train Loss = 2.3012; Train Accuracy = 11.24%
Epoch 5/20 | Train Loss = 2.3018; Train Accuracy = 10.99%


Training Epochs:  25%|███████████▊                                   | 5/20 [02:55<08:31, 34.09s/it]


End of epoch 5 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 6/20 | Train Loss = 2.3016; Train Accuracy = 11.00%
Epoch 6/20 | Train Loss = 2.3016; Train Accuracy = 10.88%
Epoch 6/20 | Train Loss = 2.3011; Train Accuracy = 11.35%
Epoch 6/20 | Train Loss = 2.3010; Train Accuracy = 11.47%


Training Epochs:  30%|██████████████                                 | 6/20 [03:25<07:34, 32.49s/it]


End of epoch 6 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 7/20 | Train Loss = 2.3008; Train Accuracy = 11.38%
Epoch 7/20 | Train Loss = 2.3010; Train Accuracy = 11.00%
Epoch 7/20 | Train Loss = 2.3019; Train Accuracy = 11.10%
Epoch 7/20 | Train Loss = 2.3013; Train Accuracy = 11.25%


Training Epochs:  35%|████████████████▍                              | 7/20 [03:55<06:51, 31.69s/it]


End of epoch 7 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 8/20 | Train Loss = 2.3019; Train Accuracy = 11.10%
Epoch 8/20 | Train Loss = 2.3018; Train Accuracy = 10.87%
Epoch 8/20 | Train Loss = 2.3017; Train Accuracy = 10.94%
Epoch 8/20 | Train Loss = 2.3016; Train Accuracy = 11.31%


Training Epochs:  40%|██████████████████▊                            | 8/20 [04:24<06:10, 30.86s/it]


End of epoch 8 | Validation Loss: 2.3009; Validation Accuracy: 11.64%

Epoch 9/20 | Train Loss = 2.3012; Train Accuracy = 11.18%
Epoch 9/20 | Train Loss = 2.3014; Train Accuracy = 11.10%
Epoch 9/20 | Train Loss = 2.3017; Train Accuracy = 10.98%
Epoch 9/20 | Train Loss = 2.3015; Train Accuracy = 11.16%


Training Epochs:  45%|█████████████████████▏                         | 9/20 [04:54<05:36, 30.56s/it]


End of epoch 9 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 10/20 | Train Loss = 2.3023; Train Accuracy = 10.69%
Epoch 10/20 | Train Loss = 2.3008; Train Accuracy = 11.61%
Epoch 10/20 | Train Loss = 2.3014; Train Accuracy = 11.12%
Epoch 10/20 | Train Loss = 2.3013; Train Accuracy = 11.22%


Training Epochs:  50%|███████████████████████                       | 10/20 [05:26<05:10, 31.08s/it]


End of epoch 10 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 11/20 | Train Loss = 2.3009; Train Accuracy = 11.30%
Epoch 11/20 | Train Loss = 2.3017; Train Accuracy = 11.00%
Epoch 11/20 | Train Loss = 2.3028; Train Accuracy = 10.57%
Epoch 11/20 | Train Loss = 2.3004; Train Accuracy = 11.71%


Training Epochs:  55%|█████████████████████████▎                    | 11/20 [06:07<05:08, 34.32s/it]


End of epoch 11 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 12/20 | Train Loss = 2.3019; Train Accuracy = 10.60%
Epoch 12/20 | Train Loss = 2.3008; Train Accuracy = 11.12%
Epoch 12/20 | Train Loss = 2.3012; Train Accuracy = 11.24%
Epoch 12/20 | Train Loss = 2.3015; Train Accuracy = 11.56%


Training Epochs:  60%|███████████████████████████▌                  | 12/20 [06:38<04:24, 33.06s/it]


End of epoch 12 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 13/20 | Train Loss = 2.3013; Train Accuracy = 11.19%
Epoch 13/20 | Train Loss = 2.3012; Train Accuracy = 11.46%
Epoch 13/20 | Train Loss = 2.3015; Train Accuracy = 11.13%
Epoch 13/20 | Train Loss = 2.3013; Train Accuracy = 10.94%


Training Epochs:  65%|█████████████████████████████▉                | 13/20 [07:10<03:49, 32.76s/it]


End of epoch 13 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 14/20 | Train Loss = 2.3016; Train Accuracy = 10.83%
Epoch 14/20 | Train Loss = 2.3001; Train Accuracy = 11.76%
Epoch 14/20 | Train Loss = 2.3020; Train Accuracy = 10.78%
Epoch 14/20 | Train Loss = 2.3016; Train Accuracy = 10.93%


Training Epochs:  70%|████████████████████████████████▏             | 14/20 [07:43<03:18, 33.05s/it]


End of epoch 14 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 15/20 | Train Loss = 2.3017; Train Accuracy = 11.22%
Epoch 15/20 | Train Loss = 2.3012; Train Accuracy = 11.06%
Epoch 15/20 | Train Loss = 2.3001; Train Accuracy = 11.61%
Epoch 15/20 | Train Loss = 2.3022; Train Accuracy = 11.00%


Training Epochs:  75%|██████████████████████████████████▌           | 15/20 [08:15<02:43, 32.63s/it]


End of epoch 15 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 16/20 | Train Loss = 2.3013; Train Accuracy = 11.22%
Epoch 16/20 | Train Loss = 2.3008; Train Accuracy = 11.04%
Epoch 16/20 | Train Loss = 2.3015; Train Accuracy = 10.85%
Epoch 16/20 | Train Loss = 2.3020; Train Accuracy = 10.93%


Training Epochs:  80%|████████████████████████████████████▊         | 16/20 [08:51<02:14, 33.67s/it]


End of epoch 16 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 17/20 | Train Loss = 2.3011; Train Accuracy = 11.16%
Epoch 17/20 | Train Loss = 2.3010; Train Accuracy = 11.27%
Epoch 17/20 | Train Loss = 2.3022; Train Accuracy = 10.55%
Epoch 17/20 | Train Loss = 2.3012; Train Accuracy = 11.23%


Training Epochs:  85%|███████████████████████████████████████       | 17/20 [09:27<01:42, 34.22s/it]


End of epoch 17 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 18/20 | Train Loss = 2.3011; Train Accuracy = 11.47%
Epoch 18/20 | Train Loss = 2.3018; Train Accuracy = 11.03%
Epoch 18/20 | Train Loss = 2.3012; Train Accuracy = 11.05%
Epoch 18/20 | Train Loss = 2.3010; Train Accuracy = 11.40%


Training Epochs:  90%|█████████████████████████████████████████▍    | 18/20 [10:01<01:08, 34.14s/it]


End of epoch 18 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 19/20 | Train Loss = 2.3014; Train Accuracy = 11.27%
Epoch 19/20 | Train Loss = 2.3021; Train Accuracy = 10.50%
Epoch 19/20 | Train Loss = 2.3007; Train Accuracy = 11.63%
Epoch 19/20 | Train Loss = 2.3017; Train Accuracy = 11.01%


Training Epochs:  95%|███████████████████████████████████████████▋  | 19/20 [10:31<00:33, 33.16s/it]


End of epoch 19 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Epoch 20/20 | Train Loss = 2.3013; Train Accuracy = 10.98%
Epoch 20/20 | Train Loss = 2.3019; Train Accuracy = 10.83%
Epoch 20/20 | Train Loss = 2.3008; Train Accuracy = 11.56%
Epoch 20/20 | Train Loss = 2.3013; Train Accuracy = 11.56%


Training Epochs: 100%|██████████████████████████████████████████████| 20/20 [11:02<00:00, 33.12s/it]


End of epoch 20 | Validation Loss: 2.3008; Validation Accuracy: 11.64%

Training completed!





Testing completed! | Test Loss: 2.3010; Test Accuracy = 11.35%


Run history:

| Metric | History |
|---|---|
| Epoch | ▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██ |
| Test Accuracy | ▁ |
| Test Loss | ▁ |
| Train Accuracy | ▃▅▃▄▆▃▄▅▅▄▄▄▄▆▂█▇▁▅▃ |
| Train Loss | █▃▄▄▂▄▄▂▂▃▄▃▅▁▃▁▂▄▂▃ |
| Training/Training Accuracy | ▆▅▆▃█▆▅▅▃▆▆▂▆▅▅▃▃▃▄▁▄▃▆▄▂▅▄▅▃▅▅▅▅▅▆▃▆▄▆▇ |
| Training/Training Epochs | ▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████ |
| Training/Training Loss | ▄▄▁▄▂▄▂▃▆▂▃▅▃▃▆▄▆▆▆█▄▆▂▇▆▂▄▃▆▄▃▂▃▄▃▅▁▅▂▂ |
| Validation Accuracy | ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| Validation Loss | █▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |

Run summary:

| Metric | Final value |
|---|---|
| Epoch | 20.0 |
| Test Accuracy | 11.35 |
| Test Loss | 2.30102 |
| Train Accuracy | 11.03276 |
| Train Loss | 2.30148 |
| Training/Training Accuracy | 12.89062 |
| Training/Training Epochs | 20.0 |
| Training/Training Loss | 2.29915 |
| Validation Accuracy | 11.64 |
| Validation Loss | 2.30078 |
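
The parameter count reported in the output above (332938) can be checked by hand. The printed architecture is truncated, so the number of layers used below is an inference, not something read from the log: assuming a final `Linear(128, 10)` classifier, the count is consistent with 14 hidden `128 -> 128` layers between the input and output layers. The function names are made up for this sketch.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def mlp_params(n_in, width, n_hidden_to_hidden, n_classes):
    """Total parameters of an MLP with equally wide hidden layers."""
    first = linear_params(n_in, width)                          # 784 -> 128
    middle = n_hidden_to_hidden * linear_params(width, width)   # 128 -> 128, repeated
    last = linear_params(width, n_classes)                      # 128 -> 10 (assumed)
    return first + middle + last

print(mlp_params(784, 128, 14, 10))  # 332938
```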


### Exercise 1.2: Rinse and Repeat

I will now repeat the verification I did above, but with **Convolutional** Neural Networks.
This specific part of the exercise focuses on showing that **deeper** CNNs *without* residual connections do not always work better, while **even deeper** ones *with* residual connections do.

**Note**: MNIST is *very* easy to work on (at least up to about 99% accuracy), so I will work on **CIFAR10** from now on.

Launching the `model_pipeline` function with its proper configuration allows me to observe the performance of multiple kinds of Convolutional architectures.

The focus here is on varying the total **depth** (i.e. the number of layers) of the network while leaving the general architecture untouched, in order to show that a **deeper** ConvNet performs better only **up to a certain depth (!)**.

All logs and trackings of my runs are available on Weights & Biases, at [this link](https://wandb.ai/giovancombo/DLA_Lab1_CNN?workspace=user-giovancombo).

In [None]:
# Change dataset and model settings in config.yaml
# config.dataset = ["MNIST", "CIFAR10"]; config.convnet = True

if __name__ == "__main__":
    main()

...Well, as previously said, reaching a very high Validation Accuracy on **MNIST** is *very* easy.
Let's try then to train some ConvNets on the **CIFAR10** dataset.

-----
## Exercise 2: Choose at Least One

Let's now deepen our understanding of Deep Networks for visual recognition.

+ Firstly, I will look for a quantitative answer to *how* and *why* Residual Networks learn more efficiently than their plain Convolutional counterparts.
+ Secondly, I will become a *network surgeon*, trying to fully-convolutionalize a network by acting on its final layers.
+ Thirdly, I will try to implement *Class Activation Maps*, in order to see which parts of an image were the most decisive for its classification.

### Exercise 2.1: Explain why Residual Connections are so effective

The question *"Why do Residual Networks learn more efficiently than plain Convolutional Networks?"* can be answered by looking at the magnitudes of the gradients flowing through the networks during backpropagation.

`wandb.watch(log = "all")` tells *Weights & Biases* to log *gradients* and *parameters*' evolution in all the layers of the network. This functionality is useful to graphically visualize the concept of **Vanishing Gradients**.

For this exercise, I first ran a basic *MLP*, and then an *MLP with Residual Connections*. Honestly, at the time, I didn't think this was a very clever idea, since I had only ever seen Residuals added to Convolutional Networks, but... I decided to give it a try anyway.

As mentioned before, I compared these two architectures by checking how their performance changes with their **depth** (i.e. their number of layers).

A basic **10-layer MLP** suffers from Vanishing Gradients: its accuracy drops all the way down to about 10%, which on a 10-class problem is equivalent to picking a class **by chance**.
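
The same chance level shows up in the loss. For a 10-class problem, a model that assigns probability 1/10 to every class has a cross-entropy of ln(10), whatever the true label is. The short check below computes this value; the function name is made up for the example. It is very close to the loss of about 2.301 at which the deep MLP of Exercise 1.1 stays for all 20 epochs, which is consistent with a network whose output barely depends on its input.

```python
import math

def uniform_cross_entropy(n_classes):
    """Cross-entropy (in nats) of predicting probability 1/n_classes for every class."""
    return -math.log(1.0 / n_classes)

print(f"{uniform_cross_entropy(10):.4f}")  # 2.3026
```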

As noted in the original [ResNet paper](https://arxiv.org/abs/1512.03385), adding layers to a plain network can lead not only to a higher validation loss, but also to a *higher training loss*: this means that we are not facing overfitting, but the "degradation" problem that deeper plain models exhibit.

On the contrary, the **10-layer Residual MLP** performed well, in line with the explanation given by the ResNet authors: Residual Connections allow a network to go **a lot** deeper (the main remaining limitation being overfitting).

The results can be quantitatively checked by observing the *W&B* logs about gradient magnitudes. The basic **MLP** shows gradients that are very close to zero, meaning that the model is not making any real progress.

Conversely, the **Residual MLP** showed gradients that neither vanished nor exploded, and whose magnitude progressively decreased during training, suggesting that the model is converging towards a (local, hopefully global) optimum.
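
The intuition behind these two behaviours can be sketched with a deliberately simplified scalar model. Assume every layer contributes the same hypothetical local derivative `g` to the chain rule. A plain stack `h_{l+1} = f_l(h_l)` then multiplies the gradient by `g` at each layer, while a residual stack `h_{l+1} = h_l + f_l(h_l)` multiplies it by `1 + g`, because the skip connection adds an identity term. This is a toy calculation under those assumptions, not a measurement taken from the networks above.

```python
def plain_chain_gradient(local_grads):
    """d(output)/d(input) for h_{l+1} = f_l(h_l): product of local derivatives."""
    grad = 1.0
    for g in local_grads:
        grad *= g
    return grad

def residual_chain_gradient(local_grads):
    """Same for h_{l+1} = h_l + f_l(h_l): each factor gains the identity term."""
    grad = 1.0
    for g in local_grads:
        grad *= 1.0 + g
    return grad

local = [0.1] * 10  # hypothetical per-layer derivative, 10 layers
print(f"plain:    {plain_chain_gradient(local):.1e}")     # 1.0e-10
print(f"residual: {residual_chain_gradient(local):.2f}")  # 2.59
```

With small local derivatives the plain product collapses towards zero as depth grows, whereas the residual product stays of order one.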

The same behaviour can be detected while working on ConvNets and their Residual versions (check gradients on *W&B*).

### Exercise 2.2: Fully-convolutionalize a network.

I decided to save the best model trained so far, the **ResidualCNN** with (..config), and **fully-convolutionalize** it. That is, turn it into a network that can predict classification outputs at *all* pixels in an input image.

One goal of this exercise is to try to turn this into a **detector** of handwritten digits.

**Hint**: To test my fully-convolutionalized network, I might need to write some functions that take random MNIST samples and embed them into a larger image (e.g. in a regular grid or at random positions), in order to create examples on which to train the network to *detect* digits.
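
A minimal sketch of such an embedding function is shown below. To stay self-contained it uses nested Python lists as images, whereas a real implementation would most likely work on tensors. The names `embed_digit` and `embed_at_random`, the 64x64 canvas size and the all-ones stand-in for an MNIST sample are assumptions made for illustration.

```python
import random

def embed_digit(digit, canvas_size, top, left):
    """Paste a 2-D digit (list of rows) onto a blank square canvas at (top, left)."""
    h, w = len(digit), len(digit[0])
    if top < 0 or left < 0 or top + h > canvas_size or left + w > canvas_size:
        raise ValueError("digit does not fit inside the canvas at this position")
    canvas = [[0.0] * canvas_size for _ in range(canvas_size)]
    for r in range(h):
        canvas[top + r][left:left + w] = digit[r]
    return canvas

def embed_at_random(digit, canvas_size, rng):
    """Choose a random valid position and return the canvas plus its bounding box."""
    h, w = len(digit), len(digit[0])
    top = rng.randint(0, canvas_size - h)
    left = rng.randint(0, canvas_size - w)
    return embed_digit(digit, canvas_size, top, left), (top, left, h, w)

# Stand-in for a 28x28 MNIST sample: an all-ones patch
fake_digit = [[1.0] * 28 for _ in range(28)]
canvas, box = embed_at_random(fake_digit, 64, random.Random(0))
print(len(canvas), len(canvas[0]), box)
```

The returned bounding box can serve as the ground truth against which the per-location predictions of the network are compared.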

In [None]:
# Change dataset and model settings in config.yaml
# config.dataset = "CIFAR10"; config.fullycnn = True

if __name__ == "__main__":
    main()

(Show plots of 3-4 images with the detection performed)

The ConvNets built in the previous exercise end with a global Average Pooling layer and a Fully Connected layer, which merge all the information from the convolutions into a single prediction for the whole image over the 10 MNIST/CIFAR10 classes.

In a Fully Convolutional Network, we need instead to produce a prediction for every single one of the 28x28 (32x32) pixels of an image. I then proceed to do a "network surgery", removing the two layers mentioned above and rearranging the net to have the dimension of the input image as output.
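
To reason about how large the prediction map becomes once the pooling and fully connected layers are gone, it helps to track the spatial size layer by layer. The sketch below uses the usual output-size formula for convolutions and pooling, `floor((size + 2*padding - kernel) / stride) + 1`. The layer stack is hypothetical and does not reproduce the **ResidualCNN** trained above; the final 1x1 convolution stands in for the former fully connected classifier.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1

def score_map_size(input_size, layers):
    """Apply a list of (kernel, stride, padding) layers to a square input."""
    size = input_size
    for kernel, stride, padding in layers:
        size = conv_out_size(size, kernel, stride, padding)
    return size

# Hypothetical stack: two 3x3 "same" convolutions, each followed by 2x2 pooling,
# then a 1x1 convolution standing in for the former fully connected classifier.
layers = [(3, 1, 1), (2, 2, 0), (3, 1, 1), (2, 2, 0), (1, 1, 0)]
for side in (32, 64, 128):
    print(side, "->", score_map_size(side, layers))
```

In this example every 2x2 pooling halves the resolution, so a 32x32 input yields an 8x8 score map. Getting one prediction per input pixel therefore requires either avoiding strided layers or upsampling the score map at the end.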

### Exercise 2.3: *Explain* the predictions of a CNN

In order to predict the correct class of an image, a ConvNet exploits its "hierarchical" architecture to create feature maps at different layers of abstraction of information.

The composition of every bit of information extracted determines the whole set of details and peculiarities of an image that links it to a specific class.

A lot of work has been done in recent years to try to look inside the black box, and find a way to quantitatively *explain* how a prediction was made. One of these ways is to implement [*Class Activation Maps*](http://cnnlocalization.csail.mit.edu/#:~:text=A%20class%20activation%20map%20for,decision%20made%20by%20the%20CNN.):

> B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning Deep Features for Discriminative Localization. CVPR'16 (arXiv:1512.04150, 2015).

Let's demonstrate how my trained CNN *attends* to specific image features to recognize *specific* classes.
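
Before looking at the actual code, the core computation of a Class Activation Map can be sketched on a tiny example. Following the paper cited above, the map for class `c` is the weighted sum of the last convolutional feature maps, `M_c(x, y) = sum_k w_k^c * f_k(x, y)`, where `w_k^c` are the classifier weights that follow global average pooling. The sketch uses nested lists and made-up numbers, and ignores the classifier bias. It also checks that the spatial mean of the map equals the class score.

```python
def class_activation_map(features, class_weights):
    """features: K maps of shape HxW (nested lists); class_weights: K floats.
    Returns the HxW map M_c(x, y) = sum_k w_k^c * f_k(x, y)."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(class_weights, features):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam

def gap_class_score(features, class_weights):
    """Class score from global average pooling followed by a linear layer (no bias)."""
    score = 0.0
    for weight, fmap in zip(class_weights, features):
        n = len(fmap) * len(fmap[0])
        score += weight * sum(map(sum, fmap)) / n
    return score

features = [[[1.0, 0.0], [0.0, 3.0]], [[2.0, 2.0], [0.0, 0.0]]]  # K=2 maps of 2x2
weights = [0.5, -1.0]                                            # made-up w_k^c
cam = class_activation_map(features, weights)
print(cam)  # [[-1.5, -2.0], [0.0, 1.5]]
print(sum(map(sum, cam)) / 4, gap_class_score(features, weights))  # -0.5 -0.5
```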

For this task, I decided to borrow the code from this source (link), in order to try to apply CAMs to some CIFAR10 images.

Moreover, as a passionate photographer, since we're talking about images, I *HAD* to try to create CAMs of some of my favourite photographs. Here are some visual results!

In [None]:
import cv2
import numpy as np
import torch
import torchvision.utils as vutils

import utils

# model = torch.load(RESIDUALCNN)

def cam_test(model, test_loader, epoch):

    # CIFAR10 class names, in label order
    classes = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

    params = [param for param in model.parameters()]

    # Run on whichever device the model already lives on
    device = params[0].device

    model.eval()
    with torch.no_grad():
        correct, total = 0, 0
        for images, labels in test_loader:

            images, labels = images.to(device), labels.to(device)
            # The model is expected to return the logits plus the features
            # before (b_gap) and after (a_gap) global average pooling
            outputs, b_gap, a_gap = model(images)

            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

            image_labels, image_paths = [], []

            for i in range(5):

                vutils.save_image(images[i], f"Lab1/img/image{i}.jpg")

                # Alternative: CAM using only the classifier weights of the predicted class
                #weights = params[-2][predicted[i].item()].detach()
                #c = torch.sum(b_gap[i]*weights[:, None, None], dim=0)

                # Weight each feature map by its globally average-pooled activation
                c = torch.sum(b_gap[i] * a_gap[i][:, None, None], dim=0)

                # Min-max normalize to [0, 1]; the epsilon guards against a constant map
                c = (c - torch.min(c)) / (torch.max(c) - torch.min(c) + 1e-8)

                cam_img = np.uint8(255 * c.cpu().numpy())

                # Upsample the heatmap to the input resolution (32x32 for CIFAR10)
                height, width = images[i].shape[1:]
                hm = cv2.applyColorMap(cv2.resize(cam_img, (width, height)), cv2.COLORMAP_JET)

                # OpenCV works in BGR, so flip the RGB tensor channels before blending
                img_bgr = images[i].permute(1, 2, 0).cpu().numpy()[:, :, ::-1] * 255
                re = hm * 0.3 + img_bgr * 0.4

                cv2.imwrite(f"Lab1/img/CAM{i}.jpg", re)

                image_labels.append(classes[labels[i]] + "-" + classes[predicted[i]])
                image_paths.append(f"Lab1/img/CAM{i}.jpg")

            utils.plot_images(image_paths, image_labels, epoch)
            break

In [None]:
import cv2
import numpy as np
import torch
from PIL import Image
from matplotlib import pyplot as plt
from torch.autograd import Variable
from torch.nn import functional as F
from torchvision import transforms
from torchvision.datasets import CIFAR10
import torchvision.transforms.functional as TF

# The 10 CIFAR10 classes, in label order
classes = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
cifar = True  # if False, a high-quality truck photo is used instead
if cifar:
    # 1 ship
    # 7 frog
    # 27 airplane
    # 42 dog
    image_idx = 18  # index of the test image to use
    transform = transforms.Compose(
        [transforms.ToTensor()])
    test_set = CIFAR10(root='./data', train=False,
                       download=True, transform=transform)
    image, label = test_set[image_idx]
    pil_image = TF.to_pil_image(image)
    # Display the image
    plt.imshow(pil_image)
    plt.show()
    # Save the image
    image_file = 'images/cifar_' + str(classes[label]) + '.jpg'
    pil_image.save(image_file)
    print("real label:", classes[label])
else:
    image_file = 'images/hd_truck.jpg'
    print("using hd image:", image_file)


finalconv_name = "features"
# net = torch.load("./model/resnet_cnn-ep5-lr0.001-bs512-depth5-residual.pt")
net = torch.load("./model/resnet_to_convergence/cnn-ep5-lr0.004-bs64-depth25-residual.pt")
print(net)
net.eval()

# hook the feature extractor
features_blobs = []


def hook_feature(module, input, output):
    features_blobs.append(output.data.cpu().numpy())


net._modules.get(finalconv_name).register_forward_hook(hook_feature)

# get the softmax weight
params = list(net.parameters())
weight_softmax = np.squeeze(params[-2].data.cpu().numpy())


def returnCAM(feature_conv, weight_softmax, class_idx):
    # generate the class activation maps upsample to 256x256
    size_upsample = (256, 256)
    bz, nc, h, w = feature_conv.shape
    output_cam = []
    for idx in class_idx:
        cam = weight_softmax[idx].dot(feature_conv.reshape((nc, h * w)))
        cam = cam.reshape(h, w)
        cam = cam - np.min(cam)
        cam_img = cam / np.max(cam)
        cam_img = np.uint8(255 * cam_img)
        output_cam.append(cv2.resize(cam_img, size_upsample))
    return output_cam


# normalize = transforms.Normalize(
#     mean=[0.485, 0.456, 0.406],
#     std=[0.229, 0.224, 0.225]
# )
preprocess = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    # normalize
])

# load test image
img_pil = Image.open(image_file)
img_tensor = preprocess(img_pil)
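# Run on whichever device the loaded model lives on, instead of hard-coding 'cuda'
device = next(net.parameters()).device
with torch.no_grad():
    logit = net(img_tensor.unsqueeze(0).to(device))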

h_x = F.softmax(logit, dim=1).data.squeeze()
probs, idx = h_x.sort(0, True)
probs = probs.cpu().numpy()
idx = idx.cpu().numpy()

# output the prediction
for i in range(0, 10):
    print('{:.3f} -> {}'.format(probs[i], classes[idx[i]]))

# generate class activation mapping for the top1 prediction
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[0]])

# render the CAM and output
print('output CAM.jpg for the top1 prediction: %s' % classes[idx[0]])
img = cv2.imread(image_file)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
# result = heatmap * 0.3 + img * 0.5
result = heatmap * 0.4 + img * 0.5
if cifar:
    cv2.imwrite('images/CAM_cifar_' + str(classes[label]) + '_idx' + str(image_idx) + '_probs' + str(probs[0]) + '.jpg',
                result)
else:
    cv2.imwrite('images/CAM_hd_truck_probs' + str(probs[0]) + '.jpg', result)