***Deep Learning Applications 2023** course, held by Professor **Andrew David Bagdanov** - University of Florence, Italy*

*Notebook and code created by **Giovanni Colombo** - Mat. 7092745*

Check the [Repository on GitHub](https://github.com/giovancombo/DLA-Labs/tree/main/lab1).

# Deep Learning Applications: Laboratory #1

In this first laboratory we will work with relatively simple architectures to get a feel for working with Deep Models. This notebook is designed to work with PyTorch.

## Exercise 1: Warming Up
In this series of exercises I will duplicate (on a small scale) the results of the ResNet paper:

> [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385), Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, CVPR 2016.

I will do this in steps, firstly using a Multilayer Perceptron on MNIST.

What's important to recall is that the main message of the ResNet paper is that **deeper networks do not guarantee** a lower training loss (or a higher validation accuracy).
Below, I will incrementally build a sequence of experiments to verify this for different architectures, starting with an *MLP*.

The Laboratory requires me to compare multiple training runs, so I took this as a great opportunity to learn to use [Weights and Biases](https://wandb.ai/site) for performance monitoring.

### Exercise 1.1: A baseline MLP

I will now implement a *simple Multilayer Perceptron* to classify the 10 digits of MNIST, and (hopefully) train it to convergence, monitoring Training and Validation losses and accuracies with W&B.

The exercise wants me to think in an *abstract* way: I'll have to instantiate multiple models, each with a different hyperparameter configuration, and train them on different datasets.
It is therefore a good idea to generalize the instantiation of every object in the training workflow as much as possible. That's why I decided to build a single `config.yaml` file, where I put almost every variable that can help me build any model I want.
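
To give an idea of how such a configuration is consumed, here is a minimal sketch. The key names are the ones read by the functions in this notebook, and the values are taken from the run output shown later where available (learning rate, weight decay, dropout, number of epochs, validation size). `batch_size` and `num_workers` are placeholder assumptions. The `REQUIRED_KEYS` set and the `to_namespace` helper are hypothetical and not part of the repository: they only mimic the attribute-style access (`config.dataset`) that `wandb.config` provides.

```python
from types import SimpleNamespace

# Hypothetical subset of config.yaml, written here as a plain dict.
config_dict = {
    "dataset": "CIFAR10",
    "val_size": 10000,
    "batch_size": 256,      # assumption, not the author's value
    "num_workers": 2,       # assumption, not the author's value
    "convnet": True,
    "residual": False,
    "optimizer": "Adam",
    "learning_rate": 0.0005,
    "weight_decay": 0.0001,
    "dropout": 0.3,
    "epochs": 20,
}

# Keys that this sketch treats as mandatory (a hypothetical choice).
REQUIRED_KEYS = {"dataset", "val_size", "batch_size", "num_workers",
                 "convnet", "residual", "optimizer", "learning_rate"}


def to_namespace(cfg):
    """Fail early on missing keys, then expose the entries as attributes."""
    missing = REQUIRED_KEYS - set(cfg)
    if missing:
        raise KeyError(f"Missing config keys: {sorted(missing)}")
    return SimpleNamespace(**cfg)


config = to_namespace(config_dict)
print(config.dataset, config.learning_rate)
```

Checking the keys up front means a typo in the YAML file surfaces before any data is downloaded or any model is built.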

In [1]:
# Imports and dependencies
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import numpy as np
import yaml
# Importing Weights and Biases for tracking and comparing different runs
import wandb

import aux_functions, models

# Device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

torchvision.datasets.MNIST.mirrors = [mirror for mirror in torchvision.datasets.MNIST.mirrors
                                      if not mirror.startswith("http://yann.lecun.com")]


I define a `load` function that takes the `config` dictionary (obtained from my `.yaml` file) as an argument and loads the chosen dataset (either MNIST or CIFAR10), transformed accordingly and split into *Train*, *Validation* and *Test* sets.

In [2]:
def load(config):
    # Transformations applied to the dataset
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.1307,), (0.3081,)) if config.dataset == "MNIST" else transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])

    train_data = getattr(torchvision.datasets, config.dataset)("./data", train = True, download = True, transform = transform)
    test_set = getattr(torchvision.datasets, config.dataset)("./data", train = False, download = True, transform = transform)
    
    # Splitting the Training Set into Training and Validation Sets
    train_set, val_set = random_split(train_data, [len(train_data) - config.val_size, config.val_size])

    train_loader = DataLoader(train_set, config.batch_size, shuffle = True, num_workers = config.num_workers)
    val_loader = DataLoader(val_set, config.batch_size, num_workers = config.num_workers)
    test_loader = DataLoader(test_set, config.batch_size, num_workers = config.num_workers)

    print(f"Dataset {config.dataset} loaded with {len(train_set)} Train samples, {len(val_set)} Validation samples, {len(test_set)} Test samples.\n")

    return train_loader, val_loader, test_loader

The script file `models.py` contains all the model classes used for this Laboratory:
+ **MLP**, for instantiating a *Multilayer Perceptron*
+ **ResidualMLP**, for instantiating an MLP that implements *Residual Connections*
+ **CNN**, for instantiating a *Convolutional Network*, with the possibility of tuning almost every possible parameter
+ **ResidualCNN**, for instantiating a ConvNet that implements *Residual Connections*
+ **ResNet**, for instantiating an actual *ResNet* as defined in the [Paper](https://arxiv.org/abs/1512.03385), in its *[9, 18, 34, 50, 101, 152]* versions.

The `build_model` function instantiates the Model, Loss Function and Optimizer chosen in the `config` file, and sends the model to `device`, which can be `cuda` (in my case, an *Nvidia GeForce RTX 3060 Laptop*) or `cpu`.

In [3]:
def build_model(config):
    # Building the model
    if config.convnet:
        if config.residual:
            if config.resnet:
                m = models.ResNet(config.resnet_name, config.input_shape, config.resnet_hidden_size, config.classes, config.activation, config.use_bn, config.dropout)
            else:
                m = models.ResidualCNN(config.input_shape, config.CNN_hidden_size, config.classes, config.depth, config.activation, config.use_bn)
        else:
            m = models.CNN(config.input_shape, config.CNN_hidden_size, config.classes, config.depth, config.kernel_size, config.stride, config.padding, config.activation, config.dropout, config.pool, config.pool_size, config.use_bn)
    elif config.residual:
        m = models.ResidualMLP(config.input_size, config.MLP_hidden_size, config.classes, config.activation, config.dropout)
    else:
        m = models.MLP(config.input_size, config.MLP_hidden_size, config.classes, config.activation, config.dropout)
    
    model = m.to(device)

    # Defining the loss function and the optimizer
    criterion = nn.CrossEntropyLoss()

    if config.optimizer == "Adam":
        optimizer = torch.optim.Adam(model.parameters(),
                                     lr = config.learning_rate,
                                     weight_decay = config.weight_decay)
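    elif config.optimizer in ("SGD", "RMSprop"):
        optimizer = getattr(torch.optim, config.optimizer)(model.parameters(),
                                                           lr = config.learning_rate,
                                                           momentum = config.momentum,
                                                           weight_decay = config.weight_decay)
    else:
        # Fail early instead of hitting an undefined `optimizer` further down
        raise ValueError(f"Unsupported optimizer: {config.optimizer}")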

    print(f"Model instantiated: {model.__class__.__name__}")
    print(f"Number of parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad)}\n")
    print(model)
    print(f"Optimizer: {optimizer.__class__.__name__}")
    print(optimizer)
    print(f"Loss: {criterion}")
    print(f"Device: {device}")

    return model, criterion, optimizer

Functions for periodically logging the Loss and Accuracy of the Training and Evaluation phases.

In [4]:
# Function for periodical log of training data
def log_train(epoch, loss, accuracy, mean_loss, mean_acc, example_ct, config):
    print(f'Epoch {epoch + 1}/{config.epochs} | Train Loss = {mean_loss:.4f}; Train Accuracy = {mean_acc:.2f}%')
    wandb.log({"Training/Training Loss": loss,
                "Training/Training Accuracy": accuracy,
                "Training/Training Epochs": epoch + 1}, step = example_ct)

# Function for log of validation data at the end of an epoch
def log_validation(epoch, mean_loss, val_loss, mean_accuracy, val_accuracy, example_ct):
    print(f'\nEnd of epoch {epoch + 1} | Validation Loss: {val_loss:.4f}; Validation Accuracy: {val_accuracy}%\n')

    wandb.log({"Train Loss": mean_loss, 
               "Validation Loss": val_loss,
               "Epoch": epoch + 1,
               "Train Accuracy": mean_accuracy,
               "Validation Accuracy": val_accuracy}, step = example_ct)

The training loop lies in the `train` function, which takes all the objects instantiated in the previous steps and uses them to train the model.

The *forward* and *backward* passes are performed batch-wise by the `train_batch` function, which reshapes the input images according to the model used (flattened for MLPs, unchanged for ConvNets). The same is done in the `test` function, which is used for both validation and testing.

In [5]:
# Training Loop
def train(model, train_loader, val_loader, criterion, optimizer, config):

    # Telling W&B to watch gradients and the model parameters
    wandb.watch(model, criterion, log = "all", log_freq = config.log_interval)
    example_ct = 0

    print("\nStarting training...")
    for epoch in tqdm(range(config.epochs), desc = "Training Epochs", ncols = 100):
        model.train()
        losses, accuracies = [], []
        for batch, (images, labels) in enumerate(train_loader):
            loss, accuracy = train_batch(model, images, labels, criterion, optimizer, config)

            example_ct += len(images)
            losses.append(loss.item())
            accuracies.append(accuracy)
            mean_loss, mean_accuracy = np.mean(losses[-config.log_interval:]), np.mean(accuracies[-config.log_interval:])

            if ((batch + 1) % config.log_interval) == 0:
                log_train(epoch, loss, accuracy, mean_loss, mean_accuracy, example_ct, config)

        # Validation at the end of the epoch
        val_loss, val_accuracy = test(model, val_loader, config)

        # Logging losses and accuracies at the end of the epoch
        log_validation(epoch, mean_loss, val_loss, mean_accuracy, val_accuracy, example_ct)
    
    print("Training completed!")


# Function for training a single batch
def train_batch(model, images, labels, criterion, optimizer, config):
    if not config.convnet:
        images = images.reshape(-1, config.input_size).to(device)
    else:
        images = images.to(device)
    labels = labels.to(device)
    # Forward pass
    outputs = model(images)
    loss = criterion(outputs, labels)

    # Backward pass
    optimizer.zero_grad()
    loss.backward() 
    optimizer.step()

    # Calculating training accuracy
    correct, total = 0, 0
    _, pred = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (pred == labels).sum().item()
        
    accuracy = 100 * correct / total

    return loss, accuracy


# Evaluation Loop (for Validation and Test)
@torch.no_grad()
def test(model, test_loader, config):
    test_loss = 0
    correct, total = 0, 0
    model.eval()
    for images, labels in test_loader:
        if not config.convnet:
            images = images.reshape(-1, config.input_size).to(device)
        else:
            images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)

        # .item() keeps the running total a plain Python float instead of a tensor on `device`
        test_loss += F.cross_entropy(outputs, labels, reduction = 'sum').item()
        _, pred = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (pred == labels).sum().item()
    
    test_loss /= len(test_loader.dataset)
    test_accuracy = 100. * correct / total

    return test_loss, test_accuracy
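
A note on the loss averaging in `test`: summing the per-sample losses with `reduction = 'sum'` and dividing by the dataset size gives the exact per-sample mean, whereas averaging per-batch means is slightly biased whenever the last batch is smaller than the others. The toy numbers below are made up for illustration and use plain Python instead of tensors.

```python
# Made-up per-sample losses for a "dataset" of 5 samples, with batch size 2.
sample_losses = [1.0, 1.0, 1.0, 1.0, 4.0]
batch_size = 2
batches = [sample_losses[i:i + batch_size]
           for i in range(0, len(sample_losses), batch_size)]

# Strategy used in test(): sum over every sample, then divide by the dataset size.
sum_then_divide = sum(sum(b) for b in batches) / len(sample_losses)

# Alternative: average the per-batch means (what accumulating reduction='mean' would give).
mean_of_batch_means = sum(sum(b) / len(b) for b in batches) / len(batches)

print(f"sum / N             = {sum_then_divide:.4f}")      # 1.6000
print(f"mean of batch means = {mean_of_batch_means:.4f}")  # 2.0000
```

The last batch holds a single sample, so in the second strategy that sample weighs as much as a full batch.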

The `load`, `build_model`, `train` and `test` functions are all called from a single function, `model_pipeline`, which lets me wrap my whole workflow into a single *Weights & Biases* run.

In [6]:
def model_pipeline(project_name):
    # Loading the yaml file containing the hyperparameter configuration
    with open("config.yaml") as f:
        config = yaml.safe_load(f)

    wandb.login()
    print("Initializing Weights & Biases run...")

    # Initializing a wandb run for logging losses, accuracies and gradients
    with wandb.init(project = project_name, config = config):
        config = wandb.config

        # 1. Load the data
        train_loader, val_loader, test_loader = load(config)

        # 2. Build the model
        model, criterion, optimizer = build_model(config)

        # 3. Train the model
        train(model, train_loader, val_loader, criterion, optimizer, config)

        # 4. Evaluate the model on the test set
        test_loss, test_accuracy = test(model, test_loader, config)

        print(f"Testing completed! | Test Loss: {test_loss:.4f}; Test Accuracy = {test_accuracy:.2f}%")
        wandb.log({"Test Loss": test_loss,
                "Test Accuracy": test_accuracy})

    # 5. Saving the model, assigning it a name based on the hyperparameters used
    if config['save_model']:
        folder = f"models/{config['dataset']}/{model.__class__.__name__}"
        model_name = aux_functions.model_path(config, model.__class__.__name__)
        if not os.path.exists(folder):
            os.makedirs(folder)
        torch.save(model.state_dict(), folder + model_name + ".pt")
        print("Model saved!")

In [7]:
model_pipeline(project_name = "DLA_Lab1_CNN")

ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: giovancombo. Use `wandb login --relogin` to force relogin


Initializing Weights & Biases run...


Files already downloaded and verified
Files already downloaded and verified
Dataset CIFAR10 loaded with 40000 Train samples, 10000 Validation samples, 10000 Test samples.

Model instantiated: CNN
Number of parameters: 657290

CNN(
  (act): ReLU()
  (convlayers): Sequential(
    (0): ConvBlock(
      (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): ReLU()
    (2): Identity()
  )
  (fc): Sequential(
    (0): Dropout(p=0.3, inplace=False)
    (1): Linear(in_features=65536, out_features=10, bias=True)
  )
)
Optimizer: Adam
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.0005
    maximize: False
    weight_decay: 0.0001
)
Loss: CrossEntropyLoss()
Device: cuda:0

Starting training...


Training Epochs:   0%|                                                       | 0/20 [00:00<?, ?it/s]

Epoch 1/20 | Train Loss = 7.1316; Train Accuracy = 18.69%
Epoch 1/20 | Train Loss = 4.2960; Train Accuracy = 27.87%
Epoch 1/20 | Train Loss = 2.7594; Train Accuracy = 34.28%
Epoch 1/20 | Train Loss = 1.9859; Train Accuracy = 39.66%


Training Epochs:   5%|██▎                                            | 1/20 [01:08<21:36, 68.25s/it]


End of epoch 1 | Validation Loss: 1.7783; Validation Accuracy: 41.96%

Epoch 2/20 | Train Loss = 1.7108; Train Accuracy = 44.56%
Epoch 2/20 | Train Loss = 1.5636; Train Accuracy = 46.97%
Epoch 2/20 | Train Loss = 1.4298; Train Accuracy = 50.86%
Epoch 2/20 | Train Loss = 1.4389; Train Accuracy = 50.15%


Training Epochs:  10%|████▋                                          | 2/20 [02:15<20:14, 67.48s/it]


End of epoch 2 | Validation Loss: 1.5307; Validation Accuracy: 47.57%

Epoch 3/20 | Train Loss = 1.3855; Train Accuracy = 52.15%
Epoch 3/20 | Train Loss = 1.3447; Train Accuracy = 53.68%
Epoch 3/20 | Train Loss = 1.3222; Train Accuracy = 54.25%
Epoch 3/20 | Train Loss = 1.2920; Train Accuracy = 56.08%


Training Epochs:  15%|███████                                        | 3/20 [03:16<18:20, 64.76s/it]


End of epoch 3 | Validation Loss: 1.4879; Validation Accuracy: 48.22%

Epoch 4/20 | Train Loss = 1.3117; Train Accuracy = 54.20%
Epoch 4/20 | Train Loss = 1.2746; Train Accuracy = 55.98%
Epoch 4/20 | Train Loss = 1.2712; Train Accuracy = 56.31%
Epoch 4/20 | Train Loss = 1.2580; Train Accuracy = 57.88%


Training Epochs:  20%|█████████▍                                     | 4/20 [04:17<16:48, 63.03s/it]


End of epoch 4 | Validation Loss: 1.4986; Validation Accuracy: 50.06%

Epoch 5/20 | Train Loss = 1.2135; Train Accuracy = 58.12%
Epoch 5/20 | Train Loss = 1.1677; Train Accuracy = 59.50%
Epoch 5/20 | Train Loss = 1.1456; Train Accuracy = 60.70%
Epoch 5/20 | Train Loss = 1.1371; Train Accuracy = 60.44%


Training Epochs:  25%|███████████▊                                   | 5/20 [05:17<15:31, 62.11s/it]


End of epoch 5 | Validation Loss: 1.3129; Validation Accuracy: 54.27%

Epoch 6/20 | Train Loss = 1.1173; Train Accuracy = 60.83%
Epoch 6/20 | Train Loss = 1.1033; Train Accuracy = 61.75%
Epoch 6/20 | Train Loss = 1.1019; Train Accuracy = 61.12%
Epoch 6/20 | Train Loss = 1.0732; Train Accuracy = 63.66%


Training Epochs:  30%|██████████████                                 | 6/20 [06:18<14:23, 61.69s/it]


End of epoch 6 | Validation Loss: 1.2919; Validation Accuracy: 54.75%

Epoch 7/20 | Train Loss = 1.0388; Train Accuracy = 64.22%
Epoch 7/20 | Train Loss = 1.0404; Train Accuracy = 63.96%
Epoch 7/20 | Train Loss = 1.0366; Train Accuracy = 64.35%
Epoch 7/20 | Train Loss = 1.0531; Train Accuracy = 63.35%


Training Epochs:  35%|████████████████▍                              | 7/20 [07:18<13:16, 61.29s/it]


End of epoch 7 | Validation Loss: 1.4996; Validation Accuracy: 50.27%

Epoch 8/20 | Train Loss = 1.1258; Train Accuracy = 60.89%
Epoch 8/20 | Train Loss = 1.0522; Train Accuracy = 63.81%
Epoch 8/20 | Train Loss = 1.0085; Train Accuracy = 65.10%
Epoch 8/20 | Train Loss = 1.0370; Train Accuracy = 63.81%


Training Epochs:  40%|██████████████████▊                            | 8/20 [08:30<12:53, 64.42s/it]


End of epoch 8 | Validation Loss: 1.4727; Validation Accuracy: 51.1%

Epoch 9/20 | Train Loss = 1.0411; Train Accuracy = 63.76%
Epoch 9/20 | Train Loss = 0.9577; Train Accuracy = 66.90%
Epoch 9/20 | Train Loss = 0.9660; Train Accuracy = 67.12%
Epoch 9/20 | Train Loss = 1.0319; Train Accuracy = 64.67%


Training Epochs:  45%|█████████████████████▏                         | 9/20 [09:29<11:32, 62.95s/it]


End of epoch 9 | Validation Loss: 1.3681; Validation Accuracy: 54.77%

Epoch 10/20 | Train Loss = 0.9442; Train Accuracy = 67.62%
Epoch 10/20 | Train Loss = 0.9016; Train Accuracy = 68.90%
Epoch 10/20 | Train Loss = 0.9338; Train Accuracy = 68.36%
Epoch 10/20 | Train Loss = 0.9067; Train Accuracy = 68.92%


Training Epochs:  50%|███████████████████████                       | 10/20 [10:39<10:50, 65.02s/it]


End of epoch 10 | Validation Loss: 1.4039; Validation Accuracy: 54.92%

Epoch 11/20 | Train Loss = 0.9321; Train Accuracy = 68.13%
Epoch 11/20 | Train Loss = 0.9071; Train Accuracy = 68.62%
Epoch 11/20 | Train Loss = 0.9023; Train Accuracy = 68.34%
Epoch 11/20 | Train Loss = 0.9124; Train Accuracy = 67.55%


Training Epochs:  55%|█████████████████████████▎                    | 11/20 [11:53<10:10, 67.88s/it]


End of epoch 11 | Validation Loss: 1.3885; Validation Accuracy: 55.39%

Epoch 12/20 | Train Loss = 0.8832; Train Accuracy = 69.35%
Epoch 12/20 | Train Loss = 0.8457; Train Accuracy = 70.79%
Epoch 12/20 | Train Loss = 0.8328; Train Accuracy = 71.49%
Epoch 12/20 | Train Loss = 0.8546; Train Accuracy = 70.91%


Training Epochs:  60%|███████████████████████████▌                  | 12/20 [13:14<09:35, 71.90s/it]


End of epoch 12 | Validation Loss: 1.2670; Validation Accuracy: 57.36%

Epoch 13/20 | Train Loss = 0.7989; Train Accuracy = 72.56%
Epoch 13/20 | Train Loss = 0.8110; Train Accuracy = 72.42%
Epoch 13/20 | Train Loss = 0.8087; Train Accuracy = 72.15%
Epoch 13/20 | Train Loss = 0.8302; Train Accuracy = 70.50%


Training Epochs:  65%|█████████████████████████████▉                | 13/20 [14:24<08:17, 71.10s/it]


End of epoch 13 | Validation Loss: 1.2117; Validation Accuracy: 59.58%

Epoch 14/20 | Train Loss = 0.7609; Train Accuracy = 74.15%
Epoch 14/20 | Train Loss = 0.7738; Train Accuracy = 73.40%
Epoch 14/20 | Train Loss = 0.7846; Train Accuracy = 72.72%
Epoch 14/20 | Train Loss = 0.8160; Train Accuracy = 71.11%


Training Epochs:  70%|████████████████████████████████▏             | 14/20 [15:43<07:20, 73.45s/it]


End of epoch 14 | Validation Loss: 1.3631; Validation Accuracy: 57.61%

Epoch 15/20 | Train Loss = 0.7980; Train Accuracy = 72.47%
Epoch 15/20 | Train Loss = 0.7817; Train Accuracy = 73.08%
Epoch 15/20 | Train Loss = 0.7605; Train Accuracy = 73.69%
Epoch 15/20 | Train Loss = 0.7815; Train Accuracy = 72.10%


Training Epochs:  75%|██████████████████████████████████▌           | 15/20 [17:00<06:12, 74.55s/it]


End of epoch 15 | Validation Loss: 1.3877; Validation Accuracy: 56.31%

Epoch 16/20 | Train Loss = 0.7768; Train Accuracy = 72.77%
Epoch 16/20 | Train Loss = 0.7235; Train Accuracy = 75.31%
Epoch 16/20 | Train Loss = 0.7447; Train Accuracy = 73.85%
Epoch 16/20 | Train Loss = 0.7664; Train Accuracy = 73.76%


Training Epochs:  80%|████████████████████████████████████▊         | 16/20 [17:59<04:40, 70.05s/it]


End of epoch 16 | Validation Loss: 1.2505; Validation Accuracy: 59.67%

Epoch 17/20 | Train Loss = 0.7262; Train Accuracy = 75.25%
Epoch 17/20 | Train Loss = 0.7018; Train Accuracy = 76.20%
Epoch 17/20 | Train Loss = 0.7138; Train Accuracy = 75.29%
Epoch 17/20 | Train Loss = 0.7214; Train Accuracy = 74.80%


Training Epochs:  85%|███████████████████████████████████████       | 17/20 [19:03<03:24, 68.19s/it]


End of epoch 17 | Validation Loss: 1.2408; Validation Accuracy: 59.5%

Epoch 18/20 | Train Loss = 0.6914; Train Accuracy = 76.01%
Epoch 18/20 | Train Loss = 0.6958; Train Accuracy = 75.71%
Epoch 18/20 | Train Loss = 0.6888; Train Accuracy = 76.30%
Epoch 18/20 | Train Loss = 0.7045; Train Accuracy = 75.76%


Training Epochs:  90%|█████████████████████████████████████████▍    | 18/20 [20:23<02:23, 71.64s/it]


End of epoch 18 | Validation Loss: 1.2881; Validation Accuracy: 59.63%

Epoch 19/20 | Train Loss = 0.6806; Train Accuracy = 76.41%
Epoch 19/20 | Train Loss = 0.6670; Train Accuracy = 76.31%
Epoch 19/20 | Train Loss = 0.6932; Train Accuracy = 76.20%
Epoch 19/20 | Train Loss = 0.7129; Train Accuracy = 76.47%


Training Epochs:  95%|███████████████████████████████████████████▋  | 19/20 [21:23<01:08, 68.13s/it]


End of epoch 19 | Validation Loss: 1.3621; Validation Accuracy: 56.84%

Epoch 20/20 | Train Loss = 0.7027; Train Accuracy = 75.25%
Epoch 20/20 | Train Loss = 0.6842; Train Accuracy = 76.00%
Epoch 20/20 | Train Loss = 0.6806; Train Accuracy = 76.11%
Epoch 20/20 | Train Loss = 0.6914; Train Accuracy = 75.34%


Training Epochs: 100%|██████████████████████████████████████████████| 20/20 [22:31<00:00, 67.56s/it]


End of epoch 20 | Validation Loss: 1.3112; Validation Accuracy: 58.73%

Training completed!





Testing completed! | Test Loss: 1.3529; Test Accuracy = 57.97%


Run history:

| Metric | History |
| --- | --- |
| Epoch | ▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██ |
| Test Accuracy | ▁ |
| Test Loss | ▁ |
| Train Accuracy | ▁▃▄▄▅▆▆▆▆▇▆▇▇▇▇▇████ |
| Train Loss | █▅▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁ |
| Training/Training Accuracy | ▁▃▅▅▅▅▅▅▆▆▆▆▆▇▆▆▆▇▇▇▇▇▇▇▇▇█▇███▇███████▇ |
| Training/Training Epochs | ▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████ |
| Training/Training Loss | █▃▂▂▂▂▂▂▂▂▂▂▁▁▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| Validation Accuracy | ▁▃▃▄▆▆▄▅▆▆▆▇█▇▇███▇█ |
| Validation Loss | █▅▄▅▂▂▅▄▃▃▃▂▁▃▃▁▁▂▃▂ |

Run summary:

| Metric | Final value |
| --- | --- |
| Epoch | 20.0 |
| Test Accuracy | 57.97 |
| Test Loss | 1.35286 |
| Train Accuracy | 75.3418 |
| Train Loss | 0.69137 |
| Training/Training Accuracy | 70.3125 |
| Training/Training Epochs | 20.0 |
| Training/Training Loss | 0.63798 |
| Validation Accuracy | 58.73 |
| Validation Loss | 1.3112 |

Model saved!
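
As a sanity check, the parameter count reported above for this depth-1 CNN (657290) can be reproduced by hand from the printed architecture: one 3x3 convolution from 3 to 64 channels, one `BatchNorm2d(64)`, and one linear layer from 65536 = 64 x 32 x 32 features to 10 classes. The helper names below are mine, introduced only for this sketch.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    # One kernel x kernel filter per (input, output) channel pair, plus biases.
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def batchnorm_params(channels):
    # Learnable scale and shift per channel (running stats are buffers, not parameters).
    return 2 * channels


def linear_params(in_features, out_features, bias=True):
    return in_features * out_features + (out_features if bias else 0)


def depth1_cnn_params(in_ch, side, hidden=64, classes=10):
    # A 3x3 kernel with stride 1 and padding 1 keeps the spatial size unchanged,
    # so the flattened feature vector has hidden * side * side entries.
    flat = hidden * side * side
    return (conv2d_params(in_ch, hidden, 3)
            + batchnorm_params(hidden)
            + linear_params(flat, classes))


print(depth1_cnn_params(in_ch=3, side=32))  # CIFAR10 model above: 657290
print(depth1_cnn_params(in_ch=1, side=28))  # MNIST model printed in Exercise 2.1: 502538
```

Almost all of the parameters (655370 out of 657290) sit in the final linear layer, because the feature map is flattened at full resolution.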


### Exercise 1.2: Rinse and Repeat

I will now repeat the verification I did above, but with **Convolutional** Neural Networks.
This part of the exercise focuses on revealing that **deeper** CNNs *without* residual connections do not always work better, while **even deeper** ones *with* residual connections do.

**Note**: MNIST is *very* easy to work on (at least up to about 99% accuracy), so I will work on **CIFAR10** from now on.

Launching the `model_pipeline` function with its proper configuration allows me to observe the performance of multiple kinds of Convolutional architectures.

The focus here is on varying the total **depth** (i.e. the number of layers) of the network while keeping the general architecture unchanged, in order to show that a **deeper** ConvNet performs better only **up to a certain depth (!)**.
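
A depth study like this boils down to running the same pipeline with only `depth` (and the `residual` flag) changed. Here is a minimal sketch of how such a sweep could be organised. The depth values and the `make_depth_configs` helper are hypothetical, and since `model_pipeline` reads `config.yaml` directly, it would have to be adapted to accept a dictionary before this could plug in.

```python
import copy

# Hypothetical base configuration; only the keys relevant to the sweep are shown.
base_config = {"dataset": "CIFAR10", "convnet": True, "residual": False, "depth": 1}


def make_depth_configs(base, depths, residual_options=(False, True)):
    """Return one config per (residual, depth) pair, leaving `base` untouched."""
    configs = []
    for residual in residual_options:
        for depth in depths:
            cfg = copy.deepcopy(base)
            cfg["depth"] = depth
            cfg["residual"] = residual
            configs.append(cfg)
    return configs


# Assumed depth grid, chosen only for illustration.
for cfg in make_depth_configs(base_config, depths=[1, 5, 10, 20]):
    name = "ResidualCNN" if cfg["residual"] else "CNN"
    print(f"{name} with depth={cfg['depth']}")
```

Keeping every other hyperparameter fixed is what makes the resulting W&B curves comparable across depths.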

All logs and trackings of my runs are available on Weights & Biases, at [this link](https://wandb.ai/giovancombo/DLA_Lab1_CNN?workspace=user-giovancombo).

...Well, as previously said, reaching a very high Validation Accuracy on **MNIST** is *very* easy.
Let's try then to train some ConvNets on the **CIFAR10** dataset.

-----
## Exercise 2: Choose at Least One

Let's now deepen our understanding of Deep Networks for visual recognition.

+ Firstly, I will look for a quantitative answer about *how* and *why* Residual Networks learn more efficiently than their plain Convolutional counterparts.
+ Secondly, I will become a *network surgeon*, trying to fully-convolutionalize a network by acting on its final layers.
+ Thirdly, I will try to implement *Class Activation Maps*, in order to see which parts of an image were the most decisive for its classification.

### Exercise 2.1: Explain why Residual Connections are so effective

The question *"Why do Residual Networks learn more efficiently than plain Convolutional Networks?"* can be answered by looking at the magnitudes of the gradients flowing through the networks during backpropagation.

`wandb.watch(model, log = "all")` tells *Weights & Biases* to log the evolution of *gradients* and *parameters* in all the layers of the network. This functionality is useful to graphically visualize the concept of **Vanishing Gradients**.

For this exercise, I first ran a basic *MLP*, and then an *MLP with Residual Connections*. Honestly, at the time, I didn't think this was a very clever idea, since I had only ever seen Residual Connections added to Convolutional Networks, but... I decided to give it a try anyway.

As mentioned before, I compared these two architectures by checking how their performance changes with their **depth** (i.e. their number of layers).

A basic **10-layer MLP** suffers from Vanishing Gradients, with its accuracy dropping all the way down to 10%, which on 10 classes means picking a class **by chance**.

As mentioned in the original [ResNet paper](https://arxiv.org/abs/1512.03385), a higher number of layers leads not only to a higher validation loss, but also to a *higher training loss*: this means that we are not facing overfitting, but a "weird" behavior of the deeper model itself (what the paper calls the *degradation* problem).

On the contrary, the **10-layer Residual MLP** performed well, in line with the explanation of the ResNet authors: Residual Connections allow a network to go **a lot** deeper (with overfitting as the only remaining limitation).

These results can be checked quantitatively by observing the *W&B* logs of the gradient magnitudes. The basic **MLP** shows gradients that are very close to zero, meaning that the model is not making any real progress.

Conversely, the **Residual MLP** showed gradients that neither vanished nor exploded, and whose magnitude progressively decreased during training, meaning that the model was converging towards a (local, hopefully global) optimum.
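
The mechanism can be illustrated with a toy scalar model, which is my own simplification and not the MLP trained here. Each plain layer computes `h = sigmoid(w * h)`, while each residual layer computes `h = h + sigmoid(w * h)`. By the chain rule, the gradient of the output with respect to the input is a product of per-layer factors: `w * sigmoid'(z)` for the plain stack (at most `0.25 * |w|`), and `1 + w * sigmoid'(z)` for the residual one. The weight, the input value and the depths are arbitrary assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth, w=1.0, x=0.5, residual=False):
    """d(output)/d(input) of a scalar chain of `depth` layers, via the chain rule."""
    h, grad = x, 1.0
    for _ in range(depth):
        z = w * h
        s = sigmoid(z)
        local = w * s * (1.0 - s)      # derivative of sigmoid(w * h) with respect to h
        if residual:
            h = h + s                  # identity shortcut plus the layer output
            grad *= 1.0 + local        # the "+ 1" comes from the shortcut
        else:
            h = s
            grad *= local
    return grad


for depth in (2, 5, 10):
    plain = input_gradient(depth)
    res = input_gradient(depth, residual=True)
    print(f"depth={depth:2d} | plain: {plain:.2e} | residual: {res:.2e}")
```

In the plain stack the factors multiply towards zero as the depth grows, while the shortcut keeps every factor at or above 1 in this toy setup (with `w > 0`). Real networks multiply Jacobian matrices rather than scalars, so this is only an intuition for the gradient magnitudes logged on W&B.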

In [194]:
trained_model = model_pipeline()

Model name: ./model/CNN/cnn-ep1-lr0.0005-bs2048-depth1
CNN(
  (act): ReLU()
  (convlayers): Sequential(
    (0): ConvBlock(
      (conv): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): ReLU()
    (2): Identity()
  )
  (fc): Sequential(
    (0): Dropout(p=0.4, inplace=False)
    (1): Linear(in_features=50176, out_features=10, bias=True)
  )
)
Number of parameters: 502538

CrossEntropyLoss()
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.0005
    maximize: False
    weight_decay: 0.0001
)


  0%|          | 0/1 [00:00<?, ?it/s]

Epoch 1/1 | Training Loss after 20480 examples: 0.9132
Epoch 1/1 | Training Loss after 40960 examples: 0.4682


100%|██████████| 1/1 [00:40<00:00, 40.83s/it]

End of epoch 1 | Validation Accuracy: 89.55%; Validation Loss: 0.4045

End of TRAINING





End of TESTING | Accuracy on the 10000 test images: 89.98%; Test Loss: 0.3920


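Run history:

| Metric | History |
| --- | --- |
| Epoch | ▁▁▁ |
| Test Accuracy | ▁ |
| Test Loss | ▁ |
| Train Accuracy | ▁ |
| Train Loss | ▁ |
| Training/Training Examples | ▁█ |
| Training/Training Loss | █▁ |
| Validation Accuracy | ▁ |
| Validation Loss | ▁ |

Run summary:

| Metric | Final value |
| --- | --- |
| Epoch | 1.0 |
| Test Accuracy | 89.98 |
| Test Loss | 0.39204 |
| Train Accuracy | 74.79411 |
| Train Loss | 1.02313 |
| Training/Training Examples | 40960.0 |
| Training/Training Loss | 0.46818 |
| Validation Accuracy | 89.55 |
| Validation Loss | 0.40451 |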


The same behaviour can be detected while working on ConvNets and their Residual versions (check gradients on *W&B*).

### Exercise 2.2: Fully-convolutionalize a network.

I decided to save the best model trained so far, the **ResidualCNN** with (..config), and **fully-convolutionalize** it, that is, turn it into a network that can predict classification outputs at *all* pixels of an input image.

One goal of this exercise is to turn this network into a **detector** of handwritten digits.

**Hint**: To test my fully-convolutionalized network, I might need to write some functions that take random MNIST samples and embed them into a larger image (e.g. in a regular grid or at random positions), in order to create examples on which to train the network to *detect* digits.
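
A minimal sketch of such a helper, written with plain Python lists so that it runs without any dataset. `make_canvas` and `embed_patch` are hypothetical names, the "digit" is a dummy 28x28 block of ones standing in for an MNIST sample, and the 64x64 canvas size is an arbitrary assumption. With real data, the same indexing would be applied to tensors.

```python
import random


def make_canvas(height, width, fill=0.0):
    return [[fill] * width for _ in range(height)]


def embed_patch(canvas, patch, rng):
    """Paste `patch` at a random position fully inside `canvas`; return (top, left)."""
    ph, pw = len(patch), len(patch[0])
    top = rng.randint(0, len(canvas) - ph)        # randint is inclusive on both ends
    left = rng.randint(0, len(canvas[0]) - pw)
    for i in range(ph):
        canvas[top + i][left:left + pw] = patch[i]
    return top, left


rng = random.Random(0)                             # seeded for reproducibility
digit = [[1.0] * 28 for _ in range(28)]            # dummy stand-in for an MNIST digit
canvas = make_canvas(64, 64)
top, left = embed_patch(canvas, digit, rng)

# The returned corner doubles as a bounding-box label for a detection task.
print(f"digit placed at rows {top}..{top + 27}, cols {left}..{left + 27}")
```

Calling `embed_patch` several times on the same canvas would produce multi-digit scenes, although overlapping digits would then need to be handled.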

In [None]:
model = models.FullyCNN(...)

(Show plots of 3-4 images with the detection performed)

The ConvNets built in the previous exercise end with a global Average Pooling layer and a Fully Connected layer, which merge all the information from the convolutions into a single prediction for the whole image over the 10 MNIST/CIFAR10 classes.

In a Fully Convolutional Network, we need instead to produce a prediction for every single one of the 28x28 (MNIST) or 32x32 (CIFAR10) pixels of an image. I therefore perform some "network surgery", removing the two layers mentioned above and rearranging the net so that its output has the spatial dimensions of the input image.
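
Whether the rearranged network really returns a map as large as the input can be checked with the standard output-size formulas for `nn.Conv2d` and `nn.ConvTranspose2d` (with dilation 1). The sketch below applies them to two cases: the 3x3, stride 1, padding 1 convolution printed earlier for the depth-1 CNN, followed by a 1x1 classification convolution, and the `ConvTranspose2d(classes, classes, 5, 4, 1)` head from the commented-out code further down in this notebook. The 7x7 input to the transposed convolution is an assumption (a 28x28 image downsampled by a factor of 4), not something taken from the trained model.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output side of nn.Conv2d with dilation 1.
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    # Output side of nn.ConvTranspose2d with dilation 1.
    return (size - 1) * stride - 2 * padding + (kernel - 1) + output_padding + 1


# Case 1: 3x3 conv (stride 1, padding 1), then a 1x1 conv as per-pixel classifier.
side = conv_out(28, kernel=3, stride=1, padding=1)
side = conv_out(side, kernel=1)
print("1x1 head on MNIST:", side)                                                  # 28

# Case 2: transposed conv with kernel 5, stride 4, padding 1 on an assumed 7x7 map.
print("ConvTranspose 5/4/1:", conv_transpose_out(7, 5, 4, 1))                      # 27
print("with output_padding=1:", conv_transpose_out(7, 5, 4, 1, output_padding=1))  # 28
```

Under these assumptions the transposed convolution lands one pixel short of 28, so an `output_padding` of 1 (or a different kernel size) would be needed to recover the input resolution exactly.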

In [198]:
# Fine-tuning a CNN that I will train myself: remove the last 2 layers and add a ConvTranspose

# Loading one of the models already trained in the previous exercises.
# As the printed output shows, the file holds an OrderedDict of weights (a state_dict), not a model object.
pt_model = torch.load("./model/CNN/cnn-ep1-lr0.0005-bs2048-depth1.pt")
print(pt_model)

classes = 10
use_bn = True
activation = "ReLU"
# pt_model.head = nn.Sequential(ConvBlock(num_features, classes, 1, 1, 0, use_bn),
#                                     getattr(nn, activation)(),
#                                     nn.ConvTranspose2d(classes, classes, 5, 4, 1))



OrderedDict([('convlayers.0.conv.weight', tensor([[[[-0.1532,  0.0462,  0.2222],
          [ 0.2187,  0.0905,  0.2887],
          [ 0.0404, -0.1038,  0.0551]]],

        ...

### Exercise 2.3: *Explain* the predictions of a CNN

In order to predict the correct class of an image, a ConvNet exploits its "hierarchical" architecture to create feature maps at different layers of abstraction of information.

The composition of every bit of information extracted determines the whole set of details and peculiarities of an image that links it to a specific class.

Much work has been done in recent years to look inside the black box and find ways to quantitatively *explain* how a prediction was made. One such method is [*Class Activation Maps*](http://cnnlocalization.csail.mit.edu/#:~:text=A%20class%20activation%20map%20for,decision%20made%20by%20the%20CNN.):

> B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning Deep Features for Discriminative Localization. CVPR'16 (arXiv:1512.04150, 2015).

Let's demonstrate how my trained CNN *attends* to specific image features to recognize *specific* classes.
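
As a reminder of what a CAM computes: for a network that ends with global average pooling followed by a linear layer, the map for class `c` is the sum of the last convolutional feature maps, each weighted by the linear-layer weight connecting that feature to class `c`. Here is a minimal NumPy sketch of that weighted sum. The function name `class_activation_map` and the random arrays are placeholders for illustration, not the features or weights of my trained CNN.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Weighted sum of feature maps for one class, rescaled to [0, 1].

    features:   array of shape (K, H, W), output of the last conv layer
    fc_weights: array of shape (num_classes, K), weights of the final linear layer
    """
    k, h, w = features.shape
    # (K,) @ (K, H*W) -> (H*W,): one score per spatial location
    cam = fc_weights[class_idx] @ features.reshape(k, h * w)
    cam = cam.reshape(h, w)
    # Shift and scale so the map can be rendered as a heatmap
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Synthetic stand-ins for real activations and weights
rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))
fc_weights = rng.standard_normal((10, 8))

cam = class_activation_map(features, fc_weights, class_idx=3)
print(cam.shape, cam.min(), cam.max())
```

The resulting low-resolution map is then upsampled to the image size and overlaid on the image as a heatmap.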

For this task, I borrowed the code from this source (link) and applied CAMs to some CIFAR10 images.

Moreover, as a passionate photographer, since we're talking about images, I *HAD* to try to create CAMs of some of my favourite photographs. Here are some visual results!

In [None]:
import cv2
import numpy as np
import torch
import torchvision  # needed for torchvision.datasets.CIFAR10 below
from PIL import Image
from matplotlib import pyplot as plt
from torch.autograd import Variable
from torch.nn import functional as F
from torchvision import transforms
import torchvision.transforms.functional as TF

# 10 classes of CIFAR10
classes = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
cifar = True  # set to False to use a high-quality truck image instead
if cifar:
    image_idx = 18                          # index of the image to test
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])
    test_set = torchvision.datasets.CIFAR10(root = './data', train = False, download = True, transform = transform)
    image, label = test_set[image_idx]
    pil_image = TF.to_pil_image(image)

    # Display the image
    plt.imshow(pil_image)
    plt.show()

    # Save the image
    image_file = 'images/cifar_' + str(classes[label]) + '.jpg'
    pil_image.save(image_file)
    print("real label:", classes[label])
else:
    image_file = 'images/hd_truck.jpg'
    print("using hd image:", image_file)

# networks such as googlenet, resnet, densenet already use global average pooling at the end, so CAM could be used directly.

finalconv_name = "features"
# net = torch.load("./model/resnet_cnn-ep5-lr0.001-bs512-depth5-residual.pt")
net = torch.load("./model/resnet_to_convergence/cnn-ep5-lr0.004-bs64-depth25-residual.pt")
print(net)
net.eval()

# hook the feature extractor
features_blobs = []

def hook_feature(module, input, output):
    features_blobs.append(output.data.cpu().numpy())

net._modules.get(finalconv_name).register_forward_hook(hook_feature)

# get the softmax weight
params = list(net.parameters())
weight_softmax = np.squeeze(params[-2].data.cpu().numpy())



# normalize = transforms.Normalize(
#     mean=[0.485, 0.456, 0.406],
#     std=[0.229, 0.224, 0.225]
# )
preprocess = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    # normalize
])

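# load test image
img_pil = Image.open(image_file)
img_tensor = preprocess(img_pil)
# Variable has been a no-op wrapper since PyTorch 0.4, so a plain tensor is enough
img_variable = img_tensor.unsqueeze(0)
# run on whatever device the model lives on instead of hard-coding 'cuda'
device = next(net.parameters()).device
logit = net(img_variable.to(device))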

h_x = F.softmax(logit, dim=1).data.squeeze()
probs, idx = h_x.sort(0, True)
probs = probs.cpu().numpy()
idx = idx.cpu().numpy()

# output the prediction
for i in range(0, 10):
    print('{:.3f} -> {}'.format(probs[i], classes[idx[i]]))

# generate class activation mapping for the top1 prediction
# (returnCAM is defined in the helper cell below; run that cell first)
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[0]])

# render the CAM and output
print('output CAM.jpg for the top1 prediction: %s' % classes[idx[0]])
img = cv2.imread(image_file)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
# result = heatmap * 0.3 + img * 0.5
result = heatmap * 0.4 + img * 0.5
if cifar:
    cv2.imwrite('images/CAM_cifar_' + str(classes[label]) + '_idx' + str(image_idx) + '_probs' + str(probs[0]) + '.jpg',
                result)
else:
    cv2.imwrite('images/CAM_hd_truck_probs' + str(probs[0]) + '.jpg', result)
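
The cell above relies on two mechanisms: a forward hook that captures the output of the last convolutional block, and the assumption that the second-to-last parameter tensor is the weight matrix of the final linear layer. Here is a minimal sketch of both in isolation. The toy `nn.Sequential` model and the name `save_output` are hypothetical and only serve the example.

```python
import torch
import torch.nn as nn

# Toy model: conv features -> global average pooling -> linear classifier
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # index 0: the layer we hook
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

captured = []


def save_output(module, inputs, output):
    # Called after every forward pass of the hooked module
    captured.append(output.detach().cpu().numpy())


handle = model[0].register_forward_hook(save_output)
with torch.no_grad():
    logits = model(torch.randn(1, 3, 32, 32))
handle.remove()  # detach the hook once it is no longer needed

# The last two parameters are the linear layer's weight and bias,
# so index -2 selects the (num_classes, num_features) weight matrix
fc_weight = list(model.parameters())[-2].detach().numpy()

print(captured[0].shape)  # feature maps seen by the hook
print(fc_weight.shape)    # weights used to combine them into a CAM
```

Each forward pass appends a new entry to the list, so when several images are processed in a row the features of the current image are the last entry, not the first.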

In [456]:
# Defining some functions for plotting the Class Activation Maps

def returnCAM(feature_conv, weight_softmax, class_idx):
    # generate the class activation maps, upsampled to 256x256
    size_upsample = (256, 256)
    bz, nc, h, w = feature_conv.shape
    output_cam = []
    for idx in class_idx:
        cam = weight_softmax[idx].dot(feature_conv.reshape((nc, h * w)))
        cam = cam.reshape(h, w)
        cam = cam - np.min(cam)
        cam_img = cam / np.max(cam)
        cam_img = np.uint8(255 * cam_img)
        output_cam.append(cv2.resize(cam_img, size_upsample))
    return output_cam


def show_cam(CAMs, width, height, orig_image, class_idx, save_name):
    for i, cam in enumerate(CAMs):
        heatmap = cv2.applyColorMap(cv2.resize(cam,(width, height)), cv2.COLORMAP_JET)
        result = heatmap * 0.5 + orig_image * 0.5
        # put class label text on the result
        cv2.putText(result, str(int(class_idx[i])), (20, 40), 
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('CAM', result/255.)
        cv2.waitKey(0)
        cv2.imwrite(f"outputs/CAM_{save_name}.jpg", result)

In [None]:
import glob
import os

# Run for all the images in the `input` folder.
# NOTE: `model` and `transform` are assumed to be defined beforehand, with the
# forward hook (`hook_feature`) registered on `model`; this cell feeds grayscale input.
for image_path in glob.glob('input/*'):
    # read the image
    image = cv2.imread(image_path)
    orig_image = image.copy()
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    image = np.expand_dims(image, axis=2)
    height, width, _ = orig_image.shape
    # apply the image transforms
    image_tensor = transform(image)
    # add batch dimension
    image_tensor = image_tensor.unsqueeze(0)
    # forward pass through model
    outputs = model(image_tensor)
    # get the softmax probabilities over the class dimension
    probs = F.softmax(outputs, dim=1).data.squeeze()
    # get the class index of the top-1 probability
    class_idx = torch.topk(probs, 1)[1].int()

    # generate the class activation map for the top-1 prediction, using the
    # features captured by the hook during this forward pass (the latest entry)
    CAMs = returnCAM(features_blobs[-1], weight_softmax, class_idx)
    # file name to save the resulting CAM image with
    save_name = os.path.splitext(os.path.basename(image_path))[0]
    # show and save the results
    show_cam(CAMs, width, height, orig_image, class_idx, save_name)