<a href="https://colab.research.google.com/github/sudimuk2017/AI-for-people-Workshop/blob/master/04_representation_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Incremental learning on image classification

Today's practical session is an exploratory session designed to introduce you to some of the logging and visualisation techniques that ML researchers use to check that experiments are working well and to compare new ideas against baselines.

The code comes from the [GitHub repository](https://github.com/manuelemacchia/incremental-learning-image-classification) attached to the literary analysis paper of the same name, which presents ideas in representation and incremental learning as of CVPR 2017.

There will be very little coding today, so the emphasis will be on using a few tools to analyze whether new techniques are working and whether they are, in fact, performing better than the baseline.

The tools we will explore are:
- TensorBoard: this package was developed by Google for TensorFlow, though it works equally well with PyTorch. It lets users log training data and watch the logs change during training. It is already built into the original notebook and runs during training, so you can watch the loss and accuracy change as the model trains.
- [tSNE](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding): Similar to PCA, ICA, and autoencoders, this tool reduces the dimensionality of a set of inputs so that they can be visualised. Since the goal of incremental learning is to stop the model from forgetting what it has already learned, tSNE can be used to check that the representations of old classes remain similar after training on new ones. **<ins>You will need to find the three lines in each execution cell and comment them back in to enable tSNE,</ins>** but once you do, the full training round will create a PNG image of the 4 training splits, and we will ask you to analyze these plots for better understanding.
- Confusion matrices: These tools are incredibly valuable for showing classifier accuracy. The premise is to create a grid where the y dimension represents the ground truth and the x dimension represents the predicted classes. The diagonal cells count correct classifications, and every other cell counts misclassifications. Using this plot, we can visualise the effect of catastrophic forgetting. **<ins>This code is included as well, but you will once again need to find where it is called in each execution block and uncomment it.</ins>**
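
To make the grid layout concrete, here is a minimal sketch that builds a confusion matrix by hand with NumPy. The helper name `confusion_counts` and the toy labels are made up for illustration; the notebook itself uses `sklearn.metrics.confusion_matrix` and `plot.heatmap_cm`.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """Count predictions; rows are ground-truth classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (row, col) pair repeats.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Toy labels for a 3-class problem (illustrative only, not notebook output).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 0, 2]

cm = confusion_counts(y_true, y_pred, num_classes=3)
# Per-class recall: correct predictions (diagonal) divided by samples of that class (row sum).
per_class_recall = cm.diagonal() / cm.sum(axis=1)

print(cm)
print(per_class_recall)
```

Reading along a row shows where the samples of one true class ended up, so a row whose diagonal cell is empty means that class is never recognised.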

As mentioned, the coding for this task is quite straightforward, so we want you to approach it as if it were your own project and you are trying to make sure that the results of your experiment are correct.

That and, of course, **<ins>we have snuck a tiny bit of sabotage into the training of one of the methods. Use the tools provided to determine where this error is, and correct it using details from the other models.</ins>**

**<ins>Keep an eye out for the insight questions asked in certain sections of the notebook - discuss them with your neighbors and see if you can convince them and yourself of the answers using both qualitative and quantitative information from the notebook.</ins>**

## Libraries and packages


In [None]:
import os
import urllib
import logging
from importlib import reload
import time

import torch
import torchsummary
import torch.nn as nn
import torch.nn.init as init
import torch.optim as optim
from torch.utils.data import Dataset, Subset, DataLoader, ConcatDataset
from torch.backends import cudnn

import torchvision
from torchvision import transforms
from torchvision.models import resnet34

from PIL import Image
from copy import deepcopy

import numpy as np

import pandas as pd

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

%load_ext tensorboard

from torch.utils.tensorboard import SummaryWriter

In [None]:
# Download packages from repository
!git clone https://github.com/manuelemacchia/incremental-learning-image-classification.git
!mv -v incremental-learning-image-classification/* .
!rm -rf incremental-learning-image-classification README.md
!gdown https://drive.google.com/drive/folders/10LKjtXO6ep8R2vQ6IG8B8Rb6mt94JS4N?usp=sharing -O /content/code_edits --folder
!cp -f /content/code_edits/icarl.py /content/model/icarl.py
!cp -f /content/code_edits/lwf.py /content/model/lwf.py
!cp -f /content/code_edits/manager.py /content/model/manager.py
!cp -f /content/code_edits/plot.py /content/utils/plot.py
!cp -f /content/code_edits/tsne.py /content/tsne.py
!rm -rf /content/code_edits

In [None]:
# This code block imports the modules included in the git repo, which is why it has to come after the git extraction
import model
from data.cifar100 import Cifar100
from model.resnet_cifar import resnet32
from model.manager import Manager
from model.lwf import LWF
from model.icarl import Exemplars
from model.icarl import iCaRL
from utils import plot
from tsne import FeatureTSNE

In [None]:
ls

## Arguments

These are constants, so this cell will also not produce any output.
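
As a rough illustration of the step-down policy configured in the cell below (`MILESTONES` and `GAMMA`), the following sketch computes the learning rate used in each epoch in plain Python. It mirrors the closed form of a multi-step decay and assumes the scheduler is stepped once per epoch; the helper `step_down_lr` is hypothetical and not part of the repository.

```python
from bisect import bisect_right

def step_down_lr(base_lr, milestones, gamma, num_epochs):
    """Learning rate per epoch: multiply by gamma once for every milestone already reached."""
    return [base_lr * gamma ** bisect_right(milestones, epoch) for epoch in range(num_epochs)]

# Values copied from the Arguments cell.
schedule = step_down_lr(base_lr=0.1, milestones=[24, 31], gamma=0.2, num_epochs=35)

for epoch in (0, 23, 24, 30, 31, 34):
    print(epoch, round(schedule[epoch], 6))
```

With these settings the rate stays at its initial value for the first 24 epochs and is then cut by a factor of five twice, which is worth keeping in mind when reading the loss curves.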

In [None]:
# Directories
DATA_DIR = 'data'       # Directory where the dataset will be downloaded

# Settings
DEVICE = 'cuda'

# Dataset
#RANDOM_STATES = [658, 423, 422]      # For reproducibility of results
RANDOM_STATES = [658]                # Note: different random states give very different
                                     # splits and therefore very different results.

NUM_CLASSES = 100       # Total number of classes

VAL_SIZE = 0.1          # Proportion of validation set with respect to training set (between 0 and 1)

# Training
BATCH_SIZE = 64         # Batch size (iCaRL sets this to 128)
LR = 0.1                 # Initial learning rate
# Hmmmmm....?


MOMENTUM = 0.9          # Momentum for stochastic gradient descent (SGD)
WEIGHT_DECAY = 1e-5     # Weight decay from iCaRL

NUM_RUNS = 1            # Number of runs of every method
                        # Note: this should be at least 3 to have a fair benchmark

NUM_EPOCHS = 35         # Total number of training epochs
MILESTONES = [24, 31]   # Step down policy from iCaRL (MultiStepLR)
                        # Decrease the learning rate by gamma at each milestone
GAMMA = 0.2             # Gamma factor from iCaRL

NUM_SPLITS = 4          # Number of splits (maximum is 10)

## Fine tuning

### Data preparation

The next two blocks instantiate the training and testing splits needed for the experiments. Each training set contains ten different classes, while each test set includes every class from the current and previous splits.
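
The split structure described above can be sketched in plain Python. This is only an assumption about the general idea: the helper `make_class_batches` is hypothetical, and the repository's `Cifar100` class may shuffle and partition the classes differently.

```python
import random

def make_class_batches(num_classes=100, classes_per_split=10, seed=658):
    """Shuffle the class ids once, then cut them into equally sized batches."""
    class_ids = list(range(num_classes))
    random.Random(seed).shuffle(class_ids)
    return [class_ids[i:i + classes_per_split]
            for i in range(0, num_classes, classes_per_split)]

batches = make_class_batches()

for split_i in range(4):
    # Training only sees the classes of the current split.
    train_classes = batches[split_i]
    # Testing covers every class seen so far, including the current split.
    test_classes = [c for batch in batches[:split_i + 1] for c in batch]
    print(split_i, len(train_classes), len(test_classes))
```

The training set stays at ten classes per split while the test set grows by ten classes each time, which is why test accuracy is the number that exposes forgetting.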

In [None]:
# Define transformations for training
train_transform = transforms.Compose([transforms.RandomCrop(32, padding=4),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.ToTensor(), # Turn PIL Image to torch.Tensor
                                      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Define transformations for evaluation
test_transform = transforms.Compose([transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

In [None]:
train_dataloaders = [[] for i in range(NUM_RUNS)]
val_dataloaders = [[] for i in range(NUM_RUNS)]
test_dataloaders = [[] for i in range(NUM_RUNS)]

for run_i in range(NUM_RUNS):
    test_subsets = []
    random_state = RANDOM_STATES[run_i]

    for split_i in range(10):
        # Download dataset only at first instantiation
        if run_i+split_i == 0:
            download = True
        else:
            download = False

        # Create CIFAR100 dataset
        train_dataset = Cifar100(DATA_DIR, train=True, download=download, random_state=random_state, transform=train_transform)
        test_dataset = Cifar100(DATA_DIR, train=False, download=False, random_state=random_state, transform=test_transform)

        # Subspace of CIFAR100 of 10 classes
        train_dataset.set_classes_batch(train_dataset.batch_splits[split_i])
        test_dataset.set_classes_batch([test_dataset.batch_splits[i] for i in range(0, split_i+1)])

        # Define train and validation indices
        train_indices, val_indices = train_dataset.train_val_split(VAL_SIZE, random_state)

        train_dataloaders[run_i].append(DataLoader(Subset(train_dataset, train_indices),
                                                   batch_size=BATCH_SIZE, shuffle=True, num_workers=2, drop_last=True))

        val_dataloaders[run_i].append(DataLoader(Subset(train_dataset, val_indices),
                                                 batch_size=BATCH_SIZE, shuffle=True, num_workers=2, drop_last=True))

        # Dataset with all seen class
        test_dataloaders[run_i].append(DataLoader(test_dataset,
                                                  batch_size=BATCH_SIZE, shuffle=True, num_workers=2))


In [None]:
# Sanity check: visualize a batch of images
dataiter = test_dataloaders[0][0] # The first index controls which run (these vary by random seeds). The second index controls the split, play with this to see how we add classes per split.
for images, labels in dataiter:
    plot.image_grid(images, one_channel=False)
    print(set(labels.tolist()))
    break

### Execution

To log the output, the box below will constantly print the epochs and their stats above the TensorBoard subwindow. Click the gear icon in the top right and check the "Reload data" box. As soon as training starts, the box below will show the training progress. If it doesn't show the training progress, click the refresh button in the top right.
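
For readers who have not logged to TensorBoard before, here is a minimal sketch of writing scalars with `SummaryWriter`. The tag `demo/loss`, the temporary log directory, and the fake loss values are placeholders; the tags that the notebook's `Manager` writes may be named differently.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Write to a throwaway directory so the demo does not mix with the real runs.
log_dir = os.path.join(tempfile.mkdtemp(), "runs", "demo")
writer = SummaryWriter(log_dir)

for step in range(10):
    fake_loss = 1.0 / (step + 1)  # stand-in for a real training loss
    writer.add_scalar("demo/loss", fake_loss, global_step=step)

writer.close()  # flush pending events to disk
print(os.listdir(log_dir))
```

Pointing `%tensorboard --logdir` at the parent directory of such a folder shows one curve per run directory, which is how the per-split runs in this notebook end up as separate curves.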

In [None]:
%tensorboard --logdir=runs --reload_interval=30

This block actually trains and tests the model. The progress will be shown above. This loop is quite important.

**TASK**: Try to understand what is happening and find the lines that are needed to output the tSNE and confusion matrix plots.

In [None]:
logs = [[] for _ in range(NUM_RUNS)]
for run_i in range(NUM_RUNS):
    net = resnet32()

    criterion = nn.BCEWithLogitsLoss()
    featureTSNE_ft = FeatureTSNE('Fine_Tuning',DEVICE)
    for split_i in range(NUM_SPLITS):
        writer = SummaryWriter(f'runs/Fine_Tuning_{split_i}')
        print(f"## Split {split_i} of run {run_i} ##")
        start = time.time()
        parameters_to_optimize = net.parameters()
        optimizer = optim.SGD(parameters_to_optimize, lr=LR, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)
        scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=MILESTONES, gamma=GAMMA)

        manager = Manager(DEVICE, net, criterion, optimizer, scheduler,
                          train_dataloaders[run_i][split_i],
                          val_dataloaders[run_i][split_i],
                          test_dataloaders[run_i][split_i], writer)

        scores = manager.train(NUM_EPOCHS)  # train the model

        logs[run_i].append({})

        # score[i] = dictionary with key:epoch, value: score
        logs[run_i][split_i]['train_loss'] = scores[0]
        logs[run_i][split_i]['train_accuracy'] = scores[1]
        logs[run_i][split_i]['val_loss'] = scores[2]
        logs[run_i][split_i]['val_accuracy'] = scores[3]

        # Test the model on classes seen until now
        test_accuracy, all_targets, all_preds = manager.test()

        logs[run_i][split_i]['test_accuracy'] = test_accuracy
        logs[run_i][split_i]['conf_mat'] = confusion_matrix(all_targets.to('cpu'), all_preds.to('cpu'))

        featureTSNE_ft.get_feature_reps(manager.test_dataloader,manager.net,split_i)

        print(f'Confusion Matrix: Split {split_i}')
        plot.heatmap_cm(all_targets.cpu(), all_preds.cpu()) #the confusion matrix

        # Add 10 nodes to last FC layer
        manager.increment_classes(n=10)
        print('Total Split Duration: ', time.time() - start)
    featureTSNE_ft.plot_tsne_grids() ## For all tSNE plots

**INSIGHT:** The training and validation accuracy seem to go up after each split. Why is the model getting better even though it's seeing new classes?

**INSIGHT:** How do the confusion matrices after split 0 prove that there is catastrophic forgetting?

**INSIGHT:** What do the tSNE plots for Fine Tuning imply about the feature representations learned by the model? (Use the colours of the points to find where old class representations end up in newer splits)
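
If you want to see what a tSNE projection does independently of the notebook's `FeatureTSNE` helper, the sketch below embeds synthetic 64-dimensional "features" into 2D with scikit-learn. The data, the cluster layout, and the perplexity value are arbitrary assumptions chosen for the demo; they are not taken from the model in this notebook.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Three well-separated synthetic "classes" in a 64-dimensional feature space.
centers = rng.normal(scale=10.0, size=(3, 64))
features = np.concatenate([c + rng.normal(size=(40, 64)) for c in centers])
labels = np.repeat(np.arange(3), 40)  # class id per sample, useful for colouring a scatter plot

# Project to 2D; perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=15, init="pca", random_state=0).fit_transform(features)

print(embedding.shape)  # (120, 2)
```

Plotting `embedding[:, 0]` against `embedding[:, 1]` with `c=labels` gives a picture comparable in spirit to the grids produced during training, where each colour is one class.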

## Plot

In [None]:
# This block simply collates and prepares the training and validation logs (separate from the TensorBoard ones) for plotting.
train_loss = [[logs[run_i][i]['train_loss'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
train_accuracy = [[logs[run_i][i]['train_accuracy'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
val_loss = [[logs[run_i][i]['val_loss'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
val_accuracy = [[logs[run_i][i]['val_accuracy'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
test_accuracy = [[logs[run_i][i]['test_accuracy'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]

train_loss = np.array(train_loss)
train_accuracy = np.array(train_accuracy)
val_loss = np.array(val_loss)
val_accuracy = np.array(val_accuracy)
test_accuracy = np.array(test_accuracy)

train_loss_stats = np.array([train_loss.mean(0), train_loss.std(0)]).transpose()
train_accuracy_stats = np.array([train_accuracy.mean(0), train_accuracy.std(0)]).transpose()
val_loss_stats = np.array([val_loss.mean(0), val_loss.std(0)]).transpose()
val_accuracy_stats = np.array([val_accuracy.mean(0), val_accuracy.std(0)]).transpose()
test_accuracy_stats = np.array([test_accuracy.mean(0), test_accuracy.std(0)]).transpose()
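
The mean/std collation above can be checked on a small array. The numbers below are made up purely for illustration (3 runs by 4 splits) and are not results from this notebook.

```python
import numpy as np

# Made-up test accuracies: one row per run, one column per split.
test_accuracy = np.array([
    [0.80, 0.70, 0.60, 0.50],
    [0.82, 0.72, 0.62, 0.52],
    [0.78, 0.68, 0.58, 0.48],
])

# Same pattern as the cell above: statistics over runs (axis 0), then one row per split.
stats = np.array([test_accuracy.mean(0), test_accuracy.std(0)]).transpose()
print(stats.shape)  # (4, 2)

# With a single run the standard deviation is zero for every split.
single_run = test_accuracy[:1]
print(single_run.std(0))
```

This is why the Arguments cell notes that `NUM_RUNS` should be at least 3 for a fair benchmark: with `NUM_RUNS = 1` the plotted error bars carry no information.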

And these next two cells plot the train/val and test stats.

In [None]:
plot.train_val_scores(train_loss_stats, train_accuracy_stats, val_loss_stats, val_accuracy_stats)

In [None]:
plot.test_scores(test_accuracy_stats)

## Learning Without Forgetting

The code that follows uses the same structure as the Fine Tuning section. Data prep will produce no output. Make sure the necessary lines are included in the execution block, watch the training carefully, and plot all the stats, confusion matrices, and tSNE plots.

### Data preparation

In [None]:
# Transformations for Learning Without Forgetting
train_transform = transforms.Compose([transforms.RandomCrop(32, padding=4),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.ToTensor(), # Turn PIL Image to torch.Tensor
                                      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

test_transform = transforms.Compose([transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

In [None]:
train_dataloaders = [[] for i in range(NUM_RUNS)]
val_dataloaders = [[] for i in range(NUM_RUNS)]
test_dataloaders = [[] for i in range(NUM_RUNS)]

for run_i in range(NUM_RUNS):
    test_subsets = []
    random_state = RANDOM_STATES[run_i]

    for split_i in range(NUM_SPLITS):
        # Download dataset only at first instantiation
        if run_i+split_i == 0:
            download = True
        else:
            download = False

        # Create CIFAR100 dataset
        train_dataset = Cifar100(DATA_DIR, train=True, download=download, random_state=random_state, transform=train_transform)
        test_dataset = Cifar100(DATA_DIR, train=False, download=False, random_state=random_state, transform=test_transform)

        # Subspace of CIFAR100 of 10 classes
        train_dataset.set_classes_batch(train_dataset.batch_splits[split_i])
        test_dataset.set_classes_batch([test_dataset.batch_splits[i] for i in range(0, split_i+1)])

        # Define train and validation indices
        train_indices, val_indices = train_dataset.train_val_split(VAL_SIZE, random_state)

        train_dataloaders[run_i].append(DataLoader(Subset(train_dataset, train_indices),
                                                   batch_size=BATCH_SIZE, shuffle=True, num_workers=4, drop_last=True))

        val_dataloaders[run_i].append(DataLoader(Subset(train_dataset, val_indices),
                                                 batch_size=BATCH_SIZE, shuffle=True, num_workers=4, drop_last=True))

        # Dataset with all seen class
        test_dataloaders[run_i].append(DataLoader(test_dataset,
                                                  batch_size=BATCH_SIZE, shuffle=True, num_workers=4))

In [None]:
# Sanity check: visualize a batch of images
dataiter = test_dataloaders[0][0] # The first index controls which run (these vary by random seeds). The second index controls the split, play with this to see how we add classes per split.
for images, labels in dataiter:
    plot.image_grid(images, one_channel=False)
    print(set(labels.tolist()))
    break

### Execution

In [None]:
# Arguments for Learning without Forgetting
BATCH_SIZE = 128
LR = 2
%tensorboard --logdir=runs

In [None]:
logs = [[] for _ in range(NUM_RUNS)]
# Iterate over runs
for run_i in range(NUM_RUNS):
    net = resnet32()

    criterion = nn.BCEWithLogitsLoss()
    #featureTSNE_lwf = FeatureTSNE('LWF',DEVICE)

    for split_i in range(NUM_SPLITS):
        print(f"## Split {split_i} of run {run_i} ##")
        writer = SummaryWriter(f'runs/LWF_{split_i}')
        start = time.time()
        # Redefine optimizer at each split (pass by reference issue)
        parameters_to_optimize = net.parameters()
        optimizer = optim.SGD(parameters_to_optimize, lr=LR,
                                momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)
        scheduler = optim.lr_scheduler.MultiStepLR(optimizer,
                                                    milestones=MILESTONES, gamma=GAMMA)

        num_classes = 10*(split_i+1)

        if num_classes == 10: # old network == None
            lwf = LWF(DEVICE, net, None, criterion, optimizer, scheduler,
                            train_dataloaders[run_i][split_i],
                            val_dataloaders[run_i][split_i],
                            test_dataloaders[run_i][split_i],
                            num_classes,writer)
        else:
            lwf = LWF(DEVICE, net, old_net, criterion, optimizer, scheduler,
                            train_dataloaders[run_i][split_i],
                            val_dataloaders[run_i][split_i],
                            test_dataloaders[run_i][split_i],
                            num_classes,writer)

        scores = lwf.train(NUM_EPOCHS)  # train the model

        logs[run_i].append({})

        # score[i] = dictionary with key:epoch, value: score
        logs[run_i][split_i]['train_loss'] = scores[0]
        logs[run_i][split_i]['train_accuracy'] = scores[1]
        logs[run_i][split_i]['val_loss'] = scores[2]
        logs[run_i][split_i]['val_accuracy'] = scores[3]

        # Test the model on classes seen until now
        test_accuracy, all_targets, all_preds = lwf.test()
        #featureTSNE_lwf.get_feature_reps(lwf.test_dataloader,lwf.net,split_i)

        logs[run_i][split_i]['test_accuracy'] = test_accuracy
        logs[run_i][split_i]['conf_mat'] = confusion_matrix(all_targets.to('cpu'), all_preds.to('cpu'))

        print(f'Confusion Matrix: Split {split_i}')
        plot.heatmap_cm(all_targets.cpu(), all_preds.cpu())

        old_net = deepcopy(lwf.net)

        lwf.increment_classes()
        print('Total Split Duration: ', time.time() - start)
    #featureTSNE_lwf.plot_tsne_grids()

**INSIGHT:** The confusion matrices for LWF look much better, but there is still a problem. What happens as we see more and more classes?

**INSIGHT:** The tSNE plots for LWF change from split to split in a slightly different way than the Fine Tuning plots do. What does this imply about the feature representations? (Bear in mind that feature representations for similar classes would be similar if we used all the data.)

### Plots

In [None]:
train_loss = [[logs[run_i][i]['train_loss'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
train_accuracy = [[logs[run_i][i]['train_accuracy'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
val_loss = [[logs[run_i][i]['val_loss'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
val_accuracy = [[logs[run_i][i]['val_accuracy'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
test_accuracy = [[logs[run_i][i]['test_accuracy'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]

train_loss = np.array(train_loss)
train_accuracy = np.array(train_accuracy)
val_loss = np.array(val_loss)
val_accuracy = np.array(val_accuracy)
test_accuracy = np.array(test_accuracy)

train_loss_stats = np.array([train_loss.mean(0), train_loss.std(0)]).transpose()
train_accuracy_stats = np.array([train_accuracy.mean(0), train_accuracy.std(0)]).transpose()
val_loss_stats = np.array([val_loss.mean(0), val_loss.std(0)]).transpose()
val_accuracy_stats = np.array([val_accuracy.mean(0), val_accuracy.std(0)]).transpose()
test_accuracy_stats = np.array([test_accuracy.mean(0), test_accuracy.std(0)]).transpose()

In [None]:
plot.train_val_scores(train_loss_stats, train_accuracy_stats, val_loss_stats, val_accuracy_stats)

In [None]:
plot.test_scores(test_accuracy_stats)

## iCaRL

The code that follows uses the same structure as the Fine Tuning and LWF sections. Data prep will produce no output. Make sure the necessary lines are included in the execution block, watch the training carefully, and plot all the stats, confusion matrices, and tSNE plots.

### Data preparation

In [None]:
# Transformations for iCaRL
train_transform = transforms.Compose([transforms.RandomCrop(32, padding=4),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.ToTensor(), # Turn PIL Image to torch.Tensor
                                      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

test_transform = transforms.Compose([transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

In [None]:
train_subsets = [[] for i in range(NUM_RUNS)]
val_subsets = [[] for i in range(NUM_RUNS)]
test_subsets = [[] for i in range(NUM_RUNS)]

for run_i in range(NUM_RUNS):
    for split_i in range(10):
        if run_i+split_i == 0: # Download dataset only at first instantiation
            download = True
        else:
            download = False

        # Create CIFAR100 dataset
        train_dataset = Cifar100(DATA_DIR, train=True, download=download, random_state=RANDOM_STATES[run_i], transform=train_transform)
        test_dataset = Cifar100(DATA_DIR, train=False, download=False, random_state=RANDOM_STATES[run_i], transform=test_transform)

        # Subspace of CIFAR100 of 10 classes
        train_dataset.set_classes_batch(train_dataset.batch_splits[split_i])
        test_dataset.set_classes_batch([test_dataset.batch_splits[i] for i in range(0, split_i+1)])

        # Define train and validation indices
        train_indices, val_indices = train_dataset.train_val_split(VAL_SIZE, RANDOM_STATES[run_i])

        # Define subsets
        train_subsets[run_i].append(Subset(train_dataset, train_indices))
        val_subsets[run_i].append(Subset(train_dataset, val_indices))
        test_subsets[run_i].append(test_dataset)

### Execution

In [None]:
# iCaRL hyperparameters
LR = 2
MOMENTUM = 0.9
WEIGHT_DECAY = 0.00001
MILESTONES = [24, 31]
GAMMA = 0.2
NUM_EPOCHS = 35
BATCH_SIZE = 64
%tensorboard --logdir=runs

In [None]:
# Define what tests to run
TEST_ICARL = True # Run test with iCaRL (exemplars + train dataset)
TEST_HYBRID1 = False # Run test with hybrid1

# Initialize logs
logs_icarl = [[] for _ in range(NUM_RUNS)]
logs_hybrid1 = [[] for _ in range(NUM_RUNS)]
for run_i in range(NUM_RUNS):
    net = resnet32()
    icarl = iCaRL(DEVICE, net, LR, MOMENTUM, WEIGHT_DECAY, MILESTONES, GAMMA, NUM_EPOCHS, BATCH_SIZE, train_transform, test_transform)
    featureTSNE_ic = FeatureTSNE('iCARL',DEVICE)
    for split_i in range(NUM_SPLITS):
        print(f"## Split {split_i} of run {run_i} ##")
        writer = SummaryWriter(f'runs/iCaRL_{split_i}')
        start = time.time()
        icarl.writer = writer
        icarl.train_step = 0
        train_logs = icarl.incremental_train(split_i, train_subsets[run_i][split_i], val_subsets[run_i][split_i])

        if TEST_ICARL:
            logs_icarl[run_i].append({})

            acc, all_targets, all_preds = icarl.test(test_subsets[run_i][split_i], train_subsets[run_i][split_i])

            print(f'Confusion Matrix: Split {split_i}')
            plot.heatmap_cm(all_targets.cpu(), all_preds.cpu())

            featureTSNE_ic.get_feature_reps(icarl.test_dataloader,icarl.net,split_i)

            logs_icarl[run_i][split_i]['accuracy'] = acc
            logs_icarl[run_i][split_i]['conf_mat'] = confusion_matrix(all_targets.to('cpu'), all_preds.to('cpu'))

            logs_icarl[run_i][split_i]['train_loss'] = train_logs[0]
            logs_icarl[run_i][split_i]['train_accuracy'] = train_logs[1]
            logs_icarl[run_i][split_i]['val_loss'] = train_logs[2]
            logs_icarl[run_i][split_i]['val_accuracy'] = train_logs[3]

        if TEST_HYBRID1:
            logs_hybrid1[run_i].append({})

            acc, all_targets, all_preds = icarl.test_without_classifier(test_subsets[run_i][split_i])

            logs_hybrid1[run_i][split_i]['accuracy'] = acc
            logs_hybrid1[run_i][split_i]['conf_mat'] = confusion_matrix(all_targets.to('cpu'), all_preds.to('cpu'))
        print('Total Split Duration: ', time.time() - start)
    featureTSNE_ic.plot_tsne_grids()

**INSIGHT:** iCaRL produces very high quality confusion matrices, but what could be done to improve them even more? (this is not a trick question, just remember that we are working with very limited resources)

**INSIGHT:** The tSNE plots also show a lot of promise for iCaRL. Looking particularly at the plot for split 3, what characteristic of the plot shows us the feature representations are high quality?

**INSIGHT:** The training processes for iCaRL and LWF have an additional loss term called distillation loss. How is this additional loss term reflected in the training and validation curves?
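
As background for this question, here is a minimal sketch of one common way to combine classification and distillation in a single binary cross-entropy loss: the targets for old classes are the previous network's sigmoid outputs, and the targets for new classes are one-hot labels. This is an assumption about the general technique; the helper `build_targets` is hypothetical, and the repository's `LWF` and `iCaRL` classes may implement the loss differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_targets(labels, old_logits, num_classes):
    """One-hot targets over all classes; old-class columns are replaced by the
    old network's sigmoid outputs (the distillation part)."""
    targets = F.one_hot(labels, num_classes).float()
    if old_logits is not None:
        num_old = old_logits.shape[1]
        targets[:, :num_old] = torch.sigmoid(old_logits)
    return targets

torch.manual_seed(0)
num_old, num_classes, batch = 10, 20, 4

labels = torch.randint(num_old, num_classes, (batch,))  # samples of new classes only
new_logits = torch.randn(batch, num_classes)            # stand-in for the current network's output
old_logits = torch.randn(batch, num_old)                # stand-in for the frozen old network's output

criterion = nn.BCEWithLogitsLoss()
targets = build_targets(labels, old_logits, num_classes)
loss = criterion(new_logits, targets)
print(loss.item())
```

Because the old-class targets are soft values rather than zeros, this loss generally cannot reach zero even when every new-class prediction is correct, which is one thing to look for in the curves.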

### Plots

In [None]:
train_loss = [[logs_icarl[run_i][i]['train_loss'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
train_accuracy = [[logs_icarl[run_i][i]['train_accuracy'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
val_loss = [[logs_icarl[run_i][i]['val_loss'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
val_accuracy = [[logs_icarl[run_i][i]['val_accuracy'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]
test_accuracy = [[logs_icarl[run_i][i]['accuracy'] for i in range(NUM_SPLITS)] for run_i in range(NUM_RUNS)]

train_loss = np.array(train_loss)
train_accuracy = np.array(train_accuracy)
val_loss = np.array(val_loss)
val_accuracy = np.array(val_accuracy)
test_accuracy = np.array(test_accuracy)

train_loss_stats = np.array([train_loss.mean(0), train_loss.std(0)]).transpose()
train_accuracy_stats = np.array([train_accuracy.mean(0), train_accuracy.std(0)]).transpose()
val_loss_stats = np.array([val_loss.mean(0), val_loss.std(0)]).transpose()
val_accuracy_stats = np.array([val_accuracy.mean(0), val_accuracy.std(0)]).transpose()
test_accuracy_stats = np.array([test_accuracy.mean(0), test_accuracy.std(0)]).transpose()

In [None]:
plot.train_val_scores(train_loss_stats, train_accuracy_stats, val_loss_stats, val_accuracy_stats)

In [None]:
plot.test_scores(test_accuracy_stats)