***
## Learning without Forgetting - LwF

Experiment reproducing the **Learning without Forgetting** (LwF) method, a hybrid of knowledge distillation and fine-tuning. Fine-tuning means re-training an already trained model M, with a low learning rate, on a new and more specific dataset D<sub>new</sub> that differs from the dataset D<sub>old</sub> on which M was originally trained.
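One common way to write the combined objective, for a batch (x, y) drawn from D<sub>new</sub> only, is shown below. Here f<sub>θ</sub> is the model being trained, f<sub>θ<sub>old</sub></sub> is a frozen copy of M, σ is the softmax, T is the distillation temperature (`lwf_temperature`) and α is the penalty weight (`lwf_alpha`). This KL form is, as far as we know, the one used by Avalanche's LwF; the original paper instead uses a cross-entropy between the softened outputs (with T = 2), which differs from the KL term only by the entropy of the old model's outputs, a constant with respect to θ.

$$
\mathcal{L}(\theta) =
\underbrace{\mathrm{CE}\big(y,\ \sigma(f_\theta(x))\big)}_{\text{fine-tuning on } D_{\text{new}}}
+ \alpha \,
\underbrace{\mathrm{KL}\Big(\sigma\big(f_{\theta_{\text{old}}}(x)/T\big)\ \Big\|\ \sigma\big(f_\theta(x)/T\big)\Big)}_{\text{distillation toward the old model}},
\qquad (x, y) \sim D_{\text{new}}
$$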

Unlike many other continual learning techniques, LwF uses only the new data: it assumes that the past data used to pre-train the network is no longer available.  
In this sense it is also a *transfer learning technique*.
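The fact that only D<sub>new</sub> is needed can be made concrete with a small PyTorch sketch: the old model's soft targets are computed on the *new* inputs, so no old sample is ever stored. This is a minimal illustration under our own assumptions (a toy `nn.Linear` model, random data, the KL form of the distillation term, and a hypothetical `lwf_loss` helper), not Avalanche's implementation.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


def lwf_loss(logits, targets, old_logits, alpha, temperature):
    """Hypothetical helper: CE on the new labels + alpha * KL toward the old model."""
    ce = F.cross_entropy(logits, targets)
    if old_logits is None:  # first experience: there is no old model to distill from
        return ce
    log_p = F.log_softmax(logits / temperature, dim=1)  # current model, log-probs
    q = F.softmax(old_logits / temperature, dim=1)      # frozen old model, probs
    kd = F.kl_div(log_p, q, reduction="batchmean")      # KL(q || p), batch average
    return ce + alpha * kd


torch.manual_seed(0)
model = nn.Linear(4, 3)                   # toy stand-in for the MLP
old_model = copy.deepcopy(model).eval()   # snapshot taken after the previous experience
for p in old_model.parameters():
    p.requires_grad_(False)               # the old model is never updated

x_new = torch.randn(8, 4)                 # a batch from D_new only
y_new = torch.randint(0, 3, (8,))
with torch.no_grad():
    old_logits = old_model(x_new)         # soft targets computed on the *new* inputs

loss = lwf_loss(model(x_new), y_new, old_logits, alpha=1.0, temperature=1.0)
loss.backward()                           # gradients flow only into `model`
```

Right after the snapshot the two models coincide, so the distillation term is zero; it starts to matter once `model` drifts away from `old_model` while training on D<sub>new</sub>.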

`References:`
- Learning without Forgetting: https://arxiv.org/abs/1606.09282
- Three scenarios for continual learning: https://arxiv.org/abs/1904.07734
***

### Arguments to customize training
```python
import argparse

parser = argparse.ArgumentParser()

parser.add_argument('--lwf_alpha', nargs='+', type=float,
                    default=[0, 0.5, 1.333, 2.25, 3.2],
                    help='Penalty hyperparameter for LwF. It can be either '
                         'a list with multiple elements (one alpha per '
                         'experience) or a list of one element (same alpha '
                         'for all experiences).')

parser.add_argument('--softmax_temperature', type=float, default=1,
                    help='Temperature for softmax used in distillation.')

parser.add_argument('--lr', type=float, default=1e-3, help='Learning rate.')

parser.add_argument('--hs', type=int, default=256, help='MLP hidden size.')

parser.add_argument('--epochs', type=int, default=10,
                    help='Number of training epochs.')

parser.add_argument('--minibatch_size', type=int, default=128,
                    help='Minibatch size.')

parser.add_argument('--cuda', type=int, default=0,
                    help='Specify GPU id to use. Use CPU if -1.')

# In a Jupyter notebook, use parser.parse_args([]) to keep the defaults,
# since sys.argv there holds the kernel's own arguments.
args = parser.parse_args()
```
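The `--lwf_alpha` help text allows two input shapes: one alpha per experience, or a single alpha shared by all experiences. A small helper can map either shape to the alpha for a given experience. `alpha_for_experience` is a hypothetical name used only for illustration; as far as we know, Avalanche's `LwF` strategy itself accepts `alpha` as either a float or a per-experience list, so in practice the list could also be passed straight through.

```python
import argparse


def alpha_for_experience(alphas, exp_id):
    """Hypothetical helper: return the LwF penalty for experience `exp_id`.

    A one-element list means "same alpha for every experience";
    otherwise the list holds one alpha per experience.
    """
    if len(alphas) == 1:
        return alphas[0]
    if exp_id >= len(alphas):
        raise ValueError(f"no alpha given for experience {exp_id}")
    return alphas[exp_id]


parser = argparse.ArgumentParser()
parser.add_argument('--lwf_alpha', nargs='+', type=float,
                    default=[0, 0.5, 1.333, 2.25, 3.2])

# Parse explicit argument lists instead of sys.argv so the example is reproducible.
per_exp = parser.parse_args([]).lwf_alpha                    # default: one alpha per experience
shared = parser.parse_args(['--lwf_alpha', '1.0']).lwf_alpha  # one alpha for all experiences

print([alpha_for_experience(per_exp, i) for i in range(5)])  # [0, 0.5, 1.333, 2.25, 3.2]
print([alpha_for_experience(shared, i) for i in range(5)])   # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Raising an error for a missing entry, rather than silently reusing the last alpha, makes a list that is shorter than the stream fail loudly.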
***

```python
# Testing framework and test runner
import unittest

# Avalanche library
import avalanche
from avalanche.evaluation import metrics as metrics

import torch
from torch.nn import CrossEntropyLoss
from torch.optim import SGD

# NN model (must be a torch.nn.Module)
from models import MLP

from utils import create_default_args, get_average_metric, get_target_result


class LwF(unittest.TestCase):  # TestCase class

    ## ------- Split MNIST benchmark ------- ##
    def test_smnist(self, override_args=None):

        # --- Strategy instantiation --- #

        # -> ARGUMENTS: defaults below; override_args allows customizing training
        args = create_default_args({'cuda': 0,              # GPU id (-1 for CPU)
                                    'lwf_alpha': 1,         # Penalty hyperparameter for LwF
                                    'lwf_temperature': 1,   # Temperature for softmax used in distillation
                                    'epochs': 10,           # Training epochs
                                    'layers': 1,            # Number of MLP hidden layers
                                    'hidden_size': 256,     # MLP hidden size
                                    'learning_rate': 0.001, # Learning rate
                                    'train_mb_size': 128},  # Minibatch size
                                   override_args)

        # Use the requested GPU if CUDA is available and args.cuda >= 0,
        # otherwise fall back to the CPU.
        device = torch.device(f"cuda:{args.cuda}"
                              if torch.cuda.is_available() and args.cuda >= 0
                              else "cpu")

        # -> BENCHMARK
        # Split MNIST: a stream of 5 experiences with 2 digit classes each,
        # without task labels (every experience is Task 0)
        benchmark = avalanche.benchmarks.SplitMNIST(5, return_task_id=False)

        # -> NN MODEL and LOSS FUNCTION
        model = MLP(hidden_size=args.hidden_size, hidden_layers=args.layers)
        criterion = CrossEntropyLoss()

        # Avalanche logging module: displays a progress bar during training and evaluation
        interactive_logger = avalanche.logging.InteractiveLogger()

        # -> EVALUATION PLUGIN
        # Track accuracy at epoch, experience and stream level
        evaluation_plugin = avalanche.training.plugins.EvaluationPlugin(
            metrics.accuracy_metrics(epoch=True, experience=True, stream=True),
            loggers=[interactive_logger], benchmark=benchmark)

        # -> CONTINUAL LEARNING STRATEGY: LwF
        cl_strategy = avalanche.training.LwF(model,
                                             SGD(model.parameters(), lr=args.learning_rate),  # optimizer
                                             criterion,
                                             # LwF-specific arguments
                                             alpha=args.lwf_alpha,
                                             temperature=args.lwf_temperature,
                                             # training arguments
                                             train_mb_size=args.train_mb_size,
                                             train_epochs=args.epochs,
                                             device=device,
                                             # evaluation
                                             evaluator=evaluation_plugin)

        # --- Training loop --- #
        res = None
        for experience in benchmark.train_stream:
            # train on the current experience only
            cl_strategy.train(experience)
            # evaluate on the whole test stream (returns a dict with all metric values)
            res = cl_strategy.eval(benchmark.test_stream)

        # `res` now holds the metrics from the last evaluation
        avg_stream_acc = get_average_metric(res)
        print(f"LwF-SMNIST Average Stream Accuracy: {avg_stream_acc:.2f}")

        # Compare the metric with the target (reference) value
        target_acc = float(get_target_result('lwf', 'smnist'))
        if args.check:
            # passes only if |target_acc - avg_stream_acc| <= 0.02 (possibly too strict)
            self.assertAlmostEqual(target_acc, avg_stream_acc, delta=0.02)
```
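`get_average_metric` comes from the `utils` module, which is not shown here. Judging by its name and the printed label, it reduces the result dict returned by `cl_strategy.eval` to a single stream accuracy. A minimal sketch of such a reduction, assuming it averages every entry whose key starts with `Top1_Acc_Stream` (the actual helper may differ), could be:

```python
def average_stream_metric(metric_dict, metric_name="Top1_Acc_Stream"):
    """Hypothetical stand-in for get_average_metric: average all matching entries."""
    values = [v for k, v in metric_dict.items() if k.startswith(metric_name)]
    if not values:
        raise KeyError(f"no metric starting with {metric_name!r}")
    return sum(values) / len(values)


# Made-up values, keyed in the style of the metric names printed in the log.
res = {
    "Top1_Acc_Stream/eval_phase/test_stream/Task000": 0.40,
    "Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp000": 0.90,
    "Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp001": 0.10,
}
print(f"Average Stream Accuracy: {average_stream_metric(res):.2f}")  # Average Stream Accuracy: 0.40
```

Filtering by key prefix keeps per-experience entries (`Top1_Acc_Exp/...`) out of the stream-level average.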

### Run and Evaluate the experiment
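- Create an instance of the `LwF` test case defined above
- Call `test_smnist()`, which builds the LwF strategy, trains it on Split MNIST, evaluates it on the test stream after each experience, and compares the average stream accuracy with the reference value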

```python
# Create the test case (it builds the LwF strategy inside test_smnist)
s = LwF()

# Run the experiment with the default arguments
s.test_smnist()
```
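`test_smnist` also takes an `override_args` dict, e.g. `s.test_smnist(override_args={'epochs': 2, 'cuda': -1})`. Since `utils` is not shown, the sketch below assumes `create_default_args` behaves like a plain dictionary merge in which overrides win; `merge_args` is a hypothetical stand-in, not the actual helper.

```python
from types import SimpleNamespace


def merge_args(defaults, overrides=None):
    """Hypothetical helper: the defaults, with any keys in `overrides` replacing them."""
    merged = dict(defaults)         # copy, so the defaults dict is left untouched
    merged.update(overrides or {})  # overrides take precedence
    return SimpleNamespace(**merged)


defaults = {'cuda': 0, 'lwf_alpha': 1, 'epochs': 10, 'learning_rate': 0.001}
args = merge_args(defaults, {'epochs': 2, 'cuda': -1})
print(args.epochs, args.cuda, args.lwf_alpha)  # 2 -1 1
```

With `cuda` set to -1, the device selection in `test_smnist` falls back to the CPU. Note that a plain merge like this silently accepts misspelled keys, so a typo in `override_args` would leave the intended default in place.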

```
-- >> Start of training phase << --
-- Starting training on experience 0 (Task 0) from train stream --
100%|███████████████████████████████████████████| 92/92 [00:02<00:00, 31.89it/s]
Epoch 0 ended.
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.8388
100%|███████████████████████████████████████████| 92/92 [00:02<00:00, 32.56it/s]
Epoch 1 ended.
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.9663
100%|███████████████████████████████████████████| 92/92 [00:02<00:00, 32.54it/s]
Epoch 2 ended.
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.9719
100%|███████████████████████████████████████████| 92/92 [00:02<00:00, 32.58it/s]
Epoch 3 ended.
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.9750
100%|███████████████████████████████████████████| 92/92 [00:02<00:00, 32.32it/s]
Epoch 4 ended.
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.9768
100%|███████████████████████████████████████████| 92/92 [00:02<00:00, 32.49it/s]
Epoch 5 ended.
	Top1_Acc_Epoch/train_phase/tra

100%|███████████████████████████████████████████| 16/16 [00:00<00:00, 34.93it/s]
> Eval on experience 1 (Task 0) from test stream ended.
	Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp001 = 0.6655
-- Starting eval on experience 2 (Task 0) from test stream --
100%|███████████████████████████████████████████| 15/15 [00:00<00:00, 34.12it/s]
> Eval on experience 2 (Task 0) from test stream ended.
	Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp002 = 0.9495
-- Starting eval on experience 3 (Task 0) from test stream --
100%|███████████████████████████████████████████| 17/17 [00:00<00:00, 34.79it/s]
> Eval on experience 3 (Task 0) from test stream ended.
	Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp003 = 0.0000
-- Starting eval on experience 4 (Task 0) from test stream --
100%|███████████████████████████████████████████| 16/16 [00:00<00:00, 35.24it/s]
> Eval on experience 4 (Task 0) from test stream ended.
	Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp004 = 0.0000
-- >> End of eval phas

AssertionError: 0.31 != 0.3879 within 0.02 delta (0.07790000000000002 difference)
```