***
## Training

The `training` module in Avalanche includes two main components:

- Strategies: popular baselines already implemented for you which you can use for comparisons or as base classes to define a custom strategy.

- Plugins: classes that let you add specific behaviour to your own strategy.

The plugin system lets you define reusable components that can be easily combined (e.g. a replay strategy and a regularization strategy). Plugins are also used to automatically manage logging and evaluation.
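
To make the idea concrete, here is a minimal, framework-free sketch of how a strategy could dispatch hook calls to a list of plugins. All names in it (`ToyStrategy`, `ToyPlugin`, `before_experience`, `after_experience`) are hypothetical placeholders chosen for illustration; it is not a description of how Avalanche implements its plugins.

```python
class ToyPlugin:
    """Base class: every hook is a no-op unless a subclass overrides it."""

    def before_experience(self, strategy, experience):
        pass

    def after_experience(self, strategy, experience):
        pass


class CountingPlugin(ToyPlugin):
    """Counts how many experiences the strategy has trained on."""

    def __init__(self):
        self.seen = 0

    def after_experience(self, strategy, experience):
        self.seen += 1


class HistoryPlugin(ToyPlugin):
    """Records the order in which experiences were presented."""

    def __init__(self):
        self.history = []

    def before_experience(self, strategy, experience):
        self.history.append(experience)


class ToyStrategy:
    """Runs a (fake) training step and notifies every plugin around it."""

    def __init__(self, plugins=None):
        self.plugins = list(plugins or [])

    def train(self, experience):
        for plugin in self.plugins:
            plugin.before_experience(self, experience)
        # A real strategy would run its optimization loop here.
        for plugin in self.plugins:
            plugin.after_experience(self, experience)


counter, history = CountingPlugin(), HistoryPlugin()
strategy = ToyStrategy(plugins=[counter, history])
for exp in ["exp0", "exp1", "exp2"]:
    strategy.train(exp)

print(counter.seen)     # 3
print(history.history)  # ['exp0', 'exp1', 'exp2']
```

Because each plugin only overrides the hooks it cares about, two unrelated behaviours can be attached to the same strategy without knowing about each other.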

Keep in mind that many Avalanche components are independent from Avalanche strategies.

### Strategy Instantiation
Most strategies require only 3 mandatory arguments:
- **model:** this must be a torch.nn.Module.
- **optimizer:** torch.optim.Optimizer already initialized on your model.
- **loss:** a loss function such as those in torch.nn.functional.

Additional arguments are optional and let you customize training (batch size, epochs, ...) or set strategy-specific parameters (buffer size, regularization strength, ...).

In [3]:
from torch.optim import SGD
from torch.nn import CrossEntropyLoss
from avalanche.models import SimpleMLP
from avalanche.training.strategies import Naive, CWRStar, Replay, GDumb, Cumulative, LwF, GEM, AGEM, EWC

model = SimpleMLP(num_classes=10)

optimizer = SGD(model.parameters(), lr=0.001, momentum=0.9)

criterion = CrossEntropyLoss()

cl_strategy = Naive(model, optimizer, criterion, train_mb_size=100, train_epochs=4, eval_mb_size=100)  # fine-tuning

### Training and Evaluation
Each strategy object offers two main methods: train and eval.
Both accept either a single experience or a list of them, for maximum flexibility.
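
One common way to support both call styles is to normalize the argument to a list before looping over it. The sketch below only illustrates that pattern: `as_experience_list` and `toy_eval` are hypothetical names, plain strings stand in for experiences, and Avalanche's own handling may differ.

```python
from collections.abc import Iterable


def as_experience_list(experiences):
    """Return a list whether the caller passed one item or many."""
    # Strings are iterable, but here they stand in for single experiences.
    if isinstance(experiences, Iterable) and not isinstance(experiences, str):
        return list(experiences)
    return [experiences]


def toy_eval(experiences):
    """Pretend to evaluate each experience and report what was visited."""
    return [f"evaluated {exp}" for exp in as_experience_list(experiences)]


print(toy_eval("exp0"))            # ['evaluated exp0']
print(toy_eval(["exp0", "exp1"]))  # ['evaluated exp0', 'evaluated exp1']
```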

We can train the model continually by iterating over the train_stream provided by the scenario:

In [4]:
from avalanche.benchmarks.classic import SplitMNIST

# scenario
benchmark = SplitMNIST(n_experiences=5, seed=1)

# TRAINING LOOP
print('Starting experiment...')
results = []

for experience in benchmark.train_stream:
    print("Start of experience: ", experience.current_experience)
    print("Current Classes: ", experience.classes_in_this_experience)

    cl_strategy.train(experience)
    print('Training completed')

    print('Computing accuracy on the whole test set')
    results.append(cl_strategy.eval(benchmark.test_stream))

Starting experiment...
Start of experience:  0
Current Classes:  [5, 6]
-- >> Start of training phase << --
-- Starting training on experience 0 (Task 0) from train stream --
100%|█████████████████████████████████████████| 114/114 [00:03<00:00, 35.62it/s]
Epoch 0 ended.
	Loss_Epoch/train_phase/train_stream/Task000 = 0.4051
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.8865
100%|█████████████████████████████████████████| 114/114 [00:03<00:00, 36.47it/s]
Epoch 1 ended.
	Loss_Epoch/train_phase/train_stream/Task000 = 0.1014
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.9712
100%|█████████████████████████████████████████| 114/114 [00:03<00:00, 36.59it/s]
Epoch 2 ended.
	Loss_Epoch/train_phase/train_stream/Task000 = 0.0851
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.9742
100%|█████████████████████████████████████████| 114/114 [00:03<00:00, 37.61it/s]
Epoch 3 ended.
	Loss_Epoch/train_phase/train_stream/Task000 = 0.0761
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 

100%|███████████████████████████████████████████| 21/21 [00:00<00:00, 40.61it/s]
> Eval on experience 4 (Task 0) from test stream ended.
	Loss_Exp/eval_phase/test_stream/Task000/Exp004 = 7.5538
	Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp004 = 0.0000
-- >> End of eval phase << --
	Loss_Stream/eval_phase/test_stream/Task000 = 5.7476
	Top1_Acc_Stream/eval_phase/test_stream/Task000 = 0.1940
Start of experience:  3
Current Classes:  [9, 3]
-- >> Start of training phase << --
-- Starting training on experience 3 (Task 0) from train stream --
100%|█████████████████████████████████████████| 121/121 [00:03<00:00, 36.59it/s]
Epoch 0 ended.
	Loss_Epoch/train_phase/train_stream/Task000 = 0.8612
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.8137
100%|█████████████████████████████████████████| 121/121 [00:03<00:00, 36.39it/s]
Epoch 1 ended.
	Loss_Epoch/train_phase/train_stream/Task000 = 0.1199
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.9646
100%|████████████████████████████████

***
## Evaluation

The `evaluation` module offers all the basic functionalities to evaluate and keep track of a continual learning experiment.

This is mostly done through the Metrics: a set of classes which implement the main continual learning metrics such as Accuracy, Forgetting, Memory Usage, Running Times, etc. Avalanche currently offers a number of pre-implemented metrics you can use for your own experiments.

We made sure to include all the major accuracy-based metrics but also the ones related to computation and memory.

Each metric comes with a standalone class and a set of plugin classes aimed at emitting metric values at specific moments during training and evaluation.

In [9]:
import torch
from avalanche.evaluation.metrics import Accuracy

task_labels = 0  # work with a single task

# create an instance of the standalone Accuracy metric
# no accuracy has been recorded yet, so the result is an empty dict
acc_metric = Accuracy()
print("Initial Accuracy: ", acc_metric.result()) #  output {}

# two consecutive metric updates
real_y = torch.tensor([1, 2]).long()
predicted_y = torch.tensor([1, 0]).float()

# monitor the average accuracy over a stream of <input, target> pairs
acc_metric.update(real_y, predicted_y, task_labels) # .update = update the current average accuracy
acc = acc_metric.result() # .result = return the current average accuracy

print("Average Accuracy: ", acc) # output 0.5 on task 0

predicted_y = torch.tensor([1, 2]).float()
acc_metric.update(real_y, predicted_y, task_labels)
acc = acc_metric.result()
print("Average Accuracy: ", acc) # output 0.75 on task 0

# reset accuracy
acc_metric.reset() # .reset = clear the current average accuracy
print("After reset: ", acc_metric.result()) # output {}

Initial Accuracy:  {}
Average Accuracy:  {0: 0.5}
Average Accuracy:  {0: 0.75}
After reset:  {}
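
The second value is 0.75 rather than 1.0 because the metric keeps a running average over every pattern seen since the last reset: 1 correct out of 2 after the first update, then 3 correct out of 4 after the second. The pure-Python sketch below reproduces that arithmetic. `RunningAccuracy` is a hypothetical stand-in written for this explanation, not the library's `Accuracy` class.

```python
from collections import defaultdict


class RunningAccuracy:
    """Per-task running average of correct predictions."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Forget everything: result() returns {} until the next update.
        self._correct = defaultdict(int)
        self._total = defaultdict(int)

    def update(self, true_y, predicted_y, task_label):
        if len(true_y) != len(predicted_y):
            raise ValueError("true_y and predicted_y must have the same length")
        self._correct[task_label] += sum(
            int(t == p) for t, p in zip(true_y, predicted_y)
        )
        self._total[task_label] += len(true_y)

    def result(self):
        # Tasks with no patterns are skipped to avoid dividing by zero.
        return {
            task: self._correct[task] / total
            for task, total in self._total.items()
            if total > 0
        }


metric = RunningAccuracy()
print(metric.result())            # {}
metric.update([1, 2], [1, 0], 0)  # 1 of 2 correct
print(metric.result())            # {0: 0.5}
metric.update([1, 2], [1, 2], 0)  # 3 of 4 correct overall
print(metric.result())            # {0: 0.75}
metric.reset()
print(metric.result())            # {}
```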


### Plugin metrics 
If you want to integrate the available metrics automatically in the training and evaluation flow, you can use plugin metrics, like EpochAccuracy which logs the accuracy after each training epoch, or ExperienceAccuracy which logs the accuracy after each evaluation experience. 

Each of these metrics emits a curve composed of its values at different points in time (e.g. on different training epochs).

In order to simplify the use of these metrics, we provided utility functions with which you can create different plugin metrics in one shot. The results of these functions can be passed as parameters directly to the EvaluationPlugin.

In [10]:
from avalanche.evaluation.metrics import accuracy_metrics, \
    loss_metrics, forgetting_metrics, bwt_metrics,\
    confusion_matrix_metrics, cpu_usage_metrics, \
    disk_usage_metrics, gpu_usage_metrics, MAC_metrics, \
    ram_usage_metrics, timing_metrics

# you may pass the result to the EvaluationPlugin
metrics = accuracy_metrics(epoch=True, experience=True)

### Evaluation plugin
The Evaluation Plugin is the object in charge of configuring and controlling the evaluation procedure. This object can be passed to a Strategy as a "special" plugin through the evaluator attribute.

The Evaluation Plugin accepts as inputs the plugin metrics you want to track. In addition, you can add one or more loggers to print the metrics in different ways (on file, on standard output, on Tensorboard...).

It is also recommended to pass to the Evaluation Plugin the benchmark instance used in the experiment. This allows the plugin to check for consistency during metrics computation. For example, the Evaluation Plugin checks that the strategy.eval calls are performed on the same stream or sub-stream. Otherwise, the same metric could refer to different portions of the stream.
These checks can be configured to raise errors (stopping computation) or only warnings.
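
As a rough illustration of the error-versus-warning behaviour, the sketch below checks that every evaluation call uses the same set of experiences as the first one. The `StreamConsistencyChecker` class and its `strict` flag are hypothetical names used only for this example; the plugin's real checks may be implemented differently.

```python
import warnings


class StreamConsistencyChecker:
    """Remembers the first evaluated stream and flags later mismatches."""

    def __init__(self, strict=True):
        self.strict = strict
        self._reference = None

    def check(self, experience_ids):
        current = tuple(experience_ids)
        if self._reference is None:
            self._reference = current  # first call defines the reference
            return True
        if current == self._reference:
            return True
        message = (
            f"eval called on experiences {current}, "
            f"but earlier calls used {self._reference}"
        )
        if self.strict:
            raise ValueError(message)  # strict mode stops the computation
        warnings.warn(message)         # lenient mode only reports the problem
        return False


lenient = StreamConsistencyChecker(strict=False)
lenient.check([0, 1, 2, 3, 4])  # first call sets the reference
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = lenient.check([0, 1])  # mismatch: warns and returns False
print(ok, len(caught))          # False 1
```

With `strict=True`, the same mismatch raises a `ValueError` instead of emitting a warning.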

In [11]:
from torch.nn import CrossEntropyLoss
from torch.optim import SGD
from avalanche.benchmarks.classic import SplitMNIST
from avalanche.evaluation.metrics import forgetting_metrics, \
accuracy_metrics, loss_metrics, timing_metrics, cpu_usage_metrics, \
confusion_matrix_metrics, disk_usage_metrics
from avalanche.models import SimpleMLP
from avalanche.logging import InteractiveLogger
from avalanche.training.plugins import EvaluationPlugin
from avalanche.training.strategies import Naive

benchmark = SplitMNIST(n_experiences=5)

# MODEL CREATION
model = SimpleMLP(num_classes=benchmark.n_classes)

# DEFINE THE EVALUATION PLUGIN
# The evaluation plugin manages the metrics computation.
# It takes as argument a list of metrics, collects their results and returns
# them to the strategy it is attached to.

eval_plugin = EvaluationPlugin(
    accuracy_metrics(minibatch=True, epoch=True, experience=True, stream=True),
    loss_metrics(minibatch=True, epoch=True, experience=True, stream=True),
    timing_metrics(epoch=True),
    forgetting_metrics(experience=True, stream=True),
    cpu_usage_metrics(experience=True),
    confusion_matrix_metrics(num_classes=benchmark.n_classes, save_image=False, stream=True),
    disk_usage_metrics(minibatch=True, epoch=True, experience=True, stream=True),
    loggers=[InteractiveLogger()],
    benchmark=benchmark,
    strict_checks=False
)

# CREATE THE STRATEGY INSTANCE (NAIVE)
cl_strategy = Naive(
    model, SGD(model.parameters(), lr=0.001, momentum=0.9),
    CrossEntropyLoss(), train_mb_size=500, train_epochs=1, eval_mb_size=100,
    evaluator=eval_plugin)

# TRAINING LOOP
print('Starting experiment...')
results = []

for experience in benchmark.train_stream:
    # train returns a dictionary which contains all the metric values
    res = cl_strategy.train(experience)
    print('Training completed')

    print('Computing accuracy on the whole test set')
    # eval also returns a dictionary which contains all the metric values
    results.append(cl_strategy.eval(benchmark.test_stream))

Starting experiment...
-- >> Start of training phase << --
-- Starting training on experience 0 (Task 0) from train stream --
100%|███████████████████████████████████████████| 24/24 [00:02<00:00,  8.46it/s]
Epoch 0 ended.
	DiskUsage_Epoch/train_phase/train_stream/Task000 = 67667.5186
	DiskUsage_MB/train_phase/train_stream/Task000 = 67667.5186
	Loss_Epoch/train_phase/train_stream/Task000 = 1.1961
	Loss_MB/train_phase/train_stream/Task000 = 0.2875
	Time_Epoch/train_phase/train_stream/Task000 = 2.8364
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.7228
	Top1_Acc_MB/train_phase/train_stream/Task000 = 0.9755
-- >> End of training phase << --
Training completed
Computing accuracy on the whole test set
-- >> Start of eval phase << --
-- Starting eval on experience 0 (Task 0) from test stream --
100%|███████████████████████████████████████████| 20/20 [00:00<00:00, 37.10it/s]
> Eval on experience 0 (Task 0) from test stream ended.
	CPUUsage_Exp/eval_phase/test_stream/Task000/Exp000 = 100.

	DiskUsage_Epoch/train_phase/train_stream/Task000 = 67667.5186
	DiskUsage_MB/train_phase/train_stream/Task000 = 67667.5186
	Loss_Epoch/train_phase/train_stream/Task000 = 2.0560
	Loss_MB/train_phase/train_stream/Task000 = 0.5735
	Time_Epoch/train_phase/train_stream/Task000 = 2.8085
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.4405
	Top1_Acc_MB/train_phase/train_stream/Task000 = 0.9159
-- >> End of training phase << --
Training completed
Computing accuracy on the whole test set
-- >> Start of eval phase << --
-- Starting eval on experience 0 (Task 0) from test stream --
100%|███████████████████████████████████████████| 20/20 [00:00<00:00, 37.63it/s]
> Eval on experience 0 (Task 0) from test stream ended.
	CPUUsage_Exp/eval_phase/test_stream/Task000/Exp000 = 100.6872
	DiskUsage_Exp/eval_phase/test_stream/Task000/Exp000 = 67667.5186
	ExperienceForgetting/eval_phase/test_stream/Task000/Exp000 = 0.9837
	Loss_Exp/eval_phase/test_stream/Task000/Exp000 = 2.9410
	Top1_Acc_Exp/eval_phase/

	DiskUsage_MB/train_phase/train_stream/Task000 = 67667.5186
	Loss_Epoch/train_phase/train_stream/Task000 = 2.1456
	Loss_MB/train_phase/train_stream/Task000 = 1.0290
	Time_Epoch/train_phase/train_stream/Task000 = 2.7978
	Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.3330
	Top1_Acc_MB/train_phase/train_stream/Task000 = 0.8589
-- >> End of training phase << --
Training completed
Computing accuracy on the whole test set
-- >> Start of eval phase << --
-- Starting eval on experience 0 (Task 0) from test stream --
100%|███████████████████████████████████████████| 20/20 [00:00<00:00, 37.76it/s]
> Eval on experience 0 (Task 0) from test stream ended.
	CPUUsage_Exp/eval_phase/test_stream/Task000/Exp000 = 100.7431
	DiskUsage_Exp/eval_phase/test_stream/Task000/Exp000 = 67667.5186
	ExperienceForgetting/eval_phase/test_stream/Task000/Exp000 = 0.9822
	Loss_Exp/eval_phase/test_stream/Task000/Exp000 = 2.6043
	Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp000 = 0.0015
-- Starting eval on experie

#### Accessing metric values
If you want to access all the metrics computed during training and evaluation, make sure that collect_all=True is set when creating the EvaluationPlugin (this is the default). This option keeps an up-to-date record of all metric results in the plugin, which can be retrieved by calling evaluation_plugin.get_all_metrics(). You can call this method whenever you need the metrics.

The result is a dictionary with full metric names as keys and a tuple of two lists as values. The first list stores all the x values recorded for that metric. Each x value represents the time step at which the corresponding metric value was computed. The second list stores the metric values associated with the corresponding x values.
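
Assuming a dictionary with that shape, the most recent value of each metric could be read as sketched below. The helper `latest_values` is a hypothetical name, and `toy_metrics` is built by hand: its metric names and values are copied from the training log earlier in this section, while the x positions are placeholders, since the log does not show them.

```python
def latest_values(all_metrics):
    """Map each metric name to its most recent (x, value) pair."""
    latest = {}
    for name, (xs, values) in all_metrics.items():
        if xs:  # skip metrics that have not emitted anything yet
            latest[name] = (xs[-1], values[-1])
    return latest


# Layout: metric name -> (list of x values, list of metric values).
# The x positions below are placeholders, not values from a real run.
toy_metrics = {
    "Top1_Acc_Epoch/train_phase/train_stream/Task000": (
        [0, 1, 2],
        [0.7228, 0.4405, 0.3330],
    ),
    "Loss_Epoch/train_phase/train_stream/Task000": (
        [0, 1, 2],
        [1.1961, 2.0560, 2.1456],
    ),
}

for name, (x, value) in latest_values(toy_metrics).items():
    print(f"{name} @ x={x}: {value:.4f}")
```

Keeping the x values next to the metric values makes it possible to plot each metric as a curve over time without any extra bookkeeping.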

In [12]:
eval_plugin2 = EvaluationPlugin(
    accuracy_metrics(minibatch=True, epoch=True, experience=True, stream=True),
    loss_metrics(minibatch=True, epoch=True, experience=True, stream=True),
    forgetting_metrics(experience=True, stream=True),
    timing_metrics(epoch=True),
    cpu_usage_metrics(experience=True),
    confusion_matrix_metrics(num_classes=benchmark.n_classes, save_image=False, stream=True),
    disk_usage_metrics(minibatch=True, epoch=True, experience=True, stream=True),
    collect_all=True, # this is default value anyway
    loggers=[InteractiveLogger()],
    benchmark=benchmark
)

# since no training or evaluation has been performed yet, this returns an empty dict.
metric_dict = eval_plugin2.get_all_metrics()
print(metric_dict)

defaultdict(<function EvaluationPlugin.__init__.<locals>.<lambda> at 0x7f9501f95ee0>, {})
