# Avalanche tutorial

**experience** = A set of one or more samples that can be used to update the model (often referred to as a batch or task)

***
## Modules

Every continual learning algorithm needs a model to train incrementally.

The `models` sub-module provides the architectures most commonly used in the CL literature.

In [None]:
from avalanche.models import SimpleCNN
from avalanche.models import SimpleMLP
from avalanche.models import SimpleMLP_TinyImageNet
from avalanche.models import MobilenetV1

model = SimpleCNN()

print(model)

A continual learning model may change over time. For example, a classifier may add new units for previously unseen classes, while progressive networks add a new set of units after each experience. Avalanche provides `DynamicModules` to support these use cases.

`DynamicModules` are `torch.nn.Module`s that provide an additional method, `adaptation`, which is used to update the model's architecture. The method takes a single argument: the data from the current experience.
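To make the idea concrete, here is a minimal pure-Python sketch of a classifier head that grows when `adaptation` is called. `ToyIncrementalHead` and its growth rule (keep one output unit per class index up to the largest label seen) are assumptions for illustration only; Avalanche's `IncrementalClassifier` is a real torch module and its exact rule may differ between versions.

```python
from typing import List, Sequence


class ToyIncrementalHead:
    """Hypothetical stand-in for an incremental classifier head.

    Weights are plain lists; this is NOT Avalanche's IncrementalClassifier,
    only an illustration of the 'grow on adaptation' idea.
    """

    def __init__(self, in_features: int, initial_out_features: int = 2) -> None:
        self.in_features = in_features
        # One weight row per output unit, initialised to zeros for simplicity.
        self.weights: List[List[float]] = [
            [0.0] * in_features for _ in range(initial_out_features)
        ]

    @property
    def out_features(self) -> int:
        return len(self.weights)

    def adaptation(self, labels: Sequence[int]) -> None:
        """Grow the output layer so every label in the new data has a unit.

        Existing rows are left untouched and no training happens here:
        only the architecture changes.
        """
        needed = max(self.out_features, max(labels) + 1)
        for _ in range(needed - self.out_features):
            self.weights.append([0.0] * self.in_features)


head = ToyIncrementalHead(in_features=784)
# Class-incremental experiences with two classes each, as in SplitMNIST.
for exp_labels in ([0, 1], [2, 3], [4, 5], [6, 7], [8, 9]):
    head.adaptation(exp_labels)
    print(head.out_features)  # prints 2, 4, 6, 8, 10 (one per line)
```

Note that the first experience does not grow the head in this sketch, because its classes already fit in the two initial units.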

In [None]:
# Import benchmark
from avalanche.benchmarks import SplitMNIST

# Import model
from avalanche.models import IncrementalClassifier

# Stream composed of 5 experiences from SplitMNIST dataset
benchmark = SplitMNIST(n_experiences=5, shuffle=False)

model = IncrementalClassifier(in_features=784)
print(model)

for exp in benchmark.train_stream:
    model.adaptation(exp.dataset)
    # After each call to adaptation, the output layer grows (if needed) so that it has
    # one unit per class seen so far. Each SplitMNIST experience brings 2 new classes.
    # NO learning occurs at this point: the method only modifies the model's architecture.
    print(model)

# When you use Avalanche strategies you don't have to call adaptation yourself.
# Avalanche strategies automatically call the model's adaptation and update the optimizer to include new parameters.

***
## Benchmarks
The `benchmarks` module generates the data streams from which a system, driven by a CL strategy, learns experience by experience in order to improve its performance or expand its set of capabilities.

### Dataset 
A collection of examples that can be used for training or testing, not yet organized to be processed as a stream of batches or tasks.
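A dataset becomes a benchmark once its examples are assigned to experiences. The sketch below shows one way a class-incremental split (in the spirit of SplitMNIST) could be computed in plain Python; `split_by_class` is a hypothetical helper, not an Avalanche function, and Avalanche's own class-ordering logic may differ.

```python
import random
from typing import Dict, List, Sequence


def split_by_class(
    labels: Sequence[int], n_experiences: int, seed: int = 0
) -> List[List[int]]:
    """Assign pattern indices to experiences, SplitMNIST-style (a sketch).

    The distinct classes are shuffled with a seeded RNG, divided into
    n_experiences equal groups, and every pattern goes to the experience
    that owns its class. Returns one list of pattern indices per experience.
    """
    classes = sorted(set(labels))
    if len(classes) % n_experiences != 0:
        raise ValueError("classes must divide evenly into experiences")
    random.Random(seed).shuffle(classes)  # seeded, so the order is reproducible
    per_exp = len(classes) // n_experiences
    owner: Dict[int, int] = {c: i // per_exp for i, c in enumerate(classes)}
    assignment: List[List[int]] = [[] for _ in range(n_experiences)]
    for idx, y in enumerate(labels):
        assignment[owner[y]].append(idx)
    return assignment


# Ten fake patterns, one per digit class.
toy_labels = [3, 1, 4, 0, 5, 9, 2, 6, 8, 7]
for exp_id, idxs in enumerate(split_by_class(toy_labels, n_experiences=5, seed=1)):
    print(exp_id, sorted(toy_labels[i] for i in idxs))
```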

In [None]:
import torch
import torchvision

# Import datasets
from avalanche.benchmarks.datasets import MNIST

#-- Create train and test sets from the imported datasets --#

# Create TRAIN set from MNIST dataset
train_MNIST = MNIST('./data/mnist', train=True, download=True, transform=torchvision.transforms.ToTensor())

# Create TEST  set from MNIST dataset
test_MNIST  = MNIST('./data/mnist', train=False, download=True, transform=torchvision.transforms.ToTensor())

# Iterate on the generated set to get the examples one by one
for i, example in enumerate(train_MNIST):
    pass
print("Num. examples processed: {}".format(i + 1))  # enumerate starts counting at 0

# or use a PyTorch DataLoader to get shuffled mini-batches
train_loader = torch.utils.data.DataLoader(train_MNIST, batch_size=32, shuffle=True)
for i, (x, y) in enumerate(train_loader):
    pass
print("Num. mini-batches processed: {}".format(i + 1))

### Benchmarks basics
Avalanche benchmarks (instances of the `Scenario` class) contain several attributes that characterize the benchmark. However, the most important ones are the train and test streams.

In Avalanche we usually assume access to these two parallel streams of data (even though some benchmarks do not provide this and contain just a single test set).

Each of these streams is an iterable, indexable and sliceable object composed of experiences. An experience is a batch of data (or "task") that may be provided with or without a specific task label.
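The indexing behaviour described above can be modelled with a few special methods. `ToyStream` is a hypothetical stand-in (experiences are plain strings here), written only to show the semantics: an integer index gives one experience, while a slice or a sequence of indices gives a new, possibly reordered, stream.

```python
from typing import Iterable, Iterator, Sequence, Union


class ToyStream:
    """Hypothetical stream of experiences supporting iteration,
    integer indexing, slicing and index-list reordering (a sketch)."""

    def __init__(self, name: str, experiences: Sequence[str]) -> None:
        self.name = name
        self._exps = list(experiences)

    def __len__(self) -> int:
        return len(self._exps)

    def __iter__(self) -> Iterator[str]:
        return iter(self._exps)

    def __getitem__(self, key: Union[int, slice, Iterable[int]]) -> Union[str, "ToyStream"]:
        if isinstance(key, int):
            # A single index returns one experience, not a stream.
            return self._exps[key]
        if isinstance(key, slice):
            return ToyStream(self.name, self._exps[key])
        # Any iterable of ints (e.g. stream[0, 2, 1]) selects and reorders.
        return ToyStream(self.name, [self._exps[i] for i in key])


stream = ToyStream("train", ["exp0", "exp1", "exp2", "exp3", "exp4"])
print(stream[0])              # exp0
print(list(stream[0:2]))      # ['exp0', 'exp1']
print(list(stream[0, 2, 1]))  # ['exp0', 'exp2', 'exp1']
```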

`Efficiency`: the data belonging to a stream is not loaded into RAM beforehand. Avalanche loads the data only when specific mini-batches are requested at training/test time, following the policy defined by each Dataset implementation. This keeps memory requirements low, while speed is provided by a multi-process data loading system based on the one defined in PyTorch.
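The on-demand policy can be illustrated with a map-style dataset (the `__len__`/`__getitem__` protocol that a PyTorch `DataLoader` consumes) that keeps only file paths in memory. `LazyDataset` and `fake_loader` are hypothetical names used for illustration; real Avalanche datasets wrap PyTorch datasets, and when data is actually read depends on the underlying implementation.

```python
from typing import Callable, List, Tuple


class LazyDataset:
    """Sketch of a map-style dataset that stores only file paths and
    decodes an example when it is requested.

    `loader` is a hypothetical function (e.g. an image decoder) supplied
    by the caller; nothing is read at construction time.
    """

    def __init__(self, paths: List[str], targets: List[int],
                 loader: Callable[[str], object]) -> None:
        self.paths = paths
        self.targets = targets
        self.loader = loader

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> Tuple[object, int]:
        # The expensive load happens here, on demand.
        return self.loader(self.paths[idx]), self.targets[idx]


calls: List[str] = []


def fake_loader(path: str) -> str:
    calls.append(path)  # record every load to show laziness
    return f"pixels of {path}"


ds = LazyDataset([f"img_{i}.png" for i in range(1000)],
                 [i % 10 for i in range(1000)], fake_loader)
print(len(calls))     # 0: nothing loaded yet
x, y = ds[42]
print(len(calls), y)  # 1 2
```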

`Scenarios`: a scenario captures the particular setting, i.e. the specific features of the continual stream of data, that a continual learning algorithm will face. Each scenario object in Avalanche has several useful attributes that characterize the benchmark, including the two important train and test streams.

In [None]:
from avalanche.benchmarks.classic import SplitMNIST

split_mnist = SplitMNIST(n_experiences=5, seed=1)  # the seed fixes the random class order, so the split is reproducible

# Original train/test sets
print('--- Original datasets:')
print(split_mnist.original_train_dataset)
print(split_mnist.original_test_dataset)

# A list describing which training patterns are assigned to each experience.
# Patterns are identified by their id w.r.t. the dataset found in the
# original_train_dataset field.
print('--- Train patterns assignment:')
print(split_mnist.train_exps_patterns_assignment)

# A list describing which test patterns are assigned to each experience.
# Patterns are identified by their id w.r.t. the dataset found in the
# original_test_dataset field
print('--- Test patterns assignment:')
print(split_mnist.test_exps_patterns_assignment)

# the task label of each experience.
print('--- Task labels:')
print(split_mnist.task_labels)

# train and test streams
print('--- Streams:')
print(split_mnist.train_stream)
print(split_mnist.test_stream)

# A list that, for each experience (identified by its index/ID),
# stores a set of the (optionally remapped) IDs of classes of patterns
# assigned to that experience.
print('--- Classes in each experience:')
split_mnist.classes_in_experience

#### Train and Test streams
The train and test streams can be used for training and testing purposes, respectively.

In [None]:
# each stream has a name: "train" or "test"
train_stream = split_mnist.train_stream
print(train_stream.name)

# we have access to the benchmark from which the stream was taken
print(train_stream.benchmark)

# a single index returns one experience...
experience_0 = train_stream[0]
# ...while slicing or passing a sequence of indices returns a new (reordered) stream
substream = train_stream[0:2]
substream = train_stream[0, 2, 1]

print(len(substream))

#### Experiences
Iterating over (or indexing) a stream yields experiences. Each experience contains all the useful data regarding one batch or task in the continual stream our algorithms will face.

In [None]:
# we get the first experience
experience = train_stream[0]

# task label and dataset are the main attributes
t_label = experience.task_label
dataset = experience.dataset

# but you can recover additional info
print(experience.current_experience)
print(experience.classes_in_this_experience)
print(experience.classes_seen_so_far)
print(experience.previous_classes)
print(experience.future_classes)
print(experience.origin_stream)
print(experience.benchmark)

# As always, we can iterate over it normally or with a PyTorch data loader.
# For instance, we can use tqdm to add a progress bar.
from tqdm import tqdm
for i, data in enumerate(tqdm(dataset)):
    pass

print("\nNumber of examples:", i + 1)
print("Task Label:", t_label)
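The class-bookkeeping attributes printed above can be understood as simple set operations over the classes assigned to each experience. The function `class_bookkeeping` is hypothetical, and the exact definitions below (in particular that `future_classes` excludes classes already seen) are assumptions, not a copy of Avalanche's code.

```python
from typing import Dict, Sequence, Set


def class_bookkeeping(classes_per_exp: Sequence[Set[int]], current: int) -> Dict[str, Set[int]]:
    """Derive class sets for experience `current` (a sketch).

    Names mirror the experience attributes printed above, but the exact
    definitions used by Avalanche are an assumption here.
    """
    seen_before: Set[int] = set().union(*classes_per_exp[:current])
    this_exp = set(classes_per_exp[current])
    seen_so_far = seen_before | this_exp
    later: Set[int] = set().union(*classes_per_exp[current + 1:])
    return {
        "classes_in_this_experience": this_exp,
        "previous_classes": seen_before,
        "classes_seen_so_far": seen_so_far,
        "future_classes": later - seen_so_far,
    }


# SplitMNIST-like split with two classes per experience.
split = [{0, 1}, {2, 3}, {4, 5}, {6, 7}, {8, 9}]
info = class_bookkeeping(split, current=1)
for name, value in info.items():
    print(name, sorted(value))
```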

### Classic Benchmarks
Now that we know how benchmarks work in general through scenario, stream and experience objects, in this section we explore common benchmarks that are available with one line of code, yet flexible enough to be tuned to your needs.

Many of the classic benchmarks automatically download the original datasets they are based on and store them under the "~/.avalanche/data" directory.

In [None]:
from avalanche.benchmarks.classic import CORe50, SplitTinyImageNet,   \
SplitCIFAR10, SplitCIFAR100, SplitCIFAR110, SplitMNIST, RotatedMNIST, \
PermutedMNIST, SplitCUB200, SplitImageNet

# creating PermutedMNIST (Task-Incremental)
perm_mnist = PermutedMNIST(n_experiences=2, seed=1234)
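PermutedMNIST builds each experience by applying one fixed random permutation of the 784 pixel positions to every image, so all experiences share the same ten classes but differ in input distribution. The helpers `make_permutation` and `permute`, and the choice of one seed per experience, are assumptions for illustration; they are not Avalanche's internal implementation.

```python
import random
from typing import List, Sequence


def make_permutation(n_pixels: int, seed: int) -> List[int]:
    """Return a reproducible random permutation of pixel positions."""
    perm = list(range(n_pixels))
    random.Random(seed).shuffle(perm)
    return perm


def permute(image: Sequence[float], perm: Sequence[int]) -> List[float]:
    """Output pixel i takes its value from input pixel perm[i]."""
    return [image[p] for p in perm]


# Hypothetical per-experience seeds; Avalanche derives its own permutations.
perms = [make_permutation(784, seed=1234 + e) for e in range(2)]
image = [float(i) for i in range(784)]  # stand-in for a flattened 28x28 image
permuted = [permute(image, p) for p in perms]
print(permuted[0][:5])
print(permuted[1][:5])
```

Because every experience reuses the same permutation for all of its images, a model still sees consistent inputs within an experience; only the mapping from pixel position to meaning changes between experiences.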

In [None]:
## -- Test of the classic PermutedMNIST benchmark -- ##

# creating the benchmark instance (scenario object)
perm_mnist = PermutedMNIST(n_experiences=3, seed=1234)

# recovering the train and test streams
train_stream = perm_mnist.train_stream
test_stream  = perm_mnist.test_stream

# iterating over the train stream
for experience in train_stream:
    print("Start of task ", experience.task_label)
    print('Classes in this task:', experience.classes_in_this_experience)

    # The current PyTorch training set can be easily recovered through the experience
    current_training_set = experience.dataset
    # ...as well as the task_label
    print('Task {}'.format(experience.task_label))
    print('This task contains', len(current_training_set), 'training examples')

    # we can recover the corresponding test experience in the test stream
    current_test_set = test_stream[experience.current_experience].dataset
    print('This task contains', len(current_test_set), 'test examples')
    print('\n')