# Basic tutorial: HPO and benchmarking
#### Author: Matteo Caorsi

This tutorial focuses on **hyperparameter optimisation** (HPO), a distinctive feature of giotto-deep compared to other deep-learning frameworks.

## Scope

Neural networks are complex beasts, and it is far from intuitive whether changing some of their structural parameters will improve performance and stability. For example, is it always better to increase the depth of a feed-forward network? In general, the answer is "no": it depends on the complexity of your problem (*VC dimension and friends*). However, since quantities like the [VC dimension](https://en.wikipedia.org/wiki/Vapnik–Chervonenkis_dimension) are often impossible to compute a priori, it is more effective to try out different structural parameters empirically and benchmark the results. This is what HPO is all about.

## Plan for the tutorial

The main steps of the tutorial are the following:
 1. Creation of a dataset
 2. Creation of a model
 3. HPO on the initial model and dataset
 4. Benchmarking
 5. HPO within each benchmark
 6. Custom pruners and samplers

In [None]:
# imports and notebook auto-reloader
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import numpy as np
import torch
from torch import nn
from torch.utils.tensorboard.writer import SummaryWriter
from torch.utils.data.sampler import SubsetRandomSampler
import torchvision.models as models
from gtda.diagrams import BettiCurve
from gtda.plotting import plot_betti_surfaces
from sklearn.model_selection import StratifiedKFold, KFold
import optuna
from torch.optim import SGD, Adam, RMSprop
from torch.optim.lr_scheduler import ExponentialLR

from gdeep.data.preprocessors import ToTensorImage
from gdeep.trainer import Trainer
from gdeep.data.datasets import DatasetBuilder, DataLoaderBuilder

# today's protagonists
from gdeep.search import Benchmark
from gdeep.search import HyperParameterOptimization, GiottoSummaryWriter
from gdeep.models import FFNet
from gdeep.visualization import persistence_diagrams_of_activations


# Initialize the tensorboard writer

To analyse the results of your models, you need to start TensorBoard.
In a terminal, move into the `/example` folder and run the following command:

```
tensorboard --logdir=runs
```

After training, open [http://localhost:6006/](http://localhost:6006/) to see all the visualisations.

In this example, we use giotto-deep's modified version of the writer, `GiottoSummaryWriter`, as we believe it displays the results better in the `hparams` dashboard.

In [None]:
writer = GiottoSummaryWriter()


# Create your dataset

In the next cell we subsample the [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset and prepare the data loaders. Note that a preprocessing step is required to transform the images into tensors.

In [None]:
# download the dataset
bd = DatasetBuilder(name="CIFAR10")
ds_tr, _, _ = bd.build(download=True)


In [None]:
# Preprocessing steps

transformation = ToTensorImage((32, 32))
transformation.fit_to_dataset(ds_tr)  # no fitting is needed for this transformation; kept for completeness

transformed_ds_tr = transformation.attach_transform_to_dataset(ds_tr)

# use only 320 images from cifar10
train_indices = list(range(32 * 10))
dl_tr, *_ = DataLoaderBuilder((transformed_ds_tr,)).build(
    ({"batch_size": 32, "sampler": SubsetRandomSampler(train_indices)},)
)
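`SubsetRandomSampler` only draws from the indices it is given and reshuffles them at every epoch; with `batch_size=32`, the 320 indices above give 10 batches per epoch. The stdlib-only sketch below imitates that behaviour to make the arithmetic concrete. The helper names are hypothetical and this is not PyTorch's implementation.

```python
import random


def subset_random_order(indices, seed=None):
    """Return the given indices in a fresh random order (one epoch)."""
    rng = random.Random(seed)
    order = list(indices)
    rng.shuffle(order)
    return order


def batches(order, batch_size):
    """Split an ordering of indices into consecutive batches."""
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


subset_indices = list(range(32 * 10))  # the same 320 indices as above
epoch_order = subset_random_order(subset_indices, seed=0)
epoch_batches = batches(epoch_order, batch_size=32)
print(len(epoch_batches))  # 10 batches of 32 images each
```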


## Define your model

In the next cell we build a torch model with a `str` hyperparameter, `n_nodes`. The parameter could just as well be an `int`: the string type is only meant to show the flexibility of the code.

In [None]:
# parametric model with string value
class model2(nn.Module):
    def __init__(self, n_nodes="100"):
        super(model2, self).__init__()
        self.md = nn.Sequential(
            nn.Sequential(
                models.resnet18(weights=True), nn.Linear(1000, eval(n_nodes))
            ),
            nn.Linear(eval(n_nodes), 10),
        )

    def forward(self, x):
        return self.md(x)


model = model2()


## Training without HPO

This is the standard training step, without HPO, that you would normally run to train your model. From the next section on, we dive into the HPO framework.

In [None]:
# initialise loss
loss_fn = nn.CrossEntropyLoss()

# initialise pipeline class
pipe = Trainer(
    model, [dl_tr], loss_fn, writer, k_fold_class=StratifiedKFold(2, shuffle=True)
)
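With `StratifiedKFold(2, shuffle=True)`, the data is split into two folds, training on one and validating on the other, while every fold keeps the class proportions of the whole dataset. The stdlib-only sketch below shows the stratification idea with a round-robin assignment within each class and no shuffling; scikit-learn's actual algorithm differs in its details, and the helper names are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits):
    """Assign each sample index to a fold so that every class is spread
    as evenly as possible across folds (round-robin within each class)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for class_indices in by_class.values():
        for position, idx in enumerate(class_indices):
            folds[position % n_splits].append(idx)
    return [sorted(f) for f in folds]


def train_val_splits(labels, n_splits):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = stratified_folds(labels, n_splits)
    for k, val in enumerate(folds):
        train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        yield train, val


labels = [0, 0, 0, 0, 1, 1, 1, 1]  # toy labels: two balanced classes
for train, val in train_val_splits(labels, n_splits=2):
    print("train:", train, "val:", val)
```

Each validation fold contains two samples of each class, mirroring the balance of the full toy dataset.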


In [None]:
# the following is a simple cross-validated training (no HPO)
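# we also set n_accumulated_grads=5, which accumulates gradients over several batches before each optimiser step: this reduces memory usage and helps avoid out-of-memory (OOM) errors when training on the GPU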
pipe.train(SGD, 2, True, {"lr": 0.001}, n_accumulated_grads=5)
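Gradient accumulation runs several smaller forward/backward passes and applies the optimiser only once their gradients have been added up (in PyTorch, `.grad` keeps summing across `backward()` calls until it is zeroed), so fewer samples sit in GPU memory at a time. The plain-Python sketch below, on a toy one-parameter linear model, checks that averaging the gradients of 5 equal micro-batches reproduces the full-batch gradient. It illustrates the general technique only; exactly how `Trainer` groups batches when `n_accumulated_grads` is set is not shown here.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, n_accumulated_grads):
    """Split the batch into equal micro-batches and add up their gradients,
    each scaled by 1/n, before taking a single optimiser step."""
    size = len(xs) // n_accumulated_grads
    total = 0.0
    for k in range(n_accumulated_grads):
        chunk = slice(k * size, (k + 1) * size)
        total += grad_mse(w, xs[chunk], ys[chunk]) / n_accumulated_grads
    return total


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.0 * x for x in xs]  # the true slope is 2
w, lr = 0.0, 0.01
full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, n_accumulated_grads=5)
print(abs(full - accum) < 1e-9)  # True: same gradient, a fifth of the batch at a time
w -= lr * accum  # one SGD step after accumulating
```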


# HyperParameter Optimization

One of the unique features of giotto-deep is the possibility to run advanced hyperparameter searches in a few lines of code: it is enough to define the dictionaries of hyperparameters, initialise the class `HyperParameterOptimization` and run it with the method `start`.
We run a search over the following hyperparameters:
 - the optimizer (`SGD` or `Adam`)
 - the learning rate `lr`
 - the batch size `batch_size`
 - the model hyperparameter `n_nodes`

The goal of the search is to find the optimal set of hyperparameters, where "optimal" is measured either by the accuracy (or a user-defined metric), which is maximised, or by the loss, which is minimised.
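To get a feel for how quickly a search space grows, the sketch below enumerates an exhaustive grid over a hypothetical, purely discrete space in plain Python (these are not the tutorial's dictionaries). It is only an illustration: `HyperParameterOptimization` is built on optuna, which runs a fixed number of trials sampled from the space rather than necessarily trying every combination.

```python
from itertools import product

# hypothetical discrete search space, for illustration only
search_space = {
    "optimizer": ["SGD", "Adam"],
    "lr": [0.001, 0.01],
    "batch_size": [16, 32, 64],
    "n_nodes": ["200", "500"],
}


def grid(space):
    """Yield every combination of the values in `space` as a dict."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))


configs = list(grid(search_space))
print(len(configs))  # 2 * 2 * 3 * 2 = 24 candidate configurations
print(configs[0])
```

Every extra value multiplies the number of combinations, which is why sampling a limited number of trials becomes attractive as the space grows.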

In [None]:
# initialise gridsearch
search = HyperParameterOptimization(pipe, "accuracy", 2, best_not_last=True)

# if you want to store pickle files of the models instead of the state_dicts
search.store_pickle = True

# dictionaries of hyperparameters
optimizers_params = {"lr": [0.001, 0.01]}
dataloaders_params = {"batch_size": [32, 64, 16]}
models_hyperparams = {"n_nodes": ["200"]}

# starting the HPO
search.start(
    (SGD, Adam),
    3,
    False,
    optimizers_params,
    dataloaders_params,
    models_hyperparams,
    n_accumulated_grads=2,
)


In [None]:
print(
    "These are the best results we have found so far: ",
    search.best_val_acc_gs,
    search.best_val_loss_gs,
)


In [None]:
# get the results
df_res = search._results()
df_res


The row of the dataframe with the highest accuracy contains the optimal hyperparameters. You can also explore them interactively in the `HPARAMS` tab of TensorBoard.
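Which row counts as "best" depends on the search metric: the highest value for accuracy, the lowest for the loss, and the two criteria need not agree. The sketch below picks the best record from a small list of trial results; the records, their values and their field names are made up for illustration and do not correspond to the columns of `df_res`.

```python
# made-up trial records, standing in for rows of a results table
trials = [
    {"optimizer": "SGD", "lr": 0.001, "accuracy": 0.41, "loss": 1.71},
    {"optimizer": "Adam", "lr": 0.001, "accuracy": 0.47, "loss": 1.52},
    {"optimizer": "SGD", "lr": 0.01, "accuracy": 0.44, "loss": 1.49},
]


def best_trial(trials, metric):
    """Return the best trial: lowest value for the loss, highest otherwise."""
    if metric == "loss":
        return min(trials, key=lambda t: t["loss"])
    return max(trials, key=lambda t: t[metric])


print(best_trial(trials, "accuracy")["optimizer"])  # Adam
print(best_trial(trials, "loss")["optimizer"])      # SGD
```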

In [None]:
# starting the HPO again, this time with an LR scheduler

# here we also search over the parameters of the LR scheduler!
schedulers_params = {"gamma": [0.5, 0.9]}

search.start(
    (SGD, Adam),
    2,
    False,
    dataloaders_params=dataloaders_params,
    models_hyperparams=models_hyperparams,
    lr_scheduler=ExponentialLR,
    schedulers_params=schedulers_params,
)
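`ExponentialLR` multiplies the learning rate by `gamma` every time the scheduler steps (usually once per epoch), so after t steps the rate is lr_0 · gamma^t. The plain-Python sketch below prints the resulting schedule for the two `gamma` values searched above; the initial rate of 0.01 is an arbitrary choice for illustration.

```python
def exponential_lr(initial_lr, gamma, n_epochs):
    """Learning rate of each epoch when it is multiplied by gamma after
    every epoch: lr_t = initial_lr * gamma ** t."""
    return [initial_lr * gamma ** t for t in range(n_epochs)]


for gamma in (0.5, 0.9):
    print(gamma, exponential_lr(0.01, gamma, n_epochs=4))
```

A `gamma` of 0.5 halves the rate every epoch, while 0.9 decays it gently; the search lets the data decide which works better.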


# Benchmarking

Benchmarking means fixing a set of models and a set of datasets and trying all possible *(model, dataset)* pairs. The most common use case is to fix a single model and run it over many datasets.

Of course, only compatible models and datasets are paired for benchmarking.

Note that at this stage no hyperparameter search is involved!
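Conceptually, benchmarking is a Cartesian product of models and datasets, filtered for compatibility. In the sketch below, compatibility is modelled with a hypothetical `input` tag; how `Benchmark` actually detects incompatible pairs is not specified here.

```python
from itertools import product

# hypothetical descriptions of the kind of input each model/dataset uses
model_specs = [{"name": "resnet18", "input": "image"}, {"name": "ffnn", "input": "point3d"}]
dataset_specs = [{"name": "CIFAR10", "input": "image"}, {"name": "double_tori", "input": "point3d"}]


def benchmark_pairs(model_specs, dataset_specs):
    """Return every (model, dataset) pair whose input types match."""
    return [
        (m["name"], d["name"])
        for m, d in product(model_specs, dataset_specs)
        if m["input"] == d["input"]
    ]


print(benchmark_pairs(model_specs, dataset_specs))
# [('resnet18', 'CIFAR10'), ('ffnn', 'double_tori')]
```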

## Preparing multiple datasets

Store your dataloaders for benchmarking in a list of dictionaries, `dataloaders_dicts`, where each dictionary has a `name` and a `dataloaders` entry.

In [None]:
dataloaders_dicts = []
bd = DatasetBuilder(name="CIFAR10")

ds_tr, *_ = bd.build()
transformation = ToTensorImage((32, 32))

transformed_ds_tr = transformation.attach_transform_to_dataset(ds_tr)


# 64 training images and 96 validation images, taken from disjoint index ranges
test_indices = [64 * 5 + x for x in range(32 * 3)]
train_indices = list(range(32 * 2))

dl = DataLoaderBuilder((transformed_ds_tr, transformed_ds_tr))
dl_tr, dl_val, _ = dl.build(
    (
        {"batch_size": 32, "sampler": SubsetRandomSampler(train_indices)},
        {"batch_size": 32, "sampler": SubsetRandomSampler(test_indices)},
    )
)


temp_dict = {}
temp_dict["name"] = "CIFAR10_1000"
temp_dict["dataloaders"] = (dl_tr, dl_val, _)

dataloaders_dicts.append(temp_dict)

db = DatasetBuilder(name="DoubleTori")
ds_tr, ds_val, _ = db.build()

dl_tr, dl_ts, _ = DataLoaderBuilder((ds_tr, ds_val)).build(
    ({"batch_size": 48}, {"batch_size": 32})
)

temp_dict = {}
temp_dict["name"] = "double_tori"
temp_dict["dataloaders"] = (dl_tr,)

dataloaders_dicts.append(temp_dict)


## Preparing multiple models
Store your models for benchmarking in a list of dictionaries, `models_dicts`, where each dictionary has a `name` and a `model` entry.

In [None]:
models_dicts = []

model = model2()

temp_dict = {}
temp_dict["name"] = "resnet18"
temp_dict["model"] = model

models_dicts.append(temp_dict)

# a model without constructor parameters, so that it exposes no hyperparameters that will not be searched over
class model_no_param(nn.Module):
    def __init__(self):
        super(model_no_param, self).__init__()
        self.mod = FFNet([3, 5, 5, 2])

    def forward(self, x):
        return self.mod(x)


model5 = model_no_param()
temp_dict = {}
temp_dict["name"] = "ffnn"
temp_dict["model"] = model5

models_dicts.append(temp_dict)


## Start the benchmarking!

After initialising the `Benchmark` class with the lists of models and dataloaders, we can run the actual benchmark.

In [None]:
# initialise the benchmarking class; when k_fold_class is not specified, KFold with 5 splits is used
bench = Benchmark(models_dicts, dataloaders_dicts, loss_fn, writer)

# start the benchmarking
bench.start(SGD, 2, False, {"lr": 0.01}, {"batch_size": 32}, n_accumulated_grads=2)


# Benchmarking + HyperParameter Optimization + CV

In this last section we run an HPO within each *(model, dataset)* pair.

This is achieved by initialising a `Benchmark` instance and passing it to `HyperParameterOptimization` in place of a `Trainer`.

In this way, we look for the best set of hyperparameters for each *(model, dataset)* pair.
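The point of searching within every pair is that the best hyperparameters usually differ from pair to pair. The toy sketch below runs a separate exhaustive search over the learning rate for each pair; the model and dataset names and the `toy_score` function are invented for illustration and stand in for an actual training run.

```python
from itertools import product


def toy_score(model, dataset, lr):
    """Hypothetical validation score that peaks at a pair-specific lr."""
    best_lr = {("mlp", "A"): 0.01, ("mlp", "B"): 0.001,
               ("cnn", "A"): 0.003, ("cnn", "B"): 0.03}[(model, dataset)]
    return -abs(lr - best_lr)


def search_per_pair(model_names, dataset_names, candidate_lrs):
    """Run an independent search for every (model, dataset) pair."""
    return {
        (m, d): max(candidate_lrs, key=lambda lr: toy_score(m, d, lr))
        for m, d in product(model_names, dataset_names)
    }


best = search_per_pair(["mlp", "cnn"], ["A", "B"], [0.001, 0.003, 0.01, 0.03])
print(best)
```

A single search over all pairs at once would have to settle for one compromise value; here each pair gets its own optimum.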

In [None]:
# standard pytorch loss
loss_fn = nn.CrossEntropyLoss()

# initialise benchmark
bench = Benchmark(
    models_dicts, dataloaders_dicts, loss_fn, writer, k_fold_class=KFold(3)
)

# initialise gridsearch with benchmark instance
search2 = HyperParameterOptimization(bench, "loss", 2)

# hyperparameters
optimizers_params = {"lr": [0.001, 0.01, None, True]}  # the extra entries (None, True) enable the log sampler
dataloaders_params = {"batch_size": [32, 64, 16]}
models_hyperparams = {"n_nodes": ["500", "200"]}
search2.start(
    (SGD, Adam), 2, True, optimizers_params, dataloaders_params, models_hyperparams
)

writer.close()  # let's not forget to close the tensorboard writer once all is done
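Log-scale sampling draws log(lr) uniformly between log(low) and log(high), so every order of magnitude receives the same share of trials, whereas a uniform sampler would place most draws near the top of the range. The stdlib sketch below shows the idea: about half the samples fall below the geometric mean sqrt(0.001 · 0.01) ≈ 0.0032. optuna exposes this behaviour as `log=True` in `trial.suggest_float`; how giotto-deep maps the list entries onto optuna is an assumption here, not something this tutorial shows.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform in [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(0.001, 0.01, rng) for _ in range(10_000)]
geometric_mean = math.sqrt(0.001 * 0.01)
below = sum(s < geometric_mean for s in samples) / len(samples)
print(below)  # close to 0.5
```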


In [None]:
print(
    "Best validation accuracy: ",
    search2.best_val_acc_gs,
    "\nBest validation loss value: ",
    search2.best_val_loss_gs,
)


## Custom pruner and sampler

You can pass a custom optuna pruner and sampler (instances of subclasses of `optuna.pruners.BasePruner` and `optuna.samplers.BaseSampler`) to the `HyperParameterOptimization` class.

The pruner stops unpromising trials early, saving computation, while the sampler decides how candidate hyperparameters are drawn from the search space (for example, with TPE or at random).
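The rule behind optuna's `MedianPruner` is simple: once `n_startup_trials` trials have finished, a running trial is stopped if its best intermediate value so far is worse than the median of the previous trials' intermediate values at the same step. The stdlib sketch below reproduces that rule in simplified form (it ignores `n_warmup_steps`, `interval_steps` and `n_min_trials`); the accuracy curves are made up.

```python
from statistics import median


def should_prune(current_curve, completed_curves, n_startup_trials=2, maximize=True):
    """Median rule: prune the running trial if its best value so far is worse
    than the median of the completed trials' values at the same step."""
    if len(completed_curves) < n_startup_trials:
        return False  # not enough finished trials to compare against
    step = len(current_curve) - 1
    previous = [curve[step] for curve in completed_curves if len(curve) > step]
    if not previous:
        return False
    best_so_far = max(current_curve) if maximize else min(current_curve)
    reference = median(previous)
    return best_so_far < reference if maximize else best_so_far > reference


# made-up accuracy curves (one value per epoch) of finished trials
completed = [[0.30, 0.42, 0.50], [0.28, 0.40, 0.47], [0.35, 0.45, 0.52]]
print(should_prune([0.20], completed))  # True: 0.20 is below the median 0.30
print(should_prune([0.33], completed))  # False: 0.33 beats the median 0.30
```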

In [None]:
from optuna.pruners import MedianPruner
from optuna.samplers import TPESampler

# initialise the HPO
gs = HyperParameterOptimization(
    pipe,
    "accuracy",
    5,
    best_not_last=False,
    pruner=MedianPruner(
        n_startup_trials=2, n_warmup_steps=0, interval_steps=1, n_min_trials=1
    ),
    sampler=TPESampler(),
)

# dictionaries of hyperparameters
optimizers_params = {"lr": [0.001, 0.01]}
dataloaders_params = {"batch_size": [32, 64, 16]}
models_hyperparams = {"n_nodes": ["500", "200"]}

# starting the HPO
gs.start(
    (SGD, Adam),
    3,
    False,
    optimizers_params,
    dataloaders_params,
    models_hyperparams,
    n_accumulated_grads=2,
)
