# Basic tutorial: gridsearch and benchmarking
#### Author: Matteo Caorsi

This short tutorial introduces the basic functionality of the *giotto-deep* API.

The main steps of the tutorial are the following:
 1. creation of a dataset
 2. creation of a model
 3. gridsearch on the initial model and dataset
 4. benchmarking
 5. gridsearching in each benchmark

In [None]:
# imports and notebook auto-reloader
%reload_ext autoreload
%autoreload 2
%matplotlib inline

import numpy as np

import torch
from torch import nn

from gdeep.models import FFNet

from gdeep.visualisation import persistence_diagrams_of_activations

from torch.utils.tensorboard import SummaryWriter
from torch.utils.data.sampler import SubsetRandomSampler
import torchvision.models as models

from torch.optim import SGD, Adam, RMSprop
from torch.optim.lr_scheduler import ExponentialLR
from gdeep.pipeline import Pipeline
from gdeep.data.datasets import TorchDataLoader
# today's protagonists
from gdeep.search import Benchmark
from gdeep.search import Gridsearch, GiottoSummaryWriter

from gtda.diagrams import BettiCurve

from gtda.plotting import plot_betti_surfaces

from sklearn.model_selection import StratifiedKFold, KFold

import optuna

# Initialize the tensorboard writer

In order to analyse the results of your models, you need to start TensorBoard.
In a terminal, move into the `/example` folder and run the following command:

```
tensorboard --logdir=runs
```

After training, open [http://localhost:6006/](http://localhost:6006/) to see all the visualisation results.

In this example we use `GiottoSummaryWriter`, our modified version of the writer, as we believe it displays the results better in the `hparams` dashboard.

In [None]:
writer = GiottoSummaryWriter()

# Create your dataset

In the next cell we subsample the `CIFAR10` dataset and prepare the data loaders. The validation loader simply reuses the test loader.

In [None]:
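dl = TorchDataLoader(name="CIFAR10")

# use only the first 32*10 = 320 images for training
train_indices = list(range(32*10))
print(len(train_indices))

dl_tr, _ = dl.build_dataloaders(batch_size=32,
                                sampler=SubsetRandomSampler(train_indices))
print(len(dl_tr))  # number of training batches

# the next 32*2 = 64 images (indices 320 to 383) are used for testing
test_indices = list(range(32*10, 32*12))
dl_ts, _ = dl.build_dataloaders(batch_size=25, sampler=SubsetRandomSampler(test_indices))

# the test loader doubles as validation loader
dl_val = dl_ts

print(len(dl_ts))  # number of test batches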
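
Note that `len(dl_tr)` and `len(dl_ts)` count batches, not samples. Assuming the sampler is passed through to the underlying PyTorch `DataLoader` (with its default `drop_last=False`), a loader yields `ceil(n_samples / batch_size)` batches. The plain-Python sketch below reproduces that arithmetic for the two samplers above; it does not call giotto-deep, and the helper `n_batches` is hypothetical.

```python
import math

def n_batches(n_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields for n_samples items (hypothetical helper)."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)

train_indices = list(range(32 * 10))          # 320 training indices
test_indices = list(range(32 * 10, 32 * 12))  # 64 test indices, disjoint from training

print(n_batches(len(train_indices), 32))  # batches in dl_tr
print(n_batches(len(test_indices), 25))   # batches in dl_ts (the last one holds 14 images)
```

So the two printed loader lengths should be 10 and 3.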

## Define your model

In the next cell we build a PyTorch model with a `str` parameter, `n_nodes`. The parameter could just as well be an `int`; a string is used here to show that non-numeric (categorical) hyperparameters can be handled too.

In [None]:
# parametric model with a string-valued parameter
class model2(nn.Module):
    def __init__(self, n_nodes="100"):
        super().__init__()
        n = int(n_nodes)  # convert the string parameter to a layer width (safer than eval)
        # pretrained ResNet-18 (1000 outputs) -> hidden layer of n nodes -> 10 classes
        self.md = nn.Sequential(nn.Sequential(models.resnet18(pretrained=True),
                                              nn.Linear(1000, n)),
                                nn.Linear(n, 10))

    def forward(self, x):
        return self.md(x)


model = model2()
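
Since `n_nodes` sets the width of the hidden layer between ResNet-18's 1000 outputs and the 10 CIFAR10 classes, it directly controls the size of the head added on top of the backbone. A `nn.Linear(in_f, out_f)` layer has `in_f * out_f` weights plus `out_f` biases, so the head has `1000·n + n + 10·n + 10` parameters. The sketch below only evaluates that formula (the helper names are hypothetical) and ignores the ResNet-18 backbone.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of torch.nn.Linear(in_features, out_features)."""
    return in_features * out_features + out_features

def head_params(n_nodes: str) -> int:
    """Parameters of the two Linear layers that model2 adds on top of ResNet-18."""
    n = int(n_nodes)  # n_nodes is passed as a string, as in model2
    return linear_params(1000, n) + linear_params(n, 10)

for n_nodes in ["100", "200", "500"]:
    print(n_nodes, head_params(n_nodes))
```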


In [None]:
# initialise loss
loss_fn = nn.CrossEntropyLoss()

# initialise pipeline class
pipe = Pipeline(model, [dl_tr, dl_val, dl_ts], loss_fn, writer, StratifiedKFold(2, shuffle=True))
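
`StratifiedKFold(2, shuffle=True)` splits the data so that each fold keeps roughly the same class proportions as the whole dataset. The plain-Python sketch below illustrates the idea by shuffling the indices of each class and dealing them round-robin into `k` folds. It is a simplified stand-in, not scikit-learn's exact algorithm, and the labels are made up.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds, spreading each class evenly (simplified)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes so fold sizes stay balanced
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds

labels = [0] * 6 + [1] * 4  # hypothetical labels: 60% class 0, 40% class 1
for val_fold in stratified_folds(labels, k=2):
    print(sorted(val_fold), Counter(labels[i] for i in val_fold))
```

Each fold is then used once for validation while the remaining folds are used for training.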

In [None]:
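# the following is a simple cross-validated training (no gridsearch):
# SGD optimizer, 2 epochs, cross-validation enabled, learning rate 0.001.
# n_accumulated_grads=5 accumulates gradients over 5 batches before each
# optimizer step, which helps avoid out-of-memory (OOM) errors on the GPU
pipe.train(SGD, 2, True, {"lr": 0.001}, n_accumulated_grads=5)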
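
Gradient accumulation sums gradients over several small batches before taking a single optimizer step, so peak memory is set by the small batch while the update approximates that of a larger one. The sketch below is a generic illustration on a one-parameter least-squares model in plain Python, not giotto-deep's internal implementation: it checks that averaging the gradients of equal-size micro-batches gives the gradient of the combined batch. The data points are toy values.

```python
import math

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # toy inputs
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # toy targets
w = 0.5
n_accumulated = 3                      # micro-batches per optimizer step
micro = len(xs) // n_accumulated

accumulated = 0.0
for k in range(n_accumulated):
    batch = slice(k * micro, (k + 1) * micro)
    # like calling loss.backward() once per micro-batch, with the loss scaled by 1/n_accumulated
    accumulated += grad_mse(w, xs[batch], ys[batch]) / n_accumulated

full_batch = grad_mse(w, xs, ys)
print(math.isclose(accumulated, full_batch))

lr = 0.01
w -= lr * accumulated  # a single optimizer step after n_accumulated micro-batches
```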

# Gridsearch

We run a gridsearch over different hyperparameters:
 - the optimizer (`SGD` or `Adam`)
 - the learning rate `lr`
 - the batch size `batch_size`
 - the model parameter `n_nodes`

The goal of the gridsearch is to find the best set of hyperparameters.

In [None]:

# initialise gridsearch
search = Gridsearch(pipe, "accuracy", 2, best_not_last=True)

# if you want to store pickle files of the models instead of the state_dicts
search.store_pickle = True

# dictionaries of hyperparameters
optimizers_params = {"lr": [0.001, 0.01]}
dataloaders_params = {"batch_size": [32, 64, 16]}
models_hyperparams = {"n_nodes": ["200"]}

# starting the gridsearch
search.start((SGD, Adam), 3, False, optimizers_params, dataloaders_params, models_hyperparams, n_accumulated_grads=2)

In [None]:
print(search.best_val_acc_gs, search.best_val_loss_gs)

In [None]:
# get the results
df_res = search._results()
df_res

The row of the dataframe with the highest accuracy contains the best hyperparameters. You can also explore them interactively in the `HPARAMS` tab of TensorBoard.

In [None]:
# start the gridsearch again, this time with a LR scheduler

# here we also grid-search over the scheduler parameter gamma
schedulers_params = {"gamma": [0.5, 0.9]}

search.start((SGD, Adam), 2, False, 
             dataloaders_params=dataloaders_params, 
             models_hyperparams=models_hyperparams,
             lr_scheduler=ExponentialLR, schedulers_params=schedulers_params)
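
`ExponentialLR` multiplies the learning rate by `gamma` every time the scheduler steps (once per epoch in the usual PyTorch training loop), so after `t` steps the rate is `lr_0 * gamma**t`. The sketch below evaluates that closed form for the two `gamma` values in the grid; the helper name and the initial rate `lr_0 = 0.01` are assumptions for illustration.

```python
def exponential_lr(lr0: float, gamma: float, n_steps: int) -> list:
    """Learning rates produced by ExponentialLR: lr0 * gamma**t for t = 0..n_steps."""
    return [lr0 * gamma ** t for t in range(n_steps + 1)]

lr0 = 0.01  # assumed initial learning rate
for gamma in [0.5, 0.9]:
    print(gamma, exponential_lr(lr0, gamma, n_steps=2))
```

A smaller `gamma` shrinks the learning rate much faster: after two steps, `gamma=0.5` leaves a quarter of the initial rate, while `gamma=0.9` leaves 81%.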

# Benchmarking

Benchmarking means fixing a set of models and a set of datasets and trying all possible *(model, dataset)* pairs.

Of course, only compatible models and datasets will be benchmarked together.

Note that at this stage no hyperparameter search is involved!
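
The pairing logic can be sketched in plain Python as a Cartesian product followed by a compatibility filter. The metadata and the `is_compatible` check are hypothetical: we assume the ResNet-based model expects images, and that `FFNet([3, 5, 5, 2])` expects 3-dimensional points such as those of the double tori. giotto-deep may detect incompatibility differently, for example by catching errors at runtime.

```python
from itertools import product

# hypothetical metadata: the input each model expects and the input each dataset provides
models = [
    {"name": "resnet18", "input": "image"},
    {"name": "ffnn", "input": "3d_point"},
]
datasets = [
    {"name": "CIFAR10_1000", "input": "image"},
    {"name": "double_tori", "input": "3d_point"},
]

def is_compatible(model: dict, dataset: dict) -> bool:
    """Hypothetical check: the model accepts the dataset's input type."""
    return model["input"] == dataset["input"]

all_pairs = list(product(models, datasets))
benchmarked = [(m["name"], d["name"]) for m, d in all_pairs if is_compatible(m, d)]
print(len(all_pairs), benchmarked)
```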

## Preparing multiple datasets

Store your dataloaders in a list of dictionaries, `dataloaders_dicts`, with one dictionary per dataset containing a `"name"` and a `"dataloaders"` entry.

In [None]:
dataloaders_dicts = []

# first dataset: a small subsample of CIFAR10 (320 training and 64 test images)
dl = TorchDataLoader(name="CIFAR10")

train_indices = list(range(64*5))
dl_tr, _ = dl.build_dataloaders(batch_size=32, sampler=SubsetRandomSampler(train_indices))

test_indices = list(range(64*5, 64*6))
dl_ts, _ = dl.build_dataloaders(batch_size=32, sampler=SubsetRandomSampler(test_indices))

dataloaders_dicts.append({"name": "CIFAR10_1000", "dataloaders": (dl_tr, dl_ts)})

# second dataset: the double tori
dl = TorchDataLoader(name="DoubleTori")
dl_tr, dl_ts = dl.build_dataloaders(batch_size=48)

dataloaders_dicts.append({"name": "double_tori", "dataloaders": (dl_tr, dl_ts)})

## Preparing multiple models
Store your models in a list of dictionaries, `models_dicts`, with one dictionary per model containing a `"name"` and a `"model"` entry.

In [None]:
models_dicts = []

# the parametric ResNet-18-based model defined above
model = model2()
models_dicts.append({"name": "resnet18", "model": model})

# wrap FFNet in a class without constructor parameters,
# to avoid exposing parameters that will not be gridsearched on
class model_no_param(nn.Module):
    def __init__(self):
        super().__init__()
        self.mod = FFNet([3, 5, 5, 2])  # feed-forward net with layer sizes 3, 5, 5, 2

    def forward(self, x):
        return self.mod(x)

model5 = model_no_param()
models_dicts.append({"name": "ffnn", "model": model5})

## Start the benchmarking!

After initialising the `Benchmark` class with the lists of models and dataloaders, we can run the actual benchmark.

In [None]:
# initialise the benchmarking class; when no cross-validation splitter is given, KFold with 5 splits is used
bench = Benchmark(models_dicts, dataloaders_dicts, loss_fn, writer)

# start the benchmarking: SGD, 2 epochs, no cross-validation, lr=0.01, batch size 32
bench.start(SGD, 2, False, {"lr": 0.01}, {"batch_size": 32}, n_accumulated_grads=2)
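
When cross-validation is enabled, the default `KFold` with 5 splits cuts the data into 5 consecutive folds, and each fold is used once for validation while the other 4 are used for training. The plain-Python sketch below mirrors how scikit-learn's `KFold` (without shuffling) sizes its folds; `kfold_splits` is a hypothetical helper for illustration, not the code `Benchmark` runs.

```python
def kfold_splits(n_samples: int, n_splits: int = 5):
    """Yield (train, validation) index lists like sklearn's KFold without shuffling."""
    # the first n_samples % n_splits folds get one extra sample
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

for train, val in kfold_splits(12, n_splits=5):
    print(len(train), val)
```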

# Benchmarking + Gridsearch + CV

In this last section we run a gridsearch within each *(model, dataset)* pair.

This is achieved by initialising a `Benchmark` instance and passing it as input to the `Gridsearch` class.

With these commands, we are looking for the best set of hyperparameters for each *(model, dataset)* pair.
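
Conceptually, this runs one independent search per compatible *(model, dataset)* pair and keeps the best configuration of each. The plain-Python sketch below shows only that loop structure: `toy_loss` is a hypothetical stand-in for training and validating a model, the pairs are the ones we assume to be compatible in this tutorial, and learning rates are drawn uniformly at random rather than by Optuna.

```python
import random

# (model, dataset) pairs assumed to be compatible
pairs = [("resnet18", "CIFAR10_1000"), ("ffnn", "double_tori")]
n_trials = 2

def toy_loss(pair, lr):
    """Hypothetical stand-in for training one configuration and returning its validation loss."""
    model_name, dataset_name = pair
    return abs(lr - 0.005) + 0.001 * len(model_name + dataset_name)

rng = random.Random(0)
best = {}
for pair in pairs:
    # an independent search for every pair
    trials = []
    for _ in range(n_trials):
        lr = rng.uniform(0.001, 0.01)  # sample one configuration
        trials.append((toy_loss(pair, lr), lr))
    best[pair] = min(trials)  # the search metric "loss" is minimised

for pair, (loss, lr) in best.items():
    print(pair, round(loss, 4), round(lr, 4))
```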

In [None]:
# standard pytorch loss
loss_fn = nn.CrossEntropyLoss()

# initialise benchmark
bench = Benchmark(models_dicts, dataloaders_dicts, loss_fn, writer, KFold(3))

# initialise gridsearch with benchmark instance
search2 = Gridsearch(bench, "loss", 2)

# hyperparameters
optimizers_params = {"lr": [0.001, 0.01, None, True]}  # the trailing None, True enable the log sampler for lr
dataloaders_params = {"batch_size": [32, 64, 16]}
models_hyperparams = {"n_nodes": ["500", "200"]}
search2.start((SGD, Adam), 2, True, optimizers_params, dataloaders_params, models_hyperparams)

writer.close()  # remember to close the TensorBoard writer once everything is done
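
The trailing `None, True` in the `lr` entry ask for the log sampler. Our reading, which is an assumption, is that the list follows a `[low, high, step, log]` pattern, so `lr` is drawn log-uniformly between 0.001 and 0.01, as Optuna does with `suggest_float(..., log=True)`: the logarithm of the value is uniform, so small learning rates are explored as thoroughly as large ones. The plain-Python sketch below uses a hypothetical helper `sample_log_uniform`.

```python
import math
import random

def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)
samples = [sample_log_uniform(0.001, 0.01, rng) for _ in range(10_000)]

# with log-scale sampling, about half of the draws fall below the geometric mean
geometric_mean = math.sqrt(0.001 * 0.01)
print(sum(s < geometric_mean for s in samples) / len(samples))
```

With plain uniform sampling, by contrast, only about a quarter of the draws would fall below the geometric mean (about 0.0032).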

In [None]:
print("Best validation accuracy: ", 
      search2.best_val_acc_gs, 
      "\nBest validation loss value: ", 
      search2.best_val_loss_gs)

## Custom pruner and sampler

You can also pass a custom pruner (from `optuna.pruners`) and a custom sampler (from `optuna.samplers`) to the `Gridsearch` class.

In [None]:
from optuna.pruners import MedianPruner
from optuna.samplers import TPESampler

# initialise gridsearch with a custom pruner and sampler
gs = Gridsearch(pipe, "accuracy", 8, best_not_last=False,
                pruner=MedianPruner(n_startup_trials=2,
                                    n_warmup_steps=0,
                                    interval_steps=1,
                                    n_min_trials=1),
                sampler=TPESampler())

# dictionaries of hyperparameters
optimizers_params = {"lr": [0.001, 0.01]}
dataloaders_params = {"batch_size": [32, 64, 16]}
models_hyperparams = {"n_nodes": ["500", "200"]}

# starting the gridsearch
gs.start((SGD, Adam), 3, False, optimizers_params, dataloaders_params, models_hyperparams, n_accumulated_grads=2)
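
`MedianPruner` stops a trial early when its intermediate results look worse than those of earlier trials. The sketch below is a simplified, plain-Python version of the median stopping rule described in Optuna's documentation, for a maximised metric such as `"accuracy"`. Once `n_startup_trials` trials have finished and the trial has passed `n_warmup_steps` steps, it prunes the trial if its best value so far is below the median of the finished trials' values at the same step, provided at least `n_min_trials` of them reported that step. It ignores `interval_steps` (1 above, so every step is checked), is not Optuna's actual code, and the accuracy values are made up.

```python
import statistics

def should_prune(current, completed, step, n_startup_trials=2, n_warmup_steps=0,
                 n_min_trials=1, maximize=True):
    """Simplified median stopping rule.

    current: intermediate values reported so far by the running trial
    completed: lists of intermediate values of previously finished trials
    """
    if len(completed) < n_startup_trials or step < n_warmup_steps:
        return False
    others = [values[step] for values in completed if len(values) > step]
    if len(others) < n_min_trials:
        return False
    median = statistics.median(others)
    best_so_far = max(current[: step + 1]) if maximize else min(current[: step + 1])
    return best_so_far < median if maximize else best_so_far > median

finished = [[0.30, 0.45, 0.50], [0.35, 0.50, 0.60]]  # made-up accuracies per epoch
print(should_prune([0.20, 0.25], finished, step=1))  # median at step 1 is 0.475
print(should_prune([0.40, 0.55], finished, step=1))
```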