# Basic tutorial: gridsearch and benchmarking
#### Author: Matteo Caorsi

This short tutorial introduces the basic functionality of the *giotto-deep* API.

The main steps of the tutorial are the following:
 1. create a dataset
 2. create a model
 3. define metrics and losses
 4. run a gridsearch and benchmarks
 5. visualise the results interactively

```python
%reload_ext autoreload
%autoreload 2
%matplotlib inline

import numpy as np

import torch
from torch import nn

from gdeep.models import FFNet

from gdeep.visualisation import persistence_diagrams_of_activations

from torch.utils.tensorboard import SummaryWriter
from gdeep.data import TorchDataLoader

from gtda.diagrams import BettiCurve

from gtda.plotting import plot_betti_surfaces

import optuna
```

# Initialize the tensorboard writer

To analyse the results of your models, you need to start TensorBoard.
In a terminal, move into the `/example` folder and run the following command:

```
tensorboard --logdir=runs
```

Then, after training, open [http://localhost:6006/](http://localhost:6006/) to see all the visualisation results.

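```python
# by default, SummaryWriter() logs to ./runs/<datetime>_<hostname>,
# which is the folder watched by `tensorboard --logdir=runs`
writer = SummaryWriter()
```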

# Create your dataset

```python
from torch.utils.data.sampler import SubsetRandomSampler

dl = TorchDataLoader(name="CIFAR10")

# training set: the first 320 images (10 batches of 32)
train_indices = list(range(32*10))

print(len(train_indices))

dl_tr, dl_temp = dl.build_dataloader(batch_size=32,
                                     sampler=SubsetRandomSampler(train_indices))

print(len(dl_tr))

# test set: the next 64 images of the same split (indices 320-383, 2 batches of 32)
test_indices = list(range(32*10, 32*10 + 32*2))

dl_ts, dl_temp = dl.build_dataloader(batch_size=32, sampler=SubsetRandomSampler(test_indices))

# for simplicity, the test dataloader is reused as validation dataloader
dl_val = dl_ts

print(len(dl_ts))
```

```
320
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to data/cifar-10-python.tar.gz
Extracting data/cifar-10-python.tar.gz to data
Files already downloaded and verified
10
Files already downloaded and verified
Files already downloaded and verified
2
```
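You can check the printed numbers by hand. A PyTorch `DataLoader` that keeps its last incomplete batch (the default `drop_last=False`) yields ceil(n / batch_size) batches. So 320 training indices give 10 batches of 32, and 64 test indices give 2.

The plain-Python sketch below reproduces that arithmetic. It assumes that `build_dataloader` passes `batch_size` and `sampler` through to a standard `DataLoader`. `split_sizes` is a hypothetical helper, not part of giotto-deep.

```python
import math


def split_sizes(n_train, n_test, batch_size):
    """Hypothetical helper: number of (train, test) batches when drop_last=False."""
    return math.ceil(n_train / batch_size), math.ceil(n_test / batch_size)


batch_size = 32
train_indices = list(range(32 * 10))                    # indices 0..319
test_indices = list(range(32 * 10, 32 * 10 + 32 * 2))   # indices 320..383

# the two index sets must not overlap, otherwise test images leak into training
assert set(train_indices).isdisjoint(test_indices)

print(len(train_indices))
print(split_sizes(len(train_indices), len(test_indices), batch_size))
```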


# Define and train your model

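```python
import torchvision.models as models
from gdeep.pipeline import Pipeline

# pretrained ResNet-18 (1000 ImageNet outputs) followed by a linear layer to the 10 CIFAR10 classes
model = nn.Sequential(models.resnet18(pretrained=True), nn.Linear(1000, 10))
```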

```python
from torch.optim import SGD, Adam, RMSprop

loss_fn = nn.CrossEntropyLoss()

pipe = Pipeline(model, [dl_tr, dl_val, dl_ts], loss_fn, writer)

# train the model for 1 epoch with SGD and learning rate 0.01
pipe.train(SGD, 1, cross_validation=True, batch_size=512, lr=0.01)
```



```
Dataset CIFAR10
    Number of datapoints: 50000
    Root location: data
    Split: Train
    StandardTransform
Transform: ToTensor()
TOTAL EPOCHS  1
Epoch 1
-------------------------------
Training loss: 2.423478  [10/10]
Time taken for this epoch: 3s
Validation results: 
 Accuracy: 0.0%,                 Avg loss: 0.000094 

Test results: 
 Accuracy: 0.0%,                 Avg loss: 0.000097 

Done!
```


# Gridsearch

```python
from gdeep.search.gridsearch import Gridsearch
from torch.optim import SGD, Adam, RMSprop

loss_fn = nn.CrossEntropyLoss()

pipe = Pipeline(model, [dl_tr, dl_val, dl_ts], loss_fn, writer)

# optimise the "loss" metric over 2 trials
search = Gridsearch(pipe, "loss", 2)
# each trial picks one of the optimizers and a learning rate between 0.001 and 0.01,
# then trains for 1 epoch
search.start([SGD, Adam], 1, lr=(0.001, 0.01))
```

```
[I 2021-10-04 01:14:31,328] A new study created in memory with name: no-name-fb95ce46-e507-47ec-b63a-77d109334a47

Epoch 1
-------------------------------
Training loss: 39.277275  [ 9/10]

[I 2021-10-04 01:14:35,838] Trial 0 finished with value: 19.257434844970703 and parameters: {'optimizer': 'Adam', 'lr': 0.009843888435088329}. Best is trial 0 with value: 19.257434844970703.

Training loss: 19.257435  [10/10]
Time taken for this epoch: 4s
Validation results: 
 Accuracy: 0.0%,                 Avg loss: 0.001017 

Done!
Epoch 1
-------------------------------
Training loss: 15.517584  [ 9/10]

[I 2021-10-04 01:14:39,472] Trial 1 finished with value: 5.438361644744873 and parameters: {'optimizer': 'SGD', 'lr': 0.007229603306934275}. Best is trial 1 with value: 5.438361644744873.

Training loss: 5.438362  [10/10]
Time taken for this epoch: 3s
Validation results: 
 Accuracy: 0.0%,                 Avg loss: 0.000586 

Done!
Study statistics: 
Number of finished trials:  2
Number of pruned trials:  0
Number of complete trials:  2
Best trial:
Metric Value for best trial:  5.438361644744873
```
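Despite its name, the log suggests that `Gridsearch` does not enumerate a fixed grid here:

- the tuple `lr=(0.001, 0.01)` behaves like a range (the trials used 0.00984 and 0.00723);
- the optimizer is drawn from the given list;
- the `A new study created` lines indicate that the search runs as an Optuna study.

The following is a minimal, library-free sketch of this kind of random search. It assumes uniform sampling of the learning rate and is not giotto-deep's implementation. `random_search` and `toy_objective` are hypothetical names.

```python
import random


def random_search(objective, optimizers, lr_range, n_trials, seed=0):
    """Hypothetical sketch: minimise `objective` by sampling hyperparameters at random.

    Each trial draws an optimizer name uniformly from `optimizers` and a
    learning rate uniformly from the closed interval `lr_range`.
    """
    rng = random.Random(seed)
    trials = []
    for number in range(n_trials):
        params = {
            "optimizer": rng.choice(optimizers),
            "lr": rng.uniform(*lr_range),
        }
        trials.append({"number": number, "params": params, "value": objective(**params)})
    best = min(trials, key=lambda t: t["value"])
    return trials, best


def toy_objective(optimizer, lr):
    """Hypothetical stand-in for 'train one epoch and return the training loss'."""
    base = 1.0 if optimizer == "SGD" else 2.0
    return base + abs(lr - 0.005) * 100


trials, best = random_search(toy_objective, ["SGD", "Adam"], (0.001, 0.01), n_trials=2)
for t in trials:
    print(t["number"], t["params"], round(t["value"], 4))
print("best trial:", best["number"])
```

Fixing the seed makes the sampled trials reproducible. With only two trials, as in the tutorial, any such search explores very little of the space.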


```python
df_res = search.results()
df_res
```

```
Study statistics: 
Number of finished trials:  2
Number of pruned trials:  0
Number of complete trials:  2
Best trial:
Metric Value for best trial:  5.438361644744873
```

|   | model | dataset | optimizer | lr | Metric value |
|---|-------|---------|-----------|----|--------------|
| 0 | model | dataset | Adam | 19.257435 | |
| 1 | model | dataset | Adam | 0.009844 | 19.257435 |
| 2 | model | dataset | SGD | 5.438362 | |
| 3 | model | dataset | SGD | 0.00723 | 5.438362 |
| 4 | model | dataset | Adam | 19.257435 | |
| 5 | model | dataset | Adam | 0.009844 | 19.257435 |
| 6 | model | dataset | SGD | 5.438362 | |
| 7 | model | dataset | SGD | 0.00723 | 5.438362 |


# Benchmarking a single model on multiple datasets

## Preparing multiple datasets

```python
dataloaders_dicts = []
dl = TorchDataLoader(name="CIFAR10")

# first dataset: 640 training images (indices 0-639) and 128 test images (indices 640-767);
# the name is only a label and does not reflect the subset size
train_indices = list(range(64*10))

dl_tr, dl_temp = dl.build_dataloader(batch_size=32, sampler=SubsetRandomSampler(train_indices))

test_indices = list(range(64*10, 64*10 + 64*2))

dl_ts, dl_temp = dl.build_dataloader(batch_size=32, sampler=SubsetRandomSampler(test_indices))

temp_dict = {}
temp_dict["name"] = "CIFAR10_500"
temp_dict["dataloaders"] = (dl_tr, dl_ts)

dataloaders_dicts.append(temp_dict)

# second dataset: 320 training images (indices 0-319) and 64 test images (indices 320-383)
train_indices = list(range(64*5))

dl_tr, dl_temp = dl.build_dataloader(batch_size=32, sampler=SubsetRandomSampler(train_indices))

test_indices = list(range(64*5, 64*5 + 64))

dl_ts, dl_temp = dl.build_dataloader(batch_size=32, sampler=SubsetRandomSampler(test_indices))

temp_dict = {}
temp_dict["name"] = "CIFAR10_1000"
temp_dict["dataloaders"] = (dl_tr, dl_ts)

dataloaders_dicts.append(temp_dict)
```



```
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
```
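The dataset names are only labels, so it can help to check the subset sizes directly. The sketch below rebuilds the index bookkeeping of the dataset cell with a hypothetical `make_split` helper. It keeps the same `{"name": ..., "dataloaders": ...}` layout but stores index lists instead of dataloaders. In the notebook, each list would be wrapped in a `SubsetRandomSampler` and passed to `dl.build_dataloader`.

```python
def make_split(n_train, n_test):
    """Hypothetical helper: disjoint (train, test) index lists, test right after train."""
    train = list(range(n_train))
    test = list(range(n_train, n_train + n_test))
    return train, test


# (label, number of training images, number of test images), as in the cell above
configs = [("CIFAR10_500", 64 * 10, 64 * 2), ("CIFAR10_1000", 64 * 5, 64)]

splits = []
for name, n_train, n_test in configs:
    train, test = make_split(n_train, n_test)
    # same layout as dataloaders_dicts, but holding index lists instead of DataLoaders
    splits.append({"name": name, "dataloaders": (train, test)})

for s in splits:
    train, test = s["dataloaders"]
    print(s["name"], len(train), len(test))
```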


# Benchmarking a single dataset on multiple models

## Preparing multiple models

```python
models_dicts = []

model = nn.Sequential(models.resnet18(pretrained=True), nn.Linear(1000, 10))
temp_dict = {}
temp_dict["name"] = "resnet18"
temp_dict["model"] = model

models_dicts.append(temp_dict)

model = nn.Sequential(models.vgg16(pretrained=True), nn.Linear(1000, 10))
temp_dict = {}
temp_dict["name"] = "vgg16"
temp_dict["model"] = model

models_dicts.append(temp_dict)
```

## Benchmarking both

```python
from gdeep.search.benchmark import Benchmark

bench = Benchmark(models_dicts, dataloaders_dicts, loss_fn, writer)

bench.start(optimizer=SGD, epochs=1, batch_size=32, lr=0.01)
```

```
Benchmarking Started
******************************
Training on Dataset: CIFAR10_500, Model: resnet18
Dataset CIFAR10
    Number of datapoints: 50000
    Root location: data
    Split: Train
    StandardTransform
Transform: ToTensor()
TOTAL EPOCHS  1
Epoch 1
-------------------------------
Training loss: 2.382942  [16/16]
Time taken for this epoch: 5s
Validation results: 
 Accuracy: 0.1%,                 Avg loss: 0.000199 

Test results: 
 Accuracy: 0.1%,                 Avg loss: 0.000185 

Done!
******************************
Training on Dataset: CIFAR10_500, Model: vgg16
Dataset CIFAR10
    Number of datapoints: 50000
    Root location: data
    Split: Train
    StandardTransform
Transform: ToTensor()
TOTAL EPOCHS  1
Epoch 1
-------------------------------
Training loss: 2.155665  [16/16]
Time taken for this epoch: 29s
Validation results: 
 Accuracy: 0.0%,                 Avg loss: 0.000181 

Test results: 
 Accuracy: 0.0%,                 Avg loss: 0.000170 

Done!
***************
```
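The benchmark log shows that `Benchmark` trains every model on every dataset, with the datasets in the outer loop. The sketch below reproduces that iteration order. It is not giotto-deep's code, and it assumes a hypothetical `train_and_evaluate` callback in place of the real training pipeline. The lists `toy_models` and `toy_datasets` are placeholders, named so they do not overwrite the notebook's `models_dicts` and `dataloaders_dicts`.

```python
from itertools import product


def run_benchmark(models_dicts, dataloaders_dicts, train_and_evaluate):
    """Hypothetical sketch: train every model on every dataset, datasets in the outer loop."""
    results = []
    # product(A, B) iterates over A in the outer loop and B in the inner loop
    for data, mdl in product(dataloaders_dicts, models_dicts):
        print(f"Training on Dataset: {data['name']}, Model: {mdl['name']}")
        score = train_and_evaluate(mdl["model"], data["dataloaders"])
        results.append({"dataset": data["name"], "model": mdl["name"], "score": score})
    return results


# placeholders standing in for real models and dataloaders
toy_models = [{"name": "resnet18", "model": None}, {"name": "vgg16", "model": None}]
toy_datasets = [{"name": "CIFAR10_500", "dataloaders": ()},
                {"name": "CIFAR10_1000", "dataloaders": ()}]

results = run_benchmark(toy_models, toy_datasets, lambda model, loaders: 0.0)
```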

## Benchmarking + Gridsearch

```python
from gdeep.search.benchmark import Benchmark
from gdeep.search.gridsearch import Gridsearch
from torch.optim import SGD, Adam, RMSprop

loss_fn = nn.CrossEntropyLoss()

bench = Benchmark(models_dicts, dataloaders_dicts, loss_fn, writer)

# run a separate 2-trial search, optimising "loss", for every (dataset, model) pair;
# the positional arguments are the optimizers, the number of epochs (1)
# and, presumably, the batch size (64)
search = Gridsearch(bench, "loss", 2)
search.start((SGD, Adam), 1, 64, lr=(0.001, 0.01))
```

```
[I 2021-10-04 01:15:50,145] A new study created in memory with name: no-name-21c96d88-e66a-4a2e-bf8d-ea45175a9498

****************************************
Performing Gridsearch on Dataset: CIFAR10_500, Model: resnet18
Epoch 1
-------------------------------
Training loss: 3.132328  [16/16]
Time taken for this epoch: 10s

[I 2021-10-04 01:16:00,506] Trial 0 finished with value: 3.1323282718658447 and parameters: {'optimizer': 'Adam', 'lr': 0.002504190727127988}. Best is trial 0 with value: 3.1323282718658447.

Validation results: 
 Accuracy: 0.1%,                 Avg loss: 0.000255 

Done!
Epoch 1
-------------------------------
Training loss: 2.073715  [16/16]
Time taken for this epoch: 9s

[I 2021-10-04 01:16:10,188] Trial 1 finished with value: 2.0737147331237793 and parameters: {'optimizer': 'SGD', 'lr': 0.00955426690355359}. Best is trial 1 with value: 2.0737147331237793.
[I 2021-10-04 01:16:10,359] A new study created in memory with name: no-name-f4804cb1-a6c4-4ab5-b523-9fc4907ec480

Validation results: 
 Accuracy: 0.1%,                 Avg loss: 0.000167 

Done!
Study statistics: 
Number of finished trials:  2
Number of pruned trials:  0
Number of complete trials:  2
Best trial:
Metric Value for best trial:  2.0737147331237793
****************************************
Performing Gridsearch on Dataset: CIFAR10_500, Model: vgg16
Epoch 1
-------------------------------
Training loss: 1.731036  [16/16]
Time taken for this epoch: 47s

[I 2021-10-04 01:17:00,964] Trial 0 finished with value: 1.7310364246368408 and parameters: {'optimizer': 'SGD', 'lr': 0.0020176149818730837}. Best is trial 0 with value: 1.7310364246368408.

Validation results: 
 Accuracy: 0.2%,                 Avg loss: 0.000142 

Done!
Epoch 1
-------------------------------
Training loss: 2.352016  [16/16]
Time taken for this epoch: 56s

[I 2021-10-04 01:17:59,726] Trial 1 finished with value: 2.352015733718872 and parameters: {'optimizer': 'Adam', 'lr': 0.0014788205112112622}. Best is trial 0 with value: 1.7310364246368408.
[I 2021-10-04 01:17:59,895] A new study created in memory with name: no-name-eb035ef3-eb3d-4980-bbc2-d0e695737471

Validation results: 
 Accuracy: 0.0%,                 Avg loss: 0.000277 

Done!
Study statistics: 
Number of finished trials:  2
Number of pruned trials:  0
Number of complete trials:  2
Best trial:
Metric Value for best trial:  1.7310364246368408
****************************************
Performing Gridsearch on Dataset: CIFAR10_1000, Model: resnet18
Epoch 1
-------------------------------
Training loss: 4.986741  [ 8/ 8]
Time taken for this epoch: 5s

[I 2021-10-04 01:18:04,954] Trial 0 finished with value: 4.986740589141846 and parameters: {'optimizer': 'Adam', 'lr': 0.006156067536648929}. Best is trial 0 with value: 4.986740589141846.

Validation results: 
 Accuracy: 0.0%,                 Avg loss: 0.000386 

Done!
Epoch 1
-------------------------------
Training loss: 6.622722  [ 8/ 8]
Time taken for this epoch: 5s

[I 2021-10-04 01:18:10,021] Trial 1 finished with value: 6.6227216720581055 and parameters: {'optimizer': 'Adam', 'lr': 0.0015249001567833671}. Best is trial 0 with value: 4.986740589141846.
[I 2021-10-04 01:18:10,185] A new study created in memory with name: no-name-8154ac41-2953-47f5-a403-24e607c9863b

Validation results: 
 Accuracy: 0.0%,                 Avg loss: 0.000190 

Done!
Study statistics: 
Number of finished trials:  2
Number of pruned trials:  0
Number of complete trials:  2
Best trial:
Metric Value for best trial:  4.986740589141846
****************************************
Performing Gridsearch on Dataset: CIFAR10_1000, Model: vgg16
Epoch 1
-------------------------------
Training loss: 2.377026  [ 8/ 8]
Time taken for this epoch: 29s

[I 2021-10-04 01:18:41,129] Trial 0 finished with value: 2.377025842666626 and parameters: {'optimizer': 'Adam', 'lr': 0.0018690598238489402}. Best is trial 0 with value: 2.377025842666626.

Validation results: 
 Accuracy: 0.0%,                 Avg loss: 0.000093 

Done!
Epoch 1
-------------------------------
Training loss: 2.328606  [ 8/ 8]
Time taken for this epoch: 29s

[I 2021-10-04 01:19:12,281] Trial 1 finished with value: 2.328605890274048 and parameters: {'optimizer': 'Adam', 'lr': 0.0018118381281911327}. Best is trial 1 with value: 2.328605890274048.

Validation results: 
 Accuracy: 0.0%,                 Avg loss: 0.000095 

Done!
Study statistics: 
Number of finished trials:  2
Number of pruned trials:  0
Number of complete trials:  2
Best trial:
Metric Value for best trial:  2.328605890274048
```
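In the log, each (dataset, model) pair gets its own study, and each trial's value equals the last training loss printed for it. The sketch below copies those trial values (rounded to six decimals) from the log above and picks the lowest loss per pair with plain Python. These numbers come from a single epoch on a few hundred images, so they say little about which configuration is actually better.

```python
from collections import defaultdict

# (dataset, model, optimizer, lr, training loss) of each trial, copied from the log above
trials = [
    ("CIFAR10_500", "resnet18", "Adam", 0.002504, 3.132328),
    ("CIFAR10_500", "resnet18", "SGD", 0.009554, 2.073715),
    ("CIFAR10_500", "vgg16", "SGD", 0.002018, 1.731036),
    ("CIFAR10_500", "vgg16", "Adam", 0.001479, 2.352016),
    ("CIFAR10_1000", "resnet18", "Adam", 0.006156, 4.986741),
    ("CIFAR10_1000", "resnet18", "Adam", 0.001525, 6.622722),
    ("CIFAR10_1000", "vgg16", "Adam", 0.001869, 2.377026),
    ("CIFAR10_1000", "vgg16", "Adam", 0.001812, 2.328606),
]

# group trials by (dataset, model): each pair corresponds to one study
studies = defaultdict(list)
for dataset, model, optimizer, lr, loss in trials:
    studies[(dataset, model)].append((loss, optimizer, lr))

# tuples compare by their first element, so min() picks the lowest loss of each study
best = {pair: min(runs) for pair, runs in studies.items()}
for (dataset, model), (loss, optimizer, lr) in best.items():
    print(f"{dataset:13s} {model:9s} {optimizer:5s} lr={lr:.6f} loss={loss:.6f}")
```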
