# Basic tutorial: Training a topological model
### Author: Raphael Reinauer
### Date: 2022-04-05
This short tutorial shows how to use the GDeep framework to train topological
models on the topological datasets provided by the GDeep Dataset Cloud.
The main steps of the tutorial are:

1. Specify the dataset you want to use.
2. Specify the model and the hyperparameter space you want to search.
3. Run a large-scale hyperparameter search to find good hyperparameters.

In [1]:
# This snippet will deactivate autoreload if this file
# is run as a script and activate it if it is run as a notebook.
from gdeep.utility.utils import autoreload_if_notebook

autoreload_if_notebook()
# Include necessary imports
from os.path import join

# Import the GDeep hpo module
from gdeep.search import PersformerHyperparameterSearch


No TPUs...


## Training a topological model with the Dataset Cloud
In this tutorial we use our custom dataset storage
on [Google Cloud Datastore](https://cloud.google.com/datastore/) to
load datasets and train a topological model.
The Dataset Cloud contains a variety of topological datasets
that can be used directly in GDeep.
We use the Mutag dataset from the
[Mutagenicity Benchmark](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5276825/)
to benchmark the GDeep
[Persformer](https://doi.org/10.48550/arXiv.2112.15210) model,
together with the GDeep pipeline and the GDeep hyperparameter search.
With only a few lines of code we can train multiple topological models
with different hyperparameters and compare their performance.
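Each trial of the search trains a model with one configuration drawn from the hyperparameter space (the trial logs of the search list parameters such as `optimizer`, `lr`, `batch_size` and `dropout_enc`). As a minimal sketch of what sampling such a space means, the block below draws random configurations from a hypothetical JSON description. The schema (`type`, `choices`, `low`, `high`) and the ranges are assumptions for illustration, not the actual format of `Mutag_hyperparameter_space.json`, and plain random sampling stands in for the sampler the real search uses (the `A new study created in memory` log line appears to come from Optuna).

```python
import json
import math
import random

# Hypothetical search-space description (assumed schema and ranges);
# the real hpo_space file used by GDeep may look different.
SPACE_JSON = """
{
  "optimizer": {"type": "categorical", "choices": ["AdamW"]},
  "lr": {"type": "loguniform", "low": 1e-4, "high": 1e-1},
  "batch_size": {"type": "int", "low": 16, "high": 32},
  "dropout_enc": {"type": "categorical", "choices": [0.0, 0.25, 0.5]}
}
"""


def sample_config(space, rng):
    """Draw one configuration by plain random sampling of each parameter."""
    config = {}
    for name, spec in space.items():
        kind = spec["type"]
        if kind == "categorical":
            config[name] = rng.choice(spec["choices"])
        elif kind == "loguniform":
            # Sample uniformly in log space so every decade is equally likely.
            log_low, log_high = math.log(spec["low"]), math.log(spec["high"])
            config[name] = math.exp(rng.uniform(log_low, log_high))
        elif kind == "int":
            # randint includes both bounds.
            config[name] = rng.randint(spec["low"], spec["high"])
        else:
            raise ValueError(f"unknown parameter type: {kind}")
    return config


space = json.loads(SPACE_JSON)
rng = random.Random(0)  # fixed seed for reproducible draws
for trial in range(4):
    print(trial, sample_config(space, rng))
```

Sampling the learning rate log-uniformly is a common choice because useful values span several orders of magnitude; a model-based sampler such as Optuna's would instead bias later draws toward regions that scored well in earlier trials.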

In [2]:
# This is how you use the API to search for the best hyperparameters for
# the MutagDataset with the PersformerHyperparameterSearch class.
# The search uses the hyperparameter search space described
# in the provided hpo_space file.
# Customize this file for your own dataset.
# The results are written to the path_writer directory.

dataset_name = "MutagDataset"  # name of the dataset; it has to exist in the Dataset Cloud buckets
download_directory = join("data", "DatasetCloud")  # directory the dataset is downloaded to
path_hpo_metadata = join("hpo_space", "Mutag_hyperparameter_space.json")  # file describing the hyperparameter search space
path_writer = join("runs", "auto_ml")  # directory where the TensorBoard writer stores the runs

# Initialize the search object with the search parameters.
hpo = PersformerHyperparameterSearch(dataset_name=dataset_name,
                                     download_directory=download_directory,
                                     path_hpo_metadata=path_hpo_metadata,
                                     path_writer=path_writer)

# Start the hyperparameter search.
hpo.search()


[32m[I 2022-04-09 15:47:49,195][0m A new study created in memory with name: 0IXGZBEFSW3WFTPADFY6[0m


Dataset 'MutagDataset' already downloaded


********** Fold  1 **************
Epoch 1
-------------------------------
Epoch training loss: 0.708505 	Epoch training accuracy: 35.11%                                                          
Time taken for this epoch: 0.00s
Learning rate value: 0.00000000
Validation results: 
 Accuracy: 12.77%,                 Avg loss: 0.713309 

Epoch 2
-------------------------------
Epoch training loss: 0.684511 	Epoch training accuracy: 47.87%                                                         
Time taken for this epoch: 0.00s
Learning rate value: 0.00219622
Validation results: 
 Accuracy: 67.02%,                 Avg loss: 0.681674 

Epoch 3
-------------------------------
Epoch training loss: 0.632176 	Epoch training accuracy: 63.83%                                                         
Time taken for this epoch: 0.00s
Learning rate value: 0.00439243
Validation results: 
 Accuracy: 67.02%,                 Avg loss: 0.629779 

Epoch 4
-------

[32m[I 2022-04-09 15:47:49,507][0m Trial 0 finished with value: 0.6648936170212766 and parameters: {'optimizer': 'AdamW', 'lr': 0.013177295296018863, 'weight_decay': 4.5681682750733654e-05, 'batch_size': 30, 'dropout_dec': 0.30000000000000004, 'dropout_enc': 0.0, 'dim_input': 6, 'n_layer_dec': 1, 'n_layer_enc': 2, 'dim_output': 2, 'activation': 'gelu', 'bias_attention': 'False', 'hidden_dim': '16', 'num_inds': '16', 'layer_norm': 'False', 'layer_norm_pooling': 'False', 'num_heads': '2', 'attention_type': 'self_attention', 'num_cycles': 1, 'num_training_steps': 5, 'num_warmup_steps': 6}. Best is trial 0 with value: 0.6648936170212766.[0m


Epoch training loss: 0.685360 	Epoch training accuracy: 47.87%                                                          
Time taken for this epoch: 0.00s
Learning rate value: 0.00219622
Validation results: 
 Accuracy: 65.96%,                 Avg loss: 0.687396 

Epoch 3
-------------------------------
Epoch training loss: 0.640607 	Epoch training accuracy: 67.02%                                                         
Time taken for this epoch: 0.00s
Learning rate value: 0.00439243
Validation results: 
 Accuracy: 65.96%,                 Avg loss: 0.625888 

Epoch 4
-------------------------------
Epoch training loss: 0.698771 	Epoch training accuracy: 67.02%                                                         
Time taken for this epoch: 0.00s
Learning rate value: 0.00658865
Validation results: 
 Accuracy: 65.96%,                 Avg loss: 0.633365 

Epoch 5
-------------------------------
Epoch training loss: 0.743156 	Epoch training accuracy: 67.02%                               

[32m[I 2022-04-09 15:47:49,794][0m Trial 1 finished with value: 0.6648936170212766 and parameters: {'optimizer': 'AdamW', 'lr': 0.04393209252571044, 'weight_decay': 0.000359744125351368, 'batch_size': 28, 'dropout_dec': 0.1, 'dropout_enc': 0.0, 'dim_input': 6, 'n_layer_dec': 1, 'n_layer_enc': 2, 'dim_output': 2, 'activation': 'gelu', 'bias_attention': 'False', 'hidden_dim': '16', 'num_inds': '16', 'layer_norm': 'False', 'layer_norm_pooling': 'False', 'num_heads': '2', 'attention_type': 'self_attention', 'num_cycles': 1, 'num_training_steps': 5, 'num_warmup_steps': 6}. Best is trial 0 with value: 0.6648936170212766.[0m


Epoch training loss: 0.610481 	Epoch training accuracy: 65.96%                                                         
Time taken for this epoch: 0.00s
Learning rate value: 0.02196605
Validation results: 
 Accuracy: 67.02%,                 Avg loss: 0.630797 

Epoch 5
-------------------------------
Epoch training loss: 0.524147 	Epoch training accuracy: 70.21%                                                         
Time taken for this epoch: 0.00s
Learning rate value: 0.02928806
Validation results: 
 Accuracy: 67.02%,                 Avg loss: 0.650382 



********** Fold  2 **************
Epoch 1
-------------------------------
Epoch training loss: 0.738910 	Epoch training accuracy: 35.11%                                                          
Time taken for this epoch: 0.00s
Learning rate value: 0.00000000
Validation results: 
 Accuracy: 34.04%,                 Avg loss: 0.734908 

Epoch 2
-------------------------------
Epoch training loss: 0.658850 	Epoch training accuracy: 5

[32m[I 2022-04-09 15:47:50,132][0m Trial 2 finished with value: 0.6648936170212766 and parameters: {'optimizer': 'AdamW', 'lr': 0.0019454399041960568, 'weight_decay': 0.000549901254144583, 'batch_size': 20, 'dropout_dec': 0.05, 'dropout_enc': 0.25, 'dim_input': 6, 'n_layer_dec': 1, 'n_layer_enc': 2, 'dim_output': 2, 'activation': 'gelu', 'bias_attention': 'False', 'hidden_dim': '16', 'num_inds': '16', 'layer_norm': 'False', 'layer_norm_pooling': 'False', 'num_heads': '2', 'attention_type': 'self_attention', 'num_cycles': 1, 'num_training_steps': 5, 'num_warmup_steps': 6}. Best is trial 0 with value: 0.6648936170212766.[0m


Epoch training loss: 0.723574 	Epoch training accuracy: 32.98%                                            
Time taken for this epoch: 0.00s
Learning rate value: 0.00032424
Validation results: 
 Accuracy: 34.04%,                 Avg loss: 0.715425 

Epoch 3
-------------------------------
Epoch training loss: 0.712319 	Epoch training accuracy: 32.98%                                                         
Time taken for this epoch: 0.00s
Learning rate value: 0.00064848
Validation results: 
 Accuracy: 34.04%,                 Avg loss: 0.701606 

Epoch 4
-------------------------------
Epoch training loss: 0.695034 	Epoch training accuracy: 45.74%                                                          
Time taken for this epoch: 0.00s
Learning rate value: 0.00097272
Validation results: 
 Accuracy: 65.96%,                 Avg loss: 0.679992 

Epoch 5
-------------------------------
Epoch training loss: 0.670750 	Epoch training accuracy: 67.02%                                            

[32m[I 2022-04-09 15:47:50,429][0m Trial 3 finished with value: 0.3351063829787234 and parameters: {'optimizer': 'AdamW', 'lr': 0.0005439482205305226, 'weight_decay': 8.985772124674708e-06, 'batch_size': 24, 'dropout_dec': 0.2, 'dropout_enc': 0.5, 'dim_input': 6, 'n_layer_dec': 1, 'n_layer_enc': 2, 'dim_output': 2, 'activation': 'gelu', 'bias_attention': 'False', 'hidden_dim': '16', 'num_inds': '16', 'layer_norm': 'False', 'layer_norm_pooling': 'False', 'num_heads': '2', 'attention_type': 'self_attention', 'num_cycles': 1, 'num_training_steps': 5, 'num_warmup_steps': 6}. Best is trial 0 with value: 0.6648936170212766.[0m

invalid value encountered in subtract


invalid value encountered in true_divide


invalid value encountered in true_divide



Study statistics: 
Number of pruned trials:  0
Number of complete trials:  4
******************** BEST TRIAL: ********************
Metric Value for best trial:  0.6648936170212766
Parameters Values for best trial:  {'optimizer': 'AdamW', 'lr': 0.013177295296018863, 'weight_decay': 4.5681682750733654e-05, 'batch_size': 30, 'dropout_dec': 0.30000000000000004, 'dropout_enc': 0.0, 'dim_input': 6, 'n_layer_dec': 1, 'n_layer_enc': 2, 'dim_output': 2, 'activation': 'gelu', 'bias_attention': 'False', 'hidden_dim': '16', 'num_inds': '16', 'layer_norm': 'False', 'layer_norm_pooling': 'False', 'num_heads': '2', 'attention_type': 'self_attention', 'num_cycles': 1, 'num_training_steps': 5, 'num_warmup_steps': 6}
DateTime start of the best trial:  2022-04-09 15:47:49.196387
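The `Learning rate value` lines in the log grow by a constant amount each epoch. For the best trial (`lr` = 0.013177..., `num_warmup_steps` = 6), the printed values for epochs 1 to 3 of fold 1 are consistent with a linear warmup, `lr * step / num_warmup_steps`, where epoch k uses step k - 1. The sketch below checks that arithmetic; the warmup formula and the one-step-per-epoch assumption are inferred from the printed numbers, not taken from the GDeep source.

```python
def warmup_lr(base_lr: float, step: int, num_warmup_steps: int) -> float:
    """Linear warmup: the factor grows from 0 to 1 over num_warmup_steps steps (assumed schedule)."""
    return base_lr * step / max(1, num_warmup_steps)


base_lr = 0.013177295296018863  # 'lr' of the best trial (trial 0)
num_warmup_steps = 6            # 'num_warmup_steps' of the same trial

# Assumption: the scheduler advances one step per epoch, so epoch k uses step k - 1.
for epoch in range(1, 4):
    lr = warmup_lr(base_lr, epoch - 1, num_warmup_steps)
    print(f"Epoch {epoch}: learning rate {lr:.8f}")
```

If this reading is right, then with `num_training_steps: 5` and `num_warmup_steps: 6` the schedule never leaves the warmup phase, so no trial in this short demo run actually trains at its sampled `lr`.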

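The best metric value, 0.6648936170212766, equals 125/188, and trial 3's value, 0.3351063829787234, equals 63/188. Assuming the standard MUTAG label split of 125 mutagenic and 63 non-mutagenic graphs, these are exactly the accuracies of always predicting the majority or the minority class, which suggests that after only a few training steps the sampled models have not learned more than the class prior. A quick check of that baseline:

```python
from collections import Counter

# Assumption: the standard MUTAG label split of 125 mutagenic (1)
# and 63 non-mutagenic (0) graphs, 188 graphs in total.
labels = [1] * 125 + [0] * 63


def majority_class_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


baseline = majority_class_accuracy(labels)
print(f"Majority-class accuracy: {baseline:.6f}")
print(f"Minority-class accuracy: {1 - baseline:.6f}")
```

Comparing every trial value against this baseline is a cheap sanity check before reading anything into the ranking of trials.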