# Basic tutorial: Training a topological model
### Author: Raphael Reinauer
### Date: 2022-04-05
 This short tutorial shows how to use the GDeep framework to train topological
 models using the topological datasets provided by the GDeep dataset cloud.
 The main steps of the tutorial are the following:
 1. Specify the dataset you want to use.
 2. Specify the model and the hyperparameter space you want to use.
 3. Run a large-scale hyperparameter search to find good hyperparameters.

In [1]:
# This snippet will deactivate autoreload if this file
# is run as a script and activate it if it is run as a notebook.
from gdeep.utility.utils import autoreload_if_notebook

autoreload_if_notebook()
# Include necessary imports
from os.path import join

# Import the GDeep hpo module
from gdeep.search import PersformerHyperparameterSearch


No TPUs...


 ## Training a topological model with the Dataset Cloud
 In this tutorial we will use our custom dataset storage
 on [Google Cloud Datastore](https://cloud.google.com/datastore/) to
 load datasets and train a topological model.
 The dataset cloud storage contains a variety of topological datasets
 that can be easily used in GDeep.
 We will use the Mutag dataset from the
 [Mutagenicity Benchmark](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5276825/)
 to do performance benchmarking for the Persformer model.
 For this benchmark we will use the GDeep
 [Persformer](https://doi.org/10.48550/arXiv.2112.15210) model,
 the GDeep pipeline and the GDeep hyperparameter search.
 With only a few lines of code, we can train multiple topological models
 with different hyperparameters and evaluate the performance of each one.
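
The search in the next cell reads its search space from a JSON file, and a missing or malformed file would only surface once the search starts. A minimal sketch of a pre-flight check follows. `check_search_space_file` is a hypothetical helper written for this tutorial, not part of GDeep, and it only verifies that the file exists and parses as JSON; it makes no assumption about the keys GDeep expects inside it.

```python
import json
from pathlib import Path


def check_search_space_file(path):
    """Return the parsed JSON content of `path`, or raise a descriptive error.

    Hypothetical helper (not a GDeep function): it checks only that the file
    exists and contains valid JSON, not that its keys are meaningful.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Search-space file not found: {path}")
    with path.open(encoding="utf-8") as handle:
        try:
            return json.load(handle)
        except json.JSONDecodeError as err:
            raise ValueError(f"{path} is not valid JSON: {err}") from err


# Example call, using the path from this tutorial:
# space = check_search_space_file(
#     Path("hpo_space") / "Mutag_hyperparameter_space.json"
# )
```

Calling the helper before constructing the search object turns a late failure inside the search into an immediate, readable error message.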

In [2]:
# This is how you use the API to search for the best hyperparameters for
# the MutagDataset using the PersformerHyperparameterSearch class.
# The search is performed over the hyperparameter search space
# described in the provided hpo_space file.
# Please customize the file for your own dataset.
# The results are written to the path_writer directory.

dataset_name = "MutagDataset"  # name of the dataset - has to exist in the datacloud buckets
download_directory = join("data", "DatasetCloud")  # directory where the dataset is downloaded
path_hpo_metadata = join('hpo_space', 'Mutag_hyperparameter_space.json')  # file describing the hyperparameter search space
path_writer = join("runs", "auto_ml")  # directory where the runs are stored using the tensorboard writer

# Initialize the search object with the search parameters.
hpo = PersformerHyperparameterSearch(
    dataset_name=dataset_name,
    download_directory=download_directory,
    path_hpo_metadata=path_hpo_metadata,
    path_writer=path_writer,
)

# Start the hyperparameter search.
hpo.search()


[I 2022-04-07 20:39:52,331] A new study created in memory with name: M9B6QOBYUDBQJVNMX5L8


Dataset 'MutagDataset' already downloaded


********** Fold  1 **************
Epoch 1
-------------------------------
Epoch training loss: 0.708660 	Epoch training accuracy: 34.04%                                                          
Time taken for this epoch: 0.00s
Learning rate value: 0.00000000
Validation results: 
 Accuracy: 32.98%,                 Avg loss: 0.712919 

Epoch 2
-------------------------------
Epoch training loss: 0.693695 	Epoch training accuracy: 43.62%                                                          
Time taken for this epoch: 0.00s
Learning rate value: 0.00322646
Validation results: 
 Accuracy: 67.02%,                 Avg loss: 0.666268 

Epoch 3
-------------------------------
Epoch training loss: 0.634310 	Epoch training accuracy: 65.96%                                                         
Time taken for this epoch: 0.00s
Learning rate value: 0.00645291
Validation results: 
 Accuracy: 67.02%,                 Avg loss: 0.629491 

Epoch 4
------

[I 2022-04-07 20:39:52,830] Trial 0 finished with value: 0.6648936170212766 and parameters: {'optimizer': 'AdamW', 'lr': 0.019358743359385245, 'weight_decay': 2.3570364049932964e-06, 'batch_size': 28, 'dropout_dec': 0.1, 'dropout_enc': 0.4, 'dim_input': 6, 'n_layer_dec': 1, 'n_layer_enc': 2, 'dim_output': 2, 'activation': 'gelu', 'bias_attention': 'False', 'hidden_dim': '16', 'num_inds': '16', 'layer_norm': 'False', 'layer_norm_pooling': 'False', 'num_heads': '2', 'attention_type': 'self_attention', 'num_cycles': 1, 'num_training_steps': 5, 'num_warmup_steps': 6}. Best is trial 0 with value: 0.6648936170212766.


Epoch training loss: 0.641869 	Epoch training accuracy: 67.02%                                                         
Time taken for this epoch: 0.00s
Learning rate value: 0.00967937
Validation results: 
 Accuracy: 65.96%,                 Avg loss: 0.615109 

Epoch 5
-------------------------------
Epoch training loss: 0.648341 	Epoch training accuracy: 67.02%                                                        
Time taken for this epoch: 0.00s
Learning rate value: 0.01290583
Validation results: 
 Accuracy: 65.96%,                 Avg loss: 0.602573 

******************** RESULTS ********************
 
Model:  Persformer 
Model Hyperparameters: {'activation': 'gelu', 'bias_attention': 'False', 'hidden_dim': '16', 'num_inds': '16', 'layer_norm': 'False', 'layer_norm_pooling': 'False', 'num_heads': '2', 'attention_type': 'self_attention', 'dropout_dec': 0.1, 'dropout_enc': 0.4, 'dim_input': 6, 'n_layer_dec': 1, 'n_layer_enc': 2, 'dim_output': 2}
Optimizer: AdamW (
Parameter Group 0



Degrees of freedom <= 0 for slice


invalid value encountered in subtract


divide by zero encountered in true_divide


invalid value encountered in multiply
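
The "Learning rate value" lines in the log above grow by the same amount every epoch, which is consistent with a linear warmup toward the sampled `lr` over `num_warmup_steps` steps. The sketch below reproduces the printed values under that assumption. The schedule is inferred from the log rather than taken from the GDeep source, `linear_warmup_lr` is a hypothetical helper, and one scheduler step per epoch is assumed.

```python
def linear_warmup_lr(base_lr, step, num_warmup_steps):
    """Learning rate after `step` steps of a linear warmup to `base_lr`.

    Hypothetical helper: the linear ramp is an assumption inferred from
    the logged values, not GDeep's documented scheduler.
    """
    if num_warmup_steps <= 0:
        return base_lr
    return base_lr * min(step, num_warmup_steps) / num_warmup_steps


base_lr = 0.019358743359385245  # 'lr' reported for Trial 0 in the log
num_warmup_steps = 6            # 'num_warmup_steps' reported for Trial 0

# Assumption: the scheduler has taken (epoch - 1) steps when the value is logged.
for epoch in range(1, 6):
    lr = linear_warmup_lr(base_lr, epoch - 1, num_warmup_steps)
    print(f"Epoch {epoch}: learning rate {lr:.8f}")
```

With the Trial 0 values this prints 0.00000000, 0.00322646, 0.00645291, 0.00967937 and 0.01290583, matching the five logged learning rates. Because `num_training_steps` is 5 and `num_warmup_steps` is 6, the run ends before the learning rate reaches the sampled `lr`.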

