## Outline
In this notebook, we classify graph datasets using the **Persistence Signals** vectorization and a vanilla Random Forest classifier. The main steps are:

1. *Data Generation*: We generate extended persistence diagrams from heat kernel signature (HKS) filtrations, following [PersLay](https://arxiv.org/abs/1904.09378).
2. *Diagrams Embedding, Model Training, and Evaluation*: We embed the extended diagrams with the Persistence Signals vectorization method, then run a 10-fold cross-validated classification with Random Forests and evaluate the model.

In [6]:
import spectral.utils as spu
import os

### Data Generation
In this section, we generate the extended persistence diagrams for the MUTAG dataset, using two HKS filtrations.

In [2]:
dataset = "MUTAG"
filtrations = ["0.1-hks", "10.0-hks"]
spu.compute_extended_persistence(dataset=dataset, filtrations=filtrations)

Dataset: MUTAG
Number of observations: 188
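
As a rough illustration of what an HKS filtration value is, the sketch below computes the heat kernel signature of every vertex of a small graph. It assumes the PersLay convention, in which a filtration named `t-hks` uses diffusion time `t` and the normalized graph Laplacian. The function `heat_kernel_signature` is a hypothetical helper written for this example and is not part of `spectral.utils`.

```python
import numpy as np

def heat_kernel_signature(adjacency, t):
    """Return the HKS of each vertex at diffusion time t.

    Assumption: the normalized Laplacian L = I - D^{-1/2} A D^{-1/2} is used,
    and hks(v) = sum_k exp(-t * lambda_k) * phi_k(v)^2.
    """
    A = np.asarray(adjacency, dtype=float)
    deg = A.sum(axis=1)
    # D^{-1/2}, mapping isolated vertices to 0 to avoid division by zero
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    L = np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # columns of eigvecs are eigenvectors
    return (eigvecs ** 2) @ np.exp(-t * eigvals)

# Toy graph: a cycle on 4 vertices (not taken from MUTAG)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])

hks_values = {t: heat_kernel_signature(A, t) for t in (0.1, 10.0)}
for t, values in hks_values.items():
    print(f"{t}-hks:", np.round(values, 4))
```

The resulting per-vertex values would then serve as the function on which extended persistence is computed. Small `t` emphasizes local structure and large `t` emphasizes global structure, which is one plausible reason for pairing `0.1` and `10.0`.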


### Diagrams Embedding, Model Training, and Evaluation
We train a vanilla Random Forest classifier on the Persistence Signals embedding of the extended persistence diagrams. We use 10-fold cross-validation to estimate how well the model generalizes, and we report the train and test accuracy for each fold.
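
The evaluation loop can be sketched in a few lines with scikit-learn. This is a minimal illustration on synthetic features, not the code behind `spu.evaluate_model`: the feature matrix `X`, the labels `y`, and the use of `StratifiedKFold` are assumptions made for the example (the notebook itself uses an index-based split).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)

# Synthetic stand-in for the embedded diagrams: 188 samples, 64 features
X = rng.normal(size=(188, 64))
y = rng.integers(0, 2, size=188)
X[y == 1, :4] += 1.5  # shift a few features so the classes are separable

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
test_scores = []
for fold, (train_idx, test_idx) in enumerate(cv.split(X, y), start=1):
    # Default ("vanilla") hyperparameters, with a fixed seed for reproducibility
    clf = RandomForestClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    train_acc = accuracy_score(y[train_idx], clf.predict(X[train_idx]))
    test_acc = accuracy_score(y[test_idx], clf.predict(X[test_idx]))
    test_scores.append(test_acc)
    print(f"Fold {fold}: train={train_acc:.2f} test={test_acc:.2f}")

print(f"Mean test accuracy: {np.mean(test_scores):.2f}")
```

Reporting both train and test accuracy per fold, as the notebook does, makes it easy to see how much the forest overfits the training split.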

In [3]:
dataset_list = ["MUTAG"]
algorithms = ["WAVELET"]
wave_list = ["coif1"]
grid_list = [(20, 20)]
filtrations = ["0.1-hks", "10.0-hks"]
graph_dtypes = ["dgmOrd0", "dgmExt0", "dgmRel1", "dgmExt1"]
repeat = 1  # run the 10-fold classification only once
verbose = True
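
These settings determine the length of the embedding. The sketch below shows one reading that is consistent with the size of 4608 reported in the fold output. It assumes that each diagram is binned on the grid and passed through a single-level 2D discrete wavelet transform whose four sub-bands are flattened and concatenated. The library's internals are not shown in this notebook, so treat this as a plausibility check rather than a description of the implementation. The helper `dwt_coeff_len` is hypothetical; it uses the usual coefficient-length formula for padding modes other than periodization, and the filter length of 6 for `coif1`.

```python
def dwt_coeff_len(n, filter_len):
    """Length of one single-level DWT coefficient band for a length-n signal
    (padding modes other than periodization)."""
    return (n + filter_len - 1) // 2

grid = (20, 20)
coif1_filter_len = 6   # the coif1 wavelet has 6 filter taps
n_filtrations = 2      # "0.1-hks", "10.0-hks"
n_diagram_types = 4    # dgmOrd0, dgmExt0, dgmRel1, dgmExt1

rows = dwt_coeff_len(grid[0], coif1_filter_len)
cols = dwt_coeff_len(grid[1], coif1_filter_len)
n_subbands = 4         # one approximation band and three detail bands

per_diagram = n_subbands * rows * cols
embedding_size = n_filtrations * n_diagram_types * per_diagram
print(rows, cols, per_diagram, embedding_size)  # 12 12 576 4608
```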

In [5]:
for dataset in dataset_list:
    graph_folder = os.path.join("./data/", dataset)
    all_diags, array_indices = spu.load_and_prepare_data(graph_folder, graph_dtypes, filtrations)
    length = len(array_indices) // 10
    sampling = "index"

    for alg in algorithms:
        for grid in grid_list:
            if alg == "WAVELET":
                for wave in wave_list:
                    spu.evaluate_model(dataset, alg, grid, wave, graph_folder, all_diags, array_indices, length, sampling, graph_dtypes, filtrations, repeat, verbose)
            else:
                spu.evaluate_model(dataset, alg, grid, None, graph_folder, all_diags, array_indices, length, sampling, graph_dtypes, filtrations, repeat, verbose)

Fold 1, embedding has size: {4608: 170}
  (Train score) accuracy_score: 0.97
  (Test score) accuracy_score: 0.89
Fold 2, embedding has size: {4608: 170}
  (Train score) accuracy_score: 0.96
  (Test score) accuracy_score: 0.94
Fold 3, embedding has size: {4608: 170}
  (Train score) accuracy_score: 0.96
  (Test score) accuracy_score: 0.89
Fold 4, embedding has size: {4608: 170}
  (Train score) accuracy_score: 0.97
  (Test score) accuracy_score: 0.89
Fold 5, embedding has size: {4608: 170}
  (Train score) accuracy_score: 0.98
  (Test score) accuracy_score: 0.83
Fold 6, embedding has size: {4608: 170}
  (Train score) accuracy_score: 0.97
  (Test score) accuracy_score: 0.89
Fold 7, embedding has size: {4608: 170}
  (Train score) accuracy_score: 0.96
  (Test score) accuracy_score: 0.89
Fold 8, embedding has size: {4608: 170}
  (Train score) accuracy_score: 0.97
  (Test score) accuracy_score: 0.89
Fold 9, embedding has size: {4608: 170}
  (Train score) accuracy_score: 0.96
  (Test score) accu