## Outline
In this notebook, we generate and classify orbits using the **Persistence Signals** vectorization and a vanilla Random Forest classifier. The main steps are:

1. *Data Generation*: Creating synthetic orbit data.
2. *Data Processing*: Preparing the data by computing persistence diagrams for $H_0$ and $H_1$ for each sample.
3. *Diagrams Embedding*: Each diagram from the dataset is embedded in a vector space using Persistence Signals.
4. *Model Training and Evaluation*: Building and training a Random Forest classifier and assessing its performance.


In [8]:
import numpy as np
import spectral.utils as spu
from gudhi.representations.preprocessing import BirthPersistenceTransform
from spectral.signals import Signals
import warnings
warnings.filterwarnings('ignore')
dataset = "ORBIT5K"

### Data Generation
In this section, we generate synthetic data representing different orbits. Each orbit is a sequence of points produced by a discrete dynamical system governed by a parameter $r$, and each value of $r$ defines one class.


In [2]:
spu.compute_persistence(dataset)

Generating 1000 dynamical particles and PDs for r = 2.5...
Generating 1000 dynamical particles and PDs for r = 3.5...
Generating 1000 dynamical particles and PDs for r = 4.0...
Generating 1000 dynamical particles and PDs for r = 4.1...
Generating 1000 dynamical particles and PDs for r = 4.3...
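For reference, ORBIT5K orbits are usually defined by the recurrence $x_{n+1} = x_n + r\,y_n(1 - y_n) \bmod 1$, $y_{n+1} = y_n + r\,x_{n+1}(1 - x_{n+1}) \bmod 1$, started from a random point of the unit square. The sketch below is a minimal pure-Python version of that recurrence. It is not the code behind `spu.compute_persistence`; the function name `generate_orbit`, the default of 1000 points per orbit, and the seeding scheme are assumptions made for illustration.

```python
import random


def generate_orbit(r, n_points=1000, seed=None):
    """Return n_points (x, y) pairs of one orbit for parameter r.

    Hypothetical helper: the name, defaults and seeding are assumptions.
    """
    rng = random.Random(seed)
    # Random starting point in the unit square.
    x, y = rng.random(), rng.random()
    orbit = []
    for _ in range(n_points):
        # The update of y uses the freshly updated x.
        x = (x + r * y * (1.0 - y)) % 1.0
        y = (y + r * x * (1.0 - x)) % 1.0
        orbit.append((x, y))
    return orbit


# One orbit per class, using the r values printed above.
orbits = {r: generate_orbit(r, seed=0) for r in (2.5, 3.5, 4.0, 4.1, 4.3)}
```

Every point stays in the unit square because of the reduction modulo 1, and a fixed `seed` makes an orbit reproducible.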


### Data Visualization
In the plot below, a representative sample from each class is displayed to illustrate the diversity and characteristics of the different categories in our dataset.


![One representative orbit from each ORBIT5K class](data/ORBIT5K/orbits.png)


### Data Processing
Here, we prepare the generated data for analysis by computing persistence diagrams for the $H_0$ and $H_1$ homology groups of each sample.


In [2]:
diagrams_dict, labels, n_data = spu.get_data(dataset)

Dataset: ORBIT5K
Number of observations: 5000
Number of classes: 5
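To make the notion of an $H_0$ diagram concrete, here is a minimal sketch that computes $H_0$ persistence pairs of a point cloud with a union-find structure. It assumes a Vietoris-Rips filtration, where every point is born at 0 and a connected component dies at the length of the edge that merges it into another one. Which filtration `spu` actually uses is not stated in this notebook, and `h0_persistence` is a hypothetical name.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return H0 (birth, death) pairs for the Rips filtration of `points`.

    Illustrative only: assumes a Rips filtration, handles H0 and nothing else.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges by increasing Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    diagram = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this edge length.
            parent[root_i] = root_j
            diagram.append((0.0, length))
    if n:
        # The last surviving component never dies.
        diagram.append((0.0, math.inf))
    return diagram


pairs = h0_persistence([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)])
```

The final pair has an infinite death time. Points of this kind are the ones filtered out of `PD0` before vectorization.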


### Diagrams Embedding
In this step, each persistence diagram from the dataset is embedded in a vector space using the **Persistence Signals** vectorization method. This transformation facilitates the application of machine learning algorithms by representing the complex topological data in a more accessible form.


In [9]:
data = []
max_point = spu.max_measures(diagrams_dict)
shift = BirthPersistenceTransform()  # maps (birth, death) to (birth, death - birth)
model = Signals(function="WAVELET",
                resolution=(128, 128),
                global_max=max_point,
                wave="coif2")
for i in range(n_data):
    PD0, PD1 = diagrams_dict["H0"][i], diagrams_dict["H1"][i]
    # Drop H0 points that have an infinite coordinate.
    PD0 = PD0[~np.isinf(PD0).any(axis=1)]
    shift.fit([PD0, PD1])
    diagrams = shift.transform([PD0, PD1])
    model.fit(diagrams)
    data.append(model.transform(diagrams))
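The two preprocessing steps in the loop, removing infinite points and applying `BirthPersistenceTransform`, amount to mapping each finite pair $(b, d)$ to $(b, d - b)$. The sketch below shows that mapping on a toy diagram using only the standard library. The function name `birth_persistence` is hypothetical, and the sketch merges the filtering and the transform into one step for brevity, whereas the notebook performs them separately.

```python
import math


def birth_persistence(diagram):
    """Map finite (birth, death) pairs to (birth, death - birth).

    Hypothetical helper: pairs with an infinite death are dropped first,
    mirroring the filter applied to PD0 in the notebook.
    """
    return [(b, d - b) for b, d in diagram if math.isfinite(d)]


toy_diagram = [(0.0, 1.0), (0.5, 2.0), (0.0, math.inf)]
transformed = birth_persistence(toy_diagram)
# transformed == [(0.0, 1.0), (0.5, 1.5)]
```

After this change of coordinates, the second coordinate is the lifetime of a feature, so points near the horizontal axis correspond to short-lived features.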

### Model Training and Evaluation
In this section, we build and evaluate the classifier. We first train a Random Forest on the vectorized diagrams, then assess how accurately it classifies the different types of orbits over 10 runs.


In [10]:
mean, std = spu.evaluate_classifier_orbits(data, labels)

Run 1: Accuracy = 0.848
Run 2: Accuracy = 0.8453333333333334
Run 3: Accuracy = 0.83
Run 4: Accuracy = 0.8373333333333334
Run 5: Accuracy = 0.8413333333333334
Run 6: Accuracy = 0.8293333333333334
Run 7: Accuracy = 0.8313333333333334
Run 8: Accuracy = 0.8326666666666667
Run 9: Accuracy = 0.8413333333333334
Run 10: Accuracy = 0.8446666666666667
Overall Mean Accuracy across 10 runs: 0.8381333333333334
Standard Deviation across 10 runs: 0.006578078071223475
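The summary figures can be checked directly from the per-run accuracies printed above. The sketch below uses only the standard library. Whether `spu.evaluate_classifier_orbits` aggregates its runs in exactly this way is an assumption.

```python
from statistics import fmean, pstdev, stdev

# Per-run accuracies copied from the output above.
accuracies = [
    0.848, 0.8453333333333334, 0.83, 0.8373333333333334,
    0.8413333333333334, 0.8293333333333334, 0.8313333333333334,
    0.8326666666666667, 0.8413333333333334, 0.8446666666666667,
]

mean_acc = fmean(accuracies)
pop_std = pstdev(accuracies)    # population standard deviation (divide by n)
sample_std = stdev(accuracies)  # sample standard deviation (divide by n - 1)

print(f"mean = {mean_acc:.4f}, population std = {pop_std:.4f}")
```

The reported standard deviation is reproduced by the population formula (`pstdev`), not by the sample formula (`stdev`), which gives a slightly larger value.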
