## Outline
In this notebook, we generate and classify orbits using **QuPID** vectorization and a vanilla random forest classifier. The main steps are:

1. *Data Generation*: Creating synthetic orbit data.
2. *Data Processing*: Preparing the data by computing persistence diagrams for $H_0$ and $H_1$ for each sample.
3. *Diagrams Embedding*: Embedding each diagram from the dataset in a vector space using QuPID.
4. *Model Training and Evaluation*: Building and training a random forest classifier and assessing its performance.


In [1]:
import numpy as np
import qupid.utils as spu
from qupid.qupid import QuPID
import warnings
warnings.filterwarnings('ignore')
dataset = "ORBIT5K"

### Data Generation
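In this section, we generate the synthetic ORBIT5K data. Each sample is an orbit: a sequence of points produced by iterating a dynamical system whose behaviour is controlled by a parameter $r$. Five values of $r$ are used, one per class, with 1000 samples generated for each. As the log output indicates, this step also computes the persistence diagrams (PDs) of the generated samples.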


In [2]:
spu.compute_persistence(dataset)

Generating 1000 dynamical particles and PDs for r = 2.5...
Generating 1000 dynamical particles and PDs for r = 3.5...
Generating 1000 dynamical particles and PDs for r = 4.0...
Generating 1000 dynamical particles and PDs for r = 4.1...
Generating 1000 dynamical particles and PDs for r = 4.3...
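
For readers unfamiliar with ORBIT5K, the sketch below shows the recurrence commonly used to define this benchmark in the topological data analysis literature: a point in the unit square is iterated under a map controlled by $r$. It is an assumption that `spu.compute_persistence` follows this construction. The helper name `generate_orbit`, the number of points per orbit, and the seeding are illustrative and not taken from the library.

```python
import random

def generate_orbit(r, n_points=1000, seed=None):
    """Iterate the ORBIT5K-style map from a random start in the unit square."""
    rng = random.Random(seed)
    x, y = rng.random(), rng.random()
    orbit = []
    for _ in range(n_points):
        # Update x from the current y, then y from the *new* x, both modulo 1.
        x = (x + r * y * (1.0 - y)) % 1.0
        y = (y + r * x * (1.0 - x)) % 1.0
        orbit.append((x, y))
    return orbit

# One illustrative orbit for each value of r listed in the log output.
orbits = {r: generate_orbit(r, seed=0) for r in (2.5, 3.5, 4.0, 4.1, 4.3)}
```

Small changes in $r$ lead to visibly different point clouds, which is what makes the classes distinguishable by their topology.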


### Data Visualization
In the plot below, a representative sample from each class is displayed to illustrate the diversity and characteristics of the different categories in our dataset.


![Representative orbit from each class](./data/ORBIT5K/orbits.png)


### Data Processing
Here, we prepare the generated data for analysis by computing persistence diagrams for the $H_0$ and $H_1$ homology groups of each sample.


In [2]:
diagrams_dict, labels, n_data = spu.get_data(dataset)

Dataset: ORBIT5K
Number of observations: 5000
Number of classes: 5
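
To make the $H_0$ part concrete: in a Vietoris-Rips filtration of a point cloud, under the convention that an edge enters the filtration at its length, every connected component is born at 0 and dies when it merges with another one. Those death values are exactly the edge lengths of a minimum spanning tree. The sketch below computes them with Kruskal's algorithm. It is only an illustration: the filtration actually used by `qupid.utils` is not stated in this notebook (an alpha complex, for example, would give different scales), and `h0_diagram` is a hypothetical helper. $H_1$ requires a full persistence computation and is not covered here.

```python
import math
from itertools import combinations

def h0_diagram(points):
    """Finite H0 (birth, death) pairs of the Vietoris-Rips filtration of `points`."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (Kruskal's algorithm).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them dies at this edge length.
            parent[ri] = rj
            diagram.append((0.0, length))
    return diagram

toy_diagram = h0_diagram([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)])
print(toy_diagram)  # [(0.0, 1.0), (0.0, 2.0)]
```

A cloud of $n$ points always yields $n - 1$ finite $H_0$ pairs; the one component that never dies is omitted.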


### Diagrams Embedding
In this step, each persistence diagram from the dataset is embedded in a vector space using the **QuPID** vectorization method. Persistence diagrams contain varying numbers of points, so they cannot be passed directly to most machine learning algorithms; the embedding represents every sample as a fixed-length feature vector instead.


In [3]:
samplesH0, samplesH1 = spu.process_diagrams(diagrams_dict)

max_point0, max_point1 = spu.max_measures({"H0": samplesH0}), spu.max_measures({"H1": samplesH1})
params = {"function": "wvt", "wave": "coif1", "global_min": (0, 0)}
modelH0 = QuPID(**params, resolution=(1, 32), global_max=max_point0, alpha=(0, 1e3))
modelH1 = QuPID(**params, resolution=(32, 32), global_max=max_point1, alpha=(5e2, 1e3))

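data = []
for i in range(n_data):
    # Fit each model on the current sample's diagram before transforming it.
    modelH0.fit([samplesH0[i]])
    modelH1.fit([samplesH1[i]])
    # One feature vector per sample: the H0 embedding followed by the H1 embedding.
    features = np.concatenate((modelH0.transform(samplesH0[i]), modelH1.transform(samplesH1[i])))
    data.append(features)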
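
The property that matters for the classifier is that diagrams with different numbers of points all map to vectors of the same length. The sketch below illustrates this with a plain grid histogram over (birth, persistence) coordinates. It is deliberately not the QuPID algorithm, whose wavelet-based transform is configured by the `function`, `wave`, `resolution`, and `alpha` arguments above. The helper `grid_embedding`, its parameters, and the toy diagrams are hypothetical.

```python
def grid_embedding(diagram, resolution=(4, 4), max_point=(1.0, 1.0)):
    """Count diagram points in a fixed grid over (birth, persistence)."""
    n_b, n_p = resolution
    counts = [0] * (n_b * n_p)
    for birth, death in diagram:
        # Map each coordinate to a cell index, clamping points on the upper edge.
        i = min(int(birth / max_point[0] * n_b), n_b - 1)
        j = min(int((death - birth) / max_point[1] * n_p), n_p - 1)
        counts[i * n_p + j] += 1
    return counts

# Toy diagrams with different numbers of points.
diag_h0 = [(0.0, 0.3), (0.0, 0.9)]
diag_h1 = [(0.2, 0.5), (0.4, 0.6), (0.5, 1.0)]

# All H0 births are 0 in this toy example, so a single birth bin is enough.
vec_h0 = grid_embedding(diag_h0, resolution=(1, 4))
vec_h1 = grid_embedding(diag_h1, resolution=(4, 4))

# Concatenate the per-dimension vectors into one feature vector for the sample.
features = vec_h0 + vec_h1
print(len(features))  # 20
```

The output length depends only on the chosen resolutions (here 1 × 4 + 4 × 4 = 20), never on how many points a diagram contains.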

### Model Training and Evaluation
In this section, we train a random forest classifier on the embedded data and evaluate how well it distinguishes the orbit classes. The evaluation is repeated over 10 runs; the accuracy of each run is reported together with the mean and standard deviation across runs.

In [17]:
mean, std = spu.evaluate_classifier_orbits(data, labels)

Run 1: Accuracy = 0.8753333333333333
Run 2: Accuracy = 0.884
Run 3: Accuracy = 0.8766666666666667
Run 4: Accuracy = 0.8746666666666667
Run 5: Accuracy = 0.886
Run 6: Accuracy = 0.8886666666666667
Run 7: Accuracy = 0.876
Run 8: Accuracy = 0.876
Run 9: Accuracy = 0.882
Run 10: Accuracy = 0.8866666666666667
Overall Mean Accuracy across 10 runs: 0.8805999999999999
Standard Deviation across 10 runs: 0.0051506202431249965
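
As a quick check on how the summary is derived, the sketch below recomputes the mean and standard deviation from the ten per-run accuracies printed above, using only the Python standard library. The reported spread matches the population standard deviation (dividing by $n$ rather than $n - 1$).

```python
import statistics

# Per-run accuracies copied from the output above.
accuracies = [
    0.8753333333333333, 0.884, 0.8766666666666667, 0.8746666666666667, 0.886,
    0.8886666666666667, 0.876, 0.876, 0.882, 0.8866666666666667,
]

mean = statistics.fmean(accuracies)
# pstdev is the population standard deviation (divides by n, not n - 1).
std = statistics.pstdev(accuracies)
print(f"{mean:.4f} +/- {std:.4f}")  # 0.8806 +/- 0.0052
```

Every accuracy is also a whole multiple of 1/1500, which is consistent with a test set of 1500 samples (30% of the 5000 observations), although the notebook does not state the split.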
