# Can the spectral gap of a filtration classify topological spaces? 

In this notebook I assess whether the smallest nonzero eigenvalues of persistent Laplacians, computed along a filtration of point-cloud data, can be used to classify the underlying space.
The goal is to distinguish point clouds sampled from $S^2$ from point clouds sampled from a 'Swiss roll', which is homeomorphic to the unit square $I^2$.

1. For each of the two spaces, 50 datasets with 50 noisy points in each are generated. 
2. A filtration is generated from the data using Gudhi's implementation of Alpha complexes.
3. 20 uniformly spaced indices $I$ in the filtration are selected. For each of the 210 pairs $(i, j) \in I^2$ with $i \le j$ (so that the $i$-th complex is contained in the $j$-th), the smallest nonzero eigenvalue $\lambda^q_{i,j}$ of the persistent Laplacian is computed in each dimension $q$.
4. A logistic regression model is trained on the non-persistent eigenvalues (the diagonal pairs) in dimension 2: $(\lambda^2_{i,i})_{i \in I}$. This is the _non-persistent_ model.
5. A logistic regression model is trained on all eigenvalues in dimension 1: $(\lambda^1_{i,j})_{(i,j) \in I^2,\, i \le j}$. This is the _persistent_ model.
6. A paired t-test over the cross-validation folds is run to assess whether the accuracy of the persistent model differs from that of the non-persistent model.
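Step 3 is the core of the pipeline. Below is a minimal NumPy/SciPy sketch of the persistent $q$-Laplacian of a pair $K \subseteq L$ under the standard definition: the down part is $(\partial_q^K)^\top \partial_q^K$, and the up part uses only those $(q+1)$-chains of $L$ whose boundary lies in $C_q(K)$. The sketch finds that subspace as a null space; `use_stepwise_schur=True` presumably refers to a Schur-complement formulation, which should agree in exact arithmetic, but this is not the `persistent_laplacians` implementation. The helper names `simplices_of_dim`, `boundary_matrix`, `persistent_laplacian` and `spectral_summary` are hypothetical. The number of zero eigenvalues equals the persistent Betti number $\beta_q^{i,j}$.

```python
import numpy as np
from scipy.linalg import null_space


def simplices_of_dim(cx, q):
    """Sorted q-simplices (vertex tuples of length q + 1) of a complex given as a set."""
    return sorted(s for s in cx if len(s) == q + 1)


def boundary_matrix(faces, simplices):
    """Signed boundary matrix: columns are simplices, rows are their faces.

    Deleting the vertex at position k of a sorted simplex gives a face with sign (-1)**k.
    """
    row = {f: r for r, f in enumerate(faces)}
    D = np.zeros((len(faces), len(simplices)))
    for c, s in enumerate(simplices):
        for k in range(len(s)):
            D[row[s[:k] + s[k + 1:]], c] = (-1) ** k
    return D


def persistent_laplacian(K, L, q):
    """Persistent q-Laplacian of a pair K ⊆ L, as a matrix on the q-chains of K."""
    Kq = simplices_of_dim(K, q)
    n = len(Kq)
    # Down part: (∂_q^K)^T ∂_q^K, computed entirely inside K.
    down = np.zeros((n, n))
    if q > 0:
        Dk = boundary_matrix(simplices_of_dim(K, q - 1), Kq)
        down = Dk.T @ Dk
    # Up part: only (q+1)-chains of L whose boundary lies in C_q(K) contribute.
    up = np.zeros((n, n))
    Lq, Lq1 = simplices_of_dim(L, q), simplices_of_dim(L, q + 1)
    if Lq1:
        DL = boundary_matrix(Lq, Lq1)
        in_K = set(Kq)
        inside = [r for r, s in enumerate(Lq) if s in in_K]  # same order as Kq
        outside = [r for r, s in enumerate(Lq) if s not in in_K]
        # Orthonormal basis of the chains whose boundary has no component outside K.
        Z = null_space(DL[outside, :]) if outside else np.eye(len(Lq1))
        B = DL[inside, :] @ Z
        up = B @ B.T
    return up + down


def spectral_summary(M, zero_tol=1e-6):
    """Return (smallest nonzero eigenvalue or None, number of zero eigenvalues)."""
    eigs = np.linalg.eigvalsh(M)  # ascending order
    nonzero = eigs[eigs > zero_tol]
    lam = float(nonzero[0]) if nonzero.size else None
    return lam, int(np.sum(np.abs(eigs) <= zero_tol))


# K: the 4-cycle 0-1-2-3-0.  L: K plus the diagonal edge 02 and triangles 012, 023.
K = {(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3), (0, 3)}
L = K | {(0, 2), (0, 1, 2), (0, 2, 3)}
print(spectral_summary(persistent_laplacian(K, K, 1)))  # non-persistent pair (K, K)
print(spectral_summary(persistent_laplacian(K, L, 1)))  # persistent pair (K, L)
```

Neither triangle alone has its boundary inside $K$ (both touch edge 02), but their sum does, which is why a null space is needed. For $(K, K)$ the smallest nonzero eigenvalue is 2 with one zero eigenvalue (the 4-cycle); for $(K, L)$ it is still 2, but the zero eigenvalue disappears because the cycle bounds in $L$.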

## Generate data

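```python
import tadasets

# 50 noisy samples of the 2-sphere of radius 2, labelled 0
spheres = [(tadasets.dsphere(n=50, d=2, r=2, noise=0.1, seed=i), 0) for i in range(50)]
# 50 noisy samples of a Swiss roll, labelled 1
swiss_rolls = [(tadasets.swiss_roll(n=50, r=2, noise=0.1, seed=i), 1) for i in range(50)]
dataset = spheres + swiss_rolls
```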

## Cross-validation scaffolding

```python
from data_analysis.cross_validation import run_cross_validation
```
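`run_cross_validation` comes from the project's own `data_analysis` package and is not shown. The following is a hypothetical stand-in, `run_cross_validation_sketch`, written with scikit-learn. The use of `StratifiedKFold` with shuffling and a default `LogisticRegression(max_iter=1000)` are assumptions, not the package's documented behavior. The sketch illustrates two points the analysis relies on: features are computed per point cloud without labels, and a fixed `random_state` reproduces the same folds for both models, which is what makes the fold accuracies pairable.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def run_cross_validation_sketch(dataset, feature_extractor, classifier=None,
                                n_splits=5, random_state=42):
    """Hypothetical stand-in for run_cross_validation.

    dataset is a list of (point_cloud, label) pairs; returns one accuracy per fold.
    """
    # Features depend only on each point cloud, never on labels, so extracting
    # them once before splitting does not leak test information.
    X = np.vstack([feature_extractor(points) for points, _ in dataset])
    y = np.array([label for _, label in dataset])
    # The same random_state gives the same folds on every call.
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    accuracies = []
    for train_idx, test_idx in folds.split(X, y):
        # Fresh, unfitted model per fold.
        model = clone(classifier) if classifier is not None else LogisticRegression(max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        accuracies.append(float(model.score(X[test_idx], y[test_idx])))
    return accuracies


# Toy usage on synthetic, well-separated "point clouds" (not the notebook's data).
rng = np.random.default_rng(0)
toy = ([(rng.normal(0, 1, size=(10, 2)), 0) for _ in range(20)]
       + [(rng.normal(5, 1, size=(10, 2)), 1) for _ in range(20)])
print(run_cross_validation_sketch(toy, lambda pts: pts.mean(axis=0)))
```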

## Non-persistent smallest nonzero eigenvalue

```python
from persistent_laplacians.eigenvalues import compute_eigenvalues
import numpy as np
def extract_nonpersistent_feature(data):
    result = compute_eigenvalues(
        data,
        num_indices=20,
        use_scipy=True,
        use_stepwise_schur=True,
        zero_tol=1e-6
    )
    # result[q] maps index pairs (i, j) to eigenvalues in dimension q.
    # Filter result to nonpersistent dim 2 features (diagonal pairs i == j)
    nonpersistent_dim2 = [
        (k[0], v)
        for k, v in result[2].items()
        if k[0] == k[1]
    ]
    nonpersistent_dim2.sort(key=lambda x: x[0])
    # Return first element of each or zero if missing
    return np.array([vec[0] if vec else 0 for _, vec in nonpersistent_dim2])
```

```python
accuracies_nonpersistent = run_cross_validation(
    dataset=dataset,
    feature_extractor=extract_nonpersistent_feature,
    n_splits=5,
    random_state=42
)

mean_acc = np.mean(accuracies_nonpersistent)
std_acc = np.std(accuracies_nonpersistent)
print(f"Cross-validated accuracies: {accuracies_nonpersistent}")
print(f"Mean accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```

Cross-validated accuracies: [1.0, 1.0, 1.0, 1.0, 1.0]
Mean accuracy: 1.000 ± 0.000


## Persistent smallest nonzero eigenvalue

```python
from persistent_laplacians.eigenvalues import compute_eigenvalues
import numpy as np
def extract_persistent_feature(data):
    result = compute_eigenvalues(
        data,
        num_indices=20,
        use_scipy=True,
        use_stepwise_schur=True,
        zero_tol=1e-6
    )
    # Keep every (i, j) pair in dimension 1, ordered lexicographically by (i, j)
    # so that feature positions line up across point clouds.
    dim1_result = sorted(result[1].items(), key=lambda item: item[0])
    # Return first element of each or zero if missing
    return np.array([vec[0] if vec else 0 for _, vec in dim1_result])
```
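The persistent feature vector is much longer than the non-persistent one. The sketch below counts the features, assuming the 20 indices are distinct and every point cloud yields the same set of $(i, j)$ keys. The `filtration_length` value is hypothetical, and spacing the indices with `np.linspace` is only one plausible reading of "uniformly spaced", not necessarily what `compute_eigenvalues` does.

```python
import numpy as np

# Hypothetical filtration length; in the notebook it depends on each Alpha complex.
filtration_length = 1000
num_indices = 20

# One plausible reading of "uniformly spaced indices": evenly spaced positions
# in the filtration, rounded to the nearest integer.
indices = np.linspace(0, filtration_length - 1, num_indices).round().astype(int).tolist()

# A persistent Laplacian needs K_i ⊆ K_j, so only pairs with i <= j are kept.
# Building them this way already yields lexicographic (i, j) order.
pairs = [(i, j) for a, i in enumerate(indices) for j in indices[a:]]
diagonal = [(i, j) for i, j in pairs if i == j]

print(len(diagonal), len(pairs))  # 20 210
```

So the non-persistent model sees 20 features and the persistent model sees $20 \cdot 21 / 2 = 210$. With 80 training point clouds per fold, the persistent model has more coefficients than training samples, so the logistic regression's regularization likely matters more there.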

```python
import numpy as np
accuracies_persistent = run_cross_validation(
    dataset=dataset,
    feature_extractor=extract_persistent_feature,
    classifier=None,
    n_splits=5,
    random_state=42
)

mean_acc = np.mean(accuracies_persistent)
std_acc = np.std(accuracies_persistent)
print(f"Cross-validated accuracies: {accuracies_persistent}")
print(f"Mean accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```



## Paired t-test

```python
from scipy.stats import ttest_rel

# Paired t-test
t_stat, p_val = ttest_rel(accuracies_persistent, accuracies_nonpersistent)
print(f"paired t-test p = {p_val:.3f}")
```

paired t-test p = 0.374
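To make the test concrete, here is a hand-written version of what `ttest_rel` computes on per-fold accuracies: the mean of the fold-wise differences divided by its standard error, compared against a t distribution with $n - 1$ degrees of freedom (4 for five folds). The function name `paired_t_test` and the fold accuracies are hypothetical illustrations, not results from this notebook.

```python
import numpy as np
from scipy import stats


def paired_t_test(a, b):
    """Two-sided paired t-test on per-fold scores, written out by hand."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    # t = mean difference / standard error of the mean difference (sample std, ddof=1)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    # Two-sided p-value from a t distribution with n - 1 degrees of freedom
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    return t, p


# Hypothetical fold accuracies for illustration only.
model_a = [0.90, 0.95, 1.00, 0.85, 0.95]
model_b = [1.00, 1.00, 1.00, 1.00, 1.00]
print(paired_t_test(model_a, model_b))
print(stats.ttest_rel(model_a, model_b))
```

If every fold difference is zero (for instance, both models scoring 1.0 on every fold), the standard error is zero and the statistic is undefined; SciPy typically reports `nan` in that case.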
