<a href="https://colab.research.google.com/github/fbeilstein/topological_data_analysis/blob/master/lecture_13_reeb_graph_and_mapper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[KeplerMapper Applications](https://kepler-mapper.scikit-tda.org/en/latest/applications.html)

[An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists](https://arxiv.org/pdf/1710.04019)

[An Introduction to Topological Data Analysis for Physicists: From LGM to FRBs](https://arxiv.org/pdf/1904.11044)

## Configure the Mapper pipeline

Given a dataset ${\cal D}$ of points $x \in \mathbb{R}^n$, the basic steps behind Mapper are as follows:

1. Map ${\cal D}$ to a lower-dimensional space using a **filter function** $f: \mathbb{R}^n \to \mathbb{R}^m$. Common choices for the filter function include projection onto one or more axes via PCA or density-based methods. In ``giotto-tda``, you can import a variety of filter functions as follows:

```python
from gtda.mapper.filter import FilterFunctionName
```

2. Construct a cover of the filter values ${\cal U} = (U_i)_{i\in I}$, typically in the form of a set of overlapping intervals which have constant length. As with the filter, a choice of cover can be imported as follows:

```python
from gtda.mapper.cover import CoverName
```

3. For each interval $U_i \in {\cal U}$, cluster the points in the preimage $f^{-1}(U_i)$ into sets $C_{i,1}, \ldots , C_{i,k_i}$. The clustering algorithm can be any of ``scikit-learn``'s [clustering methods](https://scikit-learn.org/stable/modules/clustering.html) or the agglomerative clustering implemented in ``giotto-tda``:

```python
# scikit-learn method
from sklearn.cluster import ClusteringAlgorithm
# giotto-tda method
from gtda.mapper.cluster import FirstSimpleGap
```

4. Construct the topological graph whose vertices are the cluster sets $(C_{i,j})_{i\in I, j \in \{1,\ldots,k_i\}}$, with an edge between two vertices whenever the corresponding clusters share at least one point: $C_{i,j} \cap C_{k,l} \neq \emptyset$. This step is handled automatically by ``giotto-tda``.
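
To make the four steps concrete, here is a minimal pure-Python sketch of them on a toy circle. It is not how ``giotto-tda`` implements Mapper: the helper names (``interval_cover``, ``cluster``, ``mapper_graph``), the way the overlap widens each interval, and the distance-threshold clustering with parameter ``eps`` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def interval_cover(lo, hi, n_intervals, overlap_frac):
    """Step 2: equal-length intervals over [lo, hi], widened so neighbours overlap."""
    base = (hi - lo) / n_intervals
    pad = 0.5 * overlap_frac * base  # illustrative convention, not giotto-tda's
    return [(lo + i * base - pad, lo + (i + 1) * base + pad)
            for i in range(n_intervals)]


def cluster(indices, points, eps):
    """Step 3: connected components of the graph joining points at distance <= eps."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_func, n_intervals, overlap_frac, eps):
    values = [filter_func(p) for p in points]                    # step 1: filter
    cover = interval_cover(min(values), max(values), n_intervals, overlap_frac)
    nodes = []
    for a, b in cover:                                           # step 3: cluster preimages
        preimage = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(cluster(preimage, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]                             # step 4: shared points
    return nodes, edges


# 60 evenly spaced points on the unit circle, filtered by their x-coordinate
circle = [(math.cos(2 * math.pi * k / 60), math.sin(2 * math.pi * k / 60))
          for k in range(60)]
nodes, edges = mapper_graph(circle, filter_func=lambda p: p[0],
                            n_intervals=4, overlap_frac=0.3, eps=0.2)
print(len(nodes), len(edges))
```

On this toy circle the sketch yields six nodes joined by six edges into a single loop, which reflects the one hole of the circle. The two middle intervals each contribute two nodes because their preimages split into an upper and a lower arc.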

These four steps are implemented in the ``MapperPipeline`` object, which mimics the ``Pipeline`` class from ``scikit-learn``. The convenience function ``make_mapper_pipeline`` lets you pass the choice of filter function, cover, and clustering algorithm as arguments. For example, to project our data onto the $x$- and $y$-axes, we could set up the pipeline as follows:

In [None]:
!pip install "numpy<1.26" --force-reinstall
!pip install giotto-tda

In [1]:
# Data wrangling
import numpy as np
import pandas as pd  # Not a requirement of giotto-tda, but is compatible with the gtda.mapper module

# Data viz
from gtda.plotting import plot_point_cloud

# TDA magic
from gtda.mapper import (
    CubicalCover,
    make_mapper_pipeline,
    Projection,
    plot_static_mapper_graph,
    plot_interactive_mapper_graph,
    MapperInteractivePlotter
)

# ML tools
from sklearn import datasets
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

In [2]:
data, _ = datasets.make_circles(n_samples=5000, noise=0.05, factor=0.3, random_state=42)

plot_point_cloud(data)


In [3]:
# Define filter function – can be any scikit-learn transformer
filter_func = Projection(columns=[0, 1])
# Define cover
cover = CubicalCover(n_intervals=10, overlap_frac=0.3)
# Choose clustering algorithm – default is DBSCAN
clusterer = DBSCAN()

# Configure parallelism of clustering step
n_jobs = 1

# Initialise pipeline
pipe = make_mapper_pipeline(
    filter_func=filter_func,
    cover=cover,
    clusterer=clusterer,
    verbose=False,
    n_jobs=n_jobs,
)
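
With a two-dimensional filter, ``CubicalCover`` covers each filter coordinate with its own set of overlapping intervals and takes their products, so ``n_intervals=10`` gives 10 × 10 = 100 overlapping boxes. The sketch below illustrates the idea in plain Python. The helper names and the exact way ``overlap_frac`` widens each interval are assumptions, so the endpoints will not match those computed by ``giotto-tda``.

```python
from itertools import product


def interval_cover(lo, hi, n_intervals, overlap_frac):
    """Equal-length intervals over [lo, hi], widened so neighbours overlap (assumed convention)."""
    base = (hi - lo) / n_intervals
    pad = 0.5 * overlap_frac * base
    return [(lo + i * base - pad, lo + (i + 1) * base + pad)
            for i in range(n_intervals)]


# Two filter coordinates, each ranging roughly over [-1, 1] for the circles data
per_axis = [interval_cover(-1.0, 1.0, n_intervals=10, overlap_frac=0.3)
            for _ in range(2)]

# Each box is a pair ((x_lo, x_hi), (y_lo, y_hi))
boxes = list(product(*per_axis))


def boxes_containing(point):
    """Return every box of the cover that contains the given 2D filter value."""
    return [box for box in boxes
            if all(lo <= c <= hi for c, (lo, hi) in zip(point, box))]


print(len(boxes))                          # number of cover sets
print(len(boxes_containing((0.0, 0.0))))   # a value in an overlap region
print(len(boxes_containing((0.1, 0.1))))   # a value inside a single box
```

A filter value that falls in an overlap region belongs to several boxes, so its point is clustered once per box. Those repeated memberships are what produce edges in the Mapper graph.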


In [4]:
fig = plot_static_mapper_graph(pipe, data)
fig.show(config={'scrollZoom': True})


In [None]:
!pip install giotto-tda[plotting] trimesh

In [15]:
import trimesh
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from gtda.mapper import (
    CubicalCover,
    make_mapper_pipeline,
    Projection, Eccentricity,
    plot_static_mapper_graph,
    plot_interactive_mapper_graph,
    MapperInteractivePlotter
)

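# Load the Stanford bunny and sample 1000 points from its surface
mesh = trimesh.load_mesh('https://graphics.stanford.edu/pub/3Dscanrep/bunny.tar.gz', process=True, allow_remote=True)
points = mesh.sample(1000)

# Filter by eccentricity, cover the filter values with 10 overlapping intervals,
# and cluster each preimage with DBSCAN (eps is in the mesh's coordinate units)
filter_func = Eccentricity()
cover = CubicalCover(n_intervals=10, overlap_frac=0.3)
clusterer = DBSCAN(eps=0.1, min_samples=5)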

mapper_pipeline = make_mapper_pipeline(
    filter_func=filter_func,
    cover=cover,
    clusterer=clusterer,
    verbose=False
)


fig = plot_static_mapper_graph(mapper_pipeline, points)
fig.show(config={'scrollZoom': True})
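
The ``Eccentricity`` filter used above assigns each point a single number that measures how far it lies from the rest of the point cloud. Here is a minimal sketch, assuming the eccentricity of a point is the $p$-norm of its Euclidean distances to all points. The function name, the ``exponent`` argument as written here, and the absence of any normalisation are assumptions for illustration and may differ from the library's behaviour.

```python
import math


def eccentricity(points, exponent=2):
    """p-norm of each point's row of pairwise Euclidean distances."""
    result = []
    for p in points:
        dists = [math.dist(p, q) for q in points]
        result.append(sum(d ** exponent for d in dists) ** (1 / exponent))
    return result


# A centre point surrounded by four points at unit distance
toy_points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
ecc = eccentricity(toy_points)
print(ecc)
```

The central point gets the smallest value and the four outer points share a larger one, so a filter of this kind orders points from the core of the shape outwards to its extremities.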

In [None]:
!pip install pythreejs

In [19]:
import trimesh
from pythreejs import *
from IPython.display import display

# Reload the bunny mesh and display it in trimesh's interactive 3D viewer
mesh = trimesh.load_mesh('https://graphics.stanford.edu/pub/3Dscanrep/bunny.tar.gz', process=True, allow_remote=True)
scene = mesh.scene()
scene.show()