Merged
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -4,6 +4,7 @@
**/*.egg-info
**/.ipynb_checkpoints
**/*.log
**/docs/build

.coverage
.vscode
34 changes: 34 additions & 0 deletions .readthedocs.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"
    # You can also specify other tool versions:
    # nodejs: "19"
    # rust: "1.64"
    # golang: "1.19"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/source/conf.py

# Optionally build your docs in additional formats such as PDF and ePub
# formats:
# - pdf
# - epub

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: docs/requirements.txt
    - method: pip
      path: .
125 changes: 68 additions & 57 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,52 +1,85 @@
# tda-mapper-python
# tda-mapper

![test](https://github.com/lucasimi/tda-mapper-python/actions/workflows/test.yml/badge.svg) [![codecov](https://codecov.io/github/lucasimi/tda-mapper-python/graph/badge.svg?token=FWSD8JUG6R)](https://codecov.io/github/lucasimi/tda-mapper-python)
![test](https://github.com/lucasimi/tda-mapper-python/actions/workflows/test.yml/badge.svg)
[![codecov](https://codecov.io/github/lucasimi/tda-mapper-python/graph/badge.svg?token=FWSD8JUG6R)](https://codecov.io/github/lucasimi/tda-mapper-python)
[![docs](https://readthedocs.org/projects/tda-mapper/badge/?version=latest)](https://tda-mapper.readthedocs.io/en/latest/?badge=latest)

In recent years, an ever-growing interest in **Topological Data Analysis** (TDA) has emerged in the field of data science. The core idea of TDA is to gain insights from data by using topological methods that are proven to be reliable with respect to noise, and that behave nicely with respect to dimension. This Python package provides an implementation of the **Mapper Algorithm**, a well-known tool from TDA.

The Mapper Algorithm takes any dataset $X$ and returns a *shape-summary* in the form of a graph $G$, called the **Mapper Graph**. It's possible to prove, under reasonable conditions, that $X$ and $G$ share the same number of connected components.

## Basics
For an in-depth description of Mapper please read [the original paper](https://research.math.osu.edu/tgda/mapperPBG.pdf).

* Installation from package: TBD
* Installation from sources: clone this repo and run ```python -m pip install .```
* Documentation: https://tda-mapper.readthedocs.io/en/latest/


## Usage

[In this file](https://github.com/lucasimi/tda-mapper-python/raw/main/tests/example.py) you can find a worked-out example that shows how to use this package. We perform some analysis on the well-known dataset of [handwritten digits](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html), consisting of fewer than 2000 8x8 pictures represented as arrays of 64 elements.

```python
import numpy as np

from sklearn.datasets import load_digits
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA

from tdamapper.core import MapperAlgorithm
from tdamapper.cover import CubicalCover
from tdamapper.clustering import PermissiveClustering
from tdamapper.plot import MapperPlot

# We load a labelled dataset
X, y = load_digits(return_X_y=True)
# We compute the lens values
lens = PCA(2).fit_transform(X)

mapper_algo = MapperAlgorithm(
    cover=CubicalCover(
        n_intervals=10,
        overlap_frac=0.65),
    # We prevent clustering failures
    clustering=PermissiveClustering(
        clustering=AgglomerativeClustering(10),
        verbose=False),
    n_jobs=1)
mapper_graph = mapper_algo.fit_transform(X, lens)

mapper_plot = MapperPlot(X, mapper_graph,
    # We color according to digit values
    colors=y,
    # Jet colormap, used for classes
    cmap='jet',
    # We aggregate on graph nodes according to mean
    agg=np.nanmean,
    dim=2,
    iterations=400)
fig_mean = mapper_plot.plot(title='digit (mean)', width=600, height=600)
fig_mean.show(config={'scrollZoom': True})

Let $f$ be any chosen *lens*, i.e. a continuous map $f \colon X \to Y$, where $Y$ is any parameter space (*typically* low dimensional). In order to build the Mapper Graph, follow these steps:

1. Build an *open cover* for $f(X)$, i.e. a collection of *open sets* whose union contains the whole image $f(X)$.

2. Run clustering on the preimage of each open set. All these local clusters together make a *refined open cover* for $X$.

3. Build the mapper graph $G$ by taking a node for each local cluster, and by drawing an edge between two nodes whenever their corresponding local clusters intersect.
```
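
The three steps (cover, local clustering, graph) can be illustrated with a minimal pure-Python sketch. This is not the implementation used by this package: the helper names `interval_cover`, `cluster` and `mapper_graph` are hypothetical, the cover is a plain 1-dimensional interval cover, and the clustering is a simple distance-threshold grouping chosen only to keep the example dependency-free. The data is an X-shaped point cloud with the height function as lens.

```python
import math
from itertools import combinations


def interval_cover(values, n_intervals, overlap):
    """Step 1: cover the range of `values` with overlapping open intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_intervals
    pad = overlap * step / 2  # each interval is widened by `pad` on both sides
    return [(lo + i * step - pad, lo + (i + 1) * step + pad)
            for i in range(n_intervals)]


def cluster(points, ids, eps):
    """Step 2: group `ids` into connected components, linking points at distance <= eps."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(ids, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper_graph(points, lens, n_intervals, overlap, eps):
    """Step 3: one node per local cluster, one edge per pair of intersecting clusters."""
    nodes = []
    for a, b in interval_cover(lens, n_intervals, overlap):
        preimage = [i for i, v in enumerate(lens) if a < v < b]
        nodes.extend(cluster(points, preimage, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


# X-shaped point cloud in the plane; the lens is the height (y coordinate).
arm_a = [(t, t) for t in range(-10, 11)]
arm_b = [(-t, t) for t in range(-10, 11)]
points = arm_a + arm_b
lens = [y for _, y in points]

nodes, edges = mapper_graph(points, lens, n_intervals=3, overlap=0.3, eps=2.0)
print(len(nodes), "nodes,", len(edges), "edges")
```

With these parameters the lower and upper intervals each split into two clusters (one per arm), while the middle interval yields a single cluster around the crossing point, so the resulting graph has the same X shape as the data.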

To get an idea, in the following picture we have $X$ as an X-shaped point cloud in $\mathbb{R}^2$, with $f$ being the *height function*, i.e. the projection on the $y$-axis. In the leftmost part we cover the projection of $X$ with three open sets. Every open set is represented with a different color. Then we take the preimage of these sets, cluster them, and finally build the graph according to intersections.
![Mapper Graph of the digits dataset, colored according to mean value](https://github.com/lucasimi/tda-mapper-python/raw/main/resources/digits_mean.png)

![Steps](resources/mapper.png)
It's also possible to obtain a new plot colored according to different values, while keeping the same computed geometry. For example, if we want to visualize how much dispersion we have on each cluster, we could plot colors according to the standard deviation:

The choice of the lens has the greatest influence on the shape of the Mapper Graph. Some common choices are *statistics*, *projections*, *entropy*, *density*, *eccentricity*, and so forth. Specific domain knowledge about the data at hand can also help in picking a good lens. For an in-depth description of Mapper please read [the original paper](https://research.math.osu.edu/tgda/mapperPBG.pdf).

## Installation
```python
# We reuse the graph plot with the same positions
fig_std = mapper_plot.with_colors(
    colors=y,
    # Viridis colormap, used for ranges
    cmap='viridis',
    # We aggregate on graph nodes according to std
    agg=np.nanstd,
).plot(title='digit (std)', width=600, height=600)
fig_std.show(config={'scrollZoom': True})

Clone this repo, and install via `pip` from your local directory
```
python -m pip install .
```
Alternatively, you can use `pip` to install directly from GitHub
```
pip install git+https://github.com/lucasimi/tda-mapper-python.git
```
If you want to install the version from a specific branch, for example `develop`, you can run
```
pip install git+https://github.com/lucasimi/tda-mapper-python.git@develop
```

## A worked out example

[In this file](tests/example.py) you can find a worked-out example that shows how to use this package.
We perform some analysis on the well-known dataset of [handwritten digits](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html), consisting of fewer than 2000 8x8 pictures represented as arrays of 64 elements.

![The mapper graph of the digits dataset, colored according to mean value](resources/digits_mean.png)

It's also possible to obtain a new plot colored according to different values, while keeping the same computed geometry. For example, if we want to visualize how much dispersion we have on each cluster, we could plot colors according to the standard deviation:

![The mapper graph of the digits dataset, colored according to std](resources/digits_std.png)
![Mapper Graph of the digits dataset, colored according to std](https://github.com/lucasimi/tda-mapper-python/raw/main/resources/digits_std.png)

The mapper graph of the digits dataset shows a few interesting patterns. For example, we can make the following observations:

@@ -55,25 +88,3 @@ The mapper graph of the digits dataset shows a few interesting patterns. For exa
* Some clusters are not well separated and tend to overlap one another. This mixed behavior is present in those digits which can be easily confused with one another, for example digits 5 and 6.

* Clusters located across the "boundary" of two different digits show a transition either due to a change in distribution or due to distortions in the handwritten text, for example digits 8 and 2.


### Development - Supported Features

- [x] Topology
- [x] custom lenses
- [x] custom metrics

- [x] Cover algorithms:
- [x] `CubicalCover`
- [x] `BallCover`
- [x] `KnnCover`

- [x] Clustering algorithms
- [x] `sklearn.cluster`-compatible algorithms
- [x] `TrivialClustering` to skip clustering
- [x] `CoverClustering` for clustering induced by cover

- [x] Plot
- [x] 2d interactive plot
- [x] 3d interactive plot
- [ ] HTML embeddable plot
20 changes: 20 additions & 0 deletions docs/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35 changes: 35 additions & 0 deletions docs/make.bat
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
1 change: 1 addition & 0 deletions docs/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
sphinx_rtd_theme
27 changes: 27 additions & 0 deletions docs/source/conf.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'tda-mapper'
copyright = '2024, Luca Simi'
author = 'Luca Simi'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ['sphinx.ext.autodoc', 'sphinx_rtd_theme']

templates_path = ['_templates']
exclude_patterns = []



# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']
20 changes: 20 additions & 0 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
.. tda-mapper documentation master file, created by
   sphinx-quickstart on Fri Jan 26 21:56:08 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to tda-mapper's documentation!
======================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   modules

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
7 changes: 7 additions & 0 deletions docs/source/modules.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
API Reference
=============

.. toctree::
   :maxdepth: 4

   tdamapper
31 changes: 31 additions & 0 deletions docs/source/tdamapper.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
tdamapper.core Mapper Algorithm
-------------------------------

.. automodule:: tdamapper.core
   :members:
   :undoc-members:
   :show-inheritance:

tdamapper.cover Cover Algorithms
--------------------------------

.. automodule:: tdamapper.cover
   :members:
   :undoc-members:
   :show-inheritance:

tdamapper.clustering Clustering Algorithms
------------------------------------------

.. automodule:: tdamapper.clustering
   :members:
   :undoc-members:
   :show-inheritance:

tdamapper.plot Mapper Plot
--------------------------

.. automodule:: tdamapper.plot
   :members:
   :undoc-members:
   :show-inheritance:
53 changes: 53 additions & 0 deletions docs/source/tdamapper.utils.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
tdamapper.utils package
=======================

Submodules
----------

tdamapper.utils.heap module
---------------------------

.. automodule:: tdamapper.utils.heap
   :members:
   :undoc-members:
   :show-inheritance:

tdamapper.utils.quickselect module
----------------------------------

.. automodule:: tdamapper.utils.quickselect
   :members:
   :undoc-members:
   :show-inheritance:

tdamapper.utils.unionfind module
--------------------------------

.. automodule:: tdamapper.utils.unionfind
   :members:
   :undoc-members:
   :show-inheritance:

tdamapper.utils.vptree module
-----------------------------

.. automodule:: tdamapper.utils.vptree
   :members:
   :undoc-members:
   :show-inheritance:

tdamapper.utils.vptree\_flat module
-----------------------------------

.. automodule:: tdamapper.utils.vptree_flat
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: tdamapper.utils
   :members:
   :undoc-members:
   :show-inheritance: