Skip to content

MRC-Mapper/G-Mapper

Repository files navigation

G-Mapper: Learning a Cover in the Mapper Construction

This project implements an algorithm that we develop in our work on G-Mapper: Learning a Cover in the Mapper Construction.

Abstract

The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. However, the Mapper algorithm requires tuning several parameters in order to generate a ``nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on G-means clustering which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model to carefully choose the cover according to the distribution of the given data. Experiments for synthetic and real-world datasets demonstrate that our algorithm generates covers so that the Mapper graphs retain the essence of the datasets, while also running significantly faster than a previous iterative method.

Mapper Construction

In 2007, G. Singh, F. Mémoli, and G. E. Carlsson introduced a network-based visualization technique called Mapper. The algorithm takes as input a point cloud dataset and produces as output a graph reflecting the structure of the underlying data. To apply the Mapper algorithm, the user needs to determine the following parameters that include choosing a lens (or filter) function $f:X \to Y$ from a high-dimensional point cloud $X$ to a lower-dimensional space $Y$, an (open) cover of the target space $Y$, and a clustering algorithm for cover elements. Constructing a Mapper graph consists of the following procedure.

  1. Lens Function
  2. Cover
  3. Clustering Pre-images
  4. Mapper Graph

Mapper Construction

To implement the Mapper construction algorithm, we employ the code from Mapper Interactive developed by Y. Zhou, N. Chalapathi, A. Rathore, Y. Zhao, and B. Wang in 2021 (https://github.com/MapperInteractive/MapperInteractive).

Optimizing these parameters is essential for generating a "nice" Mapper graph. We concentrate on tuning a cover given by a collection of overlapping intervals. While the traditional Mapper algorithm takes a uniform cover with the number of intervals and the same overlapping percent between consecutive intervals to be specified by the user, sophisticated methods have recently been applied to optimize the cover of Mapper.

G-Mapper

We propose a new Mapper construction algorithm called G-Mapper for optimizing a cover of the Mapper graph based on G-means clustering. The G-means clustering algorithm aims to learn the number $k$ of clusters in the $k$-means clustering algorithm according to a statistical test, called the Anderson–Darling test, for the hypothesis that the points in a cluster follow a Gaussian distribution. Our algorithm splits a cover element iteratively, with our splitting decision determined by the Anderson–Darling score. For further elaboration, we split each cover element into two overlapping intervals by employing a Gaussian mixture model (GMM) so that the splits are made according to the characteristics of the cover element rather than yielding uniform intervals. This procedure allows us to take variance into account when forming cover elements, making our algorithm perform well without the initialization of a cover.


G-Mapper

Parameters for G-Mapper are listed:

  • AD_threshold: the critical value corresponding to the significance level $\alpha$ for the statistical test.
  • g_overlap: the amount of overlap when an interval is split in two.
  • (Search) method: 'BFS', 'DFS', and 'randomized'.

Our G-Mapper is inspired by an algorithm called Multipass AIC/BIC. In 2021, motivated by the X-means algorithm for estimating the number of clusters in $k$-means according to the Bayesian Information Criterion (BIC), N. Chalapathi, Y. Zhou, and B. Wang devised the algorithm which repeatedly splits intervals of a coarse cover according to information criteria (https://github.com/tdavislab/mapper-xmean-cover).

Reference

@article{doi:10.1137/24M1641312,
author = {Alvarado, Enrique and Belton, Robin and Fischer, Emily and Lee, Kang-Ju and Palande, Sourabh and Percival, Sarah and Purvine, Emilie},
title = {G-Mapper: Learning a Cover in the Mapper Construction},
journal = {SIAM Journal on Mathematics of Data Science},
volume = {7},
number = {2},
pages = {572-596},
year = {2025},
doi = {10.1137/24M1641312},
URL = {https://epubs.siam.org/doi/abs/10.1137/24M1641312}}
  • (Journal Version) E. Alvarado, R. Belton, E. Fischer, K. J. Lee, S. Palande, S. Percival, and E. Purvine. (2025). G-mapper: Learning a cover in the mapper construction. SIAM Journal on Mathematics of Data Science, 7(2), 572-596.
  • (ArXiv Version) E. Alvarado, R. Belton, E. Fischer, K. J. Lee, S. Palande, S. Percival, and E. Purvine. (2023). G-Mapper: Learning a Cover in the Mapper Construction. arXiv preprint arXiv:2309.06634.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors