This project implements an algorithm that we develop in our work on G-Mapper: Learning a Cover in the Mapper Construction.
The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. However, the Mapper algorithm requires tuning several parameters in order to generate a ``nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on G-means clustering which searches for the optimal number of clusters in
In 2007, G. Singh, F. Mémoli, and G. E. Carlsson introduced a network-based visualization technique called Mapper. The algorithm takes as input a point cloud dataset and produces as output a graph reflecting the structure of the underlying data. To apply the Mapper algorithm, the user needs to determine the following parameters that include choosing a lens (or filter) function
- Lens Function
- Cover
- Clustering Pre-images
- Mapper Graph
To implement the Mapper construction algorithm, we employ the code from Mapper Interactive developed by Y. Zhou, N. Chalapathi, A. Rathore, Y. Zhao, and B. Wang in 2021 (https://github.com/MapperInteractive/MapperInteractive).
Optimizing these parameters is essential for generating a "nice" Mapper graph. We concentrate on tuning a cover given by a collection of overlapping intervals. While the traditional Mapper algorithm takes a uniform cover with the number of intervals and the same overlapping percent between consecutive intervals to be specified by the user, sophisticated methods have recently been applied to optimize the cover of Mapper.
We propose a new Mapper construction algorithm called G-Mapper for optimizing a cover of the Mapper graph based on G-means clustering. The G-means clustering algorithm aims to learn the number
Parameters for G-Mapper are listed:
- AD_threshold: the critical value corresponding to the significance level
$\alpha$ for the statistical test. - g_overlap: the amount of overlap when an interval is split in two.
- (Search) method: 'BFS', 'DFS', and 'randomized'.
Our G-Mapper is inspired by an algorithm called Multipass AIC/BIC. In 2021, motivated by the X-means algorithm for estimating the number of clusters in
@article{doi:10.1137/24M1641312,
author = {Alvarado, Enrique and Belton, Robin and Fischer, Emily and Lee, Kang-Ju and Palande, Sourabh and Percival, Sarah and Purvine, Emilie},
title = {G-Mapper: Learning a Cover in the Mapper Construction},
journal = {SIAM Journal on Mathematics of Data Science},
volume = {7},
number = {2},
pages = {572-596},
year = {2025},
doi = {10.1137/24M1641312},
URL = {https://epubs.siam.org/doi/abs/10.1137/24M1641312}}
- (Journal Version) E. Alvarado, R. Belton, E. Fischer, K. J. Lee, S. Palande, S. Percival, and E. Purvine. (2025). G-mapper: Learning a cover in the mapper construction. SIAM Journal on Mathematics of Data Science, 7(2), 572-596.
- (ArXiv Version) E. Alvarado, R. Belton, E. Fischer, K. J. Lee, S. Palande, S. Percival, and E. Purvine. (2023). G-Mapper: Learning a Cover in the Mapper Construction. arXiv preprint arXiv:2309.06634.

