Skip to content

Commit

Permalink
Flag persistence generators (#29)
Browse files Browse the repository at this point in the history
* WIP first sketch (won't compile)

* [CPP] Fix compilation issue, refactor retrieving representative simplices

Currently it should not work

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Retrieve representative simplices from computation in CPP

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Disable sorting of barcodes to keep alignment with representative simplices

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Fix not enough memory pre-allocated for all dimensions

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [PY] Add draft of Python interface and Bindings support

for representative simplices

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Retrieve only "True" representative simplices

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Retrieve representative simplices with an indexing 1:1 with barcodes

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Refactor with clang-format

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Update size of representative simplices, to go from dim 1 to dim N

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Allow to sort barcodes and reproduce the same sort on rep simplices

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Add representative simplicies for dim 0

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [PY] Add output of representative simplices for dim 0 (finite and ess)

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [TEST] Add test for verify that all edges in rpsm are valid in the dm

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* Remove array conversions, enforce int type for indices, ensure array shapes are correct

* [PY] Change dtype to int64 in flag generators

* [CPP] Refactor get_youngest_edge_simplex function

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Refactor code related to representative simplicies

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Fix vertices order in dimension 0 for representative simplicies

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [PY/TEST] Add exception if user uses collapser + representatives.

test added to verify this behavior

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [DOC] Add draft for input and output documentation of representative

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [EXAMPLE] Add a section about representative simplices example

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* Fix birth simplices for finite 0-dimensional bars

* Actual fix for 0-dimensional persistence generator

* [CPP] Refactor @ulupo fix for representative in dim 0

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* Add @ctralie in copyright notice

* Add inline comment as suggested by @MonkeyBreaker

* [CPP] Update as suggested by @ulupo method link of class union_find

The link function now also returns the birth vertex and allows to simplify logic in the computation of 0-dim

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Revert latest commit

link method works on the vertex index and the rank of the vertices
but the logic to retrieve the correct birth (value_t) value requires comparison of birth values
that IMO should not be integrated in the link method. To be discussed later
But for the moment I prefer revert this change

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Attempt fix for link_and_get_birth_vertex

- Extract birth using dset.get_birth
- FIXME avoid calling find twice for u and v by using an "unsafe" version of link_and_get_birth_vertex called _link_and_get_birth_vertex

* Fix _link_and_get_birth_vertex

* Address else branch forgotten in e04203c

* [CPP] Refactor union_find related code

Rename link function as suggested by @ulupo to link_roots_and_get_birth_vertex
Add comments for documentation about shorcuts used compared to original Kruskal's algorithm
Ensure that link is always computed if u != v

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Add @ulupo update on comment for link_roots_and_get_birth_vertex

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [TEST] Add regression test for rpsm with non 0 values on diagonal

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Fix suggested by @ulupo for non 0 values on diagonal of dm

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [TEST] Update comment and add additional test in test_rpsm_non_0_diagonal_dim0

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* Rename ret_representative_simplices to return_generators and refactor docs and tests

* Update kwarg name in tests

* Change name to return_generators in bindings

* Mathematically intelligible solution to 0-dimensional generators computation

* Improve jupyter tutorial

* Use np.inf instead of np.infty

* Notebook refactor

* Mention of ripser.py in basic tutorial

* [CPP] Add inline comments in link_roots_and_get_birth_vertex

* [TEST] add to test_..._diagonal_dim0 check for the number of points

Fix that no value in the diagonal should be bigger than a non diagonal value

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [TEST] Add simple test for finite generate at dim > 0 suggested by @ulupo

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Rename representative by generator

Refactor slighly compute_dim_0

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* [CPP] Fix typos in two inline comments

* [CPP] fix small errors in inline comments

* [TEST] Update test for higher dimensions as suggested by @ulupo

If get_youngest_edge_simplex does not implement the correct logic it could per example return [4, 0, 5, 4]

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* Fix import in flag_persistence_generators.ipynb

* [PY] Improve test docstring

Co-authored-by: julian <julian.burellaperez@heig-vd.ch>
  • Loading branch information
ulupo and julian committed Nov 1, 2021
1 parent 0cb6a82 commit ec45de5
Show file tree
Hide file tree
Showing 6 changed files with 748 additions and 183 deletions.
213 changes: 98 additions & 115 deletions examples/basic_tutorial.ipynb

Large diffs are not rendered by default.

168 changes: 168 additions & 0 deletions examples/flag_persistence_generators.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,168 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a40d1d2e-c23e-4a89-9a0c-694ce37fb8f0",
"metadata": {},
"source": [
"# Tutorial: Flag persistence generators\n",
"\n",
"In this notebook, we show how to retrieve the vertices and edges responsible for the creation/destruction of persistent topological features in a Vietoris–Rips filtration. We use the same setup and kind of dataset as in the [basic tutorial](https://github.com/giotto-ai/giotto-ph/blob/main/examples/basic_tutorial.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c821577f-4928-4403-90c3-d4b6ef4c5440",
"metadata": {},
"outputs": [],
"source": [
"# Install missing dependencies\n",
"import sys\n",
"!{sys.executable} -m pip install giotto-tda"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e4bf17b-f6b4-444c-a344-38ff096c7f4a",
"metadata": {},
"outputs": [],
"source": [
"# Giotto-ph\n",
"from gph import ripser_parallel\n",
"\n",
"# Import utils\n",
"import numpy as np\n",
"from gtda.homology._utils import _postprocess_diagrams\n",
"\n",
"# To generate dataset\n",
"from sklearn import datasets\n",
"\n",
"# Plotting\n",
"from plotly import graph_objects as go\n",
"from gtda.plotting import plot_diagram, plot_point_cloud"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af416ee2-3e72-4e43-b919-78647c90e4f6",
"metadata": {},
"outputs": [],
"source": [
"# Make a noisy circle point cloud\n",
"data = datasets.make_circles(n_samples=100, noise=0.1, factor=0.7, random_state=42)[0]\n",
"\n",
"# Plot the point cloud\n",
"plot_point_cloud(data)"
]
},
{
"cell_type": "markdown",
"id": "37c9dda6-a36b-4615-a2ab-0f48d5149773",
"metadata": {},
"source": [
"The information on vertices and edges responsible for the creation/destruction of persistent topological features can be retrieved by passing `return_generators=True` to `ripser_parallel`. A new entry is added to the output dictionary, with key `\"gens\"` and corresponding value a tuple of length 4 organized schematically as follows:\n",
"\n",
" 0. vertices creating and edges destroying *finite* $0$-dimensional features;\n",
" 1. edges creating and destroying *finite* $d$-dimensional features, $d \\geq 1$;\n",
" 2. vertices creating *infinite* $0$-dimensional features;\n",
" 3. edges creating *infinite* $d$-dimensional features, $d \\geq 1$.\n",
"\n",
"(Vertices are encoded by their indices and edges as pairs of vertex indices.) In the case of entries 1 and 3 (higher dimensions), that information is organized by homology dimension. So, for example, calling `gens` this tuple (value in the dictionary):\n",
"\n",
" - `gens[1]` and `gens[3]` are lists containing `maxdim` 2D integer `numpy` arrays, while `gens[0]` and `gens[2]` are `numpy` arrays;\n",
" - the edges creating and destroying finite features in dimension 1 are stored in `gens_finite_1 = gens[1][0]`;\n",
" - `gens_finite_1` is a 2D integer `numpy` array with as many rows as there are finite features in dimension 1;\n",
" - The `i`th finite feature in the 1-dimensional barcode is created by edge `gens_finite_1[i, :2]` and destroyed by edge `gens_finite_1[i, 2:]`.\n",
"\n",
"This way of presenting persistence birth and death vertices/edges agrees with other persistent homology packages and in particular with [GUDHI](http://gudhi.gforge.inria.fr/)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3dd8319-d0dc-4cb1-98ce-b30b126cf62b",
"metadata": {},
"outputs": [],
"source": [
"# Compute the persistence information\n",
"persistence_info = ripser_parallel(data, return_generators=True)\n",
"gens = persistence_info['gens']"
]
},
{
"cell_type": "markdown",
"id": "a4f4cb5a-1db4-43dc-b3a6-90682cb1bfe6",
"metadata": {},
"source": [
"Let us visualize the 1-dimensional generators for our point cloud, labelling them by the positional index of the corresponding feature in the persistence diagram and using blue for creation and red for destruction:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b655265-23f2-4821-80f2-555db7970e35",
"metadata": {},
"outputs": [],
"source": [
"# Plot the point cloud\n",
"fig = plot_point_cloud(data)\n",
"\n",
"# In blue are the edges that create a finite persistent topological features in dimension 1.\n",
"# In red are the edges that destroy a finite persistent topological feature in dimension 1.\n",
"for i, edges in enumerate(gens[1][0]):\n",
" birth_edge = edges[0:2]\n",
" death_edge = edges[2:4]\n",
" x0_create, y0_create = data[birth_edge[0]]\n",
" x1_create, y1_create = data[birth_edge[1]]\n",
" x0_destroy, y0_destroy = data[death_edge[0]]\n",
" x1_destroy, y1_destroy = data[death_edge[1]] \n",
"\n",
" fig.add_shape(type='line',\n",
" x0=x0_create,\n",
" y0=y0_create,\n",
" x1=x1_create,\n",
" y1=y1_create,\n",
" line=dict(color='Blue'))\n",
" fig.add_shape(type='line',\n",
" x0=x0_destroy,\n",
" y0=y0_destroy,\n",
" x1=x1_destroy,\n",
" y1=y1_destroy,\n",
" line=dict(color='Red'))\n",
" fig.add_trace(\n",
" go.Scatter(x=[0.5 * (x0_create + x1_create) + 0.05, 0.5 * (x0_destroy + x1_destroy) + 0.05],\n",
" y=[0.5 * (y0_create + y1_create), 0.5 * (y0_destroy + y1_destroy)],\n",
" text=[str(i), str(i)],\n",
" mode=\"text\")\n",
" )\n",
"\n",
"fig.update_layout(showlegend=False)\n",
"fig.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
22 changes: 16 additions & 6 deletions gph/bindings/ripser_bindings.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -25,19 +25,26 @@ PYBIND11_MODULE(gph_ripser, m)
using namespace pybind11::literals;
m.doc() = "Ripser python interface";

py::class_<flagPersGen>(m, "flagPersGen", py::module_local())
.def_readwrite("finite_0", &flagPersGen::finite_0)
.def_readwrite("finite_higher", &flagPersGen::finite_higher)
.def_readwrite("essential_0", &flagPersGen::essential_0)
.def_readwrite("essential_higher", &flagPersGen::essential_higher);
// Because `ripser` could have two different modules after compilation
// It's necessary to add `py::module_local()` to prevent following issue:
// ImportError: generic_type: type "ripserResults" is already registered!
// When same python module imports gtda_ripser and gtda_ripser_coeff
py::class_<ripserResults>(m, "ripserResults", py::module_local())
.def_readwrite("births_and_deaths_by_dim",
&ripserResults::births_and_deaths_by_dim)
.def_readwrite("flag_persistence_generators_by_dim",
&ripserResults::flag_persistence_generators)
.def_readwrite("num_edges", &ripserResults::num_edges);

m.def(
"rips_dm",
[](py::array_t<value_t>& D, int N, int modulus, int dim_max,
float threshold, int num_threads) {
float threshold, int num_threads, bool return_generators) {
// Setup distance matrix and figure out threshold
auto D_ = static_cast<value_t*>(D.request().ptr);
std::vector<value_t> distances(D_, D_ + N);
Expand All @@ -57,20 +64,21 @@ PYBIND11_MODULE(gph_ripser, m)
ripserResults res;
ripser<compressed_lower_distance_matrix> r(std::move(dist), dim_max,
threshold, ratio,
modulus, num_threads);
modulus, num_threads,
return_generators);
r.compute_barcodes();
r.copy_results(res);
res.num_edges = num_edges;
return res;
},
"D"_a, "N"_a, "modulus"_a, "dim_max"_a, "threshold"_a, "num_threads"_a,
"ripser distance matrix");
"return_generators"_a, "ripser distance matrix");

m.def(
"rips_dm_sparse",
[](py::array_t<index_t>& I, py::array_t<index_t>& J,
py::array_t<value_t>& V, int NEdges, int N, int modulus, int dim_max,
float threshold, int num_threads) {
float threshold, int num_threads, bool return_generators) {
auto I_ = static_cast<index_t*>(I.request().ptr);
auto J_ = static_cast<index_t*>(J.request().ptr);
auto V_ = static_cast<value_t*>(V.request().ptr);
Expand All @@ -80,7 +88,8 @@ PYBIND11_MODULE(gph_ripser, m)
// Setup distance matrix and figure out threshold
ripser<sparse_distance_matrix> r(
sparse_distance_matrix(I_, J_, V_, NEdges, N, threshold),
dim_max, threshold, ratio, modulus, num_threads);
dim_max, threshold, ratio, modulus, num_threads,
return_generators);
r.compute_barcodes();
// Report the number of edges that were added
int num_edges = 0;
Expand All @@ -95,5 +104,6 @@ PYBIND11_MODULE(gph_ripser, m)
return res;
},
"I"_a, "J"_a, "V"_a, "NEdges"_a, "N"_a, "modulus"_a, "dim_max"_a,
"threshold"_a, "num_threads"_a, "ripser sparse distance matrix");
"threshold"_a, "num_threads"_a, "return_generators"_a,
"ripser sparse distance matrix");
}

0 comments on commit ec45de5

Please sign in to comment.