Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bindings for collapser #469

Merged
merged 27 commits into from Sep 8, 2020
Merged

Add bindings for collapser #469

merged 27 commits into from Sep 8, 2020

Conversation

reds-heig
Copy link
Collaborator

@reds-heig reds-heig commented Aug 28, 2020

Reference issues/PRs

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

Add bindings for collapser from Gudhi new feature.

Screenshots (if appropriate)

Any other comments?

Currently work in progress, there's a bug on the size of the matrix returns by the C++ in python.

Checklist

  • I have read the guidelines for contributing.
  • My code follows the code style of this project. I used flake8 to check my Python changes.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed. I used pytest to check this on Python tests.

Currently supports Sparse and Dense representation as input
returns a sparse matrix

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
@CLAassistant
Copy link

CLAassistant commented Aug 28, 2020

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ulupo
❌ MonkeyBreaker
You have signed the CLA already but the status is still pending? Let us recheck it.

@ulupo
Copy link
Collaborator

ulupo commented Aug 30, 2020

@MonkeyBreaker I have hacked your code to enable edge collapsing with the ripser python interface. I'm sure the C++ is totally stupid, I just hacked a solution which compiled and seems to give the right results.

You can try this in python:

import numpy as np
from gtda.externals import ripser

X = np.random.random((100, 100))
np.fill_diagonal(X, 0)  # Without this line, there might be different results because GUDHI assumes zeros in the diagonal

maxdim = 2

diags_collapsed = ripser(X, metric='precomputed', maxdim=maxdim, collapse_edges=True)['dgms']
diags_not_collapsed = ripser(X, metric='precomputed', maxdim=maxdim, collapse_edges=False)['dgms']

thresh = 0.1
diags_collapsed_thresh = ripser(X, metric='precomputed', maxdim=maxdim, thresh=thresh, collapse_edges=True)['dgms']
diags_not_collapsed_thresh = ripser(X, metric='precomputed', maxdim=maxdim, thresh=thresh, collapse_edges=False)['dgms']

for i in range(2):
    assert np.array_equal(diags_collapsed[i], diags_not_collapsed[i])
    assert np.array_equal(diags_collapsed_thresh[i], diags_not_collapsed_thresh[i])

For now, I do not see substantial speedups and in fact I only see slow-downs. It would be good to see if you can reproduce the performance gains reported at the end of the paper https://hal.inria.fr/hal-02395227/document.

Another point: The thresholding is done in python at the moment, but I guess ideally it would be done in C++. I think this means that we would have to call the edge collapse C++ function inside ripser.cpp, but I'm not sure. Notice that to reproduce the results from the paper you will need to use the thresh parameter.

@ulupo
Copy link
Collaborator

ulupo commented Aug 30, 2020

@MonkeyBreaker to test the bindings you had originally written (and which I "fixed") on the simple GUDHI example, you can do:

from gtda.externals.modules.gtda_collapser import flag_complex_collapse_edges_sparse, flag_complex_collapse_edges_dense
from scipy.sparse import coo_matrix

# Sparse case
X = coo_matrix(([1., 1., 1., 1., 2., 2.], ([0, 1, 2, 3, 0, 1], [1, 2, 3, 0, 2, 3])))
collapsed_edges = flag_complex_collapse_edges_sparse(X)
assert collapsed_edges[1, 3] == 0.

# Dense case
X = X.toarray()
collapsed_edges = flag_complex_collapse_edges_dense(X)
assert collapsed_edges[1, 3] == 0.

julian added 3 commits August 31, 2020 13:53
* All functions have the same name, supporting different input types
* All functions returns now tuple<vector, vector, vector>

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
@MonkeyBreaker
Copy link
Collaborator

@ulupo bindings updated and test added

  • Now the bindings support 3 different inputs:
    • Sparse matrix
    • Dense matrix
    • row, col, data vectors
  • Simple test added for each input type
  • As discussed, now each function returns a tuple of 3 vectors <row_idx, col_idx, data>, and it's easier to transform it in python directly if necessary.
    I'll now have a look on the performance.

julian and others added 5 commits September 1, 2020 13:50
Eigen constructor allows vector of vector, this clashes with one of our input
Therefore, we choose to use a different name for each input type

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
julian added 3 commits September 3, 2020 08:07
…mology

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
When it's an sparse matrix, it should be upper/lower triangular or
mirrored on the diagonal => i,j == j,i

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
@MonkeyBreaker MonkeyBreaker changed the title [WIP] Add bindings for collapser Add bindings for collapser Sep 3, 2020
@MonkeyBreaker
Copy link
Collaborator

@ulupo seems good to me, on my machine everything runs smoothly :)

I don't know if you're planning on adding more on this PR.
But I was thinking on maybe disabling the computation of cocycles is we don't return them at the moment, it will make computation faster.
Or maybe we plan to use them in the future ?

@ulupo
Copy link
Collaborator

ulupo commented Sep 8, 2020

@MonkeyBreaker I just added a regression test for #465. For me this would be enough to close the PR, but since you raise a good point...

But I was thinking on maybe disabling the computation of cocycles is we don't return them at the moment, it will make computation faster.

We don't really have a good plan for cocycles yet but I'd say this is more because of lack of dedicated research time than because we thought about it and decided they're not useful. Would there be a middle way? Such as producing two separate bindings, and deciding which one to call at runtime?

In any case, do you think it would be OK to do this in another PR to get less confused?

Notice that I have added checks and warnings for when any diagonal entry is non-zero (in either sparse or dense input). I ran some tests and I was getting inconsistent results when collapsing edges, so I think that the edge collapser can give incorrect results if run on input with non-zero diagonal entries. This is why I had to introduce extra checks and changed a little the way in which the lexsort flags are used. Welcome to double check!

@MonkeyBreaker
Copy link
Collaborator

Sure, for me if all test pass, we can let the PR as if and merge it :)

About the cocycles, the "issue" is not related to the compilation it's an input runtime parameter do_cocycles. by default it's enable => meaning that cocycles are computed, but in DRFDM and DRFDMSparse they are not even retrieved after calling ripser.

Which currently means that because by default the we enable the computation of cocycles we waste a bit of runtime on it. The change I was thinking about is until we decide what to do with the cocycles, update as follow:

res = DRFDM(DParam, maxdim, thresh, coeff, do_cocycles=0)
// and
res = DRFDMSparse(
             row.astype(dtype=np.int32, order="C"),
             ...
             coeff,
             do_cocycles=0
             )

When all test passes on the pipeline I'll check locally if it's good for you ?

@ulupo
Copy link
Collaborator

ulupo commented Sep 8, 2020

@MonkeyBreaker oh ok, it's really simple then, just put a zero when the functions are called inside ripser. We can slot this into the current PR then I think!

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
@ulupo ulupo merged commit 8dd14f8 into giotto-ai:master Sep 8, 2020
ulupo added a commit that referenced this pull request Oct 9, 2020
* Make path to giotto-tda logo absolute for display on PyPI (#415)

* Fix formatting in CONTRIBUTING.rst (#416)

* Fix typo in Mapper docs for n_sig_figs (#417)

* Adds FlagserPersistence (#339)

* Add pyflagser >= 0.4.0 to requirements, GH pages installation instructions, and README.rst

* Add FlagserPersistence

* Add tests for FlagserPersistence

* Add FlagserPersistence to doc index

* Add See Also entries in simplicial

* Linting in ripser_interface

* Update See also for Cubical

* Fix max_edge_length docs in other simplicial transformers

* Add tests to check max_edge and infinity_values

* Fix low infinity value bug and put dimension padding in _utils

* Add test for low infinity values

* Simplify code in homology/_utils.py

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Co-authored-by: Umberto <umberto.lupo@gmail.com>
Co-authored-by: Wojciech Reise <reisewojtus@gmail.com>

* Introduce `nightly_release` variable in azure-pipelines.yml (#421)

* Link Giotto-tda logo to raw.githubusercontent.com

* Add FlagserPersistence to the documentation (#425)

* Refactor GraphGeodesicDistance (#422)

* Refactor GraphGeodesicDistance

- Change return type to list for compatibility with homology transformers
- Use scipy's shortest_path instead of sklearn's graph_shortest_path
- Make clearer rules on the role of zero entries, infinity entries, and non-stored values
- Add parameters directed, unweighted, and method
- Support masked arrays
- Rewrite docstring

* Fix See Also in KNeighborsGraph and TransitionGraph

* Prevent using FW algorithms when some edges are zero

- Work around scipy/scipy#12424
- Introduce user warnings when algorithm cannot be chosen automatically or set to FW
- Fix test ground truth
- Improves array conversion in GraphGeodesicDistance.transform

* Add GraphGeodesicDistance unit tests on sparse/masked array input and more dtypes

* Fix Azure CI for nightly releases (#429)

* Add *.pyo and *.pyd to .gitignore (#430)

Extend list of ignored extensions to other Python bytecode-type files

* Modify check_point_clouds to allow for sparse input (#424)

* Modify check_point_cloud to allow for sparse input

- Allow for sparse input in VietorisRipsPersistence and FlagserPersistence when metric is precomputed.
- Fix docstrings to reflect the changes.
- Fix some typos and wording issues.

* Make docstrings more consistent across simplicial transformers

* Fix reference for FlagserPersistence

* Remove warning for different point cloud embedding dimensions

* Eliminate tests for different embedding dimension warnings

* Convert list input to 3D ndarray when possible

* Update my email address

* Accept sparse inputs even in non-precomputed case for VietorisRips and SparseRips

* Cover sparse cases and more metrics in simplicial unit tests

Co-authored-by: Umberto <u.lupo@l2f.ch>

* Update collaborator email addresses (#435)

* Update @lewtun's and @ulupo's emails in CODE_AUTHORS.rst

* Update @lewtun's and @ulupo's emails in GOVERNANCE.rst

* Refactor _subdiagrams to take all homology dimensions into account (#439)

* Fix mmap settings used by joblib.Parallel (#428)

* Fix mmap settings used by joblib.Parallel in HeatKernel and PersistenceImage

* Add tests

* Slice X first inside parallel calls to _subdiagrams

* Minor simplifications in plot_diagram

* Add @NickSale to CODE_AUTHORS.rst

Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>

* Fix axis ranges in `plot_diagram` (#437)

* Remove **input_layout kwargs, refactor layouts in plot_diagram

- Fix #409
- Fix calculation of the maximum filtration parameter which incorrectly included the homology dimensions
- Simplify code

* Transpose the output of PairwiseDistance to match scikit-learn convention (#420)

* Transpose output shape in PairwiseDistance.transform

* Update tests

* Simplify _parallel_pairwise

Co-authored-by: Umberto Lupo <46537483+ulupo@users.noreply.github.com>

* Replace pip install with python -m pip install in docs and Readme (#440)

* Remove incorrect assumptions in Filtering (#436)

* Remove `_sort` and refactor `filter` in gtda/diagrams/_utils.py

Arrays are no longer sorted by lifetime before filtering.

* Fix test for Filtering

* Improve documentation, begin addressing #233

* Small improvement to `_subdiagrams`

* Return figures instead of showing them (#441)

* Refactor of plotting API

- Make all functions in gtda/plotting return figures (or tuples of figures for betti_surfaces) instead of showing them
- `plot` class methods also return figures instead of showing them
- `transform_plot` and `fit_transform_plot` still show figures and only return transformed data
- Add `plot_params` kwarg throughout to allow user customisability of output figures (subtlety: one of the key can be either "trace" when the output figure only has one trace, or "traces" when it has several)

* Suppress user warnings on graph geodesic distance algorithms in tests

* Resolve overflow warnings in mapper filter tests

* Resolve numpy DeprecationWarning in test_validation

* [DOCS] Clarify expected diagram properties to fix #233 (#443)

* Allow plot_heatmap to take boolean arguments (#444)

* Add pullback cover set labels and partial cluster labels to Mapper hovertext, improve docs (#445)

* Add pullback set ID to mapper hovertext

* Add partial cluster label to Mapper hovertext

* Improve docs for plot_static_mapper_graph and plot_interactive_mapper_graph

* Fix tests after changes

* Slight wording improvement

* Further wording clarification

* Make wording on edges even clearer

* Refactor igraph.Graph output of Nerve, modify `fit` behaviour of Nerve, add `store_edge_elements` kwarg to Nerve and make_mapper_pipeline, add Nerve and ParallelClustering to docs (#447)

* Refactor igraph.Graph output, add Nerve and ParallelClustering to docs, add store_edge_elements kwarg to Nerve and make_mapper_pipeline

- Store node metadata not as a graph-level dictionary, but as vertex attributes accessible by graph.vs[attr_name][node_id] or graph.vs[node_id][attr_name] for attr_name in ["pullback_set_label", "partial_cluster_label", "node_elements"]
- Remove "node_id" from node attributes as it always coincided with the igraph.Graph node number anyway.
- Automatically store sizes of intersections as edge weights, accessible by graph.es["weight"].
- Add "store_intersections" kwarg to Nerve and make_mapper_pipeline to allow storing indices of node intersections as edge attributes, accessible via graph.es["edge_elements"].
- Simplify logic of Nerve.fit_transform code
- Change the attributes stored by Nerve.fit. Now the entire graph is stored as graph_ instead.
- Improve documentation of make_mapper_pipeline
- Expose ParallelClustering and Nerve in __init__ and docs.
- Adapt tests, mapper quickstart notebook, and mapper plotting functions.

* Add two tests for the behaviour of store_edge_elements and min_intersection

* Remove check for shape of `layout` in _calculate_graph_data

`layout` can only be a string or a callable, not an ndarray

* Improve test coverage of mapper visualization modules

* Create tests for plot_betti_surfaces and plot_betti_curves

* Add plotly_params to remaining plot methods in diagrams/representations, missed out in #441 

* Fix some linting and docstrings

* Minor improvements

* Avoid shadowing range function in plot_diagram

* Fix minimum birth bug in `plot_diagram` (#449)

* Fix bug introduced in 4bc90b2

np.max should have been np.min in plot_diagram for minimum birth and death calculation

* Clean/simplify plot_diagram further to be more ready for extended persistence and better behave under plotly HTML autoscaling

* Make `store_edge_elements` work with MapperPipeline.get_mapper_params and MapperPipeline.set_params, missed out in 4bc90b2

* Remove displayed SegmentLocal from gif hyperlink in classifying_shapes example notebook (#448)

* Remove displayed SegmentLocal from classifying_shapes example notebook

* Fix doc typos

* Pybind11 quick fix (#451)

* Fix pybind11 broken master

pybind11 checkout version v2.5.0, some issues are observed on master

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* Fix bug in use of homology_dimension_ix with samplings_, rename homology_dimension_ix (#452)

* Rename homology_dimension_ix to homology_dimension_idx, fix bug

* Add normalization of persistence entropy (#450)

* Add normalization of persistence entropy via `normalize` kwarg in gtda.diagrams.PersistenceEntropy

* Use entropy from scipy.stats in gtda.diagrams.PersistenceEntropy, gtda.time_series.PermutationEntropy and gtda.mapper.Entropy

* Rename _entropy hidden method in gtda.diagrams.PersistenceEntropy

* Add tests for normalize=True

* Add `fill_nan_value` kwarg to gtda.diagrams.PersistenceEntropy

- See #450 (comment)
- Adapt pipeline tests to use this kwarg

Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>

* Fix y-axis in HeatKernel plots, add default titles to plot methods (#453)

* Fix y-axis in HeatKernel.plot

* Add titles to figures generated via plot_heatmap in plot class methods

* PEP8 E133 improvements

* Refactor gtda/diagrams (#454)

* First attempt at fix of #438

* Fix y-axis in `PersistenceImage` plots, extending #453

* Represent multiplicity of persistence pairs in hovertext in `plot_diagram`

* Change reflect mode of `gaussian_filter` to "constant" from "reflect"

Affects `HeatKernel` and `PersistenceImage`

* Fix `PersistenceLandscape` plot method

- Only the figure corresponding to the first seen homology dimension was returned
- Output is now a figure with subplots, a main title, and one subtitle per plot (homology dimension)

* Improve tests for plot methods in gtda.diagrams.representations

Cover use of `plotly_params`

* Minor docstring linting

* Miscellaneous docstring improvements in gtda/diagrams

* Fix validation dictionary for `metric_params` in the case of `PersistenceImage`

* Change default value of `order` in Amplitude, from 2. to None (vector features)

* Change meaning of default None for `weight_function` in `PersistenceImage`

- ``None`` corresponding the identity means that there really is a non-trivial weighting in that case. Semantically, this does not seem correct ("None" should mean no weighting at all)

* Improve code style and clarity in plot methods in gtda.diagrams.representations

* Refactor gtda/diagrams/_metrics.py to fix several bugs

- Change computation of heat/persistence image distances and amplitudes to yield the continuum limit when `n_bins` tends to infinity.
- Make `sigma` in persistence-image-- and heat-kernel--based representations/distances/amplitudes measured in the same units as the filtration parameter (not in pixels), thus decoupling its effect from the effect of `n_bins`. Also change the default value in PairwiseDistance, Amplitude, HeatKernel and PersistenceImage from 1. to 0.1.
- Remove trivial points from diagrams before creating a sampled image in `heats` and `persistence_images`. This ensures in particular that the trivial points really give no contribution regardless of the weight function used for persistence images.
- Finish fixing #438 and similar issues with PairwiseDistance when the metric is persistence-image--based.
- Ensure `silhouettes` does not create NaNs when a subdiagram is trivial.
- Change default value of `power` in `silhouette_amplitudes`
 and `silhouette_distances` to 1., to agree with Amplitude and PairwiseDistance docs.
- Fix use of np.array_equal in `persistence_image_distances` and `heat_distances` -- so far the boolean check for equal inputs always failed due to the in-place modifications of `diagrams_1`.
- No longer store weights in `effective_metric_params_` attribute of PairwiseDistance and Amplitude when the metric is persistence-image--based.
- Remove _heat from gtda.diagrams._metrics.
- Remove identity from gtda.diagrams._utils and make default `weight_function` kwargs equal to np.ones_like everywhere to agree with default in `PersistenceImage`.
- Other style improvements (variable names, linting)

* Fix trace name when homology dimension is np.inf in `BettiCurve` and `Silhouette`

* Adapt `test_all_pts_the_same` to new behaviour of `heats_` in gtda.diagrams._metrics

* Improve test coverage of `Amplitude` and `PairwiseDistance`

- Make sure silhouettes and persistence images are covered throughout
- Cover `order` parameter throughout

* Add test of zero `weight_function` for `PersistenceImage`

* Make behaviour of `Scaler.fit` when the metric is persistence image the same as `Amplitude`

Accordingly add more combinations of metrics and metric_params in tests for Scaler

* Delete never-used `_matrix_wrapper` and `_arrays_wrapper` functions

* Remove `_pad` from gtda.diagrams._utils as it is never used

* Make `copy=True` in calls to check_diagrams in Scaler.transform and Scaler.inverse_transform

* Make `homology_dimensions_` attributes tuples instead of lists, with integers when possible

* Improve code style

* Hard-code zero array outputs by `heats` and `persistence_images` when step sizes are zero

* Add `homology_dimensions` kwarg to `_bin`

Achieves beautification of self._samplings and self.samplings_ (ints shown instead of floats) in several transformers (and saves some computation)

* Adapt choices of `min_values`, `max_values` and `sigma` in hypothesis-based tests

New meaning of sigma led to overflow issues in existing tests.

* Make all homology dimensions equal in `test_hk_big_sigma`

Also extend this test to `PersistenceImage`, and rename it accordingly

* Cover use of `plotly_params` kwarg in diagram preprocessing classes plot methods

* Extract some common logic from plot methods in gtda.diagrams.representations

* Silence expected warnings from image transformers in test_common

* Implement @wreise's suggestion to abstract away sorting and integer conversion of fit hom dims

* Add tests for `Amplitude` and `PairwiseDistance` to check that a zero weight function yields identically zero amplitudes/distances.

* Refactor `_subdiagrams` to throw informative errors on unfulfilled input properties

* Bump package dependencies to latest available in conda (#457)

* Bump dependencies to latest available in conda (or PyPI if conda unavailable)

* [CI] Bump pip version in .azure-ci/docker_scripts.sh

* Plot infinite deaths in plot_diagrams (#461)

* Plot infinite deaths in plot_diagrams

- Dynamically establish a certain value greater than the maximum observed death as the y-coordinate of infinity
- Plot points with infinite death as having this death value
- Plot a horizontal dashed line to show the position of infinity

* Add first draft of FlagserPersistence notebook

* Fix errors with use of homology_dimensions kwarg in FlagserPersistence

- Implementation was incorrect in cases where minimum in homology_dimensions is greater than 1
- Implementation would crash whenever `flagser_weighted` does not return a list of size self._max_homology_dimension + 1 - self._min_homology_dimension, as can happen with small graphs

* Fix #92

Document removal of one infinite bar in simplicial transformers

* Polish up classifying_shapes notebook

* Remove null cell from vietoris_rips_quickstart

* Upload clique complex filtration images

* Crop image

* Crop and colour images

* New versions of clique complex

* New notebook version

* Test a small version of a png

* Update notebook and pngs

* Remove plotly code

* Add SVGs for directed flag complex illustration

* Correct one SVG

* Add color fill

* Fix color fill

* Reorient one triangle

* Finish up notebook

* Remove some PNGs

* Delete legacy image

* More "correct" name for a PNG

* Reflect name change on notebook

* LaTeX formatting

* Add reference to ripser.py tutorials

* Fix handling of COO matrices by ripser mirroring scikit-tda/ripser.py#104 (#465)

* Update gudhi bindings to latest changes on August 2020 (#468)

* Pull latest changes on Gudhi-devel:giotto

* Add Eigen as a submodule

Eigen is now required for latest Gudhi version

* Add Eigen for gudhi modules that require it

* Update get_filtration and get_skeleton with the new Gudhi API

* Update ccache in CI

* Fix type issue for get_persistence in simplex_tree

* Fix persistence in periodic cubical complex

* Fix simplex tree

* Fix cubical complex

* Beautify setup.py

* Move towards fixing #307 in azure_pipelines.yml (still not addressing docker_scripts.sh)

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>

* Change DISTNAME back form "giotto-tda" to 'giotto-tda' in setup.py (#470)

Avoids uploading nightly distributions accidentally to PyPI from CI as happened on 28.08.2020

* Change title as suggested by @lewtun, fix GitHub capitalization in all notebooks

* Unify postprocessing of diagrams between simplicial and cubical, remove one H0 inf bar in cubical

* Fix imports, reinstate reshape

* Fix slicing and tests

* Document removal of infinite bar in 0D

* Improve readability

* Place _replace_infinity_values and _pad_diagram in scope of _postprocess_diagrams

* Forfeit use of n_jobs in _postprocess_diagrams

Favour use of purely in-place operations by a logic replacing _pad_diagram

* Linting

* Remove unused joblib imports from homology/_utils.py

* Ignore SparseRipsPersistence in code cells as suggested by @lewtun

* LaTeXify indices as suggested by @lewtun

* Fix typos

* Fix typo

* Convert pycairo note from inline comments to markdown text

* Implement some of @lewtun's suggestions for classifying shapes notebook

* Migrate general persistence diagram discussion to vr notebook, create "General API" section

In partial fulfillment of @lewtun's review suggestions

* Fix VR notebook and add explanation of homology_dimensions parameter

* Make PyPI nightly uploads more robust to changes in quote types (#471)

* Make PyPI nightly uploads more robust to changes in quote types

Look for both single and double quotes when modifying DISTNAME in setup.py

* Reinstate double quotes in setup.py

* Change sed -i to sed -i.bak in macOS Azure jobs (#472)

* Adds WeakAlphaPersistence (#464)

* Add WeakAlphaPersistence (gtda.homology.simplicial)

* Change relative order of simplicial transformers

* Add tests of correctness of VietorisRipsPersistence and WeakAlphaPersistence (circle PH)

* Adapt to sklearn 0.23 API in test_common

Silences warnings

* Improve docstrings in gtda.homogy (type of X in fit and transform, and type of Xt in plot)

* Avoid dtype search and always make a float array

* Adapt WeakAlphaPersistence to changes

* Some style consistency

* Fix small bug with plot_diagrams

Trivial points were not excluded from computation of plot ranges

* Bump pyflagser version in requirements

* Improve docs for VietorisRipsPersistence by explaining how precomputed input is handled

* Improve writing following @lewtun's comments, fix hyperlinks in preparation for merge

* Add a regression test for FlagserPersistence

* Introduce a `reduced_homology` kwarg to toggle or not removal of infinite H0 bar

* Change default value of nan_fill_value in PersistenceEntropy to -1 (#474)

* Change default value of nan_fill_value in PersistenceEntropy to -1.

* Remove false URL in README (#476)

* Fix read-only error analogous to #427 in Amplitude and PairwiseDistance (#481)

* Improve naming of parallelism tests for PersistenceImage and HeatKernel

* Extend #428 to Amplitude and PairwiseDistance, add regression test, improve edge case handling in _metrics.py

* Cover n_jobs > 1 in test_features_representations.py

* Fix formatting issues in notebooks (#482)

* Add persistent homology of graphs notebook to docs

* Add Vietoris-Rips gif to examples/images and use it instead of Medium links notebooks

* Change single backticks for monospace to double backticks, recommended for Markdown

* Fix a parenthesis in classifying_shapes.ipynb

* Add bindings for collapser, refactor ripser_interface (#469)

* Add bindings for collapser

* Add collapse_edges kwarg to ripser to enable edge collapser

* Add tests for the collapser bindings

* Add tests for ripser with edge collapse

* Add regression test for #465

* Refactor ripser_interface to simplify code, improve lexsort logic, and improve detection of sparse matrix shape

* Disable computation of cocycles by default

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>

* Enable edge collapse in VietorisRipsPersistence (#483)

* Enable edge collapse in VietorisRipsPersistence

Also cover coefficient different from 2 in tests of collapser

* Cover collapse_edges kwarg in simple VietorisRipsPersistence tests

* Rename TakensEmbedding to SingleTakensEmbedding, introduce new TakensEmbeding transformer on collections and a function for time-delay embedding parameter optimization, replace "width" parameter with "size" in Labeller and SlidingWindow, fix bug with Labeller.resampler when n_steps_future >= size - 1 (#460)

* Extract time delay embedding logic into a _time_delay_embedding function in the new file gtda.time_series._utils

* Rename TakensEmbedding to SingleTakensEmbedding

* Extract and expose code for finding optimal Takens embedding parameters, as a new takens_embedding_optimal_parameters function

* Simplify logic of resample method of SingleTakensEmbedding and SlidingWindow

* Improve tests for Stationarizer

* Fix bug in Labeller.resample when n_steps_future >= width, add test to cover this scenario

* Allow zero width (meaning 1 point) in SlidingWindow and Labeller

* Add simple test for invalid width in Labeller

* Faster implementation of _slice_windows in SlidingWindow and of SlidingWindow.transform

* Make SlidingWindow.slice_windows public, add test

* Rename "width" parameter of Labeller to "size", addressing #94

* Rename "width" parameter of SlidingWindow to "size", addressing #94 

* Create new TakensEmbedding function to deal with collections of univariate or multivariate time series

* Support list of time series input in gtda.time_series.embedding by means of a new check_collection function in gtda.utils.validation

* Fix typos in glossary.tex

* Remove plot method from SlidingWindow and place it in new TakensEmbedding

* Refactor Lorenz attractor notebook following API changes involving TakensEmbedding and SlidingWindow

* Use check_collection in CubicalPersistence

* TransformerResamplerMixin notebook, make Labeller.transform return 1D arrays (#475)

* Add notebook on TransformerResamplerMixins and time series forecasting with sliding windows + topology

* Change return shape or Labeller.transform to 1D

* Fix Labeller docstring following change in #475

* Modify `PlotterMixin.transform_plot` to give a dictionary to the call to `plot` (#484)

* Fix sample numbers in automatically-generated plot titles when using `transform_plot`

* Add test for Takens to use fit_transform_plot

* Remove papermill version from azure-pipelines

Co-authored-by: wreise <wojciech.reise@epfl.ch>

* removed check_collection from docs

* Fix references in takens and simplicial

* Change todo in weak alpha filtration reference to the section name, remove empty line from release

* Add `mode` kwarg to KNeighborsGraph, refactor input-output formats in gtda.graphs, turn outputs of KNeighborsGraph and TransitionGraph into asymmetric adjacency matrices (#478)

* Add mode kwarg to KNeighborsGraph, allow for list input in gtda.graphs, return lists instead of ragged arrays

* Turn output of TransitionGraph into a directed 0-1 adjacency matrix

* Docstring improvements in gtda.graphs

* Add check_collection again

* Reference ripser [1]

* Move the references before the full stops

* Replace missing glossry entries with TODOs, following Umberto's comment

* Docstring improvements following @gtauzin's review

* Tiny wording fix

* Improve docstring following @gtauzin's comment

* Add missing references

* Flag weak_alpha_complex as missing entry

* Reference in the documentation

* Add a target for testing notebooks in the docs

* Make html does not fail on missing versions

* Capitalize "Euclidean" in glossary.tex

* Remove spaces after references, check that publication names are in italic

* Add check_collection to gtda/utils/__init__.py (#491)

* Add check_collection to gtda/utils/__init__.py

* Improve paths in some tests

* Add option to contract Mapper vertices when pairs are in a subset relation, change layout in interactive mode (#456)

* Add contract_nodes kwarg to Nerve

- When True, igraph.Graph's contract_vertices is employed together with the limit flow by a mapping with fixed points to execute the contraction
- Add a test for this option

* Filter out nodes from same pullback set when checking intersections, to increase performance

* Ensure that setting min_intersection < 1 does not lead to fully connected graphs

* Remove unused variable name in ParallelClustering

* Make nerve parameters interactive in plot_interactive_mapper_graph, refactor get_widgets_per_params, change layouts

* Improve method_to_transform code and docstring

* Small improvements to transformer_from_callable_on_rows

* Linting throughout gtda.mapper

* Remove effective_metric_params_ from Eccentricity

* Remove unused import and sort the remaining ones

* Provide an explicit name for the substituition

* Copy images from examples to notebooks

* Remove the use of imgonverter, change the logo back to svg and add supported image types list

* Name for substitution did not work

* Wording improvements in persistent_homology_graphs.ipynb

* Typo fix in notebook

* Fix variable name error in GraphGeodesicDistance.transform

* Extend Padder and Inverter to greyscale images (#489)

* Extend Padder to greyscale images

* Extend Inverter to greyscale images

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>

Co-authored-by: wreise <wojciech.reise@epfl.ch>

* Add notebook for topological time series classification to examples (#458)

* Add notebook for topological time series classification example

* Add Takens embedding GIF to images directory

* Add gravitational wave dataset

Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>

* Fix links to time series classification notebook by @lewtun

* Move datasets.py and gravitational-wave-signals.npy to new data subfolder, and rename them

* Fix errors in check_point_clouds docstrings

* Try easier paths in See also

Test for @wreise

* Create captions across all notebooks

* Small wording improvement in persistent_homology_graphs.ipynb

* Change the path (sub)package in the index

* Implement double backticks consistently

* Simplify some captions

* Slightly reduce size of plots by plot_point_cloud

* Fix some errors in make_mapper_pipeline docs

* Fix typo in Nerve docs

* Change citation style for consistency

* Remove new lines in array printouts in docs

* Make citation style more consistent throughout

* Reorder notebooks, modularize Makefile

* Reformat the references

* More link and backtick fixes in mapper_quickstart

* More RST fixes in notebooks

* Fix linting

* Fix an error in WeakAlphaPersistence Notes

* Fix URLs for GUDHI

* Fix TakensEmbedding docstrings

* Linting

* Try changing some SVG attributes for better display

* Further SVG improvements

* Update CODE_AUTHORS.rst (#497)

* More Mapper tests, small change to `max_fraction` in FirstSimpleGap and FirstHistogramGap (#412)

* Add first tests for plot_interactive_mapper_graph

* Change deprecated 'overflow_y' to 'overflow' property in_logging, and remove unnecessary warning catching

* Avoid name clashes with Python built-ins throughout mapper/tests

* Slightly change meaning of `max_fraction` in FirstSimpleGap and FirstHistogramGap

Make the default 1. instead of None, and give it a simpler interpretation: (the floor of) max_fraction * n_samples is the maximum number of clusters the algorithm can return

* Implement a looser criterion for the hk_pi_big_sigma test

Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>

* Fix some warnings seen in tests (#498)

- `continuous_updates` is deprecated from ToggleButton
- `np.stack` should not take generators
- should explicitly pass dtype=object for ragged arrays

* Add the giotto-tda icon in the browser tab (#500) (favicon)

* Add NumberOfPoints (#496)

* Add NumberOfPoints

* Make See alsos more consistent throughout diagrams.features

Sined-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Co-authored-by: wreise <wojciech.reise@epfl.ch>
Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>

* Fix and improve validate_params (#502)

* Fix validate_params bug and improve behaviour when both numeric and list-like types are allowed

* Add a test

* Fix mistake in validate_params docs

* Pybind11 as a git submodule (#459)

* Add MODULE keyword in CMakeLists.txt to explicitly show expectation

* Update collapser to latest standard in the CMake file

* Add pybind11 as a submodule

* Remove downloading pybind11 from setup.py

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>

* Fix metric_params documentation in KNeighborsGraph

* Improve ripser bindings performance (#501)

* Update ripser & collapser bindings to allow pass by reference with numpy

* Update making the distance matrix triangular more memory friendly

* Remove unnecessary dict inherited from ripser.py cython

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>

* Add MNIST image classification example notebook (#477)

* Add MNIST image classification/full blown ML example notebook

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>

* Add ComplexPolynomial (#479)

* Add ComplexPolynomial

Signed-off-by: Guillaume Tauzin <guillaume.tauzin@inait.ai>
Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>
Co-authored-by: wreise <wojciech.reise@epfl.ch>

* Fix precomputed behaviour in KNeighborsGraph (#506)

* Fix precomputed behaviour in KNeighborsGraph

- Avoid warnings when metric is passed as 'precomputed'
- Fix errors in docstrings and improve wording
- Improve tests

* Fix #499 by covering case of list input in CubicalPersistence.fit (#503)

* Improve memory use in ripser_interface (#507)

Implement suggestions in #501 (comment)
- Use scipy's squareform function for fast extraction of the upper diagonal part of dm

* Add DensityFiltration (#473)

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>

* Add metaestimators subpackage with CollectionTransformer meta-estimator (#495)

* Add gtda/metaestimators with CollectionTransformer meta-estimator 

* Fix docstring for make_mapper_pipeline

* Cchange parallel_backend_prefer default to None in ParallelClustering and make_mapper_pipeline

* Cross-reference two time series notebooks with See also

* Improve time series classification notebook by making use of CollectionTransformer

* Temp fix for macOS CI failures by preventing automatic brew cleanup (#513)

Also make mapper tests more lenient to reduce number of flaky tests

Co-authored-by: ulupo <umberto.lupo@googlemail.com>

* Add a curves subpackage and a StandardFeatures transformer to extract features from them (#480)

* Add curves subpackage with StandardFeatures

* Reshape output of PersistenceLandscape so that it's a multi channel curve

* Add curves to main init and to rst docs

* Add Feature extraction subtitle in curves.rst

* Split simplicial homology into undirected and directed subsections

* Fix typo in validate_params

* Raise deadlines for some mapper tests

* Remove notebook tests even in macOS jobs

Signed-off-by: Guillaume Tauzin <guillaume.tauzin@inait.ai>
Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>

* Add curves.Derivative (#492)

* Add curves.Derivative

* Fix curves.StandardFeatures docs

Signed-off-by: Guillaume Tauzin <guillaume.tauzin@inait.ai>

Co-authored-by: Umberto Lupo <umberto.lupo@gmail.com>
Co-authored-by: wreise <wojciech.reise@epfl.ch>

* Raise test coverage and make ParallelClustering clusterer parameter required (#508)

* Improve test coverage of mapper visualization tools

* Make pandas part of test requirements in setup.py

* Add pandas installation in Linux Azure jobs

* Avoid notebooks checks even in macOS unless notebooks_checks is true

* Make clusterer a required parameter in ParallelClustering, add ParallelClustering tests

* Change kind of error for bad transformers in CollectionTransformer

* Improve check_diagrams and its tests

* Use validate_params in plot_point_cloud

* Cover plotting of diagrams with infinite deaths in simplicial tests

* Prepare for v0.3.0 release (#516)

* Bump version to 0.3.0

* Add release notes

* Minor corrections in notebooks

Co-authored-by: wreise <wojciech.reise@epfl.ch>

Co-authored-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Co-authored-by: Wojciech Reise <reisewojtus@gmail.com>
Co-authored-by: Wojciech Reise <32167173+wreise@users.noreply.github.com>
Co-authored-by: Umberto <u.lupo@l2f.ch>
Co-authored-by: Nicholas Sale <nicholas.j.sale@gmail.com>
Co-authored-by: MonkeyBreaker <julian.burellaperez@heig-vd.ch>
Co-authored-by: REDS institute <reds-heig@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: wreise <wojciech.reise@epfl.ch>
Co-authored-by: ulupo <umberto.lupo@googlemail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants