Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Persistent Homology Computation Backend for VR #1

Merged
merged 74 commits into from
Jul 2, 2021

Conversation

MonkeyBreaker
Copy link
Collaborator

@MonkeyBreaker MonkeyBreaker commented May 31, 2021

This PR add the first version of the HPC backend for compute PH on VR filtration.
This backend is based on RIpser and Ripser Lock-free version.

The Python Interface is the one used in giotto-tda, which is based on Ripser.py Python interface.

Remaining tasks:

  • : Expose num_threads parameter in Python
  • : ripser.cpp -> ripser.h, move the function to the bindings
  • : Move pybind11 to ext folder
  • : Move rips_dm and rips_dm_sparse to the bindings, change prototype to directly support numpy arrays (like collapser)
  • : Remove the hash implementation from ripser.h, put it in a folder with common code
  • : Move sorting algorithm from ripser.h, put it in a folder with common code
  • : Update Documentation
    • : Add documentation for hash_map current limitations (maximal size, no negative
      index, up to (2^64)-2 supported simplices, does not support 0 index I hack around by increasing all indexes in the hash_map by +1)
    • : Add to the building instruction for developers that a C++14 needs to be supported by the compiler.
  • : Update Changelog
  • : Update version
  • : Test ? (of course !)
    • : Added test for varying the number of threads and keep same number of threads
  • : Naming of the classes, Python, etc.
    • : As suggested by Matteo, rename ripser into parallelRipser, is it fine for everyone ?
  • : Add support to modules > 2
  • : Add check in Python for values in bounds with np.float32 (use .astype(np.float32))
    • Add note in the documentation that the input data should be in range of float32.
  • : Remove enclosing radius from the C++
    • : Perform enclosing radius before calling collapser when input data is a distance matrix
  • : Add Umberto Feedback in Python interface as discussed (21.06.21)
    • : Remove everything related to greedy permutation
    • : Rename num_threads to n_threads to follow giotto-tda naming convention
    • : Only keep at the output dgms
    • : Refactor weights related stuff in ripser
    • : Use new names for functions calling bindings
  • : Add support to n_threads -1 value in order to use the maximal number of threads available.

julian added 30 commits May 31, 2021 13:46
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
* Fix windows compilation
* Fix hash_map when indices have a value of (0, 0)
* Remove optimization for max dimension, we return the number of dimensions as input
* Essential pairs prevent adding one with wrong value

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
* Disable Thresh optimization due to unexpected results with negative edges

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
* Fix a test sorting an unexpected value

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Currently experience segmentation faults ...

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
* Uniformed chunk size computation
* Use index_pivot instead of pivot (primitive type vs class)

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
…ilable

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
* remove wrong fix in compute_pairs
* Only sort and create hashmap if at least one column is present
* compressed_distance_matrix from class to struct

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
julian added 4 commits June 29, 2021 22:01
… function ...

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
@ulupo
Copy link
Collaborator

ulupo commented Jun 30, 2021

In terms of adapting the giotto-tda tests to make tests for the weights here, you could start by adapting this (there are probably small syntax errors as I have not tested it)

X_pc = np.array([[2., 2.47942554],
                 [2.47942554, 2.84147098],
                 [2.98935825, 2.79848711],
                 [2.79848711, 2.41211849],
                 [2.41211849, 1.92484888]])

X_dist = squareform(pdist(X_pc))

X_pc_sparse = csr_matrix(X_pc)
X_dist_sparse = csr_matrix(X_dist)

X_dist_disconnected = np.array([[0, np.inf], [np.inf, 0]])

X_vrp_exp = [
    np.array([[0., 0.43094373],
              [0., 0.5117411],
              [0., 0.60077095],
              [0., 0.62186205]]),
    np.array([[0.69093919, 0.80131882]])
    ]


def test_wrp_notimplemented_string_weights():
    with pytest.raises(ValueError, match="'foo' passed for `weights` but the "
                                         "only allowed string is 'DTM'"):
        ripser(X_pc, weights="foo")


def test_wrp_notimplemented_p():
    with pytest.raises(NotImplementedError):
       ripser(X_pc, weights="DTM", weight_params={"p": 1.2})


@pytest.mark.parametrize("X, metric", [(X_pc, "euclidean"),
                                       (X_pc_sparse, "euclidean"),
                                       (X_dist, "precomputed"),
                                       (X_dist_sparse, "precomputed")])
@pytest.mark.parametrize("weight_params", [{"p": 1}, {"p": 2}, {"p": np.inf}])
@pytest.mark.parametrize("collapse_edges", [True, False])
@pytest.mark.parametrize("thresh", [np.inf, 0.8])
def test_wrp_same_as_vrp_when_zero_weights(X, metric, weight_params,
                                           collapse_edges, thresh):
    X_wrp = ripser(weights=lambda x: np.zeros(x.shape[0]),
                 weight_params=weight_params,
                 metric=metric,
                 collapse_edges=collapse_edges,
                 thresh=thresh)

    for i in range(2):
        assert_almost_equal(X_wrp[i], X_vrp_exp[i])


X_dtm_exp = {1: [np.array([[0.95338798, 1.474913],
                           [1.23621261, 1.51234496],
                           [1.21673107, 1.68583047],
                           [1.30722439, 1.73876917]]),
                 np.array([[]])],
             2: [np.array([[0.95338798, 1.08187652],
                           [1.23621261, 1.2369417],
                           [1.21673107, 1.26971364],
                           [1.30722439, 1.33688354]]),
                 np.array([[]])],
             np.inf: [np.array([[]]),
                      np.array([[]])]}


@pytest.mark.parametrize("X, metric", [(X_pc, "euclidean"),
                                       (X_pc_sparse, "euclidean"),
                                       (X_dist, "precomputed"),
                                       (X_dist_sparse, "precomputed")])
@pytest.mark.parametrize("weight_params", [{"p": 1}, {"p": 2}, {"p": np.inf}])
@pytest.mark.parametrize("collapse_edges", [True, False])
def test_dtm(X, metric, weight_params, collapse_edges):
    X_dtm = ripser(X, weights="DTM", weight_params=weight_params,
                   metric=metric, collapse_edges=collapse_edges)
    
    for i in range(2):
        assert_almost_equal(X_dtm[i], X_dtm_exp[weight_params["p"]][i])

julian added 4 commits June 30, 2021 13:24
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Honestly I do not understand why it failing ...

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
@MonkeyBreaker
Copy link
Collaborator Author

@ulupo, thank you for the code snippet.
I rework it in order to pass !
I needed to do some adaptations, I made the most obvious one:

  • Add a function that do similar steps than _postprocess_diagrams method in WeightedRipsPersistence.
  • It was weird but in the test for DTM, two empty arrays did not have the same shape, in order to not add to much the code, I just check if both arrays are empty, I do not run the assert.

@ulupo
Copy link
Collaborator

ulupo commented Jun 30, 2021

Thanks @MonkeyBreaker, and sorry I made you do this extra work. I can suggest to replace np.array([[]]) with np.empty((0, 2)) which will have the right shape, and to put np.inf instead of the threshold values in the *_exp arrays so you don't have to code the replace inf function.

@MonkeyBreaker
Copy link
Collaborator Author

Let me give it a shot, if it can make things cleaner, better to go this way :).

julian and others added 7 commits June 30, 2021 16:43
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
* Adapt test for wrp because of precision
* When no threshold provided and input is not sparse, compute enclosing radius and represent data in a dense way
* When threshold is provided and input is sparse, don't compute enclosing radius and represent data in a sparse way

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
…thread

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
@MonkeyBreaker MonkeyBreaker mentioned this pull request Jul 2, 2021
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@MonkeyBreaker MonkeyBreaker changed the title [WIP] Add Persistent Homology Computation Backend for VR Add Persistent Homology Computation Backend for VR Jul 2, 2021
@MonkeyBreaker MonkeyBreaker merged commit 37f2b17 into main Jul 2, 2021
@MonkeyBreaker MonkeyBreaker deleted the c++_backend branch July 2, 2021 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants