Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flag persistence generators #29

Merged

Conversation

ulupo
Copy link
Collaborator

@ulupo ulupo commented Oct 18, 2021

This PR add the possibility to retrieve representative simplicies.

  • : Compute and retrieve finite representative for dimensions > 0
  • : Compute and retrieve essential representative for dimensions > 0
  • : Compute and retrieve finite representative for dimension == 0
  • : Compute and retrieve essential representative for dimension == 0
  • : Add support in the bindings for retrieving representative simplicies
  • : Add support in Python to retrieve and output representative simplicies
    • : Document the output regarding representative simplicies
    • : Throw exception if collapser and representative simplicies are enable at the same time
  • : Update documentation about representative simplicies
  • : Update example basic tutorial to show example of this feature

Copy link
Collaborator Author

@ulupo ulupo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, we always return edges that are in the lower diagonal of the distance matrix, i.e. edges [i, j] such that i > j. Is this true consistently? Especially between the case of dimension 0 and of higher dimensions?

@MonkeyBreaker
Copy link
Collaborator

MonkeyBreaker commented Oct 19, 2021

Yes, we always return edges [i, j] where we i > j.
Quick question, this order doesn't imply that we are in the upper diagonal instead (i for column and j for row)? Or did I misunderstand something ?

@ulupo
Copy link
Collaborator Author

ulupo commented Oct 19, 2021

(i for column and j for row)

We are playing with fire here :) I suggest we stick to i indexing rows and j columns as this is virtually universal in mathematics and Python.

@MonkeyBreaker
Copy link
Collaborator

Oh !
Thank you for the correction. Then yes we stick to i for rows and j for columns.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

gph/src/ripser.h Outdated Show resolved Hide resolved
@ulupo
Copy link
Collaborator Author

ulupo commented Oct 24, 2021

@MonkeyBreaker You probably have noticed that there is repeated logic between the 0-dimensional fix and the code for link in the union_find class. An idea would be to make link return what is now called birth_idx instead of being void. Also because I think the check

 if (get_diameter(e) != 0) { 

might be wrong.

@MonkeyBreaker
Copy link
Collaborator

@ulupo about returning the index from the link function, it should be doable.
I just need to think to implement it in a way it makes sense when you will read the function.

About the check if (get_diameter(e) != 0) {, I am not sure I understand the issue.
It just ensure that the diameter of the edge is not 0. I am not sure I understand the relation with the vertices, we could maybe have a call ?

gph/src/ripser.h Outdated Show resolved Hide resolved
gph/src/ripser.h Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@ulupo ulupo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a draft suggestion to explain what I mean

gph/src/ripser.h Outdated Show resolved Hide resolved
@ulupo ulupo added this to In progress in v0.2.0 via automation Oct 25, 2021
@ulupo ulupo linked an issue Oct 25, 2021 that may be closed by this pull request
gph/src/ripser.h Show resolved Hide resolved
gph/src/ripser.h Outdated Show resolved Hide resolved
@ulupo ulupo added the enhancement New feature or request label Oct 25, 2021
Copy link
Collaborator Author

@ulupo ulupo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a test that covers sparse matrices with non-zero values along the diagonal. Here we can also test the finite bars in dimension zero and that the birth indices are correct there. Here it is important that the input is valid for a filtration. For example we can create random diagonal entries between 0 and 1, and ensure all off-diagonal entries are greater than 1.

@MonkeyBreaker
Copy link
Collaborator

We should add a test that covers sparse matrices with non-zero values along the diagonal. Here we can also test the finite bars in dimension zero and that the birth indices are correct there. Here it is important that the input is valid for a filtration. For example we can create random diagonal entries between 0 and 1, and ensure all off-diagonal entries are greater than 1.

This should be addressed in the latest commit

gph/test/test_ripser.py Outdated Show resolved Hide resolved
gph/src/ripser.h Show resolved Hide resolved
Copy link
Collaborator Author

@ulupo ulupo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note: we should avoid using the word "representatives" for what is computed in this PR. We can use "persistence pairs" or "simplex pairs" or "persistence generators"

@MonkeyBreaker
Copy link
Collaborator

Well I used "representatives" from the beginning.
But if there is a better name, lets go for it.
I just need to know to update everything.

For me the most important point is the name in the dictionary as output in ripser_parallel. I would like to keep short, 4-5 characters maximum. Any suggestion ?

@ulupo
Copy link
Collaborator Author

ulupo commented Oct 26, 2021

I just need to know to update everything.

I took the liberty of renaming ret_representative_simplices to return_generators and renaming the key to 'gens' and rewriting the docs on this part a little.

gph/src/ripser.h Outdated Show resolved Hide resolved
gph/src/ripser.h Show resolved Hide resolved
@ulupo
Copy link
Collaborator Author

ulupo commented Oct 29, 2021

Another note: res['gens'][1] and res['gens'][3] should always have length maxdim. Currently, they can be empty lists if there is no content. Instead, we should always return a list of length maxdim with empty ndarrays if necessary in some (or even all) positions.

@MonkeyBreaker I'm guessing this can be fixed at the level of the bindings?

@ulupo
Copy link
Collaborator Author

ulupo commented Oct 31, 2021

During work on this PR #29, @MonkeyBreaker and I realized that the current core logic in the matrix reduction algorithm makes it so that we break the following promise to the user: that passing a dense matrix with a few np.inf will lead to exactly the same output as passing a sparse version where the np.inf entries are just missing entries. This was never noticed at the level of barcodes because a "finite" bar with death equal to infinity looks just the same as an infinite bar with the same birth. But flag generators don't lie: we now end up telling the user there are never any generators for essential classes if the matrix is dense, no matter what the matrix is or represents.

Because changing this requires potentially delicate changes to the reduction routine, we decided to discuss what to do about this separately, and open a separate PR if necessary after this one is merged.

@ulupo
Copy link
Collaborator Author

ulupo commented Oct 31, 2021

In view of #36 , we need to update the imports in the two notebooks here.

@MonkeyBreaker
Copy link
Collaborator

In view of #36 , we need to update the imports in the two notebooks here.

Before doing the merge, we will need to rebase to main and then we will be able to merge, otherwise as you point out, we will have conflicts

ulupo and others added 13 commits October 31, 2021 19:14
Fix that no value in the diagonal should be bigger than a non diagonal value

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
…lupo

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Refactor slighly compute_dim_0

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
If get_youngest_edge_simplex does not implement the correct logic it could per example return [4, 0, 5, 4]

Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
@MonkeyBreaker MonkeyBreaker force-pushed the feature/return_flag_persistence_generators branch from 25fd98f to 0abb23b Compare October 31, 2021 18:23
@MonkeyBreaker
Copy link
Collaborator

MonkeyBreaker commented Oct 31, 2021

Another note: res['gens'][1] and res['gens'][3] should always have length maxdim. Currently, they can be empty lists if there is no content. Instead, we should always return a list of length maxdim with empty ndarrays if necessary in some (or even all) positions.

It seems weird to me, the C++ core has the same logic as for barcodes (barcodes have a maxdim+1 arrays of barcodes because you include dim0). The only difference is that you have maxdim as size of res['gens'][1] and res['gens'][3] , because dimension 0 is not included.

I need to verify if this is not the case

@ulupo
Copy link
Collaborator Author

ulupo commented Oct 31, 2021

The only difference is that you have maxdim as size of res['gens'][1] and res['gens'][3] , because dimension 0 is not included.

I don't think this can be the case if maxdim means the same as in Python. Example: if maxdim=1 (compute up to and including dimension 1) then the list has length 1, not 1-1=0.

@MonkeyBreaker
Copy link
Collaborator

I don't think this can be the case if maxdim means the same as in Python. Example: if maxdim=1 (compute up to and including dimension 1) then the list has length 1, not 1-1=0.

Well when you compute maxdim=1 you have dimension 0 and dimension 1 => 2-1 = 1, no ?

@ulupo
Copy link
Collaborator Author

ulupo commented Oct 31, 2021

Oh I'm sorry @MonkeyBreaker , I thought you said maxdim - 1 (because it used to say that in the docstrings), but you didn't! We said the same thing, i.e. that res['gens'][1] and res['gens'][3] should have length maxdim.

@MonkeyBreaker
Copy link
Collaborator

Oh I'm sorry @MonkeyBreaker , I thought you said maxdim - 1 (because it used to say that in the docstrings), but you didn't! We said the same thing, i.e. that res['gens'][1] and res['gens'][3] should have length maxdim.

Does it work as expected then ? Do we have an issue in the documentation ? (In the documentation we are right in that we return maxdim for [1] and [3].

@ulupo
Copy link
Collaborator Author

ulupo commented Oct 31, 2021

Does it work as expected then ?

I think I was able to produce examples which returned empty lists when maxdim=1, but can't remember how. Maybe square_dm?

@MonkeyBreaker
Copy link
Collaborator

Maybe square_dm?

Just tried with maxdim=1 works as expected

gph/python/test/test_ripser.py Outdated Show resolved Hide resolved
@ulupo ulupo marked this pull request as ready for review November 1, 2021 09:18
@ulupo ulupo merged commit ec45de5 into giotto-ai:main Nov 1, 2021
v0.2.0 automation moved this from In progress to Done Nov 1, 2021
@ulupo ulupo deleted the feature/return_flag_persistence_generators branch November 1, 2021 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Development

Successfully merging this pull request may close these issues.

Return simplex pairings and essential simplices
2 participants