Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Dead comment in compute_dim_0_pairs/incorrect removal of edges with zero filtration value #31

Closed
ulupo opened this issue Oct 22, 2021 · 4 comments · Fixed by #43
Closed
Labels
bug Something isn't working
Projects

Comments

@ulupo
Copy link
Collaborator

ulupo commented Oct 22, 2021

Am i right to think that this comment

giotto-ph/gph/src/ripser.h

Lines 936 to 937 in c7f5d46

// TODO: Get correct birth times if the edges are negative (required for
// lower star)

is referring to an issue that is now resolved?

@ulupo
Copy link
Collaborator Author

ulupo commented Oct 22, 2021

Actually I now worry that this is still unaddressed, because of

if (get_diameter(e) != 0) {

@ulupo ulupo mentioned this issue Oct 24, 2021
10 tasks
@ulupo
Copy link
Collaborator Author

ulupo commented Oct 24, 2021

Continuing from #29 (comment)

About the check if (get_diameter(e) != 0) {, I am not sure I understand the issue.
It just ensure that the diameter of the edge is not 0.

There is an important conceptual difference between ripser.py and original ripser, and it is that original ripser assumes all inputs are matrices with zero along the diagonal and non-negative off-diagonal entries. This corresponds to assuming all inputs are distance matrices, since there cannot be negative distances and the distance of a point from itself is zero for any reasonable notion of distance. Original ripser in dimension 0 allows for the possibility that some off-diagonal distances are zero, but if that is the case it does not create a bar (because it would be a trivial bar which is born at 0 and dies at 0). This is enforced by the if (get_diameter(e) != 0) {, check.

But this is conceptually incorrect for us and for ripser.py. Our assumptions for the input are just that entry (i, j) is greater than or equal to both entry (i, i) and entry (j, j). There should be nothing special about zero off-diagonal entries and we should even allow for negative entries (on or off-diagonal). I think we already do this correctly in higher dimensions, but it seems to me that there might still be issues in dimension 0.

There is a unit test that really should be added to this repository and it is the following:

  1. Create a dense distance matrix dm by running squareform(pdist(X)) where X is a point cloud. Compute its diagrams dgms (up to dimension 1 or 2).
  2. Create dm_neg_vert = dm - 1. Compute its diagrams dgms_neg_vert. Check that dgms_neg_vert[d] = dgms[d] - 1 for all dimensions d.
  3. Check that dm[0, 1] != 0 and create dm_one_zero_edge = dm - dm[0, 1]. Compute its diagrams dgms_one_zero_edge. Check that dgms_one_zero_edge [d] = dgms_one_zero_edge[d] - dm[0, 1] for all dimensions d.

I suspect that we currently fail test 3 in dimension 0.

@ulupo
Copy link
Collaborator Author

ulupo commented Nov 2, 2021

@MonkeyBreaker here's a complete test demonstrating that there is a bug:

import numpy as np
from scipy.spatial.distance import pdist, squareform
from gph import ripser_parallel

np.random.seed(0)

maxdim = 0
kwargs = {"metric": "precomputed", "maxdim": maxdim}

shape = (5, 4)

X = np.random.random(shape)
dm = squareform(pdist(X))

dgms_orig = ripser_parallel(dm, **kwargs)["dgms"]

offset = 1
dm_neg_vert = dm - offset
dgms_neg_vert = ripser_parallel(dm_neg_vert, **kwargs)["dgms"]

edge_val = dm[0, 1]
assert edge_val

dm_one_zero_edge = dm - edge_val

dgms_one_zero_edge = ripser_parallel(dm_one_zero_edge, **kwargs)["dgms"]

for dim in range(maxdim + 1):
    print(f"""# DIMENSION {dim}

Original dm:
{dgms_orig[dim]}
Subtract {offset} from dm, and add {offset} back to the diagram:
{dgms_neg_vert[dim] + np.float32(offset)}
Subtract dm[0, 1] from dm (make one edge zero), and add dm[0, 1] back to the diagram:
{dgms_one_zero_edge[dim] + np.float32(edge_val)}
    
    """)

All diagrams should be identical, but the last one has one fewer bar (!) because we removed an edge just because it was zero. Furthermore, it has incorrect birth times. (see edit below)

I'm thinking we have TWO issues here: the TODO comment is not dead (there is an issue with birth values clearly), AND we should not remove edges just because they are zero. The second issue is solved in this example by removing the if (get_diameter(e) != 0) check as I mentioned above, but not the first! (see edit below)

@ulupo
Copy link
Collaborator Author

ulupo commented Nov 2, 2021

EDIT TO PREVIOUS MESSAGE: It turns out the wrong birth times were just about float64 to float32 conversions, the test works now that I've added float32 casting explicitly. This reminds me that we really should address #19!

@MonkeyBreaker MonkeyBreaker moved this from To do to In progress in v0.2.0 Nov 4, 2021
@ulupo ulupo changed the title Dead comment? [BUG] Dead comment in compute_dim_0_pairs/incorrect removal of edges with zero filtration value Nov 8, 2021
ulupo added a commit to ulupo/giotto-ph that referenced this issue Nov 8, 2021
ulupo added a commit to ulupo/giotto-ph that referenced this issue Nov 8, 2021
@ulupo ulupo closed this as completed in #43 Nov 8, 2021
v0.2.0 automation moved this from In progress to Done Nov 8, 2021
ulupo added a commit that referenced this issue Nov 8, 2021
* Remove check for zero-diameter edge in compute_dim_0_pairs

Fixes issue #31

* Avoid get_dense_distance_matrices producing overflow warnings

This was due to adding non-finite diagonal elements to themselves when doing dm + dm.T, but we can avoid having those elements in the first place

* Add unit test/regression test for equivariance

* [CPP] Add inline comment to explain divergence from @ubauer's ripser

* [CPP] code style changes after @MonkeyBreaker's review

* Add separate regression test for matrix of the kind in issue #31

Also revert import ordering change as suggested by @MonkeyBreaker
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
1 participant