Output floating point types #19

ulupo · 2021-08-05T16:21:29Z

It seems that we output barcodes with 64-bit floating-point type. I think that, as things stand, this could be a little misleading because we internally convert to 32-bit representations and hence lose precision. Is there a reason I am missing why outputting doubles is preferrable?

MonkeyBreaker · 2021-08-06T06:13:18Z

Good remark.

By default, numpy uses a 64-bit precision for float representation.
We need to currently cast input matrices to 32-bit, because the C++ core currently only supports 32-bit floating point representation.
For the output, If I am not mistaken, this line cast implicitly from 32-bit to 64-bit for float representation in the Python interface.
The cast from 32-bit to 64-bit should be safe in terms of precision.

I don't think it is a bad idea to return to users the default type for numpy.float from the Python interface. But, I do not have a strong reason to not return 32-bit as how it is currently returned from the C++ core.

ulupo · 2021-08-06T08:35:16Z

I don't think it is a bad idea to return to users the default type for numpy.float from the Python interface. But, I do not have a strong reason to not return 32-bit as how it is currently returned from the C++ core.

The reason I think it is not great to upcast downcasted floats is that users who pass distance matrices (as 64-bit arrays) might decide to "match" distance matrix values with the output bars. If 64-bit, they would be well-justified to expect that every birth and death corresponds exactly to an entry in the distance matrix. If 32-bit, they would at least have an indirect hint as to what might have happened.

Incidentally, this is related to another issue, namely that a possible enhancement would be to add a 64-bit implementation and allow users to pass a precision kwarg (default: single, but with the possibility of passing double).

MonkeyBreaker · 2021-08-06T08:48:04Z

The reason I think it is not great to upcast downcasted floats is that users who pass distance matrices (as 64-bit arrays) might decide to "match" distance matrix values with the output bars. If 64-bit, they would be well-justified to expect that every birth and death corresponds exactly to an entry in the distance matrix. If 32-bit, they would at least have an indirect hint as to what might have happened.

Good argument about trying to match the output with the input ! Definitely, if they try to match and because we outputs 64-bit float, they would expect to find the exact same values they used as input. But, because of the downcast, this values are slightly different.

Incidentally, this is related to another issue, namely that a possible enhancement would be to add a 64-bit implementation and allow users to pass a precision kwarg (default: single, but with the possibility of passing double).

This shall not be in my opinion too much work, it is adding a template argument to the ripser function in the C++. Currently I am on a different project, but if this should be a must to have, then I can see to do it earlier.

ulupo · 2021-08-06T08:51:47Z

if this should be a must to have, then I can see to do it earlier.

I think there are other features that would be cooler to have first, but they are harder (cocycles, persistence pairs, isolate edge collapser). Maybe we should make a kanban for these? Anyway, if it is really that simple, why not add it I say.

MonkeyBreaker · 2021-08-06T09:12:45Z

As always, it seems simple, maybe some issues will be encountered ...
But even if it seems easy, we should add corresponding test in order to ensure that what we add works when expected.
This alone takes some time.

Maybe we should make a kanban for these?

Definitely, but if possible, if we can centralize this in Github directly, it would be really cool !

ulupo · 2021-08-11T14:47:02Z

Definitely, but if possible, if we can centralize this in Github directly, it would be really cool !

Done :)

MonkeyBreaker · 2021-08-12T06:46:43Z

Nice, thank you !

ulupo added this to To do in v0.2.0 Aug 11, 2021

ulupo mentioned this issue Nov 2, 2021

[BUG] Dead comment in compute_dim_0_pairs/incorrect removal of edges with zero filtration value #31

Closed

ulupo removed this from To do in v0.2.0 Nov 3, 2021

ulupo added this to To do in v0.3.0 via automation Nov 3, 2021

MonkeyBreaker linked a pull request Nov 14, 2021 that will close this issue

Small codebase updates #50

Merged

7 tasks

MonkeyBreaker moved this from To do to In progress in v0.3.0 Dec 4, 2021

ulupo closed this as completed in #50 Dec 6, 2021

v0.3.0 automation moved this from In progress to Done Dec 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Output floating point types #19

Output floating point types #19

ulupo commented Aug 5, 2021 •

edited

MonkeyBreaker commented Aug 6, 2021

ulupo commented Aug 6, 2021 •

edited

MonkeyBreaker commented Aug 6, 2021

ulupo commented Aug 6, 2021

MonkeyBreaker commented Aug 6, 2021 •

edited

ulupo commented Aug 11, 2021 •

edited

MonkeyBreaker commented Aug 12, 2021

Output floating point types #19

Output floating point types #19

Comments

ulupo commented Aug 5, 2021 • edited

MonkeyBreaker commented Aug 6, 2021

ulupo commented Aug 6, 2021 • edited

MonkeyBreaker commented Aug 6, 2021

ulupo commented Aug 6, 2021

MonkeyBreaker commented Aug 6, 2021 • edited

ulupo commented Aug 11, 2021 • edited

MonkeyBreaker commented Aug 12, 2021

ulupo commented Aug 5, 2021 •

edited

ulupo commented Aug 6, 2021 •

edited

MonkeyBreaker commented Aug 6, 2021 •

edited

ulupo commented Aug 11, 2021 •

edited