Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output floating point types #19

Closed
ulupo opened this issue Aug 5, 2021 · 7 comments · Fixed by #50
Closed

Output floating point types #19

ulupo opened this issue Aug 5, 2021 · 7 comments · Fixed by #50
Projects

Comments

@ulupo
Copy link
Collaborator

ulupo commented Aug 5, 2021

It seems that we output barcodes with 64-bit floating-point type. I think that, as things stand, this could be a little misleading because we internally convert to 32-bit representations and hence lose precision. Is there a reason I am missing why outputting doubles is preferrable?

@MonkeyBreaker
Copy link
Collaborator

Good remark.

By default, numpy uses a 64-bit precision for float representation.
We need to currently cast input matrices to 32-bit, because the C++ core currently only supports 32-bit floating point representation.
For the output, If I am not mistaken, this line cast implicitly from 32-bit to 64-bit for float representation in the Python interface.
The cast from 32-bit to 64-bit should be safe in terms of precision.

I don't think it is a bad idea to return to users the default type for numpy.float from the Python interface. But, I do not have a strong reason to not return 32-bit as how it is currently returned from the C++ core.

@ulupo
Copy link
Collaborator Author

ulupo commented Aug 6, 2021

I don't think it is a bad idea to return to users the default type for numpy.float from the Python interface. But, I do not have a strong reason to not return 32-bit as how it is currently returned from the C++ core.

The reason I think it is not great to upcast downcasted floats is that users who pass distance matrices (as 64-bit arrays) might decide to "match" distance matrix values with the output bars. If 64-bit, they would be well-justified to expect that every birth and death corresponds exactly to an entry in the distance matrix. If 32-bit, they would at least have an indirect hint as to what might have happened.

Incidentally, this is related to another issue, namely that a possible enhancement would be to add a 64-bit implementation and allow users to pass a precision kwarg (default: single, but with the possibility of passing double).

@MonkeyBreaker
Copy link
Collaborator

The reason I think it is not great to upcast downcasted floats is that users who pass distance matrices (as 64-bit arrays) might decide to "match" distance matrix values with the output bars. If 64-bit, they would be well-justified to expect that every birth and death corresponds exactly to an entry in the distance matrix. If 32-bit, they would at least have an indirect hint as to what might have happened.

Good argument about trying to match the output with the input ! Definitely, if they try to match and because we outputs 64-bit float, they would expect to find the exact same values they used as input. But, because of the downcast, this values are slightly different.

Incidentally, this is related to another issue, namely that a possible enhancement would be to add a 64-bit implementation and allow users to pass a precision kwarg (default: single, but with the possibility of passing double).

This shall not be in my opinion too much work, it is adding a template argument to the ripser function in the C++. Currently I am on a different project, but if this should be a must to have, then I can see to do it earlier.

@ulupo
Copy link
Collaborator Author

ulupo commented Aug 6, 2021

if this should be a must to have, then I can see to do it earlier.

I think there are other features that would be cooler to have first, but they are harder (cocycles, persistence pairs, isolate edge collapser). Maybe we should make a kanban for these? Anyway, if it is really that simple, why not add it I say.

@MonkeyBreaker
Copy link
Collaborator

MonkeyBreaker commented Aug 6, 2021

As always, it seems simple, maybe some issues will be encountered ...
But even if it seems easy, we should add corresponding test in order to ensure that what we add works when expected.
This alone takes some time.

Maybe we should make a kanban for these?

Definitely, but if possible, if we can centralize this in Github directly, it would be really cool !

@ulupo ulupo added this to To do in v0.2.0 Aug 11, 2021
@ulupo
Copy link
Collaborator Author

ulupo commented Aug 11, 2021

Definitely, but if possible, if we can centralize this in Github directly, it would be really cool !

Done :)

@MonkeyBreaker
Copy link
Collaborator

Nice, thank you !

@ulupo ulupo removed this from To do in v0.2.0 Nov 3, 2021
@ulupo ulupo added this to To do in v0.3.0 via automation Nov 3, 2021
@MonkeyBreaker MonkeyBreaker linked a pull request Nov 14, 2021 that will close this issue
7 tasks
@MonkeyBreaker MonkeyBreaker moved this from To do to In progress in v0.3.0 Dec 4, 2021
@ulupo ulupo closed this as completed in #50 Dec 6, 2021
v0.3.0 automation moved this from In progress to Done Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging a pull request may close this issue.

2 participants