-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ripser backend #106
Conversation
I'll fix the issue with the windows compilation ASAP, sorry for that ... |
|
d8dfbba
to
d99bd3e
Compare
Codecov Report
@@ Coverage Diff @@
## master #106 +/- ##
=======================================
Coverage 96.75% 96.75%
=======================================
Files 3 3
Lines 154 154
Branches 26 26
=======================================
Hits 149 149
Misses 4 4
Partials 1 1
Continue to review full report at Codecov.
|
@MonkeyBreaker Should the Windows compatibility changes be suggested in a pull request to the upstream repository? They were initially just a hack I did to make it build so that we could have Windows conda-forge builds (users were asking @sauln for Windows compatibility, if I recall). |
@bdice by upstream repository, you mean the main repository of Ripser ? |
@MonkeyBreaker Yes. The changes made here should enable cross-platform compatibility without any major degradations in the functionality, and it would make this Windows compatible Python/Cython library easier to maintain in the future as the C++ library continues to evolve. It seems like a win-win. https://github.com/Ripser/ripser |
I just tested to compile |
To give more details @ulupo:
The last table shows the results without
Well I didn't want to add for
Should this be related to this fix add in ripser ? |
I guess I was suggesting that it may be worthwhile to do so to have the "ultimate" performance figures.
What I meant was that I guess the CI for this project builds Python wheels using compiled extensions and that maybe these compiled extensions should be made using robinhood so that the final Python wheels can benefit from the performance boost. I am not suggesting any changes for the end user who compiles from source. I am just saying that it would be good if the Python user who installs from PyPI could also benefit from this particular addition.
Maybe, though I only think this is the case from experimenting with you on some input and not from looking deeply into the git history. |
Thanks @reds-heig for this awesome work, and thanks @ulupo and @bdice for fielding this 🙇
This sounds good to me, but unfortunately the CI is only running automated tests at the moment, not builds and deploys. It's been a minute since I've touched the travis code, so that might be easier to convert to github-actions in the long run if we want to add CI/CD. We should probably also set it up so it runs tests both with and without robinhood present. |
I add the last table with
About
|
Hi ! I had a bit of time, and I tried to add the
From what I can observe so far:
Otherwise, on my machine this seems to work, but I encountered an issue that only dev of the library should encounter: If you already run Of course maybe my edits are not the changes you would expect, I'm really not an expert on CI, but it's a first draft of them :). I didn't want migrate Best, |
.travis.yml
Outdated
export DISPLAY=:99.0; | ||
sh -e /etc/init.d/xvfb start; | ||
sleep 3; | ||
fi | ||
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew list python &>/dev/null || brew install python; fi | ||
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew list python3 &>/dev/null || brew install python3; fi | ||
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install pyenv-virtualenv; fi | ||
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install git; fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this line is not needed for osx. The travis error says Error: git 2.24.1 is already installed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it, but I didn't want to assume that git
is already installed. I didn't expect the pipeline to fail if git
was already present to be honest.
That's for taking a pass at this. The changes look reasonable, but I'm not much of an expert myself. Thanks for taking a pass at this! |
@sauln The C++ is somewhat hard to review because the upstream file has been updated, in addition to the Windows compatibility fixes and optional Robinhood dependency. I gave it a quick overview and I think it's fine. Overall the PR state is "working" and the integrations with CI, etc. appear to be functioning as desired. I think it's in the interest of the maintainers to reduce the downstream burden on ripser.py, and I would push just a little more on a topic I raised previously: this PR is mostly C++ changes and this package's purpose/scope (as I understand it) is to wrap the C++ library and offer convenient Python bindings. Filing an upstream PR to https://github.com/Ripser/ripser for the Windows compatibility fixes and getting these two As for Robinhood, it seems like this PR offers a 28% average speedup at the cost of potentially breaking the "copy from upstream" method for keeping this repo's C++ internals up to date. Obviously performance improvements are good, but it's a tradeoff to consider carefully. This isn't an issue if someone (PR author(s)? @reds-heig @MonkeyBreaker) is willing to help maintain that patch. Also, if the upstream project is abandoned at some point, then there's no longer a problem -- this repository is already incurring that maintenance cost if the upstream is not accepting new PRs (it doesn't appear to have changed much recently). I just wanted to throw that in there as a perspective from a fellow OSS maintainer with finite time/resources. This code appears to work well and I would approve it for merge if project maintainers agree upon consideration. |
@bdice you raised some good points.
|
@MonkeyBreaker Thanks for the helpful insight! I wasn't aware that there were other changes in this repo's copy of the C++ code. Since that's the case, then my intent of aligning the two C++ implementations may not be a realistic goal. Thanks again for the PR and for thinking about the questions I raised. I'll let project maintainers finalize and merge this PR. |
Alright, let's ship it! @MonkeyBreaker could you update the version # (https://github.com/scikit-tda/ripser.py/blob/master/ripser/_version.py) to 0.6.0 and write a brief summary in the changelog (CHANGELOG.md) and then I'll merge the changes and redeploy everything. |
Hi all,
Thank you so much for being on this, and I'm sorry I haven't contributed
more to the discussion as I was going on. Remote teaching and mentoring
has been sucking up all of my time, and for those of you who are not in the
US, things are insane right now (horrible mismanagement of a pandemic
across the board, and a lot of civic and political unrest).
@MonkeyBreaker, it sounds like you dug quite into the code and understand
it really well, so thank you for that! As you noticed, I broke with the
original ripser convention in a few key ways:
1) I defer distance computations to the Python front-end. This has pluses
and minuses, but it allows us to be more general and to leverage other
packages for distance computations
2) I totally disregard the ratio parameter, as you noticed. Personally, I
found this opaque and possibly misleading, but that's more a matter of taste
3) Perhaps most significantly, I added the ability to have negative birth
times with H0. This is because I have a number of applications that
benefit from doing lower star / upper star filtrations, particularly in
image processing and time series analysis, and it was not too much effort
to add that one feature to this library
Anyway, I do think because of this, it's probably more effort than it's
worth to keep this synced with the original ripser, but then again, there
are other capabilities there like persistent homology instead of
cohomology, where it's possible to actually extract representative cycles.
So I am open to discussion later. But for now, let's keep it its own thing.
Thank you so much again for all of your contributions. I will try to give
you a shoutout moving forward any time this software comes up.
Sincerely,
Chris Tralie
…On Thu, Oct 8, 2020 at 12:35 PM Nathaniel Saul ***@***.***> wrote:
Alright, let's ship it! @MonkeyBreaker <https://github.com/MonkeyBreaker>
could you update the version # (
https://github.com/scikit-tda/ripser.py/blob/master/ripser/_version.py)
to 0.6.0 and write a brief summary in the changelog (CHANGELOG.md) and then
I'll merge the changes and redeploy everything.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#106 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJWDZWKP5UGGNB7Z7SYMWDSJXS35ANCNFSM4RYMKKDA>
.
|
Hi ! @ctralie thank you for your feedback ! I hope that despite all what's going on in the US, everything will go well.
I think that we should be able to update the code to integrate this kind of possibilities in the future even if we are a bit different than the upstream repository. For me, it's important that we match as much as possible the upstream repository because if new changes are done upstream they shall be easy to integrate. @sauln Let me proceed with the changes, but before this PR will be merged, I would like to have the opinion of everyone about one of the points I raised in the description of the PR:
Currently in order to use the enclosing radius optimization, we need to set in Python the threshold to if (threshold == std::numeric_limits<value_t>::max() || threshold == std::numeric_limits<value_t>::infinity()) {
.... @ctralie, @sauln, @bdice, @ulupo what do you think about this ? Should I update the code or is there a reason to compute in some cases Best, |
@MonkeyBreaker knows my opinion on this issue from the conversations we've had on it, but to share with everybody else: I think we should make sure that the enclosing radius optimization is used when appropriate, as the performance benefits can be large (if slightly unpredictable). In C++
I think 2 is a small unintentional design flaw. It seems clear to me that So I would be in favour of implementing @MonkeyBreaker's suggested modification of the When interfacing with Python, are we sure that |
Additionally, I'd like to repeat a previous point I made: I think that with this update of the C++ backend one should be able to fully revert #104 and the lexicographic ordering should not be necessary. If this is the case, I suggest this is done in this PR or at least as part of the 0.6 release, and that an example such as the one I gave in #103 is added as a test to avoid regressions. |
@ulupo and @MonkeyBreaker, you've made a good case for modifying the behavior. Again, I am not as familiar with the C++ backend as I should be, so will trust both of your judgements. @ulupo Could you add regression tests and revert the changes in a follow up PR? |
Sure OK! 👍 |
Yeah, this all sounds reasonable to me.
I personally do find ratio confusing compared to distance threshold,
because outliers can really mess up what it means. But I'm fine if you
want to add that parameter back, as long as it defaults to 1.0
Thank you again for digging into this while I've had very limited bandwidth
on my end!
Sincerely,
Chris
…On Fri, Oct 9, 2020 at 12:10 PM Umberto Lupo ***@***.***> wrote:
@ulupo <https://github.com/ulupo> Could you add regression tests and
revert the changes in a follow up PR?
Sure OK! 👍
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#106 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJWDZQJJ6LB6TJFQ3AHT23SJ4YYXANCNFSM4RYMKKDA>
.
|
@MonkeyBreaker thanks for the extra commit! I repeat one small question I had, just to be sure:
If yes, I have nothing more to add and leave it for the maintainers to decide on whether the state is good for merging. |
@ulupo About infinity, I verified one thing: if About
But I cannot find information about Python |
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
4aac65c
to
7a9e21f
Compare
Well, I'll go sleep, I'm struggling with Windows (Yay ...). With the new CI, the windows job uses I did not find at the moment how to detect the compiler used for the compilation and according to this, choose the correct flag format. I'll give it a try tomorrow or on the week-end. Julián |
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Hurray ! Seems to work now. In order to make it work I follow this answer on SO. [build]
compiler=msvc If you have another solution, please feel free to integrate it :) Julián |
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Signed-off-by: julian <julian.burellaperez@heig-vd.ch>
Hi everyone, As @ubauer requested I benchmarked and also added some changes:
About PACK on windows, windows compiler require data each data fields have the same robinhood has two different memory layouts. Currently I used
Reading the
Please let me know if you have any question/suggestion. @ubauer you gave now write access to the repository, feel free to add all the changes we discussed :) Best, |
This is great! I'll give it one last review this weekend and give a few days for anyone else to make comments before shipping. Thanks @MonkeyBreaker for all your work with this improvement 🙇 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Really nice work, this took a lot of effort.
@MonkeyBreaker thank you for the hard work putting this together and patience with getting it merged! 6.0.0 is out. I still have a bit of work to do to get the documentation rolled out. I was hoping to get 1 more brief PR from you! Could you add a blurb about the robinhood installation in the readme and in the docs site? I think also a copy of the benchmarking table would be really helpful within the docs site as well! Thank you :D |
@sauln thank you for the merge ! Sure about the installation and the benchmarking, let me prepare it, hopefully bythe end of the week. And also, thank you everyone for the already amazing work on the library :D Julián |
Yes, thank you all so much for your work here, especially during a time
when my personal free development cycles are very low!
…On Mon, Nov 2, 2020 at 2:08 PM MonkeyBreaker ***@***.***> wrote:
@sauln <https://github.com/sauln> thank you for the merge !
Sure about the installation and the benchmarking, let me prepare it,
hopefully bythe end of the week.
And also, thank you everyone for the already amazing work on the library :D
Julián
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#106 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJWDZSJZGLOJH5GCOYM5MDSN37TZANCNFSM4RYMKKDA>
.
|
Hello,
Description
This PR updates the C++ ripser backend to the latest code available at Ripser and also improve some parts of the code.
Changes
About the main changes add, this are some of the main ones:
The profiling of
ripser
showed a lot of time spent usingstd::unordered_map
. From personal experience and as you'll see on the benchmark below, usingrobinhood::unordered_map
add a certain amount of speed-up by just replacingstd::unordered_map
. I don't have the numbers about the memory consumption forripser.py
but in a different project, the memory was reduced by 10% just by usingrobinhood::unordered_map
.Comments
std::unordered_map
I did not add directly
robinhood hashmap
as a dependency but I let a#if definded
in the code in case someone (I in this case) would like to use it.I would really like to not loose the gain in performances added by the new unordered_map.
Enclosing radius
I add a third table in the benchmark discussing about the enclosing radius optimization. In
ripser
this optimization is used when no threshold is set explicitly. Inripser.py
to enable the use of enclosing radius, we need to set the threshold parameter tothreshold=np.finfo(np.float32).max
.In
ripser.py
the default value of the threshold is infinity, meaning that the enclosing radius isn't used. But as described inripser
paper, p.11, section 4, input:From what I understand, there's no point in computing PH above this radius, but maybe am I wrong ?
Because it could be a possibility to change inside
ripser
the condition to also in case if the threshold is set to infinity, use the enclosing radius as a threshold, what do you think ? I mailed directly Prof. Bauer. but I don't have any news.Test
I runned the test available with
pytest
, I also verified in more details on different datasetsbarcodes
andcocycles
.But please, feel free to test on your side :)
Please, let me know if I need to add some changes or if you wouldn't like to merge this PR
Best,
Julián
Benchmark
I made some benchmarking to show run time difference on the same datasets used in the original ripser paper:
Ripser.py
Ripser.py updated
Ripser.py Using enclosing radius
In order to use the enclosing radius optimization implemented in
ripser
, it'snecessary to pass as a threshold the value
np.finfo(np.float32).max
. In the previous table, the previous value is used when no threshold is specified.