
Update ripser backend #106

Merged · 25 commits · Nov 2, 2020

Conversation

reds-heig

Hello,

Description

This PR updates the C++ ripser backend to the latest code available at Ripser and also improves some parts of the code.

Changes

These are some of the main changes:

  • Update to the latest ripser.cpp available
  • Add support for the robinhood hashmap
  • Flatten the coefficient binomial table
  • Use a specialized function to compute modulo 2 with a mask operator
  • Refactor to remove duplicated code
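As a side note on the modulo-2 item above, this is the usual bit-mask trick, sketched here in Python purely for illustration (the backend itself is C++, and `mod2` is a hypothetical name, not a function from the PR):

```python
# Illustrative sketch (not the actual C++ backend code): for a non-negative
# integer, the value mod 2 is just its lowest bit, so a bitwise AND with 1
# can replace the more general modulo operator.
def mod2(x: int) -> int:
    """Compute x mod 2 for non-negative x using a bit mask."""
    return x & 1

# Sanity check against the % operator.
assert all(mod2(x) == x % 2 for x in range(16))
```

The win comes from AND being a single cheap instruction, whereas a general modulo may compile to a division.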

The profiling of ripser showed a lot of time spent in std::unordered_map. From personal experience, and as you'll see in the benchmark below, robinhood::unordered_map adds a noticeable speed-up just by replacing std::unordered_map. I don't have memory-consumption numbers for ripser.py, but in a different project, memory usage was reduced by 10% just by switching to robinhood::unordered_map.

Comments

std::unordered_map

I did not add the robinhood hashmap directly as a dependency, but I left an #if defined guard in the code in case someone (myself, in this case) would like to use it.

I would really like not to lose the performance gains brought by the new unordered_map.

Enclosing radius

I added a third table to the benchmark discussing the enclosing radius optimization. In ripser, this optimization is used when no threshold is set explicitly. In ripser.py, to enable the use of the enclosing radius, we need to set the threshold parameter to threshold=np.finfo(np.float32).max.

In ripser.py the default value of the threshold is infinity, meaning that the enclosing radius isn't used. But as described in the ripser paper, p. 11, section 4, input:

If no threshold is specified, the minimum enclosing radius of the input is used as a threshold, as suggested by Henselman-Petrusek [16]. Above that threshold the Vietoris–Rips complex is a simplicial cone with apex a minimizing point x, and so the homology remains trivial afterwards.

From what I understand, there's no point in computing PH above this radius, but maybe I am wrong?
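For context, the quantity being discussed can be computed directly from a distance matrix; here is a minimal NumPy sketch (my own illustration, not code from ripser.py): for each candidate apex, take its maximum distance to all other points, then minimize over apexes.

```python
import numpy as np

def enclosing_radius(D: np.ndarray) -> float:
    """Minimum enclosing radius of a finite metric space with distance matrix D.

    For each candidate apex x, max_y d(x, y) is the radius needed to cover
    every point from x; the enclosing radius is the minimum of that over x.
    """
    return float(np.min(np.max(D, axis=1)))

# Tiny example: three points on a line at 0, 1 and 3.
pts = np.array([0.0, 1.0, 3.0])
D = np.abs(pts[:, None] - pts[None, :])
print(enclosing_radius(D))  # prints 2.0: the point at 1 covers everything within distance 2
```

Past this radius every point is within the apex's ball, so the complex becomes a cone and the homology stays trivial, which is why computing further cannot add barcodes.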

One possibility would be to change the condition inside ripser so that the enclosing radius is also used as the threshold when the threshold is set to infinity. What do you think? I emailed Prof. Bauer directly, but I haven't heard back yet.

Test

I ran the available tests with pytest, and I also verified barcodes and cocycles in more detail on different datasets.
But please, feel free to test on your side :)

Please let me know if I need to make further changes or if you would rather not merge this PR.

Best,
Julián


Benchmark

I ran some benchmarks to show the run-time differences on the same datasets used in the original ripser paper:

Ripser.py

| Dataset | Size | Threshold | Dim | Coeff | Time [s] |
| --- | --- | --- | --- | --- | --- |
| sphere3 | 192 | – | 2 | 2 | 1.5 |
| dragon | 2000 | – | 1 | 2 | 3.8 |
| o3 | 1024 | 1.8 | 3 | 2 | 3.1 |
| random16 | 50 | – | 7 | 2 | \* |
| fractal | 512 | – | 2 | 2 | 18.3 |
| o3 | 4096 | 1.4 | 3 | 2 | \*\* |

\* Ran out of memory; the enclosing radius is necessary
\*\* I'm surprised it runs out of memory

Ripser.py updated

| Dataset | Size | Threshold | Dim | Coeff | Time [s] | Time robinhood [s] |
| --- | --- | --- | --- | --- | --- | --- |
| sphere3 | 192 | – | 2 | 2 | 1.5 | 1.2 |
| dragon | 2000 | – | 1 | 2 | 3.8 | 3.3 |
| o3 | 1024 | 1.8 | 3 | 2 | 2.9 | 2.2 |
| random16 | 50 | – | 7 | 2 | \* | \* |
| fractal | 512 | – | 2 | 2 | 17.7 | 14 |
| o3 | 4096 | 1.4 | 3 | 2 | 68.6 | 53.4 |

\* Ran out of memory; the enclosing radius is necessary

Ripser.py Using enclosing radius

In order to use the enclosing radius optimization implemented in ripser, it is
necessary to pass the value np.finfo(np.float32).max as the threshold. In the table below, this value is used whenever no threshold is specified.

| Dataset | Size | Threshold | Dim | Coeff | Time [s] |
| --- | --- | --- | --- | --- | --- |
| sphere3 | 192 | – | 2 | 2 | 1.5 |
| dragon | 2000 | – | 1 | 2 | 2.9 |
| o3 | 1024 | 1.8 | 3 | 2 | 2.9 |
| random16 | 50 | – | 7 | 2 | 8.4 |
| fractal | 512 | – | 2 | 2 | 17.7 |
| o3 | 4096 | 1.4 | 3 | 2 | 68.6 |

@MonkeyBreaker
Contributor

I'll fix the issue with the Windows compilation ASAP, sorry for that ...

@ulupo
Contributor

ulupo commented Sep 24, 2020

  • Concerning benchmarks, I guess the ultimate table would combine the enclosing radius fix with robinhood hashmap, correct? Perhaps it would be worth showing it? @reds-heig

  • As far as I understand, some changes to the project CI should be made so that robinhood hashmap is available when building Python wheels, right? The user who compiles from source, on the other hand, should not need it installed. @ctralie @sauln is this correct?

  • @ctralie: I think (to be double-checked) the newer C++ backend does not suffer from the issues that led me to opening Incorrect output on COO matrices instantiated with rows and columns not in lexicographic order #103. So perhaps the changes made in Fix #103 #104 can be reverted if this is merged.

@codecov

codecov bot commented Sep 25, 2020

Codecov Report

Merging #106 into master will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #106   +/-   ##
=======================================
  Coverage   96.75%   96.75%           
=======================================
  Files           3        3           
  Lines         154      154           
  Branches       26       26           
=======================================
  Hits          149      149           
  Misses          4        4           
  Partials        1        1           
Impacted Files Coverage Δ
ripser/_version.py 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fc90fb9...7e1107b.

@bdice
Collaborator

bdice commented Sep 25, 2020

@MonkeyBreaker Should the Windows compatibility changes be suggested in a pull request to the upstream repository? They were initially just a hack I did to make it build so that we could have Windows conda-forge builds (users were asking @sauln for Windows compatibility, if I recall).

@MonkeyBreaker
Contributor

@bdice by upstream repository, do you mean the main Ripser repository?

@bdice
Collaborator

bdice commented Sep 25, 2020

@MonkeyBreaker Yes. The changes made here should enable cross-platform compatibility without any major degradations in the functionality, and it would make this Windows compatible Python/Cython library easier to maintain in the future as the C++ library continues to evolve. It seems like a win-win. https://github.com/Ripser/ripser

setup.py (resolved review thread)
@MonkeyBreaker
Contributor

I just tried compiling ripser on Windows, and indeed the same issues occur.
I think it would be a good idea to prepare a separate PR on the upstream repository to enable compilation on Windows.

@MonkeyBreaker
Contributor

To give more details @ulupo:

Concerning benchmarks, I guess the ultimate table would combine the enclosing radius fix with robinhood hashmap, correct? Perhaps it would be worth showing it? @reds-heig

The last table shows the results without robinhood; would it be worth adding them?

As far as I understand, some changes to the project CI should be made so that robinhood hashmap is available when building Python wheels, right? The user who compiles from source, on the other hand, should not need it installed. @ctralie @sauln is this correct?

Well, I didn't want to add robinhood as a dependency of ripser.py. But I left the possibility in the C++ code to use robinhood in case it's already present in your project.

@ctralie: I think (to be double-checked) the newer C++ backend does not suffer from the issues that led me to opening #103. So perhaps the changes made in #104 can be reverted if this is merged.

Could this be related to this fix added in ripser?

@ulupo
Contributor

ulupo commented Sep 25, 2020

@MonkeyBreaker

The last table shows the results without robinhood, would it be worth adding them ?

I guess I was suggesting that it may be worthwhile to do so to have the "ultimate" performance figures.

Well, I didn't want to add robinhood as a dependency of ripser.py. But I left the possibility in the C++ code to use robinhood in case it's already present in your project.

What I meant was that I guess the CI for this project builds Python wheels using compiled extensions and that maybe these compiled extensions should be made using robinhood so that the final Python wheels can benefit from the performance boost. I am not suggesting any changes for the end user who compiles from source. I am just saying that it would be good if the Python user who installs from PyPI could also benefit from this particular addition.

Could this be related to this fix added in ripser?

Maybe, though I only think this is the case from experimenting with you on some inputs, not from looking deeply into the git history.

@sauln
Member

sauln commented Sep 25, 2020

Thanks @reds-heig for this awesome work, and thanks @ulupo and @bdice for fielding this 🙇

these compiled extensions should be made using robinhood so that the final Python wheels can benefit from the performance boost

This sounds good to me, but unfortunately the CI is only running automated tests at the moment, not builds and deploys. It's been a minute since I've touched the travis code, so that might be easier to convert to github-actions in the long run if we want to add CI/CD.

We should probably also set it up so it runs tests both with and without robinhood present.

@MonkeyBreaker
Contributor

I updated the last table with the robinhood timings added:

| Dataset | Size | Threshold | Dim | Coeff | Time [s] | Time robinhood [s] |
| --- | --- | --- | --- | --- | --- | --- |
| sphere3 | 192 | – | 2 | 2 | 1.5 | 1.2 |
| dragon | 2000 | – | 1 | 2 | 2.9 | 2.5 |
| o3 | 1024 | 1.8 | 3 | 2 | 2.9 | 2.2 |
| random16 | 50 | – | 7 | 2 | 8.4 | 6.0 |
| fractal | 512 | – | 2 | 2 | 17.7 | 14 |
| o3 | 4096 | 1.4 | 3 | 2 | 68.6 | 53.4 |

About robinhood inside the CI, one easy solution could be as follows:

  1. Add robinhood as a submodule of the repository: git submodule add https://github.com/martinus/robin-hood-hashing ripser/<something>
  2. Inside setup.py, test whether the folder is present and set the following flags for the compilation:
    • os.path.isdir('ripser/<something>'); if true,
    • append the following to define_macros: ("USE_ROBINHOOD_HASHMAP", 1)
    • add the correct include path: -Iripser/<something>/src/include for gcc/clang, or /Iripser/<something>/src/include for msvc

@MonkeyBreaker
Contributor

MonkeyBreaker commented Oct 6, 2020

Hi !

I had a bit of time, and I tried to add the robinhood hashmap into the CI.
I did it as follows:

  • git clone https://github.com/martinus/robin-hood-hashing ripser/robinhood
  • Inside setup.py, if the folder ripser/robinhood exists, I enable the robinhood hashmap when compiling ripser.

From what I can observe so far:

  • On travis-ci and appveyor, the robinhood hashmap is correctly downloaded
  • In appveyor, which manages the Windows deploy, the compilation flags are correctly set: 3.7 and 3.7 x64
  • In travis-ci, the robinhood hashmap is correctly downloaded, but I do not have access to the compilation flags. I should add -v before starting the setup.py script inside ci_scripts/install.sh

Otherwise, this seems to work on my machine, but I encountered an issue that only developers of the library should run into:

If you have already run pip install . and you download the robinhood hashmap afterwards, running pip install . again unfortunately won't recompile ripser. You first need to clean the previous build to ensure that it recompiles using the robinhood hashmap.

Of course, these may not be exactly the changes you expect; I'm really not an expert on CI, but it's a first draft :). I didn't want to migrate travis to github-actions, because I think that should be done in a separate PR.

Best,
Julián

.travis.yml Outdated
export DISPLAY=:99.0;
sh -e /etc/init.d/xvfb start;
sleep 3;
fi
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew list python &>/dev/null || brew install python; fi
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew list python3 &>/dev/null || brew install python3; fi
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install pyenv-virtualenv; fi
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install git; fi
Member

It looks like this line is not needed for osx. The travis error says Error: git 2.24.1 is already installed

Contributor

@MonkeyBreaker Oct 6, 2020

I removed it, but I didn't want to assume that git is already installed. To be honest, I didn't expect the pipeline to fail if git was already present.

@sauln
Member

sauln commented Oct 6, 2020

Thanks for taking a pass at this! The changes look reasonable, but I'm not much of an expert myself.

@sauln
Member

sauln commented Oct 6, 2020

Could @bdice, @ulupo, or @ctralie confirm the C++ changes look good? I don't have enough C++ background to judge them.

If one of you confirms and the CI runs successfully, I suggest we merge the changes.

@bdice
Collaborator

bdice commented Oct 7, 2020

@sauln The C++ is somewhat hard to review because the upstream file has been updated, in addition to the Windows compatibility fixes and optional Robinhood dependency. I gave it a quick overview and I think it's fine. Overall the PR state is "working" and the integrations with CI, etc. appear to be functioning as desired.

I think it's in the interest of the maintainers to reduce the downstream burden on ripser.py, and I would push just a little more on a topic I raised previously: this PR is mostly C++ changes and this package's purpose/scope (as I understand it) is to wrap the C++ library and offer convenient Python bindings. Filing an upstream PR to https://github.com/Ripser/ripser for the Windows compatibility fixes and getting these two ripser.cpp files in sync seems like the appropriate next step. If the files are kept "in sync" then future upstream changes can be easily merged into this repository by simply copying the new C++ code into this repo.

As for Robinhood, it seems like this PR offers a 28% average speedup at the cost of potentially breaking the "copy from upstream" method for keeping this repo's C++ internals up to date. Obviously performance improvements are good, but it's a tradeoff to consider carefully. This isn't an issue if someone (PR author(s)? @reds-heig @MonkeyBreaker) is willing to help maintain that patch. Also, if the upstream project is abandoned at some point, then there's no longer a problem -- this repository is already incurring that maintenance cost if the upstream is not accepting new PRs (it doesn't appear to have changed much recently). I just wanted to throw that in there as a perspective from a fellow OSS maintainer with finite time/resources. This code appears to work well and I would approve it for merge if project maintainers agree upon consideration.

@MonkeyBreaker
Contributor

@bdice you raised some good points.

  • About a PR to the upstream repository for Windows compatibility: I can do it, but I'm not sure it will be merged; I don't know whether it is still maintained, or whether the author is simply quite busy.
  • This PR aligns with the upstream C++ implementation, but there are some differences:
    • ripser.py supports lower-star filtrations, and for that, non-zero vertex births need to be supported; as far as I understood, this isn't the case in the upstream implementation.
    • The upstream implementation takes a slightly different approach to setting the maximal dimension: it uses dim_max(std::min(_dim_max, index_t(dist.size() - 2))), whereas ripser.py simply uses dim_max(_dim_max). The reason is that if the user asks for up to dimension 5 (for example) and the upstream rule clamps dim_max to, say, 3, the output will only contain 3 dimensions instead of 5, because the two missing dimensions have empty barcodes.
    • Currently in ripser.py, the ratio parameter isn't exposed to Python. From the upstream documentation, ratio is used as follows: "ratio r: only show persistence pairs with death/birth ratio > r." To support this in ripser.py, it would be necessary to expose this parameter. In ripser.py, ratio is set to 1.
    • Maybe it's trivial, but the source is indented with my own personal rules; I use clang-format. I think that for future changes it would be good to use a defined indentation style, which is much easier to maintain :). All this to say that I couldn't reproduce the upstream indentation because I did not find its configuration.
  • About Robinhood: the upstream repository has optional support for Google's sparsehash; I just find Robinhood easier to maintain as a dependency than Google's sparsehash. If my memory doesn't fail me, just replacing std::unordered_map with Robinhood upstream won't compile, because in one place in the code an insert does not support the constructor used. But I'm not 100% sure; this was some time ago. In any case, nothing difficult to fix.
  • As for helping to maintain it: sure, I'm more than happy to do it, but it will be done in my spare time ...

@bdice
Collaborator

bdice commented Oct 7, 2020

@MonkeyBreaker Thanks for the helpful insight! I wasn't aware that there were other changes in this repo's copy of the C++ code. Since that's the case, my intent of aligning the two C++ implementations may not be a realistic goal. Thanks again for the PR and for thinking about the questions I raised. I'll let project maintainers finalize and merge this PR.

@sauln
Member

sauln commented Oct 8, 2020

Alright, let's ship it! @MonkeyBreaker could you update the version number (https://github.com/scikit-tda/ripser.py/blob/master/ripser/_version.py) to 0.6.0 and write a brief summary in the changelog (CHANGELOG.md), and then I'll merge the changes and redeploy everything.

@ctralie
Member

ctralie commented Oct 8, 2020 via email

@MonkeyBreaker
Contributor

MonkeyBreaker commented Oct 8, 2020

Hi !

@ctralie thank you for your feedback! I hope that despite everything going on in the US, all will go well.
Well, if we're ready to drop features (like ratio), we could maybe discuss in a separate issue whether we could/should remove unnecessary computation; I'm an optimization guy, and in my opinion work that we don't use at all should be removed. About the changes you made to ripser, in my opinion they were worth it.

Anyway, I do think because of this, it's probably more effort than it's
worth to keep this synced with the original ripser, but then again, there
are other capabilities there like persistent homology instead of
cohomology, where it's possible to actually extract representative cycles.
So I am open to discussion later. But for now, let's keep it its own thing.

I think we should be able to update the code to integrate these kinds of possibilities in the future, even if we differ a bit from the upstream repository. For me, it's important that we match the upstream repository as much as possible, so that new upstream changes are easy to integrate.

@sauln Let me proceed with the changes, but before this PR is merged, I would like everyone's opinion on one of the points I raised in the description of the PR:

Enclosing radius

I added a third table to the benchmark discussing the enclosing radius optimization. In ripser, this optimization is used when no threshold is set explicitly. In ripser.py, to enable the use of the enclosing radius, we need to set the threshold parameter to threshold=np.finfo(np.float32).max.
In ripser.py the default value of the threshold is infinity, meaning that the enclosing radius isn't used. But as described in the ripser paper, p. 11, section 4, input:
If no threshold is specified, the minimum enclosing radius of the input is used as a threshold, as suggested by Henselman-Petrusek [16]. Above that threshold the Vietoris–Rips complex is a simplicial cone with apex a minimizing point x, and so the homology remains trivial afterwards.
From what I understand, there's no point in computing PH above this radius, but maybe I am wrong?
One possibility would be to change the condition inside ripser so that the enclosing radius is also used as the threshold when the threshold is set to infinity. What do you think? I emailed Prof. Bauer directly, but I haven't heard back yet.

Currently, in order to use the enclosing radius optimization, we need to set the threshold in Python to np.finfo(np.float32).max. Otherwise it will use the one set by the user, or inf by default. As I said earlier, and from what I understood from the papers, computing homology beyond the enclosing radius won't output more barcodes. I think we should modify the condition here:

if (threshold == std::numeric_limits<value_t>::max() ||
    threshold == std::numeric_limits<value_t>::infinity()) {
  ....

@ctralie, @sauln, @bdice, @ulupo, what do you think about this? Should I update the code, or is there a reason to compute, in some cases, "To Infinity... and Beyond!"?

Best,
Julián

@ulupo
Contributor

ulupo commented Oct 9, 2020

@MonkeyBreaker knows my opinion on this issue from the conversations we've had on it, but to share with everybody else: I think we should make sure that the enclosing radius optimization is used when appropriate, as the performance benefits can be large (if slightly unpredictable).

In C++ ripser, what seems to happen in https://github.com/Ripser/ripser/blob/286d3696796a707eecd0f71e6377880f60c936da/ripser.cpp#L1022-L1039 is this:

  1. if the user does not pass a threshold via the --threshold option in the command line, the threshold is internally set to std::numeric_limits<value_t>::max() and the enclosing radius optimization is used;
  2. if the user passes std::numeric_limits<value_t>::infinity() explicitly as a threshold, no optimization is used.

I think 2 is a small unintentional design flaw. It seems clear to me that std::numeric_limits<value_t>::infinity() should also mean "we will not be using a threshold", and hence that the enclosing radius optimization should be used.

So I would be in favour of implementing @MonkeyBreaker's suggested modification of the if clause.

When interfacing with Python, are we sure that np.inf will be passed correctly by the binding code as std::numeric_limits<value_t>::infinity()?

@ulupo
Contributor

ulupo commented Oct 9, 2020

Additionally, I'd like to repeat a previous point I made: I think that with this update of the C++ backend one should be able to fully revert #104 and the lexicographic ordering should not be necessary. If this is the case, I suggest this is done in this PR or at least as part of the 0.6 release, and that an example such as the one I gave in #103 is added as a test to avoid regressions.

@sauln
Member

sauln commented Oct 9, 2020

@ulupo and @MonkeyBreaker, you've made a good case for modifying the behavior. Again, I am not as familiar with the C++ backend as I should be, so I will trust both of your judgements.

@ulupo Could you add regression tests and revert the changes in a follow up PR?

@ulupo
Contributor

ulupo commented Oct 9, 2020

@ulupo Could you add regression tests and revert the changes in a follow up PR?

Sure OK! 👍

@ctralie
Member

ctralie commented Oct 9, 2020 via email

@ulupo
Contributor

ulupo commented Oct 12, 2020

@MonkeyBreaker thanks for the extra commit! I repeat one small question I had, just to be sure:

When interfacing with Python, are we sure that np.inf will be passed correctly by the binding code as std::numeric_limits<value_t>::infinity()?

If yes, I have nothing more to add and leave it for the maintainers to decide on whether the state is good for merging.

@MonkeyBreaker
Contributor

@ulupo About infinity, I verified one thing: whether float infinity equals double infinity, and from my results that is the case.

About np.inf, the information I have is from the official NumPy documentation.

NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic (IEEE 754). This means that Not a Number is not equivalent to infinity. Also that positive infinity is not equivalent to negative infinity. But infinity is equivalent to positive infinity.

But I cannot find information about Python inf vs. C++ inf. At the moment, in cython or pybind11, np.inf and std::numeric_limits<value_t>::infinity() are equal. I think the best way to make sure this is always the case is to add a test.
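On the np.inf question, a quick Python-side sanity check might look like this (a sketch of such a test, under the assumption that value_t is a single-precision float on the C++ side):

```python
import numpy as np

# IEEE 754: single- and double-precision infinities compare equal,
# and Python's float("inf") is the same value as np.inf.
assert np.float32(np.inf) == np.float64(np.inf)
assert float("inf") == np.inf

# Casting np.inf to float32 (the assumed value_t) preserves infinity,
# and it stays strictly greater than the largest finite float32, so
# thresh=np.finfo(np.float32).max and thresh=np.inf remain distinguishable.
assert np.isinf(np.float32(np.inf))
assert np.float32(np.inf) > np.finfo(np.float32).max
```

This only checks the NumPy side, of course; a binding-level test would still be needed to confirm the value arriving in the C++ code.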

julian added 9 commits October 29, 2020 20:34
Signed-off-by: julian <julian.burellaperez@heig-vd.ch> (×9)
@MonkeyBreaker
Contributor

Well, I'll go to sleep; I'm struggling with Windows (yay ...).

With the new CI, the Windows job uses conda. So far so good, but for a reason I cannot find, it then uses the gcc.exe compiler by default. This implies that the flags should start with - and not /, while currently the flags are configured depending on the platform (Windows, Darwin, etc.).

I haven't found yet how to detect the compiler used for the compilation and, based on that, choose the correct flag format. I'll give it a try tomorrow or over the weekend.

Julián

julian added 6 commits October 30, 2020 13:18
Signed-off-by: julian <julian.burellaperez@heig-vd.ch> (×6)
@MonkeyBreaker
Contributor

Hurray! It seems to work now.

To make it work, I followed this answer on SO.
The "hack" I implemented is to create a setup.cfg into which I add the following two lines:

[build]
compiler=msvc

If you have another solution, please feel free to integrate it :)

Julián

julian added 3 commits October 30, 2020 17:00
Signed-off-by: julian <julian.burellaperez@heig-vd.ch> (×3)
@MonkeyBreaker
Contributor

Hi everyone,

As @ubauer requested, I benchmarked and also added some changes:

  • PACK now also works on Windows; see below for a short discussion of this
  • Replaced std::hash with robinhood::hash; the benchmark shows better performance.

About PACK on Windows: the Windows compiler requires that all data fields have the same type in order to pack correctly. To achieve this, I changed coefficient_t coefficient into index_t coefficient inside entry_t. This works because both are signed types. From my tests and observations, the performance and the behaviour are the same; please feel free to double-check.

robinhood has two different memory layouts. Currently I use the robinhood hashmap in "auto" mode to choose the memory layout. I benchmarked our case and got the following results:

| Dataset | Size | Threshold | Dim | Coeff | unordered_map [s] | unordered_flat_map [s] | unordered_node_map [s] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| sphere3 | 192 | – | 2 | 2 | 1.4 | 1.4 | 1.4 |
| dragon | 2000 | – | 1 | 2 | 2.6 | 2.6 | 2.7 |
| o3 | 1024 | 1.8 | 3 | 2 | 2.3 | 2.3 | 2.4 |
| random16 | 50 | – | 7 | 2 | 6.5 | 6.5 | 6.9 |
| fractal | 512 | – | 2 | 2 | 16.1 | 16.1 | 16.5 |
| o3 | 4096 | 1.4 | 3 | 2 | 57.9 | 57.9 | 58.4 |

unordered_map is the auto mode, and apparently in our case it chooses the best memory layout without any performance losses. From what I understood, this choice is made at compile time by detecting the type of key used inside the hashmap.
By the way, if these results seem slower than the ones at the beginning of the PR, it's because my computer was doing other heavy tasks, which I think slowed down the execution a bit.

Reading the robinhood documentation, I stumbled on a hash function that is directly available. The benchmark showed that this hash function provides a bit more speed-up :)

| Dataset | Size | Threshold | Dim | Coeff | std::hash [s] | robin_hood::hash [s] |
| --- | --- | --- | --- | --- | --- | --- |
| sphere3 | 192 | – | 2 | 2 | 1.4 | 1.4 |
| dragon | 2000 | – | 1 | 2 | 2.6 | 2.6 |
| o3 | 1024 | 1.8 | 3 | 2 | 2.3 | 2.3 |
| random16 | 50 | – | 7 | 2 | 6.5 | 6.2 |
| fractal | 512 | – | 2 | 2 | 16.1 | 15 |
| o3 | 4096 | 1.4 | 3 | 2 | 57.9 | 53.9 |

Please let me know if you have any question/suggestion.

@ubauer now that you have write access to the repository, feel free to add all the changes we discussed :)

Best,
Julián

@sauln
Member

sauln commented Oct 30, 2020

This is great! I'll give it one last review this weekend and give a few days for anyone else to make comments before shipping.

Thanks @MonkeyBreaker for all your work with this improvement 🙇

Collaborator

@bdice left a comment

Looks good to me. Really nice work, this took a lot of effort.

@sauln sauln merged commit f784e1f into scikit-tda:master Nov 2, 2020
@sauln
Member

sauln commented Nov 2, 2020

@MonkeyBreaker thank you for the hard work putting this together and your patience getting it merged! 0.6.0 is out. I still have a bit of work to do to get the documentation rolled out.

I was hoping to get one more brief PR from you! Could you add a blurb about the robinhood installation in the README and on the docs site? I think a copy of the benchmarking table would also be really helpful on the docs site!

Thank you :D

@MonkeyBreaker
Contributor

@sauln thank you for the merge !

Sure, about the installation and the benchmarking: let me prepare it, hopefully by the end of the week.

And also, thank you everyone for the already amazing work on the library :D

Julián

@ctralie
Member

ctralie commented Nov 2, 2020 via email

7 participants