-
-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experiment replacing Cython with Numba in _moments #2873
base: main
Are you sure you want to change the base?
Conversation
This is an experiment that replaces a small piece of SKImage's Cython code, skimage/measure/_moments_cy.py, with Numba. This is just here for discussion and to facilitate benchmarking. It should not be merged.
Hello @mrocklin! Thanks for submitting the PR.
|
I'm usually bullish on numba, but I recently saw that you can't "pip install datashader" because they are concerned about missing dependencies. From the README:
I would be reluctant on taking on this dependency until such concerns are addressed. As regards performance, my experience has been that Numba is marginally faster than comparable code, so I would be inclined to trust your benchmarks. A big upside you didn't mention, though, is that for debugging you can just do We might as well use this PR to discuss a roadmap for performant code in the future. I think we might actually be able to implement an optional dependency, where the Numba version is used if available, or the Cython version if it isn't, using the new Pure Python mode in Cython: http://cython.readthedocs.io/en/latest/src/tutorial/pure.html |
Just to be clear, my intention here isn't to push for the use of Numba within scikit-image, optional or otherwise. I was sitting with Stefan and we were talking about this topic. I thought I'd put something together strictly as a proof of concept in case others might find it interesting. |
I'm not sure I can explain this comment. It might be worth asking them on their issue tracker what their concerns are. From what I understand, the Numba wheels on PyPI are decently compatible everywhere, though again, I am not an expert here. |
@jni The datashader comment also doesn't make sense to me; do you have any further insight? With type annotations, we should be able to switch out numba and Cython to some extent; but what is the advantage of supporting both? |
@stefanv I don't have any further insight, you would have to contact the datashader team to find out more. All I know is that neither As to supporting both, I don't know where Numba is in terms of ARM support, but my intuition is that Cython/C support remains far more widespread across many platforms. (I just got a Raspberry Pi in the mail today, hence my sudden interest in ARM. =P) Additionally, debugging and profiling tools are also currently limited for Numba. If we get some performance regression like this one, we have very little recourse (other than becoming LLVM experts). Again, as I've said before, I'm bullish on Numba, but I think we should dip our toes in, rather than dive in head-first. |
@mrocklin thank you for your clarifications and for this experiment! |
AFAIK Numba does not yet support ARM or at least support for the platform is experimental. There is as armv7 PR from a year and a half ago, numba/numba#1968, which might be resurrected. If you want a up-to-date Python data science stack on the Raspberry Pi, berryconda can provide it. The |
@jni, the latest version of |
This is an experiment that replaces a small piece of Scikit-Image's Cython code,
skimage/measure/_moments_cy.pyx
, with Numba. Literally all I did was erase Cython type hinting and addnumba.jit
.This is just here to enable for discussion and benchmarking. It should not be merged.
My preliminary benchmarks on random data show that, unsurprisingly, this is not any faster than the existing Cython code. However those benchmarks are not very representative of real data. I think that we should try this with a real dataset. We might also consider playing with parallelism with
numba.jit(parallel=True)
ornumba.prange
(though I don't have any experience with these).The main advantage here is probably simpler code and a simpler development process. The main cost here is a significant dependency (numba) as well as JIT compilation times (the initial run of these functions seem to incur an additional cost of a few hundred milliseconds).
cc @stefanv @jni @sklam