Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiment replacing Cython with Numba in _moments #2873

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mrocklin
Copy link

This is an experiment that replaces a small piece of Scikit-Image's Cython code, skimage/measure/_moments_cy.pyx, with Numba. Literally all I did was erase Cython type hinting and add numba.jit.

This is just here to enable for discussion and benchmarking. It should not be merged.

My preliminary benchmarks on random data show that, unsurprisingly, this is not any faster than the existing Cython code. However those benchmarks are not very representative of real data. I think that we should try this with a real dataset. We might also consider playing with parallelism with numba.jit(parallel=True) or numba.prange (though I don't have any experience with these).

The main advantage here is probably simpler code and a simpler development process. The main cost here is a significant dependency (numba) as well as JIT compilation times (the initial run of these functions seem to incur an additional cost of a few hundred milliseconds).

cc @stefanv @jni @sklam

This is an experiment that replaces a small piece of SKImage's Cython
code, skimage/measure/_moments_cy.py, with Numba.

This is just here for discussion and to facilitate benchmarking.
It should not be merged.
@pep8speaks
Copy link

Hello @mrocklin! Thanks for submitting the PR.

Line 51:18: E225 missing whitespace around operator

@jni
Copy link
Member

jni commented Nov 10, 2017

I'm usually bullish on numba, but I recently saw that you can't "pip install datashader" because they are concerned about missing dependencies. From the README:

Datashader is not currently available on PyPI, to avoid broken or low-performance installations that come from not keeping track of C/C++ binary dependencies such as LLVM (required by Numba).

I would be reluctant on taking on this dependency until such concerns are addressed.

As regards performance, my experience has been that Numba is marginally faster than comparable code, so I would be inclined to trust your benchmarks. A big upside you didn't mention, though, is that for debugging you can just do NUMBA_DISABLE_JIT=1 and then use pdb/ipdb in all its glory.

We might as well use this PR to discuss a roadmap for performant code in the future. I think we might actually be able to implement an optional dependency, where the Numba version is used if available, or the Cython version if it isn't, using the new Pure Python mode in Cython:

http://cython.readthedocs.io/en/latest/src/tutorial/pure.html

@mrocklin
Copy link
Author

Just to be clear, my intention here isn't to push for the use of Numba within scikit-image, optional or otherwise. I was sitting with Stefan and we were talking about this topic. I thought I'd put something together strictly as a proof of concept in case others might find it interesting.

@mrocklin
Copy link
Author

Datashader is not currently available on PyPI, to avoid broken or low-performance installations that come from not keeping track of C/C++ binary dependencies such as LLVM (required by Numba).

I'm not sure I can explain this comment. It might be worth asking them on their issue tracker what their concerns are. From what I understand, the Numba wheels on PyPI are decently compatible everywhere, though again, I am not an expert here.

@stefanv
Copy link
Member

stefanv commented Nov 13, 2017

@jni The datashader comment also doesn't make sense to me; do you have any further insight?

With type annotations, we should be able to switch out numba and Cython to some extent; but what is the advantage of supporting both?

@jni
Copy link
Member

jni commented Nov 14, 2017

@stefanv I don't have any further insight, you would have to contact the datashader team to find out more. All I know is that neither pip install datashader nor conda install -c conda-forge datashader got me the latest version and I was annoyed. =P

As to supporting both, I don't know where Numba is in terms of ARM support, but my intuition is that Cython/C support remains far more widespread across many platforms. (I just got a Raspberry Pi in the mail today, hence my sudden interest in ARM. =P) Additionally, debugging and profiling tools are also currently limited for Numba. If we get some performance regression like this one, we have very little recourse (other than becoming LLVM experts).

Again, as I've said before, I'm bullish on Numba, but I think we should dip our toes in, rather than dive in head-first.

@jni
Copy link
Member

jni commented Nov 14, 2017

@mrocklin thank you for your clarifications and for this experiment!

@mrocklin
Copy link
Author

You might want to check in with @seibert , who is both a Numba dev and a huge Raspberry Pi fan. There are definitely Numba builds (and many other builds) for ARM. You might also want to see this talk by @jjhelmus

@jjhelmus
Copy link
Contributor

AFAIK Numba does not yet support ARM or at least support for the platform is experimental. There is as armv7 PR from a year and a half ago, numba/numba#1968, which might be resurrected.

If you want a up-to-date Python data science stack on the Raspberry Pi, berryconda can provide it. The rpi channel provides most of the packages you would likely want, including scikit-learn, for armv6 and armv7.

@jakirkham
Copy link
Contributor

@jni, the latest version of datashader should be available in conda-forge. We recently added the feedstock a few weeks back. If not, feel free to raise an issue at the feedstock.

Base automatically changed from master to main February 18, 2021 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants