Experiment replacing Cython with Numba in _moments #2873

mrocklin · 2017-11-10T01:21:49Z

This is an experiment that replaces a small piece of Scikit-Image's Cython code, skimage/measure/_moments_cy.pyx, with Numba. Literally all I did was erase Cython type hinting and add numba.jit.

This is just here to enable for discussion and benchmarking. It should not be merged.

My preliminary benchmarks on random data show that, unsurprisingly, this is not any faster than the existing Cython code. However those benchmarks are not very representative of real data. I think that we should try this with a real dataset. We might also consider playing with parallelism with numba.jit(parallel=True) or numba.prange (though I don't have any experience with these).

The main advantage here is probably simpler code and a simpler development process. The main cost here is a significant dependency (numba) as well as JIT compilation times (the initial run of these functions seem to incur an additional cost of a few hundred milliseconds).

cc @stefanv @jni @sklam

This is an experiment that replaces a small piece of SKImage's Cython code, skimage/measure/_moments_cy.py, with Numba. This is just here for discussion and to facilitate benchmarking. It should not be merged.

pep8speaks · 2017-11-10T01:21:51Z

Hello @mrocklin! Thanks for submitting the PR.

In the file skimage/measure/_moments_cy.py, following are the PEP8 issues :

Line 51:18: E225 missing whitespace around operator

jni · 2017-11-10T02:20:42Z

I'm usually bullish on numba, but I recently saw that you can't "pip install datashader" because they are concerned about missing dependencies. From the README:

Datashader is not currently available on PyPI, to avoid broken or low-performance installations that come from not keeping track of C/C++ binary dependencies such as LLVM (required by Numba).

I would be reluctant on taking on this dependency until such concerns are addressed.

As regards performance, my experience has been that Numba is marginally faster than comparable code, so I would be inclined to trust your benchmarks. A big upside you didn't mention, though, is that for debugging you can just do NUMBA_DISABLE_JIT=1 and then use pdb/ipdb in all its glory.

We might as well use this PR to discuss a roadmap for performant code in the future. I think we might actually be able to implement an optional dependency, where the Numba version is used if available, or the Cython version if it isn't, using the new Pure Python mode in Cython:

http://cython.readthedocs.io/en/latest/src/tutorial/pure.html

mrocklin · 2017-11-10T07:18:23Z

Just to be clear, my intention here isn't to push for the use of Numba within scikit-image, optional or otherwise. I was sitting with Stefan and we were talking about this topic. I thought I'd put something together strictly as a proof of concept in case others might find it interesting.

mrocklin · 2017-11-10T07:19:37Z

Datashader is not currently available on PyPI, to avoid broken or low-performance installations that come from not keeping track of C/C++ binary dependencies such as LLVM (required by Numba).

I'm not sure I can explain this comment. It might be worth asking them on their issue tracker what their concerns are. From what I understand, the Numba wheels on PyPI are decently compatible everywhere, though again, I am not an expert here.

stefanv · 2017-11-13T05:47:57Z

@jni The datashader comment also doesn't make sense to me; do you have any further insight?

With type annotations, we should be able to switch out numba and Cython to some extent; but what is the advantage of supporting both?

jni · 2017-11-14T00:06:58Z

@stefanv I don't have any further insight, you would have to contact the datashader team to find out more. All I know is that neither pip install datashader nor conda install -c conda-forge datashader got me the latest version and I was annoyed. =P

As to supporting both, I don't know where Numba is in terms of ARM support, but my intuition is that Cython/C support remains far more widespread across many platforms. (I just got a Raspberry Pi in the mail today, hence my sudden interest in ARM. =P) Additionally, debugging and profiling tools are also currently limited for Numba. If we get some performance regression like this one, we have very little recourse (other than becoming LLVM experts).

Again, as I've said before, I'm bullish on Numba, but I think we should dip our toes in, rather than dive in head-first.

jni · 2017-11-14T00:08:16Z

@mrocklin thank you for your clarifications and for this experiment!

mrocklin · 2017-11-14T00:30:37Z

You might want to check in with @seibert , who is both a Numba dev and a huge Raspberry Pi fan. There are definitely Numba builds (and many other builds) for ARM. You might also want to see this talk by @jjhelmus

jjhelmus · 2017-11-14T17:50:58Z

AFAIK Numba does not yet support ARM or at least support for the platform is experimental. There is as armv7 PR from a year and a half ago, numba/numba#1968, which might be resurrected.

If you want a up-to-date Python data science stack on the Raspberry Pi, berryconda can provide it. The rpi channel provides most of the packages you would likely want, including scikit-learn, for armv6 and armv7.

jakirkham · 2017-11-14T19:46:30Z

@jni, the latest version of datashader should be available in conda-forge. We recently added the feedstock a few weeks back. If not, feel free to raise an issue at the feedstock.

Replace Cython with Numba in _moments_cy.pyx

d56ebff

This is an experiment that replaces a small piece of SKImage's Cython code, skimage/measure/_moments_cy.py, with Numba. This is just here for discussion and to facilitate benchmarking. It should not be merged.

soupault added the 💬 Discussion label Nov 10, 2017

Base automatically changed from master to main February 18, 2021 18:22

mkcor mentioned this pull request Feb 22, 2021

2021's calendar of community management #5169

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Experiment replacing Cython with Numba in _moments #2873

Experiment replacing Cython with Numba in _moments #2873

mrocklin commented Nov 10, 2017

pep8speaks commented Nov 10, 2017

jni commented Nov 10, 2017

mrocklin commented Nov 10, 2017

mrocklin commented Nov 10, 2017

stefanv commented Nov 13, 2017

jni commented Nov 14, 2017

jni commented Nov 14, 2017

mrocklin commented Nov 14, 2017

jjhelmus commented Nov 14, 2017

jakirkham commented Nov 14, 2017

Experiment replacing Cython with Numba in _moments #2873

Are you sure you want to change the base?

Experiment replacing Cython with Numba in _moments #2873

Conversation

mrocklin commented Nov 10, 2017

pep8speaks commented Nov 10, 2017

jni commented Nov 10, 2017

mrocklin commented Nov 10, 2017

mrocklin commented Nov 10, 2017

stefanv commented Nov 13, 2017

jni commented Nov 14, 2017

jni commented Nov 14, 2017

mrocklin commented Nov 14, 2017

jjhelmus commented Nov 14, 2017

jakirkham commented Nov 14, 2017