For discussion: numba_scipy.stats #42

luk-f-a · 2020-05-17T18:21:42Z

Hi everyone!

This is meant as a way to gather feedback on the current status of numba_scipy.stats. I'm pinging people who have expressed interest in numba_scipy.stats and/or are involved in numba and scipy. I'd like to share what I've learned so far, and hopefully you'll share your perspective on this.

Since last year I've been looking into scipy.stats with a view to getting numba_scipy.stats started. I created a prototype in #16. It's viable, but the experience has led me to question the cost/benefit tradeoff of following that path.

The main technical complication with scipy.stats is that it is not function-based but object-based. It relies on several of Python's OO features, like inheritance and operator overloading. Numba has great support for function-based libraries like Numpy (or method-based ones, when the number of objects is limited). However, the support for classes (via jitclasses) is more limited and experimental. Outside of jitclasses, the only other option is to use the extending module, with the added effort that it implies.

The consequence of the above is that it will not be possible to fully imitate the behaviour of scipy.stats. At least not in the medium term, and not without a lot of work.

Even if jitclasses worked exactly like Python classes, scipy.stats has more than a hundred distributions, each with more than 10 methods. If we followed the approach numba takes for numpy, we are talking about 1000+ methods to rewrite. In some cases there will be performance improvements, but in others there won't.

Look at the following example:

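```python
import numpy as np
import numba as nb
from numba import njit
from scipy.stats import norm

def foo():
    k = 20000
    x = np.zeros(k)
    for m in range(100):
        x += norm.rvs(m, 1, size=k)
    return x

# assumes norm.rvs can be compiled in nopython mode, e.g. with the prototype from #16
foo_jit = njit(foo)

@njit
def bar():
    k = 20000
    x = np.zeros(k)
    for m in range(100):
        # drop into object mode to call scipy; the output type of y must be declared
        with nb.objmode(y='float64[:]'):
            y = norm.rvs(m, 1, size=k)
        x += y
    return x

%timeit foo()  # 66 ms ± 277 µs

foo_jit()  # first call triggers compilation
%timeit foo_jit()  # 73.7 ms ± 214 µs

bar()  # first call triggers compilation
%timeit bar()  # 65.8 ms ± 208 µs
```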

There's no performance improvement at all, because most of the work is already done in C. This will be the case for many scipy.stats functions.

To summarize, I see a few ways forward, each with pros and cons:

1. jitclass-based solution
   - pros: easy for people to contribute (not much more is needed than being competent with Python and having used numba before)
   - cons: won't replicate scipy's behaviour; we will regularly run into jitclass limitations and have to find workarounds; it will require thousands of man-hours to build. All that effort does not build anything new, just a copy of existing scipy features.
2. low-level numba extension (http://numba.pydata.org/numba-doc/latest/extending/low-level.html)
   - pros: should be able to reproduce all or most of scipy's behaviour
   - cons: harder to work with, which would increase the effort required and limit the number of contributors. All that effort does not build anything new, just a copy of existing scipy features.
3. objmode approach (no jitted solution)
   - pros: all existing features are immediately supported.
   - cons: all currently slow methods remain slow, and entering objmode adds overhead, both at runtime and in boilerplate code. This overhead might be reduced by the numba issues "Calling objectmode function from nopython mode" (numba#5461) and "Pass thru pyobjects" (numba#3282).

I personally lean towards option 3 at the moment. I might write some custom code that calls special functions if I really need performance. But I'm not feeling very attracted to the idea of re-implementing such a large module as scipy.stats.

It would be great to hear your perspective on this.

cc: @gioxc88 @francoislauger @stnatter @remidebette @rileymcdowell @person142 @stuartarchibald


LordGav · 2021-01-01T16:10:00Z

I'd really like to use stats.skew() and stats.kurtosis(). Is there any way to do that?

stnatter · 2021-01-04T02:36:37Z

None of these alternative routes seem to be particularly appealing. The upside seems very limited indeed. Thanks for your perspective @luk-f-a

Happy New Year!

HDembinski · 2021-01-26T18:41:36Z

To apply the method of maximum likelihood, fast implementations of the pdfs and cdfs are needed, so option 3 would not do. Currently there are speed gains of a factor of 100 if, for example, norm.cdf is replaced by a custom implementation based on the erf in scipy.special.cython_special.

HDembinski · 2021-02-02T16:50:18Z

I started a repository with fast implementations that work for me: https://github.com/HDembinski/numba-stats. It would be great to merge this into numba-scipy, but it is not straightforward, since I did not implement the scipy API; I just added fast versions of norm.pdf, etc.

For now, numba-stats wraps the special functions from scipy.special.cython_special independently of numba-scipy, but eventually, once numba-scipy is stable, I would prefer to depend on numba-scipy.

HDembinski · 2021-02-02T23:44:46Z

Adding to that: as mentioned before, the speed gains are dramatic. I see up to a factor of 100 in some cases, less for large arrays. There seems to be a very large call overhead in scipy. I added some benchmarks with pytest-benchmark to my repo; just run pytest and see what you get.

HDembinski · 2021-02-02T23:52:29Z

In my field (high energy physics), having fast stats translates directly into fast turn-around when developing non-linear fits, which are the default for us. The speed-up in the stats functions translates very nicely into equivalent speed-ups of the fits, which means we can build more complex fits and bootstrap our fit results.

luk-f-a · 2021-02-03T21:41:38Z

If you need fast code for stats and don't need to follow the scipy API, then rvlib is a good library. It's sadly unmaintained, but the code is there if you want to use it.

HDembinski · 2021-02-10T20:50:32Z

> if you need fast code for stats, and don't need to follow the scipy API, then rvlib is a good library.

Thank you for pointing this out. rvlib claims to have a better API than scipy, but I could not see that from a quick look. I really want numba-scipy to offer this functionality.

In the meantime, I have realized that wrapping scipy is not that hard: a lot of scipy's implementations just call some C function. I am puzzled why it is so slow if the actual work is done in C anyway.

luk-f-a · 2021-02-11T08:13:22Z

> I really want numba-scipy to offer this functionality.

What functionality? Fast pdf and cdf under a scipy API?

> I am puzzled why it is so slow if the actual work is done in C anyway.

There might be some work being done that you are not considering. You say that it's slow, are you comparing the speed to a pure C implementation?
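One way to see the extra work: in scipy, `norm.cdf` ends up calling the same C routine as `scipy.special.ndtr`, but it first goes through the generic `rv_continuous` machinery (argument parsing, validity checks, broadcasting and masking, all in Python). Timing both routes on a scalar isolates that overhead. The numbers depend on the machine and scipy version, so none are quoted here.

```python
import timeit

import numpy as np
from scipy import special
from scipy.stats import norm

x, loc, scale = 0.3, 1.0, 2.0

# the same value computed two ways: full distribution API vs. raw special function
via_stats = norm.cdf(x, loc, scale)
via_special = special.ndtr((x - loc) / scale)

n = 2000
t_stats = timeit.timeit(lambda: norm.cdf(x, loc, scale), number=n) / n
t_special = timeit.timeit(lambda: special.ndtr((x - loc) / scale), number=n) / n
print(f"norm.cdf:     {t_stats * 1e6:.2f} µs per call")
print(f"special.ndtr: {t_special * 1e6:.2f} µs per call")
```

For large arrays the fixed per-call cost is amortized, which would be consistent with the smaller speed-ups reported for large arrays.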

dlee992 · 2022-08-26T06:42:11Z

Hi, @luk-f-a. Using the current master code in this project, could I implement scipy.stats.truncnorm.rvs easily? I also found that JAX supports this API (rewritten using jax.random.truncated_normal and related functions) and achieves a 10-1000x speedup (it seems to use multithreading and vectorization? I don't know it well).

Based on your norm.rvs() example, where numba didn't achieve any speedup, I guess I have to give up the idea of using numba to speed up truncnorm.rvs() as well.

luk-f-a · 2022-08-28T21:00:05Z

@dlee992 I don't think you can take what I did for norm.rvs() as a reference for what could happen with truncnorm.rvs(). My argument was that:

1. It is impossible to fully replicate the scipy API, in particular under the constraint of "jit transparency", which is what numba has for numpy: functions "just work" inside a jitted function without the user making any adjustment.
2. Even getting close to the scipy API would require so much work that it is not justified by the performance gain.

It does not sound like either of these points applies to your case, so you shouldn't read too much into my conclusions; the starting point is not the same.

On the first point, it seems that you don't need an identical API or jit transparency. Please note that JAX supports the truncated_normal distribution, but this is different from saying that it supports the scipy API. It does not look to me like it does the latter, because the functions have different names. If you are willing to depart from scipy's API, there's a lot that numba can do.

Also, each function in the stats module has a different level of performance in scipy. Functions that are already fast, such as norm.rvs(), are unlikely to be sped up by Numba or JAX. I don't know how truncnorm.rvs is implemented, but if JAX can improve on it by 10-1000x, then scipy's implementation is not efficient. This means that my second point does not apply to your case either. There are performance gains to be achieved, even if it means re-implementing the method entirely. Since you were willing to do that work in JAX, the performance gain is clearly worth it to you, and there's nothing that stops Numba from achieving that performance.

Going back to your question:

> could I implement scipy.stats.truncnorm.rvs easily?

If you want to implement the functionality, i.e. build a function that produces the same results, the effort will be similar to building it in JAX. If you want to implement the functionality and replicate the API when called from normal Python code, i.e. a non-jit function that calls my_truncnorm.rvs(), which is itself jitted, then it won't be hard either; again, the effort is similar to building it in JAX.

If you want to perfectly replicate the scipy API and behaviour even when calling methods from inside another jit function, then yes, it will be hard. But the problem won't be the code of truncnorm.rvs; the problem is simulating the scipy API under jit transparency.

dlee992 · 2022-08-29T02:43:15Z

@luk-f-a, many thanks for this thoughtful and detailed explanation! Now I understand my situation much better. I will figure out the pros and cons of implementing it using numba. In fact, JAX uses multithreading and sometimes creates too many threads, even more than the max limit on Linux... Some JAX issues (e.g., google/jax#11168) confirm this, and the JAX developers don't seem to want to "fix" it.

matthewfeickert mentioned this issue on Feb 2, 2021: "Investigate possible speedups through Numba" (scikit-hep/pyhf#364, open)

dlee992 mentioned this issue on Aug 26, 2022: "numba-scipy is now accepting PRs, discuss what to focus on first!" (#10, open)
