Numbagg: Fast N-dimensional aggregation functions with Numba
Currently accelerated functions:
Easy to extend
Numbagg makes it easy to write, in pure Python/NumPy, flexible aggregation functions accelerated by Numba. All the hard work is done by Numba's JIT compiler and NumPy's gufunc machinery (as wrapped by Numba).
For example, here is how we wrote nansum:
    import numpy as np
    from numbagg.decorators import ndreduce

    @ndreduce
    def nansum(a):
        asum = 0.0
        for ai in a.flat:
            if not np.isnan(ai):
                asum += ai
        return asum
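Because the kernel body is ordinary Python, it can be sanity-checked without Numba. The sketch below repeats the same loop without the decorator (the name nansum_kernel is illustrative, not part of Numbagg) and compares it with NumPy's built-in np.nansum. It says nothing about how ndreduce compiles or dispatches the function.

```python
import numpy as np


def nansum_kernel(a):
    # Same loop as the decorated version, left undecorated so it runs
    # as plain (slow) Python.
    asum = 0.0
    for ai in a.flat:
        if not np.isnan(ai):
            asum += ai
    return asum


x = np.array([[1.0, np.nan, 2.0], [np.nan, 3.0, 4.0]])
total = nansum_kernel(x)
# The plain-Python result should agree with NumPy's own NaN-aware sum.
matches_numpy = bool(np.isclose(total, np.nansum(x)))
```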
Advantages over Bottleneck
- Way less code. Easier to add new functions. No ad-hoc templating system. No Cython!
- Fast functions still work for >3 dimensions.
- The axis argument handles tuples of integers.
- ufunc broadcasting lets us supply an array for the window in moving window functions.
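To illustrate the last point, here is a minimal pure-NumPy sketch of gufunc-style broadcasting over the window argument. It uses np.vectorize with a signature as a slow stand-in for a Numba-compiled gufunc; the names _move_nanmean_1d and move_nanmean_sketch are hypothetical and this is not Numbagg's implementation.

```python
import numpy as np


def _move_nanmean_1d(a, window):
    # Trailing moving mean over a 1-D array, ignoring NaNs.
    # Positions without a full window are left as NaN.
    window = int(window)
    out = np.full(a.shape, np.nan)
    for i in range(window - 1, a.size):
        chunk = a[i - window + 1 : i + 1]
        valid = chunk[~np.isnan(chunk)]
        if valid.size:
            out[i] = valid.mean()
    return out


# Core signature: a 1-D array and a scalar window map to a 1-D array.
# Any extra dimensions on either argument are broadcast against each other.
move_nanmean_sketch = np.vectorize(_move_nanmean_1d, signature="(n),()->(n)")

a = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
windows = np.array([2, 3])
result = move_nanmean_sketch(a, windows)  # one output row per window size
```

Here a has no loop dimensions and windows has one of length 2, so the result has shape (2, 5).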
The functions in Numbagg are adapted from (and soon to be tested against) Bottleneck's battle-hardened Cython. Still, Numbagg is experimental, and probably not yet ready for production.
Initial benchmarks are quite encouraging. In many cases, Numbagg/Numba has competitive performance with Bottleneck/Cython:
    import numbagg
    import numpy as np
    import bottleneck

    x = np.random.RandomState(42).randn(1000, 1000)
    x[x < -1] = np.nan

    # timings with numba=0.15.1-20-gd877602 and bottleneck=0.8.0

    In : %timeit numbagg.nanmean(x)
    100 loops, best of 3: 2.39 ms per loop

    In : %timeit numbagg.nanmean(x, axis=0)
    100 loops, best of 3: 9.54 ms per loop

    In : %timeit numbagg.nanmean(x, axis=1)
    100 loops, best of 3: 2.77 ms per loop

    In : %timeit bottleneck.nanmean(x)
    100 loops, best of 3: 2.27 ms per loop

    In : %timeit bottleneck.nanmean(x, axis=0)
    100 loops, best of 3: 9.03 ms per loop

    In : %timeit bottleneck.nanmean(x, axis=1)
    100 loops, best of 3: 2.3 ms per loop
To see these performance numbers, you'll need to install the dev version of Numba, as Numba's handling of the .flat iterator was sped up considerably in a recent PR.
Numbagg includes somewhat awkward workarounds for features missing from NumPy/Numba:
- It implements its own cache for functions wrapped by Numba's guvectorize, because that decorator is rather slow.
- It does its own handling of array transposes to handle the axis argument, which we hope will eventually be directly supported by all NumPy gufuncs.
- It uses some terrible hacks to hide the out-of-bounds memory access necessary to write gufuncs that handle scalar values with Numba.
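The transpose-based handling of the axis argument can be sketched in plain NumPy as follows. This is an assumption-laden illustration rather than Numbagg's actual code: reduce_over_axes and nanmean_kernel are hypothetical names, and the Python loop over the kept dimensions stands in for what a compiled gufunc would do.

```python
import numpy as np


def nanmean_kernel(a):
    # Reduce an entire array to one scalar, skipping NaNs.
    total = 0.0
    count = 0
    for ai in a.flat:
        if not np.isnan(ai):
            total += ai
            count += 1
    return total / count if count else np.nan


def reduce_over_axes(kernel, a, axis=None):
    # Hypothetical helper: apply a whole-array kernel over one or more axes.
    a = np.asarray(a, dtype=float)
    if axis is None:
        return kernel(a)
    axes = (axis,) if isinstance(axis, (int, np.integer)) else tuple(axis)
    axes = tuple(ax % a.ndim for ax in axes)  # normalize negative axes
    n_keep = a.ndim - len(axes)
    # Transpose so that the axes being reduced come last.
    moved = np.moveaxis(a, axes, tuple(range(n_keep, a.ndim)))
    out = np.empty(moved.shape[:n_keep])
    # Call the kernel once per position along the kept dimensions.
    for index in np.ndindex(*moved.shape[:n_keep]):
        out[index] = kernel(moved[index])
    return out


x = np.arange(24.0).reshape(2, 3, 4)
x[0, 0, 0] = np.nan
result = reduce_over_axes(nanmean_kernel, x, axis=(0, 2))
```

Moving the reduced axes to the end means the kernel only ever sees the trailing block it should collapse, which is why a single whole-array kernel is enough to support an integer or a tuple of integers for axis.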
I hope that the need for most of these will eventually go away. In the meantime, expect Numbagg to be tightly coupled to Numba and NumPy release cycles.
MIT. Includes portions of Bottleneck, which is distributed under a Simplified BSD license.