
Suggestions on improving numpy.bincount #8495

Closed

rainwoodman opened this issue Jan 18, 2017 · 26 comments

@rainwoodman (Contributor)

  • Currently, numpy.bincount only allows scalar weights per bin. There was a stackoverflow question about supporting vector weights. Can we extend the function so that the shape of the weights propagates into the result?

  • Can we also add an 'out' argument to allow saving the result to an existing array? This will be useful for very sparsely populated bins and large input data.

rainwoodman changed the title from "Suggestions improving numpy.bincount" to "Suggestions on improving numpy.bincount" on Jan 18, 2017
@rgommers (Member)

An out keyword seems okay to me. For the vector weights, do you have a link to the stackoverflow question?

@seberg (Member) commented Jan 18, 2017

I admit it is way too slow and should be sped up, but for vector weights, np.add.at may be an OK option; it won't blow up the bincount implementation a lot.

@rainwoodman (Contributor, Author)

Yes, I looked up the example at

https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.ufunc.at.html

An implementation on top of np.add.at could solve both issues. We should also document advanced uses of ufunc.at like this; for example, it appears we could find the minimum per bin by replacing np.add with np.fmin.
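A minimal sketch of both ideas (the arrays here are made-up example data, not anything from this thread):

import numpy as np

index = np.array([0, 1, 1, 2, 2, 2])                   # bin of each sample
weights = np.array([[1., 2.], [3., 4.], [5., 6.],
                    [7., 8.], [9., 10.], [11., 12.]])  # vector weight per sample
nbins = index.max() + 1

# vector-weights "bincount": accumulate rows of weights into bins
out = np.zeros((nbins,) + weights.shape[1:])
np.add.at(out, index, weights)        # out[i] == sum of weights where index == i

# per-bin minimum: same pattern, swapping np.add for np.fmin
mins = np.full(nbins, np.inf)         # start from +inf since fmin has no identity
np.fmin.at(mins, index, weights[:, 0])

The same pattern should work with any binary ufunc (np.maximum, np.multiply, ...).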

@rainwoodman (Contributor, Author)

So, should I open a PR for this, or ask on the mailing list first?

@seberg (Member) commented Jan 20, 2017

Not sure what exactly you mean, but you can probably do whatever you feel is right. As I said, the problem with ufunc.at is that it is very slow, so implementing an out argument for bincount by deferring to it would probably be pretty useless.

@rainwoodman (Contributor, Author)

I see. You mean ufunc.at is very slow and should not be used for vector bincount, so this is far from trivial.

For the record.
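(The setup for index and weight isn't shown in this comment; presumably something along the lines of the later snippets:)

index = numpy.arange(100000, dtype='i8') % 100
weight = numpy.arange(100000, dtype='f8')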

In [14]:
out = numpy.zeros(index.max() + 1)
%timeit add.at(out, index, weight)
100 loops, best of 3: 18 ms per loop

In [18]:
out = numpy.zeros(index.max() + 1)
%timeit numpy.bincount(index, weights=weight, minlength=len(out))
1000 loops, best of 3: 607 µs per loop

@rainwoodman (Contributor, Author)

I took a look at the cause of the slowness of ufunc.at. Here is a hypothesis -- I don't know whether it makes sense or not:

Buffering is requested at [1]. However, the comment at [2] suggests that buffering is very slow when the subspace is small (is it 1 in this case?).

[1] https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L5384

[2] https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/mapping.c#L2534

@seberg (Member) commented Jan 21, 2017

Yes, that abuse is probably the biggest reason. I am aware of it, and maybe a bit at fault that it is there, but nobody ever bothered to try to get rid of it....

@seberg (Member) commented Jan 21, 2017

Note that in general you do need to convert types there, but you can do it much faster than that. You will never achieve bincount speeds, but I would not be surprised by a factor of 5-10 speedup for ufunc.at. More could be achieved if we somehow manage to give low-level access to the mapping iterator. But then you also need specializations, etc., etc.

@rainwoodman (Contributor, Author)

Indeed. After this patch [1], the no-cast path is 3x faster than the cast path. All test cases pass, but I dislike the catch-and-clear approach -- is there an NpyIter function that tests whether buffering is necessary for the given operands?

In [23]:
index = arange(100000, dtype='i8') % 100
weight = arange(100000, dtype='f4')
out = numpy.zeros(index.max() + 1, dtype='f8')
%timeit add.at(out, index, weight)
100 loops, best of 3: 7.91 ms per loop

In [24]:
index = arange(100000, dtype='i8') % 100
weight = arange(100000, dtype='f8')
out = numpy.zeros(index.max() + 1, dtype='f8')
%timeit add.at(out, index, weight)
100 loops, best of 3: 2.29 ms per loop

In [26]:
index = arange(100000, dtype='i8') % 100
weight = arange(100000, dtype='f4')
out = numpy.zeros(index.max() + 1, dtype='f4')
%timeit add.at(out, index, weight)
100 loops, best of 3: 2.3 ms per loop

[1]:

diff --git a/numpy/core/src/umath/ufunc_object.c b/numpy/core/src/umath/ufunc_object.c
index 52f11ee..db58268 100644
--- a/numpy/core/src/umath/ufunc_object.c
+++ b/numpy/core/src/umath/ufunc_object.c
@@ -5382,12 +5382,26 @@ ufunc_at(PyUFuncObject *ufunc, PyObject *args)
                         NPY_ITER_EXTERNAL_LOOP|
                         NPY_ITER_REFS_OK|
                         NPY_ITER_ZEROSIZE_OK|
+//                        NPY_ITER_BUFFERED|
+                        NPY_ITER_GROWINNER|
+                        NPY_ITER_DELAY_BUFALLOC,
+                        NPY_KEEPORDER, NPY_UNSAFE_CASTING,
+                        op_flags, dtypes,
+                        -1, NULL, NULL, buffersize);
+
+    if (iter_buffer == NULL) {
+        PyErr_Clear();
+        iter_buffer = NpyIter_AdvancedNew(nop, array_operands,
+                        NPY_ITER_EXTERNAL_LOOP|
+                        NPY_ITER_REFS_OK|
+                        NPY_ITER_ZEROSIZE_OK|
                         NPY_ITER_BUFFERED|
                         NPY_ITER_GROWINNER|
                         NPY_ITER_DELAY_BUFALLOC,
                         NPY_KEEPORDER, NPY_UNSAFE_CASTING,
                         op_flags, dtypes,
                         -1, NULL, NULL, buffersize);
+    }

@rainwoodman (Contributor, Author)

It is still 10x slower than bincount.

In [21]:
index = arange(100000, dtype='i8') % 10
weight = arange(100000, dtype='f4')
out = numpy.zeros(index.max() + 1)
%timeit numpy.bincount(index, weights=weight, minlength=len(out))
10000 loops, best of 3: 191 µs per loop

@seberg (Member) commented Jan 21, 2017

The correct approach there is to delete that iterator completely and replace it with individual element casts (at least for the case of individual elements), I believe (I did not read the code again, though). You will never get bincount speeds. To get close, you would have to iterate the way fancy indexing does, with at least a 1-D specialization. And that is not even possible too easily, because I doubt you have access to all you need through the public mapiter API.

@rainwoodman (Contributor, Author) commented Jan 25, 2017

I think we can probably avoid using MapIter at all. Are there other users of MapIter? Using MapIter here feels like a bit of a stretch. We can use an NpyIter on index and weights instead; roughly (see the Python sketch after this list):

  • To define bincount/at, random access into the out array must be possible; in the numpy data model, when it is possible it is fast (strides * index), and the dtype is also known.

  • Use a regular numpy iterator on the index and weights arrays, where the weights are cast to the dtype of out and the index is cast to an integer type (e.g. intp?).

  • We have to call the ufunc inner loop with a length of 1, because we do not have any better alternatives in the UFunc data structure.

  • Probably add a special code path for "+" and hope the compiler will hoist the condition out of the loop.
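A rough Python-level rendering of that plan (purely illustrative; ufunc_at_sketch is a made-up name, and the real implementation would live in C):

import numpy as np

def ufunc_at_sketch(ufunc, out, index, values):
    # cast once up front, as the NpyIter would do
    index = np.asarray(index, dtype=np.intp)
    values = np.asarray(values, dtype=out.dtype)
    for i, v in zip(index, values):
        # one length-1 "inner loop" call per element
        out[i] = ufunc(out[i], v)

# ufunc_at_sketch(np.add, out, index, weight) behaves like np.add.at(out, index, weight)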

@seberg (Member) commented Jan 25, 2017

You are not going to get advanced indexing easily without using MapIter; for a single index array, sure, you can/should avoid it for speed reasons.

@rainwoodman (Contributor, Author)

What does "single index array" mean? ndim = 1, or array[index]...?

@seberg (Member) commented Jan 25, 2017

Well, fancy indexing has some special cases for both (IIRC -- maybe not). Mostly the one-index, 1-D case is likely the important one, of course (if you assume that the part which is not indexed is large enough), but you can also branch off to all the cases, as advanced indexing does; that gives a lot of different inner loops, including some that avoid MapIter altogether.

Going all the way with that optimization will be quite a chunk of work, and may require getting more information out of MapIter. I don't know whether parts of it may be easier to do, such as the cases where advanced indexing avoids MapIter (maybe not avoiding it fully, but also not using it for iteration, etc.).

But you can get a good factor (similar to or bigger than what you already saw) just by fixing the buffering and specializing the single-item loop, still using MapIter...

@seberg (Member) commented Jan 25, 2017

I don't really want to get into too much detail, because I would have to reread the advanced indexing code myself, heh...

@rainwoodman (Contributor, Author)

Hi. Making the at method faster benefits all categorization operations, though, so rereading the advanced indexing code may be worthwhile.

On the other hand, I'd like to poke around at a relatively fast bincount with a signature like this:

def bincount(indices, weights, minlength=None, ufunc=np.add, out=None):

I'll start by avoiding MapIter. If the out array is really weird, we can resort to holding the result in a temporary before using MapIter or the like to copy it back to the output.
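A minimal sketch of what that signature could do, built on top of ufunc.at (this is the proposal being floated here, not anything in NumPy):

import numpy as np

def bincount(indices, weights, minlength=None, ufunc=np.add, out=None):
    indices = np.asarray(indices, dtype=np.intp)
    weights = np.asarray(weights)
    n = max(int(indices.max()) + 1, minlength or 0)
    if out is None:
        if ufunc.identity is None:
            # e.g. np.fmin has no identity; the caller must supply out
            raise ValueError("ufunc has no identity; pass out explicitly")
        # vector weights: trailing axes of weights propagate into the result
        out = np.full((n,) + weights.shape[1:], ufunc.identity,
                      dtype=weights.dtype)
    ufunc.at(out, indices, weights)
    return out

With ufunc=np.fmin and out=np.full(n, np.inf), this would give the per-bin minima discussed earlier.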

@seberg (Member) commented Jan 25, 2017

Honestly, I don't understand that signature. If you pass in the ufunc, you might as well use the np.add.at signature and put faster special cases into ufunc.at. I simply do not have the time to really think about how to efficiently implement ufunc.at in a nice way.

@seberg (Member) commented Jan 25, 2017

(Though I admit that it's a bit non-trivial that you need to prepare the out array first...)

@mhvk (Contributor) commented Jan 26, 2017

Just thought that I'd link back to some previous discussion (with many comments from @seberg ;-):

All of this suggests that both speed-ups of np.<ufunc>.at and generalizations of np.bincount (that keep its great performance) would be very welcome! (I'd certainly use a bincount version that could count with vector weights myself...)

@rainwoodman (Contributor, Author)

The reason I suggest tampering with bincount is that a copy of the out array may be necessary -- whereas ufunc.at is expected to be explicitly in-place.

If we work out a way to do bincount without the copy in the future, then ufunc.at can be replaced with the code in bincount.

Thanks @mhvk for the links. My proposal was actually raised in one of those threads, though it appears nobody actually implemented it.

@WarrenWeckesser (Member)

Another cross-reference: gh-9397

@mattip (Member) commented Feb 1, 2023

np.add.at got a performance boost in #23136. But it does not cover casting:

>>> import time, numpy as np
>>> index = np.arange(100000, dtype='i8') % 10
>>> weight = np.arange(100000, dtype='f4')
>>> out = np.zeros(index.max() + 1)
>>> t0 = time.perf_counter(); xxx = np.bincount(index, weights=weight, minlength=len(out)); t1 = time.perf_counter()
>>> t_bincount = t1 - t0; t_bincount
0.0013672089553438127
>>> t0 = time.perf_counter(); xxx = np.add.at(out, index, weight); t1 = time.perf_counter()
>>> t_at_casting = t1 - t0; t_at_casting
0.030268130998592824
>>> out = np.zeros(index.max() + 1, dtype='f4')
>>> t0 = time.perf_counter(); xxx = np.add.at(out, index, weight); t1 = time.perf_counter()
>>> t_at_fast = t1 - t0; t_at_fast
0.0010713650262914598
>>> t_at_casting / t_bincount, t_at_fast / t_bincount
(22.138628393478655, 0.7836146933531773)
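A workaround until casting is covered (assuming the same index/weight setup as above): cast the addend to the out dtype once up front, so np.add.at sees matching dtypes and takes the fast path:

>>> out = np.zeros(index.max() + 1)                    # f8 out with f4 weight would hit the slow path
>>> np.add.at(out, index, weight.astype(out.dtype))    # one explicit cast, then the fast path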

@mattip (Member) commented Feb 8, 2023

PR #23136 should make ufunc.at (for 1d, aligned arrays with no casting) as fast as bincount. Closing and opening a new issue to continue discussion there.
