Suggestions on improving numpy.bincount #8495
I admit it is way too slow and should be sped up, but for vector weights using ...
Yes, I looked up the example at https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.ufunc.at.html. An implementation with ...
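For context on why ufunc.at comes up at all here: buffered fancy-index assignment silently drops repeated indices, while np.add.at applies the operation unbuffered, once per index. A minimal demonstration:

```python
import numpy as np

a = np.zeros(3)
idx = np.array([0, 0, 1])

# Buffered fancy-index assignment: the duplicate index 0 is written
# only once, so one of the repeated contributions is lost.
a[idx] += 1.0
print(a)  # [1. 1. 0.]

b = np.zeros(3)
# np.add.at performs unbuffered in-place addition, once per index,
# so duplicate indices accumulate as a bincount would.
np.add.at(b, idx, 1.0)
print(b)  # [2. 1. 0.]
```

This correctness property is what makes ufunc.at the natural candidate for a generalized bincount, despite its speed problems discussed below.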
So, a PR for this, or ask on the mailing list first?
Not sure what exactly you mean, but you can probably do whatever you feel is right. As I said, the problem with ...
I see. You meant that ufunc.at is very slow and should not be used for vector bincount. So this is far from trivial. For the record.
I took a look at the cause of the slowness. Buffering is requested at [1]. However, the comment at [2] suggests buffering is very slow when the subspace is small (is it 1 in this case?).
[1] https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L5384
[2] https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/mapping.c#L2534
Yes, that abuse is probably the biggest reason. I am aware of it, and maybe a bit at fault that it is there, but nobody ever bothered to try to get rid of it.
Note that in general you do need to convert types there, but you can do it much faster than that. You will never achieve bincount speeds, but I would not be surprised by a factor of 5-10 speedup for ufunc.at. More could be achieved if we somehow manage to give low-level access to the mapping iterator. But then you also need specializations, etc., etc.
Indeed. After this patch [1], the path without a cast is 3x faster than the path with a cast. All test cases pass, but I dislike the catch-and-clear approach -- is there an NpyIter function that tests whether buffering is necessary for the given operands?
[1]:
It is still 10x slower than bincount.
The correct approach there is to delete that iterator completely and replace it with individual element casts (at least for the case where it is individual elements), I believe (did not read the code again, though). You will never get bincount speeds. To get close you would have to iterate like fancy indexing does, with at least a 1-D specialization. And even that is not easy, because I doubt you have access to all you need through the public mapiter API.
I think we can probably avoid using MapIter at all. Are there other users of MapIter? Using MapIter here feels like a bit of a stretch. We can use an NpyIter on index and weights instead.
You are not going to get advanced indexing easily without using mapiter; for a single index array, sure, you can/should avoid it for speed reasons.
What does ...
Well, fancy indexing has some special cases for both (IIRC, maybe not). Mostly the 1-index, 1-D case is likely the important one, of course (if you assume that the part which is not indexed is large enough), but you can also branch off to all the cases, as advanced indexing does; that gives a lot of different inner loops, including some that avoid MapIter altogether. Going all the way with that optimization will be quite a chunk of work, and may require getting more info out of MapIter. I dunno if parts of it may be easier to do, such as the cases where advanced indexing avoids MapIter (maybe even not avoiding it fully, but also not using it for iteration, etc.). But you can get a good factor (similar to or bigger than what you already saw) just by fixing the buffering and specializing the single-item loop while still using MapIter.
I don't really want to get into too much detail, because I would have to reread the advanced indexing code myself, heh...
Hi. Making the ... On the other hand, I'd like to poke around at a relatively fast bincount like this:
I'll start by avoiding MapIter. If the out array is really weird, then we can resort to holding the result in a temp before using MapIter or the like to copy it back to the output.
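The temp-buffer idea above can be sketched in pure Python as follows. This is a hypothetical helper (`add_at_via_temp` is not a NumPy API): accumulate into a contiguous temporary in the weights' dtype, where ufunc.at hits its fast no-casting path, and pay the cast back to the output's dtype only once at the end.

```python
import numpy as np

def add_at_via_temp(out, index, weights):
    # Hypothetical sketch of the temp-buffer approach discussed above.
    # Accumulate into a fresh temporary whose dtype matches the weights,
    # so np.add.at needs no per-element casting ...
    tmp = np.zeros(out.shape, dtype=np.result_type(weights))
    np.add.at(tmp, index, weights)
    # ... then cast/copy back into `out` in a single vectorized pass.
    out += tmp
    return out

out = np.zeros(3, dtype='f8')
add_at_via_temp(out, np.array([0, 0, 2]), np.array([1.0, 2.0, 3.0], dtype='f4'))
print(out)  # [3. 0. 3.]
```

This mirrors the timing session later in the thread, where np.add.at with matching dtypes is roughly as fast as bincount while the casting path is far slower.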
Honestly, I don't understand that signature. If you pass in the ufunc, might as well use the ...
(though I admit that it's a bit non-trivial that you need to prepare the out array first...)
Just thought that I'd link back to some previous discussion (with many comments from @seberg ;-):
All of this suggests both speed-ups of ...
The reason I suggest tampering with bincount is that a copy of the out array may be necessary -- however, we do expect ... If we work out a way of doing bincount without a copy in the future, then ufunc.at can be replaced with the code in bincount. Thanks mhvk for the links. My proposal was actually raised in one of those threads, though it appears nobody actually implemented it.
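To illustrate the copy being discussed: bincount has no out parameter, so it always allocates a fresh result array, and accumulating into an existing array costs an extra temporary plus an add:

```python
import numpy as np

index = np.array([0, 2, 2, 1])
weight = np.array([1.0, 2.0, 3.0, 4.0])
out = np.ones(3)  # a pre-existing output we want to accumulate into

# np.bincount always returns a newly allocated array; folding it into
# `out` therefore requires this extra temporary and an elementwise add.
out += np.bincount(index, weights=weight, minlength=len(out))
print(out)  # [2. 5. 6.]
```

An out argument on bincount itself would remove both the temporary and the extra pass for the sparse-bins / large-input case mentioned in the issue.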
Another cross-reference: gh-9397 |
>>> import time, numpy as np
>>> index = np.arange(100000, dtype='i8') % 10
>>> weight = np.arange(100000, dtype='f4')
>>> out = np.zeros(index.max() + 1)
>>> t0 = time.perf_counter(); xxx = np.bincount(index, weights=weight, minlength=len(out)); t1 = time.perf_counter()
>>> t_bincount = t1 - t0; t_bincount
0.0013672089553438127
>>> t0 = time.perf_counter(); xxx = np.add.at(out, index, weight); t1 = time.perf_counter()
>>> t_at_casting = t1 - t0; t_at_casting
0.030268130998592824
>>> out = np.zeros(index.max() + 1, dtype='f4')
>>> t0 = time.perf_counter(); xxx = np.add.at(out, index, weight); t1 = time.perf_counter()
>>> t_at_fast = t1 - t0; t_at_fast
0.0010713650262914598
>>> t_at_casting / t_bincount, t_at_fast / t_bincount
(22.138628393478655, 0.7836146933531773)
PR #23136 should make ufunc.at (for 1d, aligned arrays with no casting) as fast as bincount. Closing and opening a new issue to continue discussion there. |
Currently, numpy.bincount only allows scalar weights per bin. There was a Stack Overflow question about supporting vector weights. Can we extend the function so that the shape of the weights is propagated into the result?
Can we also add an 'out' argument to allow saving the result to an existing array? This will be useful for very sparsely populated bins and large input data.
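For concreteness, here is a sketch of the requested semantics built from np.add.at today. The helper name `bincount_vec` and its signature are hypothetical (this is not a NumPy API): weights may carry extra trailing dimensions, which propagate into the result, and an optional out array is accumulated into.

```python
import numpy as np

def bincount_vec(index, weights, minlength=0, out=None):
    # Hypothetical sketch of a vector-weight bincount with an `out`
    # argument; the trailing dimensions of `weights` are propagated.
    index = np.asarray(index)
    weights = np.asarray(weights)
    n = max(minlength, index.max() + 1 if index.size else 0)
    if out is None:
        out = np.zeros((n,) + weights.shape[1:], dtype=weights.dtype)
    # Unbuffered accumulation handles repeated bin indices correctly.
    np.add.at(out, index, weights)
    return out

idx = np.array([0, 2, 0, 1])
w = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
print(bincount_vec(idx, w, minlength=4))
# bin 0 receives rows 0 and 2 of w summed: [6., 8.]
```

As the thread discusses, this is correct but slow through today's ufunc.at; a native bincount extension could do the same accumulation at C speed.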