ENH: pass counting array to bincount #22471
Comments
See also #7998 (which suggested an Overall, I think there is general agreement something like this would be great to have, but it needs someone to actually implement it (without slowing down Anyway, in the end I think it is mostly someone finding the time to do it...
@mhvk Thanks for pointing out these earlier issues. So this has come up a couple of times. It looks like there are four things that people want to change in
Implementing everything seems a bigger task to me. But implementing only points 3 and 4 while not attacking points 1 and 2 seems doable: accept only a 1d The question is: is there interest in the limited change? Or should a change try to cover everything?
@ntessore - I think the limited change that you suggest is what most people have asked for, and could very easily stand on its own!
OK, I'll give it a go when I have a moment. If there are any existing C API functions which have a similar sort of
The suggestion of
I think I've got something functional working for Just curious what y'all (@ntessore, @mhvk) think: rather than using
Agreed that perhaps Overall, I think I'm still somewhat in favor of
Sorry for dropping the ball on this. I managed to actually make some progress towards all of 1-4 using a rather trivial NpyIter implementation, except I haven't worked out how to support arbitrary dtypes without manually repeating the code, so it got stuck and forgotten. I only mention this because playing around with the code, it's pretty clear that beyond the "common interface" points that @mhvk raises, all combinations of
Proposed new feature or change:
I would like to propose adding the output array as an optional parameter to `bincount`. This makes `bincount` very useful when iteratively tallying large amounts of data with large indices.

After a bit of discussion on the mailing list, I think that a good interface for this might be a `bins` parameter, name debatable, that can either be an integer, which fixes the number of bins, or an array, to which the bin counts are then added. If there are indices larger than allowed by the `bins` specification, the function should probably raise a `ValueError`.

To see where this can be very useful, consider this example of making a high-resolution map from large batches of data:
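As a rough illustration of the pattern described here (not the code from the original post), the loop below tallies pixel indices batch by batch with the existing `np.bincount`. The names `accumulate_map`, `batches`, and `npix` are made up for this sketch.

```python
import numpy as np

def accumulate_map(batches, npix):
    """Tally pixel indices from many batches into one map of length npix.

    Hypothetical helper for illustration; `batches` is any iterable of
    1-d non-negative integer index arrays with values below npix.
    """
    total = np.zeros(npix, dtype=np.intp)
    for ipix in batches:
        # Every call allocates a fresh array of length npix,
        # which is then added to the running total and discarded.
        total += np.bincount(ipix, minlength=npix)
    return total
```

With a large `npix` and many small batches, the temporary array allocated on each iteration is the cost that the proposal aims to avoid.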
This would be trivially sped up:
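Since the proposed `bins` parameter does not exist in NumPy, the sketch below only emulates its intended behaviour for the array case: counts are added in place to an existing array, and out-of-range indices raise `ValueError`. The helper name `bincount_into` is hypothetical, and `np.add.at` is used purely to reproduce the semantics; it says nothing about how fast a native implementation would be.

```python
import numpy as np

def bincount_into(x, bins, weights=None):
    """Add the counts (or weights) of indices `x` to the 1-d array `bins` in place.

    Hypothetical stand-in for the proposed `np.bincount(x, bins=array)`.
    """
    x = np.asarray(x, dtype=np.intp)
    if x.ndim != 1 or bins.ndim != 1:
        raise ValueError("x and bins must be 1-dimensional")
    if x.size and (x.min() < 0 or x.max() >= bins.shape[0]):
        raise ValueError("index out of range for the given bins")
    # Unbuffered in-place add, so repeated indices are all counted.
    np.add.at(bins, x, 1 if weights is None else weights)
    return bins

# Usage: one preallocated map, updated batch by batch.
hires_map = np.zeros(4, dtype=np.intp)
for ipix in ([0, 1, 1], [3, 3, 0]):
    bincount_into(ipix, hires_map)
```

Note that in this sketch floating-point `weights` require a floating-point `bins` array, because the in-place addition does not cast floats down to integers.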
Loops like the above can be found a lot in e.g. astronomy.
As far as I know, there is no equivalent numpy functionality, and there isn't any fast alternative outside of writing the simplest of loops in C/Cython/numba/..., which is what some packages, including my own, have done many times over. But this exact loop is what `bincount` does under the hood, and the change still fits both the purpose and description of `bincount` nicely, without changing existing behaviour.

The only moral issue is of course what happens when both `minlength` and `bins` are given. I think `minlength` should then be silently ignored. (I actually cannot imagine a situation where you actually want to specify `minlength` and not `bins`, but that's beside the point.)
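To make the suggested precedence concrete, here is a small wrapper sketch, assuming the interface described above: `bins` may be an integer or an array, and `minlength` is silently ignored whenever `bins` is given. The function name `bincount_bins` is hypothetical, and the wrapper still allocates a temporary array internally, so it shows only the semantics and not the memory savings.

```python
import numpy as np

def bincount_bins(x, weights=None, minlength=0, bins=None):
    """Emulate the proposed `bins` argument on top of the existing np.bincount."""
    if bins is None:
        # No bins given: existing behaviour, minlength applies as usual.
        return np.bincount(x, weights=weights, minlength=minlength)
    # From here on, minlength is silently ignored.
    if isinstance(bins, (int, np.integer)):
        out = np.bincount(x, weights=weights, minlength=bins)
        if out.shape[0] > bins:
            raise ValueError("index exceeds the requested number of bins")
        return out
    counts = np.bincount(x, weights=weights)
    if counts.shape[0] > bins.shape[0]:
        raise ValueError("index exceeds the length of the bins array")
    # Add the new counts to the caller's array in place.
    bins[:counts.shape[0]] += counts
    return bins
```

For example, `bincount_bins([0, 1, 1], bins=4)` returns an array with exactly four bins, while passing an existing array as `bins` adds the new counts to it.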