
ENH: add initial and out parameters to bincount #22907

Closed

Conversation

@arthurfeeney commented Dec 31, 2022

This PR adds optional parameters initial and out to np.bincount, mostly based on discussion in #22471 and a prior attempt at this #9424.

  • initial is used to set the initial values for the output array.
  • out is an alternative output parameter. If initial is out, then np.bincount will reuse and directly accumulate onto out. (This allows bincount to reuse out across multiple calls, rather than overwriting it.)

It's a bit niche, but a feature like this has been requested several times in the past (#22471, #8495, #9424). It's sort of a faster alternative to np.add.at.

The intended use case is essentially to do bincounts over large chunks of data:

high_res_map = np.zeros(10000**2)
for indices, quantities in next_big_chunk_of_data():
    np.bincount(indices, quantities, initial=high_res_map, out=high_res_map)
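
For comparison, the same chunked accumulation can already be expressed with np.add.at. This is a sketch only: next_big_chunk_of_data is a hypothetical stand-in for the data source in the example above, and the sizes are made up.

```python
import numpy as np

# Hypothetical stand-in for the chunked data source in the example above.
def next_big_chunk_of_data():
    rng = np.random.default_rng(0)
    for _ in range(3):
        yield rng.integers(0, 100, size=1000), rng.random(1000)

high_res_map = np.zeros(100)
for indices, quantities in next_big_chunk_of_data():
    # add.at accumulates in place, so repeated indices all contribute --
    # the same effect as the proposed initial=high_res_map, out=high_res_map.
    np.add.at(high_res_map, indices, quantities)
```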

This is my first time trying to contribute to NumPy, so apologies in advance if I've missed something. Of course, I am happy to make any changes / improvements :-)

@arthurfeeney force-pushed the bincount-out-param branch 3 times, most recently from fc70b8e to 9f6e608 on December 31, 2022
The optional parameters initial and out allow bincount
to efficiently reuse and accumulate results across
multiple invocations.
@mattip (Member) commented Jan 1, 2023

Do we need both initial and out? They seem redundant: if I go to the bother of supplying an initial, it can just be out.

It's sort of a faster alternative to np.add.at

Did you run benchmarks on this new implementation? Also see #22889 which speeds up np.add.at by about 6x.

@mhvk (Contributor) left a comment

Had a first quick look at this, and I think the initial plus out idea actually works well! I have some first in-line comments on the docstring and the tests.

The main code is harder to review: it seems you made quite a few changes that may be improvements but are not essential for the addition of initial and out. I worry the changes in how strides are dealt with may make the code slower in some cases. I'll need to think a bit more, but my sense is that rather than check this exhaustively, it may be better to keep the code as similar as possible, i.e., not change list and weight, and only deal with strides for out. (For initial, it should not be necessary to worry about strides: in the case of initial_array != out_array, one can simply use an array copy, which would also take care of casting -- though the default of 0 should probably be treated separately.) Or at least split it into 2 commits? Anyway, probably good to have others have a look too!

@@ -912,22 +913,32 @@ def bincount(x, weights=None, minlength=None):
Weights, array of the same shape as `x`.
minlength : int, optional
A minimum number of bins for the output array.
initial: ndarray, 1 dimension, optional
Array of initial values for each bin. It must have the same shape and
buffer length as the expected output
Review comment:

I think ideally it would just broadcast to the output (i.e., effectively the default is initial=0).


.. versionadded:: 1.6.0

Returns
-------
out : ndarray of ints
The result of binning the input array.
The length of `out` is equal to ``np.amax(x)+1``.
The length of `out` is equal to ``max(minlength, np.amax(x)+1)``.
Review comment:

Strictly, it can be even larger if the input out has a larger size.

@@ -417,6 +417,8 @@ def bincount(
/,
weights: None | ArrayLike = ...,
minlength: SupportsIndex = ...,
initial: None | NDArray[Any] = ...,
Review comment:

So, I think scalar would be OK in principle.

{
npy_intp min = *data;
npy_intp max = *data;
*mn = *(npy_intp *)data;
Review comment:

Why these changes?

}

lst = (PyArrayObject *)PyArray_ContiguousFromAny(list, NPY_INTP, 1, 1);
if (lst == NULL) {
list_arr = _array_from_object(list_obj, NPY_INTP, 0);
Review comment:

I'm a bit worried this change to deal with strides might actually make the code slower in some cases -- it may be faster to copy the index and weights to new arrays in turn rather than loop over them with unequal stride.

out = np.empty(6, dtype=float)
assert_raises_regex(TypeError,
"Cannot cast",
lambda: np.bincount(x, out=out))
Review comment:

Should this be an error? In principle, casting is possible. But I can see that this would make things more complicated, so probably not worth it!

initial = np.empty(6, dtype=float)
assert_raises_regex(TypeError,
"Cannot cast",
lambda: np.bincount(x, initial=initial))
Review comment:

Here the case is a bit less clear, since one clearly could cast (if unsafely). This would be especially nice since one could then really just define a default of zero for the argument (initial=0), which would cast to float when weight is present.

x = np.array([1, 5, 2, 4, 1])
out = np.ones(6, dtype=int)
y = np.bincount(x, initial=out, out=out)
assert_array_equal(out, np.array([0, 2, 1, 0, 1, 1]) + 1)
Review comment:

Here, also test assert y is out.

Review comment:

Also, good to do a test with weight present too.

def test_with_initial_and_out2(self):
x = np.array([1, 5, 2, 4, 1])
out = np.zeros(6, dtype=int)
initial = np.ones(6, dtype=int)
Review comment:

One of the above tests would be good to do with initial=1 (which I feel should work too).


def test_with_strided_initial(self):
x = np.array([1, 5, 2, 4, 1])
initial = np.ones(12, dtype=int)
Review comment:

Maybe good to define an array with order, so we can see the slicing worked correctly (i.e., catch possible bugs with strides in the code). E.g., initial = np.arange(12).

@arthurfeeney (Author) replied:

Do we need both initial and out? They seem redundant, if I go to the bother of supplying an initial, it can just be out.

So, initial is kind of necessary. There needs to be some way of telling bincount to accumulate on top of the existing values in out rather than overwriting them. (By default, if initial=None, the out array will be zeroed before the actual binning algorithm is run.) So, I'm not sure I agree that it's redundant, but it could be implemented in a different way.
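
A minimal sketch of these semantics, emulated in pure NumPy rather than C (bincount_accumulate is a hypothetical helper name, not the PR's implementation; it assumes out is at least as long as the bincount result):

```python
import numpy as np

def bincount_accumulate(x, weights=None, minlength=0, initial=None, out=None):
    # Emulation of the proposed behavior, not the PR's C implementation.
    counts = np.bincount(x, weights=weights, minlength=minlength)
    if out is None:
        out = np.zeros(len(counts), dtype=counts.dtype)
    if initial is None:
        out[:] = 0              # default: out is simply overwritten
    elif initial is not out:
        out[:] = initial        # start from a copy of initial
    # if initial is out, keep the existing values and accumulate onto them
    out[:len(counts)] += counts
    return out

x = np.array([1, 5, 2, 4, 1])
buf = np.ones(6, dtype=np.intp)
bincount_accumulate(x, initial=buf, out=buf)
# buf now holds np.bincount(x, minlength=6) + 1
```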

Did you run benchmarks on this new implementation?

I did run the microbenchmarks: python runtests.py --bench-compare fe2c2a4cbb bench_function_base.Bincount shows either no significant performance difference or a slight improvement. The histogram benchmarks also show no difference.

Also see #22889 which speeds up np.add.at by about 6x.

Thank you for sharing this! I hadn't noticed that. Impressive speedup. I guess it makes this change to bincount less worthwhile :-( I can set up a comparison if you'd like?

The main code is a bit harder as it seems you made quite a few changes that may be improvements but are not essential for the addition of initial and out. [...] I'll need to think a bit more but my sense is that rather than check this exhaustively, it may be better to keep the code as similar as possible

I forgot to attribute it in the original post, but I actually built on top of #9424, rather than starting from scratch. So some of the code changes (mostly the work to handle strides) are actually from that old PR. I agree, it's probably safer to limit the changes for now. So, I can undo some of it.

@mhvk (Contributor) commented Jan 3, 2023

@arthurfeeney - I think it is better to do one thing at a time, but an alternative would be to at least make them separate commits. That would also give some credit to @jaimefrio.

@seberg (Member) commented Feb 16, 2023

Since bincount and add.at are now much more comparable in speed, should we close this? (See #5922 (comment))

I am not opposed to this, since I do think it is straightforward in its core. But it is also just a bit awkward with out= and minlength=. And at this point I suspect there are few use-cases left that np.add.at isn't (almost) as good at (or could be improved to be).

(There are many ways that add.at can be improved still – including deep ones – but most use-cases are likely reasonably covered at this point.)
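
For reference, the two spellings already give identical results for the weighted case; what this PR adds is only the in-place accumulation part (a minimal illustration):

```python
import numpy as np

x = np.array([1, 5, 2, 4, 1])
w = np.array([0.5, 1.0, 1.5, 2.0, 2.5])

# bincount allocates a fresh result array on every call ...
a = np.bincount(x, weights=w, minlength=6)

# ... while add.at accumulates into an existing buffer in place.
b = np.zeros(6)
np.add.at(b, x, w)

assert np.allclose(a, b)
```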

@seberg added the "57 - Close?" label on Feb 16, 2023
@mhvk (Contributor) commented Feb 16, 2023

@seberg - I think it makes sense, but it would be good to check that performance for the optimal case (contiguous, 1d) is now indeed comparable. If so, we should probably mention in the release notes, and perhaps even in the docstring of np.bincount, that we now recommend np.add.at (and then eventually either deprecate np.bincount or re-implement it in terms of np.add.at).

On the other hand, if np.bincount still is substantially (2x?) faster, then I think it is worth putting (a version of) this in.
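
A rough sketch of what re-implementing bincount on top of np.add.at could look like (bincount_via_add_at is a hypothetical name; input validation, casting rules, and the real bincount's dtype details are glossed over):

```python
import numpy as np

def bincount_via_add_at(x, weights=None, minlength=0):
    # Hypothetical re-implementation sketch, not NumPy's actual code.
    x = np.asarray(x)
    n = max(minlength, int(x.max()) + 1 if x.size else 0)
    if weights is None:
        out = np.zeros(n, dtype=np.intp)
        np.add.at(out, x, 1)          # plain counting
    else:
        out = np.zeros(n, dtype=np.float64)
        np.add.at(out, x, weights)    # weighted counting
    return out
```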

@mattip (Member) commented Feb 16, 2023

I think the two are asymptotically equivalent; bincount has much less overhead. If we add initial and out, that will add more overhead and make them even closer.

@seberg (Member) commented Feb 17, 2023

@arthurfeeney thanks so much for the PR. I am going to close it for now, since I am very sure your use-case is pretty well covered by np.add.at (a doc update would be nice).

Reimplementing may well make sense (or re-using the inner loop). Bincount is very limited in which dtypes it can take, but it has the advantage of being much easier to find/read for new users, I expect. And it has the auto-resizing feature.

@seberg seberg closed this Feb 17, 2023
Labels
01 - Enhancement · 57 - Close? (Issues which may be closable unless discussion continued)

4 participants