ENH: add an axis argument to np.bincount
This adds an `axis` argument to np.bincount, which can now take multidimensional
arrays and count over an arbitrary subset of their axes. `axis` defaults to
`None`, which counts over all axes, i.e. over the flattened array.
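
A minimal sketch of how an `axis` argument of this kind could be normalized before counting. `normalize_axes` is a hypothetical helper, not a NumPy function. It assumes the rules stated in the updated docstring: an int or a tuple of ints, `None` meaning every axis, and `ValueError` for repeated or out-of-bounds entries.

```python
import numpy as np


def normalize_axes(axis, ndim):
    """Return `axis` as a sorted tuple of non-negative axis indices.

    Hypothetical helper: None selects every axis (the flattened array),
    an int selects one axis, and a tuple selects several.
    """
    if axis is None:
        return tuple(range(ndim))
    if isinstance(axis, (int, np.integer)):
        axis = (axis,)
    axes = []
    for ax in axis:
        if not -ndim <= ax < ndim:
            raise ValueError("axis %d is out of bounds for %d dimensions" % (ax, ndim))
        axes.append(int(ax) % ndim)  # map negative axes to non-negative ones
    if len(set(axes)) != len(axes):
        raise ValueError("repeated axis in `axis` argument")
    return tuple(sorted(axes))


print(normalize_axes(None, 3))    # (0, 1, 2)
print(normalize_axes(-1, 3))      # (2,)
print(normalize_axes((2, 0), 3))  # (0, 2)
```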

The shape of the output is computed by removing from the shape of `list` (the
input array) all dimensions in `axis`, and appending a dimension of length
`max(max(list) + 1, minlength)` to the end. `out[..., n]` holds the number of
occurrences of `n` along the selected axes at each remaining position.
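
To make the shape rule concrete, here is a pure-NumPy reference sketch of the unweighted case. `bincount_axis` is a hypothetical name, not part of NumPy, and it is not the iterator-based C code this commit adds. It moves the kept axes to the front, flattens the counted axes, and offsets each row's values so that a single call to the existing 1-D `np.bincount` counts all rows at once.

```python
import numpy as np


def bincount_axis(a, axis=None, minlength=0):
    """Unweighted bincount over selected axes (hypothetical helper).

    The axes in `axis` are removed from `a.shape` and a trailing axis of
    length max(a.max() + 1, minlength) is appended.
    """
    a = np.asarray(a)
    if a.dtype.kind not in "iu":
        raise TypeError("input must be an array of integers")
    if a.size and a.min() < 0:
        raise ValueError("input must not contain negative values")
    if axis is None:
        axis = tuple(range(a.ndim))
    elif np.isscalar(axis):
        axis = (axis,)
    if any(not -a.ndim <= ax < a.ndim for ax in axis):
        raise ValueError("axis out of bounds")
    axes = tuple(sorted(int(ax) % a.ndim for ax in axis))
    if len(set(axes)) != len(axes):
        raise ValueError("repeated axis")

    nbins = max(int(a.max()) + 1 if a.size else 0, minlength)
    kept = tuple(ax for ax in range(a.ndim) if ax not in axes)
    kept_shape = tuple(a.shape[ax] for ax in kept)
    n_rows = int(np.prod(kept_shape, dtype=np.intp))
    n_cols = int(np.prod([a.shape[ax] for ax in axes], dtype=np.intp))
    # Kept axes first, counted axes last, then flatten each group to one axis.
    rows = np.transpose(a, kept + axes).reshape(n_rows, n_cols).astype(np.intp)
    # Shift row r into bins [r * nbins, (r + 1) * nbins) so that one 1-D
    # np.bincount call counts every row independently.
    offsets = np.arange(n_rows, dtype=np.intp)[:, None] * nbins
    flat = np.bincount((rows + offsets).ravel(), minlength=n_rows * nbins)
    return flat.reshape(kept_shape + (nbins,))


x = np.arange(2 * 3).reshape(3, 2, 1)
print(bincount_axis(x, axis=(1, 2)))
# [[1 1 0 0 0 0]
#  [0 0 1 1 0 0]
#  [0 0 0 0 1 1]]
```

With `axis=None` the sketch reduces to `np.bincount(a.ravel(), minlength=...)`, matching the flattened-array default described above.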

If a `weights` argument is provided, it is broadcast against `list` before the
axes are removed. In this case, `axis` refers to the axes of `list` *before*
broadcasting, and `out[..., n]` holds the sum of `weights` over the selected
axes at all positions where `list` equals `n`.
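
A similar sketch for the weighted case. It assumes that `weights` and `list` follow NumPy's standard broadcasting rules, so broadcasting can only prepend axes to `list` and its axis numbers shift right by the difference in rank. `weighted_bincount_axis` is a hypothetical helper built only on the 1-D `np.bincount`, not the code in this commit.

```python
import numpy as np


def weighted_bincount_axis(a, weights, axis=None, minlength=0):
    """Weighted bincount over selected axes (hypothetical helper).

    `weights` is broadcast against `a`; `axis` indexes the axes of `a`
    *before* broadcasting.
    """
    a = np.asarray(a)
    if a.dtype.kind not in "iu":
        raise TypeError("input must be an array of integers")
    if a.size and a.min() < 0:
        raise ValueError("input must not contain negative values")
    weights = np.asarray(weights, dtype=np.float64)
    if axis is None:
        axis = tuple(range(a.ndim))
    elif np.isscalar(axis):
        axis = (axis,)
    if any(not -a.ndim <= ax < a.ndim for ax in axis):
        raise ValueError("axis out of bounds")
    axes = sorted(int(ax) % a.ndim for ax in axis)
    if len(set(axes)) != len(axes):
        raise ValueError("repeated axis")

    a_b, w_b = np.broadcast_arrays(a, weights)
    # Broadcasting prepends new leading axes, so the axes of `a` shift right.
    shift = a_b.ndim - a.ndim
    axes = tuple(ax + shift for ax in axes)

    nbins = max(int(a.max()) + 1 if a.size else 0, minlength)
    kept = tuple(ax for ax in range(a_b.ndim) if ax not in axes)
    kept_shape = tuple(a_b.shape[ax] for ax in kept)
    n_rows = int(np.prod(kept_shape, dtype=np.intp))
    n_cols = int(np.prod([a_b.shape[ax] for ax in axes], dtype=np.intp))
    order = kept + axes
    rows = np.transpose(a_b, order).reshape(n_rows, n_cols).astype(np.intp)
    wrows = np.transpose(w_b, order).reshape(n_rows, n_cols)
    # Shift row r into bins [r * nbins, (r + 1) * nbins) so that one 1-D
    # np.bincount call sums the weights of each row separately.
    offsets = np.arange(n_rows, dtype=np.intp)[:, None] * nbins
    flat = np.bincount((rows + offsets).ravel(), weights=wrows.ravel(),
                       minlength=n_rows * nbins)
    return flat.reshape(kept_shape + (nbins,))


x = [0, 1, 1, 2]
w = np.arange(8).reshape(4, 2)
print(weighted_bincount_axis(x, w.T).T)
# [[0. 1.]
#  [6. 8.]
#  [6. 7.]]
```

Here `x` has a single axis, so only the last axis of the broadcast `(2, 4)` shape is summed over, and the leading axis contributed by `w.T` is kept in the output.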

The general case is handled with nested iterators, but 1D cases take shortcuts
that avoid setting up an iterator, so they show no performance loss compared
with the previous version.

As a bonus, this PR also fixes numpy#823 by providing specialized functions to
find the maximum for every integer type. There are also specialized functions
for every integer type for plain and weighted counting.
jaimefrio committed Feb 19, 2014
1 parent db198d5 commit 0991272
Showing 5 changed files with 868 additions and 125 deletions.
80 changes: 58 additions & 22 deletions numpy/add_newdocs.py
@@ -4987,48 +4987,58 @@ def luf(lamdaexpr, *args, **kwargs):

add_newdoc('numpy.lib._compiled_base', 'bincount',
"""
bincount(x, weights=None, minlength=None)
bincount(list, weights=None, minlength=None, axis=None)
Count number of occurrences of each value in array of non-negative ints.
Count number of occurrences of each value in an array of non-negative ints
along the selected axes.
The number of bins (of size 1) is one larger than the largest value in
`x`. If `minlength` is specified, there will be at least this number
of bins in the output array (though it will be longer if necessary,
depending on the contents of `x`).
Each bin gives the number of occurrences of its index value in `x`.
If `weights` is specified the input array is weighted by it, i.e. if a
value ``n`` is found at position ``i``, ``out[n] += weight[i]`` instead
of ``out[n] += 1``.
Each bin gives the number of occurrences of its index value in `list`
along the axes in `axis`. If `weights` is specified, `list` is weighted
by it, i.e. if ``list[i] == n``, then ``out[n] += weights[i]`` instead of
``out[n] += 1``.
Parameters
----------
x : array_like, 1 dimension, nonnegative ints
list : array_like, nonnegative ints
Input array.
weights : array_like, optional
Weights, array of the same shape as `x`.
Weights, array broadcastable with `list`.
minlength : int, optional
.. versionadded:: 1.6.0
A minimum number of bins for the output array.
axis : int or tuple of ints, optional
.. versionadded:: 1.9.0
The axes of `list` over which to count. Defaults to counting over
the flattened array.
Returns
-------
out : ndarray of ints
out : ndarray of ints (or doubles if `weights` is specified)
The result of binning the input array.
The length of `out` is equal to ``np.amax(x)+1``.
Raises
------
ValueError
If the input is not 1-dimensional, or contains elements with negative
values, or if `minlength` is non-positive.
If the input contains negative values, if any entry in `axis` is
repeated or out of bounds, or if the input array and `weights`
cannot be broadcast together.
TypeError
If the type of the input is float or complex.
If the type of the input, `minlength` or `axis` is not an integer, or
if `weights` cannot be safely cast to a double.
See Also
--------
histogram, digitize, unique
Notes
-----
To determine the output shape, `list` and `weights` are broadcast
together, the axes in `axis` are removed from the broadcast shape, and
a dimension of size `max(max(list) + 1, minlength)` is appended to the
end. Note that the values in `axis` refer to the shape of `list` prior to
broadcasting, not after. The entry at position `out[..., n]` is the
(weighted) count of all instances of `n` in `list` along the selected axes.
Examples
--------
>>> np.bincount(np.arange(5))
@@ -5037,7 +5047,7 @@ def luf(lamdaexpr, *args, **kwargs):
array([1, 3, 1, 1, 0, 0, 0, 1])
>>> x = np.array([0, 1, 1, 3, 2, 1, 7, 23])
>>> np.bincount(x).size == np.amax(x)+1
>>> np.bincount(x).size == np.max(x)+1
True
The input array needs to be of integer dtype, otherwise a
@@ -5046,16 +5056,42 @@ def luf(lamdaexpr, *args, **kwargs):
>>> np.bincount(np.arange(5, dtype=np.float))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: array cannot be safely cast to required type
TypeError: input must be an array of integers
A possible use of ``bincount`` is to perform sums over
variable-size chunks of an array, using the ``weights`` keyword.
A possible use of `bincount` is to perform sums over
variable-size chunks of an array, using the `weights` keyword.
>>> w = np.array([0.3, 0.5, 0.2, 0.7, 1., -0.6]) # weights
>>> x = np.array([0, 1, 1, 2, 2, 2])
>>> np.bincount(x, weights=w)
array([ 0.3, 0.7, 1.1])
Using the `axis` argument allows (weighted) counting over multiple axes:
>>> x = np.arange(2*3).reshape(3, 2, 1)
>>> np.bincount(x, axis=(1, 2))
array([[1, 1, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 1]])
>>> w = x * x
>>> np.bincount(x, w, axis=(1, 2))
array([[ 0., 1., 0., 0., 0., 0.],
[ 0., 0., 4., 9., 0., 0.],
[ 0., 0., 0., 0., 16., 25.]])
With broadcasting, multidimensional `weights` can be summed:
>>> x = [0, 1, 1, 2]
>>> w = np.arange(8).reshape(4, 2)
>>> np.bincount(x, w.T).T
array([[ 0., 1.],
[ 6., 8.],
[ 6., 7.]])
Note that in this last example, `x` is broadcast to shape `(2, 4)`, but
counting is only done over the last axis, not both, because the `axis`
value (which defaults to all axes) refers to the shape of `x` prior to
broadcasting.
""")

add_newdoc('numpy.lib._compiled_base', 'ravel_multi_index',
2 changes: 1 addition & 1 deletion numpy/lib/bento.info
@@ -3,4 +3,4 @@ HookFile: bscript
Library:
Extension: _compiled_base
Sources:
src/_compiled_base.c
src/_compiled_base.c.src
2 changes: 1 addition & 1 deletion numpy/lib/setup.py
@@ -11,7 +11,7 @@ def configuration(parent_package='',top_path=None):


config.add_extension('_compiled_base',
sources=[join('src', '_compiled_base.c')]
sources=[join('src', '_compiled_base.c.src')]
)

config.add_data_dir('benchmarks')