bincount does not accept input of type > N.uint16 (Trac #225) #823
Comments
@stefanv wrote on 2006-08-03 Can be improved by changing to NPY_UINTP, but that still won't allow 64-bit integers. Is there a way to easily fix this for 32-bit systems?
Milestone changed to
@mwiebe wrote on 2011-03-23 This function still errors (with uint64 on a 64-bit machine). Bincount is a candidate to convert to using the iterator with buffering, since currently it will cause a copy if the input isn't contiguous and the right type.
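The failure described above can be reproduced with a sketch like the following (an assumption on my part: whether `np.bincount` raises on uint64 input, and the exact message, varies across NumPy versions, so the sketch handles both outcomes and shows the usual explicit-cast workaround):

```python
import numpy as np

# uint64 input: on affected NumPy versions, bincount refuses this because
# uint64 cannot be cast "safely" to the native signed index type (intp).
x = np.array([0, 1, 1, 2], dtype=np.uint64)

try:
    counts = np.bincount(x)
except TypeError as exc:
    # Typical failure mode: "Cannot cast array data from dtype('uint64') ..."
    print("bincount rejected uint64:", exc)
    # Workaround: values that fit in int64 can be cast explicitly first.
    assert x.max() <= np.iinfo(np.int64).max
    counts = np.bincount(x.astype(np.int64))

print(counts)  # counts[n] is the number of occurrences of n: [1 2 1]
```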
@bsouthey wrote on 2011-05-09 My C is not very good, but hopefully someone can correct the patch. The less obvious change (at least to me) was that PyArray_ContiguousFromAny was being called with PyArray_INTP rather than PyArray_UINTP. According to the comment on the bincount code in "numpy/lib/src/_compiled_base.c", the first argument must be an array of non-negative integers (relevant lines):
Consequently I changed bincount to use unsigned ints instead of signed ints. The patch is incorrect if the mxx and mnx functions can be used outside of bincount; in that case new functions would have to be declared.
Attachment added by @bsouthey on 2011-05-09: 0001-bincount-unsigned-ints.patch |
Attachment added by @bsouthey on 2011-05-09: 0002-Redo-bincount-signed-int-change.patch |
@bsouthey wrote on 2011-05-09 Okay, it still was not that simple! I had to change one PyObject variable into a PyArrayObject. This allowed getting the input dtype for PyArray_ContiguousFromAny. Both patches need to be applied because I do not know how to get a single patch with git without redoing everything - sorry!
This adds an axis argument to np.bincount, which can now take multidimensional arrays and do the counting over an arbitrary number of axes. `axis` defaults to `None`, which does the counting over all the axes, i.e. over the flattened array. The shape of the output is computed by removing from `list` (the input array) all dimensions in `axis`, and appending a dimension of length `max(max(list) + 1, minlength)` to the end. `out[..., n]` will hold the number of occurrences of `n` at the given position over the selected axes.

If a `weights` argument is provided, its shape is broadcast with `list` before removing the axes. In this case, `axis` refers to the axes of `list` *before* broadcasting, and `out[..., n]` will hold the sum of `weights` over the selected axes, at all positions where `list` takes the value `n`.

The general case is handled with nested iterators, but shortcuts that avoid setting up an iterator are provided for 1D cases, with no performance loss against the previous version. As a plus, this PR also solves numpy#823 by providing specialized functions for all integer types to find the max, as well as specialized functions for all integer types for counting and weighted counting.
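The proposed semantics can be sketched in pure NumPy (a hypothetical helper written for illustration; the PR itself implements this in C, and the name `bincount_axis` is my own):

```python
import numpy as np

def bincount_axis(a, axis=None, minlength=0):
    """Count occurrences of each value over the axes in `axis`,
    appending the bin dimension at the end, as the PR describes."""
    a = np.asarray(a)
    if axis is None:
        axis = tuple(range(a.ndim))
    elif np.isscalar(axis):
        axis = (axis,)
    nbins = max(int(a.max()) + 1, minlength) if a.size else minlength
    # Move the counted axes to the front and flatten them into one.
    moved = np.moveaxis(a, axis, range(len(axis)))
    flat = moved.reshape(-1, *moved.shape[len(axis):])
    # Output shape: input shape minus the counted axes, plus the bins.
    out = np.zeros(flat.shape[1:] + (nbins,), dtype=np.intp)
    for idx in np.ndindex(flat.shape[1:]):
        out[idx] = np.bincount(flat[(slice(None),) + idx], minlength=nbins)
    return out

a = np.array([[0, 1, 1],
              [2, 2, 1]])
print(bincount_axis(a, axis=0).shape)  # (3, 3): axis 0 removed, bins appended
print(bincount_axis(a))                # [1 3 2]: counting over the whole array
```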
Is this still an issue?
@OmerJog The original example doesn't fail:
but this does:
Reasons:
- np.bincount can't work with uint64 safely; open issues: numpy/numpy#17760 numpy/numpy#823 numpy/numpy#4384
- Not easy to exceed the largest 32-bit uint: 4294967295
- UTF-8 can have 4 bytes or 32 bits max
- Platform-independence is not the goal, since other parts of the module have 64-bit floats
- The function entropy.lziv_complexity now converts all valid input types to arrays of type np.uint32 for fast computation with Numba.
- For strings and lists of strings, UTF-8 numeric representations are obtained with the built-in ord, yielding integer arrays.
- Normalization uses np.bincount instead of set on NumPy arrays, a significant speedup on large arrays. A major reason for using 32-bit rather than 64-bit uints is that np.bincount can't work with uint64 safely. Several open issues about this: numpy/numpy#17760 numpy/numpy#823 numpy/numpy#4384
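A minimal sketch of the preprocessing described above (an assumption: this mimics the shape of the conversion, not the actual entropy.lziv_complexity source; the variable names are illustrative):

```python
import numpy as np

s = "abracadabra"

# Built-in ord() gives the Unicode code point of each character; the
# maximum (0x10FFFF) fits comfortably in an unsigned 32-bit integer,
# so np.uint32 is sufficient for any string input.
codes = np.array([ord(c) for c in s], dtype=np.uint32)

# Counting distinct symbols with bincount instead of set(); the cast to
# int64 sidesteps the unsafe-cast issue bincount has with wide unsigned
# types (numpy/numpy#823).
counts = np.bincount(codes.astype(np.int64))
n_symbols = np.count_nonzero(counts)
print(n_symbols)  # 5 distinct symbols: a, b, r, c, d
```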
Original ticket http://projects.scipy.org/numpy/ticket/225 on 2006-08-03 by @stefanv, assigned to @teoliphant.
Under r2944: