Universal function at indices must have fewer than 2**31 elements or else they are ignored #13286

Closed
erykoff opened this issue Apr 8, 2019 · 1 comment · Fixed by #13323

erykoff commented Apr 8, 2019

When running an accumulator (for example with np.add.at), the indices array must have fewer than 2**31 elements. Otherwise, the accumulator only uses the first np.int32(indices.size) elements of the indices. This fails on all NumPy versions up to and including 1.16.2.

Reproducing code example:

Note that the following code needs to be run on a machine with a lot of memory, roughly 64 GB or more.

This code does not work. The expectation is that accum.sum() equals inds.size, but it does not. You might notice that accum is mostly 0s.

import numpy as np

print(np.__version__)
np.random.seed(12345)

arr = np.arange(1000)
inds = np.random.choice(arr, size=2**32 - 1)
print(inds.size)

accum = np.zeros(arr.size, dtype=np.int64)
np.add.at(accum, inds, 1)
np.testing.assert_equal(accum.sum(), inds.size)

The following code does work, and seems to be the largest size that does:

import numpy as np

print(np.__version__)
np.random.seed(12345)

arr = np.arange(1000)
inds = np.random.choice(arr, size=2**31 - 1)
print(inds.size)

accum = np.zeros(arr.size, dtype=np.int64)
np.add.at(accum, inds, 1)

np.testing.assert_equal(accum.sum(), inds.size)
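
Since sizes below 2**31 work, one possible workaround on affected versions is to feed the indices to np.add.at in slices that each stay under that limit. The sketch below is only an illustration: `add_at_chunked` and its `chunk` parameter are hypothetical names, not part of NumPy, and it handles a scalar value only. It uses a small array and a small chunk size so that it runs without large amounts of memory.

```python
import numpy as np


def add_at_chunked(accum, inds, value, chunk=2**30):
    """Hypothetical helper: apply np.add.at in slices of at most `chunk` indices.

    Each slice stays below 2**31 elements as long as `chunk` does.
    Only a scalar `value` is supported in this sketch.
    """
    for start in range(0, inds.size, chunk):
        np.add.at(accum, inds[start:start + chunk], value)
    return accum


# Small-scale check; chunk is set artificially low to force several slices.
rng = np.random.RandomState(12345)
inds = rng.randint(0, 1000, size=10_000)
accum = np.zeros(1000, dtype=np.int64)
add_at_chunked(accum, inds, 1, chunk=1_024)

np.testing.assert_equal(accum.sum(), inds.size)
```

For pure counting like this, np.bincount(inds, minlength=arr.size) should give the same totals without going through np.add.at at all.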

Error message:

NumPy raises no error; the answer is simply wrong. The test code fails with:

AssertionError: 
Items are not equal:
 ACTUAL: 0
 DESIRED: 4294967295
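
The ACTUAL value of 0 is what one would expect if the number of indices were stored in a C 32-bit signed int before the loop starts. That is an assumption about the internals, not something confirmed in this report. The sketch below mimics the wrap-around with ctypes; `as_int32` and `iterations_run` are hypothetical helpers that model a countdown loop of the form `for (i = count; i > 0; i--)`.

```python
import ctypes


def as_int32(n):
    """Value of n after being stored in a C 32-bit signed int (wraps silently)."""
    return ctypes.c_int32(n).value


def iterations_run(n_indices):
    """Iterations of a hypothetical `for (i = count; i > 0; i--)` loop
    whose counter `i` is a 32-bit signed int."""
    return max(as_int32(n_indices), 0)


for n in (2**31 - 1, 2**31, 2**32 - 1, 2**32):
    print(n, as_int32(n), iterations_run(n))
```

Under this model 2**31 - 1 indices are processed in full, while 2**32 - 1 wraps to -1 and the loop body never runs, which matches an accumulator that stays at 0.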

Numpy/Python version information:

1.16.2 3.7.2 (default, Dec 29 2018, 06:19:36)
[GCC 7.3.0]

seberg commented Apr 9, 2019

A simpler reproducer:

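import numpy as np

arr = np.ones(1)  # single-element accumulator, starts at 1.0
indices = np.zeros(1, dtype=np.intp)
indices = np.broadcast_to(indices, 2**32)  # 2**32 copies of index 0, no extra memory

np.add.at(arr, indices, 1)  # should take very long, finishes quickly with wrong result
print(arr)
assert arr[0] == 2**32 + 1  # initial 1.0 plus 2**32 increments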

EDIT: Actually meant to write np.ones(1), fixed the "test"
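
The reproducer above avoids the large memory requirement because np.broadcast_to returns a read-only view with stride 0 instead of allocating 2**32 index values. A short check of that property, assuming a 64-bit build where np.intp can represent 2**32:

```python
import numpy as np

indices = np.zeros(1, dtype=np.intp)
view = np.broadcast_to(indices, (2**32,))

print(view.size)       # logical number of elements seen by np.add.at
print(view.strides)    # stride 0: every element aliases the same memory
print(indices.nbytes)  # bytes actually allocated for the index data
```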

seberg added a commit to seberg/numpy that referenced this issue Apr 13, 2019
The iteration variable has to be intp of course, unfortunately
a test is too slow to be practical (even as a slow test).

This code needs more refactoring, but since it is a minimal fix...

Closes gh-13286

charris pushed a commit to charris/numpy that referenced this issue Apr 16, 2019