Int32 overflow when creating array from large list #5783

Closed
psinger opened this issue Apr 22, 2015 · 15 comments · Fixed by #5784

Comments

@psinger

psinger commented Apr 22, 2015

When trying to create an array from a huge list, I receive the following error:

ValueError: negative dimensions are not allowed

After some debugging, I found that it seems to be an int32 overflow in NumPy's C code. You can quite easily reproduce the error as follows:

In [25]: l = range(2147483647)

In [26]: x = np.array(l)

In [27]: del x

In [28]: l = range(2147483648)

In [29]: x = np.array(l)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-29-ec8d2fa9a1ea> in <module>()
----> 1 x = np.array(l)

ValueError: negative dimensions are not allowed

@jaimefrio
Member

What numpy version? 32 or 64 bit?

@juliantaylor
Contributor

Hmm, that needs more than 128 GB of RAM to reproduce.
On what platform are you encountering this? Linux, Windows, 32-bit, 64-bit, ...?

@psinger
Author

psinger commented Apr 22, 2015

Python is 64bit so I assume numpy is also 64bit.

I am encountering it on Ubuntu, and yes, you need quite a lot of memory to reproduce it, so I guess my statement about it being easy to reproduce is wrong.

@juliantaylor
Contributor

The best machine I have access to has 128 GB, which is probably just not enough.

There is a similar problem in Python 3, where range is lazy rather than a list: array of a small range expands it, array(range(2**31)) gives you a one-element object array, and array(range(2**32)) fails with an error.

@psinger
Author

psinger commented Apr 22, 2015

The range function is not the issue. I first encountered this error when I tried to create an array from a very large hand-built list (actually while building a sparse SciPy matrix).

@jaimefrio
Member

The error is very likely being raised when creating the array to fill with the sequence here.

That points to the shape discovery code as the likely culprit. While it shouldn't be an issue if your C ints are 64 bits, this right here is wrong... The n in that line is defined as an int several lines before. Can you check if defining n to be npy_intp instead solves the issue?

I don't have the RAM to even dream of attempting this... :-(
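
To illustrate the suspected failure mode without the RAM, here is a minimal sketch (not NumPy's actual code) of what happens when a sequence length is stored in a 32-bit signed int. It uses ctypes.c_int32 as a stand-in for the C variable n; the helper name as_c_int32 is made up for this example, and it assumes the usual two's-complement wraparound.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def as_c_int32(value):
    """Return `value` as it reads back after being stored in a 32-bit signed int.

    ctypes does no overflow checking, so out-of-range values wrap around.
    """
    return ctypes.c_int32(value).value


# The two lengths from the report: the first fits, the second wraps negative.
for length in (INT32_MAX, INT32_MAX + 1):
    print(length, "->", as_c_int32(length))
```

A length of 2147483648 reads back as a negative number, which would be consistent with the "negative dimensions are not allowed" message once that value is used as a shape.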

@juliantaylor
Contributor

Nice find, that definitely looks wrong. int is 32 bits on all platforms I know of.

@jaimefrio
Member

Right, it's long that gets messed up by Windows, right? I always need to look that stuff up... Then I'm pretty sure we have a winner. Can you test it @psinger?

@njsmith
Member

njsmith commented Apr 22, 2015

Yeah, on Win64 long is 32 bit. On every other platform we care about, long is the same size as pointers.
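
For anyone who wants to check their own platform, the following sketch prints the widths of the C types discussed above using ctypes. It only reports what the local C ABI looks like; it does not inspect NumPy itself. The pointer-sized Py_ssize_t is shown because that is the kind of value an int cannot safely hold on 64-bit builds.

```python
import ctypes

# Sizes in bytes of the C types mentioned in this thread, on the current platform.
sizes = {
    "int": ctypes.sizeof(ctypes.c_int),
    "long": ctypes.sizeof(ctypes.c_long),
    "void *": ctypes.sizeof(ctypes.c_void_p),
    "Py_ssize_t": ctypes.sizeof(ctypes.c_ssize_t),
}

for name, size in sizes.items():
    print(f"{name:>10}: {8 * size} bits")
```

On a 64-bit Linux build one would expect int to be narrower than the pointer-sized types, which is the mismatch being discussed.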

@jaimefrio
Member

I found a couple other uses of int to store a Py_ssize_t and fixed those as well.

@psinger
Author

psinger commented Apr 23, 2015

I will try it out soon!

@njsmith
Member

njsmith commented Apr 23, 2015

I think this could be tested with more modest memory usage ("only" 16 gigabytes or so) using something like array([1] * 2**31)?
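
A rough back-of-the-envelope check of that figure, as a sketch under stated assumptions: a 64-bit build where each list slot is an 8-byte pointer, and a result array with a 64-bit integer dtype. Because [1] * 2**31 repeats a reference to a single int object, the list costs little beyond its pointer table.

```python
POINTER_BYTES = 8   # assumption: 64-bit build, one pointer per list slot
INT64_BYTES = 8     # assumption: the resulting array uses a 64-bit integer dtype
GIB = 2**30

n = 2**31  # number of elements in [1] * 2**31

list_gib = n * POINTER_BYTES / GIB   # pointer table of the list
array_gib = n * INT64_BYTES / GIB    # data buffer of the resulting array

print(f"list:  {list_gib:.0f} GiB")
print(f"array: {array_gib:.0f} GiB")
print(f"total: {list_gib + array_gib:.0f} GiB")
```

Under these assumptions the list alone accounts for the 16 GiB, and the array adds as much again unless a narrower dtype is requested.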

@seberg
Member

seberg commented Apr 23, 2015

If you avoid lists, you can be much more memory-efficient, but it is still incredibly slow (Python 2 here):

x = xrange(2**31)
arr = np.array(x, dtype=np.uint8)   # sure, not a safe cast...

The improved version is maybe this:

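import numpy as np
from numpy.testing import assert_raises

class l(object):
    """Sequence-like stub: reports a huge length but never yields an item."""
    def __init__(self, length):
        self.length = length
    def __len__(self):
        return self.length
    def __getitem__(self, item):
        raise AssertionError('Will not actually give you anything')

# Expect the stub's own AssertionError, not the "negative dimensions" ValueError.
assert_raises(AssertionError, np.array, l(2**31))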

This seems to run into similar issues (only checked on 1.8.2) and is instant, of course. That said, I am not sure whether we have list fast paths here, in which case the test would have to use real lists rather than a custom sequence-like type.

@psinger
Author

psinger commented Apr 23, 2015

@jaimefrio Seems to work!

@jaimefrio
Member

That would be nice if it worked, @seberg, but there is a call to PySequence_Fast in the code, so any sequence-like object will be converted to a list or tuple. I have not been able to reproduce @psinger's error with that object and current master, just the assertion error...
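
The effect of that conversion can be sketched in pure Python at a small scale. The function as_fast_sequence below is only a rough analogue of what PySequence_Fast is documented to do (return lists and tuples unchanged, otherwise build a list from the object's contents); it is not NumPy's code, and CountingSequence is a made-up helper that records which items get requested.

```python
class CountingSequence:
    """Hypothetical sequence-like type that records every item access."""

    def __init__(self, length):
        self.length = length
        self.accessed = []

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        self.accessed.append(index)
        if index >= self.length:
            raise IndexError(index)
        return index * 10


def as_fast_sequence(obj):
    """Rough Python analogue of PySequence_Fast (an approximation)."""
    if type(obj) in (list, tuple):
        return obj          # exact lists and tuples are passed through as-is
    return list(obj)        # anything else is materialized item by item


seq = CountingSequence(3)
materialized = as_fast_sequence(seq)
print(materialized)   # the items, copied into a real list
print(seq.accessed)   # every index that was requested along the way
```

Since every item is requested during the conversion, a stub whose __getitem__ always raises fails on the first access, which would explain seeing only the assertion error with such an object.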
