Int32 overflow when creating array from large list #5783

psinger · 2015-04-22T20:44:29Z

When trying to create an array from a huge list, I receive the following error:

ValueError: negative dimensions are not allowed

After some debugging, I found that it seems to be a problem with some INT32 overflow in the C code of numpy. So you can quite easily reproduce the error the following way:

In [25]: l = range(2147483647)

In [26]: x = np.array(l)

In [27]: del x

In [28]: l = range(2147483648)

In [29]: x = np.array(l)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-29-ec8d2fa9a1ea> in <module>()
----> 1 x = np.array(l)

ValueError: negative dimensions are not allowed
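
The jump from a working length of 2147483647 to a failing length of 2147483648 is what one would expect if the length passes through a 32-bit signed integer somewhere. A minimal sketch of that wraparound, using only the standard library's ctypes (which does no overflow checking on its integer types); the helper name as_int32 is made up for illustration and this is not NumPy's actual code path:

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2147483647, the largest length that worked above


def as_int32(n):
    """Return n as it would be stored in a 32-bit signed C integer."""
    return ctypes.c_int32(n).value


print(as_int32(INT32_MAX))      # still fits in 32 bits
print(as_int32(INT32_MAX + 1))  # wraps around to a negative value
```

A negative value of that kind, handed on as an array dimension, would produce exactly the "negative dimensions are not allowed" message.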

jaimefrio · 2015-04-22T21:04:08Z

What numpy version? 32 or 64 bit?

juliantaylor · 2015-04-22T21:05:37Z

mh that needs more than 128gb ram to reproduce
on what platform are you encountering this? linux, windows, 32 bit 64 bit,... ?

psinger · 2015-04-22T21:08:32Z

Python is 64bit so I assume numpy is also 64bit.

I am encountering it on Ubuntu and yeah you need quite some memory to reproduce, so I guess my statement about easy to reproduce is wrong.

juliantaylor · 2015-04-22T21:13:33Z

best machine I have access to has 128gb which is probably just not enough

there is a similar problem in python3, where range is a lazy sequence rather than a list: array of a small range expands it, array(range(2**31)) gives you a one-element object array, and array(range(2**32)) fails with an error

psinger · 2015-04-22T21:19:17Z

The range function is not the issue. I first encountered this error when I tried to create an array of a very large hand-created list (actually while building a sparse scipy matrix).

jaimefrio · 2015-04-22T22:09:57Z

The error is very likely being raised when creating the array to fill with the sequence here.

That points to the shape discovery code as the likely culprit. While it shouldn't be an issue if your C ints are 64 bits, this right here is wrong... The n in that line is defined as an int several lines before. Can you check if defining n to be npy_intp instead solves the issue?

I don't have the RAM to even dream of attempting this... :-(

juliantaylor · 2015-04-22T23:22:06Z

nice find, that definitely looks wrong, int is 32 bit on all platforms I know of

jaimefrio · 2015-04-22T23:38:50Z

Right, it's long that gets messed up by Windows, right? I always need to look that stuff up... Then I'm pretty sure we have a winner. Can you test it @psinger?

njsmith · 2015-04-22T23:51:36Z

Yeah, on Win64 long is 32 bit. On every other platform we care about, long is the same size as pointers.
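
For anyone who also needs to look this up each time, the sizes can be checked on the running platform from Python. This is only a sketch using the standard library's ctypes; the helper name c_type_sizes is made up for illustration:

```python
import ctypes


def c_type_sizes():
    """Return the size in bytes of a few C types on the running platform."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "ssize_t": ctypes.sizeof(ctypes.c_ssize_t),
        "void *": ctypes.sizeof(ctypes.c_void_p),
    }


for name, size in c_type_sizes().items():
    print(f"{name:8s} {size * 8} bits")
```

On a typical 64-bit Linux build this should report a 32-bit int next to a 64-bit long and ssize_t, which is why a length stored in a plain int can overflow even when Python and numpy are both 64 bit.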


Nathaniel J. Smith -- http://vorpus.org


jaimefrio · 2015-04-23T03:00:09Z

I found a couple other uses of int to store a Py_ssize_t and fixed those as well.

psinger · 2015-04-23T06:44:31Z

I will try it out soon!

njsmith · 2015-04-23T07:39:22Z

I think this could be tested with more modest memory usage ("only" 16 gigabytes or so) using something like array([1] * 2**31)?
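
The 16 gigabyte figure follows from the list's pointer array alone: [1] * 2**31 holds 2**31 references to the same int object, so the elements themselves cost almost nothing. A rough sketch of the arithmetic, assuming a 64-bit build (8-byte pointers); the helper name list_pointer_bytes is made up for illustration:

```python
import struct

POINTER_SIZE = struct.calcsize("P")  # bytes per pointer on this platform
GIB = 2**30


def list_pointer_bytes(length):
    """Bytes used by a list's pointer array alone (ignores the list header and the element objects)."""
    return length * POINTER_SIZE


n = 2**31
print(f"[1] * 2**31 pointer array: {list_pointer_bytes(n) / GIB:.0f} GiB")
```

The resulting array needs memory on top of that, for example about the same amount again if its elements are 8-byte integers.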

seberg · 2015-04-23T09:03:26Z

If you avoid lists, it gets much more memory efficient, but it is still incredibly slow (python2 here):

x = xrange(2**31)
arr = np.array(x, dtype=np.uint8)   # sure, not a safe cast...

The improved version is maybe this:

import numpy as np
from numpy.testing import assert_raises

class l(object):
    def __len__(self):
        return self.length
    def __getitem__(self, item):
        raise AssertionError('Will not actually give you anything')
    def __init__(self, length):
        self.length = length

assert_raises(AssertionError, np.array, l(2**31))

Seems to run into similar issues (only checked 1.8.2) and is instant of course. But I am not sure whether we have some list fast paths here, in which case the test would have to use real lists and not a custom sequence-like type.

psinger · 2015-04-23T10:03:44Z

@jaimefrio Seems to work!

jaimefrio · 2015-04-23T13:00:52Z

That would be nice if it worked, @seberg, but there is a call to PySequence_Fast in the code, so any sequence-like object will be converted to a list or tuple. I have not been able to reproduce @psinger's error with that object and current master, just the assertion error...
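
A rough Python analogue of what PySequence_Fast does may help explain why the fake sequence only ever produces the assertion error. This is a sketch, not the C implementation: sequence_fast and FakeSequence are names made up for illustration, and the fake sequence is given a small length on purpose so that nothing large gets allocated.

```python
def sequence_fast(obj):
    """Rough analogue of PySequence_Fast: lists and tuples pass through, anything else is copied into a list."""
    if type(obj) in (list, tuple):
        return obj
    return list(obj)


class FakeSequence(object):
    """Hypothetical stand-in for the test object above, with a small length."""

    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, item):
        raise AssertionError('Will not actually give you anything')


data = [1, 2, 3]
print(sequence_fast(data) is data)  # a real list is returned unchanged

try:
    sequence_fast(FakeSequence(4))
except AssertionError as exc:
    # Copying into a list has to fetch the items, so __getitem__ runs first.
    print("materializing the fake sequence called __getitem__:", exc)
```

Because the copy has to fetch every item, a custom sequence that refuses to return items fails during the copy and never reaches the point where its reported length is used as a dimension.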

jaimefrio added a commit to jaimefrio/numpy that referenced this issue Apr 23, 2015

BUG: Use npy_intp instead of int in ctors.c

8a590fb

Closes numpy#5783

jaimefrio mentioned this issue Apr 23, 2015

BUG: Use npy_intp instead of int in ctors.c #5784

Merged

charris closed this as completed in #5784 Apr 23, 2015

psinger mentioned this issue Jul 16, 2015

preprocessing.normalize: ValueError: Buffer dtype mismatch, expected 'DOUBLE' but got 'float' scikit-learn/scikit-learn#4988

Closed
