
Cast array size to int64 when loading from archive #7598

Merged: 1 commit into numpy:master on May 9, 2016

Conversation

@drasmuss (Contributor) commented May 2, 2016

When loading an array from an npz archive, the size of the array is computed via numpy.multiply.reduce(shape). The result defaults to int32 on some systems, including 64-bit systems where it is possible to create arrays large enough to make that value overflow. Here is a minimal example illustrating the problem:

import platform
import sys

import numpy as np

print(platform.architecture())
print(sys.version)
print("np version:", np.__version__)
print("default int type:", np.int_)

# shape (2**30, 2): the element count is 2**31, which overflows a signed int32
a = np.empty((2**30, 2), dtype=np.uint8)
with open("tmp.npz", "wb") as f:
    np.savez(f, a)

print(a.shape)
print(np.multiply.reduce(a.shape))  # wraps to -2147483648 when the default int is int32

# loading the archive triggers the overflow inside numpy.lib.format.read_array
with open("tmp.npz", "rb") as f:
    data = np.load(f)
    new_a = data["arr_0"]

output on my system:

('64bit', 'WindowsPE')
3.4.4 |Continuum Analytics, Inc.| (default, Feb 16 2016, 09:54:04) [MSC v.1600 64 bit (AMD64)]
np version: 1.11.0
default int type: <class 'numpy.int32'>
(1073741824, 2)
-2147483648
Traceback (most recent call last):
  File "<...>/tmp.py", line 20, in <module>
    new_a = data["arr_0"]
  File "<...>\lib\site-packages\numpy\lib\npyio.py", line 224, in __getitem__
    pickle_kwargs=self.pickle_kwargs)
  File "<...>\lib\site-packages\numpy\lib\format.py", line 660, in read_array
    array = numpy.empty(count, dtype=dtype)
ValueError: negative dimensions are not allowed

I think the solution is just to change it to numpy.multiply.reduce(shape, dtype=numpy.int64)
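For reference, the count is computed in numpy/lib/format.py's read_array, so the change would look roughly like this (a sketch of the surrounding logic, not the exact source):

# numpy/lib/format.py, read_array (sketch)
if len(shape) == 0:
    count = 1
else:
    # before: count = numpy.multiply.reduce(shape)  -- int32 on some platforms
    count = numpy.multiply.reduce(shape, dtype=numpy.int64)
array = numpy.empty(count, dtype=dtype)  # the line that raised in the traceback above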

@njsmith (Member) commented May 2, 2016

I think it should be dtype=np.intp -- intp is "integer pointer" i.e. 32 bits on machines with 32 bits of addressable memory and 64 bits on machines with 64 bits of addressable memory.
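For illustration, here is what intp resolves to (a minimal check; the exact output depends on the build):

import numpy as np

print(np.intp)            # the platform's pointer-sized integer type
print(np.dtype(np.intp))  # int64 on a 64-bit build, int32 on a 32-bit build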

Would it be possible to add a test? I guess this will be a somewhat heavyweight test since it has to write/read several gibibytes of data, but that's okay if you use the @dec.slow decorator to mark it as "slow".
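Such a test could look something like this (a hedged sketch, not the exact test that was added; it assumes numpy.testing's dec decorators and Python 3's tempfile.TemporaryDirectory):

import os
import tempfile

import numpy as np
from numpy.testing import dec


@dec.slow
def test_large_archive():
    # ~2 GiB array whose element count (2**31) overflows a signed int32
    a = np.empty((2**30, 2), dtype=np.uint8)
    with tempfile.TemporaryDirectory() as tmpdir:
        fname = os.path.join(tmpdir, "archive.npz")
        np.savez(fname, a=a)
        new_a = np.load(fname)["a"]
    # np.empty contents are arbitrary, so compare shapes rather than values
    assert a.shape == new_a.shape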

@drasmuss force-pushed the master branch 2 times, most recently from 7df0f25 to fad6296 on May 3, 2016
@drasmuss (Contributor, Author) commented May 3, 2016

Didn't know about intp, that's definitely better. Added a test as well.

@charris (Member) commented May 3, 2016

@njsmith npy_intp might overflow on 32-bit systems if the file was produced on a 64-bit system, so it would probably be safer to use 64 bits and then check whether the size of the result fits in npy_intp.
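In code, that approach would look roughly like this (a sketch; the check and the error message are illustrative, not numpy's actual source):

count = numpy.multiply.reduce(shape, dtype=numpy.int64)
# an int64 count can exceed what npy_intp holds on a 32-bit build,
# so verify it fits before trying to allocate
if count > numpy.iinfo(numpy.intp).max:
    raise ValueError("array is too large to load on this platform")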

@njsmith (Member) commented May 3, 2016

@charris: oh, great point

@charris (Member) commented May 4, 2016

@drasmuss Needs a bit more work.

@drasmuss (Contributor, Author) commented May 4, 2016

What do you think the behaviour should be in that case? I would lean towards just trying to open the file and then failing with the normal out-of-memory error. We could also raise a custom error here, though, if the default error doesn't seem informative enough.

@charris (Member) commented May 5, 2016

Either way works for me. IIRC (I checked yesterday), count is converted to Py_ssize_t, which is the same as npy_intp these days, so I expect the error should occur when the arguments are converted from Python objects to C. Might check that that actually happens.
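A quick way to probe that (hypothetical snippet; the exact exception depends on the platform and numpy version):

import numpy as np

# a count far larger than any 32-bit Py_ssize_t; on a 32-bit build the
# argument conversion itself should fail (OverflowError) before any
# allocation, while a 64-bit build converts fine and raises MemoryError
np.empty(np.int64(2**62), dtype=np.uint8)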

Commit message: Prevents overflow errors for large arrays on systems where the default int type is int32.
@drasmuss (Contributor, Author) commented May 9, 2016

Alright, changed it back to int64, which I think will cause an out-of-memory error at the array = numpy.empty(count, dtype=dtype) line if someone tries to load a file too large for their system. That seems like the simplest solution, since it is the same error someone would get regardless of any int32/int64 issues. I don't have access to a 32-bit system to confirm that, though.

@charris merged commit cb6a5bf into numpy:master on May 9, 2016
@charris (Member) commented May 9, 2016

Thanks @drasmuss.
