Cannot save large numpy arrays (larger than 4 GiB) #18784
Comments
Pickle protocol 3 is the most compatible choice as long as we support Python 3.7. When we drop 3.7 sometime next year, we can make protocol 4 the default; it supports objects larger than 4 GiB.
Thanks!
@charris Actually, pickle protocol 4 was added in Python 3.4: https://www.python.org/dev/peps/pep-3154/ Perhaps it makes sense to reopen this issue and do the bump, considering that NumPy only supports versions of Python that have protocol 4?
@tbenst You could propose that on the mailing list. We will probably keep Python 3.7 for the 1.22 release, but it is a close-run thing.
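For reference: protocol 4 (PEP 3154) has been available since Python 3.4, and the protocol a pickle stream was written with is recorded in its opening `PROTO` opcode (protocol 2 and above), so it can be checked with the standard-library `pickletools` module. A minimal sketch; the helper name `pickle_protocol_of` is hypothetical:

```python
import pickle
import pickletools


def pickle_protocol_of(data: bytes):
    """Hypothetical helper: return the protocol recorded in a pickle stream.

    Streams written with protocol 2 or higher start with a PROTO opcode whose
    argument is the protocol number; older protocols have no PROTO opcode,
    in which case this returns None.
    """
    opcode, arg, _pos = next(pickletools.genops(data))
    return arg if opcode.name == "PROTO" else None


# Every Python 3 release NumPy supports can write protocol 4 or higher.
assert pickle.HIGHEST_PROTOCOL >= 4

for proto in (3, 4, pickle.HIGHEST_PROTOCOL):
    print(proto, pickle_protocol_of(pickle.dumps([1, 2, 3], protocol=proto)))
```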
@charris This issue still exists when I use Python 3.8.15 and NumPy 1.23.4. I solved this problem by setting … I wonder why setting …
Still getting

pickle.dump(array, fp, protocol=3, **pickle_kwargs)
OverflowError: serializing a bytes object larger than 4 GiB requires pickle protocol 4 or higher

with

Python 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:35:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> print(pickle.format_version)
4.0
>>> exit

and

Python 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:35:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.version.version
'1.26.4'
>>> exit
Use exit() or Ctrl-D (i.e. EOF) to exit

I think this is incorrect: the protocol is still set to 3. Or am I missing something?
Line 744 in e59c074
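The failing `pickle.dump(..., protocol=3, ...)` call quoted above sits in the object-array branch of `numpy/lib/format.py`: `numpy.save` writes the raw buffer for plain numeric dtypes and only falls back to pickle when the dtype holds Python objects, so large float or integer arrays are not affected. A minimal sketch for checking which path a given array takes; the helper name `takes_pickle_path` is hypothetical:

```python
import numpy as np


def takes_pickle_path(arr: np.ndarray) -> bool:
    """Hypothetical helper: True if numpy.save would pickle this array
    instead of writing its raw buffer (its dtype contains Python objects)."""
    return arr.dtype.hasobject


numeric = np.zeros(10, dtype=np.float32)
ragged = np.array([np.zeros(3), np.zeros(5)], dtype=object)
structured = np.zeros(2, dtype=[("x", "f8"), ("label", "O")])

print(takes_pickle_path(numeric))     # False
print(takes_pickle_path(ragged))      # True
print(takes_pickle_path(structured))  # True
```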
numpy/numpy/lib/format.py
Line 680 in 181f273
My colleague was trying to save a large NumPy array (> 4 GiB) of output from a neural network. Saving failed with pickle protocol 3, but succeeded after changing the source code to use pickle.HIGHEST_PROTOCOL. Is there any reason this is set to 3, or why pickle.HIGHEST_PROTOCOL is not used? Could the protocol be made configurable in the numpy.save function?
Thanks
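Until the default changes, one workaround that avoids editing NumPy's source is to pickle object arrays yourself with protocol 4 or higher and keep `numpy.save` for plain dtypes. A minimal sketch, assuming you control both the writing and the reading side; `save_array`, `load_array` and `NPY_MAGIC` are hypothetical names, not NumPy API, and object arrays saved this way are plain pickle files rather than `.npy` files:

```python
import os
import pickle
import tempfile

import numpy as np

# Every .npy file begins with this magic string (see numpy.lib.format).
NPY_MAGIC = b"\x93NUMPY"


def save_array(path: str, arr: np.ndarray) -> None:
    """Hypothetical helper: save arr so that object arrays over 4 GiB work."""
    with open(path, "wb") as f:
        if arr.dtype.hasobject:
            # Object arrays go through pickle; protocol 4+ handles >4 GiB objects.
            pickle.dump(arr, f, protocol=pickle.HIGHEST_PROTOCOL)
        else:
            # Plain dtypes: numpy.save writes the raw buffer, no pickle involved.
            np.save(f, arr)


def load_array(path: str) -> np.ndarray:
    """Hypothetical helper: load a file written by save_array."""
    with open(path, "rb") as f:
        is_npy = f.read(len(NPY_MAGIC)) == NPY_MAGIC
        f.seek(0)
        if is_npy:
            return np.load(f)
        # Only unpickle files you wrote yourself or otherwise trust.
        return pickle.load(f)


# Usage: round-trip an object array through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "outputs.bin")
    ragged = np.array([np.zeros(3), np.ones(5)], dtype=object)
    save_array(path, ragged)
    restored = load_array(path)
    print(restored.dtype, restored.shape)
```

Dispatching on the file's magic bytes keeps numeric arrays readable by a plain `np.load`, while object arrays stay loadable with `pickle.load`.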