Cannot save large numpy arrays (larger than 4 GiB) #18784

Closed
zacmon opened this issue Apr 16, 2021 · 6 comments

@zacmon

zacmon commented Apr 16, 2021

pickle.dump(array, fp, protocol=3, **pickle_kwargs)

My colleague was trying to save a large numpy array of output from a neural network (> 4 GiB). They were unable to save with pickle protocol = 3 but could save using pickle.HIGHEST_PROTOCOL by changing the source code. Is there any reason why this is set to 3 or why pickle.HIGHEST_PROTOCOL is not used? Is this something that could be specified in the numpy.save function?

Thanks

@charris
Member

charris commented Apr 16, 2021

  • Pickle3 is the default in Python3.0-3.7.
  • Pickle4 is the default in Python3.8+.
  • Pickle5 is only native in Python3.8+.

Pickle3 is the most compatible choice as long as we support Python 3.7. When we drop 3.7 sometime next year, we can make pickle4 the default; it supports big files.
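
To check which protocols a given interpreter offers, the standard `pickle` module exposes `DEFAULT_PROTOCOL` and `HIGHEST_PROTOCOL`. The snippet below is a minimal sketch; `describe_pickle_support` is a hypothetical helper name, not part of NumPy or the standard library.

```python
import pickle
import sys


def describe_pickle_support():
    """Return the running interpreter's default and highest pickle protocols."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "default_protocol": pickle.DEFAULT_PROTOCOL,
        "highest_protocol": pickle.HIGHEST_PROTOCOL,
    }


info = describe_pickle_support()
print(info)
```

Objects larger than 4 GiB need protocol 4 or higher, so `highest_protocol` shows whether the interpreter can write them at all. It does not show which protocol a particular caller actually requests.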

@zacmon
Author

zacmon commented Apr 19, 2021

Thanks!

@zacmon zacmon closed this as completed Apr 19, 2021
@tbenst

tbenst commented Aug 3, 2021

@charris actually pickle4 was added in Python 3.4: https://www.python.org/dev/peps/pep-3154/

Perhaps it makes sense to reopen this issue and do the bump, considering that numpy only supports versions of Python that have pickle4?

@charris
Member

charris commented Aug 3, 2021

@tbenst You could propose that on the mailing list. We will probably keep Python 3.7 for the 1.22 release, but it is a close-run thing.

@TsingZ0

TsingZ0 commented Dec 4, 2022

@charris This issue still exists when I use Python 3.8.15 and numpy 1.23.4. I solved this problem by setting protocol=4 in pickle.dump(array, fp, protocol=3, **pickle_kwargs).

I wonder why protocol=3 is set by default?
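
A possible workaround that avoids editing NumPy's source is to pickle the object directly with an explicit protocol, rather than going through numpy.save. The sketch below assumes that is acceptable for the use case; it writes a plain pickle file, not an `.npy` file. `dump_with_protocol` and `load_pickled` are hypothetical helper names, and the small dict is a stand-in for a large array.

```python
import os
import pickle
import tempfile


def dump_with_protocol(obj, path, protocol=4):
    """Pickle obj to path. Protocol 4+ is required for objects over 4 GiB."""
    with open(path, "wb") as fp:
        pickle.dump(obj, fp, protocol=protocol)


def load_pickled(path):
    """Load a pickle written by dump_with_protocol (trusted files only)."""
    with open(path, "rb") as fp:
        return pickle.load(fp)


# Small stand-in payload; a NumPy array could be passed the same way.
payload = {"outputs": bytes(range(256)) * 4}

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "outputs.pkl")

dump_with_protocol(payload, path)
restored = load_pickled(path)
print(restored == payload)
```

Passing `protocol=pickle.HIGHEST_PROTOCOL` instead of `4` would also work, at the cost of files that older interpreters may not be able to read.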

@khood5

khood5 commented Apr 7, 2024

still getting

pickle.dump(array, fp, protocol=3, **pickle_kwargs)
OverflowError: serializing a bytes object larger than 4 GiB requires pickle protocol 4 or higher

with

Python 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:35:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> print(pickle.format_version)
4.0
>>> exit

and

Python 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:35:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.version.version
'1.26.4'
>>> exit
Use exit() or Ctrl-D (i.e. EOF) to exit

I think the protocol is still incorrectly set to 3. Or am I missing something?

pickle.dump(array, fp, protocol=3, **pickle_kwargs)
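
One point that may help here: an explicit `protocol=3` argument pins the output format, whatever the interpreter's highest supported protocol is. The sketch below shows this with the standard library only; `stream_protocol` is a hypothetical helper that reads the protocol number from the two-byte PROTO header that pickle writes for protocol 2 and later.

```python
import pickle


def stream_protocol(data):
    """Return the protocol number stored in a pickle's PROTO header."""
    if data[:1] != b"\x80":
        raise ValueError("no PROTO opcode; the stream uses protocol 0 or 1")
    return data[1]


obj = [1, 2, 3]

# An explicit protocol argument wins over whatever the interpreter supports.
pinned = stream_protocol(pickle.dumps(obj, protocol=3))
highest = stream_protocol(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

print(pinned, highest)
```

So a newer Python on its own does not lift the 4 GiB limit while the call shown above still passes `protocol=3`.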
