
ENH: Faster numpy.save and numpy.load #22898

Open
divideconcept opened this issue Dec 28, 2022 · 11 comments
@divideconcept

Proposed new feature or change:

I find numpy.save and numpy.load quite slow, in particular with small arrays, which can become a bottleneck when dealing with large datasets in machine learning.

I came up with faster save/load routines for simple arrays (standard dtypes, no Fortran ordering or nested/labelled data), which probably cover the vast majority of use cases: https://github.com/divideconcept/fastnumpyio

numpy.save: 0:00:00.813492
fastnumpyio.save: 0:00:00.334398
numpy.load: 0:00:07.910804
fastnumpyio.load: 0:00:00.306737

Maybe there's a way to optimize the .npy header parsing in numpy.save()/numpy.load() for simple arrays?
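The fast path for simple arrays can be sketched roughly as below. This is a minimal illustration of the idea (C-contiguous arrays, simple dtypes, .npy version 1.0 only), not the actual fastnumpyio code; the function names are made up for the sketch.

```python
import ast
import io
import struct
import numpy as np

def fast_save(fp, arr):
    # Minimal .npy v1.0 writer for C-contiguous arrays with simple dtypes.
    header = ("{'descr': %r, 'fortran_order': False, 'shape': %s, }"
              % (arr.dtype.str, tuple(arr.shape)))
    # magic (6) + version (2) + header-length field (2) + header + '\n'
    # should end on a 64-byte boundary
    pad = 64 - (10 + len(header) + 1) % 64
    header += " " * pad + "\n"
    fp.write(b"\x93NUMPY\x01\x00" + struct.pack("<H", len(header)))
    fp.write(header.encode("latin1"))
    fp.write(np.ascontiguousarray(arr).tobytes())

def fast_load(fp):
    # Minimal reader: fixed-size prefix, then a literal-eval of the header.
    assert fp.read(8) == b"\x93NUMPY\x01\x00"
    (hlen,) = struct.unpack("<H", fp.read(2))
    meta = ast.literal_eval(fp.read(hlen).decode("latin1"))
    arr = np.frombuffer(fp.read(), dtype=meta["descr"])
    order = "F" if meta["fortran_order"] else "C"
    return arr.reshape(meta["shape"], order=order)
```

Because the header written here follows the v1.0 layout, files produced by `fast_save` should remain readable by `numpy.load` as well.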

@charris
Member

charris commented Dec 28, 2022

What NumPy version is that?

@divideconcept
Author

The latest one (1.24.0) on both platforms, on recent machines and OSes; the speed difference is the same on both:

Windows 11, Python 3.9.12 x64, Numpy 1.24.0, Intel Core i7-12700K:

numpy.save: 0:00:00.786250
fastnumpyio.save: 0:00:00.329080
numpy.load: 0:00:09.689329
fastnumpyio.load: 0:00:00.341074

macOS 12.5, Python 3.9.15 arm64, Numpy 1.24.0, Apple M1:

numpy.save: 0:00:00.831839
fastnumpyio.save: 0:00:00.389113
numpy.load: 0:00:07.552911
fastnumpyio.load: 0:00:00.301430

See https://github.com/divideconcept/fastnumpyio for more details about the benchmark and implementation.

@xor2k
Contributor

xor2k commented Dec 28, 2022

Hi everybody, great to see some more effort in this topic 🤩

Your demo definitely shows how easy handling .npy files could be.

npyio.py and format.py, the home of numpy.load and numpy.save, have a lot of complexity, in particular in handling the different .npy versions described in

https://numpy.org/neps/nep-0001-npy-format.html
https://numpy.org/devdocs/reference/generated/numpy.lib.format.html

One could try to find the bottleneck. Maybe it can be fixed easily by modifying some lines here and there.

Meanwhile, maybe

https://github.com/xor2k/npy-append-array

could help you out. I mostly use it by putting all the data into one big .npy file, plus an extra array of offsets with which I can access each individual dataset (handling the offsets takes just 20 lines or so). Memory mapping also speeds things up a lot. I pushed a big commit today with a major cleanup and some extra features (ensure_appendable and recover).
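The offset pattern described above can be sketched in a few lines (hypothetical helper names, not part of npy-append-array): concatenate variable-length records into one big array and keep a second small array of record start positions.

```python
import numpy as np

def pack_records(records):
    # offsets[i] is where record i starts; offsets[i+1] is where it ends
    offsets = np.concatenate([[0], np.cumsum([len(r) for r in records])])
    data = np.concatenate(records)
    return data, offsets

def get_record(data, offsets, i):
    # Slice out record i. If data was opened with
    # np.load(..., mmap_mode='r'), only this slice is read from disk.
    return data[offsets[i]:offsets[i + 1]]
```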

@xor2k
Contributor

xor2k commented Dec 28, 2022

Btw. I've just seen quite an interesting talk regarding .npy performance:

https://www.youtube.com/watch?v=HLH5AwF-jx4

@divideconcept
Copy link
Author

Thanks for the references, I'll have a look.

To me, the whole bottleneck of numpy.save/numpy.load (especially numpy.load) is that the .npy format has an ASCII header with a lot of possible combinations. It's great to have this flexibility, but I'm pretty sure 99% of NumPy users just use plain simple arrays, without data nesting, Fortran ordering or labelling, and just .npy format version 1.0. Trying to fully parse this header for all possible cases is a waste of time for the vast majority of NumPy users.

To speed up loading in that case, there could be a quick detector at the start of numpy.load to check whether we're dealing with a simple array or something more complex. This can be achieved with simple character counting in the header text, for example counting the occurrences of `:`, `'` and `(`. If it's a simple header, go with quick parsing; if not, do the full parsing.
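The detector idea might look something like this (a sketch under the assumption that NumPy-written headers for simple arrays always have exactly three keys and one tuple; `full_header_parse` stands in for the real tokenize-based parser in numpy.lib.format):

```python
import ast

def full_header_parse(header):
    # stand-in for the full, slower parser in numpy.lib.format
    return ast.literal_eval(header)

def parse_header(header):
    # Cheap character counts route simple headers like
    # "{'descr': '<f8', 'fortran_order': False, 'shape': (3, 4), }"
    # to a fast literal parse; anything unusual (e.g. structured dtypes,
    # which add extra parentheses) falls back to the full parser.
    if header.count(":") == 3 and header.count("(") == 1:
        return ast.literal_eval(header)
    return full_header_parse(header)
```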

@xor2k
Contributor

xor2k commented Dec 28, 2022

I ran SnakeViz on it; see the result below. tokenize and untokenize consume a lot of time. Is this really necessary, or can it be sped up? I guess json.load might be a lot faster.

[two SnakeViz profile screenshots]

@seberg
Member

seberg commented Jan 2, 2023

_filter_header is just for Python 2 support. That case practically never happens, so it could probably be run only as a fallback after a try:/except: statement.
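A minimal sketch of that fallback idea, assuming the common failure mode is Python 2 long-integer suffixes in old headers (`legacy_filter` is a hypothetical stand-in for numpy.lib.format._filter_header):

```python
import ast
import re

def legacy_filter(header):
    # minimal stand-in for _filter_header: drop the 'L' suffix that
    # Python 2 appended to integer literals, e.g. (10L,) -> (10,)
    return re.sub(r"(\d+)L", r"\1", header)

def read_header(header):
    # Parse modern headers directly; only run the Python 2
    # compatibility rewrite when the direct parse fails.
    try:
        return ast.literal_eval(header)
    except SyntaxError:
        return ast.literal_eval(legacy_filter(header))
```

This keeps the common case (modern files) on the fast path and pays the filtering cost only for legacy files.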

@xor2k
Contributor

xor2k commented Jan 3, 2023

> _filter_header is just for Python 2 support. That practically never happens, so could probably done after a try:/except: statement.

I have implemented this; see pull request #22916.

Here are some new benchmarks (different machine, don't compare with the images above):

Before:
[profiler screenshot]

Times without using cProfile (more realistic):

numpy.save: 0:00:00.876054
fastnumpyio.save: 0:00:00.268091
fastnumpyio.pack: 0:00:00.231627
numpy.load: 0:00:07.604519
fastnumpyio.load: 0:00:00.282154
fastnumpyio.unpack: 0:00:00.176749
numpy.save+numpy.load == fastnumpyio.save+fastnumpyio.load: True
numpy.save+numpy.load == fastnumpyio.pack+fastnumpyio.unpack: True

After:
[profiler screenshot]

Times without using cProfile (more realistic):

numpy.save: 0:00:00.789263
fastnumpyio.save: 0:00:00.268508
fastnumpyio.pack: 0:00:00.231895
numpy.load: 0:00:02.769206
fastnumpyio.load: 0:00:00.293511
fastnumpyio.unpack: 0:00:00.176417
numpy.save+numpy.load == fastnumpyio.save+fastnumpyio.load: True
numpy.save+numpy.load == fastnumpyio.pack+fastnumpyio.unpack: True

@xor2k
Contributor

xor2k commented Jan 3, 2023

Here, for comparison, is a hypothetical case using JSON for header encoding/decoding (same machine as in the previous comment):

[profiler screenshot]

Times without using cProfile (more realistic):

numpy.save: 0:00:00.915063
fastnumpyio.save: 0:00:00.267978
fastnumpyio.pack: 0:00:00.231612
numpy.load: 0:00:01.389364
fastnumpyio.load: 0:00:00.278797
fastnumpyio.unpack: 0:00:00.173742
numpy.save+numpy.load == fastnumpyio.save+fastnumpyio.load: True
numpy.save+numpy.load == fastnumpyio.pack+fastnumpyio.unpack: True
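The JSON experiment above might be sketched like this (a hypothetical variant, not the real .npy format: files written this way would not be readable by existing NumPy versions, which is exactly the compatibility concern discussed below):

```python
import io
import json
import struct
import numpy as np

def save_json_header(fp, arr):
    # Same overall layout as .npy, but the header dict is JSON instead
    # of a Python literal, so it can be parsed with the fast json module.
    header = json.dumps({"descr": arr.dtype.str,
                         "fortran_order": False,
                         "shape": list(arr.shape)}) + "\n"
    fp.write(b"\x93NUMPY\x01\x00" + struct.pack("<H", len(header)))
    fp.write(header.encode())
    fp.write(np.ascontiguousarray(arr).tobytes())

def load_json_header(fp):
    assert fp.read(8) == b"\x93NUMPY\x01\x00"
    (hlen,) = struct.unpack("<H", fp.read(2))
    meta = json.loads(fp.read(hlen))
    arr = np.frombuffer(fp.read(), dtype=meta["descr"])
    return arr.reshape(meta["shape"])
```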

@seberg
Member

seberg commented Jan 3, 2023

I won't disagree that it would be nice to use json (eventually?), but how do you ensure forward compatibility (old versions loading newly generated files)? It also may be non-trivial for dtypes with fields.

@xor2k
Contributor

xor2k commented Jan 3, 2023

This is a future topic. The test itself was easy to do, so I just did it. I'll open a new issue then; let's first finish this one 😅

seberg added a commit that referenced this issue Jan 13, 2023
This pull request speeds up numpy.load. Since _filter_header is quite a bottleneck, we only run it if we must. Users will get a warning if they have a legacy NumPy file, so that they can save it again for faster loading.

See #22898 for the main discussion and benchmarks.

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>