
ENH: Faster numpy.save and numpy.load #22898

Open
divideconcept opened this issue Dec 28, 2022 · 11 comments
@divideconcept

Proposed new feature or change:

I find numpy.save and numpy.load quite slow, in particular with small arrays, which can become a bottleneck when dealing with large datasets in machine learning.

I came up with faster save/load routines for simple arrays (standard dtypes, no Fortran ordering or nested/labelled data), which probably cover the vast majority of use cases: https://github.com/divideconcept/fastnumpyio

numpy.save: 0:00:00.813492
fastnumpyio.save: 0:00:00.334398
numpy.load: 0:00:07.910804
fastnumpyio.load: 0:00:00.306737

Maybe there's a way to optimize the .npy header parsing in numpy.save()/numpy.load() for simple arrays?
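The fast path for simple arrays can be sketched roughly as below. This is a minimal illustration of the idea (C-contiguous arrays, simple dtypes, .npy version 1.0 only), not the actual fastnumpyio code; the function names are made up for the sketch.

```python
import ast
import io
import struct
import numpy as np

def fast_save(fp, arr):
    # Minimal .npy v1.0 writer for C-contiguous arrays with simple dtypes.
    header = ("{'descr': %r, 'fortran_order': False, 'shape': %s, }"
              % (arr.dtype.str, tuple(arr.shape)))
    # magic (6) + version (2) + header-length field (2) + header + '\n'
    # should end on a 64-byte boundary
    pad = 64 - (10 + len(header) + 1) % 64
    header += " " * pad + "\n"
    fp.write(b"\x93NUMPY\x01\x00" + struct.pack("<H", len(header)))
    fp.write(header.encode("latin1"))
    fp.write(np.ascontiguousarray(arr).tobytes())

def fast_load(fp):
    # Minimal reader: fixed-size prefix, then a literal-eval of the header.
    assert fp.read(8) == b"\x93NUMPY\x01\x00"
    (hlen,) = struct.unpack("<H", fp.read(2))
    meta = ast.literal_eval(fp.read(hlen).decode("latin1"))
    arr = np.frombuffer(fp.read(), dtype=meta["descr"])
    order = "F" if meta["fortran_order"] else "C"
    return arr.reshape(meta["shape"], order=order)
```

Because the header written here follows the v1.0 layout, files produced by `fast_save` should remain readable by `numpy.load` as well.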

@charris
Member

charris commented Dec 28, 2022

What NumPy version is that?

@divideconcept
Author

The latest one (1.24.0) on both platforms, on recent machines and OSes; the speed difference is the same on both:

Windows 11, Python 3.9.12 x64, Numpy 1.24.0, Intel Core i7-12700K:

numpy.save: 0:00:00.786250
fastnumpyio.save: 0:00:00.329080
numpy.load: 0:00:09.689329
fastnumpyio.load: 0:00:00.341074

macOS 12.5, Python 3.9.15 arm64, Numpy 1.24.0, Apple M1:

numpy.save: 0:00:00.831839
fastnumpyio.save: 0:00:00.389113
numpy.load: 0:00:07.552911
fastnumpyio.load: 0:00:00.301430

See https://github.com/divideconcept/fastnumpyio for more details about the benchmark and implementation.

@xor2k
Contributor

xor2k commented Dec 28, 2022

Hi everybody, great to see some more effort in this topic 🤩

Your demo definitely shows how easy handling .npy files could be.

npyio.py and format.py, the home of numpy.load and numpy.save, have a lot of complexity, in particular in handling the different .npy versions described in

https://numpy.org/neps/nep-0001-npy-format.html
https://numpy.org/devdocs/reference/generated/numpy.lib.format.html

One could try to find the bottleneck. Maybe it can be fixed easily by modifying some lines here and there.

Meanwhile, maybe

https://github.com/xor2k/npy-append-array

could help you out. I mostly use it by putting all the data into one big .npy file, plus an extra array of offsets with which I can access each individual dataset (handling the offsets takes just 20 lines or so). Memory mapping also speeds things up a lot. I pushed a big commit today with a major cleanup and some extra features (ensure_appendable and recover).
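The offset pattern described above can be sketched in a few lines (hypothetical helper names, not part of npy-append-array): concatenate variable-length records into one big array and keep a second small array of record start positions.

```python
import numpy as np

def pack_records(records):
    # offsets[i] is where record i starts; offsets[i+1] is where it ends
    offsets = np.concatenate([[0], np.cumsum([len(r) for r in records])])
    data = np.concatenate(records)
    return data, offsets

def get_record(data, offsets, i):
    # Slice out record i. If data was opened with
    # np.load(..., mmap_mode='r'), only this slice is read from disk.
    return data[offsets[i]:offsets[i + 1]]
```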

@xor2k
Contributor

xor2k commented Dec 28, 2022

Btw. I've just seen quite an interesting talk regarding .npy performance:

https://www.youtube.com/watch?v=HLH5AwF-jx4

@divideconcept
Copy link
Author

Thanks for the references, I'll have a look.

To me, the whole bottleneck of numpy.save/numpy.load (especially numpy.load) is that the .npy format has an ASCII header with a lot of possible combinations. It's great to have this flexibility, but I'm pretty sure 99% of NumPy users just use plain simple arrays, without data nesting, Fortran ordering or labelling, and just .npy format version 1.0. Trying to fully parse this header for all possible cases is a waste of time for the vast majority of NumPy users.

To speed up loading in that case, there could be a quick detector at the start of numpy.load to check whether we're dealing with a simple array or something more complex. This can be achieved with simple character counting in the header text, for example counting the occurrences of `:`, `'` and `(`. If it's a simple header, go with quick parsing; if not, do the full parsing.
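The detector idea might look something like this (a sketch under the assumption that NumPy-written headers for simple arrays always have exactly three keys and one tuple; `full_header_parse` stands in for the real tokenize-based parser in numpy.lib.format):

```python
import ast

def full_header_parse(header):
    # stand-in for the full, slower parser in numpy.lib.format
    return ast.literal_eval(header)

def parse_header(header):
    # Cheap character counts route simple headers like
    # "{'descr': '<f8', 'fortran_order': False, 'shape': (3, 4), }"
    # to a fast literal parse; anything unusual (e.g. structured dtypes,
    # which add extra parentheses) falls back to the full parser.
    if header.count(":") == 3 and header.count("(") == 1:
        return ast.literal_eval(header)
    return full_header_parse(header)
```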

@xor2k
Contributor

xor2k commented Dec 28, 2022

I ran SnakeViz on it; see the result below. tokenize and untokenize consume a lot of time. Is this really necessary, or can it be sped up? I guess json.load might be a lot faster.

[two SnakeViz profile screenshots]

@seberg
Member

seberg commented Jan 2, 2023

_filter_header is just for Python 2 support. That case practically never happens, so it could probably be run only as a fallback after a try:/except: statement.
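A minimal sketch of that fallback idea, assuming the common failure mode is Python 2 long-integer suffixes in old headers (`legacy_filter` is a hypothetical stand-in for numpy.lib.format._filter_header):

```python
import ast
import re

def legacy_filter(header):
    # minimal stand-in for _filter_header: drop the 'L' suffix that
    # Python 2 appended to integer literals, e.g. (10L,) -> (10,)
    return re.sub(r"(\d+)L", r"\1", header)

def read_header(header):
    # Parse modern headers directly; only run the Python 2
    # compatibility rewrite when the direct parse fails.
    try:
        return ast.literal_eval(header)
    except SyntaxError:
        return ast.literal_eval(legacy_filter(header))
```

This keeps the common case (modern files) on the fast path and pays the filtering cost only for legacy files.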

@xor2k
Contributor

xor2k commented Jan 3, 2023

> _filter_header is just for Python 2 support. That practically never happens, so could probably done after a try:/except: statement.

I have implemented this; see pull request #22916.

Here are some new benchmarks (different machine, don't compare with the images above):

Before:
[profiler screenshot]

Times without using cProfile (more realistic):

numpy.save: 0:00:00.876054
fastnumpyio.save: 0:00:00.268091
fastnumpyio.pack: 0:00:00.231627
numpy.load: 0:00:07.604519
fastnumpyio.load: 0:00:00.282154
fastnumpyio.unpack: 0:00:00.176749
numpy.save+numpy.load == fastnumpyio.save+fastnumpyio.load: True
numpy.save+numpy.load == fastnumpyio.pack+fastnumpyio.unpack: True

After:
[profiler screenshot]

Times without using cProfile (more realistic):

numpy.save: 0:00:00.789263
fastnumpyio.save: 0:00:00.268508
fastnumpyio.pack: 0:00:00.231895
numpy.load: 0:00:02.769206
fastnumpyio.load: 0:00:00.293511
fastnumpyio.unpack: 0:00:00.176417
numpy.save+numpy.load == fastnumpyio.save+fastnumpyio.load: True
numpy.save+numpy.load == fastnumpyio.pack+fastnumpyio.unpack: True

@xor2k
Contributor

xor2k commented Jan 3, 2023

Here, for comparison, is a hypothetical case using JSON for header encoding/decoding (same machine as in the previous comment):

[profiler screenshot]

Times without using cProfile (more realistic):

numpy.save: 0:00:00.915063
fastnumpyio.save: 0:00:00.267978
fastnumpyio.pack: 0:00:00.231612
numpy.load: 0:00:01.389364
fastnumpyio.load: 0:00:00.278797
fastnumpyio.unpack: 0:00:00.173742
numpy.save+numpy.load == fastnumpyio.save+fastnumpyio.load: True
numpy.save+numpy.load == fastnumpyio.pack+fastnumpyio.unpack: True
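The JSON experiment above might be sketched like this (a hypothetical variant, not the real .npy format: files written this way would not be readable by existing NumPy versions, which is exactly the compatibility concern discussed below):

```python
import io
import json
import struct
import numpy as np

def save_json_header(fp, arr):
    # Same overall layout as .npy, but the header dict is JSON instead
    # of a Python literal, so it can be parsed with the fast json module.
    header = json.dumps({"descr": arr.dtype.str,
                         "fortran_order": False,
                         "shape": list(arr.shape)}) + "\n"
    fp.write(b"\x93NUMPY\x01\x00" + struct.pack("<H", len(header)))
    fp.write(header.encode())
    fp.write(np.ascontiguousarray(arr).tobytes())

def load_json_header(fp):
    assert fp.read(8) == b"\x93NUMPY\x01\x00"
    (hlen,) = struct.unpack("<H", fp.read(2))
    meta = json.loads(fp.read(hlen))
    arr = np.frombuffer(fp.read(), dtype=meta["descr"])
    return arr.reshape(meta["shape"])
```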

@seberg
Member

seberg commented Jan 3, 2023

I won't disagree that it would be nice to use json (eventually?), but how do you ensure forward compatibility (old versions loading newly generated files)? It also may be non-trivial for dtypes with fields.

@xor2k
Contributor

xor2k commented Jan 3, 2023

This is a future topic. The test itself was easy to do, so I just did it. I'll open a new issue then; let's first finish this one 😅

seberg added a commit that referenced this issue Jan 13, 2023
This pull request speeds up numpy.load. Since _filter_header is quite a bottleneck, we only run it if we must. Users will get a warning if they have a legacy NumPy file, so that they can save it again for faster loading.

See #22898 for the main discussion and benchmarks.

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>