np.load("a.npz") very slow when a.npz file very large #26498
Comments
There is actually a https://github.com/numpy/numpy/blob/main/numpy/lib/_npyio_impl.py#L235-L263
#22916 proposes an enhancement to speed up numpy.load, which could potentially address the performance concern here. #2922 seems somewhat similar: it involves the inability to load data larger than 2GB on 64-bit systems. Perhaps the root causes of both issues are related, and it might be worth exploring further.
Robert already linked to what seems to be the interesting part. The two issues above both seem unrelated, and any recent fixes or speedups would not affect the difference between the two paths shown.
BTW, if it is the first time a.npz is read, this code takes the same time as before, or is even slower. When I try to use multiple threads to read the file, it takes more time. I have no idea why.
File caches. Hard disks are slow; the two extra copies just don't matter in that case.
Yes, most of the time was spent reading from disk.
The more I look at this, the more I think that Doesn't mean we cannot add a work-around, but the PR isn't thread-safe, and I think it needs to be (even if I am not sure all of zipfile is).
Another solution is to use savez_compressed() and then load the data. In this case, reading the data will no longer be the problem; decompression in pickle.load will become the main issue.
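For reference, a minimal sketch of that suggestion, assuming a small, highly compressible demo array (the variable and file names are illustrative only, not from the thread). It shows that `np.savez_compressed` trades a smaller file, and so less disk reading, for decompression work at load time. How that trade works out on real data depends on how well the data compresses.

```python
import os
import tempfile

import numpy as np

# Demo data only: an all-zero array compresses extremely well.
data = np.zeros((1000, 1000), dtype=np.float32)

out_dir = tempfile.mkdtemp()
plain_path = os.path.join(out_dir, "plain.npz")
packed_path = os.path.join(out_dir, "packed.npz")

np.savez(plain_path, x=data)              # members stored uncompressed
np.savez_compressed(packed_path, x=data)  # members deflate-compressed

plain_size = os.path.getsize(plain_path)
packed_size = os.path.getsize(packed_path)

# np.load on an .npz is lazy: the member is read (and decompressed)
# only when it is accessed by key.
with np.load(packed_path) as npz:
    restored = npz["x"]

print(f"plain: {plain_size} bytes, compressed: {packed_size} bytes")
```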
Describe the issue:
np.load("a.npz") very slow when a.npz file very large
Reproduce the code example:
The difference is the input to read_array: if zf.open(name + '.npy') is used, read_array is very slow.
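The two input paths described above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the reporter's original test code: the helper names `load_member_streaming` and `load_member_buffered` are hypothetical, and the buffered variant assumes the alternative is to read the whole member into memory before parsing. Only `zipfile`, `io.BytesIO` and `numpy.lib.format.read_array` are real APIs here.

```python
import io
import os
import tempfile
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def load_member_streaming(path, name):
    """Hand the zip member stream directly to read_array (the path reported as slow)."""
    with zipfile.ZipFile(path) as zf:
        with zf.open(name + ".npy") as member:
            return npy_format.read_array(member)


def load_member_buffered(path, name):
    """Assumed alternative: read the whole member into memory, then parse it."""
    with zipfile.ZipFile(path) as zf:
        raw = zf.read(name + ".npy")  # one bulk read; holds an extra copy in RAM
    return npy_format.read_array(io.BytesIO(raw))


# Small demo array; the report concerns much larger files.
demo = np.arange(1_000_000, dtype=np.float64)
npz_path = os.path.join(tempfile.mkdtemp(), "a.npz")
np.savez(npz_path, x=demo)

streamed = load_member_streaming(npz_path, "x")
buffered = load_member_buffered(npz_path, "x")
```

Both helpers return the same array. Any timing difference between them will depend on the NumPy version, the file size, and whether the file is already in the operating system's file cache, so measure on the actual data before drawing conclusions.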
All test code:
Error message:
Python and NumPy Versions:
Python 3.9
NumPy 1.24.4
Runtime Environment:
No response
Context for the issue:
No response