BUG: unable to read .npy file with non-eval'able metadata #23169
Comments
Thanks for the report. To note, this should be a duplicate of gh-14142 and maybe gh-15488. I am not sure we should be storing metadata at all, since metadata is far too flexible for a simple format like `.npy`. I thought we give a warning on storing now, but I am not sure. A warning plus metadata stripping is the solution that (IIRC) seemed most plausible to me, but I am not sure you would like that.
Yes, that would be nice. There was some history of a PR trying to relax …
Thanks for your quick reply. I have re-read the issues you mentioned. But if I understand correctly, there are two separate issues. Please correct me if I am wrong.
Problem #2 is revealed by the following snippet:

```python
import numpy as np
from numpy.lib.utils import safe_eval

mytype = np.dtype('<f8', metadata={"BLAH": 10})
descr = np.dtype([('x', mytype)]).descr
print(descr)
print(safe_eval(repr(descr)))  # no problem #1 here
np.dtype(descr)  # problem #2: ValueError: invalid shape in fixed-type tuple
```

The solution to problem #1 is unambiguous. As for problem #2, I don't know how it can be solved, other than guessing in …
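For reference, the metadata-stripping approach mentioned earlier could be prototyped in user code roughly like this. This is a sketch only: `strip_metadata` is a hypothetical helper, not a NumPy API, and it ignores field offsets, titles, and subarray shapes.

```python
import numpy as np

def strip_metadata(dt):
    """Rebuild a dtype without any attached metadata (hypothetical helper).

    Note: this sketch ignores field offsets, titles, and subarray shapes.
    """
    if dt.names is not None:
        # Structured dtype: strip metadata from each field recursively.
        return np.dtype([(name, strip_metadata(dt.fields[name][0]))
                         for name in dt.names])
    # Plain dtype: re-creating it from its string spec drops the metadata.
    return np.dtype(dt.str)

mytype = np.dtype('<f8', metadata={"BLAH": 10})
clean = strip_metadata(np.dtype([('x', mytype)]))
print(clean['x'].metadata)  # None: the metadata is gone
```

A dtype cleaned this way produces a `descr` that round-trips through `np.dtype` again, which is exactly what the unreadable headers fail to do.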
Yes, but in a weird way. @ninousf just opened a PR to drop metadata. That still seems like a decent solution to me (I have to look at the PR in detail). Maybe it would work for you if …
For me, that would work. I don't care about the metadata, and more importantly, I control both ends of the conversion pipeline (…
Describe the issue:

I have a .npy file generated from an h5py dataset. The data is a pure Numpy array (no subclass) and the file has been written by Numpy, but the dtype contains metadata set by h5py. In this particular case, the metadata contains a `<class 'bytes'>` expression, which cannot be handled by `safe_eval`. I don't care about the metadata, but it renders the entire file unreadable.

I would consider this a bug in `np.save`: serialization of a dtype descriptor that results in an unreadable header should fail loudly. It would be nice if `np.save` could then be re-tried with the metadata stripped.

Reproduce the code example:
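A minimal reproduction might look like the following, assuming h5py-style metadata whose repr contains a `<class 'bytes'>` expression. The metadata key `vlen` and the field layout here are illustrative, not taken from the original report.

```python
import io
import numpy as np

# A dtype whose metadata repr()s to something containing <class 'bytes'>,
# similar to what h5py attaches to string dtypes (key name is illustrative).
field = np.dtype('S10', metadata={"vlen": bytes})
arr = np.zeros(3, dtype=np.dtype([('x', field)]))

buf = io.BytesIO()
np.save(buf, arr)   # saving succeeds
buf.seek(0)
try:
    np.load(buf)    # on numpy 1.24 this fails while parsing the header
except (ValueError, SyntaxError) as exc:
    print(type(exc).__name__, exc)
```

On versions where `np.save` strips (or warns about) dtype metadata, the round trip succeeds instead, so the `try`/`except` covers both behaviours.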
Error message:
Runtime information:

Numpy version: 1.24.0

Python version: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03)
[GCC 11.3.0]
show_runtime:

WARNING: `threadpoolctl` not found in system! Install it by `pip install threadpoolctl`. Once installed, try `np.show_runtime` again for more detailed build information.

```
[{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3', 'SSE41', 'POPCNT', 'SSE42',
                                'AVX', 'F16C', 'FMA3', 'AVX2'],
                      'not_found': ['AVX512F', 'AVX512CD', 'AVX512_KNL',
                                    'AVX512_KNM', 'AVX512_SKX', 'AVX512_CLX',
                                    'AVX512_CNL', 'AVX512_ICL']}}]
```
Context for the issue:

A pure Numpy array (no subclasses) saved with `np.save` should be readable by `np.load`. This issue gives a counter-example.

I could try to write a patch to fix this issue, if you are interested.
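The "fail loudly" behaviour suggested above could be prototyped outside NumPy as a pre-save check. This is a sketch under my own assumptions: `check_header_roundtrip` is a hypothetical helper, and it mimics the load path with `ast.literal_eval` rather than calling NumPy's private header-parsing code.

```python
import ast
import numpy as np
from numpy.lib.format import dtype_to_descr

def check_header_roundtrip(dt):
    """Raise if `dt` would serialize to a .npy header that cannot be read back.

    Hypothetical helper: applies literal_eval + np.dtype (roughly what
    np.load does) to the descriptor that np.save would write.
    """
    descr = dtype_to_descr(dt)
    try:
        np.dtype(ast.literal_eval(repr(descr)))
    except (ValueError, SyntaxError, TypeError) as exc:
        raise ValueError(
            f"dtype {dt!r} would produce an unreadable .npy header"
        ) from exc

# A well-behaved dtype passes silently:
check_header_roundtrip(np.dtype([('x', '<f8'), ('y', '<i4')]))
```

Running such a check inside `np.save` before the header is written would turn the silent corruption described here into an immediate, actionable error.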