read_cdf() crashes if time-dependent variable doesn't provide 'UNITS' #5907
Comments
According to the ISTP guide on CDF files, which is the standard we support, UNITS or UNITS_PTR is a required field: https://spdf.gsfc.nasa.gov/istp_guide/variables.html. So if neither is present in the CDF file, I think we should throw an error. We should catch the KeyError, though, and raise a nicer error message ourselves.
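Here is a minimal sketch of that stricter behavior. It assumes the variable's attributes arrive as a plain dict, such as the one returned by cdflib's `CDF.varattsget()`. The helper name `get_unit_string` is hypothetical and is not part of sunpy:

```python
def get_unit_string(var_key, attrs):
    """Return the 'UNITS' attribute of a CDF variable, failing loudly.

    Hypothetical helper: ``attrs`` is assumed to be a plain dict of the
    variable's attributes. A missing attribute produces a descriptive
    ValueError instead of a bare KeyError.
    """
    try:
        return attrs["UNITS"]
    except KeyError:
        # Mention UNITS_PTR if present; resolving the pointer is out of scope here.
        note = (
            " (a UNITS_PTR attribute exists but is not resolved here)"
            if "UNITS_PTR" in attrs
            else ""
        )
        raise ValueError(
            f"CDF variable {var_key!r} has no 'UNITS' attribute{note}; "
            "the ISTP guidelines require UNITS or UNITS_PTR."
        ) from None  # suppress the chained KeyError traceback
```

`raise ... from None` keeps the user-facing traceback short. The message still names the offending variable, which the original KeyError did not make obvious.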
Thinking a bit more about this, maybe the pragmatic thing to do is to raise a warning and either:
1. skip the variable, or
2. assign it dimensionless units.
This way it's still possible to read the rest of the CDF file. I think I'd choose 1. to be on the safe side (instead of assuming units). @jgieseler, do you have any thoughts on which would be preferable?
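Both pragmatic options, warning and skipping the variable or warning and assuming no units, can be sketched as follows, again assuming a plain attribute dict. `resolve_unit`, its `on_missing` parameter and the `"dimensionless"` placeholder string are all hypothetical. Inside sunpy, the dimensionless case would correspond to astropy's `u.dimensionless_unscaled`:

```python
import warnings

# Placeholder standing in for astropy's u.dimensionless_unscaled in this sketch.
DIMENSIONLESS = "dimensionless"


def resolve_unit(var_key, attrs, on_missing="dimensionless"):
    """Return a unit string for ``var_key``, or None if it should be skipped.

    on_missing="skip"          -> warn and drop the variable
    on_missing="dimensionless" -> warn and assume the variable has no units
    """
    if "UNITS" in attrs:
        return attrs["UNITS"]
    if on_missing == "skip":
        warnings.warn(f"No 'UNITS' attribute for {var_key!r}; skipping this variable.")
        return None
    if on_missing == "dimensionless":
        warnings.warn(
            f"No 'UNITS' attribute for {var_key!r}; assigning dimensionless units."
        )
        return DIMENSIONLESS
    raise ValueError(f"Unknown on_missing option: {on_missing!r}")
```

Returning `None` for the skip option lets the calling loop `continue` past the variable, so the rest of the file is still read. The dimensionless option keeps the variable visible to the user, at the cost of an assumed unit.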
Yeah, I had already guessed that the files might not follow the standard (they date back to 2006). I think a pragmatic approach is worthwhile in this case, and I would actually favor option 2. Right now dimensionless units are also assigned when the provided unit string is not understood, so we wouldn't be providing "wrong" information anyhow. And we don't know whether some CDF file might contain a crucial variable that would get omitted. For users, I think it's better to get a variable without units, so that they really notice it and have the option to do something about it, rather than to drop the variable and show a small warning that might easily be overlooked.
Good point about what we currently do for unrecognized unit strings 👍 That should be a pretty simple fix, then. Would you be up for opening a pull request with it?
Yes, I'll take care of it.
If I understand this correctly, there is an error in the CDF file. Do you know if the SPDF is aware of this error?
I'm not sure if this is a real error in the CDF files or just a case of "not following the (current) standard". And I don't know if SPDF is aware. Because the mission is already quite old (launched in 2006), I'm afraid there won't be many people left working on the data. 😕 But as I just realized, the problem seems to affect the whole Level-1 dataset of STEREO's IMPACT instrument suite. Usually people would use the Level-2 data anyhow. One problem, though, is that for some data only Level-1 is available (e.g. the 1-min STEREO/LET data from the beginning of this issue). Another is that CDAWeb only provides Level-1 data for most IMPACT instruments, I guess because for most (all?) instruments only the Level-1 data was released as CDF files. So, long story short, I should probably inform SPDF and part of the IMPACT team (I think I remember who created some of these CDF files back in the day).
Yes, please contact the SPDF and the IMPACT team. I think they would be interested in this, if they don't already know. As a separate matter, I wonder if we should have a couple of new labels to flag issues with source CDF and FITS files. If SunPy users are having problems with the source data, then the data providers should be made aware of them. Having separate GitHub labels would make these easy to identify in the SunPy issue list.
I think the proposed fix of issuing a warning and setting UNITS to dimensionless is the way to go. The ISTP Guidelines list UNITS as required (or UNIT_PTR for a multiple-units attribute, set to a blank space if dimensionless). Even so, it's not uncommon for an attribute to be missing, especially for unitless variables. SPDF/CDAWeb uses a set of metadata-only CDFs, called master CDFs, to override and add metadata in the data files instead of modifying the archived data files themselves. It might not be feasible, but the SunPy read_cdf routine could check https://spdf.gsfc.nasa.gov/pub/software/cdawlib/0MASTERS/ for a matching master and open it first to get the metadata. For attributes that are blank in the master CDF, the values should be taken from the data CDF instead, if available. Alternatively, one could call our Python cdasws library (https://cdaweb.gsfc.nasa.gov/WebServices/REST/py/cdasws/) to request the specific variables and time range. That returns a generated CDF that already has the better metadata.
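The master-CDF precedence rule described here can be sketched as a dict merge. Master attributes override the data file, except where the master leaves an attribute blank and the data file has a value. `merge_master_attrs` is a hypothetical helper that works on plain attribute dicts only. It does not locate, download or open master CDFs, and treating whitespace-only strings as "blank" is an assumption:

```python
def merge_master_attrs(master_attrs, data_attrs):
    """Combine a master CDF's variable attributes with the data file's.

    Master values win, except that a blank (whitespace-only string) master
    value is replaced by the data file's value when the data file has one.
    """
    merged = dict(data_attrs)
    for name, master_value in master_attrs.items():
        is_blank = isinstance(master_value, str) and not master_value.strip()
        if is_blank and name in data_attrs:
            continue  # keep the value already taken from the data CDF
        merged[name] = master_value
    return merged
```

A master that supplies `UNITS` fills in an attribute missing from the data file: `merge_master_attrs({"UNITS": "MeV"}, {})` returns `{"UNITS": "MeV"}`.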
Describe the bug
I am trying to obtain STEREO in-situ particle data from the LET instrument on CDAWeb via SunPy's new CDAWeb client (#5558). The actual data files are CDFs, so internally read_cdf() from sunpy.io.cdf (#5435) is used (both by @dstansby). Downloading the CDF files works. However, when I try to read them, read_cdf() raises a KeyError because it tries to read the attribute 'UNITS', which the CDF file does not provide for this specific variable.
To Reproduce
What happened?
The problem is that read_cdf() accesses the variable attribute 'UNITS' without checking whether it exists (see sunpy/sunpy/io/cdf.py, line 74 in e21846a).
So a quick workaround would be this:
I can submit that as a pull request, but first I wanted to check whether this is an acceptable approach. The general opinion might instead be that the problem lies in the ill-defined CDF file and should be fixed there.
By the way, the (first) variable causing the KeyError in this case is actually dimensionless, while the other physical measurements do provide 'UNITS':
Expected behavior
Return (a list of) GenericTimeSeries
System Details
Installation method
conda