-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: updated meta assignment #961
Conversation
Updated the meta data assignment by: - checking type against expected label type, - updating MetaLabels if needed, and - allowing array inputs.
Fixed bugs in the str/bytes logic to handle initial and subsequent treatment of bytes as strings.
Added two new MetaLabel methods, `update` and `_eval_label_type`. The first method updates the MetaLabel class and the second one raises a TypeError if someone tries to set iterable metadata.
Added MetaLabels unit tests for the new class methods.
Added a summary of these changes to the changelog.
Removed extra whitespace and unused variables.
Improved expected meta behaviour by: - allowing MetaLabel updates in `Meta.accept_default_labels`, - casting iterables to strings by creating a string with newline separators, and - casting mismatched data to the expected type.
Added new meta tests and updated the existing tests for new behaviour.
Updated the integration test to reflect new default type.
@rstoneback can you verify this does/does not fully address your issue that was linked? |
Yep. I will take a look today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aburrell! Yep, the code does support the type of feature described in #940. I don't think I originally considered the auto new label support (at high Meta level) you added but that is a nice, user-friendly touch! I tried out the other metadata changes on an ICON IVM and MIGHTI file and I think we need to turn down the sensitivity on when the warnings are generated. More below.
pysat/_meta.py
Outdated
iattr](to_be_set) | ||
|
||
# Inform user data was recast | ||
pysat.logger.warning(''.join(( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this branch out on an ICON file. It leads to a whole long list of warnings.
In [8]: ivm = pysat.Instrument('icon', 'ivm', inst_id='a', update_files=True)
In [10]: ivm.load(2020, 1)
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'Epoch' with key 'FillVal'
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'ICON_L27_Altitude' with key 'FillVal'
pysat WARNING: Metadata does not have expected type <class 'numpy.float64'>. Recasting input for 'ICON_L27_Fractional_Ion_Density_H' with key 'Valid_Min'
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'ICON_L27_Fractional_Ion_Density_H' with key 'ValidMax'
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'ICON_L27_Fractional_Ion_Density_H' with key 'ValidMin'
pysat WARNING: Metadata does not have expected type <class 'numpy.float64'>. Recasting input for 'ICON_L27_Fractional_Ion_Density_H' with key 'Valid_Max'
Only showing a small fraction since it is way too much. Two contrasting requests:
- From the error I don't know what type the data loaded as. I think adding that info may be helpful.
- There are too many messages :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also tried it out on a MIGHTI file as a check to make sure it isn't just my files that produce warnings. It started to produce warnings but ran into an issue.
In [16]: mighti = pysat.Instrument('icon', 'mighti', inst_id='vector', tag='vector_wind_green')
In [19]: mighti.load(2020, 1)
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'Epoch' with key 'FillVal'
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'Epoch' with key 'ValidMin'
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'Epoch' with key 'ValidMax'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-19-9f1312956ee0> in <module>
----> 1 mighti.load(2020, 1)
~/Code/pysat/pysat/_meta.py in __setitem__(self, data_vars, input_dat)
442 # Update the MetaLabels object and the existing
443 # metadata to ensure all data have all labels
--> 444 self.labels.update(iattr, ikey, itype)
445 self._label_setter(ikey, ikey, type(to_be_set))
446
~/Code/pysat/pysat/_meta.py in update(self, lattr, lname, ltype)
1821 else:
1822 # This is an invalid meta data type, raise a TypeError
-> 1823 raise TypeError(''.join(['iterable types like ', repr(ltype),
1824 ' (set for ', repr(lattr),
1825 ') are not allowed']))
TypeError: iterable types like <class 'numpy.ndarray'> (set for 'Valid_Range') are not allowed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is foreseen. Now that there's a type check, the arrays that can sneak into meta when you first initialize the data are caught. This should be rectified separately in pysat/pysatNASA#99.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are too many messages :(
I could put the messages at the info level. Since the data still exists and is loaded, that could be seen as appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is foreseen.
Right... I missed your note about that at the top. eek! My bad :(
I could put the messages at the info level. Since the data still exists and is loaded, that could be seen as appropriate.
Info is definitely better, thinking maybe even debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to put it at DEBUG because it would be better if people used the correct type of data, thus avoiding potential casting errors. But I am fine with INFO, because it's just not that dire. 🐺
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully agree that it is best to use the correct types of data. However, the file creation process is separated from pysat and we are unlikely to get any existing datasets changed. I thought I had the correct types for IVM but I'm getting pages of warnings every load. The log will be full of items people can't do anything about.
I'm curious what the issue is for the IVM files. For some metadata I did pass through what I was given from earlier files but most of it was set within pysat. v2.x pysat, but still pysat.
I also thought of a potential issue with setting new labels automagically since I did the review. If a user has a typo when trying to set an existing label they'll get a new label instead. The software will still run but the result will be unexpected from user's perspective. It is their fault, but nice if software prevents issues, where possible. Confining new label creation to a function gets around that, but at the cost of requiring the user to know that a label is new. In practice, for users using non-default meta labels, these checks could potentially be repeated quite often. For those users, I think the hassle of additional checks favors the auto-magic option. Default meta label users may prefer option 2, it reduces potential for typos and they don't care about adding labels. A full cost-benefit (?) here would require info on how many users are doing what with pysat, which we don't have. This comment doesn't require a response, documenting my review thoughts on the topic. I'll probably think about it more. |
The error messages do work. I did a simple tweak to add existing type.
For the magnetic basis vectors I only see that warning on Went over to OMMBV. Sure enough,
typo on the |
I put some thought into your comment and wanted to add further clarification to advance the discussion. The reason I have Meta update automagically is to maintain the current behaviour, which updates the actual meta data for a data value using the dict keys regardless of the previous existence of those keys or not. This pull request now just makes Meta self consistent. If in the future we decide that we don't want to do this anymore, it would be a breaking change. Perhaps it would be worth saving your thoughts as an issue for discussion/recorded decision on how flexible we want Meta to be. |
Thanks and good point.
I like the feature you added. I was definitely poking at it but am not against it. A number of pysat choices were made some time back so I try to reevaluate things as we go based on the issues we've seen in the interim. Every design choice has strengths/weaknesses and as a reviewer I start out behind. I'm good now.
|
Implemented stringify, changed level of logging message, and expanded logging message.
Updated warning message to check against.
Description
Updated the meta data assignment by checking type against expected label type, updating MetaLabels if needed, and not allowing array inputs. Possibly addresses #940 and was meant to address pysat/pysatNASA#99.
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Now let's raise an error!
Test Configuration:
Checklist:
develop
(notmain
) branchCHANGELOG.md
, summarizing the changesIf this is a release PR, replace the first item of the above checklist with the release
checklist on the wiki: https://github.com/pysat/pysat/wiki/Checklist-for-Release