Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: updated meta assignment #961

Merged
merged 11 commits into from
Nov 29, 2021
Merged

ENH: updated meta assignment #961

merged 11 commits into from
Nov 29, 2021

Conversation

aburrell
Copy link
Member

@aburrell aburrell commented Nov 15, 2021

Description

Updated the meta data assignment by checking type against expected label type, updating MetaLabels if needed, and not allowing array inputs. Possibly addresses #940 and was meant to address pysat/pysatNASA#99.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

import pysat
test_inst = pysat.Instrument('pysat', 'testing')
test_inst.load(2009, 1)
test_inst.meta['dummy1'] = {'new meta': 'test string'}
test_inst.meta['dummy1'] = {'new meta': 'test string'}
test_inst.meta['dummy1'] = {'new meta': b'test bytes'}
test_inst.meta['dummy2'] = {'new_bytes': b'test bytes'}
test_inst.meta.labels.label_type

{'units': str,
 'name': str,
 'notes': str,
 'desc': str,
 'min_val': numpy.float64,
 'max_val': numpy.float64,
 'fill_val': numpy.float64,
 'new_meta': str,
 'new_bytes': str}

Now let's raise an error!

import numpy as np

test_inst.meta['dummy1'] = {'new_array': np.array([0, 1.0])}

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-05d199d2ff20> in <module>
----> 1 test_inst.meta['dummy1'] = {'new_array': np.array([0, 1.0])}

~/Programs/Git/pysat/pysat/_meta.py in __setitem__(self, data_vars, input_dat)
    418                             # Update the MetaLabels object and the existing
    419                             # metadata to ensure all data have all labels
--> 420                             self.labels.update(iattr, ikey, itype)
    421                             self._label_setter(ikey, ikey, type(to_be_set))
    422 

~/Programs/Git/pysat/pysat/_meta.py in update(self, lattr, lname, ltype)
   1788         else:
   1789             # This is an invalid meta data type, raise a TypeError
-> 1790             raise TypeError(''.join(['iterable types like ', repr(ltype),
   1791                                      ' (set for ', repr(lattr),
   1792                                      ') are not allowed']))

TypeError: iterable types like <class 'numpy.ndarray'> (set for 'new_array') are not allowed

Test Configuration:

  • Operating system: OS X Mojave
  • Version number: Python 3.8
  • Any details about your local setup that are relevant

Checklist:

  • Make sure you are merging into the develop (not main) branch
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • Add a note to CHANGELOG.md, summarizing the changes

If this is a release PR, replace the first item of the above checklist with the release
checklist on the wiki: https://github.com/pysat/pysat/wiki/Checklist-for-Release

Updated the meta data assignment by:
- checking type against expected label type,
- updating MetaLabels if needed, and
- allowing array inputs.
Fixed bugs in the str/bytes logic to handle initial and subsequent treatment of bytes as strings.
Added two new MetaLabel methods, `update` and `_eval_label_type`.  The first method updates the MetaLabel class and the second one raises a TypeError if someone tries to set iterable metadata.
Added MetaLabels unit tests for the new class methods.
Added a summary of these changes to the changelog.
Removed extra whitespace and unused variables.
Improved expected meta behaviour by:
- allowing MetaLabel updates in `Meta.accept_default_labels`,
- casting iterables to strings by creating a string with newline separators, and
- casting mismatched data to the expected type.
Added new meta tests and updated the existing tests for new behaviour.
Updated the integration test to reflect new default type.
@aburrell aburrell marked this pull request as ready for review November 19, 2021 18:48
@aburrell
Copy link
Member Author

@rstoneback can you verify this does/does not fully address your issue that was linked?

@aburrell aburrell added this to the 3.1.0 Release milestone Nov 19, 2021
@aburrell aburrell linked an issue Nov 19, 2021 that may be closed by this pull request
@rstoneback
Copy link
Collaborator

@rstoneback can you verify this does/does not fully address your issue that was linked?

Yep. I will take a look today.

Copy link
Collaborator

@rstoneback rstoneback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aburrell! Yep, the code does support the type of feature described in #940. I don't think I originally considered the auto new label support (at high Meta level) you added but that is a nice, user-friendly touch! I tried out the other metadata changes on an ICON IVM and MIGHTI file and I think we need to turn down the sensitivity on when the warnings are generated. More below.

pysat/_meta.py Show resolved Hide resolved
CHANGELOG.md Show resolved Hide resolved
pysat/_meta.py Outdated
iattr](to_be_set)

# Inform user data was recast
pysat.logger.warning(''.join((
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this branch out on an ICON file. It leads to a whole long list of warnings.

In [8]: ivm = pysat.Instrument('icon', 'ivm', inst_id='a', update_files=True)

In [10]: ivm.load(2020, 1)
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'Epoch' with key 'FillVal'
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'ICON_L27_Altitude' with key 'FillVal'
pysat WARNING: Metadata does not have expected type <class 'numpy.float64'>. Recasting input for 'ICON_L27_Fractional_Ion_Density_H' with key 'Valid_Min'
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'ICON_L27_Fractional_Ion_Density_H' with key 'ValidMax'
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'ICON_L27_Fractional_Ion_Density_H' with key 'ValidMin'
pysat WARNING: Metadata does not have expected type <class 'numpy.float64'>. Recasting input for 'ICON_L27_Fractional_Ion_Density_H' with key 'Valid_Max'

Only showing a small fraction since it is way too much. Two contrasting requests:

  • From the error I don't know what type the data loaded as. I think adding that info may be helpful.
  • There are too many messages :(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also tried it out on a MIGHTI file as a check to make sure it isn't just my files that produce warnings. It started to produce warnings but ran into an issue.

In [16]: mighti = pysat.Instrument('icon', 'mighti', inst_id='vector', tag='vector_wind_green')

In [19]: mighti.load(2020, 1)
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'Epoch' with key 'FillVal'
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'Epoch' with key 'ValidMin'
pysat WARNING: Metadata does not have expected type <class 'float'>. Recasting input for 'Epoch' with key 'ValidMax'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-9f1312956ee0> in <module>
----> 1 mighti.load(2020, 1)

~/Code/pysat/pysat/_meta.py in __setitem__(self, data_vars, input_dat)
    442                             # Update the MetaLabels object and the existing
    443                             # metadata to ensure all data have all labels
--> 444                             self.labels.update(iattr, ikey, itype)
    445                             self._label_setter(ikey, ikey, type(to_be_set))
    446 

~/Code/pysat/pysat/_meta.py in update(self, lattr, lname, ltype)
   1821         else:
   1822             # This is an invalid meta data type, raise a TypeError
-> 1823             raise TypeError(''.join(['iterable types like ', repr(ltype),
   1824                                      ' (set for ', repr(lattr),
   1825                                      ') are not allowed']))

TypeError: iterable types like <class 'numpy.ndarray'> (set for 'Valid_Range') are not allowed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is foreseen. Now that there's a type check, the arrays that can sneak into meta when you first initialize the data are caught. This should be rectified separately in pysat/pysatNASA#99.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are too many messages :(

I could put the messages at the info level. Since the data still exists and is loaded, that could be seen as appropriate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is foreseen.

Right... I missed your note about that at the top. eek! My bad :(

I could put the messages at the info level. Since the data still exists and is loaded, that could be seen as appropriate.

Info is definitely better, thinking maybe even debug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to put it at DEBUG because it would be better if people used the correct type of data, thus avoiding potential casting errors. But I am fine with INFO, because it's just not that dire. 🐺

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fully agree that it is best to use the correct types of data. However, the file creation process is separated from pysat and we are unlikely to get any existing datasets changed. I thought I had the correct types for IVM but I'm getting pages of warnings every load. The log will be full of items people can't do anything about.

I'm curious what the issue is for the IVM files. For some metadata I did pass through what I was given from earlier files but most of it was set within pysat. v2.x pysat, but still pysat.

@rstoneback
Copy link
Collaborator

I also thought of a potential issue with setting new labels automagically since I did the review. If a user has a typo when trying to set an existing label they'll get a new label instead. The software will still run but the result will be unexpected from user's perspective. It is their fault, but nice if software prevents issues, where possible.

Confining new label creation to a function gets around that, but at the cost of requiring the user to know that a label is new. In practice, for users using non-default meta labels, these checks could potentially be repeated quite often. For those users, I think the hassle of additional checks favors the auto-magic option. Default meta label users may prefer option 2, it reduces potential for typos and they don't care about adding labels.

A full cost-benefit (?) here would require info on how many users are doing what with pysat, which we don't have.

This comment doesn't require a response, documenting my review thoughts on the topic. I'll probably think about it more.

@rstoneback
Copy link
Collaborator

The error messages do work.

I did a simple tweak to add existing type.

pysat WARNING: Metadata does not have expected type <class 'numpy.float64'><class 'numpy.int64'>. Recasting input for 'ICON_L27_Unit_Vector_Meridional_Y' with key 'Valid_Max'
pysat WARNING: Metadata does not have expected type <class 'float'><class 'numpy.int64'>. Recasting input for 'ICON_L27_Unit_Vector_Meridional_Z' with key 'ValidMax'

For the magnetic basis vectors I only see that warning on ValidMax.

Went over to OMMBV. Sure enough,

                              'value_min': -1.,
                              'value_max': 1}

typo on the value_max setting, copied in a number of places in the code.

@aburrell
Copy link
Member Author

I also thought of a potential issue with setting new labels automagically since I did the review.

I put some thought into your comment and wanted to add further clarification to advance the discussion. The reason I have Meta update automagically is to maintain the current behaviour, which updates the actual meta data for a data value using the dict keys regardless of the previous existence of those keys or not. This pull request now just makes Meta self consistent. If in the future we decide that we don't want to do this anymore, it would be a breaking change. Perhaps it would be worth saving your thoughts as an issue for discussion/recorded decision on how flexible we want Meta to be.

@rstoneback
Copy link
Collaborator

I also thought of a potential issue with setting new labels automagically since I did the review.

I put some thought into your comment and wanted to add further clarification to advance the discussion. The reason I have Meta update automagically is to maintain the current behaviour, which updates the actual meta data for a data value using the dict keys regardless of the previous existence of those keys or not. This pull request now just makes Meta self consistent.

Thanks and good point.

If in the future we decide that we don't want to do this anymore, it would be a breaking change. Perhaps it would be worth saving your thoughts as an issue for discussion/recorded decision on how flexible we want Meta to be.

I like the feature you added. I was definitely poking at it but am not against it. A number of pysat choices were made some time back so I try to reevaluate things as we go based on the issues we've seen in the interim. Every design choice has strengths/weaknesses and as a reviewer I start out behind. I'm good now.

  • Since we have the check for NaNs in metadata when writing files there is protection against typos for floats. Not for strs. Int values default to -1 but since that is a valid value itself I don't think we should build a check there. It'd be nice if int data type supported nan.

Implemented stringify, changed level of logging message, and expanded logging message.
Updated warning message to check against.
@aburrell aburrell merged commit 30df7ee into develop Nov 29, 2021
@aburrell aburrell deleted the meta_labels_type branch November 29, 2021 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Easier to expand metalabels
2 participants