Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CF: also decode time bounds when available #2571

Merged
merged 7 commits into from Dec 19, 2018
Merged

Conversation

fmaussion
Copy link
Member

@fmaussion fmaussion commented Nov 24, 2018

Not sure if this is the best way to handle it, but it seems to work

@pep8speaks
Copy link

Hello @fmaussion! Thanks for submitting the PR.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together @fmaussion! This seems like the right place to handle this.

One minor design question here is: how strictly do we want to adhere to roundtripping variable attributes in Datasets? With this implementation, if one decodes the times, and then saves things back out to a file, a "time bounds" variable will contain units and calendar attributes even though it might not have them initially. I'm not sure if this is a big deal (in this case it is simply adding redundant metadata), but I just wanted to bring it up in case it was a concern.

@@ -624,6 +624,62 @@ def test_decode_cf(calendar):
assert ds.test.dtype == np.dtype('M8[ns]')


def test_decode_cf_time_bounds():

da = DataArray(np.arange(6).reshape((3, 2)), coords={'time': [1, 2, 3]},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it make sense to make this DataArray a fixture and split these cases into separate tests?

new_attrs['calendar'] = attrs['calendar']
to_update[attrs['bounds']] = new_attrs
# For all bound variables, update their time attribute if not present
for k, new_attrs in to_update.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could update the attributes of the potential bounds variables immediately within the for v in variables.values() loop rather than use another loop here (this would remove the need for the to_update dictionary).

@@ -350,6 +350,25 @@ def stackable(dim):
drop_variables = []
drop_variables = set(drop_variables)

# Time bounds coordinates might miss the decoding attributes
# https://github.com/pydata/xarray/issues/2565
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it also make sense to add a comment with the URL for the "Cell Boundaries" section of the CF conventions here?

to_update = dict()
for v in variables.values():
attrs = v.attrs
if ('units' in attrs and 'since' in attrs['units'] and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be cleaner to write something like:

has_date_units = 'units' in attrs and 'since' in attrs['units']
if has_date_units and 'bounds' in attrs:

@shoyer
Copy link
Member

shoyer commented Nov 25, 2018

Could we move this logic into a separate helper function? That would make things a little better organized and easier to test.

@fmaussion
Copy link
Member Author

fmaussion commented Nov 29, 2018

I addressed all comments except the fixtures, which seemed a bit overkill (but I simplified the tests). Thanks!

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fmaussion, just a minor optional naming suggestion, otherwise this looks good to me!

has_date_units = 'units' in attrs and 'since' in attrs['units']
if has_date_units and 'bounds' in attrs:
if attrs['bounds'] in variables:
to_update = variables[attrs['bounds']].attrs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a clearer name would be bounds_attrs?

Suggested change
to_update = variables[attrs['bounds']].attrs
bounds_attrs = variables[attrs['bounds']].attrs

@fmaussion
Copy link
Member Author

Thanks! Made the change as requested. Regarding the general design:

With this implementation, if one decodes the times, and then saves things back out to a file, a "time bounds" variable will contain units and calendar attributes even though it might not have them initially. I'm not sure if this is a big deal (in this case it is simply adding redundant metadata), but I just wanted to bring it up in case it was a concern.

I agree - I don't think it is a big deal either. It must also be noted that this might break some code in downstream libraries (including mine) which has built workarounds for this and will now error because the bounds are already decoded. Here also, I'm quite confident that this is an edge case but its worth mentioning. Should I put my what's new in "Breaking changes" instead?

@spencerkclark
Copy link
Member

It must also be noted that this might break some code in downstream libraries (including mine) which has built workarounds for this and will now error because the bounds are already decoded. Here also, I'm quite confident that this is an edge case but its worth mentioning. Should I put my what's new in "Breaking changes" instead?

I for sure see this perspective. I also think a plausible case could be made that this change is like a "bug fix," that is something that people may have needed to work around before in various ways, but ultimately should not have needed to. So I think it's up to you what you think is best.

If you do decide to shift it to the breaking changes section, I would suggest being a little more specific about under what circumstances the behavior is changing (i.e. this only impacts the behavior for time bounds variables defined via CF conventions that do not already have "units" and "calendar" attributes).

@shoyer
Copy link
Member

shoyer commented Dec 12, 2018

+1 for putting this in "Breaking changes" (we try to be pretty conservative with the definition of a strict "Enhancement"), but otherwise this looks good to me.

@fmaussion
Copy link
Member Author

+1 for putting this in "Breaking changes"

Sorry for being so slow. I just done it, will merge tomorrow once the tests pass.

@dcherian dcherian merged commit 57348ab into pydata:master Dec 19, 2018
dcherian pushed a commit to yohai/xarray that referenced this pull request Jan 2, 2019
* master:
  DEP: drop python 2 support and associated ci mods (pydata#2637)
  TST: silence warnings from bottleneck (pydata#2638)
  revert to dev version
  DOC: fix docstrings and doc build for 0.11.1
  Source encoding always set when opening datasets (pydata#2626)
  Add flake check to travis (pydata#2632)
  Fix dayofweek and dayofyear attributes from dates generated by cftime_range (pydata#2633)
  silence import warning (pydata#2635)
  fill_value in shift (pydata#2470)
  Flake fixed (pydata#2629)
  Allow passing of positional arguments in `apply` for Groupby objects (pydata#2413)
  Fix failure in time encoding for pandas < 0.21.1 (pydata#2630)
  Fix multiindex selection (pydata#2621)
  Close files when CachingFileManager is garbage collected (pydata#2595)
  added some logic to deal with rasterio objects in addition to filepaths (pydata#2589)
  Get 0d slices of ndarrays directly from indexing (pydata#2625)
  FIX Don't raise a deprecation warning for xarray.ufuncs.{angle,iscomplex} (pydata#2615)
  CF: also decode time bounds when available (pydata#2571)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants