
BUG Fix multiple bugs involving times #128

Merged 18 commits into spencerahill:develop from spencerkclark:fix-dt-subset on Jan 31, 2017

Conversation

spencerkclark
Collaborator

@spencerkclark spencerkclark commented Jan 27, 2017

Another bug I introduced in the refactor of calc.py. This led to an improper sum of time weights when not all time intervals were used (i.e., a sum that was too large, causing seasonal averages to be too small). Yikes.

Required tests:

  • Test that weighted time average is computed with the proper sum of weights
  • Test on data that does not have time bounds (e.g., 'inst' data)
  • Test that Timestamp minimum year workaround is not applied to dates that are in the valid range
  • Test that units are added to the computed time weights in ensure_time_avg_has_cf_metadata

Other to-do's:

  • Double check that all DataArray calls to rename are compatible with xarray 0.9.0 (note there was no change to the Dataset method)
  • Refactor set_dt logic in calc.py such that the time weights are always subset when the variable itself is subset in time.
  • What's new entry
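The weight-subsetting bug described above can be sketched with plain numpy (hypothetical numbers, not the actual aospy code):

```python
import numpy as np

# Two years of monthly data, of which only the first year is selected.
days_per_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
weights = np.concatenate([days_per_month, days_per_month])  # 24 months
data = np.ones(24)  # a constant field; its weighted average must be 1.0

data_sub = data[:12]        # variable subset to the first year...
weights_sub = weights[:12]  # ...the weights must be subset identically

# Buggy: sum of weights taken over *all* intervals -> average too small
buggy = (data_sub * weights_sub).sum() / weights.sum()

# Correct: weights subset along with the data
correct = (data_sub * weights_sub).sum() / weights_sub.sum()

print(buggy, correct)  # 0.5 1.0
```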

@spencerkclark
Collaborator Author

spencerkclark commented Jan 27, 2017

I think these test failures are due to some incompatibilities with the latest version of xarray (0.9.0). I'll have a closer look tomorrow. The test suite passes on my local machine (with xarray 0.8.2).

@spencerkclark
Collaborator Author

Tests are passing now. From the breaking changes section of the latest xarray docs:

DataArray.rename() behavior changed to strictly change the DataArray.name if called with string argument, or strictly change coordinate names if called with dict-like argument.
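A minimal illustration of the two behaviors described in the xarray docs (hypothetical array, assuming xarray >= 0.9.0):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros(3), dims='t', coords={'t': [0, 1, 2]}, name='temp')

# String argument: strictly changes DataArray.name
renamed_var = da.rename('temperature')

# Dict-like argument: strictly changes coordinate (and dimension) names
renamed_coord = da.rename({'t': 'time'})
```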

@spencerahill
Owner

Ok cool so just to confirm we are good on xarray 0.8.2 and 0.9? Otherwise need to change our version requirements in setup.py etc.

And same questions as in #126 re: unit test

@spencerahill
Owner

Let's grep for all rename calls to make sure we don't miss this elsewhere

1. Fixed bug where we would try to correct every time coordinate (even ones that were in the Timestamp-valid range)
2. Fixed bug involving incompatibility of a dt of ones (of type float) with a value of type timedelta64[D] (when converting dt to units of days)
@spencerkclark spencerkclark changed the title BUG Need to subset self.dt for desired dates BUG Fix multiple bugs involving times Jan 27, 2017
@spencerkclark
Collaborator Author

@spencerahill what do you think about renaming the AVERAGE_DT_STR coordinate to TIME_WEIGHTS_STR and carrying it in all loaded variables, both those where the time weights differ from one another (as in monthly mean time series data) and those where the time weights are all the same (as in instantaneous data)?

This way, since it would be indexed by the same time coordinate as the data itself, it would always get subset whenever the variable is subset in time, so there would never be a mismatch. Any time we wanted to use the time weights we would just pick them off the DataArray corresponding to the variable. This would do away with the dt attribute of Calc objects.
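A rough sketch of the proposed approach (hypothetical data; not the actual aospy implementation):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2000-01-01', periods=24, freq='MS')
weights = np.asarray(times.days_in_month)

da = xr.DataArray(np.ones(24), dims='time', coords={'time': times})
# Carry the time weights as a non-dimension coordinate indexed by time
da = da.assign_coords(time_weights=('time', weights))

# Subsetting the variable in time automatically subsets the weights too
sub = da.sel(time=slice('2000-01-01', '2000-12-31'))
print(sub['time_weights'].sum().item())  # 366 (2000 is a leap year)
```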

If we do not add a units attribute, it will not be decoded as a
timedelta.  This causes problems in our logic in calc where we always
convert to units of days for dt.  If the type is not a timedelta type
this logic raises an exception.

A test was added for this in test/test_utils_times.py
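A minimal illustration of the decoding behavior the commit message describes (a sketch, assuming xarray's CF decoding):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'time_weights': ('time', np.ones(3))})
ds['time_weights'].attrs['units'] = 'days'

# With a timedelta-like units attribute, decode_cf yields a timedelta dtype;
# without it, the variable stays float and the convert-to-days logic in calc
# raises an exception.
decoded = xr.decode_cf(ds, decode_timedelta=True)
print(decoded['time_weights'].dtype.kind)  # 'm' (timedelta64)
```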
@spencerahill
Owner

@spencerkclark That seems like a good idea. Please go forward with it.

@spencerahill spencerahill added this to the v0.1.1 milestone Jan 27, 2017
@@ -63,7 +63,7 @@
(PFULL_STR, ('pfull',)),
(PLEVEL_STR, ('level', 'lev', 'plev')),
(TIME_STR, ('time',)),
(AVERAGE_DT_STR, ('average_DT',)),
(TIME_WEIGHTS_STR, ('time_weights, average_DT',)),
Owner


Looks like you forgot the inner quote marks: ('time_weights', 'average_DT')

Collaborator Author


Good catch, thanks!

"""
time = ds[internal_names.TIME_STR]
unit_interval = time.attrs['units'].split('since')[0].strip()
time_weights = xr.DataArray(
Owner


FYI xarray has now implemented ones_like and similar helper functions: http://xarray.pydata.org/en/stable/generated/xarray.ones_like.html?highlight=ones_like

Might be slightly cleaner to use xarray's version.
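The two constructions side by side (hypothetical array, assuming xarray >= 0.9.0):

```python
import numpy as np
import xarray as xr

time = xr.DataArray(np.arange(3), dims='time',
                    coords={'time': np.arange(3)}, name='time')

# Manual construction of an all-ones array with matching metadata...
manual = xr.DataArray(np.ones(time.shape), coords=time.coords, dims=time.dims)
# ...versus the helper available as of xarray 0.9.0
helper = xr.ones_like(time)

print(bool((manual == helper).all()))  # True
```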

Collaborator Author


Agreed

@spencerahill
Owner

Sorry, didn't realize that xr.ones_like is new to 0.9; hence the AppVeyor fail. So can you bump the dependency number in setup.py? That should do the trick.

@spencerahill
Owner

And please ping when you're done with this, I'll give it another look over then and hopefully merge. Thanks!

@spencerkclark
Collaborator Author

spencerkclark commented Jan 30, 2017

Test that weighted time average is computed with the proper sum of weights

@spencerahill I'm not sure if there is a straightforward way to add a unit test for this; the simplest thing that comes to mind actually would be to add some logic that raises an exception if the time coordinate on dt is not equal to the time coordinate on arr in Calc._avg_by_year:

```python
def _avg_by_year(self, arr, dt):
    """Average a sub-yearly time-series over each year."""
    TIME_STR = internal_names.TIME_STR
    message = ('Time weights not indexed by the same time coordinate as'
               ' computed data.  This will lead to an improperly computed'
               ' time weighted average.  Exiting.')
    assert arr[TIME_STR].identical(dt[TIME_STR]), message

    yr_str = internal_names.TIME_STR + '.year'
    return ((arr*dt).groupby(yr_str).sum(internal_names.TIME_STR) /
            dt.groupby(yr_str).sum(internal_names.TIME_STR))
```

That way in the short term (until we refactor Calc more) we're never bitten by this failing silently. Does that sound like a reasonable compromise?

@spencerahill
Owner

spencerahill commented Jan 30, 2017

@spencerkclark That makes sense to me. Ultimately we trust (1) that all of the xarray functionality we're using to compute the weighted average (namely groupby, sum, division, and multiplication) works, and (2) that mathematically we are actually computing it in the right way.

Therefore this reduces to, as you suggest, catching if the dt array is bad. And the only way we could be sure that's the case is if it doesn't align with the time array.

So yes, I'd say let's go this route. Is it feasible to test that the exception is actually being caught? Also let's change it to a ValueError rather than generic assert.

@spencerahill
Owner

spencerahill commented Jan 30, 2017

Also, re: the appveyor fail; let's try changing the appveyor conda call to

  - "conda install --yes --quiet -c conda-forge pip pytest numpy scipy pandas netCDF4 toolz dask xarray"

xarray 0.9.0 isn't yet on the defaults channel, only conda-forge.

@spencerkclark
Collaborator Author

xarray 0.9.0 isn't yet on the defaults channel, only conda-forge.

Ah makes sense; that would do it. Thanks

So yes, I'd say let's go this route. Is it feasible to test that the exception is actually being caught? Also let's change it to a ValueError rather than generic assert.

The best way to do this I think would be to create a separate function in utils.times to handle the error checking (call it say assert_matching_time_coord) and test that independently. I'll give that a try now.
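A minimal sketch of such a helper, raising a ValueError as suggested above (standalone stand-ins for the aospy internals; the actual implementation may differ):

```python
import numpy as np
import pandas as pd
import xarray as xr

TIME_STR = 'time'  # stands in for internal_names.TIME_STR

def assert_matching_time_coord(arr1, arr2):
    """Raise a ValueError if the two arrays' time coordinates differ."""
    message = ('Time weights not indexed by the same time coordinate as '
               'computed data.  This will lead to an improperly computed '
               'time weighted average.')
    if not arr1[TIME_STR].identical(arr2[TIME_STR]):
        raise ValueError(message)

times = pd.date_range('2000-01-01', periods=4, freq='MS')
arr = xr.DataArray(np.ones(4), dims='time', coords={'time': times})
dt = xr.DataArray(np.ones(4), dims='time', coords={'time': times})

assert_matching_time_coord(arr, dt)  # matching coords: no error

try:
    assert_matching_time_coord(arr, dt.isel(time=slice(0, 2)))
except ValueError:
    print('mismatch detected')
```

Because the check lives in a standalone function, it can be unit-tested directly (both the passing and the raising case) without constructing a full Calc object.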

@spencerkclark
Collaborator Author

spencerkclark commented Jan 30, 2017

@spencerahill something else I've noticed is that numerous problems arise throughout the codebase if we load in data that has only one time value; I've run into this in trying to construct a test case for #129 with a file consisting of a single monthly mean. Not sure if we want to address that here, or in a separate PR / issue.

It mostly has to do with the fact that in that case time is treated as a scalar coordinate rather than an indexable dimension, and there are numerous places throughout the code where we try to do da.sel or da.isel on the time dimension; these all fail because we can't index on time in that case.
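The failure mode can be reproduced with a toy array (a sketch, not aospy code):

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(np.arange(3.0), dims='time',
                  coords={'time': pd.date_range('2000-01-01', periods=3)})

# Selecting a single value leaves 'time' as a scalar coordinate,
# not an indexable dimension
scalar = da.isel(time=0)
print('time' in scalar.dims)  # False

# Further time-indexing then fails
try:
    scalar.isel(time=0)
except ValueError:
    print('cannot index on a scalar time coordinate')
```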

@spencerkclark
Collaborator Author

spencerkclark commented Jan 30, 2017

There may be a simple fix...let me try something out.

You know what let's move that discussion to an issue. I don't want us to have a hacky solution there. See #132.

@spencerahill
Owner

@spencerkclark thanks for iterating so quickly on this today. Looks great; just add a what's new and then I'll merge.

@spencerkclark
Collaborator Author

@spencerahill let me know if that's a detailed enough what's new entry here (it's my first time writing one).

- Bug fixes related to the start of the ``calc.py`` refactor in :issue:`90`
  (see :issue:`126` and :issue:`128`)
- Compatibility with xarray version 0.9.0

Owner


In looking over xarray's and a few others' What's New sections, I think we should be more explicit:

Enhancements
~~~~~~~~~~~~
- Support for xarray 0.9.0.  By `Spencer Clark <https://github.com/spencerkclark>`_.

Bug fixes
~~~~~~~~~
- Fix an instance where the name for pressure half levels was mistakenly replaced with the name for pressure full levels (:issue:`126`).  By `Spencer Clark <https://github.com/spencerkclark>`_.
- Etc. for each other bug fixed.

Collaborator Author


No problem, I updated things to be more detailed. Let me know if that looks good!

@spencerahill
Owner

Great, thanks for all the work on this.

@spencerahill spencerahill merged commit bd6b0c4 into spencerahill:develop Jan 31, 2017
@spencerkclark spencerkclark deleted the fix-dt-subset branch January 31, 2017 14:37