BUG Fix multiple bugs involving times #128
I think these test failures are due to some incompatibilities with the latest version of xarray (0.9.0). I'll have a closer look tomorrow. The test suite passes on my local machine (with xarray 0.8.2).
Tests are passing now. From the breaking changes section of the latest xarray docs:
Ok cool so just to confirm we are good on xarray 0.8.2 and 0.9? Otherwise need to change our version requirements in setup.py etc. And same questions as in #126 re: unit test
Let's grep for all
1. Fixed bug where we would try to correct every time coordinate (even ones that were in the Timestamp-valid range)
2. Fixed bug involving incompatibility of a dt of ones (of type float) with a value of type timedelta64[D] (when converting dt to units of days)
@spencerahill what do you think about renaming the This way, since it would be indexed by the same time coordinate as the data itself, it would always get subset whenever the variable were subset in time, so there would never be a mismatch. Every time we would want to use the time weights we would just pick it off from the DataArray corresponding to the variable. This would do away with the
If we do not add a units attribute, it will not be decoded as a timedelta. This causes problems in our logic in calc where we always convert to units of days for dt. If the type is not a timedelta type this logic raises an exception. A test was added for this in test/test_utils_times.py
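To illustrate the dt type problem described above, here is a minimal sketch in plain NumPy. The helper name `to_days` and the sample arrays are hypothetical and not taken from aospy; the sketch only shows why a float array of ones cannot pass through logic that expects a timedelta type.

```python
import numpy as np


def to_days(dt):
    """Hypothetical helper: express timedelta64 values as float days."""
    dt = np.asarray(dt)
    if not np.issubdtype(dt.dtype, np.timedelta64):
        raise TypeError('dt must have a timedelta64 dtype, got %s' % dt.dtype)
    # Dividing timedelta64 by timedelta64 yields a plain float array.
    return dt / np.timedelta64(1, 'D')


# A dt that was decoded as a timedelta (illustrative values).
decoded = np.array([1, 2], dtype='timedelta64[D]')
# A dt of float ones, i.e. one that was never decoded as a timedelta.
undecoded = np.ones(2)

days = to_days(decoded)
```

Calling `to_days(undecoded)` raises a `TypeError` in this sketch, which mirrors the failure mode the added test is meant to guard against.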
@spencerkclark That seems like a good idea. Please go forward with it.
@@ -63,7 +63,7 @@
     (PFULL_STR, ('pfull',)),
     (PLEVEL_STR, ('level', 'lev', 'plev')),
     (TIME_STR, ('time',)),
-    (AVERAGE_DT_STR, ('average_DT',)),
+    (TIME_WEIGHTS_STR, ('time_weights, average_DT',)),
Looks like you forgot the inner quote marks: ('time_weights', 'average_DT')
Good catch, thanks!
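For anyone reading along, a small sketch of why the missing quote marks matter. Only the two tuples come from the diff; the `first_match` helper and the `available` set are hypothetical, and the sketch assumes alternative names are matched by simple membership lookup.

```python
# The tuple as written in the diff: one element, a single string with a comma.
broken = ('time_weights, average_DT',)
# The tuple as intended: two separate alternative names.
fixed = ('time_weights', 'average_DT')

# Hypothetical set of variable names found in an input file.
available = {'average_DT'}


def first_match(alternatives, names):
    """Return the first alternative present in names, or None if none match."""
    return next((alt for alt in alternatives if alt in names), None)


match_broken = first_match(broken, available)
match_fixed = first_match(fixed, available)
```

With the broken tuple neither name can ever match, so under this assumption the lookup would fail silently instead of raising a syntax error.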
    """
    time = ds[internal_names.TIME_STR]
    unit_interval = time.attrs['units'].split('since')[0].strip()
    time_weights = xr.DataArray(
FYI xarray has now implemented ones_like and similar helper functions: http://xarray.pydata.org/en/stable/generated/xarray.ones_like.html?highlight=ones_like
Might be slightly cleaner to use xarray's
Agreed
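A rough sketch of what the `ones_like` approach could look like. The dataset here is made up for illustration, and the name `time_weights` is simply reused from the snippet under review; this is not the code that ended up in the PR.

```python
import pandas as pd
import xarray as xr

# Made-up dataset with a three-step daily time coordinate.
times = pd.date_range('2000-01-01', periods=3, freq='D')
ds = xr.Dataset(coords={'time': times})

# Build float weights of one that share the time coordinate of the data.
# dtype=float is needed because the coordinate itself holds datetimes.
time_weights = xr.ones_like(ds['time'], dtype=float).rename('time_weights')
```

Because the result is built from the time coordinate itself, it is indexed by exactly the same times as the data.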
Sorry, didn't realize that
And please ping when you're done with this, I'll give it another look over then and hopefully merge. Thanks!
@spencerahill I'm not sure if there is a straightforward way to add a unit test for this; the simplest thing that comes to mind actually would be to add some logic that raises an exception if the time coordinate on

def _avg_by_year(self, arr, dt):
    """Average a sub-yearly time-series over each year."""
    TIME_STR = internal_names.TIME_STR
    message = ('Time weights not indexed by the same time coordinate as'
               ' computed data. This will lead to an improperly computed'
               ' time weighted average. Exiting.')
    assert (arr[TIME_STR].identical(dt[TIME_STR])), message
    yr_str = internal_names.TIME_STR + '.year'
    return ((arr*dt).groupby(yr_str).sum(internal_names.TIME_STR) /
            dt.groupby(yr_str).sum(internal_names.TIME_STR))

That way in the short term (until we refactor Calc more) we're never bitten by this failing silently. Does that sound like a reasonable compromise?
@spencerkclark That makes sense to me. Ultimately we trust (1) that all of the xarray functionality we're using to compute the weighted average (namely groupby, sum, division, and multiplication) works, and (2) that mathematically we are actually computing it in the right way. Therefore this reduces to, as you suggest, catching if the So yes, I'd say let's go this route. Is it feasible to test that the exception is actually being caught? Also let's change it to a ValueError rather than a generic assert.
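As a sketch of the ValueError variant suggested here, written as a standalone function so it can be exercised directly. The function name `avg_by_year`, the module-level `TIME_STR = 'time'` (standing in for `internal_names.TIME_STR`), and the sample data are all assumptions for illustration, not the code that was merged.

```python
import pandas as pd
import xarray as xr

TIME_STR = 'time'  # assumed stand-in for internal_names.TIME_STR


def avg_by_year(arr, dt):
    """Average a sub-yearly time series over each year, weighted by dt."""
    if not arr[TIME_STR].identical(dt[TIME_STR]):
        raise ValueError('Time weights not indexed by the same time '
                         'coordinate as computed data.')
    yr_str = TIME_STR + '.year'
    return ((arr * dt).groupby(yr_str).sum(TIME_STR) /
            dt.groupby(yr_str).sum(TIME_STR))


# Made-up data: two time steps in each of two years.
times = pd.to_datetime(['2000-01-01', '2000-02-01',
                        '2001-01-01', '2001-02-01'])
arr = xr.DataArray([1., 3., 2., 2.], coords=[(TIME_STR, times)])
dt = xr.DataArray([1., 3., 5., 5.], coords=[(TIME_STR, times)])

yearly = avg_by_year(arr, dt)  # weighted means per year
```

A test for the exception could then pass weights that cover only part of the time axis, e.g. `avg_by_year(arr, dt.isel(time=slice(0, 3)))`, and check that a `ValueError` is raised. Without the check, `arr * dt` would align on the shared times and the mismatch would go unnoticed.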
Also, re: the appveyor fail; let's try changing the appveyor conda call to
xarray 0.9.0 isn't yet on the defaults channel, only conda-forge.
Ah makes sense; that would do it. Thanks
The best way to do this I think would be to create a separate function in
@spencerahill something else I've noticed is that there are numerous problems caused (all over the place) if we load in data that has only one time value; I've run into this in trying to construct a test case for #129 with a file consisting of a single monthly mean. Not sure if we want to address that here, or in a separate PR / issue. It mostly has to do with the fact that in that case time is treated as a scalar coordinate and not an indexable dimension and there are numerous places where we try to do
You know what let's move that discussion to an issue. I don't want us to have a hacky solution there. See #132.
@spencerkclark thanks for iterating so quickly on this today. Looks great; just add a what's new and then I'll merge.
@spencerahill let me know if that's a detailed enough what's new entry here (it's my first time writing one).
- Bug fixes related to the start of the ``calc.py`` refactor in :issue:`90`
  (see :issue:`126` and :issue:`128`)
- Compatibility with xarray version 0.9.0
In looking over xarray's and a few others' What's New sections, I think we should be more explicit:
Enhancements
~~~~~~~~~~~~
- Support for xarray 0.9.0. By `Spencer Clark <https://github.com/spencerkclark>`_.
Bug fixes
~~~~~~~~~
- Fix an instance where the name for pressure half levels was mistakenly replaced with the name for pressure full levels (:issue:`126`). By `Spencer Clark <https://github.com/spencerkclark>`_.
- Etc. for each other bug fixed.
No problem, I updated things to be more detailed. Let me know if that looks good!
Great, thanks for all the work on this.
Another bug I introduced in the re-factor of calc.py. This led to an improper sum of time weights when not all time intervals were used (i.e. one that was too large, causing seasonal averages to be too small). Yikes.

Required tests:
- 'inst' data)
- ensure_time_avg_has_cf_metadata

Other to-do's:
- set_dt logic in calc.py such that the time weights are always subset when the variable itself is subset in time.