-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug in numpy date range workaround #139
Fix bug in numpy date range workaround #139
Conversation
This fixes a bug in which the data loaded would always start from the first specified file, rather than the actual start date the user wanted.
aospy/calc.py
Outdated
|
||
# What was this used for (it isn't used anymore)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spencerahill I've commented out this code here; it appears that self.date_range
is never used by Calc
. In addition, this means that we never use utils.times.create_monthly_time_array
. Should we delete this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's get rid of it.
aospy/data_loader.py
Outdated
@@ -112,7 +112,7 @@ def _prep_time_data(ds): | |||
Dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the docstring to reflect the new return of Dataset, min_year
@spencerahill thanks for the initial review -- let me know if you have any further comments. |
@@ -369,5 +371,22 @@ def test_data_name_gfdl_seasonal(self): | |||
self.assertEqual(result, expected) | |||
|
|||
|
|||
class LoadVariableTestCase(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spencerahill I think net the coverage decrease is because I removed a number of lines of code that would be run (but never actually did anything meaningful). This is the test case that should definitively make sure this bug is fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. Test looks good.
aospy/internal_names.py
Outdated
AVG_START_DATE_STR = 'avg_start_date' | ||
AVG_END_DATE_STR = 'avg_end_date' | ||
AVG_START_DATE_STR = 'data_start_date' | ||
AVG_END_DATE_STR = 'data_end_date' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spencerahill I think these attributes could use a name change. I'm not sure what the best option would be, but in this case, I think the repr of computed data is a little misleading. E.g. this is a calculation that was done on years 3 through 6 on some idealized model output:
<xarray.Dataset>
Dimensions: (lat: 64, lon: 128, pfull: 30)
Coordinates:
* lon (lon) float64 0.0 2.812 5.625 8.438 11.25 14.06 ...
* lat (lat) float64 -87.86 -85.1 -82.31 -79.53 -76.74 ...
* pfull (pfull) float64 3.385 10.78 14.5 19.3 25.44 33.2 ...
avg_start_date datetime64[ns] 1678-01-01
avg_end_date datetime64[ns] 1684-01-01
sfc_area (lat, lon) float64 3.553e+09 3.553e+09 3.553e+09 ...
land_mask (lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
Data variables:
equiv_potential_temp (pfull, lat, lon) float64 999.5 999.6 999.6 999.7 ...
These attributes make it seem as though the length of the time period the average was computed for was 6 years, when this is not the case. These attributes describe the full length of the data available to the DataLoader (not what is ultimately subset). I'm not super happy with the names I have here. Do you have any better ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I agree that avg
is bad. What about just start_date
and end_date
?
There's a bigger issue though: we should really be encoding into the datasets the start and end dates of the actual calculation, not the span of data from which it was loaded.
How difficult do you think that would be? If not bad, you can include here. Otherwise let's create a new issue for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These attributes serve an important internal purpose, so we shouldn't change what they represent (they fix #72). I'll add new ones to track the actual subset range.
I've renamed these 'raw_data_start_date'
and 'raw_data_end_date'
, and I've called the new attributes 'subset_start_date'
and 'subset_end_date'
. Let me know if that's a little clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New repr for same calculation:
<xarray.Dataset>
Dimensions: (lat: 64, lon: 128, pfull: 30)
Coordinates:
* lon (lon) float64 0.0 2.812 5.625 8.438 11.25 14.06 ...
* lat (lat) float64 -87.86 -85.1 -82.31 -79.53 -76.74 ...
* pfull (pfull) float64 3.385 10.78 14.5 19.3 25.44 33.2 ...
raw_data_start_date datetime64[ns] 1678-01-01
raw_data_end_date datetime64[ns] 1684-01-01
subset_start_date datetime64[ns] 1680-01-01
subset_end_date datetime64[ns] 1683-12-31
sfc_area (lat, lon) float64 3.553e+09 3.553e+09 3.553e+09 ...
land_mask (lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
Data variables:
equiv_potential_temp (pfull, lat, lon) float64 999.5 999.6 999.6 999.7 ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's perfect. Thanks!
aospy/utils/times.py
Outdated
def extract_date_range_and_months(time, start_date, end_date, months): | ||
"""Extract times within a specified date range and months of the year. | ||
def extract_months(time, months): | ||
"""Extract times within specified months of the year. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space between specified and months
@spencerkclark I've finished looking through; nothing to add beyond my latest in-line comments. AppVeyor fail is something weird not related to your edits, so let's ignore. I'll look into if it keeps happening on other PRs. |
@spencerkclark do you feel like this is good to go now? (after the last couple nits I just added) |
aospy/utils/times.py
Outdated
@@ -155,18 +155,20 @@ def numpy_datetime_range_workaround(date): | |||
Parameters | |||
---------- | |||
date : datetime.datetime object | |||
min_year : int | |||
Minimum year described in the fileset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not immediately obvious what "described" means. And maybe "Earliest" instead of "Minimum". Maybe an extra sentence explanation?
@@ -30,6 +30,9 @@ Bug fixes | |||
:py:class:`aospy.Var` (:pull:`128`). By `Spencer Clark <https://github.com/spencerkclark>`_. | |||
- Enable calculations to be completed using data saved as a single time-slice | |||
on disk (fixes :issue:`132` through :pull:`135`). By `Spencer Clark <https://github.com/spencerkclark>`_. | |||
- Fix bug where workaround for dates outside the ``pd.Timestamp`` valid range | |||
caused a mismatch between the data loaded and the data requested (fixes | |||
:issue:`138` through :pull:`139`). By `Spencer Clark <https://github.com/spencerkclark>`_. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update that we support xarray 0.9.1
@spencerkclark great, thanks for the hard work on this! Glad we've been able to identify and address these v0.1 bugs quickly. |
Closes #138
Ended needing to play with Calc a bit in process; basically I finally went through with the edit to simplify what was the
extract_date_range_and_months
method, which would both subset the data in time and select the desired months, to just select the desired months. We don't need to select the date range there, because it is already done inDataLoader.load_variable
.