Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in numpy date range workaround #139

Merged

Conversation

spencerkclark
Copy link
Collaborator

@spencerkclark spencerkclark commented Feb 2, 2017

Closes #138

Ended needing to play with Calc a bit in process; basically I finally went through with the edit to simplify what was the extract_date_range_and_months method, which would both subset the data in time and select the desired months, to just select the desired months. We don't need to select the date range there, because it is already done in DataLoader.load_variable.

  • Tests added for fix
  • What's new entry

This fixes a bug in which the data loaded would always start from the first specified file, rather than the actual start date the user wanted.
aospy/calc.py Outdated

# What was this used for (it isn't used anymore)?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spencerahill I've commented out this code here; it appears that self.date_range is never used by Calc. In addition, this means that we never use utils.times.create_monthly_time_array. Should we delete this code?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's get rid of it.

@@ -112,7 +112,7 @@ def _prep_time_data(ds):
Dataset
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the docstring to reflect the new return of Dataset, min_year

@spencerkclark
Copy link
Collaborator Author

@spencerahill thanks for the initial review -- let me know if you have any further comments.

@@ -369,5 +371,22 @@ def test_data_name_gfdl_seasonal(self):
self.assertEqual(result, expected)


class LoadVariableTestCase(unittest.TestCase):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spencerahill I think net the coverage decrease is because I removed a number of lines of code that would be run (but never actually did anything meaningful). This is the test case that should definitively make sure this bug is fixed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. Test looks good.

AVG_START_DATE_STR = 'avg_start_date'
AVG_END_DATE_STR = 'avg_end_date'
AVG_START_DATE_STR = 'data_start_date'
AVG_END_DATE_STR = 'data_end_date'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spencerahill I think these attributes could use a name change. I'm not sure what the best option would be, but in this case, I think the repr of computed data is a little misleading. E.g. this is a calculation that was done on years 3 through 6 on some idealized model output:

<xarray.Dataset>
Dimensions:               (lat: 64, lon: 128, pfull: 30)
Coordinates:
  * lon                   (lon) float64 0.0 2.812 5.625 8.438 11.25 14.06 ...
  * lat                   (lat) float64 -87.86 -85.1 -82.31 -79.53 -76.74 ...
  * pfull                 (pfull) float64 3.385 10.78 14.5 19.3 25.44 33.2 ...
    avg_start_date        datetime64[ns] 1678-01-01
    avg_end_date          datetime64[ns] 1684-01-01
    sfc_area              (lat, lon) float64 3.553e+09 3.553e+09 3.553e+09 ...
    land_mask             (lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
Data variables:
    equiv_potential_temp  (pfull, lat, lon) float64 999.5 999.6 999.6 999.7 ...

These attributes make it seem as though the length of the time period the average was computed for was 6 years, when this is not the case. These attributes describe the full length of the data available to the DataLoader (not what is ultimately subset). I'm not super happy with the names I have here. Do you have any better ideas?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I agree that avg is bad. What about just start_date and end_date?

There's a bigger issue though: we should really be encoding into the datasets the start and end dates of the actual calculation, not the span of data from which it was loaded.

How difficult do you think that would be? If not bad, you can include here. Otherwise let's create a new issue for it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These attributes serve an important internal purpose, so we shouldn't change what they represent (they fix #72). I'll add new ones to track the actual subset range.

I've renamed these 'raw_data_start_date' and 'raw_data_end_date', and I've called the new attributes 'subset_start_date' and 'subset_end_date'. Let me know if that's a little clearer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New repr for same calculation:

<xarray.Dataset>
Dimensions:               (lat: 64, lon: 128, pfull: 30)
Coordinates:
  * lon                   (lon) float64 0.0 2.812 5.625 8.438 11.25 14.06 ...
  * lat                   (lat) float64 -87.86 -85.1 -82.31 -79.53 -76.74 ...
  * pfull                 (pfull) float64 3.385 10.78 14.5 19.3 25.44 33.2 ...
    raw_data_start_date   datetime64[ns] 1678-01-01
    raw_data_end_date     datetime64[ns] 1684-01-01
    subset_start_date     datetime64[ns] 1680-01-01
    subset_end_date       datetime64[ns] 1683-12-31
    sfc_area              (lat, lon) float64 3.553e+09 3.553e+09 3.553e+09 ...
    land_mask             (lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
Data variables:
    equiv_potential_temp  (pfull, lat, lon) float64 999.5 999.6 999.6 999.7 ...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's perfect. Thanks!

def extract_date_range_and_months(time, start_date, end_date, months):
"""Extract times within a specified date range and months of the year.
def extract_months(time, months):
"""Extract times within specified months of the year.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space between specified and months

@spencerahill
Copy link
Owner

@spencerkclark I've finished looking through; nothing to add beyond my latest in-line comments.

AppVeyor fail is something weird not related to your edits, so let's ignore. I'll look into if it keeps happening on other PRs.

@spencerahill
Copy link
Owner

spencerahill commented Feb 3, 2017

@spencerkclark do you feel like this is good to go now? (after the last couple nits I just added)

@@ -155,18 +155,20 @@ def numpy_datetime_range_workaround(date):
Parameters
----------
date : datetime.datetime object
min_year : int
Minimum year described in the fileset
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediately obvious what "described" means. And maybe "Earliest" instead of "Minimum". Maybe an extra sentence explanation?

@@ -30,6 +30,9 @@ Bug fixes
:py:class:`aospy.Var` (:pull:`128`). By `Spencer Clark <https://github.com/spencerkclark>`_.
- Enable calculations to be completed using data saved as a single time-slice
on disk (fixes :issue:`132` through :pull:`135`). By `Spencer Clark <https://github.com/spencerkclark>`_.
- Fix bug where workaround for dates outside the ``pd.Timestamp`` valid range
caused a mismatch between the data loaded and the data requested (fixes
:issue:`138` through :pull:`139`). By `Spencer Clark <https://github.com/spencerkclark>`_.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update that we support xarray 0.9.1

@spencerahill
Copy link
Owner

@spencerkclark great, thanks for the hard work on this! Glad we've been able to identify and address these v0.1 bugs quickly.

@spencerahill spencerahill merged commit 676b58f into spencerahill:develop Feb 3, 2017
@spencerkclark spencerkclark deleted the fix-numpy-workaround branch February 3, 2017 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants