Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP BUG don't neglect grid attrs saved as dims w/out coords #140

Merged
merged 6 commits into from Feb 3, 2017

Conversation

spencerahill
Copy link
Owner

@spencerahill spencerahill commented Feb 3, 2017

Closes #137.

@spencerkclark this was a little thornier than we originally thought. Ultimately for this data bnds was a dim without a coord, and this was missed. In addition, it was a dim without a coord, and this caused the subsequent isel/sel logic to crash. I don't fully understand that, but a solution I'm happy with is to just create a coord for any dims that don't have them.

Before I go any further, would you mind assessing if this seems like an OK approach?

I also decided to switch from NV_STR to BOUNDS_STR for the sake of readability and snuck in one xarray.to_dataset compat update for 0.9.1 (it gave a warning).

Poking at similar places as #139...will merge that first and then rebase as necessary for this.

Still to do:

  • Tests
  • What's new

MAINT switch name of NV_STR to BOUNDS_STR
COMPAT xr.DataArray.to_dataset call for 0.9.1
@spencerahill
Copy link
Owner Author

Hmm this is causing some calc tests to fail. I'm out of steam for tonight but will return to these tomorrow. Example:

---------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
INFO:root:Initializing Calc instance: Calc object: sphum, example_proj, example_model, example_run (Fri Feb  3 00:14:18 2017)
INFO:root:Getting input data: Var instance "sphum" (Fri Feb  3 00:14:18 2017)
_________________________________________________________________________ TestCalc3D.test_annual_ts __________________________________________________________________________

self = <aospy.test.test_calc_basic.TestCalc3D testMethod=test_annual_ts>

    def test_annual_ts(self):
        calc_int = CalcInterface(intvl_out='ann',
                                 dtype_out_time='ts',
                                 **self.test_params)
        calc = Calc(calc_int)
>       calc.compute()

calc       = Calc object: sphum, example_proj, example_model, example_run
calc_int   = <aospy.calc.CalcInterface object at 0x2ba599b7e450>
self       = <aospy.test.test_calc_basic.TestCalc3D testMethod=test_annual_ts>

test_calc_basic.py:46:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../calc.py:643: in compute
    self.end_date),
../calc.py:477: in _get_all_data
    for n, var in enumerate(self.variables)]
../calc.py:429: in _get_input_data
    **self.data_loader_attrs)
../data_loader.py:202: in load_variable
    ds = _prep_time_data(ds)
../data_loader.py:126: in _prep_time_data
    ds = times.numpy_datetime_workaround_encode_cf(ds)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ds = <xarray.Dataset>
Dimensions:       (bounds: 2, lat: 64, lat_bounds: 65, lon: 1...bounds, time) float64 1.825e+03 1.856e+03
    time_weights  (time) float64 31.0

    def numpy_datetime_workaround_encode_cf(ds):
        """Generate CF-compliant units for out-of-range dates.

        Hack to address np.datetime64, and therefore pandas and xarray, not
        supporting dates outside the range 1677-09-21 and 2262-04-11 due to
        nanosecond precision.  See e.g.
        https://github.com/spencerahill/aospy/issues/96.

        Specifically, we coerce the data such that, when decoded, the earliest
        value starts in 1678 but with its month, day, and shorter timescales
        (hours, minutes, seconds, etc.) intact and with the time-spacing between
        values intact.

        Parameters
        ----------
        ds : xarray.Dataset

        Returns
        -------
        xarray.Dataset

        """
        time = ds[internal_names.TIME_STR]
>       units = time.attrs['units']
E       KeyError: 'units'

ds         = <xarray.Dataset>
Dimensions:       (bounds: 2, lat: 64, lat_bounds: 65, lon: 1...bounds, time) float64 1.825e+03 1.856e+03
    time_weights  (time) float64 31.0
time       = <xarray.DataArray 'time' (time: 1)>
array([ 1841.])
Coordinates:
  * time     (time) float64 1.841e+03

../utils/times.py:196: KeyError

Copy link
Collaborator

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spencerahill I think this fix looks good to me. I agree renaming NV_STR to BOUNDS_STR makes things much more readable.

if data_coord_name:
data = data.rename({data_coord_name.pop(): name_int})
# Force all dimensions to have coordinates.
if not data[name_int].coords:
data = data.assign_coords(**{name_int: data[name_int].values})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests that fail are for the case where time is a scalar coordinate. Therefore it gets caught in this if statement as well. I think there are two options here that would fix this:

  1. Change the if statement to also check if the key is a dimension in the Dataset; this will prevent a scalar time coordinate from being caught in this if statement (since it is not a dimension in the Dataset in that case):
if not data[name_int].coords and name_int in data.dims:
    data = data.assign_coords(**{name_int: data[name_int].values})
  1. Don't change the if statement (allow a scalar time to be caught), but be sure to copy over the attrs dictionary to the new coordinate (this will carry over the 'units' attribute and prevent the KeyError downstream):
if not data[name_int].coords:
    attrs = data[name_int].attrs
    data = data.assign_coords(**{name_int: data[name_int].values})
    data[name_int].attrs = attrs

Take your pick out of those solutions; perhaps the safest would actually be to combine them. I don't see any harm copying over the attrs dictionary, and it's probably best not to modify scalar coordinates that we don't explicitly need to.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can implement option (2) in an even simpler manner (rather than using .values use the full object):

if not data[name_int].coords:
    data = data.assign_coords(**{name_int: data[name_int]})

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Super useful. I agree, we'll do both and use your simple version of (2).

@@ -66,6 +66,20 @@ def test_rename_grid_attrs_ds(self):
ds = rename_grid_attrs(self.ds)
assert LAT_STR in ds

def test_rename_grid_attrs_dim_no_coord(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever you decide to do in rename_grid_attrs just make sure to update the test to reflect the logic change (e.g. make sure attributes are copied over and / or scalar grid attribute DataArrays that are not associated with dimensions are not modified).

@spencerahill
Copy link
Owner Author

@spencerkclark ready for another review

Copy link
Collaborator

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spencerahill a few very minor things, but this looks pretty much ready to go!

add missing coordinates from Model objects.
Search all of the dataset's coords and dims looking for matches to known
grid attribute names; any that are found subsequently get renamed to the
aospy name as specified in aospy.internal_names.GRID_ATTRS.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts on formatting? Should we use double back-ticks on aospy.internal_names.GRID_ATTRS?

ds[phalf_dim] = 4
ds = ds.set_coords(phalf_dim)
result = rename_grid_attrs(ds)
assert result[phalf_dim] == ds[phalf_dim]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use xr.testing.assert_identical here?

ds_orig = self.ds.copy()
ds_orig[self.ALT_LAT_STR].attrs = orig_attrs
ds = rename_grid_attrs(ds_orig)
assert ds[LAT_STR].attrs == orig_attrs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use self.assertEqual here?

@spencerahill
Copy link
Owner Author

@spencerkclark thanks, those are all good. I also renamed rename_grid_attrs to be more descriptive. Will merge when test pass (besides AppVeyor; will fix that later).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants