WIP BUG don't neglect grid attrs saved as dims w/out coords #140

spencerahill · 2017-02-03T05:09:37Z

Closes #137.

@spencerkclark this was a little thornier than we originally thought. For this data, bnds was a dim without a coord, so it was missed. Being a dim without a coord also caused the subsequent isel/sel logic to crash. I don't fully understand why, but a solution I'm happy with is to create a coord for any dim that doesn't have one.

Before I go any further, would you mind assessing if this seems like an OK approach?

I also decided to switch from NV_STR to BOUNDS_STR for the sake of readability and snuck in one xarray.to_dataset compat update for 0.9.1 (it gave a warning).

Poking at similar places as #139...will merge that first and then rebase as necessary for this.

Still to do:

Tests
What's new

MAINT switch name of NV_STR to BOUNDS_STR; COMPAT xr.DataArray.to_dataset call for 0.9.1

spencerahill · 2017-02-03T05:16:20Z

Hmm, this is causing some calc tests to fail. I'm out of steam for tonight but will return to these tomorrow. Example:

---------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
INFO:root:Initializing Calc instance: Calc object: sphum, example_proj, example_model, example_run (Fri Feb  3 00:14:18 2017)
INFO:root:Getting input data: Var instance "sphum" (Fri Feb  3 00:14:18 2017)
_________________________________________________________________________ TestCalc3D.test_annual_ts __________________________________________________________________________

self = <aospy.test.test_calc_basic.TestCalc3D testMethod=test_annual_ts>

    def test_annual_ts(self):
        calc_int = CalcInterface(intvl_out='ann',
                                 dtype_out_time='ts',
                                 **self.test_params)
        calc = Calc(calc_int)
>       calc.compute()

calc       = Calc object: sphum, example_proj, example_model, example_run
calc_int   = <aospy.calc.CalcInterface object at 0x2ba599b7e450>
self       = <aospy.test.test_calc_basic.TestCalc3D testMethod=test_annual_ts>

test_calc_basic.py:46:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../calc.py:643: in compute
    self.end_date),
../calc.py:477: in _get_all_data
    for n, var in enumerate(self.variables)]
../calc.py:429: in _get_input_data
    **self.data_loader_attrs)
../data_loader.py:202: in load_variable
    ds = _prep_time_data(ds)
../data_loader.py:126: in _prep_time_data
    ds = times.numpy_datetime_workaround_encode_cf(ds)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ds = <xarray.Dataset>
Dimensions:       (bounds: 2, lat: 64, lat_bounds: 65, lon: 1...bounds, time) float64 1.825e+03 1.856e+03
    time_weights  (time) float64 31.0

    def numpy_datetime_workaround_encode_cf(ds):
        """Generate CF-compliant units for out-of-range dates.

        Hack to address np.datetime64, and therefore pandas and xarray, not
        supporting dates outside the range 1677-09-21 and 2262-04-11 due to
        nanosecond precision.  See e.g.
        https://github.com/spencerahill/aospy/issues/96.

        Specifically, we coerce the data such that, when decoded, the earliest
        value starts in 1678 but with its month, day, and shorter timescales
        (hours, minutes, seconds, etc.) intact and with the time-spacing between
        values intact.

        Parameters
        ----------
        ds : xarray.Dataset

        Returns
        -------
        xarray.Dataset

        """
        time = ds[internal_names.TIME_STR]
>       units = time.attrs['units']
E       KeyError: 'units'

ds         = <xarray.Dataset>
Dimensions:       (bounds: 2, lat: 64, lat_bounds: 65, lon: 1...bounds, time) float64 1.825e+03 1.856e+03
    time_weights  (time) float64 31.0
time       = <xarray.DataArray 'time' (time: 1)>
array([ 1841.])
Coordinates:
  * time     (time) float64 1.841e+03

../utils/times.py:196: KeyError
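For context on the date range quoted in the docstring above: np.datetime64 at nanosecond precision stores a signed 64-bit count of nanoseconds since 1970-01-01, which is where the 1677-09-21 and 2262-04-11 limits come from. The following is only a standard-library sketch of that arithmetic; the names EPOCH, MAX_NS, and span are illustrative and do not appear in aospy.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
MAX_NS = 2**63 - 1  # largest signed 64-bit integer, read as nanoseconds

# datetime only resolves microseconds, so truncate the nanosecond count.
span = timedelta(microseconds=MAX_NS // 1000)

earliest = EPOCH - span  # lower end of the representable range
latest = EPOCH + span    # upper end of the representable range

print(earliest.date(), latest.date())
```

The span works out to roughly 292 years on either side of the epoch, which is why a time axis holding values such as 1841 days needs the workaround before it can be decoded.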

spencerkclark

@spencerahill I think this fix looks good to me. I agree renaming NV_STR to BOUNDS_STR makes things much more readable.

spencerkclark · 2017-02-03T12:24:34Z

aospy/data_loader.py

        if data_coord_name:
            data = data.rename({data_coord_name.pop(): name_int})
+            # Force all dimensions to have coordinates.
+            if not data[name_int].coords:
+                data = data.assign_coords(**{name_int: data[name_int].values})


The tests that fail are for the case where time is a scalar coordinate. Therefore it gets caught in this if statement as well. I think there are two options here that would fix this:

(1) Change the if statement to also check if the key is a dimension in the Dataset; this will prevent a scalar time coordinate from being caught in this if statement (since it is not a dimension in the Dataset in that case):

    if not data[name_int].coords and name_int in data.dims:
        data = data.assign_coords(**{name_int: data[name_int].values})

(2) Don't change the if statement (allow a scalar time to be caught), but be sure to copy over the attrs dictionary to the new coordinate (this will carry over the 'units' attribute and prevent the KeyError downstream):

    if not data[name_int].coords:
        attrs = data[name_int].attrs
        data = data.assign_coords(**{name_int: data[name_int].values})
        data[name_int].attrs = attrs

Take your pick out of those solutions; perhaps the safest would actually be to combine them. I don't see any harm copying over the attrs dictionary, and it's probably best not to modify scalar coordinates that we don't explicitly need to.

You can implement option (2) in an even simpler manner (rather than using .values use the full object):

    if not data[name_int].coords:
        data = data.assign_coords(**{name_int: data[name_int]})

Thanks! Super useful. I agree, we'll do both and use your simple version of (2).
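For reference, a minimal sketch of what combining the two options could look like. The helper name ensure_dim_coord, the toy dataset, and the units string are made up for illustration and are not aospy code. The condition here tests `name_int not in data.coords` instead of the truthiness check quoted above, so treat it as a variation on the idea rather than what was merged.

```python
import numpy as np
import xarray as xr


def ensure_dim_coord(data, name_int):
    """Hypothetical helper: give dimension `name_int` a coordinate if it lacks one."""
    # Option (1): only act on true dimensions, so scalar coordinates are skipped.
    if name_int in data.dims and name_int not in data.coords:
        # Option (2), simple form: pass the full object rather than `.values`,
        # so any attrs it carries travel with the new coordinate.
        data = data.assign_coords(**{name_int: data[name_int]})
    return data


# Toy dataset: `bounds` is a dimension without a coordinate and `time` is a
# scalar coordinate with a (made-up) units attribute.
ds = xr.Dataset(
    {"time_bounds": (("bounds",), np.array([1825.0, 1856.0]))},
    coords={"time": xr.Variable((), 1841.0, {"units": "days since 0001-01-01"})},
)

fixed = ensure_dim_coord(ds, "bounds")    # gains a `bounds` coordinate
untouched = ensure_dim_coord(ds, "time")  # scalar coordinate is left alone

print("bounds" in ds.coords, "bounds" in fixed.coords)
print(untouched["time"].attrs)
```

Because the scalar time never enters the branch, its 'units' attribute is still there when the time-encoding step later reads it.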

spencerkclark · 2017-02-03T12:32:48Z

aospy/test/test_data_loader.py

@@ -66,6 +66,20 @@ def test_rename_grid_attrs_ds(self):
        ds = rename_grid_attrs(self.ds)
        assert LAT_STR in ds

+    def test_rename_grid_attrs_dim_no_coord(self):


Whatever you decide to do in rename_grid_attrs just make sure to update the test to reflect the logic change (e.g. make sure attributes are copied over and / or scalar grid attribute DataArrays that are not associated with dimensions are not modified).

spencerahill · 2017-02-03T18:50:31Z

@spencerkclark ready for another review

spencerkclark

@spencerahill a few very minor things, but this looks pretty much ready to go!

spencerkclark · 2017-02-03T18:54:30Z

aospy/data_loader.py

-    add missing coordinates from Model objects.
+    Search all of the dataset's coords and dims looking for matches to known
+    grid attribute names; any that are found subsequently get renamed to the
+    aospy name as specified in aospy.internal_names.GRID_ATTRS.


What are your thoughts on formatting? Should we use double back-ticks on aospy.internal_names.GRID_ATTRS?

spencerkclark · 2017-02-03T18:54:59Z

aospy/test/test_data_loader.py

+        ds[phalf_dim] = 4
+        ds = ds.set_coords(phalf_dim)
+        result = rename_grid_attrs(ds)
+        assert result[phalf_dim] == ds[phalf_dim]


Can we use xr.testing.assert_identical here?
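A short sketch of why assert_identical is the stricter check for a test like this. The arrays and the "Pa" units attribute below are illustrative only and are not taken from the aospy test suite.

```python
import xarray as xr

a = xr.DataArray(4, name="phalf", attrs={"units": "Pa"})
b = xr.DataArray(4, name="phalf")  # same value, but the attrs are missing

# `==` compares values only, so this passes even though the attrs differ.
assert a == b

# assert_identical also compares names and attrs, so it flags the difference.
try:
    xr.testing.assert_identical(a, b)
    caught = False
except AssertionError:
    caught = True

print(caught)
```

With a plain `==`, a regression that drops or alters attributes on the scalar grid attribute would go unnoticed.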

spencerkclark · 2017-02-03T18:55:23Z

aospy/test/test_data_loader.py

+        ds_orig = self.ds.copy()
+        ds_orig[self.ALT_LAT_STR].attrs = orig_attrs
+        ds = rename_grid_attrs(ds_orig)
+        assert ds[LAT_STR].attrs == orig_attrs


Should we use self.assertEqual here?

spencerahill · 2017-02-03T19:20:14Z

@spencerkclark thanks, those are all good. I also renamed rename_grid_attrs to be more descriptive. Will merge when tests pass (besides AppVeyor; will fix that later).

BUG don't neglect grid attrs saved as dims w/out coords

70a71c1

MAINT switch name of NV_STR to BOUNDS_STR; COMPAT xr.DataArray.to_dataset call for 0.9.1

spencerahill added this to the v0.1.1 milestone Feb 3, 2017

spencerahill added bug Calc data loaders dates/times utils labels Feb 3, 2017

TEST add test for rename_grid_attrs w/ non-coord dim

5b3c917

spencerkclark reviewed Feb 3, 2017


Spencer Hill added 3 commits February 3, 2017 10:23

BUG don't force scalar dims to have coords when renaming

085dbc0

Merge branch 'develop' into nv-bugfix

e99b35f

DOC update what's new

50820d7

spencerkclark reviewed Feb 3, 2017


rename_grid_attrs -> grid_attrs_to_aospy_names; minor test edits

e34d29b

spencerahill merged commit deca6d6 into develop Feb 3, 2017

spencerahill deleted the nv-bugfix branch February 3, 2017 19:25

This was referenced Feb 5, 2017

'nv' not found error in ensure_time_avg_has_cf_metadata #137

Closed

Cleanup logic for isel/drop on dims w/ and w/out coords #142

Closed
