Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround for bug w/ len-1 times #269

Merged
merged 8 commits into from
Apr 27, 2018
Merged

Conversation

spencerahill
Copy link
Owner

Closes #268

What's happening is that, with xarray 10.3, if time is length-1, then time_bounds, time_weights, and any data variables aren't being indexed by it. I implemented the hack below that seems to fix it.

However, this feels really clunky, and I suspect there's a much more elegant way of doing it. @spencerkclark, do you have any ideas? The frustrating part was that I couldn't just use DataArray.assign_coords, because of a "ValueError: cannot add coordinates with new dimensions to a DataArray"

I'll wait to add regression tests etc. until we converge on the overall logic.

@spencerkclark
Copy link
Collaborator

@spencerahill thanks for looking into this! Following up on my comment in #268, I think the solution is to change times.ensure_time_as_dim to the following:

def ensure_time_as_dim(ds):
    TIME_INDEXED_VARS = set(ds.data_vars).union({TIME_WEIGHTS_STR, TIME_BOUNDS_STR})
    TIME_INDEXED_VARS = TIME_INDEXED_VARS.intersection(ds.variables)
    for name in TIME_INDEXED_VARS:
        if TIME_STR not in ds[name].dims:
            da = ds[name].expand_dims(TIME_STR)
            da[TIME_STR] = ds[TIME_STR]
            ds[name] = da
    return ds

If we use expand_dims here we can do away with times.convert_scalar_to_indexable_coord. I tested this locally, and it seems to do the trick.

@spencerkclark
Copy link
Collaborator

spencerkclark commented Apr 26, 2018

Note when testing above I only tested test_calc_basic.py. It seems the above implementation breaks test_utils_times.test_ensure_time_as_dim. There is a simple fix which is to do the following:

def ensure_time_as_dim(ds):
    if TIME_STR not in ds[TIME_STR].dims:
        ds[TIME_STR] = ds[TIME_STR].expand_dims(TIME_STR)

    time_indexed_vars = set(ds.data_vars).union(set(TIME_VAR_STRS))
    time_indexed_vars = time_indexed_vars.intersection(ds.variables)
    for name in time_indexed_vars:
        if TIME_STR not in ds[name].dims:
            da = ds[name].expand_dims(TIME_STR)
            da[TIME_STR] = ds[TIME_STR]
            ds[name] = da
    return ds

That said, we could consider modifying the test, because I think as of Xarray 0.10.3 we are guaranteed that TIME_STR will be a dimension for ds[TIME_STR], since we call xr.open_mfdataset with concat_dim=TIME_STR.

@spencerahill
Copy link
Owner Author

Thanks @spencerkclark, that was super helpful. I slightly modified your last version, and now all tests pass. Let me know your take.

as of Xarray 0.10.3 we are guaranteed that TIME_STR will be a dimension for ds[TIME_STR], since we call xr.open_mfdataset with concat_dim=TIME_STR.

This is a good point. So what I have implemented may be slightly overkill. But I'm not sure it's worth the effort to spend much more time to prune this down. For one, any performance hit will likely be small, since by definition this arises only if there is a single time value.

@spencerahill
Copy link
Owner Author

Note that the failures above are all from #259; the actual relevant tests are passing

Copy link
Collaborator

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'm not sure it's worth the effort to spend much more time to prune this down.

I agree, not worth the effort.

I have one minor comment on preference, but overall this looks good to me. Thanks!

for name in time_indexed_vars:
if TIME_STR not in ds[name].indexes:
da = ds[name].expand_dims(TIME_STR)
da[TIME_STR] = convert_scalar_to_indexable_coord(ds[TIME_STR])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly partial to a solution that eliminates convert_scalar_to_indexable_coord for the simple reason that it is something that Xarray supports now (so we shouldn't need to roll our own solution)

@spencerahill
Copy link
Owner Author

@spencerkclark good call. I got rid of that function, renamed the other one, and revamped the tests to more directly check what we're interested in. I think it's good to go, but I'd appreciate one last review.

time_indexed_vars = time_indexed_vars.intersection(ds.variables)
for name in time_indexed_vars:
if TIME_STR not in ds[name].indexes:
da = ds[name].expand_dims(TIME_STR)
da[TIME_STR] = ds[TIME_STR]
ds[name] = da
# Convert time weights and bounds back to coordinates from data_vars
return ds.set_coords([TIME_WEIGHTS_STR, TIME_BOUNDS_STR])
for coord in time_indexed_coords:
if coord in ds.coords:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be if coord not in ds.coords? Otherwise this loop is a no-op right?

More generally though, I'm not sure if this logic is needed, because we call set_grid_attrs_as_coords directly after _prep_time_data, and all the functions called within _prep_time_data operate on Datasets:

aospy/aospy/data_loader.py

Lines 278 to 279 in 7e3e5be

ds, min_year, max_year = _prep_time_data(ds)
ds = set_grid_attrs_as_coords(ds)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be if coord not in ds.coords? Otherwise this loop is a no-op right?

To cover the case where TIME_WEIGHTS_STR or TIME_BOUNDS_STR are not in the Dataset, I mean: if coord in ds.data_vars

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. For some reason while putting the tests together, I convinced myself that time_bounds and time_weights were being promoted to data_vars. But the full suite passed on my machine with this omitted, so clearly that was mistaken.

Copy link
Collaborator

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I say go ahead and merge once the CI comes back green!

@spencerahill spencerahill merged commit 098da63 into develop Apr 27, 2018
@spencerahill spencerahill deleted the single-timestep-xr10.3-fix branch April 27, 2018 04:06
@spencerahill
Copy link
Owner Author

Thanks for the invaluable help as always @spencerkclark

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants