regrid_dataset broken with xarray=0.16.1 #36

Closed
slevang opened this issue Oct 2, 2020 · 8 comments

@slevang
Contributor

slevang commented Oct 2, 2020

First off, thanks everyone for keeping this great package going!

Unfortunately, regridding of chunked/dask datasets seems to have broken with the latest xarray release (0.16.1), maybe related to pydata/xarray#4060?

Minimal Example

import xarray as xr
import xesmf as xe

ds = xr.tutorial.open_dataset('air_temperature').chunk({'time':1})
ds['foo'] = ds.air # adding a second field breaks regridder call
grid = xr.Dataset({'lat': (['lat'], ds.lat[::4]), 'lon': (['lon'], ds.lon[::4])})
regridder = xe.Regridder(ds, grid, 'bilinear')
ds_out = regridder(ds)

Fails with:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    ds_out = regridder(ds)
  File "/home/slevang/miniconda3/envs/salient/lib/python3.8/site-packages/xesmf/frontend.py", line 397, in __call__
    return self.regrid_dataset(indata, keep_attrs=keep_attrs)
  File "/home/slevang/miniconda3/envs/salient/lib/python3.8/site-packages/xesmf/frontend.py", line 515, in regrid_dataset
    ds_out = xr.apply_ufunc(
  File "/home/slevang/miniconda3/envs/salient/lib/python3.8/site-packages/xarray/core/computation.py", line 1092, in apply_ufunc
    return apply_dataset_vfunc(
  File "/home/slevang/miniconda3/envs/salient/lib/python3.8/site-packages/xarray/core/computation.py", line 410, in apply_dataset_vfunc
    result_vars = apply_dict_of_variables_vfunc(
  File "/home/slevang/miniconda3/envs/salient/lib/python3.8/site-packages/xarray/core/computation.py", line 356, in apply_dict_of_variables_vfunc
    result_vars[name] = func(*variable_args)
  File "/home/slevang/miniconda3/envs/salient/lib/python3.8/site-packages/xarray/core/computation.py", line 653, in apply_variable_ufunc
    raise ValueError(
ValueError: dimension 'dim1' in 'output_sizes' must correspond to output_core_dims

Regridding the tutorial dataset with only the single field air works fine, but adding a second variable raises this error about dimension names.

@jminsk-cc

Is anyone working on pinning the xarray version to <=0.16.0? Just wondering, since I spent a few hours debugging this exact issue. 😄

@jhamman
Member

jhamman commented Nov 9, 2020

Just ran into this same issue as described above. I can also confirm that downgrading to 0.16.0 fixes the issue. Xarray=0.16.1 did have a fair bit of churn around apply_ufunc (pydata/xarray#3890, pydata/xarray#4060, pydata/xarray#4391, pydata/xarray#4392), so I guess I'm not surprised to find things broke here.
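
Until a fix lands, a guard along these lines could fail early with a readable message instead of the ValueError above. This is only a sketch: parse_release and is_reported_broken are hypothetical helper names, and the check covers only 0.16.1 because that is the only release reported as broken in this thread.

```python
import re


def parse_release(version):
    """Return the leading (major, minor, patch) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"cannot parse version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(group or 0) for group in match.groups())


def is_reported_broken(version):
    """True only for the xarray release reported as broken in this thread."""
    return parse_release(version) == (0, 16, 1)


# Hypothetical usage in a script that regrids multi-variable dask datasets:
#     import xarray as xr
#     if is_reported_broken(xr.__version__):
#         raise RuntimeError("xarray 0.16.1 breaks regrid_dataset; use 0.16.0")
```

Comparing parsed tuples rather than raw strings keeps development builds of a later release from being flagged by accident.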

@rabernat
Member

cc @chiaral

@dcherian
Contributor

@kmuehlbauer probably knows what's going wrong.

@kmuehlbauer

@dcherian Thanks for the ping. The error looks familiar, but I'll have to look into it to be sure. I'll come back to this tomorrow.

@mathause
Contributor

xESMF is not required to reproduce this, so it is an upstream issue:

import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature').chunk({'time':1})
ds['foo'] = ds.air # second field required

def func(da):
    return da[:, 1:4, 1:7]

xr.apply_ufunc(
    func,
    ds,
    dask="parallelized",
    input_core_dims=[["lon", "lat"]],
    output_core_dims=[["lat_new", "lon_new"]],
    output_sizes={"lat_new": 3, "lon_new": 6},
)

This most likely happens because of dask_gufunc_kwargs.pop("output_sizes", {}) (xarray/core/computation.py#L670): the pop mutates the dict that is shared across variables, so the original output_sizes is no longer available when apply_variable_ufunc is called for the second variable.
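
The suspected hazard can be shown without xarray or dask. The snippet below is a plain-Python illustration, not xarray's code: take_output_sizes is a hypothetical stand-in for the per-variable call, and the point is only that pop() on a dict shared across calls changes what the second call sees.

```python
def take_output_sizes(dask_gufunc_kwargs):
    # pop() returns the value AND deletes the key from the caller's dict.
    return dask_gufunc_kwargs.pop("output_sizes", {})


# One kwargs dict is shared by every variable in the dataset.
shared_kwargs = {"output_sizes": {"lat_new": 3, "lon_new": 6}}

# Called once per variable, in order: first "air", then "foo".
seen = {name: take_output_sizes(shared_kwargs) for name in ("air", "foo")}

print(seen["air"])  # the sizes the user passed in
print(seen["foo"])  # whatever is left in the shared dict for the second call
```

With a single variable the mutation goes unnoticed, which fits the observation that only multi-variable datasets fail.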

@kmuehlbauer

Indeed, this is a bug upstream.

A bit of background:

dask.array.apply_gufunc expects the output_sizes mapping to use generic dimension names (dim0, dim1, etc.). Thus we have to remap the names within xr.apply_ufunc and change output_sizes accordingly. Evidently we do not test this with two variables. If we had, we would have found that we need to make a copy of dask_gufunc_kwargs before the remapping takes place.
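
A minimal sketch of the copy-before-remap idea, as a simplified model rather than xarray's actual source or the eventual patch. The helpers build_dims_map and remap_output_sizes are hypothetical, and assigning generic names in sorted order is an assumption made for illustration.

```python
def build_dims_map(core_dims):
    """Map each core dimension name to a generic name (dim0, dim1, ...)."""
    return {name: f"dim{n}" for n, name in enumerate(sorted(core_dims))}


def remap_output_sizes(dask_gufunc_kwargs, output_core_dims, dims_map):
    """Return new kwargs whose output_sizes uses the generic names.

    The caller's dict is copied first, so repeated calls (one per
    variable) always start from the original, user-facing names.
    """
    kwargs = dict(dask_gufunc_kwargs)  # shallow copy; caller's dict untouched
    output_sizes = kwargs.pop("output_sizes", {})
    renamed = {}
    for key, value in output_sizes.items():
        if key not in output_core_dims:
            raise ValueError(
                f"dimension '{key}' in 'output_sizes' must correspond "
                "to output_core_dims"
            )
        renamed[dims_map[key]] = value
    if renamed:
        kwargs["output_sizes"] = renamed
    return kwargs


dims_map = build_dims_map({"lat", "lon", "lat_new", "lon_new"})
user_kwargs = {"output_sizes": {"lat_new": 3, "lon_new": 6}}

# One call per variable; each call sees the same untouched user_kwargs.
per_variable = [
    remap_output_sizes(user_kwargs, {"lat_new", "lon_new"}, dims_map)
    for _ in ("air", "foo")
]
```

Without the copy, the second call would receive keys that were already renamed (dim1, dim3 in this model), and the membership check would raise the same kind of ValueError as in the traceback above.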

I'll open a PR upstream to fix this.

@kmuehlbauer

See pydata/xarray#4576.
