# Regrid xarray Dataset with multiple variables

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
import xesmf as xe

Starting with v0.2.0, xESMF accepts an `xarray.Dataset` as input and automatically loops over all of its data variables.

## A simple example

### Prepare input data

In [2]:
ds = xr.tutorial.open_dataset('air_temperature')
ds  # air temperature in Kelvin

<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

In [3]:
# The input dataset can contain variables of different shapes (e.g. 2D, 3D, 4D), as long as their horizontal shapes are the same.
ds['celsius'] = ds['air'] - 273.15  # Kelvin -> celsius
ds['slice'] = ds['air'].isel(time=0)
ds

<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.49 296.19 295.69
    celsius  (time, lat, lon) float32 -31.949997 -30.649994 ... 22.540009
    slice    (lat, lon) float32 241.2 242.5 243.5 244.0 ... 296.9 296.79 296.6
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

### Build regridder

In [4]:
ds_out = xr.Dataset({'lat': (['lat'], np.arange(16, 75, 1.0)),
                     'lon': (['lon'], np.arange(200, 330, 1.5)),
                    }
                   )

regridder = xe.Regridder(ds, ds_out, 'bilinear')
regridder.clean_weight_file()
regridder

Create weight file: bilinear_25x53_59x87.nc
Remove file bilinear_25x53_59x87.nc


xESMF Regridder 
Regridding algorithm:       bilinear 
Weight filename:            bilinear_25x53_59x87.nc 
Reuse pre-computed weights? False 
Input grid shape:           (25, 53) 
Output grid shape:          (59, 87) 
Output grid dimension name: ('lat', 'lon') 
Periodic in longitude?      False
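The output grid shape `(59, 87)` reported above follows from how `np.arange` builds the target coordinates: the stop value is excluded. The short check below uses only NumPy and the same arguments as the cell that builds `ds_out`; the names `lat_out` and `lon_out` are introduced here purely for illustration.

```python
import numpy as np

# Same arguments as the coordinates passed to ds_out above.
lat_out = np.arange(16, 75, 1.0)    # stop value 75 is excluded
lon_out = np.arange(200, 330, 1.5)  # stop value 330 is excluded

# Number of points and the first/last value along each axis.
print(lat_out.size, lat_out[0], lat_out[-1])  # 59 16.0 74.0
print(lon_out.size, lon_out[0], lon_out[-1])  # 87 200.0 329.0
```

The target grid therefore ends at 74°N and 329°E rather than at 75 and 330, which matches the coordinates printed for the regridded dataset.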

### Apply to data

In [5]:
# the entire dataset can be processed at once
ds_out = regridder(ds)
ds_out

using dimensions ('lat', 'lon') from data variable air as the horizontal dimensions for this dataset.


<xarray.Dataset>
Dimensions:  (lat: 59, lon: 87, time: 2920)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
  * lon      (lon) float64 200.0 201.5 203.0 204.5 ... 324.5 326.0 327.5 329.0
  * lat      (lat) float64 16.0 17.0 18.0 19.0 20.0 ... 70.0 71.0 72.0 73.0 74.0
Data variables:
    air      (time, lat, lon) float64 296.1 296.4 296.6 ... 240.9 241.0 241.5
    celsius  (time, lat, lon) float64 22.98 23.24 23.49 ... -32.24 -32.14 -31.7
    slice    (lat, lon) float64 296.1 296.4 296.6 296.9 ... 233.8 235.4 237.5
Attributes:
    regrid_method:  bilinear

In [6]:
# verify that the result is the same as regridding each variable one-by-one
for k in ds.data_vars:
    print(k, ds_out[k].equals(regridder(ds[k])))

air True
celsius True
slice True
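Conceptually, passing a whole dataset is similar to looping over its data variables yourself and collecting the results. The sketch below illustrates that pattern only; it is not xESMF's internal implementation. The helper `apply_to_each_variable` and the stand-in function `subsample` are hypothetical names, and `subsample` merely keeps every second grid point so that the snippet runs without xESMF.

```python
import numpy as np
import xarray as xr


def apply_to_each_variable(ds, func):
    """Apply func to every data variable and collect the results in a new Dataset."""
    return xr.Dataset({name: func(ds[name]) for name in ds.data_vars})


def subsample(da):
    """Hypothetical stand-in for a regridder: keep every second point in lat and lon."""
    return da.isel(lat=slice(None, None, 2), lon=slice(None, None, 2))


# Small synthetic dataset with a 3D and a 2D variable on the same horizontal grid.
ds_demo = xr.Dataset(
    {
        "air": (("time", "lat", "lon"), np.random.rand(3, 4, 6)),
        "slice": (("lat", "lon"), np.random.rand(4, 6)),
    },
    coords={"lat": np.arange(4.0), "lon": np.arange(6.0)},
)

result = apply_to_each_variable(ds_demo, subsample)
print({name: result[name].shape for name in result.data_vars})
```

In real use, `func` would be the `regridder` object built earlier, which is what the verification loop above compares against.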


## Invalid dimension orderings to avoid

xESMF assumes the horizontal dimensions are the last/rightmost dimensions, which matches the convention of most NetCDF data.

In [7]:
# xESMF does not accept horizontal dimensions as the first/leftmost dimensions;
# transpose() without arguments reverses the dimension order to (lon, lat, time)
ds_bad = ds.copy()
ds_bad['air'] = ds_bad['air'].transpose()
ds_bad

<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (lon, lat, time) float32 241.2 242.09999 ... 295.19 295.69
    celsius  (time, lat, lon) float32 -31.949997 -30.649994 ... 22.540009
    slice    (lat, lon) float32 241.2 242.5 243.5 244.0 ... 296.9 296.79 296.6
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

In [8]:
# regridder(ds_bad)  # uncomment this line to see the error message

In [9]:
# Besides ordering dimensions properly, another simple fix is to drop the bad variables
# (newer xarray versions prefer ds_bad.drop_vars('air'))
regridder(ds_bad.drop('air'))

using dimensions ('lat', 'lon') from data variable celsius as the horizontal dimensions for this dataset.


<xarray.Dataset>
Dimensions:  (lat: 59, lon: 87, time: 2920)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
  * lon      (lon) float64 200.0 201.5 203.0 204.5 ... 324.5 326.0 327.5 329.0
  * lat      (lat) float64 16.0 17.0 18.0 19.0 20.0 ... 70.0 71.0 72.0 73.0 74.0
Data variables:
    celsius  (time, lat, lon) float64 22.98 23.24 23.49 ... -32.24 -32.14 -31.7
    slice    (lat, lon) float64 296.1 296.4 296.6 296.9 ... 233.8 235.4 237.5
Attributes:
    regrid_method:  bilinear
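The other fix mentioned above, ordering the dimensions properly, can be sketched with `DataArray.transpose`. The example uses a small synthetic dataset (`ds_demo`, a hypothetical stand-in for `ds_bad`) so that it runs without downloading data or installing xESMF. It only shows the reordering step.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for ds_bad: 'air' has its horizontal dimensions first.
lat = np.arange(15.0, 77.5, 2.5)    # 25 points, 15.0 ... 75.0
lon = np.arange(200.0, 332.5, 2.5)  # 53 points, 200.0 ... 330.0
time = np.arange(4)

ds_demo = xr.Dataset(
    {"air": (("lon", "lat", "time"), np.zeros((lon.size, lat.size, time.size)))},
    coords={"lat": lat, "lon": lon, "time": time},
)

# Move the horizontal dimensions back to the last/rightmost positions.
ds_fixed = ds_demo.copy()
ds_fixed["air"] = ds_demo["air"].transpose("time", "lat", "lon")

print(ds_fixed["air"].dims)   # ('time', 'lat', 'lon')
print(ds_fixed["air"].shape)  # (4, 25, 53)
```

Applied to `ds_bad`, the same transpose should let `air` be regridded together with the other variables instead of being dropped.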