Skip to content

Latest commit

 

History

History
199 lines (135 loc) · 6.28 KB

xarray_acc.rst

File metadata and controls

199 lines (135 loc) · 6.28 KB

Xarray accessors

One of the main purposes of Salem is to add georeferencing tools to xarray 's data structures. These tools can be accessed via a special .salem attribute, available for both xarray.DataArray and xarray.Dataset objects after a simple import salem in your code.

Initializing the accessor

Automated projection parsing

Salem will try to understand in which projection your Dataset or DataArray is defined. For example, with a platte carree (or lon/lat) projection, Salem will know what to do based on the coordinates' names:

python

import numpy as np import xarray as xr import salem

da = xr.DataArray(np.arange(20).reshape(4, 5), dims=['lat', 'lon'],
coords={'lat':np.linspace(0, 30, 4),

'lon':np.linspace(-20, 20, 5)})

da.salem # the accessor is an object

@savefig plot_xarray_simple.png width=80% da.salem.quick_map();

While the above should work with a majority of climate datasets (such as atmospheric reanalyses or GCM output), certain NetCDF files will have a more exotic map projection requiring a dedicated parsing. There are conventions to formalise these things in the NetCDF data model, but Salem doesn't understand them yet (my impression is that they aren't widely used anyways).

Currently, Salem can deal with:

  • platte carree (or lon/lat) projections
  • WRF projections (see wrf)
  • for geotiff files only: any projection that rasterio can understand
  • virually any projection provided explicitly by the user

The logic for deciding upon the projection of a Dataset or DataArray is located in :py~salem.grid_from_dataset. If the automated parsing doesn't work, the salem accessor won't work either. In that case, you'll have to provide your own xarray_acc.custom to the data.

From geotiffs

Salem uses rasterio to open and parse geotiff files:

python

plt.rcParams['figure.figsize'] = (7, 3) f = plt.figure(figsize=(7, 3))

python

fpath = salem.get_demo_file('himalaya.tif') ds = salem.open_xr_dataset(fpath) hmap = ds.salem.get_map(cmap='topo') hmap.set_data(ds['data'])

@savefig plot_xarray_geotiff.png width=80% hmap.visualize();

Custom projection

Alternatively, Salem will understand any projection supported by pyproj. The proj info has to be provided as attribute:

python

dutm = xr.DataArray(np.arange(20).reshape(4, 5), dims=['y', 'x'],

coords={'y': np.arange(3, 7)2e5, 'x': np.arange(1, 6)2e5})

psrs = 'epsg:32630' # http://spatialreference.org/ref/epsg/wgs-84-utm-zone-30n/ dutm.attrs['pyproj_srs'] = psrs

@savefig plot_xarray_utm.png width=80% dutm.salem.quick_map(interp='linear');

Using the accessor

The accessor's methods are available trough the .salem attribute.

Keeping track of attributes

Some datasets carry their georeferencing information in global attributes (WRF model output files for example). This makes it possible for Salem to determine the data's map projection. From the variables alone, however, this is not possible. This is the reason why it is recommended to use the :py~salem.open_xr_dataset and :py~salem.open_wrf_dataset function, which add an attribute to the variables automatically:

python

dsw = salem.open_xr_dataset(salem.get_demo_file('wrfout_d01.nc')) dsw.T2.pyproj_srs

Unfortunately, the DataArray attributes are lost when doing operations on them. It is the task of the user to keep track of this attribute:

python

dsw.T2.mean(dim='Time', keep_attrs=True).salem # triggers an error without keep_attrs

Reprojecting data

python

plt.rcParams['figure.figsize'] = (7, 3) f = plt.figure(figsize=(7, 3))

You can reproject a Dataset onto another one with the :py~salem.DatasetAccessor.transform function:

python

dse = salem.open_xr_dataset(salem.get_demo_file('era_interim_tibet.nc')) dsr = ds.salem.transform(dse) dsr @savefig plot_xarray_transfo.png width=80% dsr.t2m.mean(dim='time').salem.quick_map();

Currently, salem implements, the neirest neighbor (default), linear, and spline interpolation methods:

python

dsr = ds.salem.transform(dse, interp='spline') @savefig plot_xarray_transfo_spline.png width=80% dsr.t2m.mean(dim='time').salem.quick_map();

The accessor's map transformation machinery is handled by the :py~salem.Grid class internally. See gis for more information.

Subsetting data

The :py~salem.DatasetAccessor.subset function allows you to subset your datasets according to (georeferenced) vector or raster data, for example based on shapely geometries (e.g. polygons), other grids, or shapefiles:

python

shdf = salem.read_shapefile(salem.get_demo_file('world_borders.shp')) shdf = shdf.loc[shdf['CNTRY_NAME'] == 'Nepal'] # remove other countries dsr = dsr.salem.subset(shape=shdf, margin=10) @savefig plot_xarray_subset_out.png width=80% dsr.t2m.mean(dim='time').salem.quick_map();

Regions of interest

While subsetting selects the optimal rectangle over your region of interest, sometimes you also want to maskout unrelevant data, too. This is done with the :py~salem.DatasetAccessor.roi tool:

python

dsr = dsr.salem.roi(shape=shdf) @savefig plot_xarray_roi_out.png width=80% dsr.t2m.mean(dim='time').salem.quick_map();