# xarray

### multi-dimensional data analysis in Python


**ACINN workshop**, Tue 07.02.2017

*Fabien Maussion*

<img src="./figures/dataset-diagram-logo.png" width="35%" align="center">


Slides: <a href="http://fabienmaussion.info/acinn_xarray_workshop">http://fabienmaussion.info/acinn_xarray_workshop</a>

Notebook: <a href="https://github.com/fmaussion/teaching/blob/master/xarray_intro_acinn/ACINN_workshop_xarray-slides.ipynb"> On GitHub</a>

In [None]:
# Silence all warnings (e.g. numpy's RuntimeWarnings) to keep the output clean
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
%matplotlib inline
import xarray as xr
# Some defaults:
plt.rcParams['figure.figsize'] = (12, 6)  # Default plot size
xr.set_options(display_width=64);  # Narrower text output for the slides

# xarray

<img src="./figures/dataset-diagram-logo.png" width="20%" align="right"> 

**Documentation**: http://xarray.pydata.org

**Repository**: https://github.com/pydata/xarray 

**Initial release**: 03.05.2014

**Latest release**: v0.9.1 (20.01.2017)

**53 contributors** (latest release: 24)

**Umbrellas:** [Python for data](http://pydata.org/) & [NumFOCUS](http://www.numfocus.org/) *(but no funding...)*

<img src="./figures/logopydata.png" width="17%" align="left"> 

<img src="./figures/numfocus.png" width="23%" align="right">

# numpy.array

In [None]:
import numpy as np
a = np.array([[1, 3, 9], [2, 8, 4]])
a

In [None]:
a[1, 2]

In [None]:
a.mean(axis=0)

# xarray.DataArray

In [None]:
import xarray as xr
da = xr.DataArray(a, dims=['lat', 'lon'], 
                  coords={'lon':[11, 12, 13], 'lat':[1, 2]})
da

In [None]:
da.sel(lon=13, lat=2).values

In [None]:
da.mean(dim='lat')
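This self-contained check makes the comparison with the numpy cells explicit. It rebuilds `a` and `da` so that it runs on its own. The labelled reduction returns the same numbers as `a.mean(axis=0)`, but the remaining dimension keeps its name and coordinates.

```python
import numpy as np
import xarray as xr

a = np.array([[1, 3, 9], [2, 8, 4]])
da = xr.DataArray(a, dims=['lat', 'lon'],
                  coords={'lon': [11, 12, 13], 'lat': [1, 2]})

# Reducing over the named dimension 'lat' gives the numbers of numpy's axis=0 ...
lat_mean = da.mean(dim='lat')
# ... but the result keeps the remaining 'lon' dimension and its labels
print(lat_mean.dims, lat_mean['lon'].values, lat_mean.values)

# Label-based lookup is the counterpart of the positional a[1, 2]
print(da.sel(lon=13, lat=2).item(), a[1, 2])
```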

# Our data

<img src="./figures/dataset.png" width="50%" align="right"> 

- numeric
- multi-dimensional
- labelled
- (lots of) metadata
- sometimes (very) large

# xarray.Dataset

In [None]:
f = 'ERA-Int-MonthlyAvg-4D-TUVWZ.nc'
ds = xr.open_dataset(f)
ds
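The ERA-Interim file above is not distributed with the slides. To try the selection and aggregation cells without it, you can build a small synthetic stand-in with the same dimension names by hand. Everything in this sketch is an illustrative assumption, not the content of the real file: the grid spacing, the pressure levels, the random values and the attribute texts.

```python
import numpy as np
import xarray as xr

# Hypothetical coarse grid; the real file's levels and resolution differ
month = np.arange(1, 13)
level = np.array([100, 200, 300, 500, 700, 850, 1000])
latitude = np.linspace(90, -90, 19)
longitude = np.arange(0, 360, 20)
shape = (month.size, level.size, latitude.size, longitude.size)
dims = ('month', 'level', 'latitude', 'longitude')
rng = np.random.default_rng(0)

ds_demo = xr.Dataset(
    {
        # Random values with some metadata attached to each variable
        't': (dims, 250 + 30 * rng.random(shape),
              {'units': 'K', 'long_name': 'Temperature'}),
        'u': (dims, rng.normal(0, 10, shape),
              {'units': 'm s**-1', 'long_name': 'U component of wind'}),
    },
    coords={'month': month,
            'level': ('level', level, {'units': 'hPa'}),
            'latitude': latitude,
            'longitude': longitude},
    attrs={'title': 'Synthetic stand-in for a monthly climatology'},
)
print(ds_demo)
```

With `ds = ds_demo`, the `ds.t` and `ds.u` cells in the next slides should run. The `z` variable used on the map slide is not included.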

## Selection

### By value

In [None]:
ds.t.sel(month=8, level=850)

### By index 

In [None]:
ds.t.isel(month=7, level=11)

### By "wait... where is Innsbruck again?"

In [None]:
ds.t.sel(level=1001, latitude=47.26, longitude=11.38, method='nearest')

### The "old way"

In [None]:
ds.t[7, 11, :, :]
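The four ways of selecting are shown side by side below, on a small hypothetical array with only `month` and `level`. Here positional index 5 happens to be 850 hPa because of how this made-up level axis is ordered. In a real file, the position of a label depends on its coordinate, which is why `.sel` is the safer choice.

```python
import numpy as np
import xarray as xr

level = np.array([100, 200, 300, 500, 700, 850, 1000])
t = xr.DataArray(
    np.arange(12 * level.size).reshape(12, level.size),
    dims=['month', 'level'],
    coords={'month': np.arange(1, 13), 'level': level},
)

by_value = t.sel(month=8, level=850)            # by coordinate label
by_index = t.isel(month=7, level=5)             # by 0-based position
nearest = t.sel(level=1001, method='nearest')   # closest label, i.e. 1000
old_way = t[7, 5]                               # positional, numpy style

print(by_value.item(), by_index.item(), old_way.item())
print(nearest['level'].item())
```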

## Operations

### Aggregation

In [None]:
ds.u.mean(dim=['month', 'longitude']).plot.contourf(levels=13)
plt.ylim([1000, 100]);

### And other kinds of things

In [None]:
u_avg = ds.u.mean(dim=['month', 'longitude'])
u_avg_masked = u_avg.where(u_avg > 12)
u_avg_masked.plot.contourf(levels=13)
plt.ylim([1000, 100]);
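`where` keeps the values where the condition is `True` and replaces the others with NaN. You can also pass a replacement value as the second argument. A minimal example on a hypothetical 2 x 2 wind array:

```python
import numpy as np
import xarray as xr

u = xr.DataArray([[5., 15.], [20., -3.]], dims=['level', 'latitude'],
                 coords={'level': [850, 200], 'latitude': [45, 0]})

# Keep values above 12, NaN elsewhere (the coordinates are untouched)
masked = u.where(u > 12)
# Same condition, but fill the rejected points with 0 instead of NaN
filled = u.where(u > 12, 0.)

print(masked.values)
print(filled.values)
```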

# Arithmetic

### Broadcasting

<img src="./figures/broadcast.png" width="50%" align="left"> 

In [None]:
a = xr.DataArray(np.arange(3), dims='time', 
                 coords={'time':np.arange(3)})
b = xr.DataArray(np.arange(4), dims='space', 
                 coords={'space':np.arange(4)})
a + b
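Here is the same broadcasting in a self-contained form, checked against explicit numpy broadcasting. xarray matches dimensions by name, so no `np.newaxis` bookkeeping is needed.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(3), dims='time', coords={'time': np.arange(3)})
b = xr.DataArray(np.arange(4), dims='space', coords={'space': np.arange(4)})

# Dimensions are matched by name: the result spans both 'time' and 'space'
c = a + b
print(c.dims, c.shape)

# The numpy equivalent needs the new axes to be inserted by hand
manual = a.values[:, np.newaxis] + b.values[np.newaxis, :]
print((c.values == manual).all())
```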

### Alignment

<img src="./figures/align.png" width="50%" align="left"> 

In [None]:
a = xr.DataArray(np.arange(3), dims='time', 
                 coords={'time':np.arange(3)})
b = xr.DataArray(np.arange(5), dims='time', 
                 coords={'time':np.arange(5)+1})
a + b
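With the default settings, arithmetic between arrays on different coordinates keeps only the labels the arrays have in common (an inner join). `xr.align` makes this join explicit and lets you choose another one. For example, `join='outer'` keeps all labels and pads missing values with NaN:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(3), dims='time', coords={'time': np.arange(3)})
b = xr.DataArray(np.arange(5), dims='time', coords={'time': np.arange(5) + 1})

# Inner join: only time=1 and time=2 exist in both arrays
c = a + b
print(c['time'].values, c.values)

# Outer join: all labels kept, missing values become NaN
a_out, b_out = xr.align(a, b, join='outer')
print(a_out['time'].values)
print((a_out + b_out).values)
```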

# Plotting

### 1-d

In [None]:
ts = ds.t.sel(level=1001, latitude=47.26, longitude=11.38, method='nearest')
ts.plot();

### On maps

In [None]:
import cartopy.crs as ccrs
ax = plt.axes(projection=ccrs.Robinson())
ds.z.sel(level=1000, month=8).plot(ax=ax, transform=ccrs.PlateCarree());
ax.coastlines();

# (Big) data: multiple files

Opening all files in a directory...

In [None]:
mfs = '/home/mowglie/disk/Data/Gridded/GPM/3BDAY_sorted/*.nc'
dsmf = xr.open_mfdataset(mfs)

... results in a consolidated dataset ...

In [None]:
dsmf
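Conceptually, `open_mfdataset` opens each file and stitches the pieces together along their shared dimension. The in-memory sketch below uses `xr.concat` on three hypothetical one-day datasets with made-up values and grid. Unlike `open_mfdataset`, it does no file I/O and loads everything eagerly.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily "files": one small Dataset per day, each with its own time stamp
days = pd.date_range('2015-01-01', periods=3, freq='D')
pieces = [
    xr.Dataset(
        {'precipitationCal': (('time', 'lat', 'lon'), np.full((1, 2, 2), float(i)))},
        coords={'time': [day], 'lat': [-10., -9.], 'lon': [-75., -74.]},
    )
    for i, day in enumerate(days)
]

# Concatenate along the shared 'time' dimension into one consolidated Dataset
combined = xr.concat(pieces, dim='time')
print(combined.sizes)
```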

... on which all usual operations can be applied:

In [None]:
dsmf = dsmf.sel(time='2015')
dsmf
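Selecting with a partial date string relies on pandas' datetime indexing. `'2015'` selects the whole year and `'2015-02'` selects one month. Label slices include both end points. A small check on a hypothetical daily series:

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range('2014-12-01', '2016-01-31', freq='D')
pr = xr.DataArray(np.ones(time.size), dims='time', coords={'time': time})

pr_2015 = pr.sel(time='2015')                               # whole year
feb = pr.sel(time='2015-02')                                # whole month
span = pr.sel(time=slice('2015-06-01', '2015-06-10'))       # both ends included

print(pr_2015.sizes['time'], feb.sizes['time'], span.sizes['time'])
```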

Yes, even computations!

In [None]:
ts = dsmf.precipitationCal.mean(dim=['lon', 'lat'])
ts

Computations are done "lazily" (`open_mfdataset` returns dask-backed arrays).

No actual computation has happened yet:

In [None]:
ts.data

But they can be triggered:

In [None]:
ts = ts.load()
ts

For more information: http://xarray.pydata.org/en/stable/dask.html

In [None]:
ts.plot();
ts.rolling(time=31, center=True).mean().plot();
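This example shows what `rolling(..., center=True).mean()` computes, on a hypothetical five-point series with a 3-point window instead of 31. Windows that would extend past the ends are incomplete and give NaN unless `min_periods` is lowered:

```python
import numpy as np
import xarray as xr

ts = xr.DataArray(np.arange(5.), dims='time', coords={'time': np.arange(5)})

# Centered 3-point running mean: each value is averaged with its two neighbours
smooth = ts.rolling(time=3, center=True).mean()
print(smooth.values)

# min_periods=1 accepts the shorter windows at both edges
smooth_edges = ts.rolling(time=3, center=True, min_periods=1).mean()
print(smooth_edges.values)
```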

# Extensions

### Example: EOFs (empirical orthogonal functions)

Taken from: http://ajdawson.github.io/eofs/examples/nao_xarray.html

In [None]:
from eofs.xarray import Eof
from eofs.examples import example_data_path

# Read geopotential height data using the xarray module
filename = example_data_path('hgt_djf.nc')
z_djf = xr.open_dataset(filename)['z']

# Compute anomalies by removing the time-mean.
z_djf = z_djf - z_djf.mean(dim='time')

# Create an EOF solver to do the EOF analysis.
coslat = np.cos(np.deg2rad(z_djf.coords['latitude'].values)).clip(0., 1.)
solver = Eof(z_djf, weights=np.sqrt(coslat)[..., np.newaxis])

# Get the leading EOF
eof1 = solver.eofsAsCovariance(neofs=1)

In [None]:
# Leading EOF expressed as covariance in the European/Atlantic domain
ax = plt.axes(projection=ccrs.Orthographic(central_longitude=-20, central_latitude=60))
ax.coastlines()
ax.set_global()
eof1[0, 0].plot.contourf(ax=ax, levels=np.linspace(-75, 75, 11), 
                         cmap=plt.cm.RdBu_r, add_colorbar=False,
                         transform=ccrs.PlateCarree())
ax.set_title('EOF1 expressed as covariance', fontsize=16);
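The `eofs` package takes care of the details. For intuition, here is a plain numpy sketch of the textbook approach on a synthetic field. It removes the time mean, weights by sqrt(cos(lat)) as in the cell above, flattens space and takes an SVD. The leading right singular vector is the (weighted) EOF1, and the squared singular values give the explained variance. This illustrates the standard method. It is not necessarily how `eofs` implements it internally.

```python
import numpy as np

rng = np.random.default_rng(42)
ntime, nlat, nlon = 40, 5, 8
lat = np.linspace(30., 70., nlat)

# Synthetic field: one fixed spatial pattern modulated in time, plus weak noise
pattern = np.outer(np.sin(np.deg2rad(lat)), np.cos(np.linspace(0, np.pi, nlon)))
pc_true = rng.normal(size=ntime)
field = pc_true[:, None, None] * pattern + 0.1 * rng.normal(size=(ntime, nlat, nlon))

# 1. Anomalies with respect to the time mean
anom = field - field.mean(axis=0)
# 2. sqrt(cos(lat)) weights, so that covariances end up weighted by cos(lat)
weights = np.sqrt(np.cos(np.deg2rad(lat)).clip(0., 1.))[:, np.newaxis]
data = (anom * weights).reshape(ntime, nlat * nlon)
# 3. SVD: rows of vt are the weighted EOFs, s**2 is proportional to their variance
u, s, vt = np.linalg.svd(data, full_matrices=False)
variance_fraction = s**2 / np.sum(s**2)
eof1 = vt[0].reshape(nlat, nlon)   # leading spatial pattern (sign is arbitrary)
pc1 = u[:, 0] * s[0]               # its time series

print(variance_fraction[0])
```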

# Salem

- Adds georeferenced operations (plotting, subsetting, masking) to xarray
- Adds projection transformations
- Adds WRF support

http://salem.readthedocs.io/en/latest/

Try it out:

```
pip install salem
```

## Plotting

In [None]:
# importing salem adds a new "toolbox" to xarray objects
import salem

In [None]:
pday = dsmf.precipitationCal.sel(time='2015-02-01')
cm = pday.salem.quick_map(cmap='Blues', vmax=100);

## Subsetting

In [None]:
shdf = salem.read_shapefile(salem.get_demo_file('world_borders.shp'))
shdf = shdf.loc[shdf['CNTRY_NAME'].isin(['Peru'])]

In [None]:
dsmfperu = dsmf.salem.subset(shape=shdf, margin=10)

In [None]:
pday = dsmfperu.precipitationCal.sel(time='2015-02-01')
cm = pday.salem.quick_map(cmap='Blues', vmax=100);

## Regions of interest

In [None]:
dsmfperu = dsmfperu.salem.roi(shape=shdf)

In [None]:
pday = dsmfperu.precipitationCal.sel(time='2015-02-01')
cm = pday.salem.quick_map(cmap='Blues', vmax=100);

## ... all xarray operations continue to apply

In [None]:
prpc_a = dsmfperu.precipitationCal.sum(dim=['time']).load()

In [None]:
prpc_a.salem.quick_map(cmap='Blues', vmax=5000);

## WRF output files

Problems:
- not CF compliant (e.g. time stamps)
- staggered grids
- not all variables available (e.g. moisture transport)
- large

### Example file

In [None]:
f = 'wrfpost_d01_2005-09-21_00-00-00_25h.nc'

In [None]:
ds = xr.open_dataset(f)
ds

### Objectives

- "clean" the file to make it easier to work with
- automatic projection parsing
- automatic unstaggering
- pressure-levels interpolation
- diagnostic variables
- ...
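Here is a minimal sketch of unstaggering. On WRF's Arakawa-C grid, variables such as `U` live on cell faces (`west_east_stag`, one point more than the mass grid). A common way to bring them to the cell centres is to average neighbouring points. The helper `unstagger` below is hypothetical and is not salem's implementation.

```python
import numpy as np
import xarray as xr

def unstagger(da, stag_dim, new_dim):
    """Average neighbouring points along a staggered dimension (n + 1 -> n).

    Hypothetical helper: a sketch of the usual approach, not salem's code.
    """
    left = da.isel({stag_dim: slice(None, -1)})
    right = da.isel({stag_dim: slice(1, None)})
    # WRF dimensions carry no coordinate labels, so the halves combine by position
    mid = 0.5 * (left + right)
    return mid.rename({stag_dim: new_dim})

# Hypothetical U wind: 2 levels, 3 rows, 5 staggered points -> 4 mass points in x
u_stag = xr.DataArray(
    np.broadcast_to(np.arange(5.), (2, 3, 5)).copy(),
    dims=['bottom_top', 'south_north', 'west_east_stag'],
)
u = unstagger(u_stag, 'west_east_stag', 'west_east')
print(u.dims, u.shape)
print(u.isel(bottom_top=0, south_north=0).values)
```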

In [None]:
wrf = salem.open_wrf_dataset(f)
wrf

### Diagnostic variables

In [None]:
wrf.T2C.mean(dim='time', keep_attrs=True).salem.quick_map();
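A diagnostic variable like `T2C` is essentially arithmetic on existing variables plus some metadata bookkeeping. The minimal sketch below assumes that `T2C` is the 2 m temperature `T2` converted from Kelvin to degrees Celsius. The field and its attributes are made up.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 m temperature in Kelvin: 3 time steps on a 2 x 2 grid
t2 = xr.DataArray(np.full((3, 2, 2), 283.15),
                  dims=['time', 'south_north', 'west_east'],
                  attrs={'units': 'K', 'description': 'TEMP at 2 M'})

# The diagnostic itself, followed by updated metadata
t2c = t2 - 273.15
t2c.attrs.update(units='degC', description='2m temperature')

# keep_attrs=True carries the metadata through the time average
t2c_mean = t2c.mean(dim='time', keep_attrs=True)
print(t2c_mean.attrs['units'], float(t2c_mean.mean()))
```

`keep_attrs=True` matters here because xarray drops attributes in reductions by default, and the plotting functions use them for labels.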

### 3D interpolation

In [None]:
ws_h = wrf.isel(time=5).salem.wrf_zlevel('WS', levels=10000.)
ws_h.salem.quick_map(cmap='Reds');
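`wrf_zlevel` does the vertical interpolation for you. For intuition, here is a generic sketch of interpolating a field to a fixed height when the model levels have a different height in every column. `xr.apply_ufunc` with `vectorize=True` loops `np.interp` over all columns. The heights and wind speeds are made up, and this is not necessarily how salem does it.

```python
import numpy as np
import xarray as xr

# Hypothetical column heights (m): 4 model levels, shifted differently in each column
base = np.array([50., 400., 1200., 2500.])
offset = np.array([[0., 100., 200.], [300., 400., 500.]])   # 2 x 3 grid
z = xr.DataArray(base[:, None, None] + offset[None, :, :],
                 dims=['bottom_top', 'south_north', 'west_east'])
ws = 0.01 * z   # wind speed linear in height, so the answer is easy to check

def to_height(values, heights, target=1000.):
    # np.interp needs increasing heights; NaN outside the column, no extrapolation
    return np.interp(target, heights, values, left=np.nan, right=np.nan)

# 'bottom_top' is the core dimension; the function is applied to every column
ws_1000 = xr.apply_ufunc(to_height, ws, z,
                         input_core_dims=[['bottom_top'], ['bottom_top']],
                         vectorize=True)
print(ws_1000.values)
```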

### ... and more!

especially if I get some help ;-)

Repository: https://github.com/fmaussion/salem

# Final remarks

- xarray relies on pandas, one of the most widely used scientific Python tools
- the documentation of both libraries is excellent
- both libraries take some time to learn, but this time is well spent
- there is potential for "ACINN homegrown" tools based on these libraries