In [1]:
%run ../00_AdvancedPythonConcepts/talktools.py

# xarray

<img src="http://xarray.pydata.org/en/stable/_images/dataset-diagram-logo.png">

Adding dimension names and coordinate indexes to NumPy's ndarray makes many powerful array operations possible:

- Apply operations over dimensions by name: `x.sum('time')`.
- Select values by label instead of integer location: `x.loc['2014-01-01']` or `x.sel(time='2014-01-01')`.
- Mathematical operations (e.g., `x - y`) vectorize across multiple dimensions (array broadcasting) based on dimension names, not shape.
- Flexible split-apply-combine operations with groupby: `x.groupby('time.dayofyear').mean()`.
- Database-like alignment based on coordinate labels that smoothly handles missing values: `x, y = xr.align(x, y, join='outer')`.
- Keep track of arbitrary metadata in the form of a Python dictionary: `x.attrs`.
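
The features listed above can be tried on a small array. The following is a minimal sketch, not taken from the xarray documentation: the `station` dimension, the values, and the `units` attribute are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: 4 daily values for 2 made-up stations.
times = pd.date_range("2014-01-01", periods=4, freq="D")
x = xr.DataArray(
    np.arange(8.0).reshape(4, 2),
    dims=("time", "station"),
    coords={"time": times, "station": ["A", "B"]},
    attrs={"units": "degC"},  # arbitrary metadata, kept in x.attrs
)

# Reduce over a dimension by name instead of by axis number.
total = x.sum("time")

# Select by coordinate label instead of integer position.
first_day = x.sel(time="2014-01-01")

# Broadcasting follows dimension names: a (time,) array times a
# (station,) array yields a (time, station) array.
weights = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims="time", coords={"time": times})
offsets = xr.DataArray([10.0, 20.0], dims="station", coords={"station": ["A", "B"]})
outer = weights * offsets

# Split-apply-combine: one group per day of the year.
by_day = x.groupby("time.dayofyear").mean()

# Outer alignment pads labels missing from one input with NaN.
a, b = xr.align(x.isel(time=slice(0, 3)), x.isel(time=slice(1, 4)), join="outer")
```

Plain NumPy would refuse to multiply arrays of shape `(4,)` and `(2,)`; here the dimension names tell xarray how to combine them.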

xarray also works with dask, which enables parallel and larger-than-memory computation.

xarray provides two core data structures:

- **DataArray**: a labeled, N-dimensional array. It is an N-D generalization of a `pandas.Series`.

- **Dataset**: a multi-dimensional, in-memory array database. It is a dict-like container of DataArray objects aligned along any number of shared dimensions, and serves a similar purpose in xarray to the `pandas.DataFrame`.

http://xarray.pydata.org/en/stable/why-xarray.html#features
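
To make the relationship between the two structures concrete, here is a small sketch. The variable names `temp` and `precip` and all values are hypothetical, chosen only for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical variable names ("temp", "precip") chosen for illustration.
temp = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": ["a", "b"], "y": [-2, 0, 2]},
    name="temp",
)

# A Dataset maps variable names to DataArrays that share dimensions.
weather = xr.Dataset({"temp": temp, "precip": temp * 0.1})

# Dict-like access returns a DataArray.
same_temp = weather["temp"]

# The pandas analogy: a DataArray flattens to a Series with a MultiIndex,
# and a Dataset flattens to a DataFrame with one column per variable.
series = temp.to_series()
frame = weather.to_dataframe()
```

Both variables live on the same `x` and `y` coordinates, so selecting along a dimension of the Dataset applies to every variable at once.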

In [None]:
!conda install xarray netCDF4 -y

In [None]:
import xarray as xr
import numpy as np

In [None]:
xr.DataArray(np.random.randn(2, 3))

In [None]:
data = xr.DataArray(np.random.randn(2, 3), [('x', ['a', 'b']), ('y', [-2, 0, 2])])
data

In [None]:
data.dims

In [None]:
type(data.values)

In [None]:
data.attrs

In [None]:
# positional indexing by integer location, like numpy
data[[0]]

In [None]:
# by dimension name and coordinate label
data.sel(x=['a'])
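
The positional and label-based styles can be compared side by side. This sketch assumes an array with the same layout as `data` but with fixed values, so that the results are predictable; the name `arr` is introduced here for illustration.

```python
import numpy as np
import xarray as xr

# Same layout as the `data` array above, with fixed values for clarity.
arr = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    [("x", ["a", "b"]), ("y", [-2, 0, 2])],
)

by_position = arr[0]               # numpy-style: integer position, axis by order
by_named_position = arr.isel(x=0)  # integer position, dimension given by name
by_label = arr.loc["a"]            # pandas-style: label, dimension by order
by_named_label = arr.sel(x="a")    # label, dimension given by name

# A scalar label drops the dimension; a list of labels keeps it.
kept = arr.sel(x=["a"])

# Inexact lookups can fall back to the nearest coordinate label.
nearest = arr.sel(y=1.2, method="nearest")
```

The first four expressions pick the same row; the named forms (`isel`, `sel`) do not depend on the order of the dimensions.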

In [None]:
data.mean(dim='x')

In [None]:
ds = data.to_dataset(name="mydata")

In [None]:
ds.to_netcdf("/tmp/data.nc")

In [None]:
ds1 = xr.open_dataset("/tmp/data.nc")

In [None]:
ds1.sel(y=[0])

## OPeNDAP

`xarray` includes support for OPeNDAP (via the netCDF4 library or Pydap), which lets us access large datasets over HTTP.

In [None]:
remote_data = xr.open_dataset(
     'http://iridl.ldeo.columbia.edu/SOURCES/.OSU/.PRISM/.monthly/dods',
     decode_times=False)

In [None]:
remote_data

In [None]:
# keep the first step along the leading axis and every third point along the other two
tmax = remote_data['tmax'][:1, ::3, ::3]

In [None]:
tmax

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
plt.figure(figsize=(14,8))
tmax[0].plot()