## When pandas is not enough



![img](images/xarray.png)


## xarray



-   `xarray` is `pandas` and `numpy` on steroids!
-   Adds labels to multi-dimensional arrays in the form of dimensions, coordinates and attributes
-   Inspired by `pandas` but also integrates with `numpy`



## xarray data structures



-   `DataArray` is a labeled, multi-dimensional array
    -   `values`: a numpy array holding the values
    -   `dims`: dimension names
    -   `coords`: a dict-like container of arrays with labels for each point
    -   `attrs`: arbitrary metadata
-   `Dataset` is the `xarray` equivalent of a `DataFrame`
    -   `data_vars`: a dict-like container of `DataArray` objects
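
A minimal sketch of the components listed above, built on a small made-up array. The names `arr`, `depth`, `flag` and the `units` attribute are illustrative and not part of the original material:

```python
import numpy as np
import xarray as xr

# A DataArray with all four components set explicitly
arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": ["a", "b"], "y": [10, 20, 30]},
    attrs={"units": "m"},
)

print(arr.values)              # the underlying numpy array
print(arr.dims)                # ('x', 'y')
print(arr.coords["y"].values)  # [10 20 30]
print(arr.attrs)               # {'units': 'm'}

# A Dataset holds several DataArrays that share dimensions
ds = xr.Dataset({"depth": arr, "flag": ("x", [True, False])})
print(list(ds.data_vars))      # ['depth', 'flag']
```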



## Create a DataArray



Let's import the modules



```python
import numpy as np
import pandas as pd
import xarray as xr
```

The `DataArray` constructor accepts either a NumPy array or a pandas `Series`/`DataFrame`



```python
xr.DataArray(np.arange(6).reshape((2, 3)))
```

```python
data = xr.DataArray(np.random.randn(2, 3), coords={'x': ['a', 'b']}, dims=('x', 'y'))
data
```

```python
xr.DataArray(pd.Series(range(3), index=list('abc'), name='foo'))
```
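
A short sketch of what the constructor picks up from a pandas object: the Series name becomes the array name and the index becomes a coordinate. When the index is unnamed, xarray falls back to a default dimension name; the `letter` index name below is just an example:

```python
import pandas as pd
import xarray as xr

# Unnamed index: the dimension gets a default name
s = pd.Series(range(3), index=list("abc"), name="foo")
arr = xr.DataArray(s)
print(arr.dims)                    # ('dim_0',)
print(arr.name)                    # foo
print(arr.coords["dim_0"].values)  # ['a' 'b' 'c']

# Named index: the index name is used as the dimension name
s_named = pd.Series(range(3), index=pd.Index(list("abc"), name="letter"), name="foo")
arr_named = xr.DataArray(s_named)
print(arr_named.dims)              # ('letter',)
```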

## Indexing



Four kinds of indexing are supported by `xarray`



Positional and by integer label (like numpy)



```python
data[[0, 1]]
```

Positional and by coordinate label (like pandas)



```python
data.loc['a':'b']
```

By dimension name and integer label



```python
data.isel(x=slice(2))
```

By dimension name and coordinate label



```python
data.sel(x=['a', 'b'])
```
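
On this small array all four styles select the same two rows. A sketch that checks this, using a seeded generator instead of `np.random.randn` so the result is reproducible:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
data = xr.DataArray(rng.standard_normal((2, 3)), coords={"x": ["a", "b"]}, dims=("x", "y"))

by_position = data[[0, 1]]            # numpy style
by_label = data.loc["a":"b"]          # pandas style (label slices include the end)
by_dim_position = data.isel(x=slice(2))
by_dim_label = data.sel(x=["a", "b"])

for result in (by_label, by_dim_position, by_dim_label):
    assert result.equals(by_position)

# Naming the dimension means the axis order does not need to be remembered
first_column = data.isel(y=0)
print(first_column.dims)              # ('x',)
```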

## Computation



NumPy operations work as you would expect



```python
(data + 10).sum()
```

```python
np.sin(data.T)
```

Aggregate operations can use dimension names



```python
data.mean(dim='x')
```
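
Reducing over a named dimension is equivalent to reducing over the matching NumPy axis, but it does not depend on axis order. A sketch comparing the two on a seeded random array:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
data = xr.DataArray(rng.standard_normal((2, 3)), coords={"x": ["a", "b"]}, dims=("x", "y"))

# 'x' is axis 0 here, so this matches numpy's mean over axis 0
mean_x = data.mean(dim="x")
print(mean_x.dims)  # ('y',)

# After transposing, 'x' is axis 1, yet the named reduction is unchanged
mean_x_transposed = data.T.mean(dim="x")

# Several dimensions can be reduced at once
total = data.sum(dim=["x", "y"])
```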

Arithmetic operations broadcast based on dimension name



```python
a = xr.DataArray(np.arange(4, 7), [data.coords['y']])
a
```

```python
b = xr.DataArray(np.arange(9, 13), dims='z')
b
```

```python
a + b
```
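
To see why broadcasting by name matters, compare with plain NumPy, which aligns trailing axes by position and therefore rejects shapes `(3,)` and `(4,)`. For simplicity this sketch gives `a` a bare `y` dimension instead of reusing `data.coords['y']`:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(4, 7), dims="y")   # 3 values along 'y'
b = xr.DataArray(np.arange(9, 13), dims="z")  # 4 values along 'z'

# numpy: shapes (3,) and (4,) cannot be broadcast together
numpy_failed = False
try:
    np.arange(4, 7) + np.arange(9, 13)
except ValueError:
    numpy_failed = True

# xarray: 'y' and 'z' are different dimensions, so the result is
# the outer sum over both
c = a + b
print(c.dims, c.shape)  # ('y', 'z') (3, 4)
print(int(c[0, 0]))     # 13  (4 + 9)
print(int(c[2, 3]))     # 18  (6 + 12)
```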

## Groupby



Grouped operations are supported



```python
labels = xr.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')
labels
```

```python
data.groupby(labels).mean('y')
```

```python
# `map` is the current name for what older xarray versions called `apply`
data.groupby(labels).map(lambda x: x - x.min())
```
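
A sketch that checks the grouped mean by hand: label `E` covers columns 0 and 2, and label `F` covers column 1. The result gets a new dimension named after the grouping array (`labels`):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
data = xr.DataArray(rng.standard_normal((2, 3)), coords={"x": ["a", "b"]}, dims=("x", "y"))
labels = xr.DataArray(["E", "F", "E"], [data.coords["y"]], name="labels")

grouped = data.groupby(labels).mean("y")

# Same numbers computed directly with numpy column selection
expected_e = data.values[:, [0, 2]].mean(axis=1)
expected_f = data.values[:, 1]

# Subtracting each group's minimum makes every group's minimum zero
centred = data.groupby(labels).map(lambda g: g - g.min())
```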

## pandas



`xarray` objects can easily be converted to and from `pandas` objects



```python
series = data.to_series()
series
```

```python
series.to_xarray()
```
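
A sketch of the round trip: `to_series` produces one row per `(x, y)` pair with a `MultiIndex`, and `to_xarray` unstacks it back into a 2-D array. Because `y` has no coordinate here, it comes back as the integers 0, 1, 2:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
data = xr.DataArray(rng.standard_normal((2, 3)), coords={"x": ["a", "b"]}, dims=("x", "y"))

series = data.to_series()
print(list(series.index.names))  # ['x', 'y']
print(len(series))               # 6

back = series.to_xarray()
print(back.dims)                 # ('x', 'y')
```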

## Datasets



A collection of `DataArray` objects that share dimensions and coordinates



```python
ds = xr.Dataset({'foo': data, 'bar': ('x', [1, 2]), 'baz': np.pi})
ds
```

Variables can be accessed as with a dictionary



```python
ds['foo']
```
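
A sketch of what the `Dataset` above contains: each variable keeps its own dimensions (`baz` is a scalar with none), and the dataset records the size of every dimension. Attribute-style access (`ds.foo`) also works for names that are valid Python identifiers:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
data = xr.DataArray(rng.standard_normal((2, 3)), coords={"x": ["a", "b"]}, dims=("x", "y"))
ds = xr.Dataset({"foo": data, "bar": ("x", [1, 2]), "baz": np.pi})

print(list(ds.data_vars))  # ['foo', 'bar', 'baz']
print(ds["bar"].dims)      # ('x',)
print(ds["baz"].dims)      # ()
print(dict(ds.sizes))      # size of each dimension

# Dictionary-style and attribute-style access return the same variable
same = ds.foo.identical(ds["foo"])
```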

## NetCDF



NetCDF is the recommended format for writing `xarray` objects to disk



```python
# The examples/ directory must already exist
ds.to_netcdf('examples/examplexr.nc')
xr.open_dataset('examples/examplexr.nc')
```

## Temperature data



Let's grab some data from the NCEP Reanalysis. Before we start, make sure `netCDF4` is installed!

[https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.derived.surfaceflux.html](https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.derived.surfaceflux.html)



-   We could download the entire dataset, but what if we don't want the entire globe?
-   We will use the OPeNDAP protocol, which lets us request only the subset we need

[https://www.esrl.noaa.gov/psd/thredds/catalog/Datasets/ncep.reanalysis.derived/surface_gauss/catalog.html](https://www.esrl.noaa.gov/psd/thredds/catalog/Datasets/ncep.reanalysis.derived/surface_gauss/catalog.html)



```python
ds = xr.open_dataarray("http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis.derived/surface_gauss/tmax.2m.mon.mean.nc")
ds
```

```python
ds = ds.isel(lat=slice(21, 35), lon=slice(123, 160))
ds
```

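The data have not been downloaded yet! Opening the file and slicing it with `isel` only read metadata and coordinates; `load()` fetches the selected subset into memory.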



```python
ds = ds.load()
```

## Load the corresponding minimum temperature data



## Construct a Dataset from the two DataArrays
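
One possible approach, sketched on synthetic data standing in for the downloaded arrays. The names `tmax`, `tmin`, `temps` and the small grid are made up; with the real data you would use the two arrays loaded over OPeNDAP:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-ins: 24 months on a 2 x 3 lat/lon grid
time = pd.date_range("2000-01-01", periods=24, freq="MS")
coords = {"time": time, "lat": [40.0, 42.5], "lon": [260.0, 262.5, 265.0]}
dims = ("time", "lat", "lon")
rng = np.random.default_rng(0)
tmin = xr.DataArray(rng.normal(0, 8, (24, 2, 3)), coords=coords, dims=dims)
tmax = tmin + 10  # keeps tmax above tmin in this fake data

# Variables are aligned on their shared coordinates
temps = xr.Dataset({"tmax": tmax, "tmin": tmin})
print(temps)
```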



## Create a DataFrame and examine the data



Use `head`, `describe`, `plot`, and `sns.pairplot`
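
A sketch of the conversion step on the same kind of synthetic stand-in data (names and grid are hypothetical). `Dataset.to_dataframe` gives one row per `(time, lat, lon)` point and one column per variable; plotting and `sns.pairplot` are left out here since they need matplotlib and seaborn:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the two-variable temperature Dataset
time = pd.date_range("2000-01-01", periods=24, freq="MS")
coords = {"time": time, "lat": [40.0, 42.5], "lon": [260.0, 262.5, 265.0]}
dims = ("time", "lat", "lon")
rng = np.random.default_rng(0)
tmin = xr.DataArray(rng.normal(0, 8, (24, 2, 3)), coords=coords, dims=dims)
temps = xr.Dataset({"tmax": tmin + 10, "tmin": tmin})

df = temps.to_dataframe()
print(df.head())          # first few (time, lat, lon) rows
summary = df.describe()   # count, mean, std, min, quartiles, max per column
print(summary)
```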



## Plot the probability of freezing by month for three locations



-   Calculate the freezing probability
-   Stack the DataArray to create the location coordinate
-   Select three locations and plot (Hint: look at the `to_pandas` method)
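
A sketch of the three steps on synthetic data, assuming the temperatures are in degrees Celsius (check the `units` attribute of the real data and use 273.15 as the threshold if it is in Kelvin). The names `tmin`, `prob`, `stacked` and the chosen location indices are hypothetical:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical minimum temperature: 4 years of monthly values on a 2 x 3 grid
time = pd.date_range("2000-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
tmin = xr.DataArray(
    rng.normal(0, 8, (48, 2, 3)),
    coords={"time": time, "lat": [40.0, 42.5], "lon": [260.0, 262.5, 265.0]},
    dims=("time", "lat", "lon"),
)

# 1. Fraction of years in which each calendar month was below freezing
freezing_threshold = 0.0
below = (tmin < freezing_threshold).astype(float)
prob = below.groupby("time.month").mean("time")

# 2. Collapse (lat, lon) into a single 'location' dimension
stacked = prob.stack(location=("lat", "lon"))

# 3. Pick three locations; to_pandas() turns the 2-D array into a
#    DataFrame with months as rows and locations as columns
table = stacked.isel(location=[0, 2, 4]).transpose("month", "location").to_pandas()
print(table)
# table.plot() would then draw one line per location (needs matplotlib)
```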



## Calculate monthly anomalies and plot them



Plot the spatial average as well as a map
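
A sketch of the anomaly calculation using groupby arithmetic: compute a monthly climatology, then subtract it from each month's group. The data are synthetic (a made-up seasonal cycle plus noise), and the plotting calls are only mentioned in a comment since they need matplotlib:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical maximum temperature: 4 years of monthly values on a 2 x 3 grid
time = pd.date_range("2000-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
seasonal = 10 * np.sin(2 * np.pi * (np.asarray(time.month) - 1) / 12)
tmax = xr.DataArray(
    seasonal[:, None, None] + rng.normal(0, 2, (48, 2, 3)),
    coords={"time": time, "lat": [40.0, 42.5], "lon": [260.0, 262.5, 265.0]},
    dims=("time", "lat", "lon"),
)

# Monthly climatology: the mean of every January, every February, ...
climatology = tmax.groupby("time.month").mean("time")

# Subtracting the matching month's climatology removes the seasonal cycle
anomalies = tmax.groupby("time.month") - climatology

# Spatial average: one anomaly value per time step
spatial_mean = anomalies.mean(dim=["lat", "lon"])
# spatial_mean.plot() draws the time series; anomalies.isel(time=0).plot() a map
```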



## Now for a different kind of exercise



-   Go to [http://xarray.pydata.org/en/stable/examples/monthly-means.html](http://xarray.pydata.org/en/stable/examples/monthly-means.html)
-   Let's walk through each section
-   Take a few minutes to understand the code and then discuss it!

