# Introduction to Xarray

[Xarray](https://github.com/pydata/xarray) is an open source Python package designed to work efficiently with labeled multidimensional arrays. By multidimensional data (also often called N-dimensional), we mean data with many independent dimensions or axes. For example, we might represent Earth’s surface temperature $T$ as a three-dimensional variable:

\begin{equation*}
T(x,y,t)
\end{equation*}

where $x$ and $y$ are spatial dimensions and $t$ is time. By labeled, we mean data that has metadata associated with it describing the names and relationships between the variables.
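To make the idea of labeled dimensions concrete, here is a minimal sketch on synthetic data. The array contents and the dimension names `t`, `y`, `x` are assumptions chosen to mirror $T(x,y,t)$ above; they are not taken from any real dataset. It compares averaging over time by axis position in NumPy with averaging by dimension name in Xarray.

```python
import numpy as np
import xarray as xr

# Synthetic values standing in for T; the axis order is (t, y, x) by assumption.
raw = np.arange(24, dtype=float).reshape(2, 3, 4)

# Wrapping the array attaches a name to each axis.
labeled = xr.DataArray(raw, dims=("t", "y", "x"))

# Plain NumPy: we must remember that time is axis 0.
by_position = raw.mean(axis=0)

# Xarray: we refer to the dimension by name instead.
by_name = labeled.mean(dim="t")

print(by_name.dims)
print(np.allclose(by_position, by_name.values))
```

Both approaches compute the same numbers; the labeled version simply does not depend on remembering the axis order.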

As an example of multidimensional data processing with Xarray, the cells below download a sample sea surface temperature dataset and open it:

In [None]:
# Use the /raw/ URL: the /blob/ URL returns GitHub's HTML page rather than the NetCDF file
!wget -O ./dataset.nc https://github.com/pangeo-data/tutorial-data/raw/master/sst/NOAA_NCDC_ERSST_v3b_SST-1960.nc

In [None]:
import xarray as xr

ds = xr.open_dataset("./dataset.nc")
ds

## Data Structures

Like Pandas, Xarray has two fundamental data structures:

* a `DataArray`, which holds a single multi-dimensional variable and its coordinates
* a `Dataset`, which holds multiple variables that potentially share the same coordinates

A `DataArray` has four essential attributes:

* `values`: a `numpy.ndarray` holding the array’s values
* `dims`: dimension names for each axis (e.g., `('x', 'y', 'z')`)
* `coords`: a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings)
* `attrs`: a `dict` holding arbitrary metadata (attributes)

A dataset is simply an object containing multiple `DataArrays` indexed by variable name.
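The four attributes can be inspected on a small hand-built array. The sketch below uses made-up values, and the names `temperature`, `anomaly`, and `toy_ds` are illustrative assumptions rather than part of the sample dataset used in this notebook.

```python
import numpy as np
import xarray as xr

# Synthetic 2 x 3 field; values, coordinates and units are invented for illustration.
temperature = xr.DataArray(
    np.array([[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]),
    dims=("lat", "lon"),
    coords={"lat": [0.0, 10.0], "lon": [100.0, 110.0, 120.0]},
    attrs={"units": "degC"},
    name="temperature",
)

print(type(temperature.values))   # the underlying NumPy array
print(temperature.dims)           # dimension names, one per axis
print(list(temperature.coords))   # names of the coordinate arrays
print(temperature.attrs)          # free-form metadata

# A Dataset groups several DataArrays that share the same coordinates.
toy_ds = xr.Dataset(
    {
        "temperature": temperature,
        "anomaly": temperature - temperature.mean(),
    }
)
print(list(toy_ds.data_vars))
```

Each variable in `toy_ds` can then be retrieved by name, for example `toy_ds["anomaly"]`, in the same way as `ds["sst"]` in the cells that follow.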

In [None]:
xr.set_options(display_style="html")
ds

In [None]:
# attribute syntax
sst = ds.sst  # or sst = ds['sst']

sst

## Indexing

In [None]:
sst.sel(time="1960-06-15").plot(vmin=-2, vmax=30)

We can select along any axis, for example at a fixed longitude or at a single grid point:

In [None]:
sst.sel(lon=180).transpose().plot()

In [None]:
sst.sel(lon=180, lat=40).plot()
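The selections above rely on coordinate labels rather than integer positions. As a hedged sketch of how that behaves, the cell below builds a small synthetic array (the name `toy` and all of its values are assumptions, not the SST data) and compares exact label selection, nearest-neighbour lookup, label slices, and positional indexing with `isel`.

```python
import numpy as np
import xarray as xr

# Synthetic (time, lat, lon) cube; the value at [t, i, j] is t*8 + i*4 + j.
toy = xr.DataArray(
    np.arange(3 * 2 * 4, dtype=float).reshape(3, 2, 4),
    dims=("time", "lat", "lon"),
    coords={
        "time": [0, 1, 2],  # plain integers stand in for timestamps here
        "lat": [30.0, 40.0],
        "lon": [170.0, 180.0, 190.0, 200.0],
    },
)

# Exact label match: the selected dimension is dropped from the result.
section = toy.sel(lon=180.0)

# Nearest-neighbour lookup when the requested label is not on the grid.
point = toy.sel(lon=183.0, lat=38.0, method="nearest")

# Label-based slices include both endpoints that fall inside the range.
band = toy.sel(lon=slice(175.0, 195.0))

# Positional (integer) indexing uses isel instead of sel.
first_step = toy.isel(time=0)

print(section.dims, point.dims, band.shape, first_step.shape)
```

Note that `method="nearest"` is a deliberate choice in this sketch: without it, `sel` raises a `KeyError` when the requested label does not exist in the coordinate.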