# The power of labeled data structures

***Purpose: Your data has labels; you should use them***

![](images/dataset-diagram.png)

Scientific data is inherently labeled. For example, time series data includes timestamps that label individual periods or points in time, spatial data has coordinates (e.g. longitude, latitude, elevation), and model or laboratory experiments are often identified by unique identifiers. The figure above shows an example of a labeled dataset: a map of global air temperature from a numerical weather model. The labels on this particular dataset are time (e.g. “2016-05-01”), longitude (x-axis), and latitude (y-axis).

----

### Outline
- Named dimensions/axes
- Coordinate labels
- Label-based indexing
- Alignment

### Tutorial Duration
10 minutes

In [None]:
import xarray as xr


In [None]:
ds = xr.tutorial.load_dataset('air_temperature')
ds

## The old way (numpy positional indexing)

When working with NumPy, indexing is done by position (slices, ranges, or scalars).

In [None]:
t = ds['air'].data  # numpy array
t

In [None]:
t.shape

In [None]:
# extract a time-series for one spatial location
t[:, 10, 20]

But wait, what labels go with `10` and `20`? Was that lat/lon or lon/lat? And where are the timestamps that go along with this time series?
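To make the bookkeeping problem concrete, here is a minimal sketch using only NumPy. The arrays `times`, `lats`, `lons`, and `temps` are hypothetical stand-ins with made-up values, not the tutorial dataset; the point is that the labels have to be carried and indexed separately by hand.

```python
import numpy as np

# Hypothetical stand-in data: 4 time steps on a 3 x 5 lat/lon grid (values are made up).
times = np.array(
    ["2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04"], dtype="datetime64[D]"
)
lats = np.array([50.0, 47.5, 45.0])
lons = np.array([200.0, 202.5, 205.0, 207.5, 210.0])
temps = np.arange(4 * 3 * 5, dtype=float).reshape(4, 3, 5)

# With plain NumPy, the axis order (time, lat, lon) is only a convention we must remember.
lat_idx, lon_idx = 1, 3
series = temps[:, lat_idx, lon_idx]

# The matching labels live in separate arrays and must be looked up by the same positions.
label_lat = lats[lat_idx]
label_lon = lons[lon_idx]
print(label_lat, label_lon, series.shape, times.shape)
```

Swapping `lat_idx` and `lon_idx` by mistake would either raise an `IndexError` or, on a grid where both indices happen to be valid, silently return the wrong location.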

## Indexing with Xarray


In [None]:
da = ds['air']

In [None]:
# NumPy-style positional indexing still works, and the result keeps its labels/metadata
da[:, 10, 20]

In [None]:
# Positional indexing using dimension names
da.isel(lat=10, lon=20)

In [None]:
# Label-based indexing
da.sel(lat=50., lon=250.)

In [None]:
# Nearest neighbor lookups
da.sel(lat=52.25, lon=251.8998, method='nearest')

In [None]:
# All of these indexing methods work on the Dataset too, e.g.:
ds.sel(lat=52.25, lon=251.8998, method='nearest')
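As a quick check that the indexing styles above pick out the same values, here is a minimal sketch on a small synthetic array. The name `toy` and its values are assumptions made up for illustration; the dimension names simply mirror the tutorial dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical toy array; dimension names mirror the tutorial data, values are made up.
toy = xr.DataArray(
    np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.array(["2013-01-01", "2013-01-02"], dtype="datetime64[ns]"),
        "lat": [50.0, 47.5, 45.0],
        "lon": [200.0, 202.5, 205.0, 207.5],
    },
)

# Positional indexing by dimension name.
by_position = toy.isel(lat=1, lon=2)

# Label-based indexing with exact coordinate values.
by_label = toy.sel(lat=47.5, lon=205.0)

# Nearest-neighbor lookup for coordinates that are not exactly on the grid.
nearest = toy.sel(lat=47.0, lon=204.0, method="nearest")

# All three select the same grid cell, and each result still carries its lat/lon labels.
print(by_position.equals(by_label), nearest.equals(by_label))
print(by_label.lat.item(), by_label.lon.item())
```

Unlike the NumPy version, none of these calls depends on remembering the order of the axes.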

## Vectorized indexing

In [None]:
# Generate coordinates for a transect of points
lat_points = xr.DataArray([52, 52.5, 53], dims='points')
lon_points = xr.DataArray([250, 250, 250], dims='points')

In [None]:
# nearest neighbor selection along the transect
da.sel(lat=lat_points, lon=lon_points, method='nearest')
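The difference between vectorized (pointwise) selection and ordinary selection with lists can be sketched on a small synthetic grid. The array `toy` and its values are made up for illustration; only `xr.DataArray` and `.sel` are used.

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 3 grid with made-up values.
toy = xr.DataArray(
    np.arange(3 * 3, dtype=float).reshape(3, 3),
    dims=("lat", "lon"),
    coords={"lat": [54.0, 52.0, 50.0], "lon": [248.0, 250.0, 252.0]},
)

# Indexers that share a new dimension ("points") are paired up element by element.
lat_points = xr.DataArray([50.0, 52.0, 54.0], dims="points")
lon_points = xr.DataArray([248.0, 250.0, 252.0], dims="points")
transect = toy.sel(lat=lat_points, lon=lon_points)

# Plain lists select every combination instead, giving a lat x lon block.
block = toy.sel(lat=[50.0, 52.0, 54.0], lon=[248.0, 250.0, 252.0])

print(transect.dims, transect.values)
print(block.dims, block.shape)
```

The transect has a single `points` dimension with one value per (lat, lon) pair, while the list-based selection returns the full 3 x 3 block.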

In [None]:
# TODO: more to do here:
# - alignment
# - broadcasting
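The outline lists alignment, which the cells above do not yet demonstrate. As a hedged sketch only (the arrays `a`, `b`, and `c` are hypothetical, with made-up values), this is roughly how label-based alignment and broadcasting by dimension name behave in xarray arithmetic:

```python
import xarray as xr

# Two hypothetical arrays whose latitude labels only partly overlap.
a = xr.DataArray([1.0, 2.0, 3.0], dims="lat", coords={"lat": [50.0, 52.5, 55.0]})
b = xr.DataArray([10.0, 20.0, 30.0], dims="lat", coords={"lat": [52.5, 55.0, 57.5]})

# Arithmetic matches values by coordinate label, not by position.
# With xarray's default settings, only labels present in both arrays are kept.
total = a + b

# Arrays with different dimensions are broadcast against each other by dimension name.
c = xr.DataArray([100.0, 200.0], dims="lon", coords={"lon": [250.0, 252.5]})
grid = a + c

print(total.lat.values, total.values)
print(grid.dims, grid.shape)
```

With positional NumPy arithmetic, `a + b` would add mismatched latitudes without complaint; here the labels decide which values are combined.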