# Xarray for multidimensional gridded data

In last week's lecture, we saw how Pandas provides a way to keep track of additional "metadata" surrounding tabular datasets, including "indexes" for each row and labels for each column. These features, together with Pandas' many useful routines for all kinds of data munging and analysis, have made Pandas one of the most popular Python packages in the world.

However, not all Earth science datasets easily fit into the "tabular" model (i.e. rows and columns) imposed by Pandas. In particular, we often deal with _multidimensional data_. By _multidimensional data_ (also often called _N-dimensional_), I mean data with many independent dimensions or axes. For example, we might represent Earth's surface temperature $T$ as a three-dimensional variable

$$ T(x, y, t) $$

where $x$ is longitude, $y$ is latitude, and $t$ is time.

The point of xarray is to provide Pandas-level convenience for working with this type of data: dimensions get names, and each axis can carry coordinate labels.
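
To make the idea of labeled dimensions concrete, here is a minimal sketch that builds a tiny $T(x, y, t)$ array by hand. The grid, the variable names (`lon`, `lat`, `time`, `temperature`), and the random values are all made up for illustration; they are not taken from any dataset used in this lecture.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy grid: 4 longitudes, 3 latitudes, 2 time steps
lon = np.array([-180.0, -90.0, 0.0, 90.0])
lat = np.array([-45.0, 0.0, 45.0])
time = pd.Timestamp("2015-01-01") + pd.to_timedelta(np.arange(2), unit="D")

# Synthetic temperatures in kelvin (random numbers, not real data)
rng = np.random.default_rng(0)
values = 280.0 + 10.0 * rng.random((time.size, lat.size, lon.size))

# Attach a name to each axis and coordinate labels along each one
temperature = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="temperature",
)

# Select by coordinate label rather than by integer position
equator = temperature.sel(lat=0.0)

# Reduce over a named dimension rather than an axis number
time_mean = temperature.mean(dim="time")
```

With plain NumPy we would have to remember that axis 1 is latitude; here the code says `lat` and `dim="time"` directly.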



![xarray data model](https://github.com/pydata/xarray/raw/master/doc/_static/dataset-diagram.png)

In [1]:
import numpy as np
import xarray as xr
from matplotlib import pyplot as plt
%matplotlib inline



In [4]:
#url = 'http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCDC/.OISST/.version2/.AVHRR-AMSR/.sst/dods'
#url = 'http://iridl.ldeo.columbia.edu/SOURCES/.SOC/.GASC97/dods'
#url = 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/olrcdr/olr.1deg.eq.1x.latlonregrid.1979-2016.nc'
#url = 'http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.EMC/.CMB/.GLOBAL/.Reyn_SmithOIv2/.weekly/.sst/dods'
url = 'http://iridl.ldeo.columbia.edu/SOURCES/.NASA/.GSFC/.MERRA/.Anl_MonoLev/.t10m/dods'
ds = xr.open_dataset(url)
ds

<xarray.Dataset>
Dimensions:  (T: 324336, X: 540, Y: 361)
Coordinates:
  * X        (X) float32 -180.0 -179.333 -178.667 -178.0 -177.333 -176.667 ...
  * T        (T) datetime64[ns] 1979-01-01T00:59:59.971200 ...
  * Y        (Y) float32 -90.0 -89.5 -89.0 -88.5 -88.0 -87.5 -87.0 -86.5 ...
Data variables:
    t10m     (T, Y, X) float64 ...
Attributes:
    Conventions:  IRIDL

In [5]:
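# Select by time label; both endpoints of the slice are included
ds_subset = ds.sel(T=slice('2015-01-01','2016-01-01'))
# Daily means along T (this replaces the older how=/dim= keyword form)
ds_daily = ds_subset.resample(T='1D').mean()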

In [6]:
ds_daily

<xarray.Dataset>
Dimensions:  (T: 366, X: 540, Y: 361)
Coordinates:
  * X        (X) float32 -180.0 -179.333 -178.667 -178.0 -177.333 -176.667 ...
  * Y        (Y) float32 -90.0 -89.5 -89.0 -88.5 -88.0 -87.5 -87.0 -86.5 ...
  * T        (T) datetime64[ns] 2015-01-01 2015-01-02 2015-01-03 2015-01-04 ...
Data variables:
    t10m     (T, Y, X) float64 245.7 245.7 245.7 245.7 245.7 245.7 245.7 ...
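
The selection and resampling steps above can also be tried offline. The following is only a sketch on synthetic data: the dataset `ds_toy`, its 2x2 grid, and its values are invented for illustration, with the dimension names `T`, `Y`, `X` and the variable name `t10m` borrowed from the MERRA dataset above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical hourly time axis covering three full days (72 hours)
hours = pd.Timestamp("2015-01-01") + pd.to_timedelta(np.arange(72), unit="h")

# Synthetic field: every grid point holds the hour of day (0..23),
# so the mean over any full day is known in advance to be 11.5
hour_of_day = (np.arange(72) % 24).astype(float)
data = hour_of_day[:, None, None] * np.ones((72, 2, 2))

ds_toy = xr.Dataset(
    {"t10m": (("T", "Y", "X"), data)},
    coords={"T": hours, "Y": [0.0, 0.5], "X": [0.0, 0.667]},
)

# Label-based slicing keeps both endpoints: all of Jan 2 and all of Jan 3
subset = ds_toy.sel(T=slice("2015-01-02", "2015-01-03"))

# Group the hourly samples into calendar days and average each group
daily = subset.resample(T="1D").mean()
```

Because label-based slices include both endpoints, the real example above returns 366 daily values: all 365 days of 2015 plus 1 January 2016.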

In [7]:
ds_daily.to_netcdf('MERRA_t10m_daily.nc')