# xarray

Sometimes we want to work with collections of higher-dimensional arrays (ndim > 2), or with arrays for which the order of dimensions (e.g., columns vs. rows) should not matter. For example, climate and weather data are often natively expressed in four or more dimensions, such as time plus x, y and z coordinates.

The main distinguishing feature of xarray’s `DataArray` over labeled arrays in pandas is that dimensions can have names (e.g., “time”, “latitude”, “longitude”). Names are much easier to keep track of than axis numbers, and xarray uses dimension names for indexing, aggregation and broadcasting. 
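
As a rough sketch of what named dimensions buy us, the snippet below builds a made-up 4-dimensional array (the shape, the zero values and the dimension names `time`, `z`, `y`, `x` are assumptions for illustration only) and reduces it by name, before and after reordering its axes:

```python
import numpy as np
import xarray as xr

# Hypothetical 4-D field: 2 time steps on a 3 x 4 x 5 grid (values are arbitrary).
temp = xr.DataArray(np.zeros((2, 3, 4, 5)), dims=('time', 'z', 'y', 'x'))

# Reduce by name; no need to remember that 'time' happens to be axis 0.
by_name = temp.mean(dim='time')

# After reordering the axes, the same call still averages over time.
reordered = temp.transpose('x', 'y', 'z', 'time')
same = reordered.mean(dim='time')

print(by_name.dims)  # ('z', 'y', 'x')
print(same.dims)     # ('x', 'y', 'z')
```

With plain NumPy the second call would need `axis=3` instead of `axis=0`; with names the call does not change.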

Install it via:

    conda install xarray

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import xarray as xr
xr.__version__

'0.10.0'

## creation

In [2]:
d = xr.DataArray([[0, 1], [2, 3]], 
                 [('time', [1.5, 2.7]), 
                  ('space', ['x', 'y'])], 
                 name='Example 1')
d

<xarray.DataArray 'Example 1' (time: 2, space: 2)>
array([[0, 1],
       [2, 3]])
Coordinates:
  * time     (time) float64 1.5 2.7
  * space    (space) <U1 'x' 'y'

In [3]:
d.to_pandas()  # easy conversion to...

space  x  y
time       
1.5    0  1
2.7    2  3


In [4]:
df = pd.DataFrame({'x': [0, 2], 'y': [1, 3]}, index=[1.5, 2.7])
d = xr.DataArray(df, dims=['time', 'space'], name='Example 2')  # ... and from pandas
d

<xarray.DataArray 'Example 2' (time: 2, space: 2)>
array([[0, 1],
       [2, 3]])
Coordinates:
  * time     (time) float64 1.5 2.7
  * space    (space) object 'x' 'y'

In [5]:
d.to_pandas()  # ...and back again

space  x  y
time       
1.5    0  1
2.7    2  3
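
`to_pandas()` is convenient for 1-D and 2-D arrays. For arrays of any dimensionality, a possible alternative is to go through a `Series` with a `MultiIndex`. The sketch below assumes the same toy data as above; the name `'example'` is arbitrary:

```python
import pandas as pd
import xarray as xr

d = xr.DataArray([[0, 1], [2, 3]],
                 coords=[('time', [1.5, 2.7]), ('space', ['x', 'y'])],
                 name='example')

# to_series() flattens the array into a pandas Series whose MultiIndex
# levels are the dimension coordinates.
s = d.to_series()
print(list(s.index.names))  # ['time', 'space']

# from_series() rebuilds a DataArray from the MultiIndex levels.
back = xr.DataArray.from_series(s)
print(back.dims)  # ('time', 'space')
```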


## properties

In [6]:
d.values

array([[0, 1],
       [2, 3]])

In [7]:
d.dims

('time', 'space')

In [8]:
d.coords

Coordinates:
  * time     (time) float64 1.5 2.7
  * space    (space) object 'x' 'y'

In [9]:
d.attrs['unit'] = 'Hz'  # a dictionary to store arbitrary metadata
d.attrs

OrderedDict([('unit', 'Hz')])
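
Operations do not necessarily carry `attrs` over to their result (the `d * 2` output later in this notebook shows none). A minimal sketch of requesting this explicitly for a reduction, using the `keep_attrs` argument; the `source` key is a made-up metadata entry:

```python
import xarray as xr

d = xr.DataArray([[0, 1], [2, 3]], dims=('time', 'space'))
d.attrs['unit'] = 'Hz'
d.attrs['source'] = 'made-up example'  # hypothetical metadata key

# Ask the reduction to copy the metadata onto its result.
kept = d.mean(dim='space', keep_attrs=True)
print(kept.attrs['unit'])  # Hz
```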

## indexing

In [10]:
d[1], '-'*30, d[:,1]  # like numpy  

(<xarray.DataArray 'Example 2' (space: 2)>
 array([2, 3])
 Coordinates:
     time     float64 2.7
   * space    (space) object 'x' 'y'
 Attributes:
     unit:     Hz,
 '------------------------------',
 <xarray.DataArray 'Example 2' (time: 2)>
 array([1, 3])
 Coordinates:
   * time     (time) float64 1.5 2.7
     space    <U1 'y'
 Attributes:
     unit:     Hz)

In [11]:
d.loc[2.7], '-'*30, d.loc[:,'y']  # like pandas

(<xarray.DataArray 'Example 2' (space: 2)>
 array([2, 3])
 Coordinates:
     time     float64 2.7
   * space    (space) object 'x' 'y'
 Attributes:
     unit:     Hz,
 '------------------------------',
 <xarray.DataArray 'Example 2' (time: 2)>
 array([1, 3])
 Coordinates:
   * time     (time) float64 1.5 2.7
     space    <U1 'y'
 Attributes:
     unit:     Hz)

In [12]:
d.isel(time=1)  # by dimension name and integer index

<xarray.DataArray 'Example 2' (space: 2)>
array([2, 3])
Coordinates:
    time     float64 2.7
  * space    (space) object 'x' 'y'
Attributes:
    unit:     Hz

In [13]:
d.sel(space='x')  # by dimension name and coordinate label

<xarray.DataArray 'Example 2' (time: 2)>
array([0, 2])
Coordinates:
  * time     (time) float64 1.5 2.7
    space    <U1 'x'
Attributes:
    unit:     Hz
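
`sel` also accepts several dimensions at once, label slices, and inexact lookups. The following sketch reuses the toy array from above; the query values 1.0, 2.0 and 2.5 are arbitrary choices for illustration:

```python
import xarray as xr

d = xr.DataArray([[0, 1], [2, 3]],
                 coords=[('time', [1.5, 2.7]), ('space', ['x', 'y'])])

# Select along several dimensions in one call; the result is 0-dimensional.
point = d.sel(time=1.5, space='y')
print(point.item())  # 1

# Label-based slices: only time=1.5 lies between 1.0 and 2.0.
window = d.sel(time=slice(1.0, 2.0))
print(window.time.values)  # [1.5]

# Nearest-neighbour lookup: 2.5 is closest to the label 2.7.
nearest = d.sel(time=2.5, method='nearest')
print(nearest.values)  # [2 3]
```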

## computation and aggregation

In [14]:
d * 2

<xarray.DataArray 'Example 2' (time: 2, space: 2)>
array([[0, 2],
       [4, 6]])
Coordinates:
  * time     (time) float64 1.5 2.7
  * space    (space) object 'x' 'y'

In [15]:
np.exp(d)

<xarray.DataArray 'Example 2' (time: 2, space: 2)>
array([[ 1.      ,  2.718282],
       [ 7.389056, 20.085537]])
Coordinates:
  * time     (time) float64 1.5 2.7
  * space    (space) object 'x' 'y'

In [16]:
d.mean(dim='space')  # aggregation operations can use dimension names instead of axis numbers!

<xarray.DataArray 'Example 2' (time: 2)>
array([0.5, 2.5])
Coordinates:
  * time     (time) float64 1.5 2.7

In [17]:
d.mean(dim='space').values  # just the values

array([0.5, 2.5])
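
Because reductions are addressed by name, they give the same answer regardless of how the axes are ordered, and several dimensions can be reduced in one call. A small sketch on the same toy data:

```python
import xarray as xr

d = xr.DataArray([[0, 1], [2, 3]],
                 coords=[('time', [1.5, 2.7]), ('space', ['x', 'y'])])

# Reduce over several named dimensions at once.
total = d.sum(dim=['time', 'space'])
print(total.item())  # 6

# Reduce over 'time' only; the 'space' dimension remains.
col_max = d.max(dim='time')
print(col_max.values)  # [2 3]

# The axis order does not matter when reducing by name.
flipped = d.transpose('space', 'time')
print(flipped.mean(dim='space').values)  # [0.5 2.5]
```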

## broadcasting
Arithmetic operations are broadcast based on dimension name:

In [18]:
a = xr.DataArray(np.arange(3), dims='x')
b = xr.DataArray(np.arange(3), dims='x+add')  # try just 'x' or 'x-add' here
a + b

<xarray.DataArray (x: 3, x+add: 3)>
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
Dimensions without coordinates: x, x+add
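
For contrast, here is a sketch of what happens when the names do match: the operation is element-wise along the shared dimension, and a 1-D array lines up with the matching dimension of a 2-D array wherever that dimension sits. The array `grid` and its dimension `y` are made up for this example:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(3), dims='x')

# Same dimension name on both sides: plain element-wise addition.
same = a + xr.DataArray(np.arange(3), dims='x')
print(same.values)  # [0 2 4]

# A 1-D array is matched to the 'x' dimension of a 2-D array by name.
grid = xr.DataArray(np.ones((2, 3)), dims=('y', 'x'))
out = grid + a
print(out.dims)  # ('y', 'x')

# Transposing 'grid' first changes the layout of the result, not its values.
out_t = grid.transpose('x', 'y') + a
print(out_t.dims)  # ('x', 'y')
```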

## further reading
For more information on the `xarray` package, and the `Dataset` object (an in-memory representation of the *netCDF* file format), see

http://xarray.pydata.org/en/stable/
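
As a brief taste of the `Dataset` object, the sketch below bundles two variables that share a `time` coordinate. The variable names `temperature` and `pressure` and all values are invented for illustration:

```python
import xarray as xr

# A Dataset is a dict-like collection of DataArrays on shared coordinates.
ds = xr.Dataset(
    {
        'temperature': (('time', 'space'), [[0.0, 1.0], [2.0, 3.0]]),
        'pressure': (('time',), [1000.0, 990.0]),
    },
    coords={'time': [1.5, 2.7], 'space': ['x', 'y']},
)

# Label-based selection applies to every variable that has the dimension.
sub = ds.sel(time=2.7)
print(sub['temperature'].values)  # [2. 3.]
print(sub['pressure'].item())     # 990.0

# Reductions touch only the variables that contain the named dimension.
avg = ds.mean(dim='space')
print(avg['temperature'].values)  # [0.5 2.5]
```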