# Indexing & Selecting Data

------------------

xarray offers extremely flexible indexing routines that combine the best features of NumPy and pandas for data selection.

Dimensions of xarray objects have names, so you can also look up the dimensions by name, instead of remembering their positional order.

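In total, xarray supports four different kinds of indexing, as described below and summarized in this table:

| Dimension lookup | Index lookup | DataArray syntax | Dataset syntax |
|---|---|---|---|
| Positional | By integer | da[:, 0] | not available |
| Positional | By label | da.loc[:, 'IA'] | not available |
| By name | By integer | da.isel(space=0) or da[dict(space=0)] | ds.isel(space=0) or ds[dict(space=0)] |
| By name | By label | da.sel(space='IA') or da.loc[dict(space='IA')] | ds.sel(space='IA') or ds.loc[dict(space='IA')] |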

In [40]:
import xarray as xr
import pandas as pd
import numpy as np

In [41]:
da = xr.DataArray(np.random.rand(4, 3),
   [('time', pd.date_range('2000-01-01', periods=4)),
   ('space', ['IA', 'IL', 'IN'])])

In [42]:
da

<xarray.DataArray (time: 4, space: 3)>
array([[0.530982, 0.384736, 0.139548],
       [0.701967, 0.25436 , 0.506669],
       [0.088322, 0.681463, 0.985982],
       [0.579701, 0.180966, 0.775258]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) <U2 'IA' 'IL' 'IN'

In [43]:
da[:2]

<xarray.DataArray (time: 2, space: 3)>
array([[0.530982, 0.384736, 0.139548],
       [0.701967, 0.25436 , 0.506669]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02
  * space    (space) <U2 'IA' 'IL' 'IN'

In [44]:
da[0, 0]

<xarray.DataArray ()>
array(0.530982)
Coordinates:
    time     datetime64[ns] 2000-01-01
    space    <U2 'IA'

In [45]:
da[:, [2, 1]]

<xarray.DataArray (time: 4, space: 2)>
array([[0.139548, 0.384736],
       [0.506669, 0.25436 ],
       [0.985982, 0.681463],
       [0.775258, 0.180966]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) <U2 'IN' 'IL'
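The positional examples above use random data, so the printed numbers change from run to run. The sketch below repeats the same three indexing patterns on a deterministic array, which makes the results easier to check by hand. The name `demo` and the use of `np.arange` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical, deterministic stand-in for the random array used above.
demo = xr.DataArray(
    np.arange(12).reshape(4, 3),
    coords=[
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)

first_two = demo[:2]         # slice along the first dimension (time)
scalar = demo[0, 0]          # single element, returned as a 0-d DataArray
reordered = demo[:, [2, 1]]  # columns 2 and 1, in that order

print(first_two.shape)
print(scalar.item())
print(reordered.space.values.tolist())
```

Positional indexing behaves like NumPy indexing, but the coordinates travel with the data: the reordered result carries the space labels in the new order.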

xarray also supports label-based indexing, just like pandas. Because xarray uses a pandas.Index under the hood, label-based indexing is very fast. To do label-based indexing, use the loc attribute:

In [8]:
da.loc['2000-01-01':'2000-01-02', 'IA']

<xarray.DataArray (time: 2)>
array([0.312672, 0.951886])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02
    space    <U2 'IA'

In this example, the selection is the part of the array in the range '2000-01-01':'2000-01-02' along the first coordinate, time, with the value 'IA' along the second coordinate, space.
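One detail worth spelling out is that label-based slices include both endpoints, whereas positional slices exclude the stop index. The following sketch, again on a hypothetical deterministic array named `demo`, compares the two.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical deterministic array with the same coordinates as above.
demo = xr.DataArray(
    np.arange(12).reshape(4, 3),
    coords=[
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)

# Label slice: both '2000-01-01' and '2000-01-02' are included.
by_label = demo.loc["2000-01-01":"2000-01-02", "IA"]

# Positional slice: index 2 is excluded, so this also covers two rows.
by_position = demo[0:2, 0]

print(by_label.values.tolist())
print(by_label.equals(by_position))
```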

Setting values with label-based indexing is also supported:

In [9]:
da.loc['2000-01-01', ['IL', 'IN']] = -10

In [10]:
da

<xarray.DataArray (time: 4, space: 3)>
array([[  0.312672, -10.      , -10.      ],
       [  0.951886,   0.592339,   0.525268],
       [  0.271905,   0.751364,   0.528909],
       [  0.550212,   0.063488,   0.191625]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) <U2 'IA' 'IL' 'IN'

------------------

# Indexing with Dimension Names



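With dimension names, we do not have to rely on dimension order and can use the names explicitly to slice data. There are two ways to do this: pass a dictionary to positional ([]) or label-based (loc) indexing, or use the isel() and sel() convenience methods.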


In [46]:
da

<xarray.DataArray (time: 4, space: 3)>
array([[0.530982, 0.384736, 0.139548],
       [0.701967, 0.25436 , 0.506669],
       [0.088322, 0.681463, 0.985982],
       [0.579701, 0.180966, 0.775258]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) <U2 'IA' 'IL' 'IN'

In [48]:
da[dict(space=0)]

<xarray.DataArray (time: 4)>
array([0.530982, 0.701967, 0.088322, 0.579701])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
    space    <U2 'IA'

In [11]:
da[dict(space=0, time=slice(None, 2))]

<xarray.DataArray (time: 2)>
array([0.312672, 0.951886])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02
    space    <U2 'IA'

In [49]:
da.loc[dict(time=slice('2000-01-01', '2000-01-02'))]

<xarray.DataArray (time: 2, space: 3)>
array([[0.530982, 0.384736, 0.139548],
       [0.701967, 0.25436 , 0.506669]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02
  * space    (space) <U2 'IA' 'IL' 'IN'

In [53]:
# loc looks up labels, so the integer 0 is not found among 'IA', 'IL', 'IN'
da.loc[dict(space=0)]

KeyError: 0

In [13]:
# index by integer array indices
da.isel(space=0, time=slice(None, 2))

<xarray.DataArray (time: 2)>
array([0.312672, 0.951886])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02
    space    <U2 'IA'

In [50]:
# index by dimension coordinate labels
da.sel(time=slice('2000-01-01', '2000-01-02'))

<xarray.DataArray (time: 2, space: 3)>
array([[0.530982, 0.384736, 0.139548],
       [0.701967, 0.25436 , 0.506669]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02
  * space    (space) <U2 'IA' 'IL' 'IN'

In [51]:
# isel() expects integer positions, so label strings raise an error
da.isel(time=slice('2000-01-01', '2000-01-02'))

TypeError: 'str' object cannot be interpreted as an integer
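To tie the examples in this section together, the sketch below selects the same data in four ways: with a dictionary and integer positions, with isel(), with a dictionary and labels through loc, and with sel(). The array `demo` and the result names `a` to `d` are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical deterministic array with named dimensions.
demo = xr.DataArray(
    np.arange(12).reshape(4, 3),
    coords=[
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)

# By name, using integer positions.
a = demo[dict(space=0, time=slice(None, 2))]
b = demo.isel(space=0, time=slice(None, 2))

# By name, using coordinate labels (label slices include the end point).
c = demo.loc[dict(space="IA", time=slice("2000-01-01", "2000-01-02"))]
d = demo.sel(space="IA", time=slice("2000-01-01", "2000-01-02"))

# All four spellings select the same two values.
print(a.identical(b), a.identical(c), a.identical(d))
```

Because the dimensions are referred to by name, the order of the keyword arguments does not matter.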

---------------------

# Dataset Indexing

We can also use these methods to index all variables in a dataset simultaneously, returning a new dataset:

In [54]:
da = xr.DataArray(np.random.rand(4, 3),
                      [('time', pd.date_range('2000-01-01', periods=4)),
                       ('space', ['IA', 'IL', 'IN'])])

In [55]:
ds = da.to_dataset(name='foo')

In [56]:
ds.isel(space=[0], time=[0])

<xarray.Dataset>
Dimensions:  (space: 1, time: 1)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01
  * space    (space) <U2 'IA'
Data variables:
    foo      (time, space) float64 0.6602

In [18]:
ds.sel(time='2000-01-01')

<xarray.Dataset>
Dimensions:  (space: 3)
Coordinates:
    time     datetime64[ns] 2000-01-01
  * space    (space) <U2 'IA' 'IL' 'IN'
Data variables:
    foo      (space) float64 0.3128 0.8192 0.8832

Positional indexing on a dataset is not supported because the ordering of dimensions in a dataset is somewhat ambiguous (it can vary between different arrays). However, you can do normal indexing with dimension names:

In [19]:
ds[dict(space=[0], time=[0])]

<xarray.Dataset>
Dimensions:  (space: 1, time: 1)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01
  * space    (space) <U2 'IA'
Data variables:
    foo      (time, space) float64 0.3128
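To make the point about ambiguous dimension order concrete, here is a minimal sketch of a dataset whose two variables store the same dimensions in opposite order. The dataset `ds_demo` and the variable name `bar` are assumptions for illustration; only `foo` appears in the text above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset: 'foo' is (time, space) while 'bar' is (space, time).
ds_demo = xr.Dataset(
    {
        "foo": (("time", "space"), np.arange(6).reshape(2, 3)),
        "bar": (("space", "time"), np.arange(6).reshape(3, 2)),
    },
    coords={
        "time": pd.date_range("2000-01-01", periods=2),
        "space": ["IA", "IL", "IN"],
    },
)

# Indexing by name picks space position 0 in both variables,
# regardless of which axis 'space' occupies in each one.
subset = ds_demo.isel(space=0)

print(subset["foo"].values.tolist())  # column 0 of foo
print(subset["bar"].values.tolist())  # row 0 of bar
```

A positional index such as "axis 1, element 0" would mean something different for each variable, which is why name-based indexing is the natural choice for datasets.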

Using indexing to assign values to a subset of a dataset (e.g., ds[dict(space=0)] = 1) is not yet supported.
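One possible workaround is to assign through an individual data variable, since DataArray objects do support assignment with loc. The sketch below assumes a hypothetical dataset `ds_demo` holding a single variable `foo` backed by an in-memory NumPy array.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with one variable filled with zeros.
ds_demo = xr.Dataset(
    {"foo": (("time", "space"), np.zeros((2, 3)))},
    coords={
        "time": pd.date_range("2000-01-01", periods=2),
        "space": ["IA", "IL", "IN"],
    },
)

# Assign through the data variable rather than through the dataset itself.
ds_demo["foo"].loc[dict(space="IA")] = 1.0

print(ds_demo["foo"].values.tolist())
```

The assignment modifies the variable in place, so the change is visible through the dataset afterwards.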

-----------------------------------

# Dropping Labels

    

The drop() method returns a new object with the listed index labels along a dimension dropped:

In [25]:
ds.drop(['IN', 'IL'], dim='space')

<xarray.Dataset>
Dimensions:  (space: 1, time: 4)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) <U2 'IA'
Data variables:
    foo      (time, space) float64 0.3128 0.6314 0.9186 0.05156

drop() is both a Dataset and DataArray method.
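In more recent xarray releases, drop_sel() offers a keyword-based way to drop index labels. The sketch below shows it on a hypothetical dataset `ds_demo`; whether drop_sel() is available depends on the installed xarray version, so treat this as an alternative spelling rather than a replacement for the call shown above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with the same coordinates as in the examples above.
ds_demo = xr.Dataset(
    {"foo": (("time", "space"), np.arange(12).reshape(4, 3))},
    coords={
        "time": pd.date_range("2000-01-01", periods=4),
        "space": ["IA", "IL", "IN"],
    },
)

# Drop the labels 'IN' and 'IL' along the 'space' dimension.
trimmed = ds_demo.drop_sel(space=["IN", "IL"])

print(trimmed.space.values.tolist())
print(trimmed["foo"].shape)
```

As with drop(), the original dataset is left unchanged and a new object is returned.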

---------------------

# Selecting Values with isin

To check whether elements of an xarray object contain a single object, you can compare with the equality operator == (e.g., arr == 3). To check multiple values, use isin():

In [28]:
da = xr.DataArray([1, 2, 3, 4, 5], dims=['x'])

In [29]:
da

<xarray.DataArray (x: 5)>
array([1, 2, 3, 4, 5])
Dimensions without coordinates: x

In [30]:
da.isin([2, 4])

<xarray.DataArray (x: 5)>
array([False,  True, False,  True, False])
Dimensions without coordinates: x

isin() works particularly well with where() to support indexing by arrays that are not already labels of an array:

In [31]:
lookup = xr.DataArray([-1, -2, -3, -4, -5], dims=['x'])

In [32]:
da.where(lookup.isin([-2, -4]), drop=True)

<xarray.DataArray (x: 2)>
array([2., 4.])
Dimensions without coordinates: x

However, some caution is in order: when done repeatedly, this type of indexing is significantly slower than using sel().
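When the lookup values can be attached to the array as a coordinate, sel() becomes an option. The following sketch compares the two approaches on a hypothetical array `arr` whose x coordinate holds the lookup values from the example above.

```python
import xarray as xr

# Hypothetical array: the lookup values are stored as the 'x' coordinate.
arr = xr.DataArray(
    [1, 2, 3, 4, 5],
    dims=["x"],
    coords={"x": [-1, -2, -3, -4, -5]},
)

# Mask-based selection: where() converts the integers to floats.
via_where = arr.where(arr.x.isin([-2, -4]), drop=True)

# Index-based selection: keeps the integer dtype.
via_sel = arr.sel(x=[-2, -4])

print(via_where.values.tolist())
print(via_sel.values.tolist())
```

Both calls pick the same elements. Note that the where() route returns floating-point values, because the masked positions are filled with NaN before being dropped.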

---------------------

# Vectorized Indexing

Like NumPy and pandas, xarray supports indexing many array elements at once in a vectorized manner.

In [33]:
da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'],
                   coords={'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})

In [34]:
da

<xarray.DataArray (x: 3, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) <U1 'a' 'b' 'c' 'd'

In [35]:
da[[0, 1], [1, 1]]

<xarray.DataArray (x: 2, y: 2)>
array([[1, 1],
       [5, 5]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) <U1 'b' 'b'

In [57]:
ind_x = xr.DataArray([0, 1], dims=['x'])
ind_x

<xarray.DataArray (x: 2)>
array([0, 1])
Dimensions without coordinates: x

In [58]:
ind_y = xr.DataArray([0, 1], dims=['y'])
ind_y

<xarray.DataArray (y: 2)>
array([0, 1])
Dimensions without coordinates: y

In [38]:
# orthogonal (outer) indexing: the indexers have different dimensions (x and y)
da[ind_x, ind_y]

<xarray.DataArray (x: 2, y: 2)>
array([[0, 1],
       [4, 5]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) <U1 'a' 'b'

In [39]:
# vectorized (pointwise) indexing: both indexers share the dimension x
da[ind_x, ind_x]

<xarray.DataArray (x: 2)>
array([0, 5])
Coordinates:
  * x        (x) int64 0 1
    y        (x) <U1 'a' 'b'
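The difference between the last two results comes from the dimensions of the indexers. As a further sketch, the indexers below share a new dimension, hypothetically named `points`, so the result is one-dimensional along it. The names `grid`, `ix`, and `iy` are also assumptions for illustration.

```python
import numpy as np
import xarray as xr

# Same layout as the array used in this section.
grid = xr.DataArray(
    np.arange(12).reshape((3, 4)),
    dims=["x", "y"],
    coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]},
)

# Indexers that share one dimension select element pairs pointwise:
# (x=0, y=3), (x=1, y=0), (x=2, y=1).
ix = xr.DataArray([0, 1, 2], dims=["points"])
iy = xr.DataArray([3, 0, 1], dims=["points"])
picked = grid.isel(x=ix, y=iy)

# Plain lists index each dimension independently (orthogonal indexing).
outer = grid.isel(x=[0, 1], y=[1, 1])

print(picked.dims, picked.values.tolist())
print(outer.shape)
```

Pointwise selection returns one value per index pair, while orthogonal selection returns the full 2 x 2 block.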
