# Computation

The labels associated with DataArray and Dataset objects enable some powerful shortcuts for computation, notably aggregation and broadcasting by dimension name.
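
As a quick illustration of broadcasting by dimension name, the sketch below adds two 1-D arrays that share no dimensions. The names `row` and `col` and their values are made up for this example.

```python
import xarray as xr

# Two 1-D arrays with different dimension names (illustrative values).
row = xr.DataArray([1, 2], dims="x")
col = xr.DataArray([10, 20, 30], dims="y")

# Dimensions are matched by name, not by position, so the result
# spans both "x" and "y" without any manual reshaping.
total = row + col
print(total.dims)
print(total.values)
```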

# Basic array math

Arithmetic operations with a single DataArray automatically vectorize (like numpy) over all array values:



In [43]:
import xarray as xr
import pandas as pd
import numpy as np

In [6]:
arr = xr.DataArray(np.random.RandomState(0).randn(2, 3),
                   [('x', ['a', 'b']), ('y', [10, 20, 30])])
arr

<xarray.DataArray (x: 2, y: 3)>
array([[ 1.764052,  0.400157,  0.978738],
       [ 2.240893,  1.867558, -0.977278]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 10 20 30

In [5]:
arr - 3

<xarray.DataArray (x: 2, y: 3)>
array([[-1.235948, -2.599843, -2.021262],
       [-0.759107, -1.132442, -3.977278]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 10 20 30

In [7]:
abs(arr)

<xarray.DataArray (x: 2, y: 3)>
array([[1.764052, 0.400157, 0.978738],
       [2.240893, 1.867558, 0.977278]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 10 20 30

You can also use any of numpy’s or scipy’s many ufuncs directly on a DataArray:

In [8]:
np.sin(arr)

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.981384,  0.389563,  0.829794],
       [ 0.783762,  0.956288, -0.828978]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 10 20 30

Use where() to conditionally switch between values:

In [9]:
xr.where(arr > 0, 'positive', 'negative')

<xarray.DataArray (x: 2, y: 3)>
array([['positive', 'positive', 'positive'],
       ['positive', 'positive', 'negative']], dtype='<U8')
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 10 20 30
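
DataArray objects also have a `where()` method, which keeps values where the condition holds and replaces the rest. The sketch below uses a small hand-written array (`data` is a name chosen for this example) so the result is easy to check by eye.

```python
import numpy as np
import xarray as xr

# Small illustrative array; the values are chosen for readability.
data = xr.DataArray([[1.5, -0.5], [-2.0, 3.0]], dims=("x", "y"))

# With one argument, entries that fail the condition become NaN.
masked = data.where(data > 0)

# With a second argument, failing entries take that value instead.
clipped = data.where(data > 0, 0.0)

print(masked.values)
print(clipped.values)
```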

Data arrays also implement many numpy.ndarray methods:

In [10]:
arr.round(2)

<xarray.DataArray (x: 2, y: 3)>
array([[ 1.76,  0.4 ,  0.98],
       [ 2.24,  1.87, -0.98]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 10 20 30

In [11]:
arr.T

<xarray.DataArray (y: 3, x: 2)>
array([[ 1.764052,  2.240893],
       [ 0.400157,  1.867558],
       [ 0.978738, -0.977278]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 10 20 30

# Missing values



xarray objects borrow the isnull(), notnull(), count(), dropna(), fillna(), ffill(), and bfill() methods from pandas for working with missing data:

In [12]:
x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=['x'])

In [13]:
x

<xarray.DataArray (x: 5)>
array([ 0.,  1., nan, nan,  2.])
Dimensions without coordinates: x

In [14]:
x.isnull()

<xarray.DataArray (x: 5)>
array([False, False,  True,  True, False])
Dimensions without coordinates: x

In [15]:
x.notnull()

<xarray.DataArray (x: 5)>
array([ True,  True, False, False,  True])
Dimensions without coordinates: x

In [16]:
x.count()

<xarray.DataArray ()>
array(3)

In [17]:
x.dropna(dim='x')

<xarray.DataArray (x: 3)>
array([0., 1., 2.])
Dimensions without coordinates: x

In [18]:
x.fillna(-1)

<xarray.DataArray (x: 5)>
array([ 0.,  1., -1., -1.,  2.])
Dimensions without coordinates: x
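
These methods can be combined with dimension names. The sketch below counts valid entries per row and fills each gap with its row mean; `temps` and its values are invented for illustration.

```python
import numpy as np
import xarray as xr

# Illustrative 2-D array with one missing value in each row.
temps = xr.DataArray(
    [[1.0, np.nan, 3.0], [4.0, 5.0, np.nan]],
    dims=("x", "y"),
)

# Number of non-missing values along "y" for each "x".
valid_per_row = temps.count(dim="y")

# fillna() accepts another DataArray; the row means (one per "x")
# are broadcast along "y" by dimension name.
filled = temps.fillna(temps.mean(dim="y"))

print(valid_per_row.values)
print(filled.values)
```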

# Aggregation



Aggregation methods take a dim argument instead of axis, so you can aggregate along one or more dimensions by name:

In [19]:
arr.sum(dim='x')

<xarray.DataArray (y: 3)>
array([4.004946e+00, 2.267715e+00, 1.460104e-03])
Coordinates:
  * y        (y) int64 10 20 30

In [20]:
arr.std(['x', 'y'])

<xarray.DataArray ()>
array(1.090383)

In [21]:
arr.min()

<xarray.DataArray ()>
array(-0.977278)

If you need to figure out the axis number for a dimension yourself (say, for wrapping code designed to work with numpy arrays), you can use the get_axis_num() method:

In [22]:
arr.get_axis_num('y')

1
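
As a sketch of that wrapping pattern, the example below passes the axis number to a plain numpy function and then reattaches the labels. The array and the choice of `np.cumsum` are only for illustration.

```python
import numpy as np
import xarray as xr

# Illustrative labeled array.
arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords=[("x", ["a", "b"]), ("y", [10, 20, 30])],
)

# Look up the positional axis for the "y" dimension.
axis = arr.get_axis_num("y")

# Call a numpy-only function on the raw values along that axis.
running = np.cumsum(arr.values, axis=axis)

# Put the result back into a DataArray with the original labels.
wrapped = arr.copy(data=running)
print(wrapped)
```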

These operations automatically skip missing values, like in pandas:

In [23]:
xr.DataArray([1, 2, np.nan, 3]).mean()

<xarray.DataArray ()>
array(2.)

If desired, you can disable this behavior by invoking the aggregation method with skipna=False.
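
A minimal comparison of the two behaviors, reusing the values from the example above (the variable names are chosen for this sketch):

```python
import numpy as np
import xarray as xr

values = xr.DataArray([1, 2, np.nan, 3])

# Default: the missing value is ignored, so the mean is over 1, 2, 3.
skipped = values.mean().item()

# skipna=False: the NaN propagates into the result.
strict = values.mean(skipna=False).item()

print(skipped, strict)
```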

# Computation using Coordinates

xarray objects have some handy methods for computing with their coordinates. differentiate() computes derivatives by central finite differences, using the coordinate values as the sample positions:

In [24]:
a = xr.DataArray([0, 1, 2, 3], dims=['x'], coords=[[0.1, 0.11, 0.2, 0.3]])

In [25]:
a

<xarray.DataArray (x: 4)>
array([0, 1, 2, 3])
Coordinates:
  * x        (x) float64 0.1 0.11 0.2 0.3

In [26]:
a.differentiate('x')


<xarray.DataArray (x: 4)>
array([100.      ,  91.111111,  10.584795,  10.      ])
Coordinates:
  * x        (x) float64 0.1 0.11 0.2 0.3

This method also works on multidimensional arrays:

In [27]:
a = xr.DataArray(np.arange(8).reshape(4, 2), dims=['x', 'y'],
                 coords={'x': [0.1, 0.11, 0.2, 0.3]})

In [28]:
a

<xarray.DataArray (x: 4, y: 2)>
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])
Coordinates:
  * x        (x) float64 0.1 0.11 0.2 0.3
Dimensions without coordinates: y

In [29]:
a.differentiate('x')

<xarray.DataArray (x: 4, y: 2)>
array([[200.      , 200.      ],
       [182.222222, 182.222222],
       [ 21.169591,  21.169591],
       [ 20.      ,  20.      ]])
Coordinates:
  * x        (x) float64 0.1 0.11 0.2 0.3
Dimensions without coordinates: y
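
One way to sanity-check these numbers is to compare them with `np.gradient` evaluated at the same unevenly spaced positions. The sketch below assumes that `differentiate()` with its default settings agrees with `np.gradient` called with its defaults; treat that as an assumption to verify rather than a documented guarantee.

```python
import numpy as np
import xarray as xr

# Same data and uneven x coordinate as the 1-D example above.
x = [0.1, 0.11, 0.2, 0.3]
a = xr.DataArray([0, 1, 2, 3], dims=["x"], coords={"x": x})

# Derivative along "x" using the coordinate values as positions.
deriv = a.differentiate("x")

# numpy's finite-difference gradient at the same sample positions.
reference = np.gradient(a.values, np.asarray(x))

print(deriv.values)
print(reference)
```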

# Coordinates

Although index coordinates are aligned, other coordinates are not, and if their values conflict, they will be dropped. This is necessary, for example, because indexing turns 1D coordinates into scalar coordinates:

In [34]:
arr

<xarray.DataArray (x: 2, y: 3)>
array([[ 1.764052,  0.400157,  0.978738],
       [ 2.240893,  1.867558, -0.977278]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 10 20 30

In [30]:
arr[0]

<xarray.DataArray (y: 3)>
array([1.764052, 0.400157, 0.978738])
Coordinates:
    x        <U1 'a'
  * y        (y) int64 10 20 30

In [31]:
arr[1]

<xarray.DataArray (y: 3)>
array([ 2.240893,  1.867558, -0.977278])
Coordinates:
    x        <U1 'b'
  * y        (y) int64 10 20 30

In [32]:
arr[1] - arr[0]

<xarray.DataArray (y: 3)>
array([ 0.476841,  1.467401, -1.956016])
Coordinates:
  * y        (y) int64 10 20 30

In [33]:
arr[0] + 1

<xarray.DataArray (y: 3)>
array([2.764052, 1.400157, 1.978738])
Coordinates:
    x        <U1 'a'
  * y        (y) int64 10 20 30

In [35]:
arr[0] - arr[0]

<xarray.DataArray (y: 3)>
array([0., 0., 0.])
Coordinates:
    x        <U1 'a'
  * y        (y) int64 10 20 30
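
The same behavior can be checked programmatically by looking at which coordinates survive each operation. This sketch rebuilds a small array with the same labels as above; the data values are arbitrary.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    coords=[("x", ["a", "b"]), ("y", [10, 20, 30])],
)

# The scalar coordinate "x" is 'b' on one side and 'a' on the other,
# so the conflicting coordinate is dropped from the result.
diff = arr[1] - arr[0]

# Both operands carry x == 'a', so the coordinate is kept.
same = arr[0] - arr[0]

print("x" in diff.coords, "x" in same.coords)
```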

# Wrapping custom computation

In [40]:
squared_error = lambda x, y: (x - y) ** 2

In [41]:
arr1 = xr.DataArray([0, 1, 2, 3], dims='x')
arr1

<xarray.DataArray (x: 4)>
array([0, 1, 2, 3])
Dimensions without coordinates: x

In [42]:
xr.apply_ufunc(squared_error, arr1, 1)

<xarray.DataArray (x: 4)>
array([1, 0, 1, 4])
Dimensions without coordinates: x
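
For functions that consume a whole dimension rather than working element by element, the `input_core_dims` argument listed in the signature below names the dimensions the function operates over. In the sketch here, `mean_last_axis` is a hypothetical helper written for this example; it relies on apply_ufunc moving core dimensions to the last axis before calling the function.

```python
import numpy as np
import xarray as xr

def mean_last_axis(values):
    # Receives a plain numpy array with the core dimension last.
    return values.mean(axis=-1)

arr2d = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "y"))

# "y" is the core dimension: it is handed to the function as the
# last axis and is absent from the output.
row_means = xr.apply_ufunc(mean_last_axis, arr2d, input_core_dims=[["y"]])
print(row_means)
```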

In [44]:
help(xr.apply_ufunc)

Help on function apply_ufunc in module xarray.core.computation:

apply_ufunc(func, *args, **kwargs)
    apply_ufunc(func : Callable,
                   *args : Any,
                   input_core_dims : Optional[Sequence[Sequence]] = None,
                   output_core_dims : Optional[Sequence[Sequence]] = ((),),
                   exclude_dims : Collection = frozenset(),
                   vectorize : bool = False,
                   join : str = 'exact',
                   dataset_join : str = 'exact',
                   dataset_fill_value : Any = _NO_FILL_VALUE,
                   keep_attrs : bool = False,
                   kwargs : Mapping = None,
                   dask : str = 'forbidden',
                   output_dtypes : Optional[Sequence] = None,
                   output_sizes : Optional[Mapping[Any, int]] = None)
    
    Apply a vectorized function for unlabeled arrays on xarray objects.
    
    The function will be mapped over the data variable(s) of the input
    argu