# xarray cheatsheet

To open a netCDF file as a Dataset through xarray (xr.open_dataarray does the same for a file that holds a single data variable):

In [1]:
import xarray as xr

In [6]:
ds = xr.open_dataset("../../../topo.nc")

### Viewing variables within a dataset

In [14]:
ds.variables

Frozen(OrderedDict([('y', <xarray.IndexVariable 'y' (y: 17002)>
array([4230327., 4230324., 4230321., ..., 4179330., 4179327., 4179324.],
      dtype=float32)
Attributes:
    description:    UTM, north south
    long_name:      y coordinate
    units:          meters
    standard_name:  projection_y_coordinate), ('x', <xarray.IndexVariable 'x' (x: 17569)>
array([254007., 254010., 254013., ..., 306705., 306708., 306711.],
      dtype=float32)
Attributes:
    description:    UTM, east west
    long_name:      x coordinate
    units:          meters
    standard_name:  projection_x_coordinate), ('veg_height', <xarray.Variable (y: 17002, x: 17569)>
[298708138 values with dtype=float32]
Attributes:
    long_name:     vegetation height
    grid_mapping:  projection), ('veg_tau', <xarray.Variable (y: 17002, x: 17569)>
[298708138 values with dtype=float32]
Attributes:
    long_name:     vegetation tau
    grid_mapping:  projection), ('veg_k', <xarray.Variable (y: 17002, x: 17569)>
[298708138 va
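The full `ds.variables` dump mixes coordinates and data variables and gets long quickly. A shorter way to see only the names is sketched below. The tiny dataset is a made-up stand-in for topo.nc (the names `dem` and `veg_height` are borrowed from the output above, the values are invented).

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for topo.nc: two small variables on a shared (y, x) grid.
ds = xr.Dataset(
    {
        "dem": (("y", "x"), np.zeros((3, 4), dtype="float32")),
        "veg_height": (("y", "x"), np.ones((3, 4), dtype="float32")),
    },
    coords={"y": [30.0, 27.0, 24.0], "x": [0.0, 3.0, 6.0, 9.0]},
)

all_names = list(ds.variables)   # coordinates and data variables together
data_names = list(ds.data_vars)  # data variables only
coord_names = list(ds.coords)    # coordinates only

print(data_names)
print(coord_names)
```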

### Accessing individual data variables

In [15]:
ds.dem

<xarray.DataArray 'dem' (y: 17002, x: 17569)>
[298708138 values with dtype=float32]
Coordinates:
  * y        (y) float32 4230327.0 4230324.0 4230321.0 ... 4179327.0 4179324.0
  * x        (x) float32 254007.0 254010.0 254013.0 ... 306708.0 306711.0
Attributes:
    long_name:     dem
    grid_mapping:  projection

### Accessing just the values of a data variable

In [16]:
ds.dem.values

array([[2633.369 , 2635.3313, 2635.3313, ..., 2137.585 , 2137.585 ,
        2137.6157],
       [2633.369 , 2635.3313, 2635.3313, ..., 2137.585 , 2137.585 ,
        2137.6157],
       [2634.1025, 2637.2336, 2637.2336, ..., 2139.121 , 2139.121 ,
        2138.9734],
       ...,
       [1585.9766, 1580.0369, 1580.0369, ..., 3063.1907, 3063.1907,
        3062.4495],
       [1585.9766, 1580.0369, 1580.0369, ..., 3063.1907, 3063.1907,
        3062.4495],
       [1587.6707, 1582.0449, 1582.0449, ..., 3064.8484, 3064.8484,
        3064.3577]], dtype=float32)
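As a small illustration of what `.values` hands back, the sketch below uses an invented 2 x 3 array in place of the real DEM. The result is a plain NumPy array, so the coordinates and attributes are no longer attached.

```python
import numpy as np
import xarray as xr

# Made-up stand-in for ds.dem; shape and values are illustrative only.
dem = xr.DataArray(
    np.arange(6, dtype="float32").reshape(2, 3),
    dims=("y", "x"),
    coords={"y": [30.0, 27.0], "x": [0.0, 3.0, 6.0]},
    name="dem",
)

raw = dem.values  # plain numpy.ndarray: no coordinates, no attrs
print(type(raw), raw.shape, raw.dtype)
```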

### Slicing the data to select a subset of the whole dataset

In [24]:
a = ds.isel(x=0, y=0)  # selects by integer position along each dimension, across every data variable; it does not pick out a single variable

In [47]:
dem_var = ds.dem  # selects an individual data variable (a DataArray) from the dataset
dem_var

<xarray.DataArray 'dem' (y: 17002, x: 17569)>
array([[2633.369 , 2635.3313, 2635.3313, ..., 2137.585 , 2137.585 , 2137.6157],
       [2633.369 , 2635.3313, 2635.3313, ..., 2137.585 , 2137.585 , 2137.6157],
       [2634.1025, 2637.2336, 2637.2336, ..., 2139.121 , 2139.121 , 2138.9734],
       ...,
       [1585.9766, 1580.0369, 1580.0369, ..., 3063.1907, 3063.1907, 3062.4495],
       [1585.9766, 1580.0369, 1580.0369, ..., 3063.1907, 3063.1907, 3062.4495],
       [1587.6707, 1582.0449, 1582.0449, ..., 3064.8484, 3064.8484, 3064.3577]],
      dtype=float32)
Coordinates:
  * y        (y) float32 4230327.0 4230324.0 4230321.0 ... 4179327.0 4179324.0
  * x        (x) float32 254007.0 254010.0 254013.0 ... 306708.0 306711.0
Attributes:
    long_name:     dem
    grid_mapping:  projection

**To get individual values, slice by dimension first, then select the variable you need and read its values.**
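The "slice first, then pick the variable" order can be sketched as follows. The dataset here is a small invented grid standing in for topo.nc, so the numbers are not real elevations.

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 4 grid; only the variable name "dem" comes from the notebook.
ds = xr.Dataset(
    {"dem": (("y", "x"), np.arange(12, dtype="float32").reshape(3, 4))},
    coords={"y": [30.0, 27.0, 24.0], "x": [0.0, 3.0, 6.0, 9.0]},
)

# 1) slice the whole dataset by position, 2) pick the variable, 3) read the values
corner = ds.isel(x=0, y=0)      # every data variable reduced to one grid cell
corner_dem = corner.dem.values  # 0-dimensional numpy array

window = ds.isel(x=slice(0, 2), y=slice(0, 2)).dem.values  # 2 x 2 numpy array
print(corner_dem, window.shape)
```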

### Accessing coordinates and their values

In [44]:
ds.coords

Coordinates:
  * y        (y) float32 4230327.0 4230324.0 4230321.0 ... 4179327.0 4179324.0
  * x        (x) float32 254007.0 254010.0 254013.0 ... 306708.0 306711.0
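To pull out the numbers behind a single coordinate, something like the sketch below should work. The coordinates are invented, and the cell-size calculation assumes a regularly spaced grid.

```python
import numpy as np
import xarray as xr

# Made-up coordinates with 3-unit spacing, y descending as in the output above.
ds = xr.Dataset(coords={"y": [30.0, 27.0, 24.0], "x": [0.0, 3.0, 6.0, 9.0]})

x_values = ds.coords["x"].values  # numpy array of x coordinate values
y_values = ds.y.values            # attribute-style access reaches the same data

# Assumes regular spacing: the first difference is taken as the cell size.
cell_size = float(np.diff(x_values)[0])
print(x_values, cell_size)
```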

### Accessing metadata

In [46]:
ds.attrs

OrderedDict([('last_modified', '[2019-08-08 17:17:16] Data added or updated'),
             ('Conventions', 'CF-1.6'),
             ('dateCreated', '2019-08-08 17:17:34'),
             ('Title', 'Topographic Images for SMRF/AWSM'),
             ('history',
              '[2019-08-08 17:17:34] Create netCDF4 file using Basin Setup v0.8.2'),
             ('institution',
              'USDA Agricultural Research Service, Northwest Watershed Research Center'),
             ('generation_command',
              '/usr/local/bin/basin_setup -f corrected_tuolumne_subbasin.shp -bn Tuolumne -dm tuolumne_UTM11_WGS84.tif -apd 0 0 0 220 -d /Downloads --cell_size 3')])
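`attrs` behaves like a dictionary, both on the dataset and on each variable. A minimal sketch, using a toy dataset that copies two of the attribute values shown above:

```python
import numpy as np
import xarray as xr

# Toy dataset; the attribute values are copied from the output above.
ds = xr.Dataset(
    {"dem": (("y", "x"), np.zeros((2, 2), dtype="float32"), {"long_name": "dem"})},
    attrs={"Conventions": "CF-1.6", "Title": "Topographic Images for SMRF/AWSM"},
)

title = ds.attrs["Title"]                             # dataset-level metadata
institution = ds.attrs.get("institution", "unknown")  # safe lookup with a default
long_name = ds.dem.attrs["long_name"]                 # variable-level metadata
print(title, institution, long_name)
```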

### Slicing data variables across dimensions

In [49]:
dem_var.isel(x=slice(0,500), y=slice(0,500))

<xarray.DataArray 'dem' (y: 500, x: 500)>
array([[2633.369 , 2635.3313, 2635.3313, ..., 2591.5686, 2591.5686, 2591.5686],
       [2633.369 , 2635.3313, 2635.3313, ..., 2591.5686, 2591.5686, 2591.5686],
       [2634.1025, 2637.2336, 2637.2336, ..., 2591.5337, 2591.5337, 2591.5337],
       ...,
       [2724.1108, 2723.7507, 2723.7507, ..., 2525.5593, 2525.5593, 2525.5593],
       [2724.1108, 2723.7507, 2723.7507, ..., 2525.5593, 2525.5593, 2525.5593],
       [2723.1982, 2722.8303, 2722.8303, ..., 2529.1357, 2529.1357, 2529.1357]],
      dtype=float32)
Coordinates:
  * y        (y) float32 4230327.0 4230324.0 4230321.0 ... 4228833.0 4228830.0
  * x        (x) float32 254007.0 254010.0 254013.0 ... 255501.0 255504.0
Attributes:
    long_name:     dem
    grid_mapping:  projection

* To slice by coordinate label rather than by integer position, use .sel instead of .isel

In [69]:
dem_var[:500, :500]

<xarray.DataArray 'dem' (y: 500, x: 500)>
array([[2633.369 , 2635.3313, 2635.3313, ..., 2591.5686, 2591.5686, 2591.5686],
       [2633.369 , 2635.3313, 2635.3313, ..., 2591.5686, 2591.5686, 2591.5686],
       [2634.1025, 2637.2336, 2637.2336, ..., 2591.5337, 2591.5337, 2591.5337],
       ...,
       [2724.1108, 2723.7507, 2723.7507, ..., 2525.5593, 2525.5593, 2525.5593],
       [2724.1108, 2723.7507, 2723.7507, ..., 2525.5593, 2525.5593, 2525.5593],
       [2723.1982, 2722.8303, 2722.8303, ..., 2529.1357, 2529.1357, 2529.1357]],
      dtype=float32)
Coordinates:
  * y        (y) float32 4230327.0 4230324.0 4230321.0 ... 4228833.0 4228830.0
  * x        (x) float32 254007.0 254010.0 254013.0 ... 255501.0 255504.0
Attributes:
    long_name:     dem
    grid_mapping:  projection

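* Note that dem_var.isel(x=slice(0,500), y=slice(0,500)) and dem_var[:500, :500] return the same result; positional indexing follows the array's dimension order, (y, x)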
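Label-based slicing with `.sel` is mentioned above but not shown, so here is a rough sketch on an invented grid. Two things differ from `.isel`: label slices include both endpoints, and because y is stored in descending order (as in topo.nc) the larger label goes first.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for dem_var, with y descending like the real grid.
dem = xr.DataArray(
    np.arange(12, dtype="float32").reshape(3, 4),
    dims=("y", "x"),
    coords={"y": [30.0, 27.0, 24.0], "x": [0.0, 3.0, 6.0, 9.0]},
    name="dem",
)

by_position = dem.isel(y=slice(0, 2), x=slice(0, 2))  # end index excluded
by_brackets = dem[:2, :2]                             # same thing, (y, x) order
by_label = dem.sel(y=slice(30.0, 27.0), x=slice(0.0, 3.0))  # both labels included

# method="nearest" snaps to the closest coordinate label (x=4.0 falls back to 3.0).
nearest = dem.sel(x=4.0, y=27.0, method="nearest")
print(by_label.shape, float(nearest))
```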

### Reduction functions

Mean:

In [58]:
mean_x = dem_var.isel(x=slice(0,500), y=slice(0,500)).mean(dim='x')
mean_x

<xarray.DataArray 'dem' (y: 500)>
array([2669.386 , 2669.386 , 2669.398 , ..., 2569.0754, 2569.0754, 2568.481 ],
      dtype=float32)
Coordinates:
  * y        (y) float32 4230327.0 4230324.0 4230321.0 ... 4228833.0 4228830.0

In [59]:
mean_y = dem_var.isel(x=slice(0,500), y=slice(0,500)).mean(dim='y')
mean_y

<xarray.DataArray 'dem' (x: 500)>
array([2672.5251, 2674.039 , 2674.039 , ..., 2512.8276, 2512.8276, 2512.8276],
      dtype=float32)
Coordinates:
  * x        (x) float32 254007.0 254010.0 254013.0 ... 255501.0 255504.0

**Note: the dim argument names the dimension to reduce over; that dimension is dropped from the output.**

**Calling mean() or std() with no dim argument reduces over every dimension of the array and returns a single value.**

In [78]:
mu = dem_var.isel(x=slice(0,500), y=slice(0,500)).mean()
mu

<xarray.DataArray 'dem' ()>
array(2631.3975, dtype=float32)

Standard deviation:

In [61]:
std_x = dem_var.isel(x=slice(0,500), y=slice(0,500)).std(dim='x')
std_x

<xarray.DataArray 'dem' (y: 500)>
array([ 28.342205,  28.342205,  28.163368, ..., 112.54608 , 112.54608 ,
       112.47522 ], dtype=float32)
Coordinates:
  * y        (y) float32 4230327.0 4230324.0 4230321.0 ... 4228833.0 4228830.0

In [62]:
std_y = dem_var.isel(x=slice(0,500), y=slice(0,500)).std(dim='y')
std_y

<xarray.DataArray 'dem' (x: 500)>
array([41.53691, 40.75824, 40.75824, ..., 34.67409, 34.67409, 34.67409],
      dtype=float32)
Coordinates:
  * x        (x) float32 254007.0 254010.0 254013.0 ... 255501.0 255504.0

In [80]:
sigma = dem_var.isel(x=slice(0,500), y=slice(0,500)).std()
sigma

<xarray.DataArray 'dem' ()>
array(87.72577, dtype=float32)
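The behaviour of `dim` is easier to check on an array small enough to work out by hand. The sketch below uses made-up values. The `ddof=1` call is an extra not used in the cells above; it switches from the default population standard deviation to the sample one.

```python
import numpy as np
import xarray as xr

# Invented 3 x 4 array: rows are [0..3], [4..7], [8..11].
dem = xr.DataArray(
    np.arange(12, dtype="float32").reshape(3, 4),
    dims=("y", "x"),
    coords={"y": [30.0, 27.0, 24.0], "x": [0.0, 3.0, 6.0, 9.0]},
    name="dem",
)

mean_x = dem.mean(dim="x")    # x is reduced away, one value per y
mean_y = dem.mean(dim="y")    # y is reduced away, one value per x
mean_all = dem.mean()         # no dim: reduces over both, 0-d result
std_all = dem.std()           # population standard deviation (ddof=0)
std_sample = dem.std(ddof=1)  # sample standard deviation

print(mean_x.dims, mean_y.dims, mean_all.dims)
```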

### Operations on datasets

In [81]:
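# Note: dem_var[:500 : 500] is a single slice along y with step 500, so it selects only the first row;
# dem_var[:500, :500] (with a comma) would select the 500 x 500 corner instead.
# A lambda defines a small anonymous function inline; it is handy for simple element-wise arithmetic.
xr.apply_ufunc(lambda a, b: a * b, dem_var[:500 : 500], 2)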

<xarray.DataArray 'dem' (y: 1, x: 17569)>
array([[5266.738 , 5270.6626, 5270.6626, ..., 4275.17  , 4275.17  , 4275.2314]],
      dtype=float32)
Coordinates:
  * y        (y) float32 4230327.0
  * x        (x) float32 254007.0 254010.0 254013.0 ... 306708.0 306711.0

Code syntax:

xr.apply_ufunc(lambda arg1, arg2, ..., argN: expression, input1, input2, ..., inputN)

The inputs are passed to the lambda in order: input1 becomes arg1, input2 becomes arg2, and so on.
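To make the syntax concrete, the sketch below applies the same multiplication three ways on an invented array: with a lambda, with a named function (`scale` is a hypothetical helper), and with plain arithmetic. For something this simple, plain arithmetic on the DataArray is usually enough.

```python
import numpy as np
import xarray as xr

# Made-up stand-in for dem_var.
dem = xr.DataArray(
    np.arange(12, dtype="float32").reshape(3, 4),
    dims=("y", "x"),
    coords={"y": [30.0, 27.0, 24.0], "x": [0.0, 3.0, 6.0, 9.0]},
    name="dem",
)

def scale(a, b):
    """Named equivalent of `lambda a, b: a * b` (hypothetical helper)."""
    return a * b

via_lambda = xr.apply_ufunc(lambda a, b: a * b, dem, 2)  # dem -> a, 2 -> b
via_def = xr.apply_ufunc(scale, dem, 2)
direct = dem * 2  # ordinary arithmetic also keeps dims and coordinates

print(via_lambda.dims, float(via_lambda[0, 1]))
```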


In [88]:
import numpy.random as rd
xr.apply_ufunc(lambda a, b: a + b, dem_var[:500 : 500], rd.rand(300)) # raises a ValueError: a (300,) array cannot be broadcast against the (1, 17569) slice

ValueError: operands could not be broadcast together with shapes (1,17569) (300,) 
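The error comes from NumPy broadcasting: the trailing dimension of the plain array has to match the trailing dimension of the slice (or be 1). A small sketch with invented sizes, showing one array that fails and one that works:

```python
import numpy as np
import xarray as xr

# Invented stand-in: a single row with 4 columns, i.e. shape (1, 4).
dem = xr.DataArray(
    np.arange(12, dtype="float32").reshape(3, 4),
    dims=("y", "x"),
    coords={"y": [30.0, 27.0, 24.0], "x": [0.0, 3.0, 6.0, 9.0]},
    name="dem",
)
row = dem[:1]
rng = np.random.default_rng(0)

# Length 3 does not match the 4 columns, so broadcasting fails.
try:
    xr.apply_ufunc(lambda a, b: a + b, row, rng.random(3))
    failed = False
except ValueError:
    failed = True

# Length 4 matches the x dimension, so the array is added across the row.
result = xr.apply_ufunc(lambda a, b: a + b, row, rng.random(4))
print(failed, result.shape)
```

Wrapping the second operand in a DataArray with a named dimension is another option, since xarray then aligns by dimension name rather than by position.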

### Grouping across a common variable

If you have time series data and want to group values that share the same month, day, etc. across years, use the .groupby() method to split the array into groups by the specified variable.

Combining groupby() with reduction functions is a great way to generate statistics over a time series:

**A quick check that this worked: 'months' now has only 12 values (it started with 240).**
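A minimal sketch of the groupby-then-reduce pattern on synthetic data: 240 invented monthly values (20 years), not the data used in this notebook. Grouping on `"time.month"` produces a new `month` dimension, which is where the 240 to 12 reduction shows up.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic series: 240 month-start timestamps with values 0..239 (made up).
time = pd.date_range("2000-01-01", periods=240, freq="MS")
series = xr.DataArray(
    np.arange(240, dtype="float64"),
    dims="time",
    coords={"time": time},
    name="synthetic",
)

# Group every January together, every February together, etc., then average.
monthly_mean = series.groupby("time.month").mean("time")

print(series.sizes["time"], "->", monthly_mean.sizes["month"])
```

Swapping `.mean("time")` for `.std("time")`, `.max("time")`, or similar gives other per-month statistics from the same grouping.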