# `xarray`

`xarray`:

- Python package
- augments NumPy arrays by adding labeled dimensions, coordinates and attributes
- based on the NetCDF data model

Today: learn about `xarray.DataArray` and `xarray.Dataset`

## `xarray.DataArray`

- primary data structure of the xarray package

## Create an `xarray.DataArray`


In [1]:
import os              
import pandas as pd
import numpy as np

import xarray as xr   # This is the package we'll explore

**Variable Values**

The underlying data in the `xarray.DataArray` is a `numpy.ndarray` that holds the variable values

So we can start by making a `numpy.ndarray` with our mock temperature data

In [6]:
# Mock values of a single variable (temp): one 5x5 grid per date,
# filled with 0, 1 and 2 respectively
temp_data = np.array([np.zeros((5, 5)),
                      np.ones((5, 5)),
                      np.ones((5, 5)) * 2]).astype(int)

temp_data

array([[[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2]]])

**Dimensions and Coordinates**

To specify the dimensions of our upcoming `xarray.DataArray`, we must examine how we’ve constructed the `numpy.ndarray` holding the temperature data

We have that:

- 1st dimension: date coordinates are 2022-09-01, 2022-09-02, 2022-09-03
- 2nd dimension: latitude coordinates are 70, 60, 50, 40, 30 (notice decreasing order)
- 3rd dimension: longitude coordinates are 60, 70, 80, 90, 100 (notice increasing order)
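A quick way to confirm this axis order is to look at the array's shape. The sketch below rebuilds the mock array so it runs on its own; the names `n_time`, `n_lat`, `n_lon` and `first_day` are only illustrative.

```python
import numpy as np

# Same mock temperature data as above: three 5x5 grids stacked along axis 0
temp_data = np.array([np.zeros((5, 5)),
                      np.ones((5, 5)),
                      np.ones((5, 5)) * 2]).astype(int)

# Axis 0 has 3 entries (dates); axes 1 and 2 have 5 entries each (lat, lon)
n_time, n_lat, n_lon = temp_data.shape
print(temp_data.shape)    # (3, 5, 5)

# Indexing the first axis returns the whole lat/lon grid for one date
first_day = temp_data[0]
print(first_day.shape)    # (5, 5)
```

Since the first axis has length 3 and the other two have length 5, the coordinates we attach must have 3 dates, 5 latitudes and 5 longitudes, in that order.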

Add dims and coords:

In [11]:
# names of dimensions in the required order
dims = ('time', 'lat', 'lon')

# create coordinates along each dimension (dictionary)
coords = {'time': pd.date_range('2022-09-01', '2022-09-03'),
          'lat': np.arange(70, 20, -10),
          'lon': np.arange(60, 110, 10)}

**Attributes**

In [8]:
#add the attributes (metadata) as a dictionary
attrs = {'title' : 'temperature across weather stations',
         'standard_name' : 'air_temperature',
         'units' : 'degrees_c'}

In [12]:
#initialize xarray.DataArray
temp = xr.DataArray(data = temp_data,
                   dims = dims,
                   coords = coords,
                   attrs = attrs)
temp
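Once the `xarray.DataArray` exists, each of the pieces we passed in can be read back through an attribute of the same name. The sketch below rebuilds the array so it is self-contained, assuming the longitude dimension is named `lon`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the mock DataArray (longitude dimension assumed to be named 'lon')
temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)),
                   np.ones((5, 5)),
                   np.ones((5, 5)) * 2]).astype(int),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range('2022-09-01', '2022-09-03'),
            'lat': np.arange(70, 20, -10),
            'lon': np.arange(60, 110, 10)},
    attrs={'title': 'temperature across weather stations',
           'standard_name': 'air_temperature',
           'units': 'degrees_c'},
)

print(type(temp.values))          # the underlying numpy.ndarray
print(temp.dims)                  # ('time', 'lat', 'lon')
print(temp.shape)                 # (3, 5, 5)
print(temp.coords['lat'].values)  # [70 60 50 40 30]
print(temp.attrs['units'])        # degrees_c
```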

## Subsetting

To select data from an `xarray.DataArray` we need to specify the subsets we want along each dimension.
We can do this in two ways:

- by relying on each dimension's position (**dimension lookup by position**)
- by calling each dimension by its name (**dimension lookup by name**)
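As a minimal sketch of both approaches, the example below picks the same value three ways: with positional indexing, with `isel()` (dimension names plus integer indices), and with `sel()` (dimension names plus coordinate labels). It rebuilds the mock array so it runs on its own and assumes the longitude dimension is named `lon`; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the mock DataArray (longitude dimension assumed to be named 'lon')
temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)),
                   np.ones((5, 5)),
                   np.ones((5, 5)) * 2]).astype(int),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range('2022-09-01', '2022-09-03'),
            'lat': np.arange(70, 20, -10),
            'lon': np.arange(60, 110, 10)},
)

# Dimension lookup by position: the order of the indices must match dims
by_position = temp[2, 1, 2]

# Dimension lookup by name, using integer indices along each dimension
by_name_index = temp.isel(time=2, lat=1, lon=2)

# Dimension lookup by name, using coordinate labels along each dimension
by_name_label = temp.sel(time=pd.Timestamp('2022-09-03'), lat=60, lon=80)

print(by_position.item(), by_name_index.item(), by_name_label.item())  # 2 2 2
```

Lookup by name does not depend on remembering the order of the dimensions, which tends to make the code easier to read.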