# `xarray` 

`xarray` is a Python package that augments NumPy arrays with labeled dimensions, coordinates, and attributes. It is based on the NetCDF data model.


Today: learn `xarray.DataArray` and `xarray.Dataset`

## `xarray.DataArray`
- the primary object of `xarray`
- an n-dimensional array with **labeled dimensions**
- represents a single variable in the NetCDF data format: it holds the variable's values, dimensions, and attributes

In `xarray`, each dimension can have a set of **coordinates**.
A dimension's coordinates are the values along that dimension (think of them as the tick labels along an axis).
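To make coordinates concrete, here is a minimal sketch with a toy one-dimensional `DataArray`; the name `toy` and its values are hypothetical, not part of the lesson's data. It shows that the same element can be reached either by its position or by its coordinate label.

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D DataArray: one dimension named "lat" with coordinate labels
toy = xr.DataArray(
    data=np.array([10.0, 20.0, 30.0]),
    dims=("lat",),
    coords={"lat": [60, 50, 40]},
)

print(toy.dims)               # ('lat',)
print(toy["lat"].values)      # [60 50 40]

# Position 2 and coordinate label 40 point to the same element
print(toy[2].item())          # 30.0
print(toy.sel(lat=40).item()) # 30.0
```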

## Make an `xarray.DataArray`

We will build a `DataArray` for the mock temperature data in our example.

```python
import pandas as pd
import numpy as np
import xarray as xr
```

**Variable values**

The data underlying an `xr.DataArray` is a NumPy array (`np.ndarray`) that holds the variable's values.
We start by making a NumPy array of our mock temperature data.

```python
# values of a single variable (temp) at each point of the coordinates
temp_data = np.array([np.zeros((5, 5)),
                      np.ones((5, 5)),
                      np.ones((5, 5)) * 2]).astype(int)

temp_data
```

```
array([[[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2]]])
```
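As a quick check of how `temp_data` is laid out, the sketch below rebuilds it and compares it with an alternative built by NumPy broadcasting; `layer_values` and `alt_data` are illustrative names, not part of the original. The shape `(3, 5, 5)` is what the dimensions of the `DataArray` must match: 3 layers along the first axis, then a 5x5 grid.

```python
import numpy as np

# Original construction: stack three 5x5 layers (all 0s, all 1s, all 2s)
temp_data = np.array([np.zeros((5, 5)),
                      np.ones((5, 5)),
                      np.ones((5, 5)) * 2]).astype(int)

# Alternative via broadcasting: a (3, 1, 1) column of layer values
# times a (5, 5) grid of ones gives a (3, 5, 5) array
layer_values = np.arange(3).reshape(3, 1, 1)
alt_data = layer_values * np.ones((5, 5), dtype=int)

print(temp_data.shape)                      # (3, 5, 5)
print(np.array_equal(temp_data, alt_data))  # True
```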

**Dimensions and coordinates**

To specify the dimensions of our `xr.DataArray`, let's think about how we constructed the `np.array` which holds the data.

We have that:
- 1st dimension: time, coords: 2022-09-01, 2022-09-02, 2022-09-03
- 2nd dimension: latitude, coords: from 70 to 30, decreasing by 10
- 3rd dimension: longitude, coords: from 60 to 100, increasing by 10

Add dims and coords:

```python
# names of dimensions in the required order
dims = ('time', 'lat', 'lon')

# make coordinates along each dimension
coords = {'time': pd.date_range('2022-09-01', '2022-09-03'),
          'lat': np.arange(70, 20, -10),
          'lon': np.arange(60, 110, 10)}
```
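One detail worth checking: `np.arange` excludes its stop value, which is why the stops are 20 and 110 rather than 30 and 100. This sketch repeats the same `dims` and `coords` and confirms that each coordinate array has the length of the matching axis of `temp_data`, in the same order.

```python
import numpy as np
import pandas as pd

dims = ("time", "lat", "lon")
coords = {"time": pd.date_range("2022-09-01", "2022-09-03"),
          "lat": np.arange(70, 20, -10),
          "lon": np.arange(60, 110, 10)}

# Length of each coordinate, in dimension order; should equal temp_data.shape
lengths = tuple(len(coords[name]) for name in dims)
print(lengths)         # (3, 5, 5)
print(coords["lat"])   # [70 60 50 40 30]
print(coords["lon"])   # [ 60  70  80  90 100]
```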

**Attributes**

```python
# add the attributes (metadata) as a dictionary
attrs = {'title': 'temp across weather stations',
         'standard_name': 'air_temperature',
         'units': 'degree_c'}
```

**Combining items**

```python
# initialize the xarray.DataArray
temp = xr.DataArray(data=temp_data,
                    dims=dims,
                    coords=coords,
                    attrs=attrs)
temp
```
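Once built, each ingredient can be read back from the object. This self-contained sketch rebuilds `temp` with the same data, dimensions, coordinates, and attributes, then inspects it through the `DataArray` properties `dims`, `shape`, `sizes`, `attrs`, and `values`.

```python
import numpy as np
import pandas as pd
import xarray as xr

temp_data = np.array([np.zeros((5, 5)),
                      np.ones((5, 5)),
                      np.ones((5, 5)) * 2]).astype(int)

temp = xr.DataArray(
    data=temp_data,
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2022-09-01", "2022-09-03"),
            "lat": np.arange(70, 20, -10),
            "lon": np.arange(60, 110, 10)},
    attrs={"title": "temp across weather stations",
           "standard_name": "air_temperature",
           "units": "degree_c"},
)

print(temp.dims)            # ('time', 'lat', 'lon')
print(temp.shape)           # (3, 5, 5)
print(temp.sizes["lat"])    # 5
print(temp.attrs["units"])  # degree_c
print(type(temp.values))    # <class 'numpy.ndarray'>
```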

## Subsetting data 

To select data from an `xarray.DataArray`, we specify the subset we want along each dimension.
We can do this in two ways:
- by relying on the dimensions' positions (**dimension lookup by position**)
- by referring to each dimension by its name (**dimension lookup by name**)

Within each approach, we can index either with integers (positions along the dimension) or with coordinate labels.
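The difference between the two lookups is easiest to see on a tiny array. In this sketch the array `da` and its dimension names `row` and `col` are hypothetical: positional lookup must follow the order of `da.dims`, while `isel` takes dimension names as keyword arguments in any order, and both reach the same element.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 array: [[0, 1, 2], [3, 4, 5]]
da = xr.DataArray(np.arange(6).reshape(2, 3), dims=("row", "col"))

# Lookup by position: indices must follow the order of da.dims
by_position = da[1, 2]

# Lookup by name: keyword order does not matter
by_name = da.isel(col=2, row=1)

print(by_position.item(), by_name.item())  # 5 5
```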

**Example**

We want the temperature recorded by the weather station located at 40°N, 80°E on September 1, 2022.

```python
# access dimensions by position, then use integers for indexing
temp[0, 3, 2]
```

```python
# access dimensions by position, then use labels for indexing (the .loc indexer)
temp.loc['2022-09-01', 40, 80]
```

```python
# access dimensions by name, then use integers for indexing
temp.isel(time=0, lon=2, lat=3)
```

```python
# access dimensions by name, then use labels for indexing
temp.sel(time='2022-09-01', lat=40, lon=80)
```
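Two follow-ups to label-based selection, sketched on an array with the same layout as `temp`: because `lat` is stored in decreasing order, a `slice` passed to `sel` has to run from high to low, and `method="nearest"` picks the closest coordinate when the requested label is not on the grid. The labels 42 and 81 are illustrative values, not stations from the original example.

```python
import numpy as np
import pandas as pd
import xarray as xr

temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)),
                   np.ones((5, 5)),
                   np.ones((5, 5)) * 2]).astype(int),
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2022-09-01", "2022-09-03"),
            "lat": np.arange(70, 20, -10),
            "lon": np.arange(60, 110, 10)},
)

# lat runs 70 -> 30, so the slice must also run from high to low
north = temp.sel(lat=slice(60, 40))
print(north["lat"].values)                       # [60 50 40]

# A low-to-high slice on a decreasing coordinate selects nothing
print(temp.sel(lat=slice(40, 60)).sizes["lat"])  # 0

# Snap off-grid labels to the closest coordinates
near = temp.sel(lat=42, lon=81, method="nearest")
print(near["lat"].item(), near["lon"].item())    # 40 80
print(near.isel(time=1).item())                  # 1
```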