# `xarray`

`xarray`:

- python package
- augments NumPy arrays by adding labeled dimensions, coordinates and attributes
- based on the NetCDF data model

Today: learn about `xarray.DataArray` and `xarray.Dataset`

## `xarray.DataArray`

- primary data structure of the `xarray` package: an N-dimensional array with named dimensions, coordinate labels and attributes

## Create an `xarray.DataArray`

```python
import numpy as np
import pandas as pd

import xarray as xr   # This is the package we'll explore
```

**Variable Values**

The underlying data in the `xarray.DataArray` is a `numpy.ndarray` that holds the variable values

So we can start by making a `numpy.ndarray` with our mock temperature data

```python
# Values of a single variable (temp) at each point of the coords:
# one 5x5 grid per date
temp_data = np.array([np.zeros((5, 5)),
                      np.ones((5, 5)),
                      np.ones((5, 5)) * 2]).astype(int)

temp_data
```

```
array([[[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2]]])
```

**Dimensions and Coordinates**

To specify the dimensions of our upcoming `xarray.DataArray`, we must examine how we’ve constructed the `numpy.ndarray` holding the temperature data

We have that:

- 1st dimension: date coordinates are 2022-09-01, 2022-09-02, 2022-09-03
- 2nd dimension: latitude coordinates are 70, 60, 50, 40, 30 (notice decreasing order)
- 3rd dimension: longitude coordinates are 60, 70, 80, 90, 100 (notice increasing order)

Add dims and coords:

```python
# names of dimensions in the required order
dims = ('time', 'lat', 'lon')

# create coordinates along each dimension (dictionary)
coords = {'time': pd.date_range('2022-09-01', '2022-09-03'),
          'lat': np.arange(70, 20, -10),    # 70, 60, 50, 40, 30
          'lon': np.arange(60, 110, 10)}    # 60, 70, 80, 90, 100
```

**Attributes**

```python
# add the attributes (metadata) as a dictionary
attrs = {'title': 'temperature across weather stations',
         'standard_name': 'air_temperature',
         'units': 'degrees_c'}
```

```python
# initialize xarray.DataArray
temp = xr.DataArray(data=temp_data,
                    dims=dims,
                    coords=coords,
                    attrs=attrs)
temp
```
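Each piece passed to the constructor can be read back from the `DataArray`. The cell below is a minimal sketch that rebuilds `temp` so it runs on its own (naming the longitude dimension `lon`) and then inspects its dimensions, shape, underlying NumPy array, coordinates and attributes:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the example DataArray so this cell is self-contained
temp_data = np.array([np.zeros((5, 5)),
                      np.ones((5, 5)),
                      np.ones((5, 5)) * 2]).astype(int)
temp = xr.DataArray(
    data=temp_data,
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2022-09-01", "2022-09-03"),
            "lat": np.arange(70, 20, -10),
            "lon": np.arange(60, 110, 10)},
    attrs={"title": "temperature across weather stations",
           "standard_name": "air_temperature",
           "units": "degrees_c"},
)

print(temp.dims)                  # ('time', 'lat', 'lon')
print(temp.shape)                 # (3, 5, 5)
print(type(temp.values))          # <class 'numpy.ndarray'>
print(temp.coords["lat"].values)  # [70 60 50 40 30]
print(temp.attrs["units"])        # degrees_c
```

`temp.values` is the plain `numpy.ndarray` we started from; the labels live alongside it in `dims`, `coords` and `attrs`.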

## Subsetting

To select data from an `xarray.DataArray` we need to specify the subsets we want along each dimension.
We can do this in two ways:

- by the position of each dimension (**dimension lookup by position**)
- by calling each dimension by its name (**dimension lookup by name**)

**Example**

We want the temperature recorded by the weather station located at 40N 80E
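A minimal sketch of both lookups for the 40N 80E station, assuming `temp` is built as above with the longitude dimension named `lon`. Latitude 40 sits at position 3 of `[70, 60, 50, 40, 30]` and longitude 80 at position 2 of `[60, 70, 80, 90, 100]`; `isel` selects by integer position, while `sel` and `.loc` select by coordinate label:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the example DataArray: 3 days x 5 latitudes x 5 longitudes
temp = xr.DataArray(
    np.stack([np.full((5, 5), v) for v in (0, 1, 2)]),
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2022-09-01", "2022-09-03"),
            "lat": np.arange(70, 20, -10),
            "lon": np.arange(60, 110, 10)},
)

# Dimension lookup by position: axes in the order (time, lat, lon)
by_position = temp[:, 3, 2]          # integer positions
by_loc = temp.loc[:, 40, 80]         # coordinate labels, still positional

# Dimension lookup by name
by_index = temp.isel(lat=3, lon=2)   # integer positions
by_label = temp.sel(lat=40, lon=80)  # coordinate labels

print(by_label.values)  # [0 1 2]
```

Looking dimensions up by name does not depend on remembering the axis order, which is why `isel`/`sel` are usually the safer choice.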

## Reduction

`xarray` implements several methods (for example `mean`, `sum`, `min`, `max`, `std`) to reduce an `xarray.DataArray` along any number of dimensions, selected by name.

**Example**

Calculate the average temperature at each weather station over time 

```python
avg_temp = temp.mean(dim='time')
avg_temp
```

```python
# replace the metadata: the original title no longer describes the averaged data
avg_temp.attrs = {'title': 'average temperature over three days'}
avg_temp
```
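Reductions accept a list of dimension names, or no `dim` at all to reduce over everything. The sketch below (rebuilding `temp` with the longitude dimension named `lon`) also passes `keep_attrs=True`, which asks `xarray` to carry the original attributes over to the result instead of setting them by hand:

```python
import numpy as np
import pandas as pd
import xarray as xr

temp = xr.DataArray(
    np.stack([np.full((5, 5), v) for v in (0, 1, 2)]),
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2022-09-01", "2022-09-03"),
            "lat": np.arange(70, 20, -10),
            "lon": np.arange(60, 110, 10)},
    attrs={"units": "degrees_c"},
)

# Average over both spatial dimensions: one value per day
daily_mean = temp.mean(dim=["lat", "lon"])
print(daily_mean.values)  # [0. 1. 2.]

# No dim given: reduce over every dimension
overall_mean = temp.mean()
print(float(overall_mean))  # 1.0

# Keep the original metadata on the reduced array
avg_temp_kept = temp.mean(dim="time", keep_attrs=True)
print(avg_temp_kept.attrs["units"])  # degrees_c

# Other reductions follow the same pattern
print(temp.max(dim="time").values[0, 0])  # 2
```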

## `xarray.Dataset`

`xarray.Dataset`:
- resembles an in-memory representation of a NetCDF file
- consists of *multiple* variables (each being an `xarray.DataArray`)
- self-describing
- attributes can be specific to each variable, each dimension, or they can describe the whole dataset
- variables in an `xarray.Dataset` can have the same dimensions, share some dimensions, or have no dimensions in common
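The last two points can be sketched in a single `Dataset`. The variables `elevation` and `station_count` and the dataset title below are hypothetical, chosen only to show a variable that shares some dimensions with `temp` and one with no dimensions at all; attributes are attached at the variable, coordinate (dimension) and dataset level using the `(dims, data, attrs)` tuple form:

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    data_vars={
        # 3-D variable with its own attributes
        "temp": (("time", "lat", "lon"), np.zeros((3, 5, 5)),
                 {"units": "degrees_c"}),
        # hypothetical 2-D variable: shares only lat/lon with temp
        "elevation": (("lat", "lon"), np.full((5, 5), 100.0),
                      {"units": "m"}),
        # hypothetical 0-D variable: no dimensions at all
        "station_count": ((), 25),
    },
    coords={
        "time": pd.date_range("2022-09-01", "2022-09-03"),
        # attributes attached to a dimension's coordinate
        "lat": ("lat", np.arange(70, 20, -10), {"units": "degrees_north"}),
        "lon": ("lon", np.arange(60, 110, 10), {"units": "degrees_east"}),
    },
    # attributes describing the whole dataset
    attrs={"title": "dataset-level metadata"},
)
print(ds["elevation"].dims)  # ('lat', 'lon')
```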

**Example**
Combine `temp` and `avg_temp` into a single object:

```python
# make dictionaries with variables and attributes
data_vars = {'avg_temp': avg_temp,
             'temp': temp}

attrs = {'title': 'temperature data at weather stations: daily and average',
         'description': 'simple example of an xarray.Dataset'}

# create xarray.Dataset
temp_dataset = xr.Dataset(data_vars=data_vars,
                          attrs=attrs)
```

```python
temp_dataset
```
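Once combined, each variable is still available as a `DataArray`, and selecting on the `Dataset` subsets every variable that has the selected dimensions. A minimal sketch, rebuilding `temp`, `avg_temp` and `temp_dataset` (with the longitude dimension named `lon`) so the cell runs on its own:

```python
import numpy as np
import pandas as pd
import xarray as xr

temp = xr.DataArray(
    np.stack([np.full((5, 5), v) for v in (0, 1, 2)]),
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2022-09-01", "2022-09-03"),
            "lat": np.arange(70, 20, -10),
            "lon": np.arange(60, 110, 10)},
)
avg_temp = temp.mean(dim="time")

temp_dataset = xr.Dataset(
    data_vars={"avg_temp": avg_temp, "temp": temp},
    attrs={"title": "temperature data at weather stations: daily and average"},
)

# Each variable comes back as a DataArray with its own dimensions
print(temp_dataset["avg_temp"].dims)  # ('lat', 'lon')

# One .sel call subsets both variables at the 40N 80E station
station = temp_dataset.sel(lat=40, lon=80)
print(station["temp"].values)      # [0 1 2]
print(float(station["avg_temp"]))  # 1.0
```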