## Working with Xarray
Xarray is a package that adds a class of objects to regular Python for storing data in a more organized way. Its data model is very similar to the netCDF classic model (netCDF3). It can read netCDF files efficiently and handle some issues associated with incorrectly designed netCDF files.

Xarray also has extensions to use the Numpy, Pandas, and SciPy libraries directly. Think of Xarray as a tool for organizing data so that other libraries can be used on the data efficiently.

The primary difference between Xarray and Pandas is that Pandas is designed to handle 1-D and 2-D tabular data, while Xarray can handle n-D data along with metadata about the data.

The one downside is that Xarray has very powerful functions but less thorough documentation. You may need to dig a bit to find the best way to perform a task.

In [None]:
import numpy as np
import xarray as xr  # Convention is to import xarray as xr

## DataArray
Here we create some data with Numpy and put it into an Xarray DataArray. Notice how the concept of dimensionality is built into the DataArray: "xarray.DataArray (dim_0: 10000)". Because we didn't define a dimension name, a generic one was created for us.

In [None]:
data = np.arange(10000)  # This is a numpy array
# Now we create the Xarray DataArray and add the data.
xr_da = xr.DataArray(data)
xr_da

This time we create a time array to match the shape of the data array, using one-minute time steps. The time array becomes a coordinate variable that describes the values along the dimension we name "time". The coordinate is set to the time array and the dimension is set to the string "time".

In [None]:
time = np.array('2019-11-01T00:00:00', dtype='datetime64[m]') + np.arange(data.size)
xr_da = xr.DataArray(data, dims=['time'], coords=[time])
xr_da
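One benefit of a coordinate variable is that values can be looked up by label instead of by integer position. The following is a minimal sketch of that idea using the `.sel()` method. It rebuilds the same arrays as above so it can run on its own, and the specific times chosen are only for illustration.

```python
import numpy as np
import xarray as xr

# Rebuild the same data and time arrays used above.
data = np.arange(10000)
time = np.array('2019-11-01T00:00:00', dtype='datetime64[m]') + np.arange(data.size)
xr_da = xr.DataArray(data, dims=['time'], coords=[time])

# Select one value by its time label rather than by integer position.
one_value = xr_da.sel(time=np.datetime64('2019-11-01T01:00'))

# Select a range of times with a slice. Label-based slices include both end points.
first_ten_minutes = xr_da.sel(
    time=slice(np.datetime64('2019-11-01T00:00'), np.datetime64('2019-11-01T00:09')))
```

Here `one_value` holds the sample recorded one hour after the start, and `first_ten_minutes` is a DataArray that still carries its own "time" coordinate.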

We can add attributes describing metadata about the data.

In [None]:
xr_da.attrs['long_name'] = 'Amazing data that will win me a Nobel prize.'
xr_da.attrs['units'] = 'degK'
xr_da.attrs['valid_min'] = 0.
xr_da.attrs['valid_max'] = 10000.
xr_da

Same as above but all in one step while creating the DataArray.

In [None]:
xr_da = xr.DataArray(
    data, dims=['time'],
    coords=[time],
    attrs={'long_name': 'Amazing data that will win me a Nobel prize.',
           'units': 'degK',
           'valid_min': 0.,
           'valid_max': 10000.})
xr_da
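The same constructor extends to more than one dimension, which is where Xarray differs most from Pandas. The sketch below assumes a hypothetical 2-D field with made-up "time" and "height" dimensions; the names and values are not from any real data set.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D field: 3 time steps by 4 heights.
values = np.arange(12, dtype=float).reshape(3, 4)
da_2d = xr.DataArray(
    values,
    dims=['time', 'height'],
    coords={'height': [0, 10, 20, 30]},
    attrs={'units': 'degK'})

# Reduce over a dimension by name instead of by axis number.
profile_mean = da_2d.mean(dim='time')
```

Referring to dimensions by name means the code does not need to track which axis number corresponds to time.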

To extract only the data values, we use the .values attribute on the DataArray.

In [None]:
xr_da.values

To extract the attributes as a dictionary we use the .attrs attribute. A specific attribute can be retrieved by indexing the dictionary with its name.

In [None]:
xr_da.attrs

In [None]:
xr_da.attrs['long_name']

In [None]:
type(xr_da)

In [None]:
type(xr_da.values)

In [None]:
type(xr_da.attrs)
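As mentioned in the introduction, Numpy and Pandas can be used on a DataArray directly. The following is a small sketch of both directions, using a short hypothetical array so the results are easy to check by eye.

```python
import numpy as np
import xarray as xr

# Hypothetical sample values, chosen only for illustration.
da = xr.DataArray(
    np.array([1.0, 4.0, 9.0]),
    dims=['x'],
    coords={'x': [10, 20, 30]})

# A Numpy universal function applied to a DataArray returns a DataArray,
# so the dimension name and coordinate are preserved.
root = np.sqrt(da)

# Convert to a Pandas Series indexed by the coordinate values.
series = da.to_series()
```

The result `root` is still a DataArray with the "x" dimension, while `series` is a regular Pandas object that can be passed to Pandas functions.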

## Dataset
The full power of Xarray comes from using Datasets. A Dataset is a collection of DataArrays. The beauty of a Dataset is that it holds all the corresponding data together and lets you perform functions on multiple DataArrays in the Dataset all at once. This becomes very powerful and very fast!

Create some data and a matching time array with one-minute time steps.

In [None]:
data1 = np.arange(10000, dtype=float)
data2 = np.arange(10000, dtype=float) + 123.456
time = np.array('2019-11-01T00:00:00', dtype='datetime64[m]') + np.arange(data1.size)

In [None]:
xr_ds = xr.Dataset(
    # This is the data section.
    # Notice all data is wrapped in a dictionary. In that dict the key
    # is the variable name followed by a tuple. The first value of the tuple
    # is the dimension name(s), followed by the data values, followed by an optional
    # dictionary of attributes as key:value pairs.
    data_vars={'data1': ('time', data1, {'long_name': 'Data 1 values', 'units': 'degC'}),
               'data2': ('time', data2, {'long_name': 'Data 2 values', 'units': 'degF'})
               },
    # This is the coordinate section following the same format. Since this
    # comes next it could be passed positionally as the coordinates.
    # But we are using keywords to make it easier to understand.
    coords={'time': ('time', time, {'long_name': 'Time in UTC'})},
    # Lastly we have the global attributes.
    attrs={'the_best_animals': 'sharks'}
)

Print out the full Dataset

In [None]:
xr_ds

Print out one DataArray from the Dataset

In [None]:
xr_ds['data1']

Print out the values from one variable in the Dataset

In [None]:
xr_ds['data1'].values

Print out one attribute from one DataArray

In [None]:
xr_ds['data1'].attrs['units']
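The claim that a Dataset can operate on multiple DataArrays at once can be sketched as follows. This minimal example rebuilds the same Dataset without attributes so it runs on its own; the choice of a mean and a one-hour selection is only for illustration.

```python
import numpy as np
import xarray as xr

# Rebuild the same Dataset as above, leaving out attributes for brevity.
data1 = np.arange(10000, dtype=float)
data2 = np.arange(10000, dtype=float) + 123.456
time = np.array('2019-11-01T00:00:00', dtype='datetime64[m]') + np.arange(data1.size)
xr_ds = xr.Dataset(
    data_vars={'data1': ('time', data1), 'data2': ('time', data2)},
    coords={'time': ('time', time)})

# One call computes the mean of every data variable over the time dimension.
ds_mean = xr_ds.mean(dim='time')

# Label-based selection is also applied to every variable at once.
first_hour = xr_ds.sel(
    time=slice(np.datetime64('2019-11-01T00:00'), np.datetime64('2019-11-01T00:59')))
```

Both results are Datasets themselves, so `ds_mean['data1']` and `first_hour['data2']` can be pulled out the same way as any other variable.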