# Table of contents
- DataArray & Dataset
- Dimensions & Coordinates
- Indexing
- `xarray.where`: array selection using arbitrary conditions

# Introduction to xarray

xarray is a Python package for working with labelled, multi-dimensional arrays such as those stored in netCDF files. It is particularly helpful for processing meteorological data in the netCDF format.

## Dependencies
Source: https://docs.xarray.dev/en/stable/getting-started-guide/installing.html

xarray requires at least:
- `numpy`
- `pandas`
- `packaging`

Many more features become available once optional dependencies are installed. Here is an incomplete list of packages that I found useful:
- `dask`: lazy loading of larger datasets. Operations are collected and executed on demand only.
- `netCDF4`
- `scipy`
- `cftime`: to process dates
- `matplotlib`, `cartopy`


## Frequent issues
### numpy version
Reading netCDF files can fail if a rather new `xarray` version is combined with an older `numpy` version.
The usual fix is to create a virtual environment (venv) and install a newer `numpy` inside it.
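To see which versions are actually combined in the active environment, a short check like the following sketch can help. It only prints version strings and does not decide whether the combination is compatible.

```python
import numpy as np
import xarray

# Print the versions that the current environment actually imports.
print("numpy :", np.__version__)
print("xarray:", xarray.__version__)

# xarray.show_versions() prints a fuller report, including optional dependencies.
```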

### netCDF library
On SUSE, expect problems if the development (devel) packages for the netCDF C library are not installed.

### Difference between coordinates and dimensions
xarray distinguishes strictly between dimensions and coordinates. Depending on the tool that created a netCDF file, you may have to restructure the dataset before many xarray functions can be used.

A dimension has a name and indicates how many entries are present in the dimension, e.g. `ncells = 600000`. 

A coordinate is an array of labels attached to the data, e.g. the latitude. A coordinate usually has one or more dimensions.

Many of xarray's plotting functions work on either coordinates or dimensions.
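A minimal sketch of the distinction, using made-up values. The names `temperature`, `ncells` and `lat` are chosen for illustration only and do not come from a real file.

```python
import numpy as np
import xarray

# Hypothetical data: 4 cells of an unstructured grid (values are made up).
temperature = xarray.DataArray(
    np.array([280.0, 285.0, 290.0, 295.0]),
    dims=("ncells",),
    coords={"lat": ("ncells", [10.0, 20.0, 30.0, 40.0])},
)

print(temperature.dims)                # names of the dimensions
print(temperature.sizes["ncells"])     # number of entries along ncells
print(temperature["lat"].dims)         # the coordinate lat lives on dimension ncells
print("ncells" in temperature.coords)  # ncells is a dimension without its own coordinate
```

Here `ncells` is only a dimension (a name and a length), while `lat` is a coordinate that assigns a label to every entry along that dimension.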

## In-place or not in-place?
"In-place" means that a method modifies the object it is called on. Many methods in xarray are not in-place: they return a new object and leave the original unchanged. This means you have to store the output when applying a method to an object.
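The following sketch illustrates this with `assign_coords` on a small made-up array (the name `da` is chosen for illustration only).

```python
import numpy as np
import xarray

da = xarray.DataArray(np.arange(3), dims=("x",))

# The result is discarded here, so da itself is unchanged.
da.assign_coords(x=[10, 20, 30])
print("x" in da.coords)  # False

# Storing the returned object keeps the new coordinate.
da = da.assign_coords(x=[10, 20, 30])
print("x" in da.coords)  # True
```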

In [1]:
import numpy as np
import datetime as dt
import xarray
import matplotlib.pyplot as plt

# xarray Basics

## Create a Dataset & DataArray and select data

Simple examples. This is helpful if you already have data from another source and want to continue processing it with xarray.

Source: https://docs.xarray.dev/en/stable/getting-started-guide/quick-overview.html

In [2]:
dataArray = xarray.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})

In [3]:
dataArray

In [4]:
dataSet = xarray.Dataset(dict(foo=dataArray, bar=("x", [1, 2]), baz=np.pi))

In [5]:
dataSet

In [6]:
dataSet.foo

In [7]:
dataSet['x'] # key syntax

In [8]:
dataSet.foo.dims

('x', 'y')

In [9]:
dataSet.foo.coords

Coordinates:
  * x        (x) int64 16B 10 20

In [10]:
# Add a new coordinate `lat` along dimension x (existing coordinates are kept).
# Caution: assign_coords is not in-place; you have to store the returned object!
dataSet = dataSet.assign_coords(lat=("x", np.array([50, 70])))
dataSet

In [11]:
dataSet.drop_vars(['baz']) # returns a new Dataset without baz; dataSet itself is unchanged

In [12]:
dataSet.drop_vars(['foo','bar'])

In [13]:
dataSet.data_vars

Data variables:
    foo      (x, y) float64 48B -0.6542 0.0301 -1.072 0.08027 -1.206 0.8441
    bar      (x) int64 16B 1 2
    baz      float64 8B 3.142

In [14]:
type(dataSet['foo'])

xarray.core.dataarray.DataArray

In [15]:
type(dataSet['foo'].values)

numpy.ndarray

## Indexing

### isel: select by index, i.e. select the n-th entry of a dimension

In [16]:
dataSet.isel(dict(y=[1],x=[0]))

In [17]:
dataSet.isel(dict(y=[1],x=0)) # a scalar index drops the x dimension (x remains as a scalar coordinate)

In [18]:
dataSet.isel(dict(y=[1],x=0)).drop_dims('y') # explicitly drop dimension y and all variables that use it
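The difference between a list index and a scalar index can be checked on a small array with known values. This is a sketch with made-up data; `da`, `kept` and `dropped` are illustrative names.

```python
import numpy as np
import xarray

da = xarray.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20]},
)

kept = da.isel(x=[0])    # list index: dimension x is kept with length 1
dropped = da.isel(x=0)   # scalar index: dimension x is dropped

print(kept.dims, kept.shape)
print(dropped.dims, dropped.shape)
print(dropped["x"].item())  # x survives as a scalar coordinate
```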

### sel: select by coordinate value

In [19]:
dataSet.sel(x=[20])
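For comparison, the sketch below selects the same row once by position with `isel` and once by coordinate value with `sel`. It also uses `method="nearest"`, which picks the closest existing coordinate value. The data and the names are made up for illustration.

```python
import numpy as np
import xarray

da = xarray.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20]},
)

by_position = da.isel(x=1)                # second entry along x
by_label = da.sel(x=20)                   # entry whose x coordinate equals 20
nearest = da.sel(x=18, method="nearest")  # 18 is not a coordinate value; 20 is closest

print(by_position.equals(by_label))
print(nearest["x"].item())
```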

### boolean array

In [20]:
dataArray.shape

(2, 3)

In [21]:
msk = np.repeat(np.array([True, False]), 3).reshape(dataArray.shape)  # first row True, second row False
msk

array([[ True,  True,  True],
       [False, False, False]])

In [22]:
dataArray

In [23]:
dataArray.values[msk] # numpy array index

array([-0.65421368,  0.03009868, -1.071831  ])

### xarray.where

In [24]:
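# Pass dims explicitly: coords only labels x, so the dimension names cannot be inferred from it.
xa_msk = xarray.DataArray(msk, dims=dataArray.dims, coords=dataArray.coords)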
xa_msk

In [25]:
dataArray.where(xa_msk)

In [26]:
dataSet.where(dataSet.foo>0)

In [27]:
dataSet.foo>0

In [28]:
xarray.where(dataSet.foo>0,dataSet.foo, np.inf) # 1st argument is a boolean DataArray; the 2nd is used where it is True, the 3rd where it is False

In [29]:
xarray.where(dataSet.foo>0,7, np.inf) 
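Because the cells above use random numbers, the following sketch repeats the `where` variants on a small array with fixed, made-up values so that the results are easy to verify by eye. Unlike boolean indexing on `.values`, which returns a flattened 1-D array, `where` keeps the original shape.

```python
import numpy as np
import xarray

da = xarray.DataArray(np.array([[-1.0, 2.0], [3.0, -4.0]]), dims=("x", "y"))

masked = da.where(da > 0)             # method: entries failing the condition become NaN
filled = da.where(da > 0, 0.0)        # method with a fill value instead of NaN
chosen = xarray.where(da > 0, 1, -1)  # function: pick 1 where True, -1 where False

print(masked.values)
print(filled.values)
print(chosen.values)
```

The method form is convenient for masking an existing array, while the function form is closer to `numpy.where` and builds a new array from two alternatives.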