In [None]:
import xarray as xr
import rioxarray
import pint_xarray
import cf_xarray

import xoak
import numpy as np
import pandas as pd

xr.set_options(display_style="text", display_expand_data=False)

<center><h1>recent and upcoming changes to xarray</h1></center>

<p>
<center>Justus Magin</center>
<center>@keewis</center>
</p>

- Recent major changes
- `xarray-contrib`

- Recent major changes
    - backend refactor
    - index refactor
    - datatree
- `xarray-contrib`

- Recent major changes
- `xarray-contrib`
    - cf-xarray
    - pint-xarray
    - ...

## recent and ongoing changes

### backend refactor

up until `xarray=0.17`:
- supporting a new file format is possible, but everything has to be reimplemented, including
    - chunking
    - caching
    - lazy indexing (without dask)
- `open_dataset` can't be reused, so every format needs its own open function

since `xarray=0.18.0`, new backends can be


- defined as a [backend](https://xarray.pydata.org/en/latest/internals/how-to-add-new-backend.html):

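```python
from xarray.backends import BackendEntrypoint

class MyBackendEntrypoint(BackendEntrypoint):
    # xarray passes `drop_variables` by keyword, so the name must match
    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        # `my_open` is a placeholder for the format-specific reader
        return my_open(filename_or_obj, drop_variables=drop_variables)

    open_dataset_parameters = ["filename_or_obj", "drop_variables"]

    def guess_can_open(self, filename_or_obj):
        ...
```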

since `xarray=0.18.0`, new backends can be


- defined as a [backend](https://xarray.pydata.org/en/latest/internals/how-to-add-new-backend.html)
- registered under the `xarray.backends` entrypoints
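Registered backends are discovered through the standard Python entry point machinery. As a rough sketch (the helper name `list_backend_entrypoints` is made up for illustration, and this is not how xarray itself builds its engine list), the `xarray.backends` group can be inspected with the standard library alone:

```python
from importlib.metadata import entry_points


def list_backend_entrypoints(group="xarray.backends"):
    """Return {engine name: "module:object"} for every entry point in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with `.select`
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: a plain dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


# the result depends on which packages are installed in the environment
print(list_backend_entrypoints())
```

With `rioxarray` installed, the `rasterio` engine is expected to show up in this mapping.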

In [None]:
rio_da = rioxarray.open_rasterio("RGB.byte.tif")
rio_da

In [None]:
ds = xr.open_dataset("RGB.byte.tif", engine="rasterio")
ds

### index refactor

- current state:
    - coordinates ≠ dimension coordinates
- goals
- process

In [None]:
ds = xr.Dataset(coords={"x": ("x", ["a", "b", "c"]), "u": ("x", [1, 2, 3])})
ds

In [None]:
ds.sel(x=["a", "b"])

In [None]:
ds.sel(u=[1, 2])
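The last selection raises, because `u` is a non-dimension coordinate and has no index. A minimal sketch of one workaround under the current model, assuming it is acceptable to make `u` the dimension coordinate instead of `x`:

```python
import xarray as xr

ds = xr.Dataset(coords={"x": ("x", ["a", "b", "c"]), "u": ("x", [1, 2, 3])})

# make `u` the dimension coordinate; `x` becomes a plain coordinate along `u`
swapped = ds.swap_dims({"x": "u"})

# label-based selection now works on `u` (but no longer on `x`)
selected = swapped.sel(u=[1, 2])
print(selected["x"].values)
```

The trade-off is that only one of the two coordinates can be used for label-based selection at a time.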

- current state:
    - coordinates ≠ coordinates on dimensions (dimension coordinates)
    - associated "indexes" must be pandas.Index
- goals
- process

- current state:
    - coordinates ≠ dimension coordinates (coordinates on dimensions)
    - associated "indexes" must be pandas.Index
    - dimension coordinates must be numpy arrays
- goals

- current state
- goals:
    - indexes other than pandas.Index (kdtree, balltree, dask.dataframe.Index, ...)
    - index over multiple coordinates
    - other array types (dask, pint, cupy, ...)
    - indexing operations on all coordinates, not just dimension coordinates
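To make "index over multiple coordinates" concrete, here is a toy sketch in plain Python. The class `NearestPointIndex` is hypothetical and not part of xarray; it uses a brute-force search as a stand-in for a real k-d tree or ball tree, and it returns a position along the shared dimension:

```python
import math


class NearestPointIndex:
    """Toy index over two coordinates that share one dimension (hypothetical)."""

    def __init__(self, lat, lon):
        if len(lat) != len(lon):
            raise ValueError("lat and lon must have the same length")
        self._points = list(zip(lat, lon))

    def query(self, lat, lon):
        """Return the position of the point closest to (lat, lon)."""
        # brute force: a real implementation would use a tree structure
        return min(
            range(len(self._points)),
            key=lambda i: math.hypot(
                self._points[i][0] - lat, self._points[i][1] - lon
            ),
        )


index = NearestPointIndex(lat=[0.0, 10.0, 20.0], lon=[0.0, 5.0, 10.0])
position = index.query(lat=9.0, lon=6.0)
```

A lookup like this cannot be expressed with a single `pandas.Index`, which is why the refactor aims at allowing custom index types.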

### datatree

- netcdf files can have (nested) groups, with different values for coordinates
- represent these groups using a tree-like structure with an API similar to `Dataset`

→ [datatree](https://github.com/TomNicholas/datatree)
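The idea can be sketched without the library. The `GroupNode` class below is a hypothetical toy, not datatree's API: each node holds its own variables plus named child groups, supports path-style access, and applies an operation to every node in the tree:

```python
from statistics import mean


class GroupNode:
    """Toy stand-in for a tree of groups (hypothetical, not datatree's API)."""

    def __init__(self, data=None, children=None):
        self.data = dict(data or {})          # variable name -> values
        self.children = dict(children or {})  # group name -> GroupNode

    def __getitem__(self, path):
        # walk a "/"-separated path such as "/set1/set2"
        node = self
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node

    def map_over_subtree(self, func):
        # apply `func` to every variable of this node and of all descendants
        return GroupNode(
            data={name: func(values) for name, values in self.data.items()},
            children={
                name: child.map_over_subtree(func)
                for name, child in self.children.items()
            },
        )


tree = GroupNode(
    data={"a": [1.0, 2.0, 3.0]},
    children={
        "set1": GroupNode(
            data={"a": [10.0, 20.0]},
            children={"set2": GroupNode(data={"b": [0.5, 1.5]})},
        )
    },
)
means = tree.map_over_subtree(mean)
```

Here each group keeps its own values for `a`, which a single flat `Dataset` could not represent.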

In [None]:
from datatree.tests.test_datatree import create_test_datatree

dt = create_test_datatree()
print(str(dt))

In [None]:
print(str(dt["/set1/set2"]))

In [None]:
print(str(dt.mean()))

## xarray-contrib

- pint-xarray: use pint to convert and work with units
- cupy-xarray: simplify working with cupy arrays (experimental)
- cf-xarray: integrate more closely with the CF conventions
- xpublish: publish an xarray object as a web server
- xoak: extended indexing operations
- xarray-simlab: simulations using xarray
- xskillscore: evaluate forecasts
- sphinx-autosummary-accessors: document accessors as they are called

### cf-xarray

In [None]:
xr.set_options(display_style="html")

In [None]:
ds = xr.tutorial.load_dataset("air_temperature")
ds.air.attrs["standard_name"] = "air_temperature"
ds

In [None]:
ds.cf

In [None]:
# by standard name
ds.cf["latitude"]

In [None]:
# by axis
ds.cf["Y"]

In [None]:
ds.cf.mean(dim=["T", "latitude"])

### `pint-xarray`

`pint`'s API is centered around the `UnitRegistry`:

In [None]:
import pint

ureg = pint.UnitRegistry()

This class allows the creation of `Unit` and `Quantity` instances:

In [None]:
u = ureg.Unit("m / s")
u

In [None]:
q = ureg.Quantity(4, "s")
q

it also allows customizing how units and quantities are displayed:

In [None]:
ureg.default_format = "~P"
display(u)
display(q)

we can easily convert to other units:

In [None]:
q.to("ms")

but only compatible ones:

In [None]:
q.to("m")  # raises pint.DimensionalityError: seconds cannot be converted to meters

the units are automatically propagated:

In [None]:
q1 = ureg.Quantity(5, "m / s ** 2")
q2 = ureg.Quantity(3, "s")
v = q1 * q2
v

and automatically converted where necessary:

In [None]:
ureg.Quantity(36, "degree") + ureg.Quantity(np.pi, "radians")

quantities also raise on invalid operations, such as adding incompatible units:

In [None]:
ureg.Quantity(36, "degree") + ureg.Quantity(10, "kg")

`pint` quantities can be used directly inside `xarray` objects, but doing so by hand is cumbersome. Thus: `pint-xarray`.