In [None]:
import dask.array as da
import numpy as np
import pint
import sparse
import xarray as xr

rng = np.random.default_rng()
ureg = pint.UnitRegistry()

<center><h1>duck arrays in xarray</h1></center>

<p>
<center>Justus Magin / @keewis</center>
</p>

- what are duck arrays?
- integration status
- additional support functions

## what are duck arrays?

*duck array*:
> Duck arrays are array implementations that look and behave like numpy arrays.

*array_like* (from [numpy](https://numpy.org/doc/1.20/glossary.html#term-array-like)):
> Any argument accepted by `numpy.array` is *array_like*.

xarray requires:
- properties: `ndim`, `shape`, `dtype`
- protocols:
    - `__array__`: `np.array`
    - `__array_ufunc__`: ufuncs like `np.sin`
    - `__array_function__`: most of the remaining `numpy` API

- protocols:
    - future: `__array_module__` (NEP37) or `__array_namespace__` (NEP47)
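A minimal sketch of what these requirements look like in practice. `TaggedArray` and `looks_like_duck_array` are hypothetical names made up for illustration (they belong to neither `xarray` nor `numpy`), and the `__array_function__` handling is deliberately simplistic:

```python
import numpy as np


class TaggedArray:
    """Hypothetical minimal duck array: a numpy array that carries a tag."""

    def __init__(self, data, tag):
        self.data = np.asarray(data)
        self.tag = tag

    # properties
    @property
    def ndim(self):
        return self.data.ndim

    @property
    def shape(self):
        return self.data.shape

    @property
    def dtype(self):
        return self.data.dtype

    def __array__(self, dtype=None, copy=None):
        # np.array / np.asarray: convert to plain numpy, dropping the tag
        return np.asarray(self.data, dtype=dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # ufuncs like np.sin: compute on the wrapped data, then re-wrap
        if method != "__call__" or "out" in kwargs:
            return NotImplemented
        raw = [x.data if isinstance(x, TaggedArray) else x for x in inputs]
        return TaggedArray(ufunc(*raw, **kwargs), self.tag)

    def __array_function__(self, func, types, args, kwargs):
        # remaining numpy API; this sketch only handles functions that
        # take the array as their first positional argument
        if not args or not isinstance(args[0], TaggedArray):
            return NotImplemented
        return TaggedArray(func(args[0].data, *args[1:], **kwargs), self.tag)


def looks_like_duck_array(obj):
    # hypothetical check for the properties and protocols listed above
    names = ("ndim", "shape", "dtype", "__array_ufunc__", "__array_function__")
    return all(hasattr(obj, name) for name in names)


x = TaggedArray([0.0, 1.0, 2.0], tag="m")
y = np.sin(x)  # dispatched through __array_ufunc__, stays a TaggedArray
z = np.mean(x)  # dispatched through __array_function__, stays a TaggedArray
plain = np.asarray(x)  # goes through __array__, becomes a numpy.ndarray
```

A real implementation would also have to cover indexing, arithmetic operators, and functions that take several arrays (like `np.concatenate`).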

examples of duck arrays:
- `cupy`: CUDA-accelerated version of numpy
- (`dask`)
- `pint`: array with physical units
- `sparse`: multi-dimensional sparse arrays

## integration status

- tests for `pint` and `sparse`

- for `pint`: most methods keep the data as duck array

- but ...

exceptions:

- indexing (only an issue with dimension coordinates):
    - `sel`, `loc`
    - `drop_sel`
    - `reindex` / `reindex_like`

exceptions:

- functionality based on external libraries:
  - `scipy` (`interp` / `interp_like`, `interpolate_na`)
  - `numbagg` (`rolling_exp`)
  - `bottleneck` (`ffill`, `bfill`)

exceptions:

- functionality in `numpy`
    - `numpy.vectorize` (`interpolate_na`, `apply_ufunc` with `vectorize=True`)
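A small sketch of why `numpy.vectorize` is a problem: it converts its inputs to plain `numpy` arrays before calling the function, so the wrapper is lost. `Wrapped` is a hypothetical stand-in for a duck array that only defines the properties and `__array__`:

```python
import numpy as np


class Wrapped:
    """Hypothetical stand-in for a duck array (properties and __array__ only)."""

    def __init__(self, data):
        self.data = np.asarray(data)

    @property
    def ndim(self):
        return self.data.ndim

    @property
    def shape(self):
        return self.data.shape

    @property
    def dtype(self):
        return self.data.dtype

    def __array__(self, dtype=None, copy=None):
        # conversion to a plain numpy array
        return np.asarray(self.data, dtype=dtype)


double = np.vectorize(lambda v: v * 2)
result = double(Wrapped([1, 2, 3]))

# the result is a plain numpy array, not a Wrapped instance
print(type(result).__name__)
```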

each duck array implementation behaves differently

→ support status of any given implementation is unclear

### duck array testing framework

**`xarray.tests.duckarrays.base`**

- separated into different parts of `xarray`'s API (using subclasses)

- using `hypothesis` (fuzzer)

- can be used to check support for duck arrays or nested duck arrays
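A rough sketch of the subclassing idea, not the actual contents of `xarray.tests.duckarrays.base`: all class and method names below are hypothetical, plain `numpy` stands in for `xarray`, and a seeded random generator replaces `hypothesis`. A base class holds the checks for one part of the API, and a subclass only overrides the hook that creates the array under test:

```python
import numpy as np


class ReduceChecks:
    """Hypothetical base class: checks for one part of an API (reductions)."""

    @staticmethod
    def create(data):
        # hook: subclasses wrap the numpy input in the array type under test
        return data

    @staticmethod
    def to_numpy(obj):
        # hook: subclasses strip the wrapper before comparing
        return np.asarray(obj)

    def check_sum(self, data):
        expected = data.sum()
        actual = self.create(data).sum()
        np.testing.assert_allclose(self.to_numpy(actual), expected)


class MaskedReduceChecks(ReduceChecks):
    """Same checks, run against numpy masked arrays with nothing masked."""

    @staticmethod
    def create(data):
        return np.ma.masked_array(data, mask=np.zeros(data.shape, dtype=bool))


# stand-in for hypothesis: a few randomly generated inputs
rng = np.random.default_rng(0)
checks = MaskedReduceChecks()
for _ in range(5):
    checks.check_sum(rng.normal(size=(4, 3)))
```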

examples of nested duck arrays:
  - `pint(dask)`
  - `dask(sparse)`
  - `pint(dask(sparse))`

## additional support functionality: `_repr_inline_`

In [None]:
xr.set_options(display_expand_data=False)

In [None]:
def format_array(x, max_width=50):
    arr = xr.DataArray(x)
    return xr.core.formatting.inline_variable_array_repr(
        arr.variable, max_width=max_width
    )

- `repr` of duck arrays should contain `shape`, `dtype`, and specific metadata

- but: `xarray` variable repr contains `dims` and `dtype`

In [None]:
arr = xr.DataArray(
    data=np.arange(200).reshape(20, 10),
    coords={
        "a": ("x", ureg.Quantity(np.ones(20), "m")),
        "b": ("y", da.zeros(10, chunks=5)),
    },
    dims=("x", "y"),
)
print(str(arr))

hard-coded custom reprs for `dask` and `sparse`:

In [None]:
arr = sparse.random(shape=(100, 100), random_state=0)

print(repr(arr))

In [None]:
print(format_array(arr))

In [None]:
arr = da.ones(shape=(100, 100), chunks=(10, 10))
print(repr(arr))

In [None]:
print(format_array(arr))

```python
def _repr_inline_(self, max_width):
    # return a one-line summary of the array, at most `max_width` characters
    ...
```
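One possible implementation, as a sketch: `UnitArray` is a hypothetical wrapper (not `pint`'s implementation) that puts its metadata first and truncates the values to fit `max_width`:

```python
class UnitArray:
    """Hypothetical wrapper holding values and a unit string."""

    def __init__(self, values, units):
        self.values = list(values)
        self.units = units

    def _repr_inline_(self, max_width):
        # metadata first, so it survives truncation
        text = f"[{self.units}] " + " ".join(str(v) for v in self.values)
        if len(text) > max_width:
            text = text[: max(max_width - 3, 0)] + "..."
        return text


short = UnitArray([1, 2, 3], "m")._repr_inline_(50)
long = UnitArray(range(100), "m")._repr_inline_(20)
print(short)
print(long)
```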

In [None]:
arr = ureg.Quantity(np.ones(shape=(100, 100)), "m")
print(format_array(arr))

In [None]:
import pint_xarray  # noqa: F401

In [None]:
arr = ureg.Quantity(np.ones(shape=(100, 100)), "m")
print(format_array(arr))

<p>
<center><big><strong>Thank you!</strong></big></center>
</p>

In summary:
- duck arrays: arrays implementing the `numpy` interface
- most of the `xarray` API supports duck arrays
- support for a specific duck array: tests are needed