# Selecting & indexing

In most cases, the powerful data manipulation and indexing methods provided by {mod}`xarray` are sufficient; see the [corresponding section of the xarray documentation](https://docs.xarray.dev/en/stable/user-guide/indexing.html).

In this guide, we will briefly cover some frequently used {mod}`xarray` features and introduce some additional methods provided by ERLabPy.

## Basic `xarray` operations


In [None]:
import xarray as xr

xr.set_options(display_expand_data=False)

First, let us generate some example data: a simple tight-binding simulation of
graphene-like bands with an exaggerated lattice constant.

In [None]:
from erlab.io.exampledata import generate_data

dat = generate_data(seed=1).T

In [None]:
dat

We have a three-dimensional array of intensity given in terms of $k_x$, $k_y$, and
binding energy. 


Let's extract a cut along $k_y = 0.3$.

In [None]:
dat.sel(ky=0.3, method="nearest").plot()

Likewise, the Fermi surface can be extracted like this:

In [None]:
dat.sel(eV=0.0, method="nearest").plot()

You can also pass {class}`slice` objects to {meth}`sel <xarray.DataArray.sel>` to
effectively crop the data. 

In [None]:
cut = dat.sel(ky=0.3, method="nearest")
cut.sel(kx=slice(-0.2, 0.8), eV=slice(-0.25, 0.05)).plot()

In many scenarios, it is necessary to integrate the data over a range of coordinate values. This can be done by slicing and then averaging. The following code returns a new DataArray with the intensity averaged over a 50 meV window centered at $E_F$.

In [None]:
dat.sel(eV=slice(-0.025, 0.025)).mean("eV")

However, doing this every time is cumbersome, and we have lost the coordinate `eV`. In
the following sections, we introduce some utilities for convenient indexing.
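
As a point of comparison, the lost coordinate can be restored by hand with plain {mod}`xarray`. The sketch below uses a small synthetic array as a stand-in for `dat`; the name `demo` and its grid are assumptions made for illustration only.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for `dat`; the name and grid are illustrative only.
eV = np.linspace(-0.5, 0.1, 61)  # 10 meV steps
kx = np.linspace(-1.0, 1.0, 21)
demo = xr.DataArray(
    np.random.default_rng(0).random((kx.size, eV.size)),
    coords={"kx": kx, "eV": eV},
    dims=("kx", "eV"),
)

# Slice a 50 meV window around zero energy, then average over it.
window = demo.sel(eV=slice(-0.025, 0.025))
averaged = window.mean("eV")
print("eV" in averaged.coords)  # the reduced coordinate is no longer present

# Reattach the mean energy of the window as a scalar coordinate.
restored = averaged.assign_coords(eV=float(window["eV"].mean()))
print(restored.coords)
```

This works, but it needs an intermediate variable and an extra call for every averaged dimension.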

## The `qsel` accessor

ERLabPy adds many useful extensions to xarray objects in the form of {mod}`accessors <erlab.accessors>`.

### Advanced selection

One of them is the {meth}`xarray.DataArray.qsel` DataArray accessor, which streamlines the slicing and averaging process described above. It can be used like a native DataArray method:

In [None]:
dat.qsel(eV=0.0, eV_width=0.05)

Note that the averaged coordinate `eV` is automatically added to the data array. This
is useful for further analysis.

With {meth}`xarray.DataArray.qsel`, position along a dimension
can be specified in three ways:

- As a value and width: `eV=-0.1, eV_width=0.05`

  The data is *averaged* over a slice of width `0.05`, centered at `-0.1` along the
  dimension `'eV'`.

- As a scalar value: `eV=0.0`

  If no width is specified, the data is selected at the nearest coordinate value. This is
  equivalent to passing `method='nearest'` to {meth}`xarray.DataArray.sel`.

- As a slice: `eV=slice(-0.2, 0.05)`

  The data is selected over the specified slice. No averaging is performed.

The arguments can either be provided in a key-value form, or as a single dictionary.

Unlike {meth}`xarray.DataArray.sel`, all of this can be combined in a single call:

In [None]:
dat.qsel(kx=slice(-0.3, 0.3), ky=0.3, eV=0.0, eV_width=0.05)
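
For comparison, here is a rough plain-{mod}`xarray` counterpart of the call above. It is only a sketch on a synthetic array (`demo` is an assumed stand-in for `dat`), not a description of how the accessor is implemented. The three selections are chained because `method="nearest"` generally cannot be mixed with slice indexers in a single `sel` call.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for `dat`; the name and grid are illustrative only.
kx = np.linspace(-1.0, 1.0, 41)
ky = np.linspace(-1.0, 1.0, 41)
eV = np.linspace(-0.5, 0.1, 61)
demo = xr.DataArray(
    np.random.default_rng(0).random((kx.size, ky.size, eV.size)),
    coords={"kx": kx, "ky": ky, "eV": eV},
    dims=("kx", "ky", "eV"),
)

center, width = 0.0, 0.05

# 1. Slice: crop kx without averaging.
cropped = demo.sel(kx=slice(-0.3, 0.3))
# 2. Scalar value: pick the nearest ky.
cut = cropped.sel(ky=0.3, method="nearest")
# 3. Value and width: average over a window and keep its mean energy.
window = cut.sel(eV=slice(center - width / 2, center + width / 2))
result = window.mean("eV").assign_coords(eV=float(window["eV"].mean()))
print(result)
```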

:::{note}

In practice, you can generate the arguments for {meth}`xarray.DataArray.qsel` that reproduce the slice shown in [ImageTool](./imagetool.md) from the right-click menu of each plot.

:::

### Averaging data within a distance

To average data over all data points within a certain distance of a given point, the method {meth}`xarray.DataArray.qsel.around` can be used.

The following code plots the integrated EDCs near the K point ($k_x\sim$ 0.52 Å $^{-1}$, $k_y\sim$ 0.3 Å $^{-1}$) for different radii.

In [None]:
for radius in (0.03, 0.06, 0.09, 0.12):
    dat.qsel.around(radius, kx=0.52, ky=0.3).plot()
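
The idea of averaging within a distance can be sketched with plain {mod}`xarray` as a distance mask followed by a mean. The helper `average_within` and the array `demo` below are hypothetical names introduced for illustration, under the assumption of a simple Euclidean distance; the accessor's actual implementation may differ.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for `dat`; the name and grid are illustrative only.
kx = np.linspace(-1.0, 1.0, 81)
ky = np.linspace(-1.0, 1.0, 81)
eV = np.linspace(-0.5, 0.1, 31)
demo = xr.DataArray(
    np.random.default_rng(0).random((kx.size, ky.size, eV.size)),
    coords={"kx": kx, "ky": ky, "eV": eV},
    dims=("kx", "ky", "eV"),
)


def average_within(da, radius, **center):
    """Average `da` over all points within `radius` of `center` (hypothetical helper)."""
    # Squared Euclidean distance from the center, broadcast over the given dims.
    dist_sq = sum((da[dim] - value) ** 2 for dim, value in center.items())
    mask = dist_sq <= radius**2
    # Points outside the mask become NaN and are skipped by the mean.
    return da.where(mask).mean(list(center))


edc = average_within(demo, 0.06, kx=0.52, ky=0.3)
print(edc.dims)
```

A larger radius includes more points in the average, which trades momentum resolution for better statistics.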

### Averaging across dimensions

Taking a mean across multiple dimensions is a common operation, and can be performed easily with {meth}`xarray.DataArray.mean`. However, it is often necessary to preserve the coordinate information of the averaged dimension. In this case, {meth}`xarray.DataArray.qsel.average` can be used.

The following code first selects the data around the Fermi level, and calculates the average of the intensity over the energy axis. The coordinate `eV` is preserved in the resulting DataArray.

In [None]:
dat.sel(eV=slice(-0.05, 0.05)).qsel.average("eV")

## Masking

ERLabPy provides a way to mask data with arbitrary polygons.

:::{admonition} Work in Progress
:class: warning

This part of the user guide is still under construction. For now, see the API reference at {mod}`erlab.analysis.mask`. For the full list of packages and modules provided by ERLabPy, see [API Reference](../reference).

:::

## Interpolation

In addition to the
[powerful interpolation methods](https://docs.xarray.dev/en/latest/user-guide/interpolation.html)
provided by {mod}`xarray`, ERLabPy provides a convenient way to interpolate data along an
arbitrary path.

Consider a Γ-M-K-Γ high symmetry path given as a list of `kx` and `ky` coordinates:

In [None]:
import erlab.plotting as eplt
import matplotlib.pyplot as plt
import numpy as np

a = 6.97
kx = [0, 2 * np.pi / (a * np.sqrt(3)), 2 * np.pi / (a * np.sqrt(3)), 0]
ky = [0, 0, 2 * np.pi / (a * 3), 0]


dat.qsel(eV=-0.2).qplot(aspect="equal", cmap="Greys")
plt.plot(kx, ky, "o-")

The following code interpolates the data along this path with a step of 0.01 Å $^{-1}$ using {func}`slice_along_path <erlab.analysis.interpolate.slice_along_path>`.

In [None]:
import erlab.analysis as era

dat_sliced = era.interpolate.slice_along_path(
    dat, vertices={"kx": kx, "ky": ky}, step_size=0.01
)
dat_sliced

We can see that the data has been interpolated along the path. The new coordinate `path` contains the distance along the path, and `kx` and `ky`, formerly dimensions, are now coordinates expressed in terms of `path`.

The distance along the path can be calculated as the sum of the distances between consecutive points in the path.

In [None]:
dat_sliced.qplot(cmap="Greys")
eplt.fermiline()

# Distance between each pair of consecutive points
distances = np.linalg.norm(np.diff(np.vstack([kx, ky]), axis=-1), axis=0)
seg_coords = np.concatenate(([0], np.cumsum(distances)))

plt.xticks(seg_coords, labels=["Γ", "M", "K", "Γ"])
plt.xlim(0, seg_coords[-1])
for seg in seg_coords[1:-1]:
    plt.axvline(seg, ls="--", c="k", lw=1)
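
The same cumulative-distance idea can be used to generate evenly spaced sample points along a piecewise-linear path. The following is a minimal NumPy sketch; `sample_path` is a hypothetical helper written for illustration, and the exact sampling used by `slice_along_path` may differ.

```python
import numpy as np


def sample_path(vertices, step_size):
    """Return distances and points spaced by `step_size` along a polyline.

    `vertices` has shape (n_vertices, n_dims). Hypothetical helper.
    """
    vertices = np.asarray(vertices, dtype=float)
    # Length of each segment and cumulative distance at each vertex.
    seg_lengths = np.linalg.norm(np.diff(vertices, axis=0), axis=1)
    cumulative = np.concatenate(([0.0], np.cumsum(seg_lengths)))
    # Evenly spaced distances from the start of the path.
    n_samples = int(np.floor(cumulative[-1] / step_size)) + 1
    distance = np.arange(n_samples) * step_size
    # Linearly interpolate each coordinate as a function of distance.
    points = np.column_stack(
        [np.interp(distance, cumulative, vertices[:, i]) for i in range(vertices.shape[1])]
    )
    return distance, points


# Two segments of length 3 and 4, so the total path length is 7.
distance, points = sample_path([(0, 0), (3, 0), (3, 4)], step_size=1.0)
print(distance)
print(points)
```

Each row of `points` is a position on the path, and `distance` plays the role of the `path` coordinate described above.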

:::{note}

The {meth}`xarray.DataArray.qplot` method used to plot the data is an accessor that enables convenient plotting. You will learn more about it in the next section.

:::