# Slicing

Objects in scipp can be sliced in two ways. The general way to do this is by [positional indexing](#Positional-indexing) using indices as in numpy.
A second approach is to use [label-based indexing](#Label-based-indexing), which uses actual coordinate values for selection.

## Positional indexing

Data in a [variable](../generated/scipp.Variable.rst#scipp.Variable), [dataset](../generated/scipp.Dataset.rst#scipp.Dataset) or [data array](../generated/scipp.DataArray.rst#scipp.DataArray) can be indexed in a similar manner to NumPy and xarray.
The dimension to be sliced is specified using a dimension label and, in contrast to NumPy, positional dimension lookup is not available.
Positional indexing with an integer or an integer range is made via `__getitem__` and `__setitem__` with a dimension label as first argument.
This is available for variables, data arrays, datasets, as well as items of a dataset.
In all cases a *view* is returned, i.e., just like when slicing a [numpy.ndarray](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray) no copy is performed.
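
The NumPy behaviour referred to here can be checked directly. The following is a minimal sketch using only NumPy (the names `a` and `s` are illustrative); it shows what "view" means for a `numpy.ndarray` and does not exercise scipp itself:

```python
import numpy as np

a = np.zeros((2, 3, 4))
s = a[:, :, 1]                      # basic slicing returns a view, not a copy
s += 1.0                            # in-place modification of the view ...
shares = np.shares_memory(a, s)     # ... operates on the same underlying buffer
column_sum = a[:, :, 1].sum()       # so the change is visible through `a`
print(shares, column_sum)
```

Because the slice shares memory with the original array, the in-place addition changes the six selected elements of `a`.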

Consider the following variable:

In [None]:
import numpy as np
import scipp as sc

var = sc.array(
    dims=['z', 'y', 'x'],
    values=np.random.rand(2, 3, 4),
    variances=np.random.rand(2, 3, 4))
sc.show(var)

As when slicing a `numpy.ndarray`, the dimension `'x'` is removed since no range is specified:

In [None]:
s = var['x', 1]
sc.show(s)
print(s.dims, s.shape)

When a range is specified, the dimension is kept, even if it has extent 1:

In [None]:
s = var['x', 1:3]
sc.show(s)
print(s.dims, s.shape)

s = var['x', 1:2]
sc.show(s)
print(s.dims, s.shape)
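
For comparison, here is a sketch of the analogous NumPy operations, assuming the array axes are ordered as `(z, y, x)`. NumPy identifies the axis by position, whereas the scipp examples above identify it by label:

```python
import numpy as np

a = np.random.rand(2, 3, 4)   # axes ordered as (z, y, x)
dropped = a[:, :, 1]          # integer index: the last axis is removed
kept = a[:, :, 1:2]           # range of length 1: the last axis is kept
print(dropped.shape, kept.shape)
```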

Slicing can be chained arbitrarily:

In [None]:
s = var['x', 1:4]['y', 2]['x', 1]
sc.show(s)
print(s.dims, s.shape)

Slicing for datasets works in the same way, but some additional rules apply:

In [None]:
d = sc.Dataset(
    {'a': sc.array(dims=['x', 'y'], values=np.random.rand(2, 3)),
     'b': sc.array(dims=['y', 'x'], values=np.random.rand(3, 2)),
     'c': sc.array(dims=['x'], values=np.random.rand(2)),
     '0d-data': sc.scalar(1.0)},
    coords={
        'x': sc.array(dims=['x'], values=np.arange(2.0), unit=sc.units.m),
        'y': sc.array(dims=['y'], values=np.arange(3.0), unit=sc.units.m),
        'aux_x': sc.array(dims=['x'], values=np.arange(2.0), unit=sc.units.m),
        'aux_y': sc.array(dims=['y'], values=np.arange(3.0), unit=sc.units.m)})
sc.show(d)

As when slicing a variable, the sliced dimension is removed when slicing without range, and kept when slicing with range.

When slicing a dataset a number of other things happen as well:

- Any data item that does not depend on the sliced dimension is removed.
- Slicing **without range**:
  - The *coordinates* for the sliced dimension are *removed*.
- Slicing **with a range**:
  - The *coordinates* for the sliced dimension are *kept*.

The rationale behind this mechanism is as follows.
We may want to modify slices independently, e.g., by adding an offset to certain slices:

In [None]:
d['x', 0] += 1.0
d['x', 1] += 2.0

By excluding scalar items from the slice view (see below for a visual representation), we prevent unintentional addition of multiple offsets to the same scalar.

This is an important aspect and it is worthwhile to take some time and think through the mechanism.
Consider the following example, contrasting slicing with and without range:

- We slice dimension `'x'`, so the data item `'0d-data'` which does not depend on dimension `'x'` is not visible in the slice views.
- In the second case (without range) the coord for dimension `'x'` is also not part of the slice view.

Make sure to inspect the `dims` and `shape` of all variables (data and coordinates) of the resulting slice views (note the tooltip shown when moving the mouse over the name also contains this information):

In [None]:
# Range of length 1
sc.show(d['x', 1:2])
d['x', 1:2]

In [None]:
# No range
sc.show(d['x', 1])
d['x', 1]

Slicing a data item of a dataset should not bring any surprises.
Essentially this behaves like slicing a dataset with just a single data item:

In [None]:
sc.show(d['a']['x', 1:2])

Slicing and item access can be done in arbitrary order with identical results:

In [None]:
d['x', 1:2]['a'] == d['a']['x', 1:2]
d['x', 1:2]['a'].coords['x'] == d.coords['x']['x', 1:2]

## Label-based indexing

### Overview

Data in a [dataset](../generated/scipp.Dataset.rst#scipp.Dataset) or [data array](../generated/scipp.DataArray.rst#scipp.DataArray) can be selected by the coordinate value.
This is similar to [pandas.DataFrame.loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) in pandas.
Scipp leverages its ubiquitous support for physical units to provide label-based indexing in an intuitive manner, using the same syntax as [positional indexing](#Positional-indexing).
For example:

- `array['x', 0:3]` selects positionally, i.e., returns the first three elements along `'x'`.
- `array['x', 1.2*sc.units.m:1.3*sc.units.m]` selects by label, i.e., returns the elements along `'x'` falling between `1.2 m` and `1.3 m`.

That is, label-based indexing is made via `__getitem__` and `__setitem__` with a dimension label as first argument and a scalar [variable](../generated/scipp.Variable.rst#scipp.Variable) or a Python `slice()` as created by the colon operator `:` from two scalar variables.
In all cases a *view* is returned, i.e., just like when slicing a [numpy.ndarray](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray) no copy is performed.

Consider:

In [None]:
da = sc.DataArray(
    data=sc.array(dims=['year','x'], values=np.random.random((3, 7))),
    coords={
        'x': sc.array(dims=['x'], values=np.linspace(0.1, 0.9, num=7), unit=sc.units.m),
        'year': sc.array(dims=['year'], values=[2020,2023,2027])})
sc.show(da)
da

We can select a slice of `da` based on the `'year'` labels:

In [None]:
year = sc.scalar(2023)
da['year', year] 

In this case `2023` is the second element of the coordinate, so this is equivalent to positionally slicing `da['year', 1]`, and [the usual rules](#Positional-indexing) regarding dropping dimensions and converting dimension coordinates to attributes apply:

In [None]:
assert sc.is_equal(da['year', year], da['year', 1])

<div class="alert alert-warning">
    
**Warning**

It is **essential** not to mix up integers and scalar scipp variables containing an integer.
As in the above example, positional indexing yields different slices than label-based indexing.
    
</div>

<div class="alert alert-info">

Here, we created `year` using `sc.scalar`.
Alternatively, we could use `year = 2023 * sc.units.dimensionless`. This multiplication syntax is useful for dimensionful coordinates such as `'x'` in this case, see below.
    
</div>

For floating-point-valued coordinates selecting a single point would require an exact match, which is typically not feasible in practice.
Scipp does *not* do fuzzy matching in this case. Instead, an `IndexError` is raised:

In [None]:
x = 0.23 * sc.units.m # No x coordinate value at this point. Equivalent of sc.scalar(0.23, unit=sc.units.m)
try:
    da['x', x]
except IndexError as e:
    print(str(e))

For such coordinates we may thus use an *interval* to select a *range* of values using the `:` operator:

In [None]:
x_left = 0.1 * sc.units.m
x_right = 0.4 * sc.units.m
da['x', x_left:x_right]

The selection includes the bounds on the "left" but excludes the bounds on the "right", i.e., we select the half-open interval $x \in [x_{\text{left}},x_{\text{right}})$, closed on the left and open on the right.
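
To make the half-open convention concrete, the following sketch maps a label interval to a positional slice using `numpy.searchsorted`. It is an illustration under stated assumptions, not scipp's implementation: the helper `label_range_to_positions` is hypothetical and the coordinate is assumed to be sorted in ascending order.

```python
import numpy as np

def label_range_to_positions(coord, left, right):
    """Map the half-open label interval [left, right) to a positional slice.

    Hypothetical helper; assumes `coord` is sorted in ascending order.
    """
    start = np.searchsorted(coord, left, side='left')   # first index with coord >= left
    stop = np.searchsorted(coord, right, side='left')   # first index with coord >= right
    return slice(int(start), int(stop))

coord = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
first = label_range_to_positions(coord, 1.0, 3.0)
second = label_range_to_positions(coord, 3.0, 4.0)
print(coord[first], coord[second])
```

In this sketch the point at `3.0` belongs only to the second interval, since the right bound of the first interval is excluded.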

The half-open interval implies that we can select consecutive intervals without including any data point in both intervals:

In [None]:
x_mid = 0.2 * sc.units.m
sc.to_html(da['x', x_left:x_mid])
sc.to_html(da['x', x_mid:x_right])

Just like when slicing positionally, one of the bounds can be omitted to include either everything from the start or everything until the end:

In [None]:
da['x', :x_right]

Coordinates used for label-based indexing must be monotonically ordered.
While it is natural to think of slicing in terms of ascending coordinates, the slicing mechanism also works for descending coordinates.

### Bin-edge coordinates

Bin-edge coordinates are handled slightly differently from standard coordinates in label-based indexing.
Consider:

In [None]:
da = sc.DataArray(
    data = sc.array(dims=['x'], values=np.random.random(7)),
    coords={
        'x': sc.array(dims=['x'], values=np.linspace(1.0, 2.0, num=8), unit=sc.units.m)})
da

Here `'x'` is a bin-edge coordinate, i.e., its length exceeds the length of the data along `'x'` by one.
Label-based slicing with a single coord value finds and returns the bin that contains the given coord value:

In [None]:
x = 1.5 * sc.units.m
da['x', x]

If an interval is provided when slicing with a bin-edge coordinate, the range of bins *containing* the interval bounds (*including* the left as well as the right bin) is selected:

In [None]:
x_left = 1.3 * sc.units.m
x_right = 1.7 * sc.units.m
da['x', x_left:x_right]
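
The bin lookup described in this section can be sketched with plain NumPy. This is a rough illustration, not scipp's implementation: the helpers `bin_containing` and `bins_for_interval` are hypothetical, the edges are assumed to be ascending, and each bin is assumed to be half-open, `[edges[i], edges[i+1])`, which the text above does not specify.

```python
import numpy as np

def bin_containing(edges, value):
    """Index i of the bin [edges[i], edges[i+1]) containing `value`.

    Hypothetical helper; assumes ascending edges and half-open bins.
    """
    return int(np.searchsorted(edges, value, side='right')) - 1

def bins_for_interval(edges, left, right):
    """Positional slice covering the bins containing `left` and `right`, both included."""
    return slice(bin_containing(edges, left), bin_containing(edges, right) + 1)

edges = np.linspace(1.0, 2.0, num=8)   # 8 edges, i.e. 7 bins, as in the example above
single = bin_containing(edges, 1.5)
interval = bins_for_interval(edges, 1.3, 1.7)
print(single, interval)
```

Under these assumptions the value `1.5` falls into the bin with index 3, and the interval from `1.3` to `1.7` covers the bins with indices 2 through 4.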

### Limitations

Label-based indexing is *not* supported for:

- Multi-dimensional coordinates.
- Non-monotonic coordinates.

The first is a fundamental limitation since a slice cannot be defined in such a case.
The second will likely be supported in the future to some extent.