# Slicing

Objects in scipp can be sliced in two ways. The general way is *positional indexing*, which uses integer indices that denote position. A second, more limited approach is *label based indexing*, which selects by coordinate value.

## Positional indexing

Data in a [variable](../generated/scipp.Variable.rst#scipp.Variable), [dataset](../generated/scipp.Dataset.rst#scipp.Dataset) or [data array](../generated/scipp.DataArray.rst#scipp.DataArray) can be indexed in a similar manner to NumPy and xarray.
The dimension to be sliced is specified using a dimension label and, in contrast to NumPy, positional dimension lookup is not available.
Positional indexing with an integer or an integer range is done via `__getitem__` and `__setitem__`, with a dimension label as the first argument.
This is available for variables, data arrays, datasets, as well as items of a dataset.
In all cases a *view* is returned, i.e., just like when slicing a [numpy.ndarray](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray) no copy is performed.
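The NumPy behaviour this is compared to can be checked with a small sketch. It uses only NumPy (not scipp), and the array `a` is an illustrative example that does not appear elsewhere in this text:

```python
import numpy as np

a = np.zeros((2, 3))

# Basic slicing of an ndarray returns a view onto the same memory, not a copy.
view = a[:, 1]

# Modifying the view in place therefore modifies the original array too.
view += 1.0

shares = np.shares_memory(a, view)
print(a)
print(shares)
```

Because no copy is made, an in-place operation on a slice is a convenient way to update part of a larger object.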

Consider the following variable:

In [None]:
import numpy as np
import scipp as sc

var = sc.Variable(
    dims=['z', 'y', 'x'],
    values=np.random.rand(2, 3, 4),
    variances=np.random.rand(2, 3, 4))
sc.show(var)

As when slicing a `numpy.ndarray`, the dimension `'x'` is removed since no range is specified:

In [None]:
s = var['x', 1]
sc.show(s)
print(s.dims, s.shape)

When a range is specified, the dimension is kept, even if it has extent 1:

In [None]:
s = var['x', 1:3]
sc.show(s)
print(s.dims, s.shape)

s = var['x', 1:2]
sc.show(s)
print(s.dims, s.shape)

Slicing can be chained arbitrarily:

In [None]:
s = var['x', 1:4]['y', 2]['x', 1]
sc.show(s)
print(s.dims, s.shape)
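The bookkeeping of dimension labels and extents in a chained slice can be traced by hand. The following is a minimal pure-Python sketch of the rules described above, not scipp's implementation; the helper `slice_dims` is hypothetical and only handles non-negative integer indices:

```python
def slice_dims(dims, shape, dim, index):
    """Return (dims, shape) after slicing dimension `dim` with `index`.

    Hypothetical helper for illustration only.
    """
    dims, shape = list(dims), list(shape)
    pos = dims.index(dim)
    if isinstance(index, slice):
        # A range keeps the dimension, even if the resulting extent is 1.
        shape[pos] = len(range(*index.indices(shape[pos])))
    else:
        if not 0 <= index < shape[pos]:
            raise IndexError(f"index {index} out of range for {dim!r}")
        # A single integer removes the dimension.
        del dims[pos]
        del shape[pos]
    return dims, shape


dims, shape = ['z', 'y', 'x'], [2, 3, 4]
dims, shape = slice_dims(dims, shape, 'x', slice(1, 4))  # like var['x', 1:4]
dims, shape = slice_dims(dims, shape, 'y', 2)            # like [...]['y', 2]
dims, shape = slice_dims(dims, shape, 'x', 1)            # like [...]['x', 1]
print(dims, shape)
```

Each step only needs the dimension label, so the order of `'z'`, `'y'` and `'x'` in memory never has to be known by the caller.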

Slicing for datasets works in the same way, but some additional rules apply:

In [None]:
d = sc.Dataset(
    {'a': sc.Variable(dims=['x', 'y'], values=np.random.rand(2, 3)),
     'b': sc.Variable(dims=['y', 'x'], values=np.random.rand(3, 2)),
     'c': sc.Variable(dims=['x'], values=np.random.rand(2)),
     '0d-data': sc.Variable(1.0)},
    coords={
        'x': sc.Variable(['x'], values=np.arange(2.0), unit=sc.units.m),
        'y': sc.Variable(['y'], values=np.arange(3.0), unit=sc.units.m),
        'aux_x': sc.Variable(['x'], values=np.arange(2.0), unit=sc.units.m),
        'aux_y': sc.Variable(['y'], values=np.arange(3.0), unit=sc.units.m)})
sc.show(d)

As when slicing a variable, the sliced dimension is removed when slicing without range, and kept when slicing with range.

When slicing a dataset a number of other things happen as well:

- Any data item that does not depend on the sliced dimension is removed.
- Slicing **without range**:
  - The *coordinates* for the sliced dimension are *removed*.
- Slicing **with a range**:
  - The *coordinates* for the sliced dimension are *kept*.

The rationale behind this mechanism is as follows.
We may want to modify slices independently, e.g., by adding an offset to certain slices:

In [None]:
d['x', 0] += 1.0
d['x', 1] += 2.0

By excluding scalar items from the slice view (see below for a visual representation), we prevent unintentional addition of multiple offsets to the same scalar.
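The problem being avoided can be illustrated with plain Python numbers. This is only a thought experiment with made-up variable names, contrasting a hypothetical design in which a scalar is part of every slice with the behaviour described above:

```python
offsets = [1.0, 2.0]   # one offset per slice along 'x'
column = [0.0, 0.0]    # stands in for an item that depends on 'x'
scalar = 1.0           # stands in for an item that does not depend on 'x'

# Each slice of the column receives exactly one offset.
for i, offset in enumerate(offsets):
    column[i] += offset

# Hypothetical design: the scalar is visible in every slice,
# so every in-place addition is applied to it as well.
scalar_if_included = scalar
for offset in offsets:
    scalar_if_included += offset

# Design described in the text: the scalar is not part of any slice view.
scalar_if_excluded = scalar

print(column, scalar_if_included, scalar_if_excluded)
```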

This is an important aspect and it is worthwhile to take some time and think through the mechanism.
Consider the following example, contrasting slicing with and without range:

- We slice dimension `'x'`, so the data item `'0d-data'` which does not depend on dimension `'x'` is not visible in the slice views.
- In the second case (without range) the coord for dimension `'x'` is also not part of the slice view.

Make sure to inspect the `dims` and `shape` of all variables (data and coordinates) of the resulting slice views (note that the tooltip shown when moving the mouse over the name also contains this information):

In [None]:
# Range of length 1
sc.show(d['x', 1:2])
d['x', 1:2]

In [None]:
# No range
sc.show(d['x', 1])
d['x', 1]

Slicing a data item of a dataset should not bring any surprises.
Essentially this behaves like slicing a dataset with just a single data item:

In [None]:
sc.show(d['a']['x', 1:2])

Slicing and item access can be done in arbitrary order with identical results:

In [None]:
d['x', 1:2]['a'] == d['a']['x', 1:2]
d['x', 1:2]['a'].coords['x'] == d.coords['x']['x', 1:2]

## Label based indexing

Data in a [dataset](../generated/scipp.Dataset.rst#scipp.Dataset) or [data array](../generated/scipp.DataArray.rst#scipp.DataArray) can be sliced by coordinate value. This is similar to [pandas.DataFrame.loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html).
Label based indexing with a [variable](../generated/scipp.Variable.rst#scipp.Variable) or a range of [variables](../generated/scipp.Variable.rst#scipp.Variable) is done via `__getitem__` and `__setitem__`, with a dimension label as the first argument.
This is available for data arrays and datasets, as well as items of a dataset, but not for variables, since coordinates are required and a variable has none.
In all cases a *view* is returned, i.e., just like when slicing a [numpy.ndarray](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray) no copy is performed.



Let's create a simple `data array`. It has dimension coordinates `x` and `y`:

In [None]:
da = sc.DataArray(data = sc.Variable(dims=['x', 'y'], values=np.arange(8.0).reshape(2, 4)),
    coords={
        'x': sc.Variable(['x'], values=np.flip(np.arange(2.0)), unit=sc.units.m),
        'y': sc.Variable(['y'], values=np.arange(4.0), unit=sc.units.m)})
sc.show(da)

Let's say we wish to slice the `data array` where the y coord is exactly at the point 0.0 meters, expressed as a variable. Note that the data in the output no longer depends on y after the point slicing operation.

In [None]:
point_value = 0.0 * sc.units.m
da['y', point_value]

Our y-coordinate represents point data rather than bin edges. This means that when slicing with a point value, the value must exactly match a coordinate value. If it does not, an `IndexError` is raised.

In [None]:
point_value = 0.1 * sc.units.m  # No y coordinate value at this point
try:
    da['y', point_value]
except IndexError as e:
    print(str(e))

We can also slice a range, just like with positional indexing above. With range slicing, the start and stop do not need to match a coordinate value exactly. The selection includes the bound on the left but excludes the bound on the right (`closed` on the left and `open` on the right). So for ascending coordinates the selection is `lb <= x < ub`, where `x` is the coordinate value and `lb, ub` are the lower and upper bounds of the range respectively. This can also be described by the half-open interval `[lb, ub)`.

In [None]:
lb = 0.0 * sc.units.m
ub = 1.0 * sc.units.m
da['y', lb:ub]
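For an ascending coordinate, the half-open selection rule can be reproduced with a binary search. The following is a sketch in plain Python using the standard `bisect` module, not scipp's implementation; the helper `select_half_open` and the bare float coordinates (without units) are assumptions made for illustration:

```python
from bisect import bisect_left


def select_half_open(coords, lb, ub):
    """Return (start, stop) indices so that coords[start:stop] holds lb <= x < ub.

    Assumes `coords` is sorted in ascending order. Hypothetical helper.
    """
    start = bisect_left(coords, lb)  # first index with coords[i] >= lb
    stop = bisect_left(coords, ub)   # first index with coords[i] >= ub
    return start, stop


y = [0.0, 1.0, 2.0, 3.0]

start, stop = select_half_open(y, 0.0, 1.0)
print(y[start:stop])  # the upper bound 1.0 itself is excluded

start, stop = select_half_open(y, 0.5, 2.5)
print(y[start:stop])  # bounds need not coincide with coordinate values
```

A bound that falls between two coordinate values simply acts as a cut between them, which is why no exact match is needed.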

Because the upper bound is excluded, both slice operations below select the same single y-coordinate value:

In [None]:
first_coord = da.coords['y']['y', 0]
next_coord = da.coords['y']['y', 1]
# Midpoint between the first two coordinate values
mid_coord = first_coord + (next_coord - first_coord) / 2.0
assert sc.is_equal(da['y', first_coord:next_coord], da['y', first_coord:mid_coord])

Coordinates used for label based indexing must be monotonically ordered. While it is natural to think of slicing in terms of ascending coordinates, the slicing mechanism also works for descending coordinates. The `x` coordinate of the `data array` is descending, so slicing from its first value (1.0 m) onward selects the entire array:

In [None]:
start = 1.0 * sc.units.m
assert sc.is_equal(da['x', start:], da)

### Label based indexing on bin edge coordinates

As mentioned above, label based indexing is valid for `datasets` and `data arrays`. We construct a simple `data array` to illustrate how this works with bin edges. In this `data array` the y-coordinate holds bin edges, so it has one more element than the data has along `y`.

In [None]:
da = sc.DataArray(data = sc.Variable(dims=['x', 'y'], values=np.arange(8.0).reshape(2, 4)),
    coords={
        'x': sc.Variable(['x'], values=np.arange(2.0), unit=sc.units.m),
        'y': sc.Variable(['y'], values=np.arange(5.0), unit=sc.units.m)})
sc.show(da)

Label based slicing without a range finds and returns the bin that contains the given coord value.

In [None]:
mid_point = 0.5 * sc.units.m
da['y', mid_point]
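Finding the bin that contains a value amounts to a search in the sorted edge array. The sketch below uses the standard `bisect` module on plain floats and is not scipp's implementation. The helper `find_bin` is hypothetical, and two details are assumptions not stated in this text: bins are treated as closed on the left and open on the right, and a value outside all bins raises `IndexError`:

```python
from bisect import bisect_right


def find_bin(edges, value):
    """Return the index i of the bin with edges[i] <= value < edges[i + 1].

    Assumes ascending `edges`; left-closed bins are an assumption.
    """
    if not edges[0] <= value < edges[-1]:
        raise IndexError(f"{value} is outside the range covered by the bin edges")
    # bisect_right gives the first index whose edge is > value;
    # the bin containing value starts one position earlier.
    return bisect_right(edges, value) - 1


edges = [0.0, 1.0, 2.0, 3.0, 4.0]  # 5 edges delimit 4 bins
print(find_bin(edges, 0.5))
print(find_bin(edges, 3.9))
```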

Range based slicing is also valid on bin edge coordinates.

In [None]:
start = 1.1 * sc.units.m
sc.show(da['y', start:])

### Label based indexing caveats

At the time of writing, label based indexing is only possible under certain conditions:

* The coordinate selected for slicing must be monotonic; otherwise it needs to be sorted first.
* The coordinate cannot be multi-dimensional.
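The first condition can be checked, and if necessary repaired, before slicing. The following sketch works on plain Python lists rather than scipp objects; the helpers `is_monotonic` and `sort_by_coord` are hypothetical, and treating repeated values as monotonic (non-strict ordering) is an assumption:

```python
def is_monotonic(values):
    """True if values are entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def sort_by_coord(coord, data):
    """Reorder coord and data together so that coord becomes ascending."""
    order = sorted(range(len(coord)), key=coord.__getitem__)
    return [coord[i] for i in order], [data[i] for i in order]


coord = [2.0, 0.0, 1.0]
data = ['c', 'a', 'b']

print(is_monotonic(coord))     # unordered, so label based slicing would not apply
coord, data = sort_by_coord(coord, data)
print(coord, data)
print(is_monotonic(coord))
```

Sorting the data together with its coordinate keeps each value attached to its label, which is what makes the subsequent coordinate lookup meaningful.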