# Scipp demo

In [None]:
import numpy as np

We create a two-dimensional array `a` containing 4 rows, each containing 3 elements.

In [None]:
a = np.arange(12.).reshape(4, 3)
a

A very common operation is selecting rows or columns of these arrays,
and this is easily achieved by 'slicing' the array using indices inside square brackets.

A number represents a single element along a dimension,
while a colon `:` represents the entire range along that dimension.

A recurring annoyance among users is not being able to remember which dimension represents columns,
and which one represents rows.

E.g. to select the first row, do we have to write `a[:, 0]` or `a[0, :]` (note that row/column indices begin at `0`)?

For our present array, inspecting the `shape` of the array helps answer this:

In [None]:
a.shape

Since we have 4 rows, and each row contains 3 elements,
we know from the `shape` that the first dimension contains 4 entries,
and therefore corresponds to the row dimension.

Hence selecting the first row of elements is done via

In [None]:
a[0, :]

We can also verify that swapping the order of the `0` and the `:` gives us the first column instead:

In [None]:
a[:, 0]
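As a quick check of the reasoning above, the shapes of the two slices can be compared directly. This is a minimal sketch that rebuilds the same `a` array; the names `row` and `column` are introduced here purely for illustration.

```python
import numpy as np

a = np.arange(12.).reshape(4, 3)

# Fix the index along the first dimension, keep everything along the second.
row = a[0, :]
# Keep everything along the first dimension, fix the index along the second.
column = a[:, 0]

# A row holds one entry per column (3), a column holds one entry per row (4).
print(row.shape)     # (3,)
print(column.shape)  # (4,)
```

Because the two dimensions have different lengths, the shape of each slice is enough to tell a row from a column.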

### When both dimensions have the same length

Now consider the case where both dimensions have the same length (6 in this case):

In [None]:
b = np.arange(36.).reshape(6, 6)
b

Looking at the shape of the array no longer gives us a clue as to which dimension should be sliced to select a row.

It might be obvious from simply inspecting the values inside the array for this simple example,
but this is not always possible when the values correspond to real data,
and each dimension can reach lengths of several thousand.

So in practice, users can waste a lot of time through trial and error,
trying to find the correct dimension to slice.
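The ambiguity can be made concrete with a short sketch. It rebuilds the same square array `b`; the names `first_row` and `first_column` are illustrative only.

```python
import numpy as np

b = np.arange(36.).reshape(6, 6)

first_row = b[0, :]
first_column = b[:, 0]

# Both slices are one-dimensional with 6 elements,
# so the shape cannot tell them apart.
print(first_row.shape == first_column.shape)  # True

# Only the values reveal which slice is which.
print(first_row.tolist())
print(first_column.tolist())
```

Here the values happen to be a simple running index, so the difference is easy to spot; with measured data there is usually no such hint.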

## Introducing labeled dimensions

In Scipp, we introduced the concept of labeling each array dimension with a unique identifier,
which both helps identify the rows and the columns, and gives physical meaning to each dimension.

Scipp arrays are essentially wrappers around NumPy arrays.
To create a 2D array with dimensions `x` and `y`, we write

In [None]:
import scipp as sc
c = sc.array(dims=['y', 'x'], values=b)
c

Note the re-use of the `b` NumPy array inside the Scipp array constructor.

Scipp provides small graphical representations of the arrays through the `show` function:

In [None]:
sc.show(c)

The `x` and `y` dimension labels are visible on each side of the square.

With Scipp, slicing is performed by first giving the name/label of the dimension one wishes to slice,
and then the index to be selected in that dimension.

Hence, if I wish to select the first row, I simply have to slice the first element along the `y` dimension:

In [None]:
c_slice = c['y', 0]
c_slice

The output above tells me that it is an array with one dimension (`x`) that contains 6 elements,
with values `0` to `5`
(you can click on the &#9923; symbol on the right-hand side to expand the view onto the values).

In [None]:
sc.show(c_slice)

This slicing syntax has the added benefit that the intention is **clear not only to the person writing the code,
but also to another person reading it**.
It is immediately understandable what the slicing is trying to achieve,
without having to scroll up (possibly a long way) to the start of the notebook
to see how the array was first created.

## Physical units

Scipp arrays can also have physical units, and these are automatically handled in operations.

In [None]:
c.unit = 's'
d = sc.array(dims=['y', 'x'], values=np.random.random((6, 6)), unit='m')
d / c

Dividing `d` by `c` gives us an output in units of `m/s`.

Units also prevent users from performing invalid operations,
e.g. trying to add meters and seconds together,
a mistake that can be difficult to detect with NumPy.

In [None]:
c + d

## Adding coordinates

It is also possible to add coordinates to an array,
which label each dimension and give a physical scale to each dimension
(just as the unit of the array gives a scale to the data contained in the array).

Below, we attach `x` and `y` coordinates to our `c` array.
The `x` coordinate ranges from `1` to `6`, while the `y` coordinate ranges from `10` to `100`
(note that coordinates can also have units).

In [None]:
e = sc.DataArray(data=c,
                 coords={'x': sc.linspace(dim='x', start=1., stop=6, num=6, unit='m'),
                         'y': sc.linspace(dim='y', start=10., stop=100, num=6, unit='m')})
sc.show(e)

Coordinates are used for many things,
including preventing operations between two arrays with non-matching coordinates,
since such arrays cover different regions of the data space.

Another benefit of coordinates is that the data array now contains enough information to be visualized.
In other words, the data can plot itself:
it knows it has two dimensions, and it knows what scaling to apply to each dimension.

In [None]:
sc.plot(e)