# Multi-dimensional datasets

This is the continuation of [Introduction to Scipp](introduction.ipynb).

## Creation, slicing, and visualization

In [None]:
import numpy as np
import scipp as sc
from scipp import Dim

To create variables with more than one dimension, we specify a list of dimension labels and provide data with a corresponding shape. When variables are inserted into a dataset, their dimension extents have to match, but individual variables may have a transposed memory layout. In the example below, `alice` is stored with dimensions (Z, Y, X) while `bob` is stored as (X, Z).

In [None]:
d = sc.Dataset()
# Bin-edge coordinates: 11 edges for 10 bins along each dimension
d.set_coord(Dim.X, sc.Variable([Dim.X], values=np.arange(11.0), unit=sc.units.m))
d.set_coord(Dim.Y, sc.Variable([Dim.Y], values=np.arange(11.0), unit=sc.units.m))
d.set_coord(Dim.Z, sc.Variable([Dim.Z], values=np.arange(11.0), unit=sc.units.m))
# alice is 3D with dimension order (Z, Y, X); bob is 2D with the order (X, Z)
d["alice"] = sc.Variable([Dim.Z, Dim.Y, Dim.X], values=np.random.rand(10, 10, 10),
                         variances=0.1 * np.random.rand(10, 10, 10))
d["bob"] = sc.Variable([Dim.X, Dim.Z], values=np.arange(0.0, 10.0, 0.1).reshape(10, 10),
                       variances=0.1 * np.random.rand(10, 10))

Note that in this example the coordinates exceed the shape of the data by 1 in each dimension (11 coordinate values for 10 data points).
This means that the coordinates represent bin edges:

In [None]:
d
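The relationship between the two extents can be illustrated with plain `numpy`. The helper `coord_kind` below is hypothetical (it is not part of scipp) and only mirrors the rule stated above: a coordinate that is one element longer than the data is read as bin edges, from which bin centers and bin widths follow.

```python
import numpy as np


def coord_kind(coord, data_extent):
    """Hypothetical helper: classify a 1D coordinate relative to a data axis."""
    if len(coord) == data_extent + 1:
        return "bin-edges"
    if len(coord) == data_extent:
        return "points"
    raise ValueError("coordinate extent does not match the data extent")


edges = np.arange(11.0)  # same values as the X, Y, and Z coordinates of d (in m)
data_extent = 10         # alice has an extent of 10 along each dimension

kind = coord_kind(edges, data_extent)     # "bin-edges"
centers = 0.5 * (edges[:-1] + edges[1:])  # 0.5, 1.5, ..., 9.5
widths = np.diff(edges)                   # every bin is 1.0 m wide
```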

To slice in multiple dimensions, we can simply chain the slicing notation used previously for 1D data.
This gives us several options for inspecting and visualizing our data, starting with a table:

In [None]:
sc.table(d[Dim.X, 5][Dim.Z, 2])
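With plain `numpy`, the same labelled selection has to be translated into axis positions separately for each array, because `alice` is stored as (Z, Y, X) and `bob` as (X, Z). The sketch below uses a hypothetical helper, `slice_by_label`, to mimic `d[Dim.X, 5][Dim.Z, 2]` on two bare arrays; it is only an illustration of the bookkeeping, not how scipp is implemented.

```python
import numpy as np


def slice_by_label(array, dims, **index):
    """Hypothetical helper: index a numpy array by dimension label.

    `dims` gives the label of each axis in memory order. Integer indices
    drop the axis; labels that are not mentioned are kept in full.
    """
    key = tuple(index.get(dim, slice(None)) for dim in dims)
    remaining = [dim for dim in dims if not isinstance(index.get(dim, slice(None)), int)]
    return array[key], remaining


alice = np.random.rand(10, 10, 10)  # dims (z, y, x), like d["alice"]
bob = np.random.rand(10, 10)        # dims (x, z): a different axis order

# The same labelled selection applied to both memory layouts
alice_slice, alice_dims = slice_by_label(alice, ["z", "y", "x"], x=5, z=2)  # 1D along y
bob_slice, bob_dims = slice_by_label(bob, ["x", "z"], x=5, z=2)             # scalar
```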

We can plot individual items of a `Dataset` by passing them to `sc.plot`:

In [None]:
sc.plot(d["bob"])

You can also plot the uncertainties alongside the values by passing `show_variances=True`:

In [None]:
sc.plot(d["bob"], show_variances=True)

Plotting a 3D data cube shows a 2D image with a slider to navigate through the third dimension:

In [None]:
sc.plot(d["alice"])

Finally, by slicing out a 1D variable, we obtain a 1D plot:

In [None]:
sc.plot(d[Dim.X, 8][Dim.Y, 2])

Note that this is now plotted as a histogram, since the coordinates in the dataset are bin edges, in contrast to the 1D data plotted in Part 1.

Operations automatically broadcast based on dimension labels. In contrast to `numpy` or MATLAB, there is no need to keep track of the dimension order.

In [None]:
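# bob has dims (X, Z): it is matched to alice's (Z, Y, X) by label and broadcast along Y
d["alice"] -= d["bob"]
# Subtract the Y=5 slice from every Y layer (broadcast along Y)
d["alice"] -= d["alice"][Dim.Y, 5]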
sc.plot(d["alice"])
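For comparison, here is what `d["alice"] -= d["bob"]` requires in plain `numpy`. Since every extent in this example is 10, positional broadcasting does not raise an error: it silently treats `bob`'s (X, Z) axes as (Y, X). The helper `align` is hypothetical and shows the transpose-and-insert-axis step that label-based broadcasting takes care of for you.

```python
import numpy as np


def align(array, dims, target_dims):
    """Hypothetical helper: transpose `array` (axes labelled by `dims`) and insert
    length-1 axes so that it broadcasts against axes labelled `target_dims`."""
    order = [dims.index(dim) for dim in target_dims if dim in dims]
    shape = [array.shape[dims.index(dim)] if dim in dims else 1 for dim in target_dims]
    return np.transpose(array, order).reshape(shape)


alice = np.random.rand(10, 10, 10)  # dims (z, y, x)
bob = np.random.rand(10, 10)        # dims (x, z)

# Positional broadcasting aligns bob's axes with alice's last two axes (y, x):
# no error, but the wrong pairing of elements.
wrong = alice - bob

# Label-aware version: bob is reordered to (z, x) and reshaped to (10, 1, 10).
right = alice - align(bob, ["x", "z"], ["z", "y", "x"])
```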

### Exercise 1

Remove the surface layer of the volume, i.e., remove the first and last slices along each dimension.

### Solution 1

In [None]:
d = d[Dim.X, 1:-1][Dim.Y, 1:-1][Dim.Z, 1:-1].copy()
d

Note the important call to `copy()`.
Without it, `d` would only be a multi-dimensional slice (a view) of the larger volume, which is kept alive in the background. This wastes memory and prevents further modifications, such as inserting other variables.
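The same view-versus-copy distinction exists in `numpy`, which makes it easy to check what "kept alive" means. This is a numpy analogy only, assuming scipp slices behave like numpy views in the respect described above; the scipp-specific restriction on inserting variables has no numpy counterpart.

```python
import numpy as np

volume = np.random.rand(10, 10, 10)

# Basic slicing returns a view: no data is copied, and the view holds a
# reference that keeps the full 10x10x10 buffer alive.
inner_view = volume[1:-1, 1:-1, 1:-1]

# copy() allocates a new, compact 8x8x8 array that owns its own memory.
inner_copy = volume[1:-1, 1:-1, 1:-1].copy()

print(np.shares_memory(inner_view, volume))                # True
print(np.shares_memory(inner_copy, volume))                # False
print(inner_view.flags.owndata, inner_copy.flags.owndata)  # False True
```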

## More advanced operations with multi-dimensional datasets
Operations like `concatenate` and `sort` work just as they do for one-dimensional datasets.

### Exercise 2
- Try to concatenate the dataset with itself along the X dimension. Why does this fail?
- Make a copy of the dataset, add an offset to the X coordinate to fix the issue, and try to concatenate again.

### Solution 2

In [None]:
try:
    d = sc.concatenate(d, d, Dim.X)
except RuntimeError:
    print("Failed as expected!")

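With a data extent of `8` along X (as in this case), the bin edges have an extent of `9`.
Naive concatenation would thus lead to a new data extent of `16` and a coordinate extent of `18`, which is meaningless and is therefore prevented.
Instead, `concatenate` merges the last edge of the first input with the first edge of the second input, if they are compatible.
When concatenating `d` with itself, the last X edge (9 m) does not match the first X edge (1 m), so the operation fails. Shifting the X coordinate of a copy by 8 m makes the edges line up: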

In [None]:
offset = d.copy()
offset.coords[Dim.X] += sc.Variable(8.0, unit=sc.units.m)
combined = sc.concatenate(d, offset, Dim.X)
sc.plot(combined['alice'])
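The edge-merging rule can be sketched for a single 1D histogram in plain `numpy`. The function `concat_histograms` is hypothetical and assumes that "compatible" means the two boundary edges are equal; the edge values (1 m to 9 m) assume the X coordinate of `d` after Solution 1.

```python
import numpy as np


def concat_histograms(data_a, edges_a, data_b, edges_b):
    """Hypothetical helper: concatenate two 1D histograms along their binned axis.

    The last edge of `a` must equal the first edge of `b`. That shared edge is
    stored only once, so the result again has len(data) + 1 edges.
    """
    if edges_a[-1] != edges_b[0]:
        raise ValueError("last edge of first input != first edge of second input")
    data = np.concatenate([data_a, data_b])
    edges = np.concatenate([edges_a, edges_b[1:]])
    return data, edges


edges = np.arange(1.0, 10.0)  # 9 edges: 1 m ... 9 m
data = np.random.rand(8)      # 8 bins

try:
    concat_histograms(data, edges, data, edges)  # 9 != 1, fails like sc.concatenate(d, d, Dim.X)
except ValueError as err:
    print("Failed as expected:", err)

# Shift the second input by 8 m so that its first edge (9 m) matches.
combined_data, combined_edges = concat_histograms(data, edges, data, edges + 8.0)
# combined_data has 16 bins, combined_edges has 17 edges (1 m ... 17 m)
```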

Another available operation is `rebin`.
It only applies to count data or count-density data, so we first have to set an appropriate unit:

In [None]:
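# Coarser bin edges: every second X edge, in the same unit as the existing coordinate
new_x = sc.Variable([Dim.X], values=d.coords[Dim.X].values[::2], unit=sc.units.m)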
d['alice'].unit = sc.units.counts
d['bob'].unit = sc.units.counts
d = sc.rebin(d, new_x)
d
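Why the unit matters: counts are extensive, so when bins are merged or split, each old bin's counts can be redistributed in proportion to its overlap with the new bins. The sketch below is a hypothetical 1D `numpy` version of that idea, assuming counts are spread uniformly within each old bin; it is not scipp's implementation. With `new_x` taking every second edge, each new bin simply sums two old bins.

```python
import numpy as np


def rebin_counts(counts, old_edges, new_edges):
    """Hypothetical helper: redistribute 1D histogram counts onto new bin edges.

    Assumes the counts are uniformly distributed within each old bin.
    """
    counts = np.asarray(counts, dtype=float)
    old_edges = np.asarray(old_edges, dtype=float)
    new_edges = np.asarray(new_edges, dtype=float)
    out = np.zeros(len(new_edges) - 1)
    for i in range(len(old_edges) - 1):
        lo, hi = old_edges[i], old_edges[i + 1]
        for j in range(len(new_edges) - 1):
            # Length of the overlap between old bin i and new bin j
            overlap = min(hi, new_edges[j + 1]) - max(lo, new_edges[j])
            if overlap > 0:
                out[j] += counts[i] * overlap / (hi - lo)
    return out


old_edges = np.arange(1.0, 10.0)  # 9 edges -> 8 bins, like the X edges of d
counts = np.arange(8.0)           # 0, 1, ..., 7 counts per bin
new_edges = old_edges[::2]        # 1, 3, 5, 7, 9: every second edge, as for new_x

coarse = rebin_counts(counts, old_edges, new_edges)  # pairs summed: 1, 5, 9, 13
```

The total number of counts is preserved, which would not hold if the data were, for example, a density that had not been multiplied by the bin width.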

## Interaction with `numpy`
Variables in a dataset are exposed in a `numpy`-compatible buffer format, so we can pass them directly to `numpy` functions.

In [None]:
d['alice'] = np.sin(d['alice'])

Direct access to the `numpy`-like underlying data array is possible using the `values` property. This is now a multi-dimensional array:

In [None]:
d['alice'].values

### Exercise 3
1. Use `sc.mean` to compute the mean of the data for Alice along the Z dimension.
2. Do the same with `numpy`. What complications do you encounter that are not present when using the dataset?

### Solution 3

In [None]:
help(sc.mean)

In [None]:
mean = sc.mean(d['alice'], Dim.Z)

When using `numpy` to compute the mean:
- We must remember (or look up) which axis corresponds to the Z dimension.
- We need separate calls for the values and the variances.
- We need to scale the variances manually: the variance of the mean is the sum of the variances divided by the square of the number of data points, i.e., the mean of the variances divided by the number of data points.
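The scaling follows from standard propagation of uncertainties, assuming the $N$ data points $x_i$ along Z are independent (correlations are ignored) and have variances $\sigma_i^2$:

$$
\sigma_{\bar{x}}^2 = \operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) = \frac{1}{N^2}\sum_{i=1}^{N}\sigma_i^2 = \frac{1}{N}\cdot\frac{1}{N}\sum_{i=1}^{N}\sigma_i^2
$$

The last factor is exactly what `np.mean` over the variances returns, so that result still has to be divided by $N$ once more.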

In [None]:
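# alice was created with dims (Z, Y, X), so Z is axis 0
np_value = np.mean(d['alice'].values, axis=0)
np_variance = np.mean(d['alice'].variances, axis=0)
# Variance of the mean: the mean of the variances divided by N, the extent of Z
np_variance /= d.dimensions[Dim.Z]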

Continue to [Part 3 - Neutron data](neutron-data.ipynb) to see how datasets are used with neutron-event data.