<img src='images/scipp-logo.png' width="400" height="400" style="display: block; margin-left: auto; margin-right: auto; width: 640px;">

<div style="text-align: center;">Multi-dimensional data arrays with labeled dimensions</div>
<br>
<br>

**Jan-Lukas Wynen** (<i class="fa-solid fa-envelope"></i> jan-lukas.wynen@ess.eu)
<br>

https://scipp.github.io/

# Data Reduction at ESS

<img src='images/software-stack.svg' style="display: block; margin-left: auto; margin-right: auto; width: 75%;">

### The problem with numpy (1)

Which dimension is which?

In [None]:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a = np.r_[np.tile(a[:-1].transpose() + np.array([10, 20]), 2),
          np.array([[0, 0, 0, 1]])]
a

In [None]:
a[0]

Or like this?

In [None]:
a[:, 0]

### Scipp's solution: Labeled dimensions

In [None]:
import scipp as sc
v = sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]])
v

In [None]:
v['x', 0]

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a = np.r_[np.tile(a[:-1].transpose() + np.array([10, 20]), 2),
          np.array([[0, 0, 0, 1]])]
a

In [None]:
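# Same computation as the NumPy cell above, but every step names its dimension.
v = sc.array(dims=['x', 'y'], values=[[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Drop the last 'x' entry, swap the dimension order, add an offset along 'x'.
v = v['x', :-1].transpose() + sc.array(dims=['x'], values=[10, 20])
# Duplicate along 'x', then append one more entry along 'y'.
v = sc.concat([sc.concat([v, v], 'x'),
               sc.array(dims=['x'], values=[0, 0, 0, 1])],
              'y')
v.values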

### The problem with numpy (2)

How are arrays associated?

In [None]:
time = np.array([0, 1, 2, 3])
speed = np.array([0.1, 0.5, 1.3, 0.7])

Is this valid?

In [None]:
np.sum(time * speed)
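One way to see the problem, as a small sketch using only NumPy: nothing ties `time` and `speed` together, so reordering one of them silently changes the result. The names `sorted_speed`, `aligned` and `misaligned` are introduced here for illustration.

```python
import numpy as np

time = np.array([0, 1, 2, 3])
speed = np.array([0.1, 0.5, 1.3, 0.7])

# Sorting one array leaves the other untouched; NumPy cannot know
# that the two arrays were meant to stay aligned element by element.
sorted_speed = np.sort(speed)

aligned = np.sum(time * speed)
misaligned = np.sum(time * sorted_speed)

# Both expressions run without complaint but give different numbers.
print(aligned, misaligned)
```

The arrays also carry no units, so the code cannot tell whether `time` is in seconds or minutes.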

### Scipp's solution: Data Arrays

In [None]:
da = sc.DataArray(sc.array(dims=['time'], values=speed, unit='m/s'),
                  coords={'time': sc.array(dims=['time'], values=time, unit='min')})
sc.table(da)

### Physical units prevent mistakes

In [None]:
da

Total distance:

In [None]:
sc.sum(da.data * da.coords['time'])

### Wait, isn't this just xarray?

Sort of, but scipp has

- builtin physical units <font color='#555'> &nbsp; (xarray via pint)</font>
- variances
- non-destructive masks
- bin-edge coordinates
- binned data

### What about pandas?

`scipp.DataArray` is similar to `pandas.DataFrame`, but multi-dimensional

In [None]:
da = sc.DataArray(sc.array(dims=['x', 'y'], values=np.random.normal(size=[5, 10])),
                  coords={'x': sc.arange('x', 5),
                          'y': sc.arange('y', 4, 14)})
da

In [None]:
sc.show(da)

### Coordinates prevent mistakes

In [None]:
da

In [None]:
da2 = sc.DataArray(sc.ones(dims=['x'], shape=[5]),
                   coords={'x': sc.arange('x', 1, 6)})
da + da2

### Attributes: unchecked coordinates

In [None]:
da_attr = da.copy()
da_attr.attrs['x'] = da_attr.coords.pop('x')
da_attr

In [None]:
da_attr + da2

### Masks: Ignore elements without removing them 

In [None]:
masked = da.copy()
masked.masks['my_mask'] = sc.array(dims=['x'], values=[True, False, False, True, False])
masked

In [None]:
masked.sum()

In [None]:
da.sum()

### Plotting

In [None]:
masked.plot()

# Questions?

Empty cell to show stuff

## Binned Data

In [None]:
binned = sc.data.binned_x(nevent=100, nbin=4)
binned

In [None]:
sc.show(binned)

### From bins to histogram

In [None]:
histogram = binned.bins.sum()
histogram

In [None]:
sc.show(histogram)
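Conceptually, binned data keeps every event sorted into its bin, and the histogram is what remains after summing the events in each bin. The sketch below mimics that idea with plain NumPy; it is not how scipp stores binned data internally, and the event positions and weights are invented for the example.

```python
import numpy as np

# Hypothetical events: a position and a weight for each one.
x = np.array([0.05, 0.15, 0.35, 0.4, 0.7, 0.95])
weights = np.array([1.0, 2.0, 1.0, 1.0, 3.0, 1.0])
edges = np.linspace(0.0, 1.0, 5)  # 4 bins

# "Binned": group the event weights by bin, keeping each event.
bin_index = np.digitize(x, edges) - 1
bins = [weights[bin_index == i] for i in range(len(edges) - 1)]

# "Histogram": one number per bin, the individual events are gone.
histogram = np.array([b.sum() for b in bins])

# The same result comes from NumPy's weighted histogram.
reference, _ = np.histogram(x, bins=edges, weights=weights)
print(histogram, reference)
```

Going from bins to a histogram discards information, which is why the binned form is kept for as long as the events are still needed.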

### Multi-dimensional bins

In [None]:
binned_2d = sc.data.binned_xy(nevent=100, nx=3, ny=4)
binned_2d

In [None]:
sc.show(binned_2d)

### Making binned data

In [None]:
events = sc.data.table_xyz(100)
events

In [None]:
x_edges = sc.linspace('x', 0.0, 1.0, 4, unit='m')
binned = sc.bin(events, edges=[x_edges])
sc.show(binned)

### Changing binning

In [None]:
binned

In [None]:
fine_x_edges = sc.linspace('x', 0.0, 1.0, 12, unit='m')
sc.bin(binned, edges=[fine_x_edges])

### Adding edges

In [None]:
y_edges = sc.geomspace('y', 0.1, 0.8, 5, unit='m')
binned_2d = sc.bin(binned, edges=[y_edges])
sc.show(binned_2d)

Equivalently:

In [None]:
sc.bin(binned, edges=[x_edges, y_edges])

### Computation with binned data

In [None]:
dense_x = sc.DataArray(sc.array(dims=['x'], values=[1, 2, 3], unit='1/kg'),
                       coords={'x': x_edges})
dense_x

In [None]:
dense_x * binned_2d

# Questions?

Empty cell to show stuff

# Tutorial

<img src='images/flares.svg' style="display: block; margin-left: auto; margin-right: auto; width: 65%;">