<img src='images/scipp-logo.png' width="400" height="400" style="display: block; margin-left: auto; margin-right: auto; width: 640px;">

<div style="text-align: center;">Multi-dimensional data arrays with labeled dimensions</div>
<br>
<br>

**Jan-Lukas Wynen** (<i class="fa-solid fa-envelope"></i> jan-lukas.wynen@ess.eu)
<br>

https://scipp.github.io/

# Data Reduction at ESS

<img src='images/software-stack.svg' style="display: block; margin-left: auto; margin-right: auto; width: 75%;">

### Developed at ESS with in-kind contributions from ISIS

Most recent contributors:

| ESS |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| ISIS / Tessella |
|-----|---|------|
| Simon Heybrock  || Matthew Andrews |
| Neil Vaytet || Owen Arnold |
| Jan-Lukas Wynen || Matthew Jones |
| || Samuel Jones |
| || Dan Nixon |
| || Tom Willemsen |

And more ...

### The problem with numpy (1)

Which dimension is which?

In [1]:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a = np.r_[np.tile(a[:-1].transpose() + np.array([10, 20]), 2),
          np.array([[0, 0, 0, 1]])]
a

array([[11, 24, 11, 24],
       [12, 25, 12, 25],
       [13, 26, 13, 26],
       [ 0,  0,  0,  1]])

In [2]:
a[0]

array([11, 24, 11, 24])

Or like this?

In [3]:
a[:, 0]

array([11, 12, 13,  0])
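The same ambiguity shows up in reductions. The sketch below uses a hypothetical 3x3 array (the name `counts` and its values are made up for illustration) to show that the result shape gives no hint about which axis was collapsed:

```python
import numpy as np

# Hypothetical data, e.g. 3 detector rows x 3 time steps (illustrative only).
counts = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Both reductions return an array of shape (3,), so the shape alone
# cannot tell us whether rows or columns were summed.
sum_axis0 = counts.sum(axis=0)  # collapses the first axis
sum_axis1 = counts.sum(axis=1)  # collapses the second axis

print(sum_axis0, sum_axis0.shape)
print(sum_axis1, sum_axis1.shape)
```

Only the caller's memory of the axis order distinguishes the two results.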

### Scipp's solution: Labeled dimensions

In [4]:
import scipp as sc
v = sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]])
v

In [5]:
v['x', 0]

In [6]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a = np.r_[np.tile(a[:-1].transpose() + np.array([10, 20]), 2),
          np.array([[0, 0, 0, 1]])]
a

array([[11, 24, 11, 24],
       [12, 25, 12, 25],
       [13, 26, 13, 26],
       [ 0,  0,  0,  1]])

In [7]:
v = sc.array(dims=['x', 'y'], values=[[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = v['x', :-1].transpose() + sc.array(dims=['x'], values=[10, 20])
v = sc.concat([sc.concat([v, v], 'x'),
               sc.array(dims=['x'], values=[0, 0, 0, 1])],
              'y')
v.values

array([[11, 24, 11, 24],
       [12, 25, 12, 25],
       [13, 26, 13, 26],
       [ 0,  0,  0,  1]])

In [38]:
v['x', 0].values

array([11, 12, 13,  0])

### The problem with numpy (2)

How are arrays associated?

In [42]:
time = np.array([0, 1, 2, 3])
speed = np.array([0.1, 0.5, 1.3, 0.7])

Is this valid?

In [43]:
np.sum(time * speed)

5.199999999999999
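Nothing ties `time` and `speed` together, so numpy cannot notice when they fall out of step. A minimal sketch, assuming a hypothetical upstream step that sorts one array but not the other:

```python
import numpy as np

time = np.array([0, 1, 2, 3])
speed = np.array([0.1, 0.5, 1.3, 0.7])

# Hypothetical mistake: speed is reordered somewhere, time is not.
speed_sorted = np.sort(speed)

original = np.sum(time * speed)
mismatched = np.sum(time * speed_sorted)

# Both computations run without any warning, but the results differ.
print(original, mismatched)
```

The shapes still match, so the second computation silently produces a different number.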

### Scipp's solution: Data Arrays

In [44]:
speed_vs_time = sc.DataArray(sc.array(dims=['time'], values=speed, unit='m/s'),
                             coords={
                                 'time': sc.array(dims=['time'], values=time, unit='min')
                             })
sc.table(speed_vs_time)


In [45]:
speed_vs_time2 = sc.DataArray(sc.array(dims=['time'], values=speed, unit='m/s'),
                              coords={
                                  'time': sc.array(dims=['time'], values=time, unit='min'),
                                  'position': sc.array(dims=['time'], values=[-2, -1, 1, 1.5], unit='m')
                              })
sc.table(speed_vs_time2)


### Physical units prevent mistakes

In [11]:
speed_vs_time

Total distance:

In [12]:
sc.sum(speed_vs_time.data * speed_vs_time.coords['time'])

### Wait, isn't this just xarray?

Sort of, but scipp has

- builtin physical units <font color='#555'> &nbsp; (xarray via pint)</font>
- variances
- non-destructive masks
- bin-edge coordinates
- binned data

### What about pandas?

`scipp.DataArray` is similar to `pandas.DataFrame` but multi-dimensional

In [58]:
da = sc.DataArray(sc.array(dims=['x', 'y'], values=np.random.normal(size=[5, 10]), unit='m'),
                  coords={'x': sc.arange('x', 5),
                          'y': sc.arange('y', 4, 14, unit='s')})
da

In [53]:
sc.show(da)

### Coordinates prevent mistakes

In [54]:
da

In [59]:
da2 = sc.DataArray(sc.ones(dims=['x'], shape=[5], unit='m'),
                   coords={'x': sc.arange('x', 1, 6)})
da + da2

CoordError: Mismatch in coordinate 'x' in operation 'add':
(x: 5)      int64  [dimensionless]  [0, 1, ..., 3, 4]
vs
(x: 5)      int64  [dimensionless]  [1, 2, ..., 4, 5]

### Attributes: unchecked coordinates

In [60]:
da_attr = da.copy()
da_attr.attrs['x'] = da_attr.coords.pop('x')
da_attr

In [61]:
da_attr + da2

### Masks: Ignore elements without removing them 

In [62]:
masked = da.copy()
masked.masks['my_mask'] = sc.array(dims=['x'], values=[True, False, False, True, False])
masked

In [63]:
masked.sum()

In [64]:
da.sum()

### Plotting

In [65]:
masked.plot()


# Questions?


## Binned Data

In [23]:
binned = sc.data.binned_x(nevent=100, nbin=4)
binned

In [24]:
sc.show(binned)

### From bins to histogram

In [25]:
histogram = binned.bins.sum()
histogram

In [26]:
sc.show(histogram)

### Multi-dimensional bins

In [27]:
binned_2d = sc.data.binned_xy(nevent=100, nx=3, ny=4)
binned_2d

In [28]:
sc.show(binned_2d)

### Making binned data

In [29]:
events = sc.data.table_xyz(100)
events

In [30]:
x_edges = sc.linspace('x', 0.0, 1.0, 4, unit='m')
binned = sc.bin(events, edges=[x_edges])
sc.show(binned)

### Changing binning

In [31]:
binned

In [32]:
fine_x_edges = sc.linspace('x', 0.0, 1.0, 12, unit='m')
sc.bin(binned, edges=[fine_x_edges])

### Binning in additional dimension

In [33]:
y_edges = sc.geomspace('y', 0.1, 0.8, 5, unit='m')
binned_2d = sc.bin(binned, edges=[y_edges])
sc.show(binned_2d)

Equivalently:

In [34]:
sc.bin(binned, edges=[x_edges, y_edges])

### Computation with binned data

In [35]:
dense_x = sc.DataArray(sc.array(dims=['x'], values=[1, 2, 3], unit='1/kg'),
                       coords={'x': x_edges})
dense_x

In [36]:
dense_x * binned_2d

# Questions?


# Tutorial

<img src='images/flares.svg' style="display: block; margin-left: auto; margin-right: auto; width: 65%;">