# GroupBy

"Group by" operations refer to an implementation of the "split-apply-combine" approach known from [pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) and [xarray](http://xarray.pydata.org/en/stable/groupby.html).
Currently only a limited number of operations can be applied to the groups, such as `sum` and `flatten`.
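
As a rough illustration of the split-apply-combine idea itself, the following sketch uses only the Python standard library rather than scipp. The `records` data and all variable names are made up for this example.

```python
from collections import defaultdict

# Hypothetical input: (group key, value) pairs.
records = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("c", 5.0)]

# Split: collect the values that share a key.
groups = defaultdict(list)
for key, value in records:
    groups[key].append(value)

# Apply: reduce each group independently, here with a sum.
sums = {key: sum(values) for key, values in groups.items()}

# Combine: assemble the per-group results into a single sorted output.
result = sorted(sums.items())
print(result)
```

The grouping examples in this notebook follow the same pattern, except that the groups are defined by bins of a coordinate instead of by exact key matches.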

## Grouping with bins

Note that this notebook requires [Mantid](https://www.mantidproject.org/Main_Page) and data files.
A [binder](https://mybinder.org/v2/gh/scipp/scipp-neutron-jupyter-demo/master) is available that can run this notebook.

In [None]:
import numpy as np
import scipp as sc

In [None]:
events = sc.neutron.load(filename='PG3_4844_event.nxs', load_pulse_times=False)

### Example 1 (dense data): split-sum-combine

We histogram the event data:

In [None]:
bins = sc.Variable(['tof'], values=np.arange(0.0, 17000.0, 50.0), unit=sc.units.us)
pos_hist = sc.histogram(events, bins)

Plotting the result reveals a shortcoming of this data representation.
The "spectrum" dimension has no physical meaning, which makes the plot hard to interpret:

In [None]:
sc.plot.plot(pos_hist)

To improve the plot, we first store the scattering angle as labels in the data array.
Then we create a variable containing the desired target binning:

In [None]:
pos_hist.coords['scattering_angle'] = sc.neutron.scattering_angle(pos_hist)
theta = sc.Variable(['scattering_angle'],
                    unit=sc.units.rad,
                    values=np.linspace(0.0, np.pi/2, num=500))

We use `scipp.groupby` with the desired bins and apply a `sum` over dimension `spectrum`:

In [None]:
theta_hist = sc.groupby(pos_hist, 'scattering_angle', bins=theta).sum('spectrum')

The result has `spectrum` replaced by the physically meaningful `scattering_angle` dimension and the resulting plot is easily interpretable:

In [None]:
sc.plot.plot(theta_hist)
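
To make the split-sum-combine step of Example 1 more concrete, here is a minimal NumPy sketch of the same idea. It is not how scipp implements `groupby`; the arrays `counts`, `angle`, and `edges` are small made-up stand-ins for the histogrammed data, the `scattering_angle` labels, and the target bins.

```python
import numpy as np

# Hypothetical toy data: 6 spectra, each holding a 4-bin histogram.
counts = np.arange(24, dtype=float).reshape(6, 4)
# Hypothetical scattering angle (in rad) of each spectrum.
angle = np.array([0.1, 0.4, 0.2, 1.2, 0.9, 1.4])
# Bin edges for the grouping coordinate.
edges = np.array([0.0, 0.5, 1.0, 1.5])
nbins = len(edges) - 1

# Split: find the bin [edges[i], edges[i+1]) that each spectrum falls into.
index = np.digitize(angle, edges) - 1
# Spectra outside the bin range are dropped (an assumption of this sketch).
valid = (index >= 0) & (index < nbins)

# Apply and combine: sum the histograms of all spectra in the same bin.
grouped = np.zeros((nbins, counts.shape[1]))
np.add.at(grouped, index[valid], counts[valid])
print(grouped)
```

The output has one row per angle bin instead of one row per spectrum, which mirrors how `spectrum` is replaced by `scattering_angle` above.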

### Example 2 (event data): split-flatten-combine

This is essentially the same as Example 1 but avoids histogramming the data too early.
A plot of the original data is hard to interpret:

In [None]:
sc.plot.plot(events, bins={'tof': np.linspace(0.0, 17000.0, 1000)})

To improve the plot, we first store the scattering angle as labels in the data array.
Then we create a variable containing the desired target binning:

In [None]:
events.coords['scattering_angle'] = sc.neutron.scattering_angle(events)
theta = sc.Variable(['scattering_angle'],
                    unit=sc.units.rad,
                    values=np.linspace(0.0, np.pi/2, num=500))

We use `scipp.groupby` with the desired bins and apply a `flatten` operation on dimension `spectrum`.
This is the event-data equivalent of summing histograms:

In [None]:
theta_events = sc.groupby(events, 'scattering_angle', bins=theta).flatten('spectrum')

The result has dimension `spectrum` replaced by the physically meaningful `scattering_angle` and the resulting plot is easily interpretable:

In [None]:
sc.plot.plot(theta_events, bins={'tof': np.linspace(0.0, 17000.0, 1000)})
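
The relation between flattening events and summing histograms can be sketched with plain NumPy. This is only an illustration under assumed toy data, not scipp's implementation; `event_tof`, `angle`, `edges`, and `tof_edges` are hypothetical names and values.

```python
import numpy as np

# Hypothetical toy data: one array of event time-of-flight values per spectrum.
event_tof = [
    np.array([10.0, 20.0]),
    np.array([30.0]),
    np.array([]),
    np.array([40.0, 50.0, 60.0]),
]
# Hypothetical scattering angle (in rad) of each spectrum, and target bin edges.
angle = np.array([0.1, 0.7, 0.3, 0.2])
edges = np.array([0.0, 0.5, 1.0])

# Split: assign each spectrum to an angle bin.
index = np.digitize(angle, edges) - 1

# Flatten and combine: concatenate the event lists of all spectra in a bin.
flattened = []
for i in range(len(edges) - 1):
    members = [event_tof[j] for j in np.flatnonzero(index == i)]
    flattened.append(np.concatenate(members) if members else np.array([]))

# Histogramming the flattened events of a bin gives the same counts as
# summing the histograms of the individual spectra in that bin.
tof_edges = np.array([0.0, 35.0, 70.0])
hist_of_flattened = np.histogram(flattened[0], bins=tof_edges)[0]
sum_of_hists = sum(
    np.histogram(event_tof[j], bins=tof_edges)[0]
    for j in np.flatnonzero(index == 0)
)
print(hist_of_flattened, sum_of_hists)
```

Because no events are discarded, the time-of-flight binning can still be chosen freely after grouping, which is the advantage of postponing the histogramming step.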