# GroupBy

"Group by" operations refer to an implementation of the "split-apply-combine" approach known from [pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) and [xarray](http://xarray.pydata.org/en/stable/groupby.html).
Only a limited number of operations can currently be applied to the groups.
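As a library-independent illustration of the idea, the following NumPy sketch splits values by a label, applies a sum to each group, and combines the per-group results into one array. The toy data and variable names are hypothetical and not part of scipp:

```python
import numpy as np

# Hypothetical toy data: one value per item plus a group label per item.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
labels = np.array(["a", "b", "a", "c", "b"])

# Split: find the distinct group keys and, for each item, the index of its group.
keys, group_index = np.unique(labels, return_inverse=True)

# Apply + combine: sum the values of each group into one output slot per key.
sums = np.zeros(len(keys))
np.add.at(sums, group_index, values)

print(dict(zip(keys.tolist(), sums.tolist())))
```

`np.add.at` is used instead of `sums[group_index] += values` because it accumulates correctly when the same group index occurs several times.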

## Grouping with bins

Note that this notebook requires [Mantid](https://www.mantidproject.org/Main_Page) and data files.
A [binder](https://mybinder.org/v2/gh/scipp/scipp-neutron-jupyter-demo/main) is available that can run this notebook.

[PG3_4844_event.nxs](http://198.74.56.37/ftp/external-data/MD5/d5ae38871d0a09a28ae01f85d969de1e) is required for this notebook.

The download link uses a checksum instead of the file name, so rename the downloaded file to `PG3_4844_event.nxs`.

In [None]:
import numpy as np
import scipp as sc
import scippneutron as scn

In [None]:
events = scn.load(filename='PG3_4844_event.nxs', load_pulse_times=False)

### Example 1 (dense data): split-sum-combine

We histogram the event data:

In [None]:
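# Time-of-flight bin edges: 0, 50, ..., 16950 us (np.arange excludes the stop value)
bins = sc.Variable(['tof'], values=np.arange(0.0, 17000.0, 50.0), unit=sc.units.us)
# Histogram the events of each spectrum into these bins
pos_hist = sc.histogram(events, bins)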
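The bin edges used above can be inspected directly with NumPy. The sketch below uses hypothetical event times and `np.histogram` as a stand-in for `sc.histogram`. It shows that `np.arange` stops at 16950 us, so in this NumPy version an event beyond that edge is not counted:

```python
import numpy as np

# Same edges as the notebook cell above: 0, 50, ..., 16950 microseconds.
edges = np.arange(0.0, 17000.0, 50.0)
print(len(edges), edges[-1])  # 340 edges, i.e. 339 bins, last edge at 16950

# Hypothetical time-of-flight events of a single spectrum (microseconds).
tof = np.array([10.0, 60.0, 75.0, 16949.0, 16990.0])

# np.histogram counts events per bin [edge_i, edge_{i+1});
# NumPy closes only the last bin on the right.
counts, _ = np.histogram(tof, bins=edges)
print(counts.sum())  # the event at 16990 us lies beyond the last edge
```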

A plot shows the shortcoming of the data representation.
There is no physical meaning attached to the "spectrum" dimension and the plot is hard to interpret:

In [None]:
sc.plot(pos_hist)

To improve the plot, we first store the scattering angle of each spectrum as a coordinate of the data array.
Then we create a variable containing the bin edges of the desired target binning:

In [None]:
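# Store the scattering angle of each spectrum as a coordinate
pos_hist.coords['two_theta'] = scn.two_theta(pos_hist)
# 500 bin edges from 0 to pi, i.e. 499 two_theta bins
two_theta = sc.Variable(['two_theta'],
                        unit=sc.units.rad,
                        values=np.linspace(0.0, np.pi, num=500))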
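The scattering angle is conventionally the angle between the incident beam (source to sample) and the scattered beam (sample to detector pixel). Assuming `scn.two_theta` follows this convention, a minimal NumPy sketch with hypothetical positions looks like this:

```python
import numpy as np

# Hypothetical positions in metres: source upstream on the z axis, sample at the origin.
source = np.array([0.0, 0.0, -10.0])
sample = np.array([0.0, 0.0, 0.0])
# Two hypothetical detector pixels: one straight downstream, one at a right angle.
pixels = np.array([[0.0, 0.0, 2.0],
                   [2.0, 0.0, 0.0]])

incident = sample - source    # beam direction onto the sample
scattered = pixels - sample   # direction from the sample to each pixel

# Angle between the incident and scattered beam for each pixel.
cos_angle = scattered @ incident / (
    np.linalg.norm(scattered, axis=1) * np.linalg.norm(incident))
# Clip guards against rounding pushing the cosine slightly outside [-1, 1].
two_theta = np.arccos(np.clip(cos_angle, -1.0, 1.0))
print(np.degrees(two_theta))
```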

We use `scipp.groupby` with the desired bins and apply a `sum` over dimension `spectrum`:

In [None]:
theta_hist = sc.groupby(pos_hist, 'two_theta', bins=two_theta).sum('spectrum')
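Conceptually, grouping with bins assigns each spectrum to the `two_theta` bin that contains its angle and then sums all spectra of a bin, keeping the time-of-flight dimension. A rough NumPy analogue with hypothetical data could look like the following. It is not scipp's implementation, and spectra outside the bin edges are simply dropped here:

```python
import numpy as np

# Hypothetical histogrammed data: 4 spectra x 3 time-of-flight bins.
counts = np.array([[1.0, 0.0, 2.0],
                   [3.0, 1.0, 0.0],
                   [0.0, 5.0, 1.0],
                   [2.0, 2.0, 2.0]])
# Hypothetical scattering angle (rad) of each spectrum.
spectrum_two_theta = np.array([0.1, 1.2, 0.2, 2.5])

# Target bin edges for two_theta: 3 edges define 2 groups.
edges = np.array([0.0, 1.0, np.pi])

# Split: 0-based index of the bin each spectrum falls into.
group = np.digitize(spectrum_two_theta, edges) - 1
in_range = (group >= 0) & (group < len(edges) - 1)

# Apply + combine: sum all spectra of a group, keeping the tof dimension.
result = np.zeros((len(edges) - 1, counts.shape[1]))
np.add.at(result, group[in_range], counts[in_range])
print(result)
```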

The result has `spectrum` replaced by the physically meaningful `two_theta` dimension and the resulting plot is easily interpretable:

In [None]:
sc.plot(theta_hist)

### Example 2 (event data): split-flatten-combine

This is essentially the same as example 1, but keeps the data as events instead of histogramming it before grouping.
A plot of the original data is hard to interpret:

In [None]:
sc.plot(sc.histogram(events, bins))

Again, we improve the plot by first storing the scattering angle of each spectrum as a coordinate of the event data array.
Then we create a variable containing the bin edges of the desired target binning:

In [None]:
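# Store the scattering angle of each spectrum as a coordinate of the event data
events.coords['two_theta'] = scn.two_theta(events)
# Same 500 two_theta bin edges as in example 1
theta = sc.Variable(['two_theta'],
                    unit=sc.units.rad,
                    values=np.linspace(0.0, np.pi, num=500))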

We use `scipp.groupby` with the desired bins and apply a `concatenate` operation on dimension `spectrum`.
This is the event-data equivalent to summing histograms:

In [None]:
theta_events = sc.groupby(events, 'two_theta', bins=theta).concatenate('spectrum')
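For event data, the apply step concatenates event lists instead of summing counts. A minimal pure-NumPy sketch with hypothetical per-spectrum event arrays (again not scipp's implementation):

```python
import numpy as np

# Hypothetical events: one array of time-of-flight values (us) per spectrum.
events_per_spectrum = [np.array([100.0, 250.0]),
                       np.array([120.0]),
                       np.array([300.0, 310.0, 50.0]),
                       np.array([])]
# Hypothetical scattering angle (rad) of each spectrum and target bin edges.
spectrum_two_theta = np.array([0.1, 1.2, 0.2, 2.5])
edges = np.array([0.0, 1.0, np.pi])

# Split: 0-based two_theta bin index of each spectrum.
group = np.digitize(spectrum_two_theta, edges) - 1

# Apply + combine: concatenate the event lists of all spectra in each bin.
# The np.empty(0) seed keeps np.concatenate from failing on an empty group.
grouped = [np.concatenate([np.empty(0)] +
                          [ev for ev, g in zip(events_per_spectrum, group) if g == i])
           for i in range(len(edges) - 1)]

for i, ev in enumerate(grouped):
    print(f"bin {i}: {ev}")
```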

In the result, the `spectrum` dimension is replaced by the physically meaningful `two_theta` dimension, and histogramming it gives the same plot as the histogrammed data in example 1.

In [None]:
sc.plot(sc.histogram(theta_events, bins))
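That both examples give the same plot rests on histogramming being additive: for fixed bin edges, histogramming the concatenated events of a group gives the same counts as summing the histograms of its spectra. A small NumPy check with hypothetical events:

```python
import numpy as np

# Hypothetical events of three spectra that ended up in the same two_theta group.
spectra = [np.array([100.0, 250.0]),
           np.array([300.0, 310.0, 50.0]),
           np.array([120.0])]
tof_edges = np.arange(0.0, 400.0, 100.0)  # 0, 100, 200, 300

# Route 1 (as in example 1): histogram each spectrum, then sum the histograms.
summed = sum(np.histogram(s, bins=tof_edges)[0] for s in spectra)

# Route 2 (as in example 2): concatenate the events, then histogram once.
concatenated = np.histogram(np.concatenate(spectra), bins=tof_edges)[0]

print(summed, concatenated)
```

Keeping events until the end, as in example 2, leaves the choice of time-of-flight bins open until the final histogramming step.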