# Unaligned Data

## Introduction

Scipp supports features for *realigning* "unaligned" data.
Unaligned data in this context refers to data values at irregularly placed points, e.g., in space or time.
Realignment lets us:

- Map a table of position-based data to an X-Y-Z grid.
- Map a table of position-based data to an angle such as $\theta$.
- Map event time stamps to time bins.

The key feature here is that *realignment does not actually histogram or resample data*.
Data is kept in its original form.
The realignment just adds a wrapper with a coordinate system better suited to working with the scientific data.
Where possible, operations with the realigned wrapper are supported "as if" working with dense histogrammed data.
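
The wrapper idea can be illustrated outside of scipp with a few lines of NumPy. The sketch below is not scipp's implementation; the class name `RealignedEvents` and its fields are hypothetical and chosen only for illustration. It shows the event time-stamp case from the list above: the raw events are stored unchanged next to a set of bin edges, and counts are only computed when explicitly requested.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RealignedEvents:
    """Hypothetical wrapper (not scipp's API): raw events plus bin edges."""
    timestamps: np.ndarray  # original, unaligned event times, kept as-is
    edges: np.ndarray       # bin edges defining the "realigned" time axis

    def histogram(self) -> np.ndarray:
        # Counts are computed on demand; the stored events are not modified.
        counts, _ = np.histogram(self.timestamps, bins=self.edges)
        return counts


events = np.array([0.2, 1.7, 0.9, 2.4, 1.1, 0.4])
wrapped = RealignedEvents(timestamps=events, edges=np.array([0.0, 1.0, 2.0, 3.0]))
counts = wrapped.histogram()
print(counts)              # counts per time bin
print(wrapped.timestamps)  # the original events are still available
```

Keeping the events and the edges side by side means the bin edges could be changed later without any loss of information, which is not possible once data has been histogrammed.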

## Concept

We outline the underlying concepts based on a simple example.

In [None]:
import scipp as sc
import numpy as np
from scipp.plot import plot
import matplotlib.pyplot as plt

np.random.seed(1) # Fixed for reproducibility

Consider a list of measurements at various "points" in space.
Here we restrict ourselves to the X-Y plane for visualization purposes:

In [None]:
N = 50
values = 10*np.random.rand(N)
data = sc.DataArray(
    data=sc.Variable(dims=['position'], unit=sc.units.counts, values=values, variances=values),
    coords={
        'position':sc.Variable(dims=['position'], values=['site-{}'.format(i) for i in range(N)]),
        'x':sc.Variable(dims=['position'], unit=sc.units.m, values=np.random.rand(N)),
        'y':sc.Variable(dims=['position'], unit=sc.units.m, values=np.random.rand(N))})
data

For every point we measured at, the auxiliary coordinates `'x'` and `'y'` give the position in the X-Y plane.
These are *not* dimension-coordinates, since our measurements are *not* on a 2-D grid, but rather points with an irregular distribution.
`data` is essentially a 1-D table of measurements.
We can plot this data:

In [None]:
plot(data)

The `'position'` dimension is not a continuous dimension but essentially just a row in our table.
In practice, such a figure and this representation of data in general may therefore not be very useful.

As an alternative view of our data we can create a scatter plot.
We do this explicitly here to demonstrate how the content of `data` is connected to elements of the figure:

In [None]:
fig = plt.figure()
scatter = plt.scatter(
    x=data.coords['x'].values,
    y=data.coords['y'].values,
    c=data.values)
ax = fig.gca()
ax.set_xlabel('x [{}]'.format(data.coords['x'].unit))
ax.set_ylabel('y [{}]'.format(data.coords['y'].unit))
cbar = plt.colorbar(scatter)
cbar.set_label("[{}]".format(data.unit))
plt.show()

This shows the distribution in space, but for real datasets with millions of points this may not be convenient.
Furthermore, operating with scattered data is often inconvenient and may require knowledge of the underlying representation.

We can now use `scipp.realign` to provide a more accessible wrapper for our data:

In [None]:
xbins = sc.Variable(dims=['x'], unit=sc.units.m, values=[0.1,0.5,0.9])
ybins = sc.Variable(dims=['y'], unit=sc.units.m, values=[0.1,0.5,0.9])
realigned = sc.realign(data, {'x':xbins, 'y':ybins})
realigned

`realigned` is a 2-D data array, but it contains the original "unaligned" data, accessible through the `unaligned` property:

In [None]:
realigned.unaligned

The "realignment" procedure based on bin edges for `'x'` and `'y'` does *not* perform the actual histogramming step.
However, since the dimensions of `realigned` are defined by the bin-edge coordinates for `'x'` and `'y'`, we will see below that it behaves much like normal dense data for operations such as slicing.
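
To make the distinction between assigning points to bins and histogramming them more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not a description of how scipp implements realignment or slicing: it uses freshly generated random points rather than the `data` array above, and the index arrays `ix` and `iy` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(50)
y = rng.random(50)
edges = np.array([0.1, 0.5, 0.9])  # same edges for x and y, as in the example

# np.digitize returns 0 for points below the first edge and len(edges) for
# points at or above the last edge. Subtracting 1 gives a bin index, where
# -1 and len(edges) - 1 mark points that fall outside the grid.
ix = np.digitize(x, edges) - 1
iy = np.digitize(y, edges) - 1

# Selecting one X-Y cell only picks rows of the table; nothing is summed.
in_cell = (ix == 0) & (iy == 1)
cell_x = x[in_cell]
cell_y = y[in_cell]
print(in_cell.sum(), "points fall into x-bin 0, y-bin 1")
```

The selected points keep their exact coordinates, so a selection by bin index in this sketch is a filter on the table rather than a reduction.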

We create another figure to better illustrate the structure of `realigned`:

In [None]:
fig = plt.figure()
scatter = plt.scatter(
    x=realigned.unaligned.coords['x'].values,
    y=realigned.unaligned.coords['y'].values,
    c=realigned.unaligned.values)
ax = fig.gca()
ax.set_xlabel('x [{}]'.format(realigned.coords['x'].unit))
ax.set_ylabel('y [{}]'.format(realigned.coords['y'].unit))
ax.set_xticks(realigned.coords['x'].values)
ax.set_yticks(realigned.coords['y'].values)
plt.grid()
cbar = fig.colorbar(scatter)
cbar.set_label("[{}]".format(data.unit))
plt.show()

This is essentially the same figure as the scatter plot for the original `data`.
The only difference is the "grid" (the bin edges) that is stored alongside the data.
`realigned` can now directly be histogrammed, without the need for specifying bin boundaries:

In [None]:
plot(sc.histogram(realigned))

Here `histogram` performs histogramming for all "realigned" dimensions, in this case `'x'` and `'y'`.
The resulting values in the X-Y bins are the counts accumulated from measurements at all points falling in a given bin.
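
The accumulation described above can be reproduced with plain NumPy as a cross-check. This is a sketch of the arithmetic only, assuming freshly generated random points instead of the `data` array from this notebook; it makes no claim about how `sc.histogram` is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
values = 10 * rng.random(n)
x = rng.random(n)
y = rng.random(n)
edges = np.array([0.1, 0.5, 0.9])

# Sum the measured values of all points falling into each X-Y bin.
binned, _, _ = np.histogram2d(x, y, bins=[edges, edges], weights=values)

# In this NumPy sketch, points outside [0.1, 0.9] in x or y are not
# assigned to any bin and therefore do not contribute.
inside = (x >= 0.1) & (x <= 0.9) & (y >= 0.1) & (y <= 0.9)
print(binned)
print(binned.sum(), values[inside].sum())
```

With three edges per dimension the result is a 2x2 array, and its total equals the sum of the values of all points lying inside the grid.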