# Filtering

Event filtering refers to the process of removing or extracting a subset of events based on some criterion, such as the temperature of the measured sample at the time an event was detected.
Generally, there are three steps to take when filtering events:

1. Preprocess the metadata used for filtering.
   For example, a noisy time series of temperature values needs to be converted into a series of time intervals with a fixed temperature value within each interval.
   This process might involve defining thresholds and tolerances or interpolation methods between measured temperature values.
2. Map event timestamps to temperature values.
3. Filter data based on temperature values.

## Preparation

We create some fake data for illustration purposes.

<div class="alert alert-info">

**Note**

In practice, the data to be filtered would be loaded from a file. Details of this subsection can safely be skipped, as long as all cells are executed.

</div>

In [None]:
import numpy as np
import scipp as sc
from scipp.plot import plot

In [None]:
np.random.seed(1) # Fixed for reproducibility
end_time = 100000
tof_max = 10000
width = tof_max/20
sizes = 4*np.array([7000, 3333, 3000, 5000])
data = sc.Variable(dims=['x'],
                   shape=[4],
                   variances=True,
                   dtype=sc.dtype.event_list_float64)
x = sc.Variable(dims=['x'], unit=sc.units.m, values=np.linspace(0, 1, num=4))
time = sc.Variable(dims=['x'],
                   shape=[4],
                   unit=sc.units.s,
                   dtype=sc.dtype.event_list_int64)
# time-of-flight in a neutron-scattering experiment
tof = sc.Variable(dims=['x'],
                   shape=[4],
                   unit=sc.units.us,
                   dtype=sc.dtype.event_list_float64)
for i, size in enumerate(sizes):
    vals = np.random.rand(size)
    data['x', i].values = np.ones(size)
    data['x', i].variances = np.ones(size)
    time['x', i].values = np.linspace(0, end_time, num=size)
    tof['x', i].values = np.concatenate(
        (np.concatenate(
            (7*width + width*np.random.randn(size//4),
            13*width + width*np.random.randn(size//4))),
        10*width + width*np.random.randn(size//2)))

ntemp = 100
sample_temperature = sc.DataArray(
    # Noisy temperature ramp from 100 K to 120 K, sampled at ntemp points
    data=sc.Variable(dims=['time'], unit=sc.units.K, values=5*np.random.rand(ntemp)+np.linspace(100, 120, num=ntemp)),
    coords={'time':sc.Variable(dims=['time'], unit=sc.units.s, values=np.linspace(0, end_time, num=ntemp))})
    
events = sc.DataArray(
    data,
    coords={'x':x, 'time':time, 'tof':tof},
    attrs={'sample_temperature': sc.Variable(value=sample_temperature)})

## Step 1: Preprocess metadata

Our data contains an attribute with metadata related to the temperature of the measured sample:

In [None]:
timeseries = events.attrs['sample_temperature'].value
plot(timeseries)

This is a time series with noisy measurements, as could be obtained, e.g., from a temperature sensor.
For event filtering we require intervals with a fixed temperature.
Such intervals can be obtained in many ways.
In this example we do so by taking the mean over subintervals:

In [None]:
average = 4  # number of consecutive samples averaged into one interval
edges = sc.concatenate(
    sc.reshape(timeseries.coords['time'], dims=['time', 'dummy'], shape=(ntemp//average,average))['dummy',0],
    timeseries.coords['time']['time', -1]+1.0*sc.units.s, 'time')
values = sc.mean(sc.reshape(timeseries.data, dims=['time', 'dummy'], shape=(ntemp//average,average)), 'dummy')
temperature = sc.DataArray(values, coords={'time':edges})
plot(temperature)
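
To make the averaging step easier to follow, here is a minimal sketch of the same idea in plain NumPy rather than scipp. The helper `block_average` and the sample numbers are hypothetical and only for illustration; the sketch assumes the number of samples is a multiple of the block size, as is the case for `ntemp` and `average` above.

```python
import numpy as np


def block_average(times, values, block):
    """Average `values` over consecutive blocks of `block` samples.

    Returns (edges, means), where `edges` has one more entry than `means`.
    Hypothetical helper for illustration, not part of scipp.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    if len(values) % block != 0:
        raise ValueError("number of samples must be a multiple of block")
    # One mean temperature per block of samples
    means = values.reshape(-1, block).mean(axis=1)
    # Left edge of each block, plus a closing edge one unit past the last sample
    edges = np.append(times[::block], times[-1] + 1.0)
    return edges, means


demo_edges, demo_means = block_average(
    times=[0, 10, 20, 30, 40, 50, 60, 70],
    values=[100, 102, 101, 103, 110, 112, 111, 113],
    block=4)
```

With these made-up inputs, eight samples collapse into two intervals, so `demo_edges` has three entries and `demo_means` has two.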

<div class="alert alert-info">

**Note**

We are using integer data with a unit of seconds for the time series since scipp has no support for datetime64 yet.

</div>

## Step 2: Map timestamps

The `temperature` data array computed above can be seen as a discretized functional dependence of temperature on time.
This "function" can now be used to map the `time` of each event to the `temperature` of each event:

In [None]:
events.coords['temperature'] = sc.map(temperature, events.coords['time'])
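
Conceptually, this mapping is a lookup in a piecewise-constant function. The following NumPy sketch illustrates the idea; it is not how `scipp.map` is implemented. The helper `map_to_intervals` is hypothetical, and both the half-open interval convention and the use of NaN for events outside all intervals are assumptions made for this example.

```python
import numpy as np


def map_to_intervals(edges, values, x, fill=np.nan):
    """Evaluate a piecewise-constant function at positions `x`.

    values[i] applies on the half-open interval [edges[i], edges[i+1]).
    Positions outside all intervals receive `fill`.
    Hypothetical helper for illustration, not part of scipp.
    """
    edges = np.asarray(edges, dtype=float)
    values = np.asarray(values, dtype=float)
    x = np.asarray(x, dtype=float)
    # Index of the interval whose left edge is the last one <= x
    idx = np.searchsorted(edges, x, side='right') - 1
    inside = (idx >= 0) & (idx < len(values))
    out = np.full(x.shape, fill, dtype=float)
    out[inside] = values[idx[inside]]
    return out


demo_event_temperature = map_to_intervals(
    edges=[0.0, 40.0, 71.0],
    values=[101.5, 111.5],
    x=[5.0, 39.9, 40.0, 70.0, 80.0])
```

Here the first two event times fall into the first interval, the next two into the second, and the last one lies beyond the final edge and is therefore marked as NaN.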

The event lists with temperature values created by `scipp.map` have been added as a new coordinate:

In [None]:
events

## Step 3: Filter

The temperature coordinate created in the previous step can now be used for the actual filtering step.
There are two options: `scipp.filter`, or `scipp.realign` in combination with slicing.

### Option 1: `scipp.filter`

Above we have added a `temperature` coordinate to our data in `events`.
We can then use `scipp.filter` based on a temperature interval:

In [None]:
filtered = sc.filter(
    data=events,
    filter='temperature',
    interval=sc.Variable(dims=['temperature'], unit=sc.units.K, values=[115.0, 119.0]))

The returned data array contains only events with a temperature value falling into this interval:

In [None]:
plot(filtered, bins={'tof':100})
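
For readers who want to see the principle without scipp, interval filtering amounts to applying a boolean mask to all per-event arrays at once. The sketch below uses plain NumPy with made-up values; `filter_interval` is a hypothetical helper, and treating the interval as half-open (upper bound excluded) is an assumption rather than a statement about `scipp.filter`.

```python
import numpy as np


def filter_interval(event_coords, key, low, high):
    """Keep only events whose `key` coordinate lies in [low, high).

    `event_coords` maps coordinate names to equally long 1-D arrays.
    Hypothetical helper for illustration, not part of scipp.
    """
    mask = (event_coords[key] >= low) & (event_coords[key] < high)
    # Apply the same mask to every per-event array so they stay aligned
    return {name: arr[mask] for name, arr in event_coords.items()}


demo_events = {
    'tof': np.array([3500.0, 5000.0, 6500.0, 5200.0]),
    'temperature': np.array([101.5, 116.0, 118.9, 119.0]),
}
demo_kept = filter_interval(demo_events, 'temperature', 115.0, 119.0)
```

Applying one mask to every coordinate keeps the time-of-flight and temperature values of each surviving event paired up.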

### Option 2: `scipp.realign`

With a `temperature` coordinate stored in `events` it is possible to use `scipp.realign` with temperature bins:

In [None]:
tof_bins = sc.Variable(dims=['tof'], unit=sc.units.us, values=np.linspace(0,tof_max,num=100))
temp_bins = sc.Variable(dims=['temperature'], unit=sc.units.K, values=np.linspace(100.0, 130.0, num=6))
realigned = sc.realign(events, {'temperature':temp_bins, 'tof':tof_bins})

Filtering is then performed by slicing and copying:

In [None]:
filtered = realigned['temperature', 0:3].copy()

Slicing combined with histogramming also performs a filter operation, since all events outside the histogram bounds are dropped:

In [None]:
plot(sc.histogram(realigned['temperature', 1]))
plot(sc.histogram(realigned['temperature', 3]))
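
The following NumPy sketch illustrates why binning followed by slicing acts as a filter. It uses `np.histogram2d` as a stand-in, which histograms immediately, whereas `scipp.realign` keeps the underlying events. All arrays and bin edges are made up for this example.

```python
import numpy as np

# Made-up per-event temperature (K) and time-of-flight (us) values
demo_temp = np.array([101.0, 107.0, 113.0, 113.5, 125.0, 140.0])
demo_tof = np.array([3500.0, 5050.0, 5100.0, 6500.0, 5050.0, 5050.0])

demo_temp_edges = np.linspace(100.0, 130.0, num=6)  # 5 bins of 6 K
demo_tof_edges = np.linspace(0.0, 10000.0, num=5)   # 4 bins of 2500 us

# Rows index temperature bins, columns index time-of-flight bins
demo_counts, _, _ = np.histogram2d(
    demo_temp, demo_tof, bins=[demo_temp_edges, demo_tof_edges])

# Selecting a single temperature bin (112 K to 118 K) keeps only the
# events recorded in that temperature range
demo_slice = demo_counts[2]

# The event at 140 K lies outside the temperature bins and is dropped
demo_total = demo_counts.sum()
```

Of the six events, only five land inside the bin edges, and the slice for the third temperature bin contains just the two events recorded between 112 K and 118 K.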

Results from filter operations can also be inserted into a dataset for convenient handling of further operations such as histogramming, summing, or plotting:

In [None]:
d = sc.Dataset()
d['below_T_c'] = realigned['temperature', 1]
d['above_T_c'] = realigned['temperature', 3]
plot(sc.sum(sc.histogram(d), 'x'))

We can also realign without the time-of-flight coordinate to obtain the temperature dependence of the total event count, e.g., for normalization purposes:

In [None]:
realigned = sc.realign(events, {'temperature':temp_bins})
plot(realigned)
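
As a rough illustration of how such per-temperature totals might be used for normalization, the NumPy sketch below divides a temperature-by-time-of-flight histogram by the total event count in each temperature bin. This is one possible normalization under made-up data, not a procedure prescribed by scipp; all names are hypothetical.

```python
import numpy as np

# Made-up per-event temperature (K) and time-of-flight (us) values
demo_temp = np.array([101.0, 102.0, 107.0, 113.0, 113.5, 114.0])
demo_tof = np.array([3500.0, 5050.0, 5100.0, 3600.0, 5200.0, 6500.0])
demo_temp_edges = np.array([100.0, 106.0, 112.0, 118.0])
demo_tof_edges = np.array([0.0, 5000.0, 10000.0])

# Total number of events per temperature bin
demo_total_per_temp, _ = np.histogram(demo_temp, bins=demo_temp_edges)

# Event counts per (temperature, time-of-flight) bin
demo_counts, _, _ = np.histogram2d(
    demo_temp, demo_tof, bins=[demo_temp_edges, demo_tof_edges])

# Divide each temperature row by its total; empty bins are left at zero
demo_normalized = np.divide(
    demo_counts,
    demo_total_per_temp[:, None],
    out=np.zeros_like(demo_counts),
    where=demo_total_per_temp[:, None] > 0)
```

After this step, each non-empty temperature row sums to one, so time-of-flight distributions measured for different lengths of time at each temperature become comparable.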