# Filtering

Event filtering refers to the process of removing or extracting a subset of events based on some criterion such as the temperature of the measured sample at the time an event was detected.
Generally there are three steps to take when filtering events:

1. Preprocess the metadata used for filtering.
   For example, a noisy time series of temperature values needs to be converted into a series of time intervals with a fixed temperature value within each interval.
   This process might involve defining thresholds and tolerances or interpolation methods between measured temperature values.
2. Map event timestamps to temperature values.
3. Filter data based on temperature values.
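
The three steps above can be sketched in plain NumPy, independent of scipp. This is only an illustration under stated assumptions: the toy temperature log, the event timestamps, the block size of 4, and the temperature window are all hypothetical and not part of this notebook's data.

```python
import numpy as np

# Hypothetical toy data (not from this notebook): a noisy temperature log
# sampled every 10 s and a handful of event timestamps.
rng = np.random.default_rng(0)
log_time = np.arange(0.0, 80.0, 10.0)                           # 8 samples, in s
log_temp = np.linspace(100.0, 107.0, 8) + 0.1 * rng.random(8)   # in K
event_time = np.array([5.0, 15.0, 38.0, 42.0, 79.0])            # in s

# Step 1: reduce the noisy log to intervals of constant temperature
# by averaging blocks of 4 consecutive samples.
block = 4
interval_edges = np.append(log_time[::block], log_time[-1] + 10.0)  # [0, 40, 80]
interval_temp = log_temp.reshape(-1, block).mean(axis=1)

# Step 2: look up the interval that contains each event timestamp.
index = np.searchsorted(interval_edges, event_time, side='right') - 1
event_temp = interval_temp[index]

# Step 3: keep only events whose temperature lies in the window of interest.
mask = (event_temp >= 104.0) & (event_temp < 110.0)
selected = event_time[mask]
```

Here only the last two events fall into the second interval, whose mean temperature lies inside the chosen window, so `selected` holds just those two timestamps.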

## Preparation

We create some fake data for illustration purposes.

<div class="alert alert-info">

**Note**

In practice, the data to be filtered would be loaded from a file. Details of this subsection can safely be skipped, as long as all cells are executed.

</div>

In [None]:
import numpy as np
import scipp as sc
from scipp.plot import plot

In [None]:
np.random.seed(1) # Fixed for reproducibility
end_time = 100000
tof_max = 10000
width = tof_max/20
sizes = 4*np.array([7000, 3333, 3000, 5000])
size = np.sum(sizes)
data = sc.Variable(dims=['event'],
                   values=np.ones(size),
                   variances=np.ones(size))
x = sc.Variable(dims=['x'], unit=sc.units.m, values=np.linspace(0, 1, num=4))
time = sc.Variable(dims=['event'], unit=sc.units.s, dtype=sc.dtype.int64, shape=[size])
# time-of-flight in a neutron-scattering experiment
tof = sc.Variable(dims=['event'], unit=sc.units.us, dtype=sc.dtype.float64, shape=[size])
table = sc.DataArray(data=data, coords={'time':time, 'tof':tof})
table

ntemp = 100
sample_temperature = sc.DataArray(
    data=sc.Variable(dims=['time'], unit=sc.units.K, values=5*np.random.rand(ntemp)+np.linspace(100, 120, num=ntemp)),
    coords={'time':sc.Variable(dims=['time'], unit=sc.units.s, values=np.linspace(0, end_time, num=ntemp))})
  
end = sc.Variable(dims=['x'], values=np.cumsum(sizes))
begin = end.copy()
begin.values -= sizes
events = sc.DataArray(
    data=sc.bins(begin=begin, end=end, dim='event', data=table),
    coords={'x':x},
    unaligned_coords={'sample_temperature': sc.Variable(value=sample_temperature)})
for size, bucket in zip(sizes, events.values):
    bucket.coords['time'].values = np.linspace(0, end_time, num=size)
    bucket.coords['tof'].values = np.concatenate(
        (np.concatenate(
            (7*width + width*np.random.randn(size//4),
             13*width + width*np.random.randn(size//4))),
         10*width + width*np.random.randn(size//2)))

## Step 1: Preprocess metadata

Our data contains a coordinate with metadata related to the temperature of the measured sample:

In [None]:
timeseries = events.coords['sample_temperature'].value
plot(timeseries)

This is a timeseries with noisy measurements, as could be obtained, e.g., from a temperature sensor.
For event filtering we require intervals with a fixed temperature.
Such intervals can be obtained in many ways.
In this example we do so by taking the mean over subintervals:

In [None]:
average=4
edges = sc.concatenate(
    sc.reshape(timeseries.coords['time'], dims=['time', 'dummy'], shape=(ntemp//average,average))['dummy',0],
    timeseries.coords['time']['time', -1]+1.0*sc.units.s, 'time')
values = sc.mean(sc.reshape(timeseries.data, dims=['time', 'dummy'], shape=(ntemp//average,average)), 'dummy')
temperature = sc.DataArray(values, coords={'time':edges})
plot(temperature)
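
For readers less familiar with the reshape-based idiom, the same block averaging can be sketched in plain NumPy. This is an illustration, not the scipp implementation: the noise-free temperature ramp is a stand-in for the noisy series, and the names `time_blocks` and `temp_blocks` are introduced here only for the sketch.

```python
import numpy as np

ntemp, average = 100, 4                        # same sizes as in the cell above
time = np.linspace(0.0, 100000.0, num=ntemp)   # in s
temp = np.linspace(100.0, 120.0, num=ntemp)    # noise-free stand-in, in K

# One row per block of `average` consecutive samples.
time_blocks = time.reshape(ntemp // average, average)
temp_blocks = temp.reshape(ntemp // average, average)

# Left edge of each block, plus one closing edge just past the last sample.
edges = np.append(time_blocks[:, 0], time[-1] + 1.0)
values = temp_blocks.mean(axis=1)

print(edges.shape, values.shape)  # (26,) (25,)
```

The result has one more edge than values, which is what makes the time coordinate a bin-edge coordinate for the averaged temperature.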

<div class="alert alert-info">

**Note**

We are using integer data with a unit of seconds for the time series since scipp has no support for datetime64 yet.

</div>

## Step 2: Map time stamps

The `temperature` data array computed above can be seen as a discretized functional dependence of temperature on time.
This "function" can now be used to map the `time` of each event to the `temperature` of each event:

In [None]:
event_temp = sc.buckets.map(temperature, events.data, 'time')
events.bins.data.coords['temperature'] = event_temp.bins.data

The event lists with temperature values created by `scipp.buckets.map` have been added as a new coordinate:

In [None]:
events.values[0]
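
Conceptually, this mapping is a lookup in a step function defined by bin edges. The following NumPy sketch shows one way to write such a lookup. The helper `lookup` is hypothetical, and its treatment of timestamps outside the edges (filling with NaN) is a choice made for this sketch, not a statement about how scipp handles them.

```python
import numpy as np

def lookup(edges, values, x, fill=np.nan):
    """Return values[i] for each x with edges[i] <= x < edges[i+1], else fill."""
    index = np.searchsorted(edges, x, side='right') - 1
    inside = (index >= 0) & (index < len(values))
    out = np.full(x.shape, fill, dtype=float)
    out[inside] = values[index[inside]]
    return out

edges = np.array([0.0, 10.0, 20.0, 30.0])   # interval boundaries, in s
values = np.array([100.0, 105.0, 110.0])    # temperature per interval, in K
event_time = np.array([-1.0, 0.0, 9.9, 10.0, 29.9, 30.0])

event_temp = lookup(edges, values, event_time)
```

With these inputs, the first and last timestamps fall outside the half-open intervals and receive the fill value, while the remaining four map to 100, 100, 105, and 110 K.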

## Step 3: Filter

The temperature coordinate created in the previous step can now be used for the actual filtering step.
There are two options: `scipp.filter`, or `scipp.bin` in combination with slicing.

### Option 1: `scipp.filter`

Above we have added a `temperature` coordinate to our data in `events`.
We can then use `scipp.filter` based on a temperature interval:

<div class="alert alert-info">

**Note**
    
The support for realigned data is being removed and will be replaced by bin variables.
`filter` is not yet available for binned data.
Documentation will be updated in the near future.
See the next subsection for a solution that works right now.

</div>

In [None]:
#filtered = sc.filter(
#    data=events,
#    filter='temperature',
#    interval=sc.Variable(dims=['temperature'], unit=sc.units.K, values=[115.0, 119.0]))

The returned data array would contain only events with a temperature value falling into this interval:

In [None]:
#plot(filtered, bins={'tof':100})

### Option 2: `scipp.bin`

With a `temperature` coordinate stored as part of `events` it is possible to use `scipp.bin` with temperature bins:

In [None]:
tof_bins = sc.Variable(dims=['tof'], unit=sc.units.us, values=np.linspace(0,tof_max,num=100))
temp_bins = sc.Variable(dims=['temperature'], unit=sc.units.K, values=np.linspace(100.0, 130.0, num=6))
binned_events = sc.bin(events, [temp_bins, tof_bins])
binned_events

Filtering is then performed by slicing (and copying):

In [None]:
filtered_view = binned_events['temperature', 0:3] # view containing only relevant events
filtered = binned_events['temperature', 0:3].copy() # extract only relevant events by copying
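
The idea behind "bin, then slice" can be illustrated with NumPy: assign each event a temperature-bin index, then select the events whose index falls in the sliced range. This is a rough analogy under assumptions, not how scipp stores binned data; the event temperatures below are hypothetical.

```python
import numpy as np

temp_edges = np.linspace(100.0, 130.0, num=6)   # same edges as temp_bins above
event_temp = np.array([101.0, 107.0, 113.0, 118.5, 124.0, 129.0, 135.0])  # in K

# Bin index per event; an index of -1 or len(temp_edges) - 1 marks an event
# outside the edges.
bin_index = np.digitize(event_temp, temp_edges) - 1

# Equivalent of slicing the temperature dimension with 0:3.
keep = (bin_index >= 0) & (bin_index < 3)
selected = event_temp[keep]
```

Note that NumPy boolean indexing always returns a copy, so this sketch does not reproduce the view-versus-copy distinction made in the cell above.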

Slicing combined with histogramming also performs a filter operation, since all events outside the histogram bounds are dropped:

In [None]:
plot(binned_events['temperature', 1].bins.sum())

In [None]:
plot(binned_events['temperature', 3].bins.sum())
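
That histogramming acts as a filter can be seen in a small NumPy sketch with hypothetical time-of-flight values. One caveat of this analogy: `np.histogram` treats the last bin as closed on the right, so a value exactly on the upper bound is still counted, which may differ from scipp's convention.

```python
import numpy as np

tof = np.array([-50.0, 100.0, 2500.0, 9999.0, 10000.0, 12000.0])  # in us, hypothetical
edges = np.linspace(0.0, 10000.0, num=5)   # 4 bins covering [0, 10000]

counts, _ = np.histogram(tof, bins=edges)
print(counts.sum(), len(tof))  # 4 6
```

Two of the six events lie outside the bin edges and are silently dropped from the histogram.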

Results from filter operations can also be inserted into a dataset for convenient handling of further operations such as histogramming, summing, or plotting:

In [None]:
d = sc.Dataset()
d['below_T_c'] = binned_events['temperature', 1]
d['above_T_c'] = binned_events['temperature', 3]
plot(sc.sum(d.bins.sum(), 'x'))

We can also bin without the time-of-flight coordinate to obtain the temperature dependence of the total event count, e.g., for normalization purposes:

In [None]:
binned_events = sc.bin(events, [temp_bins])
plot(binned_events.bins.sum())
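
As a sketch of how such counts might be used for normalization, the following NumPy example computes the number of events per temperature bin and the fraction of events in each bin. The event temperatures are hypothetical, and dividing by the total is just one possible normalization.

```python
import numpy as np

temp_edges = np.linspace(100.0, 130.0, num=6)   # same edges as temp_bins above
event_temp = np.array([101.0, 103.0, 107.0, 113.0, 113.5, 114.0, 125.0])  # in K

# Total number of events in each temperature bin.
total_per_temp, _ = np.histogram(event_temp, bins=temp_edges)

# Fraction of all binned events that fall into each temperature bin.
fraction = total_per_temp / total_per_temp.sum()
```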