# Channel-Level Evaluation
In this notebook, we introduce a new sub-module of the `pEYES` package, `channel_metrics`, which provides metrics for evaluating event detectors at the channel level.  

We start by explaining what MNE-style event channels are and how we match them.  
We then introduce the two types of channel evaluation metrics provided by `pEYES`: _Differences_ and _Signal Detection Theory (SDT) metrics_. Note that the SDT metrics require defining what _negative_ events are in this context; we conceptualize them as "windows" of samples that do not contain any positive event (see further explanation below).

In [1]:
import os

import numpy as np
import pandas as pd

import peyes

## Preparing the Data
As shown in previous notebooks, we start by preparing the data.  
In the following code blocks we:  
1. Download the "Lund2013" dataset.  
2. Extract the data of the first trial, including the time, x, and y columns as well as the pixel size and viewer distance.  
3. Extract the annotations by the two human annotators _"RA"_ and _"MN"_. These will be used as the ground truth for the upcoming evaluation.  
4. Create two detector objects, using Engbert's detection algorithm and the I-VT detection algorithm.  
5. Use each detector to label the samples in the first trial.  

In [2]:
dataset = peyes.datasets.lund2013(directory=None, save=False, verbose=True)

Dataset Lund2013 not found in directory None.
Downloading...


Processing Files: 100%|██████████| 97/97 [00:01<00:00, 84.51it/s] 


In [3]:
trial1_data = dataset[dataset[peyes.constants.TRIAL_ID_STR] == 1]
trial1_t = trial1_data[peyes.constants.T].values
trial1_x = trial1_data[peyes.constants.X].values
trial1_y = trial1_data[peyes.constants.Y].values
trial1_pixel_size = trial1_data["pixel_size"].values[0]
trial1_viewer_distance = trial1_data["viewer_distance"].values[0]

In [4]:
ra = trial1_data["RA"].values
mn = trial1_data["MN"].values

ra, mn

(array([1, 1, 1, ..., 4, 4, 4], dtype=int64),
 array([1., 1., 1., ..., 4., 4., 4.]))

In [5]:
engbert = peyes.create_detector("engbert", missing_value=np.nan, min_event_duration=4, pad_blinks_time=0)

eng_labels, eng_metadata = engbert.detect(
    t=trial1_t, x=trial1_x, y=trial1_y, pixel_size_cm=trial1_pixel_size, viewer_distance_cm=trial1_viewer_distance
)
eng_labels

[<EventLabelEnum.UNDEFINED: 0>,
 <EventLabelEnum.UNDEFINED: 0>,
 <EventLabelEnum.UNDEFINED: 0>,
 <EventLabelEnum.FIXATION: 1>,
 <EventLabelEnum.FIXATION: 1>,
 <EventLabelEnum.FIXATION: 1>,
 ...]

In [6]:
ivt = peyes.create_detector("ivt", missing_value=np.nan, min_event_duration=4, pad_blinks_time=0)

ivt_labels, ivt_metadata = ivt.detect(
    t=trial1_t, x=trial1_x, y=trial1_y, pixel_size_cm=trial1_pixel_size, viewer_distance_cm=trial1_viewer_distance
)
ivt_labels

[<EventLabelEnum.UNDEFINED: 0>,
 <EventLabelEnum.FIXATION: 1>,
 <EventLabelEnum.FIXATION: 1>,
 <EventLabelEnum.FIXATION: 1>,
 ...]

## Background
### 1. MNE-Style "Channels"
The `channel_metrics` sub-module calculates evaluation metrics at the channel level. To do so, it converts the input _ground-truth_ and _prediction_ sequences into MNE-like channels: boolean arrays that are `False` everywhere except at specific samples that mark an "event" occurrence (e.g., a saccade onset, a stimulus presentation, etc.).  

The `pEYES` package provides the `create_boolean_channel` function, which converts a sequence of labels or event objects into a boolean channel marking either the onsets or the offsets of the events. Note that when the input is a sequence of `Event` objects (see guide #1), the function requires specifying the underlying `sampling_rate` to convert the timestamps into sample indices.  
See for example how we convert labels from the human annotator "RA" into a boolean channel:

In [7]:
ra_on_channel = peyes.create_boolean_channel(channel_type=peyes.constants.ONSET_STR, data=ra)
ra_off_channel = peyes.create_boolean_channel(channel_type=peyes.constants.OFFSET_STR, data=ra)
num_onsets = np.sum(ra_on_channel)
num_offsets = np.sum(ra_off_channel)
print(f"Number of onsets: {num_onsets}")
print(f"Number of offsets: {num_offsets}")
ra_on_channel, ra_off_channel

Number of onsets: 11
Number of offsets: 11


(array([ True, False, False, ..., False, False, False]),
 array([False, False, False, ..., False, False,  True]))
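For intuition, the onset/offset definition can be sketched in plain NumPy. This is a minimal sketch, not the `pEYES` implementation: the hypothetical helper `labels_to_channel` assumes that an onset is the first sample plus every sample whose label differs from the previous one, and that an offset is the last sample plus every sample whose label differs from the next one. It does not special-case `UNDEFINED` labels, which `pEYES` may handle differently.

```python
import numpy as np


def labels_to_channel(labels, channel_type: str) -> np.ndarray:
    """Hypothetical helper: boolean channel marking event onsets or offsets in a label sequence."""
    labels = np.asarray(labels)
    channel = np.zeros(labels.shape[0], dtype=bool)
    if labels.size == 0:
        return channel
    changes = labels[1:] != labels[:-1]  # True where sample i+1 starts a new event
    if channel_type == "onset":
        channel[0] = True          # the first sample always opens an event
        channel[1:] = changes      # a label change marks the onset of the next event
    elif channel_type == "offset":
        channel[-1] = True         # the last sample always closes an event
        channel[:-1] = changes     # the sample before a label change ends the current event
    else:
        raise ValueError(f"unknown channel type: {channel_type!r}")
    return channel


labels = [1, 1, 2, 2, 2, 1]
print(labels_to_channel(labels, "onset").tolist())   # [True, False, True, False, False, True]
print(labels_to_channel(labels, "offset").tolist())  # [False, True, False, False, True, True]
```

Under this definition every event contributes exactly one onset and one offset, which is consistent with the equal onset and offset counts printed above.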

### 2. Channel Matching
When provided with two boolean channels (_ground truth_ and _prediction_), we "match" them by finding, for each `True` index in the _prediction_, the closest `True` index in the _ground truth_ ("closest" in terms of sample index, which maps linearly to time when the sampling rate is constant). This is done using one of `pEYES`' internal utility functions (`peyes._utils.vector_utils.pair_boolean_arrays()`), based on the implementation shown in https://stackoverflow.com/q/78484847/8543025. 
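The sketch below illustrates one way to pair the `True` indices of two channels: it visits all candidate pairs from the smallest index distance to the largest and keeps a pair only if neither index has been used yet. This one-to-one greedy rule is an assumption of the sketch; it is neither `pair_boolean_arrays()` nor the Stack Overflow solution, and tie-breaking may differ. The helper name `greedy_match` is hypothetical.

```python
import numpy as np


def greedy_match(gt: np.ndarray, pred: np.ndarray) -> list[tuple[int, int]]:
    """Hypothetical helper: pair True indices of two boolean channels one-to-one,
    visiting candidate pairs from the smallest index distance to the largest."""
    gt_idx = np.flatnonzero(gt)
    pred_idx = np.flatnonzero(pred)
    if gt_idx.size == 0 or pred_idx.size == 0:
        return []
    # absolute index distance for every (ground-truth, prediction) combination
    dist = np.abs(gt_idx[:, None] - pred_idx[None, :])
    # flat positions sorted by distance; a stable sort keeps ties in row-major order
    order = np.argsort(dist, axis=None, kind="stable")
    used_gt, used_pred, pairs = set(), set(), []
    for flat in order:
        g, p = (int(v) for v in np.unravel_index(flat, dist.shape))
        if g in used_gt or p in used_pred:
            continue  # each index may take part in at most one pair
        used_gt.add(g)
        used_pred.add(p)
        pairs.append((int(gt_idx[g]), int(pred_idx[p])))
    return sorted(pairs)  # (ground-truth index, prediction index), ordered by time


gt = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0], dtype=bool)
pred = np.array([0, 1, 0, 0, 0, 0, 1, 0, 0, 1], dtype=bool)
print(greedy_match(gt, pred))  # [(0, 1), (4, 6)]
```

The prediction at index 9 stays unmatched because both ground-truth onsets are already paired. Building the full distance matrix costs memory proportional to the product of the two event counts, which is acceptable for the tens of events per trial seen in this notebook.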

## Channel-Level Evaluation
Now that we understand what "channels" are and how they are matched, we can move on to `pEYES`' evaluation metrics for channels.  
`pEYES` offers evaluation metrics for two types of events: _onsets_ and _offsets_. The channel type is selected by which evaluation function is called, while the underlying implementation is shared.  
For each event type, there are two evaluation functions: `X_differences` and `X_detection_metrics`, where `X` stands for either `onset` or `offset`.  

### 1. Differences
The `onset_differences` and `offset_differences` functions take a sequence of _ground-truth_ and _prediction_ labels/events, convert them into boolean channels, and match them. They then return the differences, in samples, between the matched pairs of events. To avoid spurious matches, one can specify the `max_diff` argument, which ignores pairs whose absolute difference is greater than the specified value.
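Once the channels have been matched into index pairs, computing the differences and applying `max_diff` reduces to a few NumPy operations. The hypothetical helper `matched_differences` below assumes the sign convention "prediction minus ground truth" and applies `max_diff` as a filter after matching; both are assumptions of this sketch, and `pEYES` may handle them differently.

```python
import numpy as np


def matched_differences(gt_idx, pred_idx, max_diff=None) -> np.ndarray:
    """Hypothetical helper: signed differences (prediction minus ground truth) between
    already-matched event indices, dropping pairs farther apart than `max_diff`."""
    diffs = np.asarray(pred_idx, dtype=int) - np.asarray(gt_idx, dtype=int)
    if max_diff is None:
        return diffs
    return diffs[np.abs(diffs) <= max_diff]  # keep only pairs within the allowed distance


# hypothetical matched (ground-truth, prediction) onset indices
gt_idx = [0, 120, 300]
pred_idx = [3, 119, 340]
print(matched_differences(gt_idx, pred_idx, max_diff=20).tolist())  # [3, -1]
```

The third pair (40 samples apart) is discarded, mirroring how `max_diff` guards against matching events that are too far apart to plausibly be the same event.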

In [8]:
on_diffs = peyes.channel_metrics.onset_differences(ra, eng_labels, max_diff=20)
on_diffs

array([ 3, -1,  3,  0, -1, -3,  8, -2,  3], dtype=int64)

In [9]:
off_diffs = peyes.channel_metrics.offset_differences(ra, eng_labels, max_diff=20)
off_diffs

array([-1,  3,  0, -1, -3,  8, -2,  3, -4], dtype=int64)

### 2. Detection Metrics
The `onset_detection_metrics` and `offset_detection_metrics` functions calculate contingency measures and Signal Detection Theory (SDT) metrics for the _onset_ and _offset_ events, respectively. They require a `threshold` argument, which can be either a single number or a sequence of numbers, and which determines the maximum allowed difference between matched pairs of events (like the `max_diff` argument of the `X_differences` functions).  

For each threshold, the functions first calculate the following contingency measures:  
* `P` - number of ground-truth events (i.e., the number of `True` samples in the ground-truth channel).  
* `PP` - number of predicted events (i.e., the number of `True` samples in the prediction channel).  
* `TP` - number of true positives (i.e., the number of matched pairs within the given threshold).  
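* `N` - number of **negative windows** in the ground-truth channel. A negative window is a window of `2*threshold + 1` samples (`threshold` samples on each side of a center sample) that does not contain any positive event. Because `N` is derived from the length of the channel, it is not necessarily an integer (see the table below).  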
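The count of negative windows can be sketched as follows. Assuming each window spans `threshold` samples on either side of a center sample (`2*threshold + 1` samples in total) and that one window per ground-truth event is excluded, `N` becomes `num_samples / (2*threshold + 1) - P`. This formula is inferred rather than taken from the `pEYES` source, although it is consistent with the fractional `N` values reported in the table below. The helper name `num_negative_windows` is hypothetical.

```python
def num_negative_windows(num_samples: int, num_positives: int, threshold: int) -> float:
    """Hypothetical helper: number of negative windows in a channel of `num_samples` samples.

    Assumes each window spans `threshold` samples on either side of a center sample
    (2 * threshold + 1 samples) and that one window per positive event is excluded.
    The result is a float because the channel length need not be a multiple of the
    window size.
    """
    window_size = 2 * threshold + 1
    return num_samples / window_size - num_positives


# a hypothetical 1000-sample channel with 10 ground-truth onsets, evaluated at threshold=2
print(num_negative_windows(1000, 10, threshold=2))  # 190.0
```

Larger thresholds mean wider windows and therefore fewer negative windows, which raises the false alarm rate for a fixed number of unmatched predictions.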

From these contingency measures, we can calculate SDT metrics:
* `recall` - `TP / P`
* `precision` - `TP / PP`
* `F1` - `2 * (precision * recall) / (precision + recall)`
* `False Alarm Rate` - `(PP - TP) / N`
* `d'` and `criterion` - computed from `recall` and `False Alarm Rate`, along with an optional `dprime_correction` argument that determines the correction method for the calculation (`None`, `'macmillan_kaplan'` or `'loglinear'`). See `peyes._utils.metric_utils.dprime_and_criterion()` for more details.

The functions return a pandas DataFrame where each row corresponds to a specific threshold value and each column corresponds to a metric.

In [10]:
on_sdt = peyes.channel_metrics.onset_detection_metrics(ra, eng_labels, threshold=np.arange(1, 5))
on_sdt.T

| metric | threshold = 1 | threshold = 2 | threshold = 3 | threshold = 4 |
|---|---|---|---|---|
| P | 11.0 | 11.0 | 11.0 | 11.0 |
| PP | 69.0 | 69.0 | 69.0 | 69.0 |
| TP | 3.0 | 4.0 | 8.0 | 8.0 |
| N | 541.666667 | 320.6 | 225.857143 | 173.222222 |
| recall | 0.272727 | 0.363636 | 0.727273 | 0.727273 |
| precision | 0.043478 | 0.057971 | 0.115942 | 0.115942 |
| f1 | 0.075 | 0.1 | 0.2 | 0.2 |
| false_alarm_rate | 0.121846 | 0.202745 | 0.270082 | 0.352149 |
| d_prime | 0.561222 | 0.483101 | 1.21715 | 0.984111 |
| criterion | 0.885196 | 0.590306 | 0.003989 | -0.11253 |
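To connect the SDT formulas listed above with the output, the following sketch recomputes the metrics from `P`, `PP`, `TP` and `N`, using the standard-library inverse normal CDF for `d'` and `criterion` (d' = z(recall) - z(false alarm rate), criterion = -(z(recall) + z(false alarm rate)) / 2). It applies no `dprime_correction`, and the helper name `sdt_metrics` is hypothetical rather than part of `pEYES`. With the contingency measures of the `threshold = 1` column (`P=11`, `PP=69`, `TP=3`, `N≈541.67`), it reproduces that column's metrics up to rounding.

```python
from statistics import NormalDist


def sdt_metrics(p: float, pp: float, tp: float, n: float) -> dict:
    """Hypothetical helper: SDT metrics from the contingency measures P, PP, TP and N.

    Assumes TP > 0 (otherwise F1 divides by zero) and applies no d' correction,
    so a recall or false alarm rate of exactly 0 or 1 raises StatisticsError.
    """
    recall = tp / p
    precision = tp / pp
    f1 = 2 * precision * recall / (precision + recall)
    false_alarm_rate = (pp - tp) / n
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(recall), z(false_alarm_rate)
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "false_alarm_rate": false_alarm_rate,
        "d_prime": z_hit - z_fa,
        "criterion": -(z_hit + z_fa) / 2,
    }


# contingency measures of the threshold = 1 column above
metrics = sdt_metrics(p=11, pp=69, tp=3, n=541.666667)
print({name: round(value, 6) for name, value in metrics.items()})
```

Without a correction, a detector that matches every ground-truth event (recall of 1) would yield an infinite `d'`; this is the situation the `dprime_correction` options are meant to handle.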


## Example Usage
We are of course not limited to comparing a single detector with a single human annotator, over a single trial.  
In the following example we:  
1. Extract the data and annotations of each of the first 10 trials. Note that some trials were not labeled by both annotators; in that case, the missing annotator's column is `NaN` throughout the trial, and we skip the evaluation for that annotator.
2. Run the _I-VT_ and _Engbert_ detectors to label the samples in each trial.
3. Calculate the onset detection metrics for each annotator-detector pair, over multiple thresholds.
4. Generate a multi-indexed DataFrame with the results.

In [11]:
GT_STR, PRED_STR = "gt", "pred"
GT1, GT2 = "RA", "MN"
DET1, DET2 = engbert, ivt
THRESHOLDS = np.arange(1, 5)

results = []
for tr in range(1, 11):
    # extract the gaze data and viewing parameters of the current trial
    trial_data = dataset[dataset[peyes.constants.TRIAL_ID_STR] == tr]
    trial_t = trial_data[peyes.constants.T].values
    trial_x = trial_data[peyes.constants.X].values
    trial_y = trial_data[peyes.constants.Y].values
    trial_pixel_size = trial_data["pixel_size"].values[0]
    trial_viewer_distance = trial_data["viewer_distance"].values[0]

    # run each detector once per trial: its labels do not depend on the annotator
    detections = {}
    for det in [DET1, DET2]:
        det_labels, _ = det.detect(
            t=trial_t, x=trial_x, y=trial_y, pixel_size_cm=trial_pixel_size, viewer_distance_cm=trial_viewer_distance
        )
        detections[det.name] = det_labels

    for gt in [GT1, GT2]:
        # an annotator who did not label this trial has NaN throughout, so nothing survives dropna()
        gt_labels = trial_data[gt].dropna().values
        if len(gt_labels) == 0:
            continue
        for pred_name, pred_labels in detections.items():
            # rows = metrics, columns = thresholds
            sdt = peyes.channel_metrics.onset_detection_metrics(gt_labels, pred_labels, threshold=THRESHOLDS).T
            sdt[peyes.constants.TRIAL_STR] = tr
            sdt[GT_STR] = gt
            sdt[PRED_STR] = pred_name
            results.append(sdt)

# stack all (trial, annotator, detector) results into a single multi-indexed DataFrame
results = pd.concat(results, ignore_index=False)
results = results.set_index(
    [peyes.constants.TRIAL_STR, GT_STR, PRED_STR], append=True
).reorder_levels([peyes.constants.TRIAL_STR, GT_STR, PRED_STR, peyes.constants.METRIC_STR])
results

| trial | gt | pred | metric | threshold = 1 | threshold = 2 | threshold = 3 | threshold = 4 |
|---|---|---|---|---|---|---|---|
| 1 | RA | EngbertDetector | P | 11.000000 | 11.000000 | 11.000000 | 11.000000 |
| 1 | RA | EngbertDetector | PP | 69.000000 | 69.000000 | 69.000000 | 69.000000 |
| 1 | RA | EngbertDetector | TP | 3.000000 | 4.000000 | 8.000000 | 8.000000 |
| 1 | RA | EngbertDetector | N | 541.666667 | 320.600000 | 225.857143 | 173.222222 |
| 1 | RA | EngbertDetector | recall | 0.272727 | 0.363636 | 0.727273 | 0.727273 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 10 | MN | IVTDetector | precision | 0.222222 | 0.222222 | 0.222222 | 0.333333 |
| 10 | MN | IVTDetector | f1 | 0.222222 | 0.222222 | 0.222222 | 0.333333 |
| 10 | MN | IVTDetector | false_alarm_rate | 0.039033 | 0.067308 | 0.097610 | 0.111570 |
| 10 | MN | IVTDetector | d_prime | 0.997304 | 0.731437 | 0.530583 | 0.787492 |


## Summary
In this notebook, we covered the channel-level evaluation metrics provided by `pEYES`. Rather than comparing labels sample by sample, these metrics assess how closely a detector's event onsets and offsets align with those of the ground truth, complementing the sample-level metrics.