# Event Matching and Match Evaluation

In [1]:
!pip install peyes --upgrade



In [2]:
import os

import numpy as np
import pandas as pd

import peyes

## Preparing the Data
As shown in previous notebooks, we start by preparing the data.  
In the following code blocks we:  
1. Download the "Lund2013" dataset.  
2. Extract the data of the first trial, including the time, x, and y columns as well as the pixel size and viewer distance.  
3. Extract the annotations by the two human annotators _"RA"_ and _"MN"_. These will be used as the ground truth for the upcoming evaluation.  
4. Create two detector objects, using Engbert's detection algorithm and the I-VT detection algorithm.  
5. Use each detector to label the samples in the first trial.  

In [3]:
# Download the "Lund2013" dataset
dataset = peyes.datasets.lund2013(directory=None, save=False, verbose=True)

# Extract the data of the first trial
trial1_data = dataset[dataset[peyes.constants.TRIAL_ID_STR] == 1]

# read labels from the two annotators
ra = trial1_data["RA"].values
mn = trial1_data["MN"].values

# Extract the time, x, and y columns as well as the pixel size and viewer distance
trial1_t = trial1_data[peyes.constants.T].values
trial1_x = trial1_data[peyes.constants.X].values
trial1_y = trial1_data[peyes.constants.Y].values
trial1_p = trial1_data[peyes.constants.PUPIL].values
trial1_pixel_size = trial1_data["pixel_size"].values[0]
trial1_viewer_distance = trial1_data["viewer_distance"].values[0]

# Create detector objects and use them to label the samples in the first trial
engbert = peyes.create_detector("engbert", missing_value=np.nan, min_event_duration=4, pad_blinks_time=0)
eng_labels, eng_metadata = engbert.detect(
    t=trial1_t, x=trial1_x, y=trial1_y, pixel_size_cm=trial1_pixel_size, viewer_distance_cm=trial1_viewer_distance
)

ivt = peyes.create_detector("ivt", missing_value=np.nan, min_event_duration=4, pad_blinks_time=0)
ivt_labels, ivt_metadata = ivt.detect(
    t=trial1_t, x=trial1_x, y=trial1_y, pixel_size_cm=trial1_pixel_size, viewer_distance_cm=trial1_viewer_distance
)

Downloading...


Processing Files: 100%|██████████| 96/96 [00:00<00:00, 202.77it/s]


### Generating Events
As shown in notebook #1, consecutive samples that share the same label can be grouped into an `Event` object.  
`Event` is one of `pEYES`'s data structures, and it exposes many useful attributes and methods that are easier to use than manipulating the raw data directly. For example, we can ask an `Event` object for its duration, instead of manually calculating the difference between the start and end times of the samples that compose the event.  

To convert `pEYES`'s labels to events, we use the `peyes.create_events()` function (see notebook #1 for more details).  
As another preparatory step, we will convert the labels from all annotators and detectors to their corresponding events.

In [4]:
ra_events = peyes.create_events(
    labels=ra,
    t=trial1_t, x=trial1_x, y=trial1_y, pupil=trial1_p,
    pixel_size=trial1_pixel_size, viewer_distance=trial1_viewer_distance
)
mn_events = peyes.create_events(
    labels=mn,
    t=trial1_t, x=trial1_x, y=trial1_y, pupil=trial1_p,
    pixel_size=trial1_pixel_size, viewer_distance=trial1_viewer_distance
)
eng_events = peyes.create_events(
    labels=eng_labels,
    t=trial1_t, x=trial1_x, y=trial1_y, pupil=trial1_p,
    pixel_size=trial1_pixel_size, viewer_distance=trial1_viewer_distance
)
ivt_events = peyes.create_events(
    labels=ivt_labels,
    t=trial1_t, x=trial1_x, y=trial1_y, pupil=trial1_p,
    pixel_size=trial1_pixel_size, viewer_distance=trial1_viewer_distance
)
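
To make the grouping idea concrete, here is a minimal pure-Python sketch of how consecutive samples with the same label could be collapsed into events. It is not how `pEYES` implements `create_events()`; the helper `group_labels`, the tuple layout, and the sample values are all hypothetical and only illustrate the concept.

```python
from itertools import groupby


def group_labels(labels, t):
    """Collapse runs of identical labels into (label, onset, offset) tuples.

    Hypothetical helper for illustration only; not part of pEYES.
    """
    events = []
    idx = 0
    for label, run in groupby(labels):
        n = len(list(run))  # number of consecutive samples sharing this label
        events.append((label, t[idx], t[idx + n - 1]))
        idx += n
    return events


# Made-up samples at 2 ms intervals; the integer codes are illustrative
t = [0, 2, 4, 6, 8, 10, 12]
labels = [1, 1, 1, 2, 2, 1, 1]

events = group_labels(labels, t)
print(events)  # [(1, 0, 4), (2, 6, 8), (1, 10, 12)]
```

Each tuple carries the timestamps of the first and last sample in the run, which is the information an event-level object would need in order to report quantities such as duration.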

## Event Matching
Event matching is the process of mapping between two sequences of events (ground-truth & prediction), based on some matching criteria (a _"matching scheme"_). A matching could be between events of the same type or of different types; it could allow multiple predicted events to match a single ground-truth event or vice versa, and so on.  

In their article _"Evaluating eye movement event detection: A review of the state of the art"_ (2023), Startsev & Zemblys include a dedicated section on event matching. They describe various methods to match semi-overlapping sequences of events originating from two different detection sources (e.g. a human annotator & a detection algorithm). They discuss the considerations that should be taken into account, and demonstrate how different matching schemes may affect the outcome. We encourage anyone working with this type of analysis to read their article before applying this logic to their data.  

The `pEYES` package implements several of the matching schemes described in the aforementioned article, including _IoU_, _Window-Based_, etc. Two sequences of `Event` objects can be matched using the `peyes.match()` function, which also requires specifying a matching scheme (the `match_by` argument) and whether to allow _cross-matching_, i.e. whether events of different types may be matched. The function's documentation provides useful information on the available matching schemes and the parameters each of them requires.  
**Note** that event matching can be time-consuming and may require long runtimes and substantial compute.

In this example, we will match only one predicted sequence against a corresponding ground-truth. We will use the strictest matching scheme available in `pEYES`, _l2_, which matches two events only if the $l_2$ norm of the differences between their onset and offset times is less than some threshold (set here to 15 ms). For more information, see the Startsev & Zemblys article.

In [5]:
print(peyes.match.__doc__)


    Match events based on the given matching criteria, ignoring specified event-labels.
    Matches can be one-to-one or one-to-many depending on the matching criteria and the specified parameters.

    :param ground_truth: a sequence of BaseEvent objects representing the ground truth events.
    :param prediction: a sequence of BaseEvent objects representing the predicted events.
    :param match_by: the matching criteria to use:
        - 'first' or 'first overlap': match the first predicted event that overlaps with each ground-truth event
        - 'last' or 'last overlap': match the last predicted event that overlaps with each ground-truth event
        - 'max' or 'max overlap': match the predicted event with maximum overlap with each ground-truth event
        - 'longest overlap': match the longest predicted event that overlaps with each ground-truth event
        - 'iou' or 'intersection over union': match the predicted event with maximum intersection-over-union
        - 'onset

In [6]:
peyes.match(ra_events, eng_events, match_by='l2', max_l2=15, allow_xmatch=False)

{FIXATION(236.0ms): FIXATION(228.0ms),
 SACCADE(18.0ms): SACCADE(34.0ms),
 SACCADE(16.0ms): SACCADE(20.0ms),
 SACCADE(8.0ms): SACCADE(18.0ms)}
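
The $l_2$ criterion described above can be sketched in a few lines. This is an illustrative reading of the definition in the text (the norm of the onset and offset differences), not the `pEYES` implementation; the function `time_l2_distance` and the timestamps are made up, and the threshold of 15 ms mirrors the `max_l2=15` used in the cell above.

```python
import math


def time_l2_distance(gt, pred):
    """l2 norm of the (onset, offset) differences between two events.

    Events are (onset_ms, offset_ms) tuples; hypothetical helper, not pEYES.
    """
    return math.hypot(gt[0] - pred[0], gt[1] - pred[1])


max_l2 = 15.0                     # threshold in ms
gt_event = (100.0, 118.0)         # made-up ground-truth event
pred_close = (102.0, 124.0)       # onset differs by 2 ms, offset by 6 ms
pred_far = (88.0, 130.0)          # onset and offset both differ by 12 ms

close_dist = time_l2_distance(gt_event, pred_close)  # sqrt(2**2 + 6**2)
far_dist = time_l2_distance(gt_event, pred_far)      # sqrt(12**2 + 12**2)

print(close_dist < max_l2)  # True: close enough to count as a match
print(far_dist < max_l2)    # False: rejected by the threshold
```

Because both boundaries contribute to the distance, an event that is slightly off at both its onset and its offset can be rejected even when each difference on its own is below the threshold.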

### Setting Scheme Params
As you can see, the output is a mapping, where each _key: value_ pair is a match between a _GT_ event and a _Pred_ event. With these parameters, four GT events were matched to Pred events. If we allow for a more lenient threshold, we may be able to match more events. For example, if we match based only on _onset-time_ and allow for a maximum jitter of _3_, _30_ or _300_ ms, we might get different results.

In [7]:
print(peyes.match(ra_events, eng_events, match_by='onset', max_onset_difference=3, allow_xmatch=False))
print(peyes.match(ra_events, eng_events, match_by='onset', max_onset_difference=30, allow_xmatch=False))
print(peyes.match(ra_events, eng_events, match_by='onset', max_onset_difference=300, allow_xmatch=False))

{SACCADE(18.0ms): SACCADE(34.0ms), SACCADE(16.0ms): SACCADE(20.0ms)}
{FIXATION(236.0ms): FIXATION(228.0ms), SACCADE(18.0ms): SACCADE(34.0ms), SACCADE(16.0ms): SACCADE(20.0ms), SACCADE(12.0ms): SACCADE(34.0ms), SACCADE(8.0ms): SACCADE(18.0ms)}
{FIXATION(236.0ms): FIXATION(228.0ms), SACCADE(18.0ms): SACCADE(34.0ms), SACCADE(16.0ms): SACCADE(20.0ms), SACCADE(12.0ms): SACCADE(34.0ms), SACCADE(8.0ms): SACCADE(18.0ms)}
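
The effect of the threshold can be illustrated with a small greedy matcher. This is only a sketch under simplifying assumptions (one-to-one matching, each ground-truth onset takes the closest still-unused predicted onset); the function `match_by_onset` and the onset values are hypothetical and do not reproduce `pEYES`'s internal logic.

```python
def match_by_onset(gt_onsets, pred_onsets, max_onset_difference):
    """Greedy one-to-one matching on onset times (illustrative only).

    Returns {gt_index: pred_index}. Each GT onset is paired with the closest
    unused predicted onset that lies within the allowed difference.
    """
    matches = {}
    used = set()
    for i, g in enumerate(gt_onsets):
        candidates = [
            (abs(g - p), j)
            for j, p in enumerate(pred_onsets)
            if j not in used and abs(g - p) <= max_onset_difference
        ]
        if candidates:
            _, best = min(candidates)  # smallest onset difference wins
            matches[i] = best
            used.add(best)
    return matches


# Made-up onset times in ms
gt_onsets = [0, 240, 500, 900]
pred_onsets = [2, 250, 560, 1300]

counts = {
    thr: len(match_by_onset(gt_onsets, pred_onsets, thr))
    for thr in (3, 30, 300)
}
print(counts)  # {3: 1, 30: 2, 300: 3}
```

In this toy data, every increase of the threshold admits one more pair. In the notebook output above, by contrast, raising the threshold from 30 to 300 ms adds no further matches, which shows that a more lenient threshold only helps while unmatched candidates of a compatible type remain.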


### Cross-Matching
Note also that all _gt: pred_ pairs are of the same type. This is because we explicitly prevented _cross-matching_, which means a match is only valid if both the _gt_ and the _pred_ events share the same event type (underlying label). Now we will allow matches between different event types, and you'll see that the pairs no longer necessarily share a type.  
Even though _cross-matching_ yields more matches, we discourage its use, as it may lead to illogical matches and interfere with subsequent analysis. Use at your own risk :D

In [8]:
print(peyes.match(ra_events, eng_events, match_by='onset', max_onset_difference=30, allow_xmatch=True))

{FIXATION(236.0ms): FIXATION(228.0ms), SACCADE(18.0ms): SACCADE(34.0ms), PSO(6.0ms): FIXATION(132.0ms), SACCADE(16.0ms): SACCADE(20.0ms), PSO(4.0ms): FIXATION(176.0ms), SACCADE(12.0ms): SACCADE(34.0ms), SMOOTH_PURSUIT(450.0ms): FIXATION(50.0ms), SACCADE(8.0ms): SACCADE(18.0ms), SMOOTH_PURSUIT(196.0ms): FIXATION(18.0ms)}
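
The role of the cross-matching switch can be sketched as a label check applied before the timing check. The function below is a hypothetical illustration (first unused prediction within the threshold wins), not the `pEYES` algorithm; the label strings and onset times are made up.

```python
def match_onsets(gt, pred, max_onset_difference, allow_xmatch):
    """Match (label, onset_ms) events by onset; illustrative only.

    Returns {gt_index: pred_index}. When allow_xmatch is False, a pair is
    considered only if both events carry the same label.
    """
    matches, used = {}, set()
    for i, (g_label, g_onset) in enumerate(gt):
        for j, (p_label, p_onset) in enumerate(pred):
            if j in used:
                continue
            if not allow_xmatch and g_label != p_label:
                continue  # different event types may not be paired
            if abs(g_onset - p_onset) <= max_onset_difference:
                matches[i] = j
                used.add(j)
                break
    return matches


gt = [("fixation", 0), ("saccade", 240), ("pso", 258), ("pursuit", 270)]
pred = [("fixation", 2), ("saccade", 236), ("fixation", 262)]

strict = match_onsets(gt, pred, 30, allow_xmatch=False)
lenient = match_onsets(gt, pred, 30, allow_xmatch=True)

print(strict)   # {0: 0, 1: 1}
print(lenient)  # {0: 0, 1: 1, 2: 2}
```

With cross-matching enabled, the made-up PSO event is paired with a predicted fixation purely because their onsets are close, which is the kind of pairing that can be hard to interpret in later analyses.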


### Non-Schematic Matching
When you set `match_by` to one of the predefined matching schemes, the function uses only the subset of parameters defined for that scheme (e.g., with `match_by='onset'`, the only relevant parameter is `max_onset_difference`) and ignores all other parameters. If you want to "create" your own matching scheme, combining multiple parameters that don't belong to a single predefined scheme, you can specify `match_by='other'` (or `match_by='generic'`) along with whatever combination of parameters you wish.

In [9]:
print(
    peyes.match(ra_events, eng_events, match_by='onset', max_onset_difference=30, allow_xmatch=False) == 
    peyes.match(ra_events, eng_events, match_by='onset', max_onset_difference=30, allow_xmatch=False, max_offset_difference=30, min_iou=0.15)
)

print(peyes.match(ra_events, eng_events, match_by='other', max_onset_difference=30, allow_xmatch=False, max_offset_difference=30, min_iou=0.15))

True
{FIXATION(236.0ms): [FIXATION(228.0ms)], SACCADE(18.0ms): [SACCADE(34.0ms)], SACCADE(16.0ms): [SACCADE(20.0ms)], SACCADE(12.0ms): [SACCADE(34.0ms)], SACCADE(8.0ms): [SACCADE(18.0ms)]}
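
Conceptually, a generic scheme amounts to requiring that a candidate pair pass several independent criteria at once. The sketch below illustrates that idea with the three parameters used above; `time_iou` and `satisfies_all` are hypothetical helpers written for this example, and the event times are made up.

```python
def time_iou(a, b):
    """Intersection-over-union of two (onset_ms, offset_ms) intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def satisfies_all(gt, pred, max_onset_difference, max_offset_difference, min_iou):
    """True only if the pair passes every criterion (illustrative only)."""
    return (
        abs(gt[0] - pred[0]) <= max_onset_difference
        and abs(gt[1] - pred[1]) <= max_offset_difference
        and time_iou(gt, pred) >= min_iou
    )


gt_event = (100.0, 118.0)
pred_a = (110.0, 144.0)  # IoU = 8 / 44, about 0.18
pred_b = (115.0, 144.0)  # IoU = 3 / 44, about 0.07

# Both candidates pass an onset-only check with a 30 ms threshold...
onset_only_a = abs(gt_event[0] - pred_a[0]) <= 30
onset_only_b = abs(gt_event[0] - pred_b[0]) <= 30

# ...but only one of them passes the combined criteria.
combined_a = satisfies_all(gt_event, pred_a, 30, 30, 0.15)
combined_b = satisfies_all(gt_event, pred_b, 30, 30, 0.15)

print(onset_only_a, onset_only_b)  # True True
print(combined_a, combined_b)      # True False
```

Adding criteria can only narrow the set of acceptable pairs, so a generic scheme returns at most as many candidate pairs as any single one of its criteria would on its own.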


### Summary: Event Matching
In this section, we:
1. Presented the concept of event-matching, a mapping between two sequences of events.
2. Provided examples of a few matching-schemes, which are different methods to match between event sequences.
3. Saw how different parameters might influence matching when using the same matching-scheme.
4. Briefly discussed the meaning of _cross-matching_ and saw how it may affect matches when all other parameters remain the same.
5. Demonstrated "generic" matching based on multiple (unrelated) matching criteria.

For further information, please read the relevant section in the Startsev & Zemblys (2023) article.

We will now generate an example matching that we can use in further analyses:

In [10]:
m = peyes.match(ra_events, mn_events, match_by='onset', max_onset_difference=15, allow_xmatch=False)
m

{FIXATION(236.0ms): FIXATION(236.0ms),
 SACCADE(18.0ms): SACCADE(20.0ms),
 PSO(6.0ms): PSO(10.0ms),
 SMOOTH_PURSUIT(1898.0ms): SMOOTH_PURSUIT(488.0ms),
 SACCADE(16.0ms): SACCADE(24.0ms),
 SMOOTH_PURSUIT(450.0ms): SMOOTH_PURSUIT(448.0ms),
 SACCADE(12.0ms): SACCADE(10.0ms),
 SMOOTH_PURSUIT(450.0ms): SMOOTH_PURSUIT(432.0ms),
 SACCADE(8.0ms): SACCADE(12.0ms),
 SMOOTH_PURSUIT(196.0ms): SMOOTH_PURSUIT(188.0ms)}

## Match Evaluation
Now that we know what event-matching is, we can look into different evaluation methods based on a matching. We will do this using a dedicated sub-module in the `pEYES` package, called `match_metrics`.  

As always, we assume one sequence of events is the _ground-truth_ and the other is the _prediction_. We also assume here that matches are one-to-one, meaning any _gt_ event can be matched to, at most, a single _pred_ event, and vice versa. While not all metrics require this strict condition, it is useful when attempting to interpret matches, and in some cases it is a necessity.  

We can divide the evaluation criteria into two general types: **feature**-based and **detection**-based.  

### Feature Based Evaluation
Feature-based evaluation compares the features of the two events in each matched pair. It extracts the specified feature(s) from both events, and calculates the difference (or some other distance metric) between them. For example:
* Amplitude Difference - difference in amplitude (in DVA) between each pair of matched events
* Center Pixel Distance - Euclidean distance (in pixels) between the "center" of each event in the pair
* Time Overlap - duration of overlap (in ms) between the two events
* Time IoU - Intersection over Union (unitless) of the times of the two events
* etc.

In [11]:
print(f"Duration Differences:\t{peyes.match_metrics.duration_difference(m)}")
print(f"Center Distances:\t{peyes.match_metrics.center_pixel_distance(m)}")
print(f"Time L2 Norm:\t{peyes.match_metrics.time_l2(m)}")
print(f"Time IoU:\t{peyes.match_metrics.time_iou(m)}")

Duration Differences:	[   0.   -2.   -4. 1410.   -8.    2.    2.   18.   -4.    8.]
Center Distances:	[0.00000000e+00 2.72931706e+00 5.20619045e+00 1.06826770e+02
 4.29598683e+00 8.90973202e-02 1.88567659e+00 1.83772023e-01
 5.87357120e+00 4.81840202e-01]
Time L2 Norm:	[   0.            2.            6.32455532 1404.01282045    8.
    2.            2.           13.41640786    6.32455532    8.        ]
Time IoU:	[1.         0.9        0.33333333 0.25711275 0.66666667 0.99555556
 0.83333333 0.96       0.42857143 0.95918367]
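
To show what such per-pair features look like numerically, here is a small sketch that computes the duration difference, time overlap, and time IoU for two pairs of events. The helper `pair_features` is hypothetical, and the onset and offset times are invented; they were chosen so that the results coincide with the second and third entries of the arrays printed above, but they are not the actual timestamps from the dataset.

```python
def pair_features(gt, pred):
    """Duration difference, overlap, and IoU for two (onset_ms, offset_ms) events.

    Illustrative only; the sign convention (gt minus pred) is an assumption
    that agrees with the printed duration differences above.
    """
    gt_duration = gt[1] - gt[0]
    pred_duration = pred[1] - pred[0]
    overlap = max(0.0, min(gt[1], pred[1]) - max(gt[0], pred[0]))
    union = gt_duration + pred_duration - overlap
    iou = overlap / union if union > 0 else 0.0
    return gt_duration - pred_duration, overlap, iou


# Made-up event boundaries in ms
pairs = [
    ((0.0, 18.0), (0.0, 20.0)),        # 18 ms vs. 20 ms, same onset
    ((100.0, 106.0), (102.0, 112.0)),  # 6 ms vs. 10 ms, shifted onset
]

results = [pair_features(gt, pred) for gt, pred in pairs]
for duration_diff, overlap, iou in results:
    print(f"diff={duration_diff:+.1f} ms  overlap={overlap:.1f} ms  IoU={iou:.3f}")
```

The second pair shows why IoU is informative for short events: a shift of only a few milliseconds already pushes the IoU down to one third.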


### Detection Based Evaluation
Detection-based metrics are based on the commonly used Signal Detection Theory metrics (recall, precision, d-prime, etc.). To calculate these metrics we **must** specify a subset of labels (event types, e.g. "saccade") as _positive_, which implicitly sets any other label in the sequence as _negative_.  
One exception to this rule is the `match_ratio()` metric, which doesn't strictly require specifying a _positive_ label, but enables doing so.  
Other available metrics are `recall`, `precision`, `d_prime`, etc.

In [12]:
ratio_all_labels = peyes.match_metrics.match_ratio(prediction=mn_events, matches=m, labels=None)
ratio_fixations = peyes.match_metrics.match_ratio(prediction=mn_events, matches=m, labels=[1])

ratio_all_labels, ratio_fixations

(0.5555555555555556, 1.0)
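
One plausible reading of a match ratio is the share of predicted events that ended up in a match, optionally restricted to a subset of labels. The sketch below implements that reading; it is an assumption about the metric's meaning rather than the `pEYES` definition, and `simple_match_ratio`, the label codes, and the matched indices are all invented for illustration.

```python
def simple_match_ratio(pred_labels, matched_pred_indices, labels=None):
    """Share of predicted events that were matched (illustrative only).

    pred_labels: label of every predicted event.
    matched_pred_indices: indices of predicted events that appear in a match.
    labels: optional subset of labels to restrict the calculation to.
    """
    selected = [
        i for i, label in enumerate(pred_labels)
        if labels is None or label in labels
    ]
    if not selected:
        return float("nan")
    n_matched = sum(1 for i in selected if i in matched_pred_indices)
    return n_matched / len(selected)


# Made-up predicted sequence; label code 1 stands for fixations here
pred_labels = [1, 2, 3, 4, 2, 4, 1, 2]
matched = {0, 1, 3, 4, 6}

ratio_all = simple_match_ratio(pred_labels, matched)
ratio_fix = simple_match_ratio(pred_labels, matched, labels=[1])
print(ratio_all, ratio_fix)  # 0.625 1.0
```

As in the notebook output above, the ratio for a single label can be much higher than the overall ratio when that label's events are matched reliably while other labels are not.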

In [13]:
prec, rec, f1 = peyes.match_metrics.precision_recall_f1(ground_truth=ra_events, prediction=mn_events, matches=m, positive_label=[2])
far = peyes.match_metrics.false_alarm_rate(ground_truth=ra_events, prediction=mn_events, matches=m, positive_label=[2])
dprime, crit = peyes.match_metrics.d_prime_and_criterion(ground_truth=ra_events, prediction=mn_events, matches=m, positive_label=[2], correction="loglinear")

prec, rec, f1, far, dprime, crit

(0.6666666666666666,
 1.0,
 0.8,
 0.2857142857142857,
 1.8974663378613346,
 -0.4773437033421788)
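
For readers less familiar with these measures, the sketch below computes them from raw counts of true positives, false positives, false negatives, and true negatives. The counts are hypothetical: they were chosen so that the first four values coincide with the output above, and they are not extracted from the dataset. The log-linear correction shown (adding 0.5 to the hit and false-alarm counts and 1 to their totals) is one common formulation and is an assumption here; `pEYES` may define its correction differently, so the d-prime and criterion from this sketch need not reproduce the notebook's values.

```python
from statistics import NormalDist


def detection_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and false-alarm rate from raw counts."""
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = float("nan")
    false_alarm_rate = fp / (fp + tn) if fp + tn else float("nan")
    return precision, recall, f1, false_alarm_rate


def d_prime_loglinear(tp, fp, fn, tn):
    """d-prime and criterion with a log-linear correction (assumed form)."""
    hit_rate = (tp + 0.5) / (tp + fn + 1)
    fa_rate = (fp + 0.5) / (fp + tn + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for a single positive label
tp, fp, fn, tn = 4, 2, 0, 5

precision, recall, f1, far = detection_metrics(tp, fp, fn, tn)
d_prime, criterion = d_prime_loglinear(tp, fp, fn, tn)
print(precision, recall, f1, far)
print(d_prime, criterion)
```

The correction matters in cases like this one: with a perfect recall of 1.0, the uncorrected hit rate has no finite z-score, so some adjustment is needed before d-prime can be computed at all.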

### Summary: Match Evaluation
In this section, we examined different evaluation options for _(gt: pred)_ matches, both based on the matched events' features and based on Signal Detection Theory measures.