# Parsing Raw Eye Tracking Files

## What you will learn in this tutorial:

* how to parse raw eye tracking files
* how to extract experiment information using patterns
* how to create a custom dataset definition to load a complete dataset of multiple files

## Preparations

We import `pymovements` as the alias `pm` for convenience.

In [1]:
import pymovements as pm

Let's start by downloading a toy dataset `ToyDatasetEyeLink` that contains `*.asc` files:

In [2]:
dataset = pm.Dataset("ToyDatasetEyeLink", path='data/ToyDatasetEyeLink')
dataset.download()

INFO:pymovements.dataset.dataset:        You are downloading the pymovements Toy Dataset EyeLink. Please be aware that pymovements does not
        host or distribute any dataset resources and only provides a convenient interface to
        download the public dataset resources that were published by their respective authors.

        Please cite the referenced publication if you intend to use the dataset in your research.
        


Using already downloaded and verified file: data/ToyDatasetEyeLink/downloads/pymovements-toy-dataset-eyelink.zip
Extracting pymovements-toy-dataset-eyelink.zip to data/ToyDatasetEyeLink/raw


100%|█████████████████████████████████████████████| 4/4 [00:00<00:00,  8.69it/s]


This dataset includes `*.asc` files that store raw eye-tracking data along with synchronization messages. Below, we’ll inspect the files included in the dataset:

In [3]:
asc_files = list(dataset.path.glob('**/*.asc'))
asc_files

[PosixPath('data/ToyDatasetEyeLink/raw/aeye-lab-pymovements-toy-dataset-eyelink-a970d09/raw/subject_1_session_1.asc'),
 PosixPath('data/ToyDatasetEyeLink/raw/aeye-lab-pymovements-toy-dataset-eyelink-a970d09/raw/subject_2_session_1.asc')]

Let’s display the first 20 lines of one of the files to get a sense of its structure:

In [4]:
!head -n 20 data/ToyDatasetEyeLink/raw/aeye-lab-pymovements-toy-dataset-eyelink-a970d09/raw/subject_1_session_1.asc

** CONVERTED FROM D:\SamplePymovements\results\sub_1\sub_1.edf using edfapi 4.2.1 Win32  EyeLink Dataviewer Sub ComponentApr 12 2021 on Fri Mar 10 18:07:57 2023
** DATE: Wed Mar  8 09:25:20 2023
** TYPE: EDF_FILE BINARY EVENT SAMPLE TAGGED
** VERSION: EYELINK II 1
** SOURCE: EYELINK CL
** EYELINK II CL v6.12 Feb  1 2018 (EyeLink Portable Duo)
** CAMERA: EyeLink USBCAM Version 1.01
** SERIAL NUMBER: CLU-DAB50
** CAMERA_CONFIG: DAB50200.SCD
** RECORDED BY SleepAlc
** SREB2.2.299 WIN32 LID:20A87A96 Mod:2023.03.08 11:03 MEZ
**

MSG	2091650 !CMD 1 select_parser_configuration 0
MSG	2091659 !CMD 0 fixation_update_interval = 50
MSG	2091659 !CMD 0 fixation_update_accumulate = 50
MSG	2091681 !CMD 1 auto_calibration_messages = YES
MSG	2095865 DISPLAY_COORDS 0 0 1279 1023
MSG	2095865 RETRACE_INTERVAL  16.646125144
MSG	2095865 ENVIRONMENT   OpenGL on Windows (6, 2, 9200, 2, '')


We can see that this file is a converted version of an `*.edf` file created by EyeLink.

Let’s try loading one of these files directly using `pm.gaze.io.from_asc`:

In [5]:
gaze = pm.gaze.io.from_asc(file=asc_files[0])
gaze

time,pupil,pixel
i64,f64,list[f64]
2154556,778.0,"[138.1, 132.8]"
2154557,778.0,"[138.2, 132.7]"
2154558,778.0,"[138.2, 132.3]"
2154559,778.0,"[138.1, 131.9]"
2154560,777.0,"[137.9, 131.6]"
…,…,…
2339287,619.0,"[637.7, 531.7]"
2339288,619.0,"[637.9, 531.8]"
2339289,618.0,"[637.8, 531.6]"
2339290,618.0,"[637.6, 531.4]"

name,onset,offset,duration
str,i64,i64,i64


This function automatically loads the raw eye-tracking data and attempts to infer the experimental settings used.

Let’s inspect a few rows from the resulting `GazeDataFrame`:

In [6]:
gaze.frame

time,pupil,pixel
i64,f64,list[f64]
2154556,778.0,"[138.1, 132.8]"
2154557,778.0,"[138.2, 132.7]"
2154558,778.0,"[138.2, 132.3]"
2154559,778.0,"[138.1, 131.9]"
2154560,777.0,"[137.9, 131.6]"
…,…,…
2339287,619.0,"[637.7, 531.7]"
2339288,619.0,"[637.9, 531.8]"
2339289,618.0,"[637.8, 531.6]"
2339290,618.0,"[637.6, 531.4]"


We can see that timestamps (column `time`), pupil diameter (column `pupil`), and raw pixel coordinates (column `pixel`) are extracted automatically.

Let’s now take a look at the experimental metadata that was retrieved:

In [7]:
gaze.experiment

All relevant experimental metadata have been successfully extracted, such as the eye tracker model and the screen resolution used during recording.
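
To make the source of the screen resolution more concrete, here is a minimal sketch that reads it from the `DISPLAY_COORDS` message seen in the file header earlier. It uses only the Python standard library and is not the parser that `pymovements` uses internally. It also assumes that the four numbers are the inclusive left, top, right and bottom pixel coordinates.

```python
import re

# Header line copied from the `head` output shown earlier in this tutorial.
line = "MSG\t2095865 DISPLAY_COORDS 0 0 1279 1023"

# Assumption: the four integers are left, top, right, bottom (inclusive).
match = re.search(r"DISPLAY_COORDS\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)", line)
left, top, right, bottom = (int(value) for value in match.groups())

# Inclusive coordinates, so the extent is the difference plus one.
screen_width_px = right - left + 1
screen_height_px = bottom - top + 1

print(screen_width_px, screen_height_px)  # 1280 1024
```

Under this assumption the message yields a screen of 1280 by 1024 pixels.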

### Defining Custom Patterns for Data Extraction

Now let’s define our own patterns to extract additional information from the `*.asc` files.
We can do this by passing the `patterns` parameter to `pm.gaze.io.from_asc`.

`patterns` accepts either a list of custom patterns to match additional columns or a key identifying predefined, eye-tracker-specific patterns.

Let’s define a set of custom patterns to extract more information from parsed messages and show the resulting `GazeDataFrame`:

In [8]:
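patterns = [
    # Dictionary patterns write a fixed 'value' into 'column'
    # once 'pattern' matches a message.
    {
        'pattern': 'SYNCTIME_READING_SCREEN',
        'column': 'task',
        'value': 'reading',
    },
    {
        'pattern': 'SYNCTIME_JUDO',
        'column': 'task',
        'value': 'judo',
    },
    # Regex patterns write each named group into a column of the same name.
    r'TRIALID (?P<trial_id>\d+)',
]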

gaze = pm.gaze.io.from_asc(file=asc_files[0],
                          patterns=patterns)
gaze.frame

time,pupil,trial_id,task,pixel
i64,f64,str,str,list[f64]
2154556,778.0,"""0""",,"[138.1, 132.8]"
2154557,778.0,"""0""",,"[138.2, 132.7]"
2154558,778.0,"""0""",,"[138.2, 132.3]"
2154559,778.0,"""0""",,"[138.1, 131.9]"
2154560,777.0,"""0""",,"[137.9, 131.6]"
…,…,…,…,…
2339287,619.0,"""12""","""judo""","[637.7, 531.7]"
2339288,619.0,"""12""","""judo""","[637.9, 531.8]"
2339289,618.0,"""12""","""judo""","[637.8, 531.6]"
2339290,618.0,"""12""","""judo""","[637.6, 531.4]"


We can see that the information for `task` and `trial_id` has been added.

The `trial_id` was extracted from messages such as `MSG 2762689 TRIALID 0`, while the `task` value was obtained from messages like `MSG 2814942 SYNCTIME_JUDO`.
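
The following sketch illustrates the idea behind this kind of pattern matching using only the standard `re` module. It is a simplified stand-in, not the actual `pymovements` implementation. The input lines are hypothetical and reduced to a message or a bare sample timestamp. The carry-forward behaviour is an assumption inferred from the output above, where each sample keeps the most recently seen `trial_id` and `task`.

```python
import re

# The same patterns as above, split into a regex and a lookup table.
trial_pattern = re.compile(r"TRIALID (?P<trial_id>\d+)")
task_patterns = {"SYNCTIME_READING_SCREEN": "reading", "SYNCTIME_JUDO": "judo"}

# Hypothetical, heavily simplified file content:
# message lines interleaved with bare sample timestamps.
lines = [
    "MSG\t2762689 TRIALID 0",
    "2762704",
    "MSG\t2814942 SYNCTIME_JUDO",
    "2814943",
]

state = {"trial_id": None, "task": None}  # most recently matched values
rows = []
for line in lines:
    if line.startswith("MSG"):
        match = trial_pattern.search(line)
        if match:
            # Named groups become column values (still strings at this point).
            state.update(match.groupdict())
        for pattern, value in task_patterns.items():
            if re.search(pattern, line):
                state["task"] = value
    else:
        # A sample inherits whatever the preceding messages set.
        rows.append({"time": int(line), **state})

for row in rows:
    print(row)
```

Note that the captured `trial_id` stays a string in this sketch, which matches the `str` dtype of the `trial_id` column in the output above.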

### Creating a Custom Dataset with Defined Patterns

Let’s create a custom dataset definition to load all `*.asc` files, including the patterns we defined earlier.
First, we need to define the experiment:

In [9]:
experiment = pm.gaze.Experiment(
    screen_width_px=1280,
    screen_height_px=1024,
    screen_width_cm=38,
    screen_height_cm=30.2,
    distance_cm=68,
    origin='lower left',
    sampling_rate=1000,
)

Next, we define the filename format, which also encodes subject and session information:

In [10]:
filename_format = r'subject_{subject_id:d}_session_{session_id:d}.asc'

We also specify the data types for the information extracted from the filename:

In [11]:
filename_format_dtypes = {
    'subject_id': int,
    'session_id': int,
}
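
To show what such a filename format expresses, the sketch below converts it into a regular expression and extracts the typed fields from one of the filenames in this dataset. The helper `format_to_regex` is hypothetical and written only for this illustration. It handles `{name:d}` fields only and is not how `pymovements` parses filenames internally.

```python
import re

filename_format = r"subject_{subject_id:d}_session_{session_id:d}.asc"
filename_format_dtypes = {"subject_id": int, "session_id": int}


def format_to_regex(fmt: str) -> re.Pattern:
    """Hypothetical helper: turn '{name:d}' fields into named digit groups."""
    # re.split with one capturing group alternates literal text and field names.
    parts = re.split(r"\{(\w+):d\}", fmt)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)  # literal text, e.g. 'subject_' or '.asc'
        else:
            regex += "(?P<" + part + r">\d+)"  # integer field
    return re.compile(regex)


pattern = format_to_regex(filename_format)
match = pattern.fullmatch("subject_1_session_1.asc")

# Cast each captured string with the dtype declared for that field.
fields = {
    name: filename_format_dtypes[name](value)
    for name, value in match.groupdict().items()
}
print(fields)  # {'subject_id': 1, 'session_id': 1}
```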

We collect the patterns defined above in `custom_read_kwargs`, together with a `schema` entry that casts `trial_id` to an integer, and pass them on when creating the dataset definition:

In [12]:
custom_read_kwargs = {
    'patterns': patterns,
    'schema': {'trial_id': int},
}

dataset_definition = pm.DatasetDefinition(
    name='ToyDatasetRaw',
    experiment=experiment,
    filename_format={'gaze': filename_format},
    filename_format_schema_overrides={'gaze': filename_format_dtypes},
    custom_read_kwargs={'gaze': custom_read_kwargs},
    has_files={
        'gaze': True,
        'precomputed_events': False,
        'precomputed_reading_measures': False,
    },
)

Let’s create a dataset and load the data using the dataset definition we just set up:

In [13]:
dataset = pm.Dataset(
    definition=dataset_definition,
    path='data/ToyDatasetEyeLink',
)
dataset.load()

  0%|          | 0/2 [00:00<?, ?it/s]

subject_id,session_id,filepath
i64,i64,str
1,1,"""aeye-lab-pymovements-toy-datas…"
2,1,"""aeye-lab-pymovements-toy-datas…"

time,pupil,trial_id,task,subject_id,session_id,pixel
i64,f64,i64,str,i64,i64,list[f64]
2154556,778.0,0,,1,1,"[138.1, 132.8]"
2154557,778.0,0,,1,1,"[138.2, 132.7]"
2154558,778.0,0,,1,1,"[138.2, 132.3]"
2154559,778.0,0,,1,1,"[138.1, 131.9]"
2154560,777.0,0,,1,1,"[137.9, 131.6]"
…,…,…,…,…,…,…
2339287,619.0,12,"""judo""",1,1,"[637.7, 531.7]"
2339288,619.0,12,"""judo""",1,1,"[637.9, 531.8]"
2339289,618.0,12,"""judo""",1,1,"[637.8, 531.6]"
2339290,618.0,12,"""judo""",1,1,"[637.6, 531.4]"

subject_id,session_id,name,onset,offset,duration
i64,i64,str,i64,i64,i64

time,pupil,trial_id,task,subject_id,session_id,pixel
i64,f64,i64,str,i64,i64,list[f64]
2762704,783.0,0,,2,1,"[139.1, 142.8]"
2762705,783.0,0,,2,1,"[139.3, 142.8]"
2762706,783.0,0,,2,1,"[139.5, 142.4]"
2762707,783.0,0,,2,1,"[139.6, 141.9]"
2762708,783.0,0,,2,1,"[139.5, 141.3]"
…,…,…,…,…,…,…
2903401,705.0,12,"""judo""",2,1,"[762.7, 605.5]"
2903402,706.0,12,"""judo""",2,1,"[762.6, 605.2]"
2903403,706.0,12,"""judo""",2,1,"[762.5, 605.0]"
2903404,706.0,12,"""judo""",2,1,"[762.7, 604.9]"

subject_id,session_id,name,onset,offset,duration
i64,i64,str,i64,i64,i64


Let’s inspect the first `GazeDataFrame` in this dataset:

In [14]:
dataset.gaze[0].frame

time,pupil,trial_id,task,subject_id,session_id,pixel
i64,f64,i64,str,i64,i64,list[f64]
2154556,778.0,0,,1,1,"[138.1, 132.8]"
2154557,778.0,0,,1,1,"[138.2, 132.7]"
2154558,778.0,0,,1,1,"[138.2, 132.3]"
2154559,778.0,0,,1,1,"[138.1, 131.9]"
2154560,777.0,0,,1,1,"[137.9, 131.6]"
…,…,…,…,…,…,…
2339287,619.0,12,"""judo""",1,1,"[637.7, 531.7]"
2339288,619.0,12,"""judo""",1,1,"[637.9, 531.8]"
2339289,618.0,12,"""judo""",1,1,"[637.8, 531.6]"
2339290,618.0,12,"""judo""",1,1,"[637.6, 531.4]"


## What you have learned in this tutorial:

* how to handle `*.asc` files
* how to create a custom dataset loading all files and parsing custom messages
* how to load the dataset into your working memory