# Parsing Raw Eye Tracking Files

## What you will learn in this tutorial:

* how to parse raw eye tracking files
* how to extract experiment information using patterns
* how to create a custom dataset definition to load a complete dataset of multiple files

## Preparations

We import `pymovements` as the alias `pm` for convenience.

In [None]:
import pymovements as pm

Let's start by downloading a toy dataset `ToyDatasetEyeLink` that contains `*.asc` files:

In [None]:
dataset = pm.Dataset("ToyDatasetEyeLink", path='data/ToyDatasetEyeLink')
dataset.download()

This dataset includes `*.asc` files that store raw eye-tracking data along with synchronization messages. Below, we’ll inspect the files included in the dataset:

In [None]:
asc_files = list(dataset.path.glob('**/*.asc'))
asc_files

Let’s display the first 20 lines of one of the files to get a sense of its structure:

In [None]:
!head -n 20 data/ToyDatasetEyeLink/raw/aeye-lab-pymovements-toy-dataset-eyelink-a970d09/raw/subject_1_session_1.asc

We can see that this file is a converted version of an `*.edf` file created by EyeLink.
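The `!head` shell command above may not be available on every platform (for example, a default Windows installation). As a minimal sketch, the same preview can be produced in plain Python; the helper name `head` is hypothetical and not part of `pymovements`:

```python
from itertools import islice
from pathlib import Path


def head(path, n=20):
    """Return the first n lines of a text file without trailing newlines."""
    # errors='replace' keeps the preview working even if the file
    # contains bytes that are not valid UTF-8.
    with Path(path).open(encoding='utf-8', errors='replace') as f:
        return [line.rstrip('\n') for line in islice(f, n)]
```

With the file list from above, `print('\n'.join(head(asc_files[0])))` should show the same kind of preview.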

Let’s try loading one of these files directly using `pm.gaze.io.from_asc`:

In [None]:
gaze = pm.gaze.io.from_asc(file=asc_files[0])
gaze

This function automatically loads the raw eye-tracking data and attempts to infer the experimental settings used.

Let’s inspect a few rows from the resulting `GazeDataFrame`:

In [None]:
gaze.frame

We can see that timestamps (column `time`), pupil diameter (column `pupil`), and raw pixel coordinates (column `pixel`) are extracted automatically.

Let’s now take a look at the experimental metadata that was retrieved:

In [None]:
gaze.experiment

The relevant experimental metadata, such as the eye tracker model and the screen resolution used during recording, have been extracted from the file.

### Defining Custom Patterns for Data Extraction

Now let’s define our own patterns to extract additional information from the `*.asc` files.
We can do this by passing the `patterns` parameter to `pm.gaze.io.from_asc`.

`patterns` accepts either a list of custom patterns to match additional columns or a key identifying predefined, eye-tracker-specific patterns.

Let’s define a set of custom patterns to extract more information from parsed messages and show the resulting `GazeDataFrame`:

In [None]:
patterns = [
    {
        'pattern': 'SYNCTIME_READING_SCREEN',
        'column': 'task',
        'value': 'reading',
    },
    {
        'pattern': 'SYNCTIME_JUDO',
        'column': 'task',
        'value': 'judo',
    },
    r'TRIALID (?P<trial_id>\d+)',
]

gaze = pm.gaze.io.from_asc(file=asc_files[0],
                           patterns=patterns)
gaze.frame

We can see that the information for `task` and `trial_id` has been added.

The `trial_id` was extracted from messages such as `MSG 2762689 TRIALID 0`, while the task value was obtained from messages like `MSG 2814942 SYNCTIME_JUDO`.
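To make the two pattern styles more concrete, here is a minimal sketch of how such patterns could be applied to message lines using only the standard `re` module. It is an illustration under simplifying assumptions, not the parser that `pymovements` actually uses; the names `example_patterns`, `match_message`, and `state` are hypothetical. The two message lines are the ones quoted above.

```python
import re

example_patterns = [
    {'pattern': 'SYNCTIME_READING_SCREEN', 'column': 'task', 'value': 'reading'},
    {'pattern': 'SYNCTIME_JUDO', 'column': 'task', 'value': 'judo'},
    r'TRIALID (?P<trial_id>\d+)',
]


def match_message(message, patterns):
    """Return the column values that a single message line contributes."""
    values = {}
    for pattern in patterns:
        if isinstance(pattern, dict):
            # Dict pattern: a fixed value is written to the named column.
            if re.search(pattern['pattern'], message):
                values[pattern['column']] = pattern['value']
        else:
            # String pattern: every named group becomes a column.
            match = re.search(pattern, message)
            if match:
                values.update(match.groupdict())
    return values


# Carry the most recent value of each column forward across messages.
state = {}
for line in ['MSG 2762689 TRIALID 0', 'MSG 2814942 SYNCTIME_JUDO']:
    state.update(match_message(line, example_patterns))
```

In this sketch, values captured by a regular expression are strings, so a `trial_id` obtained this way would still need to be cast if an integer column is wanted.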

### Creating a Custom Dataset with Defined Patterns

Let’s create a custom dataset definition that loads all `*.asc` files and applies the patterns we defined earlier.
First, we need to define the experiment:

In [None]:
experiment = pm.gaze.Experiment(
    screen_width_px=1280,
    screen_height_px=1024,
    screen_width_cm=38,
    screen_height_cm=30.2,
    distance_cm=68,
    origin='lower left',
    sampling_rate=1000,
)

Next, we define the filename format, which also encodes subject and session information:

In [None]:
filename_format = r'subject_{subject_id:d}_session_{session_id:d}.asc'
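As a rough illustration of what such a format string encodes, the sketch below translates the `{name:d}` placeholders into a regular expression and applies it to one of the filenames in the dataset. This is an assumption-laden stand-in that only handles `:d` fields; `pymovements` may resolve the format differently, and the names `example_format`, `format_to_regex`, and `fields` are hypothetical.

```python
import re

example_format = r'subject_{subject_id:d}_session_{session_id:d}.asc'


def format_to_regex(fmt):
    """Translate '{name:d}' placeholders into named regex groups for digits."""
    # re.split with one capture group alternates literal text (even indices)
    # and placeholder names (odd indices).
    parts = re.split(r'\{(\w+):d\}', fmt)
    regex = ''
    for i, part in enumerate(parts):
        regex += f'(?P<{part}>\\d+)' if i % 2 else re.escape(part)
    return re.compile(regex)


match = format_to_regex(example_format).fullmatch('subject_1_session_1.asc')
# The ':d' fields denote integers, so the captured strings are cast to int.
fields = {name: int(value) for name, value in match.groupdict().items()}
```

Here `fields` holds the subject and session identifiers recovered from the filename, which is the information the dataset attaches to each loaded file.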

We also specify the data types for the information extracted from the filename:

In [None]:
filename_format_dtypes = {
    'subject_id': int,
    'session_id': int,
}

We collect the patterns defined above, together with a schema that sets the data type of the extracted `trial_id` column, in the `custom_read_kwargs` variable and pass it to the dataset definition:

In [None]:
custom_read_kwargs = {
    'patterns': patterns,
    'schema': {'trial_id': int},
}

dataset_definition = pm.DatasetDefinition(
    name='ToyDatasetRaw',
    experiment=experiment,
    filename_format={'gaze': filename_format},
    filename_format_schema_overrides={'gaze': filename_format_dtypes},
    custom_read_kwargs={'gaze': custom_read_kwargs},
    has_files={'gaze': True,
               'precomputed_events': False,
               'precomputed_reading_measures': False,
               }
)

Let’s create a dataset and load the data using the dataset definition we just set up:

In [None]:
dataset = pm.Dataset(
    definition=dataset_definition,
    path='data/ToyDatasetEyeLink',
)
dataset.load()

Let’s inspect the first `GazeDataFrame` in this dataset:

In [None]:
dataset.gaze[0].frame

## What you have learned in this tutorial:

* how to handle `*.asc` files
* how to create a custom dataset loading all files and parsing custom messages
* how to load the dataset into memory