## Downloading one of the public datasets

In [None]:
import polars as pl

import pymovements as pm

In [None]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()

In [None]:
dataset.path

In [None]:
dataset.paths.raw

Next, we load our dataset into memory to be able to work with it:

Loading fills two attributes with data.
The first is the `fileinfo` attribute, which holds the basic information for each file:

We notice that for each filepath, a `text_id` and a `page_id` are specified.

We have also loaded our gaze data into the dataframes in the `gaze` attribute:

Apart from some trial identifier columns, we see the columns `time` and `pixel`.

The `pixel` column holds the pixel coordinates recorded at the timestep given by `time`.


We can also take just a subset of the data by specifying values of the `fileinfo` columns in a dictionary.
Each key refers to a column in the `fileinfo` dataframe.
The values in the dictionary can be of type `bool`, `int`, `float` or `str`, but also lists and ranges.
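
To make the matching rules concrete, here is a minimal plain-Python sketch of how such a subset dictionary could be applied to file metadata. It is an illustration only, not the library's implementation: the helper names `matches` and `select_subset` and the example rows are hypothetical, and we assume that lists and ranges mean "any of these values".

```python
def matches(value, criterion):
    """Return True if a single fileinfo value satisfies one subset criterion."""
    # Assumption: lists and ranges mean "any of these values";
    # scalar criteria (bool, int, float, str) must match exactly.
    if isinstance(criterion, (list, range)):
        return value in criterion
    return value == criterion


def select_subset(rows, subset):
    """Keep only the rows that satisfy every key in the subset dictionary."""
    return [
        row for row in rows
        if all(matches(row[column], criterion) for column, criterion in subset.items())
    ]


# Hypothetical fileinfo rows, invented for this example.
fileinfo_rows = [
    {'text_id': 0, 'page_id': 1, 'filepath': 'a.csv'},
    {'text_id': 0, 'page_id': 2, 'filepath': 'b.csv'},
    {'text_id': 1, 'page_id': 1, 'filepath': 'c.csv'},
    {'text_id': 2, 'page_id': 3, 'filepath': 'd.csv'},
]

# Keys are column names; values may be scalars, lists or ranges.
example_subset = {'text_id': [0, 1], 'page_id': range(1, 2)}
selected = select_subset(fileinfo_rows, example_subset)
print([row['filepath'] for row in selected])
```

Only rows that satisfy all keys at once are kept, so adding more keys narrows the selection.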


Now we have selected only a small subset of our data.

In [None]:
dataset.pix2deg()

dataset.gaze[0]
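
As a rough illustration of what a pixel-to-degree conversion involves, the following plain-Python sketch converts one pixel coordinate into degrees of visual angle. It is a sketch under stated assumptions rather than the library's code: we assume a flat screen, an eye positioned in front of the screen centre, and hypothetical screen measurements that are not taken from the ToyDataset definition. The function name `pixel_to_degrees` is made up for this example.

```python
import math


def pixel_to_degrees(pixel, screen_px, screen_cm, distance_cm):
    """Convert a pixel coordinate on one axis to degrees of visual angle."""
    # Shift the origin to the screen centre (pixel indices run from 0 to screen_px - 1).
    centered_px = pixel - (screen_px - 1) / 2
    # Convert the offset from pixels to centimetres on the screen surface.
    centered_cm = centered_px * screen_cm / screen_px
    # The visual angle is the angle this offset subtends at the eye.
    return math.degrees(math.atan2(centered_cm, distance_cm))


# Hypothetical setup, chosen only for illustration.
SCREEN_WIDTH_PX = 1280
SCREEN_WIDTH_CM = 38.0
DISTANCE_CM = 68.0

center_deg = pixel_to_degrees(639.5, SCREEN_WIDTH_PX, SCREEN_WIDTH_CM, DISTANCE_CM)
left_deg = pixel_to_degrees(0, SCREEN_WIDTH_PX, SCREEN_WIDTH_CM, DISTANCE_CM)
right_deg = pixel_to_degrees(1279, SCREEN_WIDTH_PX, SCREEN_WIDTH_CM, DISTANCE_CM)
print(center_deg, left_deg, right_deg)
```

With the origin at the screen centre, positions to the left and right of the centre map to negative and positive angles of equal magnitude.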

For transforming our positional data into velocity data, we will use the *Savitzky-Golay* differentiation filter.

We can also specify some additional parameters for this method:
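
Typical parameters of a Savitzky-Golay filter are the window length and the degree of the fitted polynomial. The plain-Python sketch below fixes both (a window of 5 samples and a quadratic fit) and uses the corresponding well-known first-derivative coefficients to estimate velocity. It is a minimal illustration of the idea, not the library's implementation; the function name `savgol_velocity` and the example signal are hypothetical, and the first and last two samples are simply skipped.

```python
# First-derivative Savitzky-Golay coefficients for a 5-sample window and a
# quadratic polynomial fit; the weighted sum is divided by the normalizer.
SG_COEFFICIENTS = (-2, -1, 0, 1, 2)
SG_NORMALIZER = 10


def savgol_velocity(positions, sampling_rate_hz):
    """Estimate velocity (position units per second) at the interior samples."""
    half = len(SG_COEFFICIENTS) // 2
    velocities = []
    for i in range(half, len(positions) - half):
        window = positions[i - half:i + half + 1]
        # The weighted sum is the slope of the fitted polynomial, per sample.
        slope_per_sample = sum(c * p for c, p in zip(SG_COEFFICIENTS, window)) / SG_NORMALIZER
        # Multiply by the sampling rate to go from "per sample" to "per second".
        velocities.append(slope_per_sample * sampling_rate_hz)
    return velocities


# Hypothetical 1000 Hz signal: the eye moves 0.01 degrees per sample, i.e. 10 deg/s.
example_positions = [0.01 * i for i in range(10)]
example_velocities = savgol_velocity(example_positions, sampling_rate_hz=1000)
print(example_velocities)
```

A longer window smooths more strongly, while a higher polynomial degree follows rapid changes more closely.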

There is also the more general `apply()` method, which can be used to apply both transformation and event detection methods.

## Detecting events

In [None]:
dataset.detect_events('ivt')

dataset.events[0]
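
To give an intuition for velocity-threshold identification (I-VT), here is a minimal plain-Python sketch: samples whose speed stays below a threshold are treated as part of a fixation, and consecutive such samples are grouped into one event. This is an illustration under simplifying assumptions, not the library's implementation; the function name `detect_fixations_ivt`, the threshold value and the example data are hypothetical, and no minimum duration is enforced.

```python
def detect_fixations_ivt(times, speeds, velocity_threshold):
    """Group consecutive below-threshold samples into fixation events."""
    events = []
    onset = None
    previous_time = None

    def close_event(offset):
        events.append({
            'name': 'fixation',
            'onset': onset,
            'offset': offset,
            'duration': offset - onset,
        })

    for time, speed in zip(times, speeds):
        if speed < velocity_threshold:
            # Start a new event if we are not already inside one.
            if onset is None:
                onset = time
        elif onset is not None:
            # The speed crossed the threshold: the event ended at the previous sample.
            close_event(previous_time)
            onset = None
        previous_time = time

    # Close an event that is still open at the end of the recording.
    if onset is not None:
        close_event(previous_time)
    return events


# Hypothetical data: timestamps in ms and speeds in deg/s.
example_times = list(range(10))
example_speeds = [5, 4, 6, 80, 120, 90, 7, 5, 6, 4]
example_events = detect_fixations_ivt(example_times, example_speeds, velocity_threshold=20)
print(example_events)
```

Each resulting event carries a `name`, `onset`, `offset` and `duration`, mirroring the columns of the event dataframe.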

In [None]:
# The screen dimensions are automatically sourced from the experiment configuration
screen = dataset.definition.experiment.screen
print(f'Screen resolution: {screen.width_px} x {screen.height_px} pixels')
print(f'Valid pixel range: x=[0, {screen.width_px}], y=[0, {screen.height_px}]')

# Detect out-of-screen events — screen boundaries are auto-filled from Gaze.experiment.screen
dataset.detect_events('out_of_screen')

# Report trackloss percentage for each gaze object (similar to VWPre's mark_trackloss output)
for i, gaze in enumerate(dataset.gaze):
    oos_events = gaze.events.frame.filter(pl.col('name') == 'out_of_screen')
    n_events = len(oos_events)
    total_samples = len(gaze.samples)

    if n_events > 0:
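        # `duration` is measured in time units, so this only approximates the sample count
        # if there is one sample per time unit (e.g. 1000 Hz with millisecond timestamps)
        total_oos_samples = int(oos_events['duration'].sum())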
        pct = round(total_oos_samples / total_samples * 100, 2)
        print(f'\nGaze[{i}]: {n_events} out-of-screen events detected')
        print(f'  {total_oos_samples}/{total_samples} samples ({pct}%) marked as trackloss')
    else:
        print(f'\nGaze[{i}]: 0/{total_samples} samples marked as trackloss (0%) — clean data')

In [None]:
dataset.detect_events('microsaccades', minimum_duration=8)

dataset.events[0].frame.filter(pl.col('name') == 'saccade').head()

In [None]:
# Use a custom name so the I-DT fixations can be told apart from the I-VT fixations detected above
dataset.apply('idt', dispersion_threshold=2.7, name='fixation.idt')

dataset.events[0].frame.filter(pl.col('name') == 'fixation.idt').head()

The event dataframe currently only holds the `name`, `onset`, `offset` and `duration` of each event, preceded by some identifier columns.

We now want to compute some additional properties for each event.
Event properties are things like peak velocity, amplitude and dispersion during an event.
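
To make these properties concrete, the following plain-Python sketch computes them for the samples of a single event. The definitions used here are one common choice and are stated as assumptions; the library's exact formulas may differ. The function names and the example data are hypothetical.

```python
import math


def dispersion(positions):
    """Sum of the horizontal and vertical extent of the samples (assumed definition)."""
    xs, ys = zip(*positions)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def amplitude(positions):
    """Euclidean length of the horizontal and vertical extent (assumed definition)."""
    xs, ys = zip(*positions)
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))


def peak_velocity(velocities):
    """Largest speed, i.e. the largest norm of the velocity vectors."""
    return max(math.hypot(vx, vy) for vx, vy in velocities)


# Hypothetical samples of one event: positions in degrees, velocities in deg/s.
event_positions = [(0.0, 0.0), (1.0, 0.5), (3.0, 2.0), (3.0, 4.0)]
event_velocities = [(30.0, 40.0), (120.0, 50.0), (60.0, 80.0), (0.0, 10.0)]

print(dispersion(event_positions))
print(amplitude(event_positions))
print(peak_velocity(event_velocities))
```

Each property reduces all samples between the onset and offset of an event to a single number, which is why it fits into one additional column of the event dataframe.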

We start out with computing the dispersion:

We notice that a new column with the name `dispersion` has appeared in the event dataframe.

We can also pass a list of properties to compute all of our desired properties in a single run.
Let's add the amplitude and peak velocity:

## Plotting our data

In [None]:
pm.plotting.main_sequence_plot(dataset.events[0])

In [None]:
dataset.save()

In [None]:
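preprocessed_dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')

# Load the saved preprocessed gaze data and events into the new dataset object
preprocessed_dataset.load(events=True, preprocessed=True, subset=subset)

display(preprocessed_dataset.gaze[0])
display(preprocessed_dataset.events[0])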