Run this quickstart notebook to download the data and start inspecting it. 


In [None]:
from eyemovement_data.utils import get_participant_ids, download_osf_data, clean_raw_data
from eyemovement_data.preprocessor import OriginalPreprocessor
from eyemovement_data.classifier import OriginalClassifier
from eyemovement_data.participant import Participant

## Download data from OSF

Run the following chunk to download all data from OSF. 

This will create a `data/raw` folder with subfolders in your current working directory. 

In [None]:
download_osf_data(out_dir="data/raw", train_test="both", participants="all", overwrite=False)

## Clean up data

Run the following chunk to clean up the raw data.

This saves the EyeLink data to CSV files and performs some convenience cleaning (e.g., removing unnecessary columns and relabeling target trajectories).

The raw data will NOT be overwritten. Instead, data is saved to `data/clean/...`.

If this step fails, the most likely cause is that the location of your `Rscript.exe` is not on your PATH environment variable. In that case, either add it to PATH or use the `r_exe_path` argument to specify the location of your `Rscript.exe` file.
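Before running the cleanup, you can check whether `Rscript` is discoverable on your PATH. The following is a minimal sketch using only the Python standard library; the helper name `find_rscript` is made up for illustration and is not part of `eyemovement_data`.

```python
import shutil


def find_rscript():
    """Return the full path to Rscript if it is on PATH, otherwise None.

    Hypothetical helper for illustration; not part of eyemovement_data.
    """
    # shutil.which searches the directories listed in PATH. On Windows it
    # also tries the extensions in PATHEXT, so "Rscript" matches "Rscript.exe".
    return shutil.which("Rscript")


rscript_path = find_rscript()
if rscript_path is None:
    print("Rscript not found on PATH; specify its location via r_exe_path.")
else:
    print(f"Rscript found at: {rscript_path}")
```

If the check reports that `Rscript` is missing, supplying its location through `r_exe_path` avoids having to change your environment variables.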


In [None]:
clean_raw_data()

## Familiarizing yourself with the Participant() class

### Loading data for a participant

You can check which participants have data available.

In [None]:
available_ids = get_participant_ids(data_path="data/raw")
available_ids.sort()
available_ids

You can load and work with the data for a given participant using the `Participant()` class.

The `OriginalPreprocessor(Preprocessor)` and `OriginalClassifier(Classifier)` classes are used by default.

If you want to implement your own preprocessing or classification protocols that work with the `Participant()` class, you can subclass the `Preprocessor()` and `Classifier()` classes. 

In [None]:
# Loading data for the participant with the first available id
p = Participant(id=available_ids[0],
                preprocessor=OriginalPreprocessor(),  # this is the default preprocessor
                classifier=OriginalClassifier())  # this is the default classifier

First, load the data for the participant.

- `set_raw_data()` loads data from `data/raw` (or the specified path) and assigns it to `Participant.raw_data`.

- `set_clean_data()` loads data from `data/clean` (or the specified path) and assigns it to `Participant.clean_data`.

- `set_preprocessed_data()` loads data from `data/preprocessed` (or the specified path) and assigns it to `Participant.preprocessed_data`.

- `set_classified_data()` loads data from `data/classified` (or the specified path) and assigns it to `Participant.classified_data`.

If you have not yet preprocessed or classified any data, you should see warnings because no data is found.
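To see in advance which stages already have data on disk, you can check for the four folders listed above. This is a small sketch using `pathlib`; the helper `existing_stages` is hypothetical, and it assumes the default folder layout under `data/` described in this notebook.

```python
from pathlib import Path


def existing_stages(base_dir="data"):
    """Map each processing stage to whether its folder exists under base_dir.

    Hypothetical helper; assumes the data/raw, data/clean, data/preprocessed
    and data/classified layout used in this notebook.
    """
    stages = ("raw", "clean", "preprocessed", "classified")
    return {stage: (Path(base_dir) / stage).is_dir() for stage in stages}


for stage, found in existing_stages().items():
    print(f"data/{stage}: {'found' if found else 'missing'}")
```

A stage reported as missing here corresponds to a loader that would find no data.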

In [None]:
p.set_raw_data("data/raw")
p.set_clean_data("data/clean")
p.set_preprocessed_data("data/preprocessed")
p.set_classified_data("data/classified")

### Preprocessing data for a participant

Run `preprocess_clean_data()` to call the preprocessing protocol of the supplied `Preprocessor()` class. 
You can also call `save_data()` to save the preprocessed data to disk, so that preprocessing only has to be done once.

By default, the `OriginalPreprocessor()` is used to apply preprocessing according to the paper.

If you implement your own `Preprocessor()` classes, you should at least copy the code for correcting target positions in back-and-forth circle trials, as these positions were recorded incorrectly during data collection.

`preprocess_clean_data()` allows passing `**kwargs`. The `OriginalPreprocessor()` class utilizes the following arguments:
- `blink_offset`: Tuple of (before, after) offsets in milliseconds. Samples within these offsets around a blink are also classified as blinks. Default is symmetrical (50, 50).
- `rolling_mean_window`: Window size for rolling mean smoothing of gaze data. Default is 1 (no smoothing).
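The effect of these two arguments can be illustrated with a small standalone sketch. It is not the code used by `OriginalPreprocessor()`; the function names, the inclusive window boundaries, and the trailing (rather than centered) rolling window are all assumptions made for illustration.

```python
def pad_blinks(times_ms, is_blink, blink_offset=(50, 50)):
    """Mark samples within (before, after) ms of any blink sample as blinks.

    Illustrative only; boundaries are treated as inclusive (an assumption).
    """
    before, after = blink_offset
    blink_times = [t for t, b in zip(times_ms, is_blink) if b]
    return [
        any(bt - before <= t <= bt + after for bt in blink_times)
        for t in times_ms
    ]


def rolling_mean(values, window=1):
    """Trailing rolling mean over up to `window` samples.

    A window of 1 returns the values unchanged (no smoothing). Windows at
    the start of the series are shorter instead of being dropped.
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# One blink sample at t = 100 ms, sampled every 25 ms.
times = [0, 25, 50, 75, 100, 125, 150, 175, 200]
blink = [False, False, False, False, True, False, False, False, False]
padded = pad_blinks(times, blink, blink_offset=(50, 50))
print(padded)  # samples from 50 ms to 150 ms are now flagged as blinks

print(rolling_mean([1.0, 2.0, 3.0], window=1))  # [1.0, 2.0, 3.0]
```

With the default `rolling_mean_window` of 1, each sample is averaged only with itself, which is why the default amounts to no smoothing.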


In [None]:
p.preprocess_clean_data(blink_offset=(50, 50), rolling_mean_window=1)
p.save_data(out_path="data/preprocessed", what="preprocessed")

### Classifying data

Run `classify_preprocessed_data()` to assign event labels according to the protocol in the `Classifier()` class. Again, you can save the classified data so that this step only has to be performed once.

By default, the `OriginalClassifier()` is used, which establishes "ground-truth" labels based on a lightweight algorithm relying on dynamic velocity thresholds, as explained in the paper.

`classify_preprocessed_data()` allows passing `**kwargs`. The `OriginalClassifier()` class utilizes the following arguments:
- `threshold_time_window`: Time window in seconds to select data for calculating the velocity threshold for each trial. Defaults to (0, 30) covering the entire trial.
- `velocity_threshold_scaling_constant`: Scaling constant applied to the 75th percentile of the velocity to calculate the threshold. Defaults to 1.5.
- `min_sac_duration`: Minimum duration for classified saccades in seconds. Defaults to 0.01.
- `min_fix_duration`: Minimum duration for classified fixations in seconds. Defaults to 0.01.
- `min_sp_duration`: Minimum duration for classified smooth pursuits in seconds. Defaults to 0.01.
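To make the first two arguments concrete, the sketch below computes a per-trial threshold as the scaling constant times the 75th percentile of the velocities inside the time window. It is only an illustration of that description, not the `OriginalClassifier()` implementation; the function name, the inclusive window boundaries, and the percentile interpolation method are assumptions.

```python
from statistics import quantiles


def velocity_threshold(times_s, velocities, time_window=(0, 30), scaling_constant=1.5):
    """Scale the 75th percentile of the velocities that fall inside time_window.

    Hypothetical helper for illustration; needs at least two samples in the window.
    """
    start, end = time_window
    selected = [v for t, v in zip(times_s, velocities) if start <= t <= end]
    # quantiles(..., n=4) returns the three quartile cut points;
    # index 2 is the 75th percentile.
    p75 = quantiles(selected, n=4, method="inclusive")[2]
    return scaling_constant * p75


# Toy trial: five velocity samples taken at 0.5 s intervals.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
speeds = [10.0, 20.0, 30.0, 40.0, 50.0]
threshold = velocity_threshold(times, speeds)
print(threshold)  # 60.0 (75th percentile is 40.0, scaled by 1.5)
```

Because the threshold is derived from each trial's own velocity distribution, narrowing `threshold_time_window` changes which samples contribute to it and therefore the resulting threshold.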

In [None]:
p.classify_preprocessed_data(threshold_time_window=(0,30), 
                             velocity_threshold_scaling_constant=1.5,
                             min_sac_duration=0.01,
                             min_fix_duration=0.01,
                             min_sp_duration=0.01)
p.save_data(out_path="data/classified", what="classified")

### Plotting data

Finally, you can call `plot_trial()` to create a basic plot of the gaze and target trajectories and the assigned "ground-truth" labels.

This function is meant as a quick and easy way to create one particular plot and does not offer any customization options. 

In [None]:
p.plot_trial(3)

## Preprocessing and classifying data for all participants

You can run the following cell to preprocess and classify data for all participants according to the protocol explained in the paper. The preprocessed and classified data will be saved to disk and you can use it for further analysis. 

In [None]:
for participant_id in available_ids:  # avoid shadowing the built-in id()
    # Initialize participant
    p = Participant(id=participant_id)

    # Load raw and cleaned data
    p.set_raw_data("data/raw")
    p.set_clean_data("data/clean")

    # Preprocess cleaned data and save it
    p.preprocess_clean_data()
    p.save_data(out_path="data/preprocessed", what="preprocessed")

    # Classify preprocessed data and save it
    p.classify_preprocessed_data()
    p.save_data(out_path="data/classified", what="classified")