# Project Description

## Background

Eye tracking is a technology that measures the movement and position of the eye. It can provide a variety of information, such as where someone is looking (also known as the gaze point). The raw eye-tracking data can also be used to engineer new features, called eye-tracking events, which yield further information.

The eye-tracking events that can be measured include fixations, which are periods of time during which the eye rests on a target, and saccades, which are the movements of the eyes between fixation points. There are also post-saccadic oscillations and glissades, in which the eye oscillates after a saccade before settling on a fixation point. Post-saccadic oscillations overshoot the target, while glissades undershoot.

These events can be detected by applying different threshold techniques. I-VT applies a velocity threshold: if the speed between two gaze points is below the threshold, the sample is identified as a fixation; if the speed is above the threshold, it is a saccade. There is also a dispersion (distance) based method, known as I-DT, which instead uses the distance between the gaze points to classify fixations and saccades. These threshold algorithms are common in practice, but they cannot classify more complex events.
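The velocity-threshold rule described above can be sketched in a few lines of pandas. This is a minimal illustration rather than the notebook's implementation: the column names `t_ms`, `x` and `y`, the function name `ivt_labels`, and the decision to treat a speed exactly equal to the threshold as a fixation are all assumptions made for the example.

```python
import pandas as pd

def ivt_labels(df, threshold_px_per_ms):
    """Label each sample from the speed between it and the previous sample.

    Assumes hypothetical columns: 't_ms' (time in ms), 'x' and 'y' (pixels).
    """
    dx = df["x"].diff()
    dy = df["y"].diff()
    dt = df["t_ms"].diff()
    speed = (dx ** 2 + dy ** 2) ** 0.5 / dt  # px/ms; NaN for the first row

    labels = pd.Series("fixation", index=df.index)
    labels.loc[speed > threshold_px_per_ms] = "saccade"
    labels.loc[speed.isna()] = "undefined"  # no previous sample to compare with
    return labels

# Toy data: a small drift, one large jump, then another small drift.
samples = pd.DataFrame({
    "t_ms": [0.0, 16.0, 32.0, 48.0],
    "x": [100.0, 101.0, 150.0, 151.0],
    "y": [100.0, 100.0, 100.0, 100.0],
})
labels = ivt_labels(samples, threshold_px_per_ms=0.5)
print(labels.tolist())
```

The sketch does not guard against repeated timestamps (a zero time difference), which real recordings may need.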

For the I-VT algorithm, a speed threshold of 0.5 px/ms was selected; for I-DT, a dispersion threshold of 1° was selected.
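One way to read a dispersion threshold expressed in degrees is as the angle between two gaze vectors. The sketch below assumes exactly that, and it assumes the dispersion is evaluated on a single pair of samples; the classic I-DT formulation instead measures dispersion over a moving window with a minimum duration. The function names `angle_deg` and `idt_pair_label` are hypothetical.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two 3D gaze vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

def idt_pair_label(u, v, max_dispersion_deg=1.0):
    """Pairwise simplification of I-DT: small angular change means fixation."""
    return "fixation" if angle_deg(u, v) <= max_dispersion_deg else "saccade"

# Two toy pairs: gaze rotated by 0.5 degrees and by 5 degrees.
straight = (0.0, 0.0, 1.0)
small_shift = (0.0, math.tan(math.radians(0.5)), 1.0)
large_shift = (0.0, math.tan(math.radians(5.0)), 1.0)

print(idt_pair_label(straight, small_shift))
print(idt_pair_label(straight, large_shift))
```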

## Dataset

The dataset used in this notebook comes from a study performed in the University of Guelph DRiVE lab. Participants wore eye-tracking glasses (Tobii Pro Glasses 3) and drove an OKTAL driving simulator. The dataset contains 72 participants, who are randomly separated into train, test and validation sets. Each participant file contains three sheets: 'Event Data', which holds some device information; 'IMU Data', which holds the IMU sensor data; and 'Gaze Data', which holds the eye-tracking data. The sheets have 4, 22 and 11 columns respectively. The eye tracker collects data at 60 Hz, and each participant file has roughly 20,000 records. The data is pre-split by participant to ensure that there is no leakage between participant data, which could affect the training of the models, and to ensure a more consistent evaluation of model performance.
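A participant-level split of this kind can be sketched as follows. The dataset here was split in advance, so this is only an illustration of the idea, not the procedure that was used: the function name `split_participants`, the seed, the participant IDs and the split sizes are all hypothetical.

```python
import random

def split_participants(participant_ids, n_test, n_val, seed=0):
    """Shuffle whole participants, then cut the list into three disjoint sets."""
    ids = sorted(participant_ids)   # fixed starting order for reproducibility
    rng = random.Random(seed)
    rng.shuffle(ids)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, test, val

# Hypothetical participant IDs; every record of a participant follows its ID.
all_ids = list(range(1, 21))
train_ids, test_ids, val_ids = split_participants(all_ids, n_test=3, n_val=3)
print(len(train_ids), len(test_ids), len(val_ids))
```

Because the split is made on participant IDs rather than on individual records, no participant can contribute data to more than one set.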

## Procedure

1. The 'Gaze Data' sheet is read into the notebook using an Excel library.
2. For each participant file, any gaps in the data are first filled in using linear interpolation.
3. Next, records are taken two at a time to calculate the labels using I-VT and I-DT, and the labels are stored in a new dataframe.
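Steps 2 and 3 can be illustrated on a toy frame. The interpolation call matches the one used later in the notebook; the column names `x` and `y` are placeholders, and pairing each record with the one that follows it is an assumption about what "two at a time" means here.

```python
import pandas as pd

nan = float("nan")
gaze = pd.DataFrame({
    "x": [nan, 10.0, nan, 14.0, nan],
    "y": [5.0, nan, nan, 8.0, 8.0],
})

# Step 2: fill gaps linearly; limit_direction='both' also fills leading and
# trailing gaps, which are held at the nearest valid value.
filled = gaze.interpolate(method="linear", limit_direction="both")

# Step 3 (assumed pairing): put each record next to its successor so that a
# label can be computed per pair; the last record has no successor.
pairs = filled.join(filled.shift(-1).add_suffix("_next")).iloc[:-1]

print(filled)
print(pairs)
```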

# Set Up Python Notebook

## Import Python Libraries

In [1]:
import os
from os import listdir
import pandas as pd

os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"

# import spark libraries
import findspark
findspark.init()
from pyspark.sql import SparkSession

import pyspark.pandas as ps
from pyspark.sql.functions import col
from pyspark.sql.functions import when
from pyspark.sql.functions import count
from pyspark.sql.window import Window
from pyspark.sql import functions as F

## Define Global Variables

In [2]:
datasets = ['dataset_training','dataset_testing','dataset_validation'] # directories for training, testing and validation
sheet = 'Gaze Data' # name of sheet with the eye tracking data

column_names = ['Type', 'Timestamp', 'Data_Gaze2D_X', 'Data_Gaze2D_Y', 'Data_Gaze3D_X',
       'Data_Gaze3D_Y', 'Data_Gaze3D_Z', 'Data_Eyeleft_Gazeorigin_X',
       'Data_Eyeleft_Gazeorigin_Y', 'Data_Eyeleft_Gazeorigin_Z',
       'Data_Eyeleft_Gazedirection_X', 'Data_Eyeleft_Gazedirection_Y',
       'Data_Eyeleft_Gazedirection_Z', 'Data_Eyeleft_Pupildiameter',
       'Data_Eyeright_Gazeorigin_X', 'Data_Eyeright_Gazeorigin_Y',
       'Data_Eyeright_Gazeorigin_Z', 'Data_Eyeright_Gazedirection_X',
       'Data_Eyeright_Gazedirection_Y', 'Data_Eyeright_Gazedirection_Z',
       'Data_Eyeright_Pupildiameter']

## Create Spark Session

In [3]:
spark = (
    SparkSession.builder
    .appName("Cis6180_FinalProject")
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", 4)
    .getOrCreate()
)

## Import the Dataset

In [4]:
# iterate through all the files in the dataset (~5-20 minutes)
for ds_num,dataset in enumerate(datasets):
    data_files = listdir(dataset)
    for f_num,f in enumerate(data_files):
        file_path = dataset + '/' + f # relative file path of the current excel file
        print(f'Folder {ds_num+1}/{len(datasets)}; File {f_num+1}/{len(data_files)} {file_path}')

        # the excel spreadsheet has to be read with pandas first and then converted to pandas-on-Spark
        ppdf = pd.read_excel(io=file_path,sheet_name=sheet) # participant pandas dataframe
        # fill any gaps with linear interpolation, in both directions
        ppdf = ppdf.interpolate(method='linear',limit_direction='both')

        psdf = ps.DataFrame(ppdf) # participant pandas-on-Spark dataframe (overwritten on each iteration)

Folder 1/3; File 1/51 dataset_training/eye-data-10327.xlsx


  ppdf = ppdf.interpolate(method='linear',limit_direction='both')


Folder 1/3; File 2/51 dataset_training/eye-data-12471.xlsx
Folder 1/3; File 3/51 dataset_training/eye-data-18514.xlsx
Folder 1/3; File 4/51 dataset_training/eye-data-20116.xlsx
Folder 1/3; File 5/51 dataset_training/eye-data-21051.xlsx
Folder 1/3; File 6/51 dataset_training/eye-data-21895.xlsx
Folder 1/3; File 7/51 dataset_training/eye-data-22013.xlsx
Folder 1/3; File 8/51 dataset_training/eye-data-23090.xlsx
Folder 1/3; File 9/51 dataset_training/eye-data-23753.xlsx
Folder 1/3; File 10/51 dataset_training/eye-data-25462.xlsx
Folder 1/3; File 11/51 dataset_training/eye-data-26370.xlsx
Folder 1/3; File 12/51 dataset_training/eye-data-28334.xlsx
Folder 1/3; File 13/51 dataset_training/eye-data-29048.xlsx
Folder 1/3; File 14/51 dataset_training/eye-data-34473.xlsx
Folder 1/3; File 15/51 dataset_training/eye-data-35217.xlsx
Folder 1/3; File 16/51 dataset_training/eye-data-35745.xlsx
Folder 1/3; File 17/51 dataset_training/eye-data-41517.xlsx
Folder 1/3; File 18/51 dataset_training/eye-data-46121.xlsx
Folder 1/3; File 19/51 dataset_training/eye-data-46307.xlsx
Folder 1/3; File 20/51 dataset_training/eye-data-47274.xlsx
Folder 1/3; File 21/51 dataset_training/eye-data-47402.xlsx
Folder 1/3; File 22/51 dataset_training/eye-data-48737.xlsx
Folder 1/3; File 23/51 dataset_training/eye-data-51637.xlsx
Folder 1/3; File 24/51 dataset_training/eye-data-52063.xlsx
Folder 1/3; File 25/51 dataset_training/eye-data-53209.xlsx
Folder 1/3; File 26/51 dataset_training/eye-data-53349.xlsx
Folder 1/3; File 27/51 dataset_training/eye-data-54455.xlsx
Folder 1/3; File 28/51 dataset_training/eye-data-55367.xlsx
Folder 1/3; File 29/51 dataset_training/eye-data-55746.xlsx
Folder 1/3; File 30/51 dataset_training/eye-data-56135.xlsx
Folder 1/3; File 31/51 dataset_training/eye-data-56233.xlsx
Folder 1/3; File 32/51 dataset_training/eye-data-59774.xlsx
Folder 1/3; File 33/51 dataset_training/eye-data-63923.xlsx
Folder 1/3; File 34/51 dataset_training/eye-data-64765.xlsx
Folder 1/3; File 35/51 dataset_training/eye-data-69876.xlsx
Folder 1/3; File 36/51 dataset_training/eye-data-70253.xlsx
Folder 1/3; File 37/51 dataset_training/eye-data-70615.xlsx
Folder 1/3; File 38/51 dataset_training/eye-data-71291.xlsx
Folder 1/3; File 39/51 dataset_training/eye-data-76001.xlsx
Folder 1/3; File 40/51 dataset_training/eye-data-79820.xlsx
Folder 1/3; File 41/51 dataset_training/eye-data-83008.xlsx
Folder 1/3; File 42/51 dataset_training/eye-data-84384.xlsx
Folder 1/3; File 43/51 dataset_training/eye-data-86812.xlsx
Folder 1/3; File 44/51 dataset_training/eye-data-91060.xlsx
Folder 1/3; File 45/51 dataset_training/eye-data-94231.xlsx
Folder 1/3; File 46/51 dataset_training/eye-data-95397.xlsx
Folder 1/3; File 47/51 dataset_training/eye-data-95985.xlsx
Folder 1/3; File 48/51 dataset_training/eye-data-96194.xlsx
Folder 1/3; File 49/51 dataset_training/eye-data-96679.xlsx
Folder 1/3; File 50/51 dataset_training/eye-data-97448.xlsx
Folder 1/3; File 51/51 dataset_training/eye-data-97973.xlsx
Folder 2/3; File 1/11 dataset_testing/eye-data-11868.xlsx
Folder 2/3; File 2/11 dataset_testing/eye-data-21182.xlsx
Folder 2/3; File 3/11 dataset_testing/eye-data-22446.xlsx
Folder 2/3; File 4/11 dataset_testing/eye-data-23921.xlsx
Folder 2/3; File 5/11 dataset_testing/eye-data-38989.xlsx
Folder 2/3; File 6/11 dataset_testing/eye-data-46094.xlsx
Folder 2/3; File 7/11 dataset_testing/eye-data-54097.xlsx
Folder 2/3; File 8/11 dataset_testing/eye-data-72799.xlsx
Folder 2/3; File 9/11 dataset_testing/eye-data-75601.xlsx
Folder 2/3; File 10/11 dataset_testing/eye-data-91260.xlsx
Folder 2/3; File 11/11 dataset_testing/eye-data-97051.xlsx
Folder 3/3; File 1/11 dataset_validation/eye-data-11085.xlsx
Folder 3/3; File 2/11 dataset_validation/eye-data-14732.xlsx
Folder 3/3; File 3/11 dataset_validation/eye-data-17381.xlsx
Folder 3/3; File 4/11 dataset_validation/eye-data-19733.xlsx
Folder 3/3; File 5/11 dataset_validation/eye-data-26585.xlsx
Folder 3/3; File 6/11 dataset_validation/eye-data-29097.xlsx
Folder 3/3; File 7/11 dataset_validation/eye-data-37883.xlsx
Folder 3/3; File 8/11 dataset_validation/eye-data-39692.xlsx
Folder 3/3; File 9/11 dataset_validation/eye-data-41473.xlsx
Folder 3/3; File 10/11 dataset_validation/eye-data-51553.xlsx
Folder 3/3; File 11/11 dataset_validation/eye-data-64087.xlsx

In [6]:
psdf.head(20)