# Project Description

## Background

Eye tracking is a technology for measuring the movement and position of the eye. It can be used to obtain a variety of information, such as where someone is looking (also known as the gaze point). The raw eye tracking data can also be used to engineer new features, known as eye tracking events, which in turn provide further information.

The eye tracking events that can be measured include fixations, which are periods of time during which the eye rests on a target, and saccades, during which the eyes move between fixation points. There are also post-saccadic oscillations and glissades, in which the eye oscillates after a saccade before settling on a fixation point. Post-saccadic oscillations overshoot the target, while glissades undershoot.

These events can be detected by applying different threshold techniques. I-VT applies a velocity threshold: if the speed between two gaze points is below the threshold, the interval is identified as a fixation; if the speed is above the threshold, it is a saccade. There is also a dispersion-based (distance-based) method, known as I-DT, which instead uses the distance between gaze points to classify fixations and saccades. These threshold algorithms are common in practice, but they cannot classify more complex events.
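The velocity rule described above can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: the function name `ivt_labels` and its parameters are hypothetical, and it assumes gaze coordinates in pixels and timestamps in milliseconds so that the threshold is expressed in px/ms. The text does not say how a speed exactly equal to the threshold is handled; the sketch treats it as a saccade.

```python
import numpy as np

def ivt_labels(x, y, t, velocity_threshold):
    """Label each interval between consecutive gaze samples (hypothetical helper).

    x, y: gaze coordinates (assumed to be in pixels)
    t: timestamps (assumed to be in milliseconds, strictly increasing)
    velocity_threshold: speed threshold in px/ms
    Returns an array with one label per pair of consecutive samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)

    # Euclidean distance travelled between consecutive samples
    distance = np.hypot(np.diff(x), np.diff(y))
    # Time elapsed between consecutive samples
    dt = np.diff(t)
    speed = distance / dt

    # Below the threshold: fixation; otherwise: saccade
    return np.where(speed < velocity_threshold, "fixation", "saccade")
```

For `n` samples the function returns `n - 1` labels, one for each interval between neighbouring samples.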

For the I-VT algorithm, a speed threshold of 0.5 px/ms was selected; for I-DT, a dispersion threshold of 1° was selected.
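Since the dispersion threshold is given in degrees, one possible way to apply it is to measure the angle between consecutive 3D gaze vectors. The sketch below is only an assumption about how this could be done: the helper names are hypothetical, it assumes the 3D gaze columns can be treated as direction vectors from a common origin, and it compares samples pairwise rather than over a sliding window with a minimum duration as the classic I-DT formulation does.

```python
import numpy as np

DISPERSION_THRESHOLD_DEG = 1.0  # I-DT threshold stated in the text

def angular_distance_deg(v1, v2):
    """Angle in degrees between 3D vectors (row-wise for 2D arrays)."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    dot = np.sum(v1 * v2, axis=-1)
    norms = np.linalg.norm(v1, axis=-1) * np.linalg.norm(v2, axis=-1)
    # Clip to guard against rounding slightly outside [-1, 1]
    cos_angle = np.clip(dot / norms, -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle))

def idt_pair_labels(gaze_vectors, threshold_deg=DISPERSION_THRESHOLD_DEG):
    """Pairwise simplification (assumption): label each consecutive pair of samples."""
    g = np.asarray(gaze_vectors, dtype=float)
    angles = angular_distance_deg(g[:-1], g[1:])
    # Small angular change: fixation; larger change: saccade
    return np.where(angles <= threshold_deg, "fixation", "saccade")
```

Working in angles keeps the threshold independent of how far away the gaze target is, which a raw distance in scene units would not.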

## Dataset

For the following notebook, the dataset comes from a study performed in the University of Guelph DRiVE lab. Participants wore eye-tracking glasses (Tobii Pro 3 glasses) and drove an OKTAL driving simulator. Each participant file contains 3 different sets of data: device information in the sheet titled 'Event Data', IMU sensor data in the sheet titled 'IMU Data', and eye tracking data in the sheet titled 'Gaze Data'. The sheets have 4, 22 and 11 columns respectively. The data from the eye tracker is collected at 60Hz, and each participant file has roughly 20000 records.

The dataset contains 74 participants who are randomly separated into train, test and validation sets. The data is pre-split in this way to ensure that there is no leakage between participant data, which could affect the training of the models, and to ensure a more consistent evaluation of model performance.
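To illustrate what splitting at the participant level means, here is a minimal sketch. The dataset used in this notebook is already pre-split, so this is not the procedure that produced it: the function name, the 70/15/15 proportions and the seed are all hypothetical choices for illustration only.

```python
import random

def split_participants(participant_ids, train_frac=0.7, val_frac=0.15, seed=0):
    """Randomly assign whole participants to train/validation/test (hypothetical helper).

    The fractions are illustrative assumptions; the source does not state
    the proportions that were actually used.
    """
    ids = sorted(participant_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)

    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }
```

Because every participant ends up in exactly one set, no single participant's records can appear in both the training data and the evaluation data.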

## Procedure

1. The Gaze Data sheet is read into the notebook using an Excel library.
2. For each participant file, any gaps in the data are first filled in using linear interpolation.
3. Next, records are taken two at a time to calculate the labels using I-VT and I-DT, and the labels are stored in a new dataframe.
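The gap-filling step can be sketched with pandas as shown below. This is a minimal example under stated assumptions: the helper name `fill_gaze_gaps` is hypothetical, gaps are assumed to appear as `NaN` values, and only interior gaps are filled. With `method="linear"`, pandas treats the rows as equally spaced and ignores the timestamps.

```python
import pandas as pd

def fill_gaze_gaps(df, columns):
    """Fill interior NaN gaps in the given columns by linear interpolation.

    Hypothetical helper: returns a copy and leaves the input dataframe unchanged.
    """
    filled = df.copy()
    filled[columns] = filled[columns].interpolate(
        method="linear",      # assumes rows are equally spaced
        limit_area="inside",  # only fill gaps surrounded by valid values
    )
    return filled
```

A call might look like `fill_gaze_gaps(df, ["Data Gaze2D X", "Data Gaze2D Y"])`, using the column names that appear in the Gaze Data sheet.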

# Set Up Python Notebook

## Import Python Libraries

In [1]:
from os import listdir
import pandas as pd

# import spark libraries
import findspark
findspark.init()
from pyspark.sql import SparkSession

## Define Global Variables

In [3]:
datasets = ['dataset_training','dataset_testing','dataset_validation'] # directories for training, testing and validation

In [3]:
spark = SparkSession.builder.appName('CIS6180_FinalProject').getOrCreate()


JAVA_HOME is not set


PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.
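The error above typically appears when PySpark cannot find a Java installation. A small check like the following could be run before creating the `SparkSession`. It is only a sketch: the helper name `java_is_available` is hypothetical, and it simply looks for a `JAVA_HOME` directory or a `java` executable on the `PATH`.

```python
import os
import shutil

def java_is_available(env=None):
    """Return True if a Java installation appears to be reachable (hypothetical helper)."""
    env = os.environ if env is None else env

    # Check whether JAVA_HOME is set and points at an existing directory
    java_home = env.get("JAVA_HOME")
    if java_home and os.path.isdir(java_home):
        return True

    # Otherwise fall back to looking for a `java` executable on the PATH
    return shutil.which("java", path=env.get("PATH", "")) is not None
```

If the check returns `False`, installing a JDK and setting `JAVA_HOME` before calling `SparkSession.builder` is the usual remedy.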

## Import the Dataset

In [18]:
# iterate through all the files in the dataset (~6-8 minutes)
for dataset in datasets:
    data_files = listdir(dataset)
    for f in data_files:
        file_path = dataset + '/' + f # file path is the relative file path for the current excel file
        # print(file_path)

        df = pd.read_excel(file_path, sheet_name='Gaze Data')
        print(df.head())

   Type  Timestamp  Data Gaze2D X  Data Gaze2D Y  Data Gaze3D X  \
0  gaze   0.018720       0.413040       0.475212     190.063171   
1  gaze   0.038702       0.413051       0.474923     193.716840   
2  gaze   0.058794       0.412978       0.474673     195.663523   
3  gaze   0.078773       0.413090       0.474573     199.225129   
4  gaze   0.098865       0.413208       0.474575     201.761480   

   Data Gaze3D Y  Data Gaze3D Z  Data Eyeleft Gazeorigin X  \
0      50.176424    1021.396991                  35.464028   
1      51.505135    1041.127731                  35.464926   
2      52.289879    1050.687547                  35.464408   
3      53.438989    1071.169531                  35.463332   
4      54.192755    1086.274977                  35.460945   

   Data Eyeleft Gazeorigin Y  Data Eyeleft Gazeorigin Z  ...  \
0                  -9.170123                 -25.879579  ...   
1                  -9.169466                 -25.879662  ...   
2                  -9.169011    