### Data information

The dataset has the following attributes:

- type: Event type
- duration: Duration of the event; for a fixation, this is the fixation duration
- sac_amplitude: Amplitude of the saccade
- sac_endpos_x: `x coordinate` of the saccade end position
- sac_endpos_y: `y coordinate` of the saccade end position
- sac_startpos_x: `x coordinate` of the saccade start position
- sac_startpos_y: `y coordinate` of the saccade start position
- sac_vmax: Maximal velocity of the saccade
- fix_avgpos_x: Average `x coordinate` position of the fixation
- fix_avgpos_y: Average `y coordinate` position of the fixation
- fix_avgpupilsize: Average pupil size of the eye
- overlapping: Whether there are two bounding boxes that are overlapping (e.g. a face, being partially occluded by another head)
- fix_samebox: Whether the current fixation is within the same bounding box (e.g. same face) as the previous one.
- id: Subject ID
- picID: Picture ID
- trialnum: Trial Number
- fix_type: Type of the fixation.
- onset: Event onset time.


### Task Summary
- Analyse the dataset and find relations between variables, if any

In [None]:
import pandas as pd
import numpy as np

df = pd.read_csv('sub-45/eeg/sub-45_task-WLFO_events.tsv', sep='\t')
df.head(2)

In [None]:
df.info()

We observe that:
- `type` and `fix_type` are categorical attributes.
- `duration`, `sac_amplitude`, `sac_endpos_x`, `sac_endpos_y`, `sac_startpos_x`, `sac_startpos_y`, `sac_vmax`, `fix_avgpos_x`, `fix_avgpos_y`, `fix_avgpupilsize`, `overlapping`, `fix_samebox`, `id`, `picID`, `trialnum` and `onset` are numerical attributes.
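The split between categorical and numerical attributes can also be derived programmatically instead of being read off `df.info()`. The following is a minimal sketch on a made-up toy frame (`toy` and its values are assumptions for illustration, not rows from the dataset); the same two `select_dtypes` calls should work on `df`.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are invented for illustration only.
toy = pd.DataFrame({
    "type": ["fixation", "saccade", "fixation"],
    "fix_type": ["NonetoNone", None, "HFtoHF"],
    "duration": [0.21, 0.04, 0.18],
    "trialnum": [1, 1, 2],
})

# Numeric dtypes (int, float) count as numerical; everything else as categorical.
numerical_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("categorical:", categorical_cols)
print("numerical:  ", numerical_cols)
```

Note that flags such as `overlapping` and `fix_samebox` would land in the numerical group if they are stored as 0/1, even though they are conceptually binary.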
 

### Understanding categorical variables 

In [None]:
df['fix_type'].unique()

### Types:
- NonetoNone: Background to Background
- NonetoHF - HFtoNone: Background to Human Face - Human Face to Background
- HFtoHF: Human Face to Human Face
- NonetoOS - OStoNone: Background to Outside the image - Outside the image to Background
- NonetoHH - HHToNone: Background to Human Head (as distinct from Human Face) - Human Head to Background
- OLtoNone: Overlapping bounding box (no unique attribution possible) to Background
- HFtoNH: Human Face to Non-human Head (e.g. cardboard or mannequin)
- NHtoNone: Non-human Head (e.g. cardboard or mannequin) to Background
- OStoOS: Outside stimulus to Outside stimulus
- NHtoNH: Non-human Head (e.g. cardboard or mannequin) to Non-human Head (e.g. cardboard or mannequin)
- OLtoHF: Overlapping bounding box to Human Face?
- HFtoHH: Human Face to Human Head?
- HHtoHH: Human Head to Human Head
- NonetoOL: Background to Overlapping bounding box
- OStoHF - OStoNH: Self decoded
- OStoHH - HFtoOS: Self decoded
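Every label in the list above follows the pattern `<source>to<target>`, so the two regions can be pulled apart mechanically. Below is a minimal sketch; the helper name `split_fix_type` and the set of region codes are assumptions read off the list, not something defined by the dataset. Matching ignores case because the labels are not always capitalised consistently.

```python
import re

# Region codes as they appear in the list above (assumed to be exhaustive).
_REGION = r"(None|HF|HH|NH|OS|OL)"
_PATTERN = re.compile(rf"^{_REGION}to{_REGION}$", re.IGNORECASE)

def split_fix_type(label):
    """Return (source, target) region codes, or None if the label does not parse."""
    match = _PATTERN.match(label)
    if match is None:
        return None
    return match.group(1), match.group(2)

for label in ["NonetoHF", "HFtoHF", "OLtoNone"]:
    print(label, "->", split_fix_type(label))
```

On the real data this could be applied to the non-null values with `df['fix_type'].dropna().map(split_fix_type)`, which makes it easier to group transitions by where they start or end.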

In [None]:
df['type'].unique()

### Triggers
- 213, 214, 215 : Recalibration settings for eye tracker
- 180: End of stimulus  

`Null Values` in the dataset

In [None]:
df.isnull().sum()

# Insights on distribution 
Plotting individual attributes

In [None]:
df.describe()

In [None]:
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
from wordcloud import wordcloud
import warnings
warnings.filterwarnings('ignore')

Define Plotting functions

In [None]:
def Plot_dis(text):
  f= plt.figure(figsize=(21,5))
  
  ax=f.add_subplot(131)
  sns.distplot(df[text],color='b',ax=ax)
  ax.set_title('Distribution of {}'.format(text))

  ax=f.add_subplot(132)
  sns.distplot(df[(df.fix_type == 'NonetoNone')][text], color='g',ax=ax)
  ax.set_title('Distribution of {} for Background-Background'.format(text))
  
  ax=f.add_subplot(133)
  # Use | (or), not & (and): a row cannot have both fix_type values at once
  sns.distplot(df[(df.fix_type == 'NonetoHF') | (df.fix_type == 'HFtoNone')][text],color='c',ax=ax)
  ax.set_title('Distribution of {} for Background-HumanFace and vice-versa'.format(text))

  f1= plt.figure(figsize=(13,5))
  
  ax=f1.add_subplot(121)
  sns.distplot(df[(df.fix_type == 'HFtoHF')][text],color='g',ax=ax)
  ax.set_title('Distribution of {} for HumanFace-HumanFace'.format(text))

  ax=f1.add_subplot(122)
  # Use | (or), not & (and): a row cannot have both fix_type values at once
  sns.distplot(df[(df.fix_type == 'HFtoOS') | (df.fix_type == 'OStoHF')][text],color='g',ax=ax)
  ax.set_title('Distribution of {} for HumanFace-Outside and vice versa'.format(text))

In [None]:
def Plot_box(text):
  fig, axes = plt.subplots(figsize=(25, 15))
  fig.suptitle('Box plot of {}'.format(text))
  sns.boxplot(ax=axes, data=df, y=text, x='fix_type')

In [None]:
def Plot_scat(parameter1, parameter2,var1,var2):
  
  f= plt.figure(figsize=(25,5))
  ax=f.add_subplot(121)
  sns.scatterplot(x=parameter1,y=parameter2,hue=var1,data=df,ax=ax)
  ax.set_title('Relationship between {} and {} in function of {}'.format(parameter1,parameter2,var1))
  
  ax=f.add_subplot(122)
  sns.scatterplot(x=parameter1, y=parameter2,hue=var2,data=df,ax=ax)
  ax.set_title('Relationship between {} and {} in function of {}'.format(parameter1,parameter2,var2))

In [None]:
Plot_dis('sac_amplitude')

It appears that the distributions are mostly right-skewed. In addition:
- People spend more time on background-to-background fixations, which seems strange: normally we tend to look at foreground objects in an image.
- People also spend more time on human-to-human fixations, which seems reasonable and supports the hypothesis that we tend to look at foreground objects in an image.

Furthermore, most test subjects appear to explore the image background and then the image foreground, with no significant effect at the boundary between the two. It is as if viewers segment the image and look at the individual pieces.
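The right-skew impression from the plots can be cross-checked numerically. The sketch below uses invented numbers (the `toy` frame is an assumption, not data from the recording) and computes the sample skewness per `fix_type`; a positive value indicates a longer right tail.

```python
import pandas as pd

# Hypothetical values: each group has mostly small amplitudes and one large one.
toy = pd.DataFrame({
    "fix_type": ["NonetoNone"] * 5 + ["HFtoHF"] * 5,
    "sac_amplitude": [1.0, 1.2, 1.1, 1.3, 9.0, 0.5, 0.6, 0.55, 0.7, 4.0],
})

# Sample skewness per fixation type; values above 0 mean right-skewed.
skew_by_type = toy.groupby("fix_type")["sac_amplitude"].skew()
overall_skew = toy["sac_amplitude"].skew()

print(skew_by_type)
print("overall:", overall_skew)
```

Running the same `groupby(...).skew()` on `df` would show whether the skew holds for every fixation type or only for the most frequent ones.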

In [None]:
Plot_dis('onset')

It looks like the onset distribution is somewhat periodic, with peaks at 500, 1500 and 2500.

Also, at the peak or in the middle of a period, the subjects have a higher tendency to look at the background of the image.
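One way to probe the apparent periodicity is to fold the onsets onto a single candidate period and histogram the result. The sketch below is only an illustration: the period of 1000 is an assumption inferred from the spacing of the peaks, and the onsets are synthetic, generated around 500, 1500 and 2500 rather than taken from `df`.

```python
import numpy as np

# Synthetic onsets clustered around the three peak positions (illustration only).
rng = np.random.default_rng(0)
onset = np.concatenate([rng.normal(center, 60, 200) for center in (500, 1500, 2500)])

period = 1000  # assumed from the spacing between the observed peaks

# Fold every onset into [0, period) and count how many fall into each phase bin.
phase = np.mod(onset, period)
counts, edges = np.histogram(phase, bins=10, range=(0, period))

# Left edge of the most populated phase bin.
peak_bin_start = edges[np.argmax(counts)]
print(counts)
print("most populated bin starts at", peak_bin_start)
```

If the real onsets are periodic, the folded histogram should show one clear bump; a flat histogram would suggest the peaks in the plot are a binning or smoothing effect.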

In [None]:
Plot_box('sac_amplitude')

### Plotting scatter plots for eye fixations

In [None]:
Plot_scat('sac_startpos_x', 'sac_startpos_y', 'sac_amplitude', 'onset')

In [None]:
Plot_scat('fix_avgpos_x', 'fix_avgpos_y', 'sac_amplitude', 'onset')

In [None]:
Plot_scat('fix_avgpos_x', 'fix_avgpos_y', 'sac_vmax', 'fix_avgpupilsize')

## Finding Correlation

In [None]:
correlation = df[['duration', 'sac_amplitude', 'sac_endpos_x', 'sac_endpos_y', 'sac_startpos_x', 'sac_startpos_y', 'sac_vmax', 'fix_avgpos_x', 'fix_avgpos_y', 'fix_avgpupilsize', 'overlapping', 'fix_samebox', 'onset']].corr()

fig, ax = plt.subplots(figsize=(10,10))  

sns.heatmap(correlation, annot=True, cmap='Greens', ax=ax)
plt.title('Correlation between numerical parameters')

It appears that `sac_amplitude` has a strong correlation with `sac_vmax`, `fix_avgpos_x`, `fix_avgpos_y` and `fix_avgpupilsize`.
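Reading the strongest relationships off a heatmap is error-prone, so the correlations with `sac_amplitude` can also be ranked directly. This is a minimal sketch on synthetic data (the `toy` frame and the linear link between amplitude and peak velocity are assumptions made for the example); on the real data the same two lines can be applied to the `correlation` matrix computed above.

```python
import numpy as np
import pandas as pd

# Synthetic data: sac_vmax is built to depend on sac_amplitude, onset is independent.
rng = np.random.default_rng(0)
amplitude = rng.gamma(2.0, 2.0, 300)
toy = pd.DataFrame({
    "sac_amplitude": amplitude,
    "sac_vmax": 40 * amplitude + rng.normal(0, 5, 300),
    "onset": rng.uniform(0, 3000, 300),
})

corr = toy.corr()

# Drop the trivial self-correlation, then sort by absolute strength.
ranked = corr["sac_amplitude"].drop("sac_amplitude").abs().sort_values(ascending=False)
print(ranked)
```

Sorting by absolute value keeps strong negative correlations near the top as well, which a sequential colour map such as `Greens` tends to hide.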

In [None]:
sns.catplot( kind='count', x='fix_type',data=df, height=8.27, aspect=20/5)

In [None]:
sns.catplot(kind='count', x='type',data=df, height=8.27, aspect=20/5)

In [None]:
sns.catplot(x='fix_type',kind='count',hue='type',data=df, aspect=20/5)

In [None]:
# sns.jointplot(x='onset',y='sac_amplitude',data=df)

In [None]:
# sns.lmplot(y='sac_amplitude',x='onset',hue='fix_type',col='type',data=df)