# Analysis Pipeline for An Eye On Semantics study

This is the analysis pipeline for the eye-tracking study An Eye On Semantics.
The analysis relies on a few helper functions; the comments inside each function explain what it does.
If you need more information, feel free to contact me at Federica.Magnabosco@mrc-cbu.cam.ac.uk

Let's start!

Import the relevant packages.

In [8]:
import os
import numpy as np
import pandas as pd
import re
import seaborn as sns
import matplotlib.pyplot as plt
import pickle

In [9]:
from pygazeanalyser.edfreader import read_edf

Set the screen size the data was recorded with.

In [10]:
DISPSIZE = (1280, 1024)

Coordinates are in pixels, with (x, y) = (0, 0) at the top-left corner of the screen.

Now, let's specify which trials need to be excluded because of errors during recording.
They are dropped inside the attach_info and attach_mean_centred functions, by matching the trial number.

(key = subject ID : values = trial numbers)

In [11]:
exclude = {111: [120,121],
           128: [240,241],
           130: np.concatenate([[240],
                                 np.arange(120,133)]).tolist(),
           136: [80,81],
           141: np.arange(20,39).tolist()} 
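
As a minimal sketch of how such a dictionary can be applied, the snippet below drops the listed trial numbers from a toy table. The helper `drop_excluded` and the toy values are hypothetical; only the `trialnr` column name is taken from the functions defined later.

```python
import pandas as pd

# Hypothetical exclusion table: participant ID -> trial numbers to drop
exclude_demo = {111: [120, 121]}

def drop_excluded(trials, participant_id, exclude_map):
    """Return the trials of one participant without the excluded trial numbers."""
    bad = exclude_map.get(participant_id, [])
    return trials[~trials["trialnr"].isin(bad)]

# Toy data: four trials for participant 111 (values are invented)
trials_demo = pd.DataFrame({"trialnr": [119, 120, 121, 122],
                            "ms": [210.0, 185.0, 240.0, 199.0]})
kept = drop_excluded(trials_demo, 111, exclude_demo)
print(kept["trialnr"].tolist())
```

Participants that do not appear in the dictionary keep all their trials.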

Function to get all blinks that happened for a certain participant.

In [12]:
def get_blinks(data_edf):
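    blinks = []
    for trial in data_edf:
        # 'Eblk' holds the blink events of this trial (empty list if none)
        blinks.append(trial['events']['Eblk'])
    # drop the trials without blinks, then flatten to a single list of blinks
    blinks = [x for x in blinks if x != []]
    blinks = [item for sublist in blinks for item in sublist]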
    return blinks 
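
To illustrate the flattening, here is a small sketch on toy data. The structure (a list of trial dicts with an `'Eblk'` list under `'events'`) follows what the function above accesses; the blink entries themselves, written as `[start, end, duration]`, are invented, and `collect_blinks` is a hypothetical stand-in that should give the same result as `get_blinks`.

```python
# Toy stand-in for the list of trials returned by read_edf; the blink
# entries [start, end, duration] are invented for illustration.
toy_trials = [
    {"events": {"Eblk": []}},
    {"events": {"Eblk": [[1000, 1120, 120]]}},
    {"events": {"Eblk": [[2000, 2090, 90], [2500, 2640, 140]]}},
]

def collect_blinks(trials):
    """Flatten the per-trial blink lists into a single list of blink events."""
    return [blink for trial in trials for blink in trial["events"]["Eblk"]]

all_blinks = collect_blinks(toy_trials)
print(len(all_blinks))
```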

This function is similar to pygazeanalyser's read_edf. The main difference is that it does not split the recording into trials; instead, it creates one dataframe per participant with two columns, time and event. This is used to find when participants blinked, and to exclude fixations that were immediately preceded or followed by a blink.

In [13]:
def read_edf_plain(filename):
    """Get a dataframe containing all the events (and only the events) from the
        EDF file, with the trackertime, without dividing the trials"""
    # raise exception if the file does not exist
    if not os.path.isfile(filename):
        raise Exception("Error in read_edf_plain: file '%s' does not exist" % filename)
    # open the file and read all lines (the file is closed automatically)
    with open(filename, 'r') as f:
        raw = f.readlines()
    # variables
    data = []
    event = []
    timepoint = []
    # loop through all lines
    for line in raw:
        if line[0:4] == "SFIX":
            l = line[9:]
            timepoint.append(int(l))
            event.append(line[0:4])
        elif line[0:4] == "EFIX":
            l = line[9:]
            l = l.split('\t')
            timepoint.append(int(l[1]))
            event.append(line[0:4])
        # saccade start
        elif line[0:5] == 'SSACC':
            l = line[9:]
            timepoint.append(int(l))
            event.append(line[0:5])
        # saccade end
        elif line[0:5] == "ESACC":
            l = line[9:]
            l = l.split('\t')
            timepoint.append(int(l[1]))
            event.append(line[0:5])
        # blink start
        elif line[0:6] == "SBLINK":
            l = line[9:]
            timepoint.append(int(l))
            event.append(line[0:6])
        # blink end
        elif line[0:6] == "EBLINK":
            l = line[9:]
            l = l.split('\t')
            timepoint.append(int(l[1]))
            event.append(line[0:6])
    # return
    data = pd.DataFrame()
    data['time'] = np.array(timepoint)
    data['event'] = np.array(event)
    
    return data
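
The parsing logic can be tried without a real recording. The sketch below applies the same rules (start events use their onset time, end events use their offset time) to a few invented lines. The sample lines only mimic the layout the function above relies on, with the timestamp starting at character 9 and tab-separated fields for end events; `parse_events` is a hypothetical condensed version, not a replacement for `read_edf_plain`.

```python
import pandas as pd

# Invented sample lines: event label, eye, then the timestamp from character 9
sample_lines = [
    "SFIX L   1455422\n",
    "EFIX L   1455422\t1455802\t381\t72.0\t513.3\t1000\n",
    "SSACC L  1455803\n",
    "SBLINK L 1455810\n",
    "EBLINK L 1455810\t1455830\t21\n",
    "ESACC L  1455803\t1455836\t34\t72.0\t513.3\t343.4\t506.0\t8.1\t300\n",
    "MSG\t1455900 some other message\n",
]

START = ("SFIX", "SSACC", "SBLINK")
END = ("EFIX", "ESACC", "EBLINK")

def parse_events(lines):
    """Collect (time, event) rows; lines that are not events are ignored."""
    rows = []
    for line in lines:
        label = line.split(" ")[0]
        if label in START:
            rows.append((int(line[9:]), label))                 # onset time
        elif label in END:
            rows.append((int(line[9:].split("\t")[1]), label))  # offset time
    return pd.DataFrame(rows, columns=["time", "event"])

events = parse_events(sample_lines)
print(events)
```

Note how the blink sits between SSACC and ESACC, which is the pattern the blink check in the next function looks for.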

This is probably the most important function. It selects the fixations that fall within the Area of Interest (AOI, i.e. the target word) and checks that they are valid fixations. To do so, it retrieves some information that was sent to the EyeLink during the experiment.

In [14]:
def fixAOI(data_edf,data_plain):
    """Get all fixations within the AOI. Marks trials where the first fixation
    within the AOI is not a first-pass fixation, records whether that fixation
    is followed by a regression, and marks trials with a blink or an error"""
    # get all fixation durations within a certain AOI for all trials for one subject
    # dur_all is the list where we include all the fixation durations that
    # respect certain inclusion criteria
    # 
    dur_all = []
    regressed = []
    time_before_fix = []
    tot_number_fixation = []
    # loop over each trial (remember that data_edf is a list, and each trial is a dict)        
    for i,trial in enumerate(data_edf):
        # select the trial's events, and select only the END OF FIXATION EVENT ('Efix')
        # and save the relative information into a pd.DataFrame
        # (specifically Efix is a list of lists, now converted to a pd.DataFrame)
        pd_fix = pd.DataFrame.from_records(trial['events']['Efix'],
                                           columns=['start',
                                                    'end',
                                                    'duration',
                                                    'x',
                                                    'y'])
        # save the total number of fixations that happened in a certain trial
        tot_number_fixation.append(len(pd_fix))
        
        # exclude those trials where all fixations are outside the screen
        # it used to happen if there was an error in the gaze position detection
        # it should not be a problem now, considering that gaze is required
        # to trigger the start of the sentence
        if (((pd_fix['x'] < 0).all()) or ((pd_fix['x'] > DISPSIZE[0]).all())):
            dur_all.append('Error in fixation detection')
            time_before_fix.append(np.nan)
            regressed.append(np.nan)
        elif (((pd_fix['y'] < 0).all()) or ((pd_fix['y'] > DISPSIZE[1]).all())):
            dur_all.append('Error in fixation detection')
            time_before_fix.append(np.nan)
            regressed.append(np.nan)
        # or when no fixations have been detected
        elif len(pd_fix)<2:
            dur_all.append('Error in fixation detection')
            time_before_fix.append(np.nan)
            regressed.append(np.nan)
        # if not all fixations are outside the screen, we'll go ahead    
        
        else:    
            # the following info is gathered from the stimulus presentation software
            # (specifically we look at the "msg" events, and extract the relevant info)
            # we know which messages to look at, because for each trial the messages sent
            # are always the same
            
            # tuple indicating dimension of each sentence in pixels
            size = re.search("SIZE OF THE STIMULUS: (.*)\n",trial['events']['msg'][3][1])
            size = eval(size.group(1)) # tuple (width,height)
            
            # size of each letter in pixels
            # this should be identical for each sentence, equal to 11 in our study
            unit = re.search("NUMBER OF CHARACTERS: (.*)\n",trial['events']['msg'][4][1])
            unit = size[0]/eval(unit.group(1)) 
            
            # position (in characters) of the target word inside the sentence
            pos_target = re.search("POS TARGET INSIDE BOX: (.*)\n",trial['events']['msg'][5][1])
            pos_target = eval(pos_target.group(1))
            
            # position (in pixels) of the target word
            # convert width to the position in x, y coordinates where the sentence starts
            # stimulus starting position is = centre of x_axis screen - half size of the sentence
            # because sentence is presented aligned to the centre of the screen
            pos_startstim = DISPSIZE[0]/2-size[0]/2

            # no need to calculate y as always in the same position at the centre
            # there's only one line
            
            # get x and y position of the target word
            # as pos_target is in characters, we need to multiply each letter*unit
            # including in the AOI also half space preceding and half space
            # following the target word
            # tuple (x0,x1) position of the target word in pixels 
            target_x = (pos_startstim+(pos_target[0]*unit)-unit/2,pos_startstim+(pos_target[1]*unit)+unit/2)
            
            # AOI limits on the y axis: two times the height of the letters above and below the centre
            # no need to be too strict as there's just one line
            target_y = (DISPSIZE[1]/2-size[1]*2,DISPSIZE[1]/2+size[1]*2)
            
            # get all fixations on target word
            # Specifically this gets x position if targetstart_position<fixation_position<targetend_position
            # for both x and y 
            fixAOI = pd_fix['x'][(target_x[0]<pd_fix['x']) &
                                 (pd_fix['x']<target_x[1]) &
                                 (target_y[0]<pd_fix['y']) &
                                 (pd_fix['y']<target_y[1])]
            
            # check if at least one fixation on target 
            if len(fixAOI)>0:
                
                # check this is first pass
                # by checking that all previous fixations (identified by index) have a smaller x_position
                # (meaning that the target was not skipped at first and fixated only later)
                if all(pd_fix['x'][0:fixAOI.index[0]]<fixAOI[fixAOI.index[0]]):
                
                # check if this is not the last fixation (if so, automatically there's no regression)    
                    if (len(pd_fix['x'])>(fixAOI.index[0]+1)):
                        dur_all.append(pd_fix['duration'][(target_x[0]<pd_fix['x']) &
                                                  (pd_fix['x']<target_x[1]) &
                                                  (target_y[0]<pd_fix['y']) &
                                                  (pd_fix['y']<target_y[1])
                                                  ])
                        time_before_fix.append(pd_fix['start'][fixAOI.index[0]] - pd_fix['start'][0])
                        
                    # check if there is a regression to BEFORE the target area
                    # and save it in the relevant list 
                        if (pd_fix['x'].iloc[fixAOI.index[0]+1]>target_x[0]):
                        # if there wasn't a regression, save as 0
                            regressed.append(0)
                        else:
                        # if there was a regression, save as 1
                            regressed.append(1)
                    else:
                    # if this is the last fixation, then there is no regression
                    # so, get the fixations (otherwise it would give an error
                    # when explicitly checking whether the fixation is followed by a regression)
                    # however, there should always be a later fixation on the square
                        dur_all.append(pd_fix['duration'][(target_x[0]<pd_fix['x']) &
                                                  (pd_fix['x']<target_x[1]) &
                                                  (target_y[0]<pd_fix['y']) &
                                                  (pd_fix['y']<target_y[1])
                                                  ])
                        # get rest of the data
                        time_before_fix.append(pd_fix['start'][fixAOI.index[0]] - pd_fix['start'][0])
                        regressed.append(0)
                        
                # this relates to the fixation being first pass
                else:
                    dur_all.append('Nope - not fixated during first pass')       
                    time_before_fix.append(np.nan)
                    regressed.append(np.nan)
                    
            else:
                # if there is no fixation, return an empty Series
                # by returning an empty Series, we know that the word has been skipped
                dur_all.append(pd_fix['duration'][(target_x[0]<pd_fix['x']) &
                                                  (pd_fix['x']<target_x[1]) &
                                                  (target_y[0]<pd_fix['y']) &
                                                  (pd_fix['y']<target_y[1])
                                                  ])
                time_before_fix.append(np.nan)
                regressed.append(np.nan)
                
            # now check blinks
            # first, check at least one fixation and that the object is not string
            # remember that when trials are to be discarded there's a string
            
            if ((len(dur_all[-1])>0) and (type(dur_all[-1])!=str)):
            
                # get trackertime of the start of the first fixation on target words
                start = pd_fix['start'].iloc[dur_all[-1].index[0]]
            
                # get the  position in the events only list of data
                plain_start = data_plain[data_plain['time']==start].index[0]
                r = range(plain_start-2,plain_start+4)
                
                # this range because each blink generates an artefactual saccade event
                # so each blink is surrounded by SSACC and ESACC events
                # different ends (i.e., -2, +4) to include also the EFIX event
                
                # this basically checks whether the fixation is immediately
                # preceded or followed by a blink
                if (any(data_plain['event'].iloc[r]=='SBLINK')
                    or any(data_plain['event'].iloc[r]=='EBLINK')):
                        dur_all[-1] = 'There was a blink'
                        time_before_fix[-1] = np.nan
                        regressed[-1] = np.nan
                        
    # returning:
    # dur_all is a list (len = 400) with one element per trial:
    #     either a series with all the fixations within the AOI for that trial
    #     (index = ordinal number of the fixation in that trial, counting from 0,
    #      e.g. if the first fixation within the AOI was the 7th, index=6;
    #      value = duration of the fixation in ms),
    #     or a string explaining why the trial is discarded
    # regressed is binary info if that trial was regressed (0=not regressed, 1=regressed)
    # time_before_fix is the time in ms from the start of the first fixation of the
    #     trial to the start of the first fixation made on the AOI
    # tot_number_fixation is how many fixations happened in that trial
    
    return dur_all, regressed, time_before_fix, tot_number_fixation
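
The first-pass and regression checks are the least obvious part of the function, so here is a stripped-down sketch of just those two steps on toy values. The x positions and AOI limits are illustrative assumptions; the y check, the blink check and the error handling are left out on purpose.

```python
import pandas as pd

# x positions (pixels) of the fixations in one toy trial, in temporal order
x = pd.Series([72.0, 343.4, 323.4, 416.3, 503.1, 593.6, 649.2, 765.6])
target_x = (640.0, 728.0)  # assumed AOI limits on the x axis

# fixations that land inside the AOI (x axis only in this sketch)
in_aoi = x[(target_x[0] < x) & (x < target_x[1])]
first = in_aoi.index[0]  # ordinal position of the first fixation in the AOI

# first pass: every earlier fixation lies to the left of this fixation
first_pass = bool((x.iloc[:first] < x.iloc[first]).all())

# regression: the next fixation (if any) lands before the start of the AOI
has_next = first + 1 < len(x)
regressed = int(has_next and x.iloc[first + 1] <= target_x[0])

print(first, first_pass, regressed)
```

Here the fixation with index 6 is the first one inside the AOI, all earlier fixations are to its left, and the following fixation moves further right, so the trial counts as first pass without a regression.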

In [15]:
# function to get both FFD and GD
def ffdgd(dur_all):
    """Get first-fixation, gaze duration, whether it was fixated"""
    # dur_all is list of series, output from fixAOI
    # set everything to zero
    # this is convenient as words skipped have durations = 0 and fixated = 0
    FFD = np.zeros(len(dur_all))
    GD = np.zeros(len(dur_all))
    fixated = np.zeros(len(dur_all))
    
    # n_prior_fixation is set as nan, so specifying it only if there's a fixation
    n_prior_fixations = np.empty((len(dur_all)))
    n_prior_fixations[:] = np.nan
    
    # loop over each trial
    for i,trial in enumerate(dur_all):
        # if error in fixation, then indicate as NAN
        if type(trial)==str: # as all trials that should be excluded are strings ...
            if trial == 'Nope - not fixated during first pass':
                # ... apart if skipped during first pass and only then fixated
                # note that in this case it counts as skipped (not as invalid!)
                pass # so it stays to zero (it counts as skipped)
            else:
                # this will allow us to discard them from the analysis
                # (blinks or errors in fixation detection)
                FFD[i] = np.nan
                GD[i] = np.nan
                fixated[i] = np.nan
        else:
            # check if there is at least one fixation in AOI, otherwise FFD=GD=0                
            if len(dur_all[i])>0:
                
                # check if the FIRST fixation is between 80 and 600 ms long
                if (np.array(dur_all[i])[0]>80 and np.array(dur_all[i])[0]<600):
                    FFD[i] = np.array(dur_all[i])[0]
                    GD[i] = np.array(dur_all[i])[0]
                    fixated[i] = 1
                # if the fixation is 600 ms or longer, remove it
                elif np.array(dur_all[i])[0]>=600:
                    FFD[i] = np.nan
                    GD[i] = np.nan
                    fixated[i] = 1
                # check if there's more than one fixation
                # remember dur_all is a list of series, so dur_all[i] is a series with all fixations in that trial
                # (given that the first is first pass)
                if len(dur_all[i])>1:
                    # if more than one, check whether they are consecutive
                    # fixations inside the AOI by checking the index
                    for j in range(len(dur_all[i].index)-1):
                        if ((dur_all[i].index[j+1]-dur_all[i].index[j]==1) &
                            (FFD[i]>0)):
                            GD[i] += np.array(dur_all[i])[j+1]
                        else:
                            # this break is needed to get out of the loop as soon
                            # as two fixations are not consecutive
                            break
                n_prior_fixations[i] = dur_all[i].index[0]
    # returns numpy arrays (fixated is binary)
    return FFD, GD, fixated, n_prior_fixations
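
To make the difference between first-fixation duration (FFD) and gaze duration (GD) concrete, here is a sketch for a single trial. It assumes the same input as above, a Series of durations indexed by the ordinal position of each fixation. `ffd_gd_one_trial` is a hypothetical, simplified helper: it returns `None` whenever the first fixation falls outside the 80-600 ms window, whereas `ffdgd` treats too-short and too-long fixations differently.

```python
import pandas as pd

def ffd_gd_one_trial(durations, low=80, high=600):
    """Return (FFD, GD) for one trial, or (None, None) if the first fixation
    is outside the accepted duration window."""
    first = durations.iloc[0]
    if not (low < first < high):
        return None, None
    gaze = first
    # add later fixations only while they are consecutive (index step of 1)
    for prev, nxt in zip(durations.index[:-1], durations.index[1:]):
        if nxt - prev != 1:
            break  # the eyes left the AOI, so the first pass is over
        gaze += durations.loc[nxt]
    return first, gaze

# toy trial: fixations 6 and 7 are consecutive, fixation 9 is a later revisit
toy = pd.Series([200, 150, 180], index=[6, 7, 9])
ffd_demo, gd_demo = ffd_gd_one_trial(toy)
print(ffd_demo, gd_demo)
```

FFD is the duration of the first fixation only, while GD also adds the consecutive fixation with index 7; the revisit with index 9 is not counted.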
 

In [16]:
def attach_info(eyedata, regressed, time_before_ff, tot_number_fixation, n_prior_fix):
    """Include single word and sentence level statistics from relevant file"""
    # here will be saved each participant data (so it's list of dataframes)
    eyedata_all = []
    
    # loop over participants
    for i,participantdata in enumerate(eyedata):
        # this is the log file
        stimuli = pd.read_csv(f"{base_dir}/{participant[i]}/{participant[i]}.txt",
                              header=0,sep='\t',
                              encoding='ISO-8859-1')        
        # include info about order of presentation for each trial to FFD/GD
        eye_all_i = pd.DataFrame(list(zip(eyedata[i], stimuli.trialnr)),
                                 index=stimuli.IDstim,
                                 columns=['ms','trialnr'])
        # and append other info from eye tracker
        eye_all_i['time_before_ff'] = time_before_ff[i]
        eye_all_i['regressed'] = regressed[i]
        eye_all_i['n_tot_fix'] = tot_number_fixation[i]
        eye_all_i['n_prior_fix'] = n_prior_fix[i]
        
        # check if need to exclude any trial
        if participant[i] in exclude:
            for tr_number in exclude[participant[i]]:
                eye_all_i.drop(eye_all_i[eye_all_i['trialnr']==tr_number].index,
                               inplace=True)
        
        # merge data from eye tracker and predictors
        a = pd.merge(eye_all_i, stimuliALL[['ID',
                                     'ConcM',
                                     'LEN', 
                                     'UN2_F', 
                                     'UN3_F', 
                                     'Orth', 
                                     'OLD20',
                                     'FreqCount', 
                                     'LogFreq(Zipf)', 
                                     'V_MeanSum',
                                     'A_MeanSum', 
                                     'mink3_SM', 
                                     'BLP_rt',
                                     'BLP_accuracy', 
                                     'similarity', 
                                     'Position',	
                                     'PRECEDING_Frequency',	
                                     'PRECEDING_LogFreq(Zipf)',	
                                     'LENprec',
                                     'cloze',
                                     'Sim',
                                     'plausibility'
                                     #'SemD', # when including SemD, you lose 4 trials (don't have SemD for them)
                                     #'AoA'
                                     ]], how='inner',left_on=['IDstim'],
                                         right_on=['ID'])
        # append participant data
        eyedata_all.append(a)
        # remove na
        eyedata_all[-1] = eyedata_all[-1][eyedata_all[-1].iloc[:,0].notna()]    
    return eyedata_all
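
One thing to keep in mind about the `how='inner'` merge is that stimuli missing from the predictor table are dropped silently. The toy example below illustrates this; the values are invented, and the index is reset first so that the key is an ordinary column, which is a simplification compared with the function above.

```python
import pandas as pd

# toy eye-tracking measures for three stimuli (IDs 3, 7, 9)
eye = pd.DataFrame({"IDstim": [3, 7, 9], "ms": [210.0, 0.0, 185.0]})
# toy predictors: stimulus 9 has no entry, stimulus 8 was never presented
predictors = pd.DataFrame({"ID": [3, 7, 8], "LEN": [5, 6, 4]})

merged = pd.merge(eye, predictors, how="inner", left_on="IDstim", right_on="ID")
print(merged)
```

Only the stimuli present in both tables survive, so comparing `len(merged)` with `len(eye)` is a quick way to see how many trials a predictor costs.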

In [17]:
def attach_mean_centred(eyedata,regressed, time_before_ff, tot_number_fixation, n_prior_fix):
    """Supply the participants' gd/ffd to obtain a gd/ffd_all that is mean-centred.
    Does the same as attach_info, but with normalised predictors."""
    norm_eyedata_all = []
    for i,participantdata in enumerate(eyedata):
        stimuli = pd.read_csv(f"{base_dir}/{participant[i]}/{participant[i]}.txt",
                              header=0, sep='\t', encoding='ISO-8859-1')
        normalized_all_i = pd.DataFrame(list(zip(participantdata,
                                                 stimuli.trialnr)),
                                        index=stimuli.IDstim,
                                        columns=['ms','trialnr'])
        normalized_all_i['time_before_ff'] = time_before_ff[i]
        normalized_all_i['regressed'] = regressed[i]
        normalized_all_i['n_tot_fix'] = tot_number_fixation[i]
        normalized_all_i['n_prior_fix'] = n_prior_fix[i]
        normalized_all_i['time_before_ff'] = (normalized_all_i['time_before_ff'] - \
                                              normalized_all_i['time_before_ff'].mean() \
                                                  ) / normalized_all_i['time_before_ff'].std()
        normalized_all_i['n_tot_fix'] = (normalized_all_i['n_tot_fix'] - \
                                              normalized_all_i['n_tot_fix'].mean() \
                                                  ) / normalized_all_i['n_tot_fix'].std()
        normalized_all_i['n_prior_fix'] = (normalized_all_i['n_prior_fix'] - \
                                              normalized_all_i['n_prior_fix'].mean() \
                                                  ) / normalized_all_i['n_prior_fix'].std()            

            
        # check if need to exclude any trial
        if participant[i] in exclude:
            for tr_number in exclude[participant[i]]:
                normalized_all_i.drop(normalized_all_i[normalized_all_i['trialnr']==tr_number].index,
                               inplace=True)
        
        # get predictors
        normalized_all_i = pd.merge(normalized_all_i,
                                    stimuliALL_norm,
                                    how='inner',
                                    left_on=['IDstim'],
                                    right_on=['ID'])
        
        norm_eyedata_all.append(normalized_all_i)
        norm_eyedata_all[-1] = norm_eyedata_all[-1][norm_eyedata_all[-1].iloc[:,0].notna()]
    return norm_eyedata_all
        

The functions are defined; the code below does the actual work.

In [18]:
DISPSIZE = (1280, 1024)
# Add information about target word
path = "C:/Users/fm02/OwnCloud/Sentences/"

In [19]:
os.chdir(path)
stimuliALL = pd.read_excel('stimuli_all_onewordsemsim.xlsx', engine='openpyxl')

Normalise predictors (this is needed for lme4).

In [20]:
# include only numeric predictors
to_norm = stimuliALL[['ConcM',
                      'LEN',
                      'UN2_F',
                      'UN3_F',
                      'Orth',
                      'OLD20',
                      'FreqCount',
                      'LogFreq(Zipf)', 
                     'V_MeanSum',
                     'A_MeanSum',
                     'mink3_SM',
                     'BLP_rt',
                     'BLP_accuracy',
                     'similarity',
                     'Position',
                     'PRECEDING_Frequency',
                     'PRECEDING_LogFreq(Zipf)',	
                     'LENprec',
                     'Predictability',
                     'cloze',
                     'plausibility',
                     'Sim'
                     #'SemD', # when including SemD, you lose 4 trials (don't have SemD for them)
                     #'AoA'
                     ]]

In [21]:
to_norm = (to_norm-to_norm.mean())/to_norm.std()
# put back Word and ID
stimuliALL_norm = stimuliALL[['Word','ID']].join(to_norm)
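
The normalisation above is a column-wise z-score. A toy example with invented values shows what it produces; note that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# two toy predictors with invented values
toy_predictors = pd.DataFrame({"LEN": [4.0, 6.0, 8.0],
                               "ConcM": [1.0, 3.0, 5.0]})

# subtract the column mean and divide by the column standard deviation
z = (toy_predictors - toy_predictors.mean()) / toy_predictors.std()
print(z)
```

After the transformation every column has mean 0 and standard deviation 1, so the predictors are on a comparable scale.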

In [22]:
# import data from the participants
base_dir = "//cbsu/data/Imaging/hauk/users/fm02/EOS_data/EOS_data_fromLab"

In [23]:
participant = [
        101, 
        102, 
        103, 
        104, 
        105,
        106,
        107,
        108,
        109,
        110,
        111,
        112,
        113,
        114,
        115,
        116,
        117,
        118,
        119,
        120,
        121,
        122,
        123,
        124,
        125,
        126,
        127,
        128,
        129,
        130,
        131,
        132,
        133,
        134,
        135,
        136,
        137,
        138,
#        139 # excluded - not completed testing
        140,
        141
        ]

Initialise the dicts that will store the participants' data (key is participant, value is data).

In [24]:
data = {}
data_plain = {}

This loops over each participant to import the relevant data.

In [None]:
for i in participant:
    print(f'Reading EDF data participant {i}')
    data[i] = read_edf(f"{base_dir}/{i}/{i}.asc",
                       "STIMONSET","STIMOFFSET")
    data_plain[i] = read_edf_plain(f"{base_dir}/{i}/{i}.asc")

Initialise the lists where the results will be saved.

In [1]:
dur = []
regressed = []
time_before = []
nfix = []

ffd = []
gd = []
prfix = []
nprior_fixs = []

In [None]:
# loop over participants  
for subject in data.keys():
    print(f'Extracting data participant {subject}')
    # this basically extracts fixations within AOI
    dur_i, regressed_i, time_before_i, nfix_i = fixAOI(data[subject],
                                                        data_plain[subject])
    # this saves relevant info from relevant fixations
    FFD_i, GD_i, fixated_i, nprior_fixs_i = ffdgd(dur_i)
    
    # append data for each subject
    dur.append(dur_i)
    regressed.append(regressed_i)
    time_before.append(time_before_i)
    nfix.append(nfix_i)
    
    ffd.append(FFD_i)
    gd.append(GD_i)
    prfix.append(fixated_i)
    nprior_fixs.append(nprior_fixs_i)    

In [None]:
# merge data and non-normalised predictors
gd_all = attach_info(gd, regressed, time_before, nfix, nprior_fixs)
ffd_all = attach_info(ffd, regressed, time_before, nfix, nprior_fixs)

In [None]:
# merge data and normalised predictors
norm_gd_all = attach_mean_centred(gd, regressed, time_before, nfix, nprior_fixs)
norm_ffd_all = attach_mean_centred(ffd, regressed, time_before, nfix, nprior_fixs)

In [None]:
# this retrieves additional information about the participants, saved in the demographic info file
pis = pd.read_excel("//cbsu/data/Imaging/hauk/users/fm02/EOS_data/Demographic_info.xlsx",
                    usecols=["Participant ID",
                             "Gender",	
                             "Age",	
                             "Handedness",	
                             "% Correct Responses"])

The final bit takes care of transforming the data into long format (which is necessary for fitting Linear Mixed Effects models in R).

In [None]:
for i,df in enumerate(norm_ffd_all):
    norm_ffd_all[i] = norm_ffd_all[i].rename(columns={'LogFreq(Zipf)':'LogFreqZipf',
                                                    'PRECEDING_LogFreq(Zipf)':'PRECEDING_LogFreqZipf'})
    norm_ffd_all[i]['Subject'] = [i]*len(norm_ffd_all[i])
    norm_ffd_all[i]['Gender'] = [pis["Gender"][pis["Participant ID"] == participant[i]].values[0]] \
                                        *len(norm_ffd_all[i])
    norm_ffd_all[i]['Age'] = [pis["Age"][pis["Participant ID"] == participant[i]].values[0]] \
                                        *len(norm_ffd_all[i])
      
# GD  - no regressions and normalised predictors, which is probably what we will use   

for i,df in enumerate(norm_gd_all):
    norm_gd_all[i] = norm_gd_all[i].rename(columns={'LogFreq(Zipf)':'LogFreqZipf',
                                                    'PRECEDING_LogFreq(Zipf)':'PRECEDING_LogFreqZipf'})
    norm_gd_all[i]['Subject'] = [i]*len(norm_gd_all[i])
    norm_gd_all[i]['Gender'] = [pis["Gender"][pis["Participant ID"] == participant[i]].values[0]] \
                                        *len(norm_gd_all[i])
    norm_gd_all[i]['Age'] = [pis["Age"][pis["Participant ID"] == participant[i]].values[0]] \
                                        *len(norm_gd_all[i])
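
The loops above add the subject-level columns to each participant's dataframe. The notebook does not show how the list is then stacked into one table, so the following is only a sketch of one possible final step, using `pd.concat` on toy data. The output file name in the comment is hypothetical.

```python
import pandas as pd

# toy stand-in for a list of per-participant dataframes
per_participant = [
    pd.DataFrame({"ms": [210.0, 185.0], "Subject": [0, 0]}),
    pd.DataFrame({"ms": [240.0], "Subject": [1]}),
]

# one row per trial, all participants stacked on top of each other
long_df = pd.concat(per_participant, ignore_index=True)
print(long_df)

# the table could then be written to disk for R, for example:
# long_df.to_csv("norm_gd_long.csv", index=False)  # hypothetical file name
```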

Let's run one example.

In [26]:
data = {}
data_plain = {}
i = 101
print(f'Reading EDF data participant {i}')
data[i] = read_edf(f"{base_dir}/{i}/{i}.asc",
                   "STIMONSET","STIMOFFSET")
data_plain[i] = read_edf_plain(f"{base_dir}/{i}/{i}.asc")

Reading EDF data participant 101


In [27]:
dur = []
regressed = []
time_before = []
nfix = []

In [28]:
ffd = []
gd = []
prfix = []
nprior_fixs = []

In [29]:
for subject in data.keys():
    print(f'Extracting data participant {subject}')
    dur_i, regressed_i, time_before_i, nfix_i = fixAOI(data[subject],
                                                        data_plain[subject])
    
    FFD_i, GD_i, fixated_i, nprior_fixs_i = ffdgd(dur_i)
    
    dur.append(dur_i)
    regressed.append(regressed_i)
    time_before.append(time_before_i)
    nfix.append(nfix_i)
    
    ffd.append(FFD_i)
    gd.append(GD_i)
    prfix.append(fixated_i)
    nprior_fixs.append(nprior_fixs_i)    

Extracting data participant 101


In [30]:
gd_all = attach_info(gd, regressed, time_before, nfix, nprior_fixs)
ffd_all = attach_info(ffd, regressed, time_before, nfix, nprior_fixs)
norm_gd_all = attach_mean_centred(gd, regressed, time_before, nfix, nprior_fixs)
norm_ffd_all = attach_mean_centred(ffd, regressed, time_before, nfix, nprior_fixs)

In [31]:
pis = pd.read_excel("//cbsu/data/Imaging/hauk/users/fm02/EOS_data/Demographic_info.xlsx",
                    usecols=["Participant ID",
                             "Gender",	
                             "Age",	
                             "Handedness",	
                             "% Correct Responses"])

In [32]:
i = 0

Let's explore the first trial of the first participant.

In [71]:
trial = data[101][0]

In [72]:
type(trial)

dict

In [73]:
trial.keys()

dict_keys(['x', 'y', 'size', 'time', 'trackertime', 'events'])

In [74]:
trial['events'].keys()

dict_keys(['Sfix', 'Ssac', 'Sblk', 'Efix', 'Esac', 'Eblk', 'msg'])

Save 'Efix' events

In [75]:
pd_fix = pd.DataFrame.from_records(trial['events']['Efix'],
                                   columns=['start',
                                            'end',
                                            'duration',
                                            'x',
                                            'y'])

We get information from all the fixations that happened during that trial.

In [76]:
pd_fix

     start      end  duration      x      y
0  1455422  1455802       381   72.0  513.3
1  1455837  1456107       271  343.4  506.0
2  1456117  1456255       139  323.4  506.1
3  1456280  1456452       173  416.3  505.4
4  1456475  1456640       166  503.1  502.3
5  1456664  1456865       202  593.6  497.3
6  1456886  1457021       136  649.2  500.2
7  1457049  1457225       177  765.6  507.5
8  1457253  1457417       165  876.1  510.3
9  1457435  1457684       250  933.0  512.1


In [77]:
size = re.search("SIZE OF THE STIMULUS: (.*)\n",trial['events']['msg'][3][1])
size = eval(size.group(1))
print(f"The stimulus dimensions, in pixels, is {size}, respectively x and y axes")

The stimulus dimensions, in pixels, is (803, 23), respectively x and y axes
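
As a side note, `eval` executes whatever text it receives. Since the messages here only contain tuples and numbers, `ast.literal_eval` from the standard library could be used instead; it accepts only Python literals. The sketch below uses an invented message string in the same layout as the one parsed above.

```python
import ast
import re

# invented message text, same layout as the message parsed above
message = "SIZE OF THE STIMULUS: (803, 23)\n"

match = re.search(r"SIZE OF THE STIMULUS: (.*)\n", message)
# literal_eval only parses literals (tuples, numbers, strings, ...)
size_demo = ast.literal_eval(match.group(1))
print(size_demo)
```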


In [78]:
unit = re.search("NUMBER OF CHARACTERS: (.*)\n",trial['events']['msg'][4][1])
unit = size[0]/eval(unit.group(1))
print(f"Each character occupies {unit} pixels")

Each character occupies 11.0 pixels


In [79]:
pos_target = re.search("POS TARGET INSIDE BOX: (.*)\n",trial['events']['msg'][5][1])
pos_target = eval(pos_target.group(1))
print(f"Position of the target word within the sentence, from character {pos_target[0]} to character {pos_target[1]}")

Position of the target word within the sentence, from character 37 to character 44


In [80]:
pos_startstim = DISPSIZE[0]/2-size[0]/2
pos_startstim

238.5

This is the location of the target word, in pixels (x_start, x_end) (y_start, y_end)

In [81]:
target_x = (pos_startstim+(pos_target[0]*unit)-unit/2,pos_startstim+(pos_target[1]*unit)+unit/2)
target_y = (DISPSIZE[1]/2-size[1]*2,DISPSIZE[1]/2+size[1]*2)
print(target_x, target_y)

(640.0, 728.0) (466.0, 558.0)
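
The arithmetic of the last few cells can be collected in one small helper, sketched below. `aoi_bounds` is a hypothetical name, and the number of characters (73) is inferred from the values printed above (803 pixels / 11 pixels per character).

```python
def aoi_bounds(dispsize, size, n_chars, pos_target):
    """Return ((x0, x1), (y0, y1)) of the target-word AOI in pixels."""
    width, height = dispsize
    unit = size[0] / n_chars              # width of one character
    x_start = width / 2 - size[0] / 2     # the sentence is centred on the screen
    # half a character is added on each side of the target word
    target_x = (x_start + pos_target[0] * unit - unit / 2,
                x_start + pos_target[1] * unit + unit / 2)
    # two times the stimulus height above and below the vertical centre
    target_y = (height / 2 - size[1] * 2, height / 2 + size[1] * 2)
    return target_x, target_y

bounds = aoi_bounds((1280, 1024), (803, 23), 73, (37, 44))
print(bounds)
```

With the values of this trial the helper reproduces the limits printed above.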


In [87]:
fixAOI = pd_fix['x'][(target_x[0]<pd_fix['x']) &
                                      (pd_fix['x']<target_x[1]) &
                                      (target_y[0]<pd_fix['y']) &
                                      (pd_fix['y']<target_y[1])]
durAOI = pd_fix['duration'][(target_x[0]<pd_fix['x']) &
                                      (pd_fix['x']<target_x[1]) &
                                      (target_y[0]<pd_fix['y']) &
                                      (pd_fix['y']<target_y[1])]

In [88]:
fixAOI

6    649.2
Name: x, dtype: float64

In [89]:
durAOI

6    136
Name: duration, dtype: int64

In [90]:
for i in range(len(fixAOI)):
    print(f"The {fixAOI.index[i]+1}th fixation was on the target word and lasted {durAOI.values[i]} ms")

The 7th fixation was on the target word and lasted 136 ms
