In [1]:
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# Raw Dataset


Three sets of features were computed: 46 global gaze features, 23 local gaze features, and 11 context features, yielding 80 features overall.

**Global gaze features**:

- Global gaze features (listed in Table 2 of the paper) were independent of the words being read. There were two categories: eye behavior descriptives and miscellaneous gaze properties.

- The eye behavior descriptives cover five measurements: fixation duration, saccade duration, saccade distance, saccade angle, and pupil diameter. For each of them, we computed the min, max, mean, median, standard deviation, skew, kurtosis, and range, thereby yielding 40 features.

- The remaining six global features consisted of miscellaneous gaze properties.
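
As a rough illustration of the 5 × 8 = 40 descriptive features, the sketch below computes the eight statistics for one measurement. The function name `describe_measure`, the sample durations, and the use of pandas defaults (sample standard deviation, bias-corrected skew and kurtosis) are assumptions made for this example; the paper may define these estimators differently.

```python
import pandas as pd

def describe_measure(values, prefix):
    """Return the eight summary statistics for one gaze measurement."""
    s = pd.Series(values, dtype="float64")
    return {
        f"{prefix}Min": s.min(),
        f"{prefix}Max": s.max(),
        f"{prefix}Mean": s.mean(),
        f"{prefix}Med": s.median(),
        f"{prefix}SD": s.std(),        # pandas default: sample SD (ddof=1)
        f"{prefix}Skew": s.skew(),     # pandas default: bias-corrected
        f"{prefix}Kur": s.kurt(),      # pandas default: excess kurtosis
        f"{prefix}Range": s.max() - s.min(),
    }

# Hypothetical fixation durations (ms) inside a single window
fix_durations = [200, 183, 167, 158, 250]
features = describe_measure(fix_durations, "FixDur")
print(features)
```

Applying the same function to each of the five measurements would give the 40 descriptive features.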

**Local gaze features:**

- Unlike global features, local gaze features (listed in Table 3 of the paper) were sensitive to the words being read. There were three categories of local features: fixation types, word characteristics, and eye movement metrics.


**Context features:**

- Context features combined reading times and situational factors. They included timing features and the difficulty and value of the text (see Section 3.2 of the paper).





Global Features:
--------------------------

- Fixation Duration     : Duration in milliseconds of a fixation
- Saccade Duration      : Duration in milliseconds between two subsequent fixations
- Saccade Distance      : Distance in pixels between two subsequent fixations
- Saccade Angle         : Angle in degrees between the x-axis and the saccade
- Pupil Diameter        : Diameter of pupil (standardized within-participant)
- Number of Saccades    : Total number of saccades within window
- Horizontal Saccade Proportion: Proportion of saccades with angles no more than 30 degrees above or below the horizontal axis
- Fixation Dispersion   : Root mean square of the distances from each fixation to the average fixation position in the window
- Fixation Saccade Ratio: Ratio of fixation duration to saccade duration
- Blink Count           : Total number of blinks within window
- Blink Duration        : Proportion of time spent blinking
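
Two of the definitions above can be written out directly. The sketch below is only one reading of them: the function names are hypothetical, coordinates are assumed to be in pixels, and angles are assumed to be in degrees measured from the x-axis. The dataset may use different units or normalization.

```python
import numpy as np

def fixation_dispersion(x, y):
    """Root mean square distance from each fixation to the mean fixation position."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = np.hypot(x - x.mean(), y - y.mean())
    return float(np.sqrt(np.mean(dist ** 2)))

def horizontal_saccade_proportion(angles_deg, tolerance=30.0):
    """Share of saccades within `tolerance` degrees of the horizontal axis."""
    a = np.asarray(angles_deg, dtype=float) % 360.0
    # Angular offset from the nearest horizontal direction (0 or 180 degrees)
    offset = np.abs(((a + 90.0) % 180.0) - 90.0)
    return float(np.mean(offset <= tolerance))

# Hypothetical values for one window
dispersion = fixation_dispersion([0, 2], [0, 0])
horizontal = horizontal_saccade_proportion([0, 90, 180, 350])
print(dispersion, horizontal)
```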

Local Features:
------------------------

- First Pass Fixations: First fixation on each word during the first pass through text
- Regression Fixations: Fixations on words that were already passed
- Gaze Fixations: Consecutive fixations on the same word
- Single Fixations: Fixations on words that were only fixated on once
- Non-word Fixations: Fixations not on a word
- End-of-clause Fixations: Number of fixations on the last word of a sentence
- Word Length: Number of characters within a word
- Hypernym Depth: Semantic specificity of a word (e.g. “crimson” is more specific than “red”, which is more specific than “color”)
- Global Frequency: Overall frequency of a word in English as measured by the CELEX data (Baayen et al. 1995)
- Synset Size: Number of synonyms of a word
- Line Cross Saccades: Proportion of saccades with a vertical distance greater than the height of a line of text
- Words Skipped: Proportion of words that were skipped between fixations on subsequent words
- Reading Time Ratio: Ratio of actual to expected reading time (200ms times the # of words read)
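
The reading time ratio is simple enough to show as arithmetic. This is a minimal sketch of the definition in the list above; the function name and the sample numbers are hypothetical.

```python
def reading_time_ratio(actual_ms, words_read, ms_per_word=200.0):
    """Ratio of actual reading time to expected time (200 ms per word read)."""
    if words_read <= 0:
        return float("nan")  # ratio is undefined when no words were read
    return actual_ms / (ms_per_word * words_read)

# Hypothetical window: 20 s spent on 80 words, expected 80 * 200 ms = 16 s
ratio = reading_time_ratio(20000, 80)
print(ratio)  # 1.25
```

A value above 1 means the reader took longer than the 200 ms per word baseline.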

Context features:
---------------------------
- Session time
- Text time
- Page time
- Session page number
- Text page number
- Average page time
- Previous page time
- Previous page time ratio
- Current difficulty
- Current value
- Previous difficulty
- Previous value

# Dataset

In [2]:
df=pd.read_csv('FixationStatistics.txt', sep='\t', lineterminator='\r')
df.head(n=100)

Unnamed: 0,ParticipantID,TrialID,TrialIndex,SegmentID,SegmentIndex,StartTime(ms),EndTime(ms),Length(ms),StartTimestamp,EndTimestamp,StartTimeGMT,EndTimeGMT,ValidityRate,PageFixations,WindowFixations,PageBlinks,WindowBlinks,BottomWindowBound_Page,TopWindowBound_Page,BottomWindowBound_Session,TopWindowBound_Session,NumberOfReports,FirstReportType,FirstReportContent,FirstReportTimestamp,FirstReportTimesGMT,FirstReportSessionTime(ms),FirstReportTrialTime(ms),FirstReportSegmentTime(ms),FixDurN,FixDurMed,FixDurMean,FixDurSD,FixDurMin,FixDurMax,FixDurRange,FixDurSkew,FixDurKur,FxDisp,SacDurN,SacDurMed,SacDurMean,SacDurSD,SacDurMin,SacDurMax,SacDurRange,SacDurSkew,SacDurKur,SacAmpN,SacAmpMed,SacAmpMean,SacAmpSD,SacAmpMin,SacAmpMax,SacAmpRange,SacAmpSkew,SacAmpKur,SacAngAbsN,SacAngAbsMed,SacAngAbsMean,SacAngAbsSD,SacAngAbsMin,SacAngAbsMax,SacAngAbsRange,SacAngAbsSkew,SacAngAbsKur,SacAngRelN,SacAngRelMed,SacAngRelMean,SacAngRelSD,SacAngRelMin,SacAngRelMax,SacAngRelRange,SacAngRelSkew,SacAngRelKur,SacVelN,SacVelMed,SacVelMean,SacVelSD,SacVelMin,SacVelMax,SacVelRange,SacVelSkew,SacVelKur,horizontalSaccadeProp,FxSacRatio,BlinkDurN,BlinkDurMed,BlinkDurMean,BlinkDurSD,BlinkDurMin,BlinkDurMax,BlinkDurRange,BlinkDurSkew,BlinkDurKur,PupilDiametersZN,PupilDiametersZMed,PupilDiametersZMean,PupilDiametersZSD,PupilDiametersZMin,PupilDiametersZMax,PupilDiametersZRange,PupilDiametersZSkew,PupilDiametersZKur,FirstPassFixDurMean,FirstPassFixDurSD,FirstPassFixProp,EndOfClauseFixDurMean,EndOfClauseFixDurSD,EndOfClauseFixProp,RegFixDurMean,RegFixDurSD,RegFixProp,SingleFixDurMean,SingleFixDurSD,SingleFixProp,NoWordFixDurMean,NoWordFixDurSD,NoWordFixProp,GazeFixDurMean,GazeFixDurSD,GazeFixProp,WordSkipProp,propCrossLineSaccades,readingDepth,WordLenToFixDurCorr,FreqToFixDurCorr,NumSynsToFixDurCorr,HypDepthToFixDurCorr
0,\nBE7-P1002-Memphis,MainText,1.0,,57.0,1294808.0,1322271.0,27463.0,1382120000000.0,1382120000000.0,46:38.0,47:05.5,0.873786,100.0,11.0,7.0,0.0,12650.0,16650.0,1307458.0,1311458.0,0.0,none,none,,,,,,11.0,200.0,221.0,101.29462,83.0,366.0,283.0,0.132989,-1.389706,0.429,10.0,17.0,123.3,162.960834,16.0,450.0,434.0,1.244961,0.053705,10.0,132.924202,210.010957,210.956418,80.856303,793.888636,713.032333,2.8609,8.535334,10.0,352.773596,232.045048,168.309166,0.289026,359.406308,359.117281,-0.725668,-1.670323,9.0,353.875658,240.549308,174.528607,4.791021,359.971189,355.180168,-0.857144,-1.710707,10.0,6.684761,5.48446,3.70356,0.407012,10.050112,9.6431,-0.457247,-1.396898,1.0,1.972,0.0,,,,,,,,,38.0,-1.362156,-1.458751,0.415926,-2.365184,-0.868433,1.496751,-0.302367,-1.148603,261.0,102.0,0.55,,,0.0,,,0.0,290.0,83.0,0.45,164.0,80.0,0.55,163.0,89.0,0.45,0.636364,0.1,153.0,-0.375,-0.223,0.404,-0.56
1,\nBE7-P1002-Memphis,MainText,1.0,,56.0,1274590.0,1294808.0,20218.0,1382120000000.0,1382120000000.0,46:17.8,46:38.0,0.839242,72.0,11.0,4.0,0.0,12650.0,16650.0,1287240.0,1291240.0,0.0,none,none,,,,,,11.0,183.0,209.090909,107.757556,133.0,499.0,366.0,2.241717,5.662505,0.436,10.0,25.5,104.8,145.311772,16.0,383.0,367.0,1.609426,1.031941,10.0,155.985073,274.13892,289.943023,66.923698,926.600337,859.676639,1.787491,2.228981,10.0,174.196626,179.839838,166.849182,2.67997,359.925463,357.245493,0.031703,-2.129207,9.0,351.088886,198.966328,183.765405,3.750852,359.611378,355.860527,-0.270329,-2.570038,10.0,4.602806,5.216583,3.52537,0.660415,11.163859,10.503445,0.681916,-0.361619,1.0,2.195,0.0,,,,,,,,,71.0,-0.20008,-0.211293,0.297847,-0.938658,0.47558,1.414238,-0.44971,-0.030409,209.0,120.0,0.82,,,0.0,133.0,,0.09,208.0,128.0,0.73,134.0,,0.09,200.0,23.0,0.18,1.384615,0.2,60.0,0.058,,-0.078,-0.657
2,\nBE7-P1002-Memphis,MainText,1.0,,55.0,1251125.0,1274590.0,23465.0,1382120000000.0,1382120000000.0,45:54.3,46:17.8,0.762784,73.0,13.0,4.0,0.0,12650.0,16650.0,1263775.0,1267775.0,0.0,none,none,,,,,,13.0,167.0,188.538462,100.261006,83.0,416.0,333.0,1.010131,0.510088,0.554,12.0,42.0,120.5,158.185449,16.0,466.0,450.0,1.434272,0.694127,12.0,118.155517,256.603244,288.874145,66.587272,887.889747,821.302475,1.815159,2.015756,12.0,262.754541,208.224594,165.723937,0.126346,359.952012,359.825666,-0.358558,-1.953259,11.0,5.640831,98.631825,160.84248,0.174334,353.873826,353.636743,1.189756,-0.759343,12.0,5.031037,5.015407,3.913982,0.608951,12.285944,11.676993,0.515436,-0.530908,1.0,1.695,0.0,,,,,,,,,53.0,0.992546,0.974411,0.534047,-0.171852,1.834626,2.006478,-0.345951,-0.968456,200.0,107.0,0.69,,,0.0,,,0.0,192.0,111.0,0.62,183.0,93.0,0.38,183.0,93.0,0.38,1.142857,0.166667,67.0,-0.096,0.159,-0.092,-0.234
3,\nBE7-P1002-Memphis,MainText,1.0,,54.0,1227726.0,1251125.0,23399.0,1382120000000.0,1382120000000.0,45:30.9,45:54.3,0.805556,74.0,14.0,9.0,3.0,12650.0,16650.0,1240376.0,1244376.0,0.0,none,none,,,,,,14.0,158.0,198.714286,125.994156,83.0,516.0,433.0,1.562621,2.184748,0.333,13.0,17.0,76.846154,84.129708,16.0,250.0,234.0,1.033058,-0.469978,13.0,87.00475,155.084632,201.615678,16.933348,801.90246,784.969112,3.179386,10.736262,13.0,173.778792,187.886791,165.726863,0.868051,358.535732,357.66768,-0.09278,-2.130169,12.0,30.463093,154.156733,173.179469,3.238186,358.733251,355.495065,0.381106,-2.24494,13.0,3.882799,4.405353,3.557801,0.092029,10.979019,10.88699,0.545283,-0.608007,0.923077,2.785,3.0,167.0,161.333333,9.814955,150.0,167.0,17.0,-1.732051,,78.0,-0.877777,-0.860678,0.325079,-1.563895,-0.070521,1.493373,0.559856,-0.078687,176.0,94.0,0.86,,,0.0,108.0,35.0,0.14,192.0,96.0,0.71,117.0,47.0,0.14,316.0,283.0,0.14,1.5,0.076923,82.0,-0.054,-0.575,-0.117,-0.059
4,\nBE7-P1002-Memphis,MainText,1.0,,53.0,1207674.0,1227726.0,20052.0,1382120000000.0,1382120000000.0,45:10.9,45:30.9,0.807149,66.0,12.0,9.0,1.0,4508.0,8508.0,1212182.0,1216182.0,1.0,self-caught,other,1382120000000.0,,1219182.0,1219182.0,11508.0,12.0,191.5,190.166667,41.252732,116.0,250.0,134.0,-0.243659,-0.937062,0.502,11.0,67.0,122.636364,150.207372,16.0,399.0,383.0,1.202905,-0.322871,11.0,119.230911,268.14413,270.211194,68.773714,799.07515,730.301436,1.494452,0.925619,11.0,174.502695,210.187959,154.208257,1.987851,358.756616,356.768765,-0.371691,-1.666184,10.0,181.425399,179.690831,183.741867,0.296653,358.835734,358.53908,-0.00049,-2.567767,11.0,4.685158,4.913032,4.044616,0.661556,11.926495,11.264939,0.699785,-0.512241,1.0,1.692,1.0,233.0,233.0,,233.0,233.0,0.0,,,82.0,0.848486,0.86626,0.189015,0.332403,1.426649,1.094245,0.619191,1.268933,190.0,41.0,1.0,,,0.0,233.0,,0.08,190.0,41.0,1.0,183.0,,0.08,,,0.0,1.041667,0.181818,67.0,0.66,-0.849,-0.344,0.554
5,\nBE7-P1002-Memphis,MainText,1.0,,52.0,1187056.0,1207674.0,20618.0,1382120000000.0,1382120000000.0,44:50.2,45:10.9,0.828618,67.0,12.0,6.0,0.0,12650.0,16650.0,1199706.0,1203706.0,0.0,none,none,,,,,,12.0,191.5,190.083333,47.959184,116.0,267.0,151.0,-0.106415,-0.658047,0.437,11.0,33.0,115.090909,175.656742,16.0,500.0,484.0,1.892711,2.117848,11.0,191.70898,288.700837,239.109446,82.725345,758.423429,675.698084,1.609357,1.329828,11.0,173.957922,194.634535,168.54549,0.091508,359.459246,359.367738,-0.185212,-2.083062,10.0,7.546411,144.321033,181.594749,0.379793,356.469993,356.0902,0.483505,-2.276052,11.0,5.809363,5.890486,3.223682,0.538783,11.202957,10.664174,-0.355108,0.053551,1.0,1.802,0.0,,,,,,,,,82.0,-0.780376,-0.765,0.338393,-1.412361,0.068867,1.481229,0.225383,-0.601105,197.0,44.0,0.92,,,0.0,,,0.0,200.0,45.0,0.83,142.0,35.0,0.17,,,0.0,0.857143,0.181818,81.0,0.011,-0.472,0.554,0.256
6,\nBE7-P1002-Memphis,MainText,1.0,,51.0,1166688.0,1187056.0,20368.0,1382120000000.0,1382120000000.0,44:29.9,44:50.2,0.837152,74.0,14.0,7.0,3.0,12650.0,16650.0,1179338.0,1183338.0,0.0,none,none,,,,,,14.0,175.0,177.142857,57.608569,83.0,300.0,217.0,0.302888,0.311253,0.513,13.0,33.0,87.153846,104.532169,16.0,283.0,267.0,1.3303,-0.037422,13.0,155.586861,244.233884,246.402628,37.343423,816.285016,778.941593,1.877969,2.505976,13.0,354.434033,272.685563,137.444747,0.806665,359.339329,358.532663,-1.37342,0.410469,12.0,11.800075,150.559241,177.600544,2.496261,358.971076,356.474815,0.390333,-2.253009,13.0,5.73823,5.608477,3.484987,0.131956,11.223482,11.091527,-0.222618,-0.797721,1.0,2.189,3.0,250.0,222.333333,63.216559,150.0,267.0,117.0,-1.592204,,72.0,-0.788561,-0.729166,0.514497,-1.572326,0.229389,1.801714,0.500279,-0.94574,177.0,60.0,0.93,,,0.0,83.0,,0.07,178.0,63.0,0.86,233.0,,0.07,175.0,11.0,0.14,0.772727,0.153846,75.0,-0.385,0.36,-0.124,-0.566
7,\nBE7-P1002-Memphis,MainText,1.0,,50.0,1147702.0,1166688.0,18986.0,1382120000000.0,1382120000000.0,44:10.9,44:29.9,0.834065,63.0,14.0,6.0,2.0,12650.0,16650.0,1160352.0,1164352.0,0.0,none,none,,,,,,14.0,167.0,164.214286,59.12201,83.0,283.0,200.0,0.408151,-0.168474,0.499,13.0,50.0,113.923077,118.374027,16.0,366.0,350.0,1.209604,0.420303,13.0,165.718567,245.579825,247.360982,48.376295,800.894412,752.518116,1.924017,2.571143,13.0,352.983484,244.323163,153.288328,1.648142,359.73813,358.089988,-0.897224,-1.034737,12.0,194.342661,183.350348,180.421154,2.350191,359.86713,357.516939,-0.006937,-2.430812,13.0,4.658974,4.051463,2.766631,0.264351,9.64933,9.384979,0.143549,-0.139571,0.923077,1.552,2.0,166.5,166.5,23.334524,150.0,183.0,33.0,,,47.0,-0.634671,-0.673365,0.352872,-1.617126,0.260005,1.877131,-0.306104,1.089734,168.0,58.0,0.86,,,0.0,150.0,47.0,0.14,176.0,54.0,0.79,250.0,,0.07,142.0,83.0,0.14,0.888889,0.153846,90.0,-0.44,0.392,-0.1,0.754
8,\nBE7-P1002-Memphis,MainText,1.0,,49.0,1127368.0,1147702.0,20334.0,1382120000000.0,1382120000000.0,43:50.5,44:10.9,0.864754,68.0,10.0,6.0,0.0,12650.0,16650.0,1140018.0,1144018.0,0.0,none,none,,,,,,10.0,191.5,196.5,87.22799,83.0,367.0,284.0,0.518089,0.37788,0.395,9.0,34.0,118.444444,138.556948,17.0,416.0,399.0,1.504962,1.625597,9.0,165.199587,231.798686,241.087999,65.132056,849.327025,784.194968,2.568249,7.090813,9.0,181.358583,197.312832,162.839883,0.845461,359.544912,358.699451,-0.256858,-2.013394,8.0,175.610683,177.498406,184.335787,0.513122,356.082114,355.568992,0.001095,-2.794366,9.0,5.499587,4.865226,3.873412,0.260528,10.232856,9.972327,0.065792,-1.540166,1.0,1.843,0.0,,,,,,,,,59.0,-0.50132,-0.337933,0.55612,-1.550347,0.618199,2.168546,0.014726,-0.887079,216.0,81.0,0.6,,,0.0,217.0,,0.1,244.0,106.0,0.3,166.0,104.0,0.3,183.0,141.0,0.2,0.6,0.111111,505.0,0.135,-0.338,0.735,0.454
9,\nBE7-P1002-Memphis,MainText,1.0,,48.0,1107116.0,1127368.0,20252.0,1382120000000.0,1382120000000.0,43:30.3,43:50.5,0.846914,70.0,11.0,2.0,1.0,12650.0,16650.0,1119766.0,1123766.0,0.0,none,none,,,,,,11.0,217.0,222.545455,85.264721,99.0,333.0,234.0,-0.043643,-1.429516,0.506,10.0,58.5,146.6,174.128433,16.0,450.0,434.0,1.176361,-0.388564,10.0,164.572755,313.629321,276.564764,100.995571,819.583801,718.58823,1.421164,0.528226,10.0,352.915197,249.49478,149.091911,3.310677,359.790099,356.479422,-0.958666,-0.78745,9.0,350.650774,198.758027,183.456203,1.019166,358.765948,357.746782,-0.271016,-2.56917,10.0,5.68472,5.321607,3.789643,0.333343,9.874504,9.541161,-0.22434,-1.674641,1.0,1.67,1.0,267.0,267.0,,267.0,267.0,0.0,,,81.0,-0.284697,-0.351449,0.506121,-1.452596,0.899361,2.351957,-0.295356,0.007503,215.0,86.0,0.91,,,0.0,184.0,70.0,0.18,217.0,91.0,0.82,250.0,70.0,0.18,,,0.0,1.105263,0.2,67.0,0.138,0.604,-0.33,0.536
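
Every `ParticipantID` in the output above begins with a literal `\n`. One plausible cause, inferred from the output rather than confirmed, is that the file uses `\r\n` line endings while `read_csv` was told to split on `\r` only. Omitting `lineterminator` lets pandas detect the line ending itself. If the frame is already loaded, the artifacts can be cleaned afterwards. The sketch below uses a small hypothetical stand-in frame, not the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the loaded frame, including the artifacts:
# a "\n" prefix on each ID and a trailing row that holds only "\n"
loaded = pd.DataFrame({
    "ParticipantID": ["\nBE7-P1002-Memphis", "\nBE7-P1002-Memphis", "\n"],
    "TrialIndex": [1.0, 1.0, np.nan],
    "SegmentIndex": [57.0, 56.0, np.nan],
})

cleaned = loaded.copy()
# Remove the stray newline (and any other surrounding whitespace) from the IDs
cleaned["ParticipantID"] = cleaned["ParticipantID"].str.strip()
# Drop rows in which every column other than the ID is missing
value_cols = [c for c in cleaned.columns if c != "ParticipantID"]
cleaned = cleaned.dropna(subset=value_cols, how="all").reset_index(drop=True)
print(cleaned)
```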


pandas provides a high-level statistical summary of the numerical features through the `describe` method.

In [3]:
df.describe()

Unnamed: 0,TrialIndex,SegmentID,SegmentIndex,StartTime(ms),EndTime(ms),Length(ms),StartTimestamp,EndTimestamp,ValidityRate,PageFixations,WindowFixations,PageBlinks,WindowBlinks,BottomWindowBound_Page,TopWindowBound_Page,BottomWindowBound_Session,TopWindowBound_Session,NumberOfReports,FirstReportTimestamp,FirstReportTimesGMT,FirstReportSessionTime(ms),FirstReportTrialTime(ms),FirstReportSegmentTime(ms),FixDurN,FixDurMed,FixDurMean,FixDurSD,FixDurMin,FixDurMax,FixDurRange,FixDurSkew,FixDurKur,FxDisp,SacDurN,SacDurMed,SacDurMean,SacDurSD,SacDurMin,SacDurMax,SacDurRange,SacDurSkew,SacDurKur,SacAmpN,SacAmpMed,SacAmpMean,SacAmpSD,SacAmpMin,SacAmpMax,SacAmpRange,SacAmpSkew,SacAmpKur,SacAngAbsN,SacAngAbsMed,SacAngAbsMean,SacAngAbsSD,SacAngAbsMin,SacAngAbsMax,SacAngAbsRange,SacAngAbsSkew,SacAngAbsKur,SacAngRelN,SacAngRelMed,SacAngRelMean,SacAngRelSD,SacAngRelMin,SacAngRelMax,SacAngRelRange,SacAngRelSkew,SacAngRelKur,SacVelN,SacVelMed,SacVelMean,SacVelSD,SacVelMin,SacVelMax,SacVelRange,SacVelSkew,SacVelKur,horizontalSaccadeProp,FxSacRatio,BlinkDurN,BlinkDurMed,BlinkDurMean,BlinkDurSD,BlinkDurMin,BlinkDurMax,BlinkDurRange,BlinkDurSkew,BlinkDurKur,PupilDiametersZN,PupilDiametersZMed,PupilDiametersZMean,PupilDiametersZSD,PupilDiametersZMin,PupilDiametersZMax,PupilDiametersZRange,PupilDiametersZSkew,PupilDiametersZKur,FirstPassFixDurMean,FirstPassFixDurSD,FirstPassFixProp,EndOfClauseFixDurMean,EndOfClauseFixDurSD,EndOfClauseFixProp,RegFixDurMean,RegFixDurSD,RegFixProp,SingleFixDurMean,SingleFixDurSD,SingleFixProp,NoWordFixDurMean,NoWordFixDurSD,NoWordFixProp,GazeFixDurMean,GazeFixDurSD,GazeFixProp,WordSkipProp,propCrossLineSaccades,readingDepth,WordLenToFixDurCorr,FreqToFixDurCorr,NumSynsToFixDurCorr,HypDepthToFixDurCorr
count,4076.0,0.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,1113.0,0.0,1113.0,1113.0,1113.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,1816.0,1816.0,625.0,1816.0,1816.0,1816.0,182.0,40.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4076.0,4061.0,4076.0,34.0,9.0,4076.0,3111.0,1764.0,4076.0,4056.0,4011.0,4076.0,3796.0,3041.0,4076.0,2597.0,2597.0,4076.0,4076.0,4076.0,4061.0,4059.0,3829.0,3993.0,3732.0
mean,1.0,,25.905545,762386.0,792720.6,30334.59421,1384935000000.0,1384935000000.0,0.829745,87.439401,12.202404,4.802748,0.657507,12857.30422,16857.30422,775243.3,779243.3,0.312561,1384846000000.0,,782263.4,782263.4,20409.184187,12.180569,219.403944,244.034112,108.787621,122.50368,476.429342,351.639537,0.996244,1.298634,0.477894,11.165849,29.32571,65.326965,87.81747,11.512267,264.664902,258.862365,1.956464,4.18578,11.167554,163.288298,245.568984,234.530286,69.223759,802.600796,735.131763,1.999277,4.302593,11.165849,201.600177,194.144413,144.684971,7.241965,356.743806,349.965334,-0.1564,-1.211052,10.165849,178.106495,178.970778,175.515431,2.961315,356.98809,353.783332,0.019102,-2.126824,11.165849,8.14074,8.244053,4.032931,2.051137,14.9395,12.888364,0.132923,0.272021,0.964906,6.593431,0.654503,161.071861,161.535169,31.901193,153.186123,170.496145,17.310022,0.20901,0.702644,287.520608,-0.315771,-0.311956,0.397219,-1.135466,0.557059,1.692525,0.073302,-0.235972,246.094701,105.557252,0.741992,230.029412,107.666667,0.000895,240.69206,79.17517,0.121057,247.291913,102.024183,0.582826,236.179926,88.285761,0.225861,232.409318,88.682326,0.171043,1.141159,0.256053,195.879586,-0.003111,-0.015639,-0.001991,0.012287
std,0.0,,16.435952,568963.3,571564.2,10622.407794,1624615000.0,1624614000.0,0.16661,29.399812,2.875324,5.153054,0.893313,5406.837514,5406.837514,569458.8,569458.8,0.558657,1684925000.0,,593496.6,593496.6,10330.064837,2.887209,45.870434,53.792348,62.189677,28.308532,189.840517,176.113809,0.795421,2.547685,0.071925,2.901655,25.710168,60.323696,106.662368,7.270993,262.022751,281.259216,0.858751,3.910018,2.898518,50.654266,66.202164,71.000667,25.008923,205.177156,200.154889,0.854413,3.879658,2.901655,94.823896,47.457778,18.510662,27.404695,11.97648,27.534653,0.627277,0.989999,2.901655,147.376798,40.073046,11.249984,9.928564,10.321726,16.676733,0.629899,1.207821,2.901655,3.114519,2.777068,1.280407,2.304753,4.225646,4.226886,0.816272,1.646024,0.076675,4.33707,0.888403,63.798246,63.641444,34.485271,63.561805,69.886898,39.826196,1.305798,2.866314,142.623511,0.849375,0.835383,0.194162,0.887844,0.949528,0.751333,0.649009,1.33988,59.569856,65.426298,0.143996,82.748793,85.653371,0.010781,107.450189,75.141926,0.099612,66.043942,67.790591,0.203935,88.141063,80.460886,0.14493,86.172834,90.929748,0.174558,0.89971,0.175483,299.746176,0.314982,0.423895,0.460385,0.57093
min,1.0,,1.0,0.0,17923.0,8360.0,1381340000000.0,1381340000000.0,0.03654,7.0,6.0,0.0,0.0,15.0,4015.0,799.0,4799.0,0.0,1381340000000.0,,7799.0,7799.0,7015.0,6.0,117.0,127.142857,24.36938,83.0,175.0,66.0,-2.449136,-2.879022,0.268,5.0,8.0,9.4,0.377964,8.0,16.0,1.0,-2.645751,-3.333333,5.0,51.02439,57.246533,11.998179,0.952858,79.352714,37.386633,-2.329558,-3.306244,5.0,0.822356,26.512658,66.120587,0.0,185.675382,169.2308,-2.550109,-3.333013,4.0,1.524554,1.970569,8.129163,0.000245,22.18636,2.206239,-2.448342,-5.999967,5.0,0.290926,0.66161,0.50904,0.002338,1.436354,1.276716,-2.329386,-3.294416,0.375,0.328,0.0,83.0,83.0,0.0,83.0,83.0,0.0,-1.732051,-6.0,4.0,-3.384376,-3.161523,0.031728,-4.553955,-2.627697,0.099516,-3.230114,-2.494463,109.0,1.0,0.07,125.0,11.0,0.0,83.0,0.0,0.0,92.0,0.0,0.0,83.0,0.0,0.0,92.0,0.0,0.0,0.0,0.0,14.0,-0.932,-1.0,-1.0,-1.0
25%,1.0,,12.0,304539.0,331949.8,22830.0,1383780000000.0,1383780000000.0,0.767246,70.0,10.0,1.0,0.0,12650.0,16650.0,318018.0,322018.0,0.0,1383680000000.0,,286369.0,286369.0,12240.0,10.0,187.0,207.13125,69.185654,100.0,350.0,233.0,0.447695,-0.55931,0.425,9.0,17.0,30.1,26.531383,8.0,91.0,83.0,1.376895,0.989331,9.0,128.826568,201.686293,206.242848,57.060127,761.465834,690.56319,1.487635,1.23415,9.0,173.152773,162.108689,135.095013,0.484611,357.393788,353.994935,-0.540932,-1.805117,8.0,14.692647,153.054848,171.571887,0.527362,356.822727,353.430566,-0.373934,-2.540045,9.0,5.440435,5.850329,3.127325,0.461505,12.093436,9.854592,-0.42688,-0.881009,1.0,3.32275,0.0,117.0,117.0,11.313709,108.0,125.0,0.0,-1.106108,-1.04876,169.75,-0.907863,-0.889797,0.259026,-1.749153,-0.055583,1.170338,-0.300039,-0.99236,206.0,64.0,0.65,162.25,43.0,0.0,175.0,31.75,0.06,204.0,60.0,0.44,181.0,38.0,0.12,180.0,35.0,0.0,0.666667,0.13125,82.0,-0.225,-0.312,-0.325,-0.4115
50%,1.0,,24.0,654191.5,683649.5,28197.0,1385070000000.0,1385070000000.0,0.887972,85.0,12.0,3.0,0.0,12650.0,16650.0,667428.5,671428.5,0.0,1385070000000.0,,674648.0,674648.0,18448.0,12.0,209.0,233.5,92.849751,117.0,433.0,308.0,0.921989,0.49318,0.477,11.0,21.0,44.846154,54.285811,8.0,183.0,167.0,1.991773,3.741482,11.0,156.354574,242.451385,243.289083,67.393291,855.925005,783.316331,2.035969,3.432282,11.0,177.51338,194.799032,147.04907,1.260044,358.823241,356.983735,-0.160512,-1.436767,10.0,179.62838,179.929086,177.899605,1.392556,358.636418,356.630884,0.000134,-2.282108,11.0,8.868506,8.663045,3.993678,0.970296,15.40155,12.932225,0.092462,-0.078608,1.0,5.8575,0.0,146.0,146.0,22.181073,134.0,150.0,0.0,0.405934,1.247536,303.5,-0.341211,-0.340486,0.360294,-1.150425,0.512813,1.572328,0.065047,-0.554679,234.0,89.0,0.75,209.5,76.0,0.0,217.0,59.0,0.11,233.0,86.0,0.57,217.0,68.0,0.21,212.0,66.0,0.15,1.0,0.214286,116.0,-0.004,-0.019,-0.032,0.021
75%,1.0,,40.0,1113294.0,1142741.0,35198.25,1386360000000.0,1386360000000.0,0.946642,104.0,14.0,7.0,1.0,12650.0,16650.0,1125599.0,1129599.0,1.0,1386280000000.0,,1152456.0,1152456.0,25880.0,14.0,242.0,268.75,128.551158,141.0,549.0,417.0,1.514626,2.496111,0.528,13.0,33.0,75.827922,100.994566,16.0,327.0,317.0,2.604248,7.174066,13.0,188.111155,285.168831,277.825728,81.837365,923.721106,858.40243,2.662692,7.555701,13.0,264.082083,227.076246,157.396156,2.976306,359.537315,358.482376,0.226959,-0.831148,12.0,345.431314,205.705618,181.892834,3.21926,359.469253,358.28037,0.377163,-2.046081,13.0,10.586773,10.49442,4.896642,3.004345,17.736699,15.8613,0.658521,1.06051,1.0,8.92575,1.0,183.5,184.0,36.363902,175.0,200.0,17.0,1.466667,2.885738,415.0,0.239795,0.230524,0.494064,-0.541892,1.116296,2.090335,0.44566,0.088549,272.0,125.0,0.85,301.75,159.0,0.0,275.0,100.25,0.18,276.0,122.5,0.71,270.0,112.0,0.31,258.0,110.0,0.27,1.4,0.333333,172.0,0.2295,0.28,0.312,0.454
max,1.0,,57.0,3344127.0,3386645.0,108570.0,1387130000000.0,1387130000000.0,1.0,288.0,21.0,41.0,6.0,68157.0,72157.0,3346487.0,3350487.0,5.0,1387130000000.0,,3353487.0,3353487.0,75157.0,21.0,529.0,597.166667,609.742276,283.0,1925.0,1413.022926,3.6887,11.852581,0.736342,20.0,400.0,568.2,1071.196854,91.0,2267.0,2400.0,3.988124,15.931546,20.0,790.690217,746.640449,478.892265,255.660528,1273.423648,1180.285433,3.85856,15.493315,20.0,358.909602,335.644424,194.792277,189.677904,359.999117,359.982688,2.397787,5.312167,19.0,358.831683,355.401129,204.797658,345.491279,359.999694,359.970053,2.449345,5.997423,20.0,16.598825,16.120969,10.397077,12.102106,40.141742,37.763217,2.969283,9.243417,1.0,61.776,6.0,400.0,400.0,212.132034,400.0,400.0,300.0,2.236068,5.0,480.0,2.903637,3.209197,1.577418,2.748382,7.562523,8.11102,3.395078,14.183782,592.0,650.0,1.0,425.0,272.0,0.25,1399.0,695.0,0.69,800.0,677.0,1.0,1175.0,943.0,1.0,1058.0,1225.0,1.0,17.0,1.0,1966.0,0.926,1.0,1.0,1.0


In [4]:
# Size of the dataset
print(df.shape)

(4077, 129)


In [5]:
df.isnull().sum()

ParticipantID                    0
TrialID                          1
TrialIndex                       1
SegmentID                     4077
SegmentIndex                     1
StartTime(ms)                    1
EndTime(ms)                      1
Length(ms)                       1
StartTimestamp                   1
EndTimestamp                     1
StartTimeGMT                     1
EndTimeGMT                       1
ValidityRate                     1
PageFixations                    1
WindowFixations                  1
PageBlinks                       1
WindowBlinks                     1
BottomWindowBound_Page           1
TopWindowBound_Page              1
BottomWindowBound_Session        1
TopWindowBound_Session           1
NumberOfReports                  1
FirstReportType                  1
FirstReportContent               1
FirstReportTimestamp          2964
FirstReportTimesGMT           4077
FirstReportSessionTime(ms)    2964
FirstReportTrialTime(ms)      2964
FirstReportSegmentTi
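
Raw null counts are easier to compare as shares of the row count, and columns with no values at all can be dropped in one call. This is a sketch on a small hypothetical frame; the column names are borrowed from the dataset only for readability, and the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded dataset
frame = pd.DataFrame({
    "SegmentID": [np.nan, np.nan, np.nan, np.nan],
    "FixDurMean": [221.0, 209.1, 188.5, 198.7],
    "BlinkDurMean": [np.nan, np.nan, np.nan, 161.3],
})

# Fraction of missing values per column, largest first
missing_share = frame.isnull().mean().sort_values(ascending=False)
# Columns that contain no data at all
all_missing = missing_share[missing_share == 1.0].index.tolist()
# Drop only the columns in which every value is missing
trimmed = frame.dropna(axis="columns", how="all")
print(missing_share)
print(all_missing)
```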

In [6]:
df=df.drop(columns=['ParticipantID','TrialID','SegmentIndex','SegmentID','TrialIndex'])
df.head()

Unnamed: 0,StartTime(ms),EndTime(ms),Length(ms),StartTimestamp,EndTimestamp,StartTimeGMT,EndTimeGMT,ValidityRate,PageFixations,WindowFixations,PageBlinks,WindowBlinks,BottomWindowBound_Page,TopWindowBound_Page,BottomWindowBound_Session,TopWindowBound_Session,NumberOfReports,FirstReportType,FirstReportContent,FirstReportTimestamp,FirstReportTimesGMT,FirstReportSessionTime(ms),FirstReportTrialTime(ms),FirstReportSegmentTime(ms),FixDurN,FixDurMed,FixDurMean,FixDurSD,FixDurMin,FixDurMax,FixDurRange,FixDurSkew,FixDurKur,FxDisp,SacDurN,SacDurMed,SacDurMean,SacDurSD,SacDurMin,SacDurMax,SacDurRange,SacDurSkew,SacDurKur,SacAmpN,SacAmpMed,SacAmpMean,SacAmpSD,SacAmpMin,SacAmpMax,SacAmpRange,SacAmpSkew,SacAmpKur,SacAngAbsN,SacAngAbsMed,SacAngAbsMean,SacAngAbsSD,SacAngAbsMin,SacAngAbsMax,SacAngAbsRange,SacAngAbsSkew,SacAngAbsKur,SacAngRelN,SacAngRelMed,SacAngRelMean,SacAngRelSD,SacAngRelMin,SacAngRelMax,SacAngRelRange,SacAngRelSkew,SacAngRelKur,SacVelN,SacVelMed,SacVelMean,SacVelSD,SacVelMin,SacVelMax,SacVelRange,SacVelSkew,SacVelKur,horizontalSaccadeProp,FxSacRatio,BlinkDurN,BlinkDurMed,BlinkDurMean,BlinkDurSD,BlinkDurMin,BlinkDurMax,BlinkDurRange,BlinkDurSkew,BlinkDurKur,PupilDiametersZN,PupilDiametersZMed,PupilDiametersZMean,PupilDiametersZSD,PupilDiametersZMin,PupilDiametersZMax,PupilDiametersZRange,PupilDiametersZSkew,PupilDiametersZKur,FirstPassFixDurMean,FirstPassFixDurSD,FirstPassFixProp,EndOfClauseFixDurMean,EndOfClauseFixDurSD,EndOfClauseFixProp,RegFixDurMean,RegFixDurSD,RegFixProp,SingleFixDurMean,SingleFixDurSD,SingleFixProp,NoWordFixDurMean,NoWordFixDurSD,NoWordFixProp,GazeFixDurMean,GazeFixDurSD,GazeFixProp,WordSkipProp,propCrossLineSaccades,readingDepth,WordLenToFixDurCorr,FreqToFixDurCorr,NumSynsToFixDurCorr,HypDepthToFixDurCorr
0,1294808.0,1322271.0,27463.0,1382120000000.0,1382120000000.0,46:38.0,47:05.5,0.873786,100.0,11.0,7.0,0.0,12650.0,16650.0,1307458.0,1311458.0,0.0,none,none,,,,,,11.0,200.0,221.0,101.29462,83.0,366.0,283.0,0.132989,-1.389706,0.429,10.0,17.0,123.3,162.960834,16.0,450.0,434.0,1.244961,0.053705,10.0,132.924202,210.010957,210.956418,80.856303,793.888636,713.032333,2.8609,8.535334,10.0,352.773596,232.045048,168.309166,0.289026,359.406308,359.117281,-0.725668,-1.670323,9.0,353.875658,240.549308,174.528607,4.791021,359.971189,355.180168,-0.857144,-1.710707,10.0,6.684761,5.48446,3.70356,0.407012,10.050112,9.6431,-0.457247,-1.396898,1.0,1.972,0.0,,,,,,,,,38.0,-1.362156,-1.458751,0.415926,-2.365184,-0.868433,1.496751,-0.302367,-1.148603,261.0,102.0,0.55,,,0.0,,,0.0,290.0,83.0,0.45,164.0,80.0,0.55,163.0,89.0,0.45,0.636364,0.1,153.0,-0.375,-0.223,0.404,-0.56
1,1274590.0,1294808.0,20218.0,1382120000000.0,1382120000000.0,46:17.8,46:38.0,0.839242,72.0,11.0,4.0,0.0,12650.0,16650.0,1287240.0,1291240.0,0.0,none,none,,,,,,11.0,183.0,209.090909,107.757556,133.0,499.0,366.0,2.241717,5.662505,0.436,10.0,25.5,104.8,145.311772,16.0,383.0,367.0,1.609426,1.031941,10.0,155.985073,274.13892,289.943023,66.923698,926.600337,859.676639,1.787491,2.228981,10.0,174.196626,179.839838,166.849182,2.67997,359.925463,357.245493,0.031703,-2.129207,9.0,351.088886,198.966328,183.765405,3.750852,359.611378,355.860527,-0.270329,-2.570038,10.0,4.602806,5.216583,3.52537,0.660415,11.163859,10.503445,0.681916,-0.361619,1.0,2.195,0.0,,,,,,,,,71.0,-0.20008,-0.211293,0.297847,-0.938658,0.47558,1.414238,-0.44971,-0.030409,209.0,120.0,0.82,,,0.0,133.0,,0.09,208.0,128.0,0.73,134.0,,0.09,200.0,23.0,0.18,1.384615,0.2,60.0,0.058,,-0.078,-0.657
2,1251125.0,1274590.0,23465.0,1382120000000.0,1382120000000.0,45:54.3,46:17.8,0.762784,73.0,13.0,4.0,0.0,12650.0,16650.0,1263775.0,1267775.0,0.0,none,none,,,,,,13.0,167.0,188.538462,100.261006,83.0,416.0,333.0,1.010131,0.510088,0.554,12.0,42.0,120.5,158.185449,16.0,466.0,450.0,1.434272,0.694127,12.0,118.155517,256.603244,288.874145,66.587272,887.889747,821.302475,1.815159,2.015756,12.0,262.754541,208.224594,165.723937,0.126346,359.952012,359.825666,-0.358558,-1.953259,11.0,5.640831,98.631825,160.84248,0.174334,353.873826,353.636743,1.189756,-0.759343,12.0,5.031037,5.015407,3.913982,0.608951,12.285944,11.676993,0.515436,-0.530908,1.0,1.695,0.0,,,,,,,,,53.0,0.992546,0.974411,0.534047,-0.171852,1.834626,2.006478,-0.345951,-0.968456,200.0,107.0,0.69,,,0.0,,,0.0,192.0,111.0,0.62,183.0,93.0,0.38,183.0,93.0,0.38,1.142857,0.166667,67.0,-0.096,0.159,-0.092,-0.234
3,1227726.0,1251125.0,23399.0,1382120000000.0,1382120000000.0,45:30.9,45:54.3,0.805556,74.0,14.0,9.0,3.0,12650.0,16650.0,1240376.0,1244376.0,0.0,none,none,,,,,,14.0,158.0,198.714286,125.994156,83.0,516.0,433.0,1.562621,2.184748,0.333,13.0,17.0,76.846154,84.129708,16.0,250.0,234.0,1.033058,-0.469978,13.0,87.00475,155.084632,201.615678,16.933348,801.90246,784.969112,3.179386,10.736262,13.0,173.778792,187.886791,165.726863,0.868051,358.535732,357.66768,-0.09278,-2.130169,12.0,30.463093,154.156733,173.179469,3.238186,358.733251,355.495065,0.381106,-2.24494,13.0,3.882799,4.405353,3.557801,0.092029,10.979019,10.88699,0.545283,-0.608007,0.923077,2.785,3.0,167.0,161.333333,9.814955,150.0,167.0,17.0,-1.732051,,78.0,-0.877777,-0.860678,0.325079,-1.563895,-0.070521,1.493373,0.559856,-0.078687,176.0,94.0,0.86,,,0.0,108.0,35.0,0.14,192.0,96.0,0.71,117.0,47.0,0.14,316.0,283.0,0.14,1.5,0.076923,82.0,-0.054,-0.575,-0.117,-0.059
4,1207674.0,1227726.0,20052.0,1382120000000.0,1382120000000.0,45:10.9,45:30.9,0.807149,66.0,12.0,9.0,1.0,4508.0,8508.0,1212182.0,1216182.0,1.0,self-caught,other,1382120000000.0,,1219182.0,1219182.0,11508.0,12.0,191.5,190.166667,41.252732,116.0,250.0,134.0,-0.243659,-0.937062,0.502,11.0,67.0,122.636364,150.207372,16.0,399.0,383.0,1.202905,-0.322871,11.0,119.230911,268.14413,270.211194,68.773714,799.07515,730.301436,1.494452,0.925619,11.0,174.502695,210.187959,154.208257,1.987851,358.756616,356.768765,-0.371691,-1.666184,10.0,181.425399,179.690831,183.741867,0.296653,358.835734,358.53908,-0.00049,-2.567767,11.0,4.685158,4.913032,4.044616,0.661556,11.926495,11.264939,0.699785,-0.512241,1.0,1.692,1.0,233.0,233.0,,233.0,233.0,0.0,,,82.0,0.848486,0.86626,0.189015,0.332403,1.426649,1.094245,0.619191,1.268933,190.0,41.0,1.0,,,0.0,233.0,,0.08,190.0,41.0,1.0,183.0,,0.08,,,0.0,1.041667,0.181818,67.0,0.66,-0.849,-0.344,0.554


## Window Size Creation

In [7]:
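# Convert the window length from milliseconds to seconds.
# Note: the column keeps its original "Length(ms)" name, but now holds seconds.
df['Length(ms)']=df['Length(ms)']/1000.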
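
An alternative, shown here only as a sketch on a hypothetical frame, is to write the converted values to a separate column so that each column name matches its unit. The name `Length(s)` is an assumption for this example and does not appear in the dataset.

```python
import pandas as pd

# Hypothetical window lengths in milliseconds
windows = pd.DataFrame({"Length(ms)": [27463.0, 20218.0, 23465.0]})

# Keep the original column and add the window length in seconds next to it
windows["Length(s)"] = windows["Length(ms)"] / 1000.0
print(windows)
```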

In [8]:
df.head(n=15)

Unnamed: 0,StartTime(ms),EndTime(ms),Length(ms),StartTimestamp,EndTimestamp,StartTimeGMT,EndTimeGMT,ValidityRate,PageFixations,WindowFixations,PageBlinks,WindowBlinks,BottomWindowBound_Page,TopWindowBound_Page,BottomWindowBound_Session,TopWindowBound_Session,NumberOfReports,FirstReportType,FirstReportContent,FirstReportTimestamp,FirstReportTimesGMT,FirstReportSessionTime(ms),FirstReportTrialTime(ms),FirstReportSegmentTime(ms),FixDurN,FixDurMed,FixDurMean,FixDurSD,FixDurMin,FixDurMax,FixDurRange,FixDurSkew,FixDurKur,FxDisp,SacDurN,SacDurMed,SacDurMean,SacDurSD,SacDurMin,SacDurMax,SacDurRange,SacDurSkew,SacDurKur,SacAmpN,SacAmpMed,SacAmpMean,SacAmpSD,SacAmpMin,SacAmpMax,SacAmpRange,SacAmpSkew,SacAmpKur,SacAngAbsN,SacAngAbsMed,SacAngAbsMean,SacAngAbsSD,SacAngAbsMin,SacAngAbsMax,SacAngAbsRange,SacAngAbsSkew,SacAngAbsKur,SacAngRelN,SacAngRelMed,SacAngRelMean,SacAngRelSD,SacAngRelMin,SacAngRelMax,SacAngRelRange,SacAngRelSkew,SacAngRelKur,SacVelN,SacVelMed,SacVelMean,SacVelSD,SacVelMin,SacVelMax,SacVelRange,SacVelSkew,SacVelKur,horizontalSaccadeProp,FxSacRatio,BlinkDurN,BlinkDurMed,BlinkDurMean,BlinkDurSD,BlinkDurMin,BlinkDurMax,BlinkDurRange,BlinkDurSkew,BlinkDurKur,PupilDiametersZN,PupilDiametersZMed,PupilDiametersZMean,PupilDiametersZSD,PupilDiametersZMin,PupilDiametersZMax,PupilDiametersZRange,PupilDiametersZSkew,PupilDiametersZKur,FirstPassFixDurMean,FirstPassFixDurSD,FirstPassFixProp,EndOfClauseFixDurMean,EndOfClauseFixDurSD,EndOfClauseFixProp,RegFixDurMean,RegFixDurSD,RegFixProp,SingleFixDurMean,SingleFixDurSD,SingleFixProp,NoWordFixDurMean,NoWordFixDurSD,NoWordFixProp,GazeFixDurMean,GazeFixDurSD,GazeFixProp,WordSkipProp,propCrossLineSaccades,readingDepth,WordLenToFixDurCorr,FreqToFixDurCorr,NumSynsToFixDurCorr,HypDepthToFixDurCorr
0,1294808.0,1322271.0,27.463,1382120000000.0,1382120000000.0,46:38.0,47:05.5,0.873786,100.0,11.0,7.0,0.0,12650.0,16650.0,1307458.0,1311458.0,0.0,none,none,,,,,,11.0,200.0,221.0,101.29462,83.0,366.0,283.0,0.132989,-1.389706,0.429,10.0,17.0,123.3,162.960834,16.0,450.0,434.0,1.244961,0.053705,10.0,132.924202,210.010957,210.956418,80.856303,793.888636,713.032333,2.8609,8.535334,10.0,352.773596,232.045048,168.309166,0.289026,359.406308,359.117281,-0.725668,-1.670323,9.0,353.875658,240.549308,174.528607,4.791021,359.971189,355.180168,-0.857144,-1.710707,10.0,6.684761,5.48446,3.70356,0.407012,10.050112,9.6431,-0.457247,-1.396898,1.0,1.972,0.0,,,,,,,,,38.0,-1.362156,-1.458751,0.415926,-2.365184,-0.868433,1.496751,-0.302367,-1.148603,261.0,102.0,0.55,,,0.0,,,0.0,290.0,83.0,0.45,164.0,80.0,0.55,163.0,89.0,0.45,0.636364,0.1,153.0,-0.375,-0.223,0.404,-0.56
1,1274590.0,1294808.0,20.218,1382120000000.0,1382120000000.0,46:17.8,46:38.0,0.839242,72.0,11.0,4.0,0.0,12650.0,16650.0,1287240.0,1291240.0,0.0,none,none,,,,,,11.0,183.0,209.090909,107.757556,133.0,499.0,366.0,2.241717,5.662505,0.436,10.0,25.5,104.8,145.311772,16.0,383.0,367.0,1.609426,1.031941,10.0,155.985073,274.13892,289.943023,66.923698,926.600337,859.676639,1.787491,2.228981,10.0,174.196626,179.839838,166.849182,2.67997,359.925463,357.245493,0.031703,-2.129207,9.0,351.088886,198.966328,183.765405,3.750852,359.611378,355.860527,-0.270329,-2.570038,10.0,4.602806,5.216583,3.52537,0.660415,11.163859,10.503445,0.681916,-0.361619,1.0,2.195,0.0,,,,,,,,,71.0,-0.20008,-0.211293,0.297847,-0.938658,0.47558,1.414238,-0.44971,-0.030409,209.0,120.0,0.82,,,0.0,133.0,,0.09,208.0,128.0,0.73,134.0,,0.09,200.0,23.0,0.18,1.384615,0.2,60.0,0.058,,-0.078,-0.657
2,1251125.0,1274590.0,23.465,1382120000000.0,1382120000000.0,45:54.3,46:17.8,0.762784,73.0,13.0,4.0,0.0,12650.0,16650.0,1263775.0,1267775.0,0.0,none,none,,,,,,13.0,167.0,188.538462,100.261006,83.0,416.0,333.0,1.010131,0.510088,0.554,12.0,42.0,120.5,158.185449,16.0,466.0,450.0,1.434272,0.694127,12.0,118.155517,256.603244,288.874145,66.587272,887.889747,821.302475,1.815159,2.015756,12.0,262.754541,208.224594,165.723937,0.126346,359.952012,359.825666,-0.358558,-1.953259,11.0,5.640831,98.631825,160.84248,0.174334,353.873826,353.636743,1.189756,-0.759343,12.0,5.031037,5.015407,3.913982,0.608951,12.285944,11.676993,0.515436,-0.530908,1.0,1.695,0.0,,,,,,,,,53.0,0.992546,0.974411,0.534047,-0.171852,1.834626,2.006478,-0.345951,-0.968456,200.0,107.0,0.69,,,0.0,,,0.0,192.0,111.0,0.62,183.0,93.0,0.38,183.0,93.0,0.38,1.142857,0.166667,67.0,-0.096,0.159,-0.092,-0.234
3,1227726.0,1251125.0,23.399,1382120000000.0,1382120000000.0,45:30.9,45:54.3,0.805556,74.0,14.0,9.0,3.0,12650.0,16650.0,1240376.0,1244376.0,0.0,none,none,,,,,,14.0,158.0,198.714286,125.994156,83.0,516.0,433.0,1.562621,2.184748,0.333,13.0,17.0,76.846154,84.129708,16.0,250.0,234.0,1.033058,-0.469978,13.0,87.00475,155.084632,201.615678,16.933348,801.90246,784.969112,3.179386,10.736262,13.0,173.778792,187.886791,165.726863,0.868051,358.535732,357.66768,-0.09278,-2.130169,12.0,30.463093,154.156733,173.179469,3.238186,358.733251,355.495065,0.381106,-2.24494,13.0,3.882799,4.405353,3.557801,0.092029,10.979019,10.88699,0.545283,-0.608007,0.923077,2.785,3.0,167.0,161.333333,9.814955,150.0,167.0,17.0,-1.732051,,78.0,-0.877777,-0.860678,0.325079,-1.563895,-0.070521,1.493373,0.559856,-0.078687,176.0,94.0,0.86,,,0.0,108.0,35.0,0.14,192.0,96.0,0.71,117.0,47.0,0.14,316.0,283.0,0.14,1.5,0.076923,82.0,-0.054,-0.575,-0.117,-0.059
4,1207674.0,1227726.0,20.052,1382120000000.0,1382120000000.0,45:10.9,45:30.9,0.807149,66.0,12.0,9.0,1.0,4508.0,8508.0,1212182.0,1216182.0,1.0,self-caught,other,1382120000000.0,,1219182.0,1219182.0,11508.0,12.0,191.5,190.166667,41.252732,116.0,250.0,134.0,-0.243659,-0.937062,0.502,11.0,67.0,122.636364,150.207372,16.0,399.0,383.0,1.202905,-0.322871,11.0,119.230911,268.14413,270.211194,68.773714,799.07515,730.301436,1.494452,0.925619,11.0,174.502695,210.187959,154.208257,1.987851,358.756616,356.768765,-0.371691,-1.666184,10.0,181.425399,179.690831,183.741867,0.296653,358.835734,358.53908,-0.00049,-2.567767,11.0,4.685158,4.913032,4.044616,0.661556,11.926495,11.264939,0.699785,-0.512241,1.0,1.692,1.0,233.0,233.0,,233.0,233.0,0.0,,,82.0,0.848486,0.86626,0.189015,0.332403,1.426649,1.094245,0.619191,1.268933,190.0,41.0,1.0,,,0.0,233.0,,0.08,190.0,41.0,1.0,183.0,,0.08,,,0.0,1.041667,0.181818,67.0,0.66,-0.849,-0.344,0.554
5,1187056.0,1207674.0,20.618,1382120000000.0,1382120000000.0,44:50.2,45:10.9,0.828618,67.0,12.0,6.0,0.0,12650.0,16650.0,1199706.0,1203706.0,0.0,none,none,,,,,,12.0,191.5,190.083333,47.959184,116.0,267.0,151.0,-0.106415,-0.658047,0.437,11.0,33.0,115.090909,175.656742,16.0,500.0,484.0,1.892711,2.117848,11.0,191.70898,288.700837,239.109446,82.725345,758.423429,675.698084,1.609357,1.329828,11.0,173.957922,194.634535,168.54549,0.091508,359.459246,359.367738,-0.185212,-2.083062,10.0,7.546411,144.321033,181.594749,0.379793,356.469993,356.0902,0.483505,-2.276052,11.0,5.809363,5.890486,3.223682,0.538783,11.202957,10.664174,-0.355108,0.053551,1.0,1.802,0.0,,,,,,,,,82.0,-0.780376,-0.765,0.338393,-1.412361,0.068867,1.481229,0.225383,-0.601105,197.0,44.0,0.92,,,0.0,,,0.0,200.0,45.0,0.83,142.0,35.0,0.17,,,0.0,0.857143,0.181818,81.0,0.011,-0.472,0.554,0.256
6,1166688.0,1187056.0,20.368,1382120000000.0,1382120000000.0,44:29.9,44:50.2,0.837152,74.0,14.0,7.0,3.0,12650.0,16650.0,1179338.0,1183338.0,0.0,none,none,,,,,,14.0,175.0,177.142857,57.608569,83.0,300.0,217.0,0.302888,0.311253,0.513,13.0,33.0,87.153846,104.532169,16.0,283.0,267.0,1.3303,-0.037422,13.0,155.586861,244.233884,246.402628,37.343423,816.285016,778.941593,1.877969,2.505976,13.0,354.434033,272.685563,137.444747,0.806665,359.339329,358.532663,-1.37342,0.410469,12.0,11.800075,150.559241,177.600544,2.496261,358.971076,356.474815,0.390333,-2.253009,13.0,5.73823,5.608477,3.484987,0.131956,11.223482,11.091527,-0.222618,-0.797721,1.0,2.189,3.0,250.0,222.333333,63.216559,150.0,267.0,117.0,-1.592204,,72.0,-0.788561,-0.729166,0.514497,-1.572326,0.229389,1.801714,0.500279,-0.94574,177.0,60.0,0.93,,,0.0,83.0,,0.07,178.0,63.0,0.86,233.0,,0.07,175.0,11.0,0.14,0.772727,0.153846,75.0,-0.385,0.36,-0.124,-0.566
7,1147702.0,1166688.0,18.986,1382120000000.0,1382120000000.0,44:10.9,44:29.9,0.834065,63.0,14.0,6.0,2.0,12650.0,16650.0,1160352.0,1164352.0,0.0,none,none,,,,,,14.0,167.0,164.214286,59.12201,83.0,283.0,200.0,0.408151,-0.168474,0.499,13.0,50.0,113.923077,118.374027,16.0,366.0,350.0,1.209604,0.420303,13.0,165.718567,245.579825,247.360982,48.376295,800.894412,752.518116,1.924017,2.571143,13.0,352.983484,244.323163,153.288328,1.648142,359.73813,358.089988,-0.897224,-1.034737,12.0,194.342661,183.350348,180.421154,2.350191,359.86713,357.516939,-0.006937,-2.430812,13.0,4.658974,4.051463,2.766631,0.264351,9.64933,9.384979,0.143549,-0.139571,0.923077,1.552,2.0,166.5,166.5,23.334524,150.0,183.0,33.0,,,47.0,-0.634671,-0.673365,0.352872,-1.617126,0.260005,1.877131,-0.306104,1.089734,168.0,58.0,0.86,,,0.0,150.0,47.0,0.14,176.0,54.0,0.79,250.0,,0.07,142.0,83.0,0.14,0.888889,0.153846,90.0,-0.44,0.392,-0.1,0.754
8,1127368.0,1147702.0,20.334,1382120000000.0,1382120000000.0,43:50.5,44:10.9,0.864754,68.0,10.0,6.0,0.0,12650.0,16650.0,1140018.0,1144018.0,0.0,none,none,,,,,,10.0,191.5,196.5,87.22799,83.0,367.0,284.0,0.518089,0.37788,0.395,9.0,34.0,118.444444,138.556948,17.0,416.0,399.0,1.504962,1.625597,9.0,165.199587,231.798686,241.087999,65.132056,849.327025,784.194968,2.568249,7.090813,9.0,181.358583,197.312832,162.839883,0.845461,359.544912,358.699451,-0.256858,-2.013394,8.0,175.610683,177.498406,184.335787,0.513122,356.082114,355.568992,0.001095,-2.794366,9.0,5.499587,4.865226,3.873412,0.260528,10.232856,9.972327,0.065792,-1.540166,1.0,1.843,0.0,,,,,,,,,59.0,-0.50132,-0.337933,0.55612,-1.550347,0.618199,2.168546,0.014726,-0.887079,216.0,81.0,0.6,,,0.0,217.0,,0.1,244.0,106.0,0.3,166.0,104.0,0.3,183.0,141.0,0.2,0.6,0.111111,505.0,0.135,-0.338,0.735,0.454
9,1107116.0,1127368.0,20.252,1382120000000.0,1382120000000.0,43:30.3,43:50.5,0.846914,70.0,11.0,2.0,1.0,12650.0,16650.0,1119766.0,1123766.0,0.0,none,none,,,,,,11.0,217.0,222.545455,85.264721,99.0,333.0,234.0,-0.043643,-1.429516,0.506,10.0,58.5,146.6,174.128433,16.0,450.0,434.0,1.176361,-0.388564,10.0,164.572755,313.629321,276.564764,100.995571,819.583801,718.58823,1.421164,0.528226,10.0,352.915197,249.49478,149.091911,3.310677,359.790099,356.479422,-0.958666,-0.78745,9.0,350.650774,198.758027,183.456203,1.019166,358.765948,357.746782,-0.271016,-2.56917,10.0,5.68472,5.321607,3.789643,0.333343,9.874504,9.541161,-0.22434,-1.674641,1.0,1.67,1.0,267.0,267.0,,267.0,267.0,0.0,,,81.0,-0.284697,-0.351449,0.506121,-1.452596,0.899361,2.351957,-0.295356,0.007503,215.0,86.0,0.91,,,0.0,184.0,70.0,0.18,217.0,91.0,0.82,250.0,70.0,0.18,,,0.0,1.105263,0.2,67.0,0.138,0.604,-0.33,0.536


# Missing Value Handling

In [9]:
# Fill all NaN values with zero
df = df.fillna(0)

In [10]:
df.isnull().sum()

StartTime(ms)                 0
EndTime(ms)                   0
Length(ms)                    0
StartTimestamp                0
EndTimestamp                  0
StartTimeGMT                  0
EndTimeGMT                    0
ValidityRate                  0
PageFixations                 0
WindowFixations               0
PageBlinks                    0
WindowBlinks                  0
BottomWindowBound_Page        0
TopWindowBound_Page           0
BottomWindowBound_Session     0
TopWindowBound_Session        0
NumberOfReports               0
FirstReportType               0
FirstReportContent            0
FirstReportTimestamp          0
FirstReportTimesGMT           0
FirstReportSessionTime(ms)    0
FirstReportTrialTime(ms)      0
FirstReportSegmentTime(ms)    0
FixDurN                       0
FixDurMed                     0
FixDurMean                    0
FixDurSD                      0
FixDurMin                     0
FixDurMax                     0
FixDurRange                   0
FixDurSk
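
`fillna(0)` replaces every missing entry with zero, so a window with no blinks becomes indistinguishable from one whose blink statistics are genuinely zero. The following is a hedged sketch of one alternative, not the notebook's method: it records where a value was missing before filling, and fills only numeric columns. The indicator column `BlinkDurMean_missing` and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: one window without blinks (NaN), two windows with blinks.
toy = pd.DataFrame({
    "BlinkDurN": [0.0, 3.0, 1.0],
    "BlinkDurMean": [np.nan, 161.333333, 233.0],
    "FirstReportType": ["none", "none", "self-caught"],
})

# Count missing values per column before filling.
missing_before = toy.isnull().sum()

# Record where the value was missing (hypothetical indicator column).
toy["BlinkDurMean_missing"] = toy["BlinkDurMean"].isnull()

# Fill only numeric columns with zero; text and boolean columns are untouched.
numeric_cols = toy.select_dtypes(include="number").columns
toy[numeric_cols] = toy[numeric_cols].fillna(0)

print(missing_before)
print(toy)
```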

# Features

## Global Features

In [11]:
# Global features
# Fixation duration statistics and fixation dispersion (FixDurMean is not part of this selection)
X_Global_fixation=df[['FixDurN','FixDurMed','FixDurSD','FixDurMin','FixDurMax','FixDurRange','FixDurSkew','FixDurKur','FxDisp']]
X_Global_fixation.head()

Unnamed: 0,FixDurN,FixDurMed,FixDurSD,FixDurMin,FixDurMax,FixDurRange,FixDurSkew,FixDurKur,FxDisp
0,11.0,200.0,101.29462,83.0,366.0,283.0,0.132989,-1.389706,0.429
1,11.0,183.0,107.757556,133.0,499.0,366.0,2.241717,5.662505,0.436
2,13.0,167.0,100.261006,83.0,416.0,333.0,1.010131,0.510088,0.554
3,14.0,158.0,125.994156,83.0,516.0,433.0,1.562621,2.184748,0.333
4,12.0,191.5,41.252732,116.0,250.0,134.0,-0.243659,-0.937062,0.502


In [12]:
# Global features
# Saccade duration
X_Global_Saccade=df[['SacDurN','SacDurMed','SacDurMean','SacDurSD','SacDurMin','SacDurMax','SacDurRange','SacDurSkew','SacDurKur']]
X_Global_Saccade.head()

Unnamed: 0,SacDurN,SacDurMed,SacDurMean,SacDurSD,SacDurMin,SacDurMax,SacDurRange,SacDurSkew,SacDurKur
0,10.0,17.0,123.3,162.960834,16.0,450.0,434.0,1.244961,0.053705
1,10.0,25.5,104.8,145.311772,16.0,383.0,367.0,1.609426,1.031941
2,12.0,42.0,120.5,158.185449,16.0,466.0,450.0,1.434272,0.694127
3,13.0,17.0,76.846154,84.129708,16.0,250.0,234.0,1.033058,-0.469978
4,11.0,67.0,122.636364,150.207372,16.0,399.0,383.0,1.202905,-0.322871


In [13]:
# Global features
# Saccade angle (absolute)
X_Global_Saccade_Angle=df[['SacAngAbsN','SacAngAbsMed','SacAngAbsMean','SacAngAbsSD','SacAngAbsMin','SacAngAbsMax','SacAngAbsRange','SacAngAbsSkew','SacAngAbsKur']]
X_Global_Saccade_Angle.head()

Unnamed: 0,SacAngAbsN,SacAngAbsMed,SacAngAbsMean,SacAngAbsSD,SacAngAbsMin,SacAngAbsMax,SacAngAbsRange,SacAngAbsSkew,SacAngAbsKur
0,10.0,352.773596,232.045048,168.309166,0.289026,359.406308,359.117281,-0.725668,-1.670323
1,10.0,174.196626,179.839838,166.849182,2.67997,359.925463,357.245493,0.031703,-2.129207
2,12.0,262.754541,208.224594,165.723937,0.126346,359.952012,359.825666,-0.358558,-1.953259
3,13.0,173.778792,187.886791,165.726863,0.868051,358.535732,357.66768,-0.09278,-2.130169
4,11.0,174.502695,210.187959,154.208257,1.987851,358.756616,356.768765,-0.371691,-1.666184


In [14]:
#Global Features
#Pupil Diameter
X_Global_Pupil_Diameter=df[['PupilDiametersZN','PupilDiametersZMed','PupilDiametersZMean','PupilDiametersZSD','PupilDiametersZMin','PupilDiametersZMax','PupilDiametersZRange','PupilDiametersZSkew','PupilDiametersZKur']]
X_Global_Pupil_Diameter.head()

Unnamed: 0,PupilDiametersZN,PupilDiametersZMed,PupilDiametersZMean,PupilDiametersZSD,PupilDiametersZMin,PupilDiametersZMax,PupilDiametersZRange,PupilDiametersZSkew,PupilDiametersZKur
0,38.0,-1.362156,-1.458751,0.415926,-2.365184,-0.868433,1.496751,-0.302367,-1.148603
1,71.0,-0.20008,-0.211293,0.297847,-0.938658,0.47558,1.414238,-0.44971,-0.030409
2,53.0,0.992546,0.974411,0.534047,-0.171852,1.834626,2.006478,-0.345951,-0.968456
3,78.0,-0.877777,-0.860678,0.325079,-1.563895,-0.070521,1.493373,0.559856,-0.078687
4,82.0,0.848486,0.86626,0.189015,0.332403,1.426649,1.094245,0.619191,1.268933


In [15]:
#Global Features
#Horizontal Saccade Proportion
X_Global_Horizontal_Saccade_Proportion=df[['horizontalSaccadeProp']]
X_Global_Horizontal_Saccade_Proportion.head()

Unnamed: 0,horizontalSaccadeProp
0,1.0
1,1.0
2,1.0
3,0.923077
4,1.0


In [16]:
#Global Features
#Fixation Saccade Ratio
X_Global_Fixation_Saccade_Ratio=df[['FxSacRatio']]
X_Global_Fixation_Saccade_Ratio.head()

Unnamed: 0,FxSacRatio
0,1.972
1,2.195
2,1.695
3,2.785
4,1.692


In [17]:
#Global Features
#Blink Duration
X_Global_Blink_Duration=df[['BlinkDurN','BlinkDurMed','BlinkDurMean','BlinkDurSD','BlinkDurMin','BlinkDurMax','BlinkDurRange','BlinkDurSkew','BlinkDurKur']]
X_Global_Blink_Duration.head()

Unnamed: 0,BlinkDurN,BlinkDurMed,BlinkDurMean,BlinkDurSD,BlinkDurMin,BlinkDurMax,BlinkDurRange,BlinkDurSkew,BlinkDurKur
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,3.0,167.0,161.333333,9.814955,150.0,167.0,17.0,-1.732051,0.0
4,1.0,233.0,233.0,0.0,233.0,233.0,0.0,0.0,0.0


In [18]:
#Global Features
# All global feature columns selected above, combined into one frame.
# Saccade distance (SacAmp*) is not included; FxDisp (fixation dispersion) is part of this selection.
X_Global_Features=df[['FixDurN','FixDurMed','FixDurSD','FixDurMin','FixDurMax','FixDurRange','FixDurSkew','FixDurKur','FxDisp','SacDurN','SacDurMed','SacDurMean','SacDurSD','SacDurMin','SacDurMax','SacDurRange','SacDurSkew','SacDurKur','SacAngAbsN','SacAngAbsMed','SacAngAbsMean','SacAngAbsSD','SacAngAbsMin','SacAngAbsMax','SacAngAbsRange','SacAngAbsSkew','SacAngAbsKur','PupilDiametersZN','PupilDiametersZMed','PupilDiametersZMean','PupilDiametersZSD','PupilDiametersZMin','PupilDiametersZMax','PupilDiametersZRange','PupilDiametersZSkew','PupilDiametersZKur','horizontalSaccadeProp','FxSacRatio','BlinkDurN','BlinkDurMed','BlinkDurMean','BlinkDurSD','BlinkDurMin','BlinkDurMax','BlinkDurRange','BlinkDurSkew','BlinkDurKur']]
X_Global_Features.head()

Unnamed: 0,FixDurN,FixDurMed,FixDurSD,FixDurMin,FixDurMax,FixDurRange,FixDurSkew,FixDurKur,FxDisp,SacDurN,SacDurMed,SacDurMean,SacDurSD,SacDurMin,SacDurMax,SacDurRange,SacDurSkew,SacDurKur,SacAngAbsN,SacAngAbsMed,SacAngAbsMean,SacAngAbsSD,SacAngAbsMin,SacAngAbsMax,SacAngAbsRange,SacAngAbsSkew,SacAngAbsKur,PupilDiametersZN,PupilDiametersZMed,PupilDiametersZMean,PupilDiametersZSD,PupilDiametersZMin,PupilDiametersZMax,PupilDiametersZRange,PupilDiametersZSkew,PupilDiametersZKur,horizontalSaccadeProp,FxSacRatio,BlinkDurN,BlinkDurMed,BlinkDurMean,BlinkDurSD,BlinkDurMin,BlinkDurMax,BlinkDurRange,BlinkDurSkew,BlinkDurKur
0,11.0,200.0,101.29462,83.0,366.0,283.0,0.132989,-1.389706,0.429,10.0,17.0,123.3,162.960834,16.0,450.0,434.0,1.244961,0.053705,10.0,352.773596,232.045048,168.309166,0.289026,359.406308,359.117281,-0.725668,-1.670323,38.0,-1.362156,-1.458751,0.415926,-2.365184,-0.868433,1.496751,-0.302367,-1.148603,1.0,1.972,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,11.0,183.0,107.757556,133.0,499.0,366.0,2.241717,5.662505,0.436,10.0,25.5,104.8,145.311772,16.0,383.0,367.0,1.609426,1.031941,10.0,174.196626,179.839838,166.849182,2.67997,359.925463,357.245493,0.031703,-2.129207,71.0,-0.20008,-0.211293,0.297847,-0.938658,0.47558,1.414238,-0.44971,-0.030409,1.0,2.195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,13.0,167.0,100.261006,83.0,416.0,333.0,1.010131,0.510088,0.554,12.0,42.0,120.5,158.185449,16.0,466.0,450.0,1.434272,0.694127,12.0,262.754541,208.224594,165.723937,0.126346,359.952012,359.825666,-0.358558,-1.953259,53.0,0.992546,0.974411,0.534047,-0.171852,1.834626,2.006478,-0.345951,-0.968456,1.0,1.695,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,14.0,158.0,125.994156,83.0,516.0,433.0,1.562621,2.184748,0.333,13.0,17.0,76.846154,84.129708,16.0,250.0,234.0,1.033058,-0.469978,13.0,173.778792,187.886791,165.726863,0.868051,358.535732,357.66768,-0.09278,-2.130169,78.0,-0.877777,-0.860678,0.325079,-1.563895,-0.070521,1.493373,0.559856,-0.078687,0.923077,2.785,3.0,167.0,161.333333,9.814955,150.0,167.0,17.0,-1.732051,0.0
4,12.0,191.5,41.252732,116.0,250.0,134.0,-0.243659,-0.937062,0.502,11.0,67.0,122.636364,150.207372,16.0,399.0,383.0,1.202905,-0.322871,11.0,174.502695,210.187959,154.208257,1.987851,358.756616,356.768765,-0.371691,-1.666184,82.0,0.848486,0.86626,0.189015,0.332403,1.426649,1.094245,0.619191,1.268933,1.0,1.692,1.0,233.0,233.0,0.0,233.0,233.0,0.0,0.0,0.0
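
The global feature selection above spells out every column name by hand. As a sketch, a comparable list can be generated from the measurement prefixes and statistic suffixes visible in the column headers. The helper name `stat_columns` is hypothetical, and the toy frame stands in for `df`. Note that the generated list includes `FixDurMean`, which the hand-written list leaves out.

```python
import pandas as pd

# Statistic suffixes as they appear in the column headers.
STATS = ["N", "Med", "Mean", "SD", "Min", "Max", "Range", "Skew", "Kur"]

def stat_columns(prefix, stats=STATS):
    """Build column names such as 'FixDurMed' from a prefix and suffixes."""
    return [f"{prefix}{s}" for s in stats]

# Measurement prefixes used in the global feature cells above.
prefixes = ["FixDur", "SacDur", "SacAngAbs", "PupilDiametersZ", "BlinkDur"]
global_cols = [c for p in prefixes for c in stat_columns(p)]
global_cols += ["FxDisp", "horizontalSaccadeProp", "FxSacRatio"]

# Toy stand-in for df: a single row of zeros with the generated columns.
toy = pd.DataFrame([[0.0] * len(global_cols)], columns=global_cols)

# Check that every requested column exists before selecting.
missing = [c for c in global_cols if c not in toy.columns]
X_global = toy[global_cols]

print(len(global_cols), missing)
```

Generating the names keeps the per-measurement statistics consistent and makes an omitted column easier to spot than in a long literal list.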


In [19]:
df.head()

Unnamed: 0,StartTime(ms),EndTime(ms),Length(ms),StartTimestamp,EndTimestamp,StartTimeGMT,EndTimeGMT,ValidityRate,PageFixations,WindowFixations,PageBlinks,WindowBlinks,BottomWindowBound_Page,TopWindowBound_Page,BottomWindowBound_Session,TopWindowBound_Session,NumberOfReports,FirstReportType,FirstReportContent,FirstReportTimestamp,FirstReportTimesGMT,FirstReportSessionTime(ms),FirstReportTrialTime(ms),FirstReportSegmentTime(ms),FixDurN,FixDurMed,FixDurMean,FixDurSD,FixDurMin,FixDurMax,FixDurRange,FixDurSkew,FixDurKur,FxDisp,SacDurN,SacDurMed,SacDurMean,SacDurSD,SacDurMin,SacDurMax,SacDurRange,SacDurSkew,SacDurKur,SacAmpN,SacAmpMed,SacAmpMean,SacAmpSD,SacAmpMin,SacAmpMax,SacAmpRange,SacAmpSkew,SacAmpKur,SacAngAbsN,SacAngAbsMed,SacAngAbsMean,SacAngAbsSD,SacAngAbsMin,SacAngAbsMax,SacAngAbsRange,SacAngAbsSkew,SacAngAbsKur,SacAngRelN,SacAngRelMed,SacAngRelMean,SacAngRelSD,SacAngRelMin,SacAngRelMax,SacAngRelRange,SacAngRelSkew,SacAngRelKur,SacVelN,SacVelMed,SacVelMean,SacVelSD,SacVelMin,SacVelMax,SacVelRange,SacVelSkew,SacVelKur,horizontalSaccadeProp,FxSacRatio,BlinkDurN,BlinkDurMed,BlinkDurMean,BlinkDurSD,BlinkDurMin,BlinkDurMax,BlinkDurRange,BlinkDurSkew,BlinkDurKur,PupilDiametersZN,PupilDiametersZMed,PupilDiametersZMean,PupilDiametersZSD,PupilDiametersZMin,PupilDiametersZMax,PupilDiametersZRange,PupilDiametersZSkew,PupilDiametersZKur,FirstPassFixDurMean,FirstPassFixDurSD,FirstPassFixProp,EndOfClauseFixDurMean,EndOfClauseFixDurSD,EndOfClauseFixProp,RegFixDurMean,RegFixDurSD,RegFixProp,SingleFixDurMean,SingleFixDurSD,SingleFixProp,NoWordFixDurMean,NoWordFixDurSD,NoWordFixProp,GazeFixDurMean,GazeFixDurSD,GazeFixProp,WordSkipProp,propCrossLineSaccades,readingDepth,WordLenToFixDurCorr,FreqToFixDurCorr,NumSynsToFixDurCorr,HypDepthToFixDurCorr
0,1294808.0,1322271.0,27.463,1382120000000.0,1382120000000.0,46:38.0,47:05.5,0.873786,100.0,11.0,7.0,0.0,12650.0,16650.0,1307458.0,1311458.0,0.0,none,none,0.0,0.0,0.0,0.0,0.0,11.0,200.0,221.0,101.29462,83.0,366.0,283.0,0.132989,-1.389706,0.429,10.0,17.0,123.3,162.960834,16.0,450.0,434.0,1.244961,0.053705,10.0,132.924202,210.010957,210.956418,80.856303,793.888636,713.032333,2.8609,8.535334,10.0,352.773596,232.045048,168.309166,0.289026,359.406308,359.117281,-0.725668,-1.670323,9.0,353.875658,240.549308,174.528607,4.791021,359.971189,355.180168,-0.857144,-1.710707,10.0,6.684761,5.48446,3.70356,0.407012,10.050112,9.6431,-0.457247,-1.396898,1.0,1.972,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,38.0,-1.362156,-1.458751,0.415926,-2.365184,-0.868433,1.496751,-0.302367,-1.148603,261.0,102.0,0.55,0.0,0.0,0.0,0.0,0.0,0.0,290.0,83.0,0.45,164.0,80.0,0.55,163.0,89.0,0.45,0.636364,0.1,153.0,-0.375,-0.223,0.404,-0.56
1,1274590.0,1294808.0,20.218,1382120000000.0,1382120000000.0,46:17.8,46:38.0,0.839242,72.0,11.0,4.0,0.0,12650.0,16650.0,1287240.0,1291240.0,0.0,none,none,0.0,0.0,0.0,0.0,0.0,11.0,183.0,209.090909,107.757556,133.0,499.0,366.0,2.241717,5.662505,0.436,10.0,25.5,104.8,145.311772,16.0,383.0,367.0,1.609426,1.031941,10.0,155.985073,274.13892,289.943023,66.923698,926.600337,859.676639,1.787491,2.228981,10.0,174.196626,179.839838,166.849182,2.67997,359.925463,357.245493,0.031703,-2.129207,9.0,351.088886,198.966328,183.765405,3.750852,359.611378,355.860527,-0.270329,-2.570038,10.0,4.602806,5.216583,3.52537,0.660415,11.163859,10.503445,0.681916,-0.361619,1.0,2.195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,71.0,-0.20008,-0.211293,0.297847,-0.938658,0.47558,1.414238,-0.44971,-0.030409,209.0,120.0,0.82,0.0,0.0,0.0,133.0,0.0,0.09,208.0,128.0,0.73,134.0,0.0,0.09,200.0,23.0,0.18,1.384615,0.2,60.0,0.058,0.0,-0.078,-0.657
2,1251125.0,1274590.0,23.465,1382120000000.0,1382120000000.0,45:54.3,46:17.8,0.762784,73.0,13.0,4.0,0.0,12650.0,16650.0,1263775.0,1267775.0,0.0,none,none,0.0,0.0,0.0,0.0,0.0,13.0,167.0,188.538462,100.261006,83.0,416.0,333.0,1.010131,0.510088,0.554,12.0,42.0,120.5,158.185449,16.0,466.0,450.0,1.434272,0.694127,12.0,118.155517,256.603244,288.874145,66.587272,887.889747,821.302475,1.815159,2.015756,12.0,262.754541,208.224594,165.723937,0.126346,359.952012,359.825666,-0.358558,-1.953259,11.0,5.640831,98.631825,160.84248,0.174334,353.873826,353.636743,1.189756,-0.759343,12.0,5.031037,5.015407,3.913982,0.608951,12.285944,11.676993,0.515436,-0.530908,1.0,1.695,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,53.0,0.992546,0.974411,0.534047,-0.171852,1.834626,2.006478,-0.345951,-0.968456,200.0,107.0,0.69,0.0,0.0,0.0,0.0,0.0,0.0,192.0,111.0,0.62,183.0,93.0,0.38,183.0,93.0,0.38,1.142857,0.166667,67.0,-0.096,0.159,-0.092,-0.234
3,1227726.0,1251125.0,23.399,1382120000000.0,1382120000000.0,45:30.9,45:54.3,0.805556,74.0,14.0,9.0,3.0,12650.0,16650.0,1240376.0,1244376.0,0.0,none,none,0.0,0.0,0.0,0.0,0.0,14.0,158.0,198.714286,125.994156,83.0,516.0,433.0,1.562621,2.184748,0.333,13.0,17.0,76.846154,84.129708,16.0,250.0,234.0,1.033058,-0.469978,13.0,87.00475,155.084632,201.615678,16.933348,801.90246,784.969112,3.179386,10.736262,13.0,173.778792,187.886791,165.726863,0.868051,358.535732,357.66768,-0.09278,-2.130169,12.0,30.463093,154.156733,173.179469,3.238186,358.733251,355.495065,0.381106,-2.24494,13.0,3.882799,4.405353,3.557801,0.092029,10.979019,10.88699,0.545283,-0.608007,0.923077,2.785,3.0,167.0,161.333333,9.814955,150.0,167.0,17.0,-1.732051,0.0,78.0,-0.877777,-0.860678,0.325079,-1.563895,-0.070521,1.493373,0.559856,-0.078687,176.0,94.0,0.86,0.0,0.0,0.0,108.0,35.0,0.14,192.0,96.0,0.71,117.0,47.0,0.14,316.0,283.0,0.14,1.5,0.076923,82.0,-0.054,-0.575,-0.117,-0.059
4,1207674.0,1227726.0,20.052,1382120000000.0,1382120000000.0,45:10.9,45:30.9,0.807149,66.0,12.0,9.0,1.0,4508.0,8508.0,1212182.0,1216182.0,1.0,self-caught,other,1382120000000.0,0.0,1219182.0,1219182.0,11508.0,12.0,191.5,190.166667,41.252732,116.0,250.0,134.0,-0.243659,-0.937062,0.502,11.0,67.0,122.636364,150.207372,16.0,399.0,383.0,1.202905,-0.322871,11.0,119.230911,268.14413,270.211194,68.773714,799.07515,730.301436,1.494452,0.925619,11.0,174.502695,210.187959,154.208257,1.987851,358.756616,356.768765,-0.371691,-1.666184,10.0,181.425399,179.690831,183.741867,0.296653,358.835734,358.53908,-0.00049,-2.567767,11.0,4.685158,4.913032,4.044616,0.661556,11.926495,11.264939,0.699785,-0.512241,1.0,1.692,1.0,233.0,233.0,0.0,233.0,233.0,0.0,0.0,0.0,82.0,0.848486,0.86626,0.189015,0.332403,1.426649,1.094245,0.619191,1.268933,190.0,41.0,1.0,0.0,0.0,0.0,233.0,0.0,0.08,190.0,41.0,1.0,183.0,0.0,0.08,0.0,0.0,0.0,1.041667,0.181818,67.0,0.66,-0.849,-0.344,0.554


## Local Features

In [20]:
# Local features
# First-pass fixations
X_Local_First_Pass_Fixations=df[['FirstPassFixDurMean','FirstPassFixDurSD','FirstPassFixProp']]
X_Local_First_Pass_Fixations.head()

Unnamed: 0,FirstPassFixDurMean,FirstPassFixDurSD,FirstPassFixProp
0,261.0,102.0,0.55
1,209.0,120.0,0.82
2,200.0,107.0,0.69
3,176.0,94.0,0.86
4,190.0,41.0,1.0


In [21]:
#Local Features
#Regression Fixation
X_Local_Regression_Fixation=df[['RegFixDurMean','RegFixDurSD','RegFixProp']]
X_Local_Regression_Fixation.head()

Unnamed: 0,RegFixDurMean,RegFixDurSD,RegFixProp
0,0.0,0.0,0.0
1,133.0,0.0,0.09
2,0.0,0.0,0.0
3,108.0,35.0,0.14
4,233.0,0.0,0.08


In [22]:
#Local Features
#Single Fixations
X_Local_Single_Fixations=df[['SingleFixDurMean','SingleFixDurSD','SingleFixProp']]
X_Local_Single_Fixations.head()

Unnamed: 0,SingleFixDurMean,SingleFixDurSD,SingleFixProp
0,290.0,83.0,0.45
1,208.0,128.0,0.73
2,192.0,111.0,0.62
3,192.0,96.0,0.71
4,190.0,41.0,1.0


In [23]:
#Local Features
#Non-word Fixations
X_Local_Non_Word_Fixations=df[['NoWordFixDurMean','NoWordFixDurSD','NoWordFixProp']]
X_Local_Non_Word_Fixations.head()

Unnamed: 0,NoWordFixDurMean,NoWordFixDurSD,NoWordFixProp
0,164.0,80.0,0.55
1,134.0,0.0,0.09
2,183.0,93.0,0.38
3,117.0,47.0,0.14
4,183.0,0.0,0.08


In [24]:
#Local Features
#Gaze Fixations
X_Local_Gaze_Fixations=df[['GazeFixDurMean','GazeFixDurSD','GazeFixProp']]
X_Local_Gaze_Fixations.head()

Unnamed: 0,GazeFixDurMean,GazeFixDurSD,GazeFixProp
0,163.0,89.0,0.45
1,200.0,23.0,0.18
2,183.0,93.0,0.38
3,316.0,283.0,0.14
4,0.0,0.0,0.0


In [25]:
#Local Features
#Words Skipped
X_Local_Words_Skipped=df[['WordSkipProp']]
X_Local_Words_Skipped.head()

Unnamed: 0,WordSkipProp
0,0.636364
1,1.384615
2,1.142857
3,1.5
4,1.041667


In [26]:
#Local Features
#Line Cross Saccades
X_Local_Line_Cross_Saccades=df[['propCrossLineSaccades']]
X_Local_Line_Cross_Saccades.head()

Unnamed: 0,propCrossLineSaccades
0,0.1
1,0.2
2,0.166667
3,0.076923
4,0.181818


In [27]:
# Local features
# Reading time ratio
X_Local_Reading_Time_Ratio=df[['readingDepth']]
X_Local_Reading_Time_Ratio.head()

Unnamed: 0,readingDepth
0,153.0
1,60.0
2,67.0
3,82.0
4,67.0


In [28]:
#Local Features
#Word Length
X_Local_Word_Lenght=df[['WordLenToFixDurCorr']]
X_Local_Word_Lenght.head()

Unnamed: 0,WordLenToFixDurCorr
0,-0.375
1,0.058
2,-0.096
3,-0.054
4,0.66


In [29]:
#Local Features
#Global Frequency
X_Local_Global_Frequency=df[['FreqToFixDurCorr']]
X_Local_Global_Frequency.head()

Unnamed: 0,FreqToFixDurCorr
0,-0.223
1,0.0
2,0.159
3,-0.575
4,-0.849


In [30]:
#Local Features
#Synset Size
X_Local_Synset_Size=df[['NumSynsToFixDurCorr']]
X_Local_Synset_Size.head()


Unnamed: 0,NumSynsToFixDurCorr
0,0.404
1,-0.078
2,-0.092
3,-0.117
4,-0.344


In [31]:
#Local Features
#Hypernym Depth
X_Local_Hypernym_Depth=df[['HypDepthToFixDurCorr']]
X_Local_Hypernym_Depth.head()

Unnamed: 0,HypDepthToFixDurCorr
0,-0.56
1,-0.657
2,-0.234
3,-0.059
4,0.554


In [32]:
#Local Features
#End of Clause Fixations
X_Local_End_Of_Clause_Fixations=df[['EndOfClauseFixDurMean','EndOfClauseFixDurSD','EndOfClauseFixProp']]
X_Local_End_Of_Clause_Fixations.head()

Unnamed: 0,EndOfClauseFixDurMean,EndOfClauseFixDurSD,EndOfClauseFixProp
0,0.0,0.0,0.0
1,0.0,0.0,0.0
2,0.0,0.0,0.0
3,0.0,0.0,0.0
4,0.0,0.0,0.0


In [33]:
#Local Features
X_Local_Features=df[['FirstPassFixDurMean','FirstPassFixDurSD','FirstPassFixProp','RegFixDurMean','RegFixDurSD','RegFixProp','SingleFixDurMean','SingleFixDurSD','SingleFixProp','NoWordFixDurMean','NoWordFixDurSD','NoWordFixProp','GazeFixDurMean','GazeFixDurSD','GazeFixProp','WordSkipProp','propCrossLineSaccades','readingDepth','WordLenToFixDurCorr','FreqToFixDurCorr','NumSynsToFixDurCorr','HypDepthToFixDurCorr','EndOfClauseFixDurMean','EndOfClauseFixDurSD','EndOfClauseFixProp']]
X_Local_Features.head()

Unnamed: 0,FirstPassFixDurMean,FirstPassFixDurSD,FirstPassFixProp,RegFixDurMean,RegFixDurSD,RegFixProp,SingleFixDurMean,SingleFixDurSD,SingleFixProp,NoWordFixDurMean,NoWordFixDurSD,NoWordFixProp,GazeFixDurMean,GazeFixDurSD,GazeFixProp,WordSkipProp,propCrossLineSaccades,readingDepth,WordLenToFixDurCorr,FreqToFixDurCorr,NumSynsToFixDurCorr,HypDepthToFixDurCorr,EndOfClauseFixDurMean,EndOfClauseFixDurSD,EndOfClauseFixProp
0,261.0,102.0,0.55,0.0,0.0,0.0,290.0,83.0,0.45,164.0,80.0,0.55,163.0,89.0,0.45,0.636364,0.1,153.0,-0.375,-0.223,0.404,-0.56,0.0,0.0,0.0
1,209.0,120.0,0.82,133.0,0.0,0.09,208.0,128.0,0.73,134.0,0.0,0.09,200.0,23.0,0.18,1.384615,0.2,60.0,0.058,0.0,-0.078,-0.657,0.0,0.0,0.0
2,200.0,107.0,0.69,0.0,0.0,0.0,192.0,111.0,0.62,183.0,93.0,0.38,183.0,93.0,0.38,1.142857,0.166667,67.0,-0.096,0.159,-0.092,-0.234,0.0,0.0,0.0
3,176.0,94.0,0.86,108.0,35.0,0.14,192.0,96.0,0.71,117.0,47.0,0.14,316.0,283.0,0.14,1.5,0.076923,82.0,-0.054,-0.575,-0.117,-0.059,0.0,0.0,0.0
4,190.0,41.0,1.0,233.0,0.0,0.08,190.0,41.0,1.0,183.0,0.0,0.08,0.0,0.0,0.0,1.041667,0.181818,67.0,0.66,-0.849,-0.344,0.554,0.0,0.0,0.0
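
Several local features are named as proportions, yet the `WordSkipProp` values shown above exceed 1 in some rows. A minimal sketch of a range check follows, using values copied from the output above. The helper `out_of_range` is hypothetical, and whether values above 1 are expected depends on how the feature was defined in the source data.

```python
import pandas as pd

# Values copied from the first five rows of the local feature output.
toy = pd.DataFrame({
    "FirstPassFixProp": [0.55, 0.82, 0.69, 0.86, 1.0],
    "WordSkipProp": [0.636364, 1.384615, 1.142857, 1.5, 1.041667],
    "propCrossLineSaccades": [0.1, 0.2, 0.166667, 0.076923, 0.181818],
})

def out_of_range(frame, low=0.0, high=1.0):
    """Return, per column, the number of values outside [low, high]."""
    mask = (frame < low) | (frame > high)
    return mask.sum()

counts = out_of_range(toy)

# Show only the columns that contain out-of-range values.
print(counts[counts > 0])
```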


## Context Features

In [34]:
df.head()

Unnamed: 0,StartTime(ms),EndTime(ms),Length(ms),StartTimestamp,EndTimestamp,StartTimeGMT,EndTimeGMT,ValidityRate,PageFixations,WindowFixations,PageBlinks,WindowBlinks,BottomWindowBound_Page,TopWindowBound_Page,BottomWindowBound_Session,TopWindowBound_Session,NumberOfReports,FirstReportType,FirstReportContent,FirstReportTimestamp,FirstReportTimesGMT,FirstReportSessionTime(ms),FirstReportTrialTime(ms),FirstReportSegmentTime(ms),FixDurN,FixDurMed,FixDurMean,FixDurSD,FixDurMin,FixDurMax,FixDurRange,FixDurSkew,FixDurKur,FxDisp,SacDurN,SacDurMed,SacDurMean,SacDurSD,SacDurMin,SacDurMax,SacDurRange,SacDurSkew,SacDurKur,SacAmpN,SacAmpMed,SacAmpMean,SacAmpSD,SacAmpMin,SacAmpMax,SacAmpRange,SacAmpSkew,SacAmpKur,SacAngAbsN,SacAngAbsMed,SacAngAbsMean,SacAngAbsSD,SacAngAbsMin,SacAngAbsMax,SacAngAbsRange,SacAngAbsSkew,SacAngAbsKur,SacAngRelN,SacAngRelMed,SacAngRelMean,SacAngRelSD,SacAngRelMin,SacAngRelMax,SacAngRelRange,SacAngRelSkew,SacAngRelKur,SacVelN,SacVelMed,SacVelMean,SacVelSD,SacVelMin,SacVelMax,SacVelRange,SacVelSkew,SacVelKur,horizontalSaccadeProp,FxSacRatio,BlinkDurN,BlinkDurMed,BlinkDurMean,BlinkDurSD,BlinkDurMin,BlinkDurMax,BlinkDurRange,BlinkDurSkew,BlinkDurKur,PupilDiametersZN,PupilDiametersZMed,PupilDiametersZMean,PupilDiametersZSD,PupilDiametersZMin,PupilDiametersZMax,PupilDiametersZRange,PupilDiametersZSkew,PupilDiametersZKur,FirstPassFixDurMean,FirstPassFixDurSD,FirstPassFixProp,EndOfClauseFixDurMean,EndOfClauseFixDurSD,EndOfClauseFixProp,RegFixDurMean,RegFixDurSD,RegFixProp,SingleFixDurMean,SingleFixDurSD,SingleFixProp,NoWordFixDurMean,NoWordFixDurSD,NoWordFixProp,GazeFixDurMean,GazeFixDurSD,GazeFixProp,WordSkipProp,propCrossLineSaccades,readingDepth,WordLenToFixDurCorr,FreqToFixDurCorr,NumSynsToFixDurCorr,HypDepthToFixDurCorr
0,1294808.0,1322271.0,27.463,1382120000000.0,1382120000000.0,46:38.0,47:05.5,0.873786,100.0,11.0,7.0,0.0,12650.0,16650.0,1307458.0,1311458.0,0.0,none,none,0.0,0.0,0.0,0.0,0.0,11.0,200.0,221.0,101.29462,83.0,366.0,283.0,0.132989,-1.389706,0.429,10.0,17.0,123.3,162.960834,16.0,450.0,434.0,1.244961,0.053705,10.0,132.924202,210.010957,210.956418,80.856303,793.888636,713.032333,2.8609,8.535334,10.0,352.773596,232.045048,168.309166,0.289026,359.406308,359.117281,-0.725668,-1.670323,9.0,353.875658,240.549308,174.528607,4.791021,359.971189,355.180168,-0.857144,-1.710707,10.0,6.684761,5.48446,3.70356,0.407012,10.050112,9.6431,-0.457247,-1.396898,1.0,1.972,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,38.0,-1.362156,-1.458751,0.415926,-2.365184,-0.868433,1.496751,-0.302367,-1.148603,261.0,102.0,0.55,0.0,0.0,0.0,0.0,0.0,0.0,290.0,83.0,0.45,164.0,80.0,0.55,163.0,89.0,0.45,0.636364,0.1,153.0,-0.375,-0.223,0.404,-0.56
1,1274590.0,1294808.0,20.218,1382120000000.0,1382120000000.0,46:17.8,46:38.0,0.839242,72.0,11.0,4.0,0.0,12650.0,16650.0,1287240.0,1291240.0,0.0,none,none,0.0,0.0,0.0,0.0,0.0,11.0,183.0,209.090909,107.757556,133.0,499.0,366.0,2.241717,5.662505,0.436,10.0,25.5,104.8,145.311772,16.0,383.0,367.0,1.609426,1.031941,10.0,155.985073,274.13892,289.943023,66.923698,926.600337,859.676639,1.787491,2.228981,10.0,174.196626,179.839838,166.849182,2.67997,359.925463,357.245493,0.031703,-2.129207,9.0,351.088886,198.966328,183.765405,3.750852,359.611378,355.860527,-0.270329,-2.570038,10.0,4.602806,5.216583,3.52537,0.660415,11.163859,10.503445,0.681916,-0.361619,1.0,2.195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,71.0,-0.20008,-0.211293,0.297847,-0.938658,0.47558,1.414238,-0.44971,-0.030409,209.0,120.0,0.82,0.0,0.0,0.0,133.0,0.0,0.09,208.0,128.0,0.73,134.0,0.0,0.09,200.0,23.0,0.18,1.384615,0.2,60.0,0.058,0.0,-0.078,-0.657
2,1251125.0,1274590.0,23.465,1382120000000.0,1382120000000.0,45:54.3,46:17.8,0.762784,73.0,13.0,4.0,0.0,12650.0,16650.0,1263775.0,1267775.0,0.0,none,none,0.0,0.0,0.0,0.0,0.0,13.0,167.0,188.538462,100.261006,83.0,416.0,333.0,1.010131,0.510088,0.554,12.0,42.0,120.5,158.185449,16.0,466.0,450.0,1.434272,0.694127,12.0,118.155517,256.603244,288.874145,66.587272,887.889747,821.302475,1.815159,2.015756,12.0,262.754541,208.224594,165.723937,0.126346,359.952012,359.825666,-0.358558,-1.953259,11.0,5.640831,98.631825,160.84248,0.174334,353.873826,353.636743,1.189756,-0.759343,12.0,5.031037,5.015407,3.913982,0.608951,12.285944,11.676993,0.515436,-0.530908,1.0,1.695,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,53.0,0.992546,0.974411,0.534047,-0.171852,1.834626,2.006478,-0.345951,-0.968456,200.0,107.0,0.69,0.0,0.0,0.0,0.0,0.0,0.0,192.0,111.0,0.62,183.0,93.0,0.38,183.0,93.0,0.38,1.142857,0.166667,67.0,-0.096,0.159,-0.092,-0.234
3,1227726.0,1251125.0,23.399,1382120000000.0,1382120000000.0,45:30.9,45:54.3,0.805556,74.0,14.0,9.0,3.0,12650.0,16650.0,1240376.0,1244376.0,0.0,none,none,0.0,0.0,0.0,0.0,0.0,14.0,158.0,198.714286,125.994156,83.0,516.0,433.0,1.562621,2.184748,0.333,13.0,17.0,76.846154,84.129708,16.0,250.0,234.0,1.033058,-0.469978,13.0,87.00475,155.084632,201.615678,16.933348,801.90246,784.969112,3.179386,10.736262,13.0,173.778792,187.886791,165.726863,0.868051,358.535732,357.66768,-0.09278,-2.130169,12.0,30.463093,154.156733,173.179469,3.238186,358.733251,355.495065,0.381106,-2.24494,13.0,3.882799,4.405353,3.557801,0.092029,10.979019,10.88699,0.545283,-0.608007,0.923077,2.785,3.0,167.0,161.333333,9.814955,150.0,167.0,17.0,-1.732051,0.0,78.0,-0.877777,-0.860678,0.325079,-1.563895,-0.070521,1.493373,0.559856,-0.078687,176.0,94.0,0.86,0.0,0.0,0.0,108.0,35.0,0.14,192.0,96.0,0.71,117.0,47.0,0.14,316.0,283.0,0.14,1.5,0.076923,82.0,-0.054,-0.575,-0.117,-0.059
4,1207674.0,1227726.0,20.052,1382120000000.0,1382120000000.0,45:10.9,45:30.9,0.807149,66.0,12.0,9.0,1.0,4508.0,8508.0,1212182.0,1216182.0,1.0,self-caught,other,1382120000000.0,0.0,1219182.0,1219182.0,11508.0,12.0,191.5,190.166667,41.252732,116.0,250.0,134.0,-0.243659,-0.937062,0.502,11.0,67.0,122.636364,150.207372,16.0,399.0,383.0,1.202905,-0.322871,11.0,119.230911,268.14413,270.211194,68.773714,799.07515,730.301436,1.494452,0.925619,11.0,174.502695,210.187959,154.208257,1.987851,358.756616,356.768765,-0.371691,-1.666184,10.0,181.425399,179.690831,183.741867,0.296653,358.835734,358.53908,-0.00049,-2.567767,11.0,4.685158,4.913032,4.044616,0.661556,11.926495,11.264939,0.699785,-0.512241,1.0,1.692,1.0,233.0,233.0,0.0,233.0,233.0,0.0,0.0,0.0,82.0,0.848486,0.86626,0.189015,0.332403,1.426649,1.094245,0.619191,1.268933,190.0,41.0,1.0,0.0,0.0,0.0,233.0,0.0,0.08,190.0,41.0,1.0,183.0,0.0,0.08,0.0,0.0,0.0,1.041667,0.181818,67.0,0.66,-0.849,-0.344,0.554


In [35]:
X_Context_Features=df[['StartTime(ms)','EndTime(ms)','Length(ms)','StartTimestamp','EndTimestamp','StartTimeGMT','EndTimeGMT','ValidityRate','PageFixations','WindowFixations','PageBlinks','WindowBlinks','BottomWindowBound_Page','TopWindowBound_Page','BottomWindowBound_Session','TopWindowBound_Session']]
X_Context_Features.head()

Unnamed: 0,StartTime(ms),EndTime(ms),Length(ms),StartTimestamp,EndTimestamp,StartTimeGMT,EndTimeGMT,ValidityRate,PageFixations,WindowFixations,PageBlinks,WindowBlinks,BottomWindowBound_Page,TopWindowBound_Page,BottomWindowBound_Session,TopWindowBound_Session
0,1294808.0,1322271.0,27.463,1382120000000.0,1382120000000.0,46:38.0,47:05.5,0.873786,100.0,11.0,7.0,0.0,12650.0,16650.0,1307458.0,1311458.0
1,1274590.0,1294808.0,20.218,1382120000000.0,1382120000000.0,46:17.8,46:38.0,0.839242,72.0,11.0,4.0,0.0,12650.0,16650.0,1287240.0,1291240.0
2,1251125.0,1274590.0,23.465,1382120000000.0,1382120000000.0,45:54.3,46:17.8,0.762784,73.0,13.0,4.0,0.0,12650.0,16650.0,1263775.0,1267775.0
3,1227726.0,1251125.0,23.399,1382120000000.0,1382120000000.0,45:30.9,45:54.3,0.805556,74.0,14.0,9.0,3.0,12650.0,16650.0,1240376.0,1244376.0
4,1207674.0,1227726.0,20.052,1382120000000.0,1382120000000.0,45:10.9,45:30.9,0.807149,66.0,12.0,9.0,1.0,4508.0,8508.0,1212182.0,1216182.0


# Variables

In [36]:
X_Global_Features.head()

Unnamed: 0,FixDurN,FixDurMed,FixDurSD,FixDurMin,FixDurMax,FixDurRange,FixDurSkew,FixDurKur,FxDisp,SacDurN,SacDurMed,SacDurMean,SacDurSD,SacDurMin,SacDurMax,SacDurRange,SacDurSkew,SacDurKur,SacAngAbsN,SacAngAbsMed,SacAngAbsMean,SacAngAbsSD,SacAngAbsMin,SacAngAbsMax,SacAngAbsRange,SacAngAbsSkew,SacAngAbsKur,PupilDiametersZN,PupilDiametersZMed,PupilDiametersZMean,PupilDiametersZSD,PupilDiametersZMin,PupilDiametersZMax,PupilDiametersZRange,PupilDiametersZSkew,PupilDiametersZKur,horizontalSaccadeProp,FxSacRatio,BlinkDurN,BlinkDurMed,BlinkDurMean,BlinkDurSD,BlinkDurMin,BlinkDurMax,BlinkDurRange,BlinkDurSkew,BlinkDurKur
0,11.0,200.0,101.29462,83.0,366.0,283.0,0.132989,-1.389706,0.429,10.0,17.0,123.3,162.960834,16.0,450.0,434.0,1.244961,0.053705,10.0,352.773596,232.045048,168.309166,0.289026,359.406308,359.117281,-0.725668,-1.670323,38.0,-1.362156,-1.458751,0.415926,-2.365184,-0.868433,1.496751,-0.302367,-1.148603,1.0,1.972,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,11.0,183.0,107.757556,133.0,499.0,366.0,2.241717,5.662505,0.436,10.0,25.5,104.8,145.311772,16.0,383.0,367.0,1.609426,1.031941,10.0,174.196626,179.839838,166.849182,2.67997,359.925463,357.245493,0.031703,-2.129207,71.0,-0.20008,-0.211293,0.297847,-0.938658,0.47558,1.414238,-0.44971,-0.030409,1.0,2.195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,13.0,167.0,100.261006,83.0,416.0,333.0,1.010131,0.510088,0.554,12.0,42.0,120.5,158.185449,16.0,466.0,450.0,1.434272,0.694127,12.0,262.754541,208.224594,165.723937,0.126346,359.952012,359.825666,-0.358558,-1.953259,53.0,0.992546,0.974411,0.534047,-0.171852,1.834626,2.006478,-0.345951,-0.968456,1.0,1.695,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,14.0,158.0,125.994156,83.0,516.0,433.0,1.562621,2.184748,0.333,13.0,17.0,76.846154,84.129708,16.0,250.0,234.0,1.033058,-0.469978,13.0,173.778792,187.886791,165.726863,0.868051,358.535732,357.66768,-0.09278,-2.130169,78.0,-0.877777,-0.860678,0.325079,-1.563895,-0.070521,1.493373,0.559856,-0.078687,0.923077,2.785,3.0,167.0,161.333333,9.814955,150.0,167.0,17.0,-1.732051,0.0
4,12.0,191.5,41.252732,116.0,250.0,134.0,-0.243659,-0.937062,0.502,11.0,67.0,122.636364,150.207372,16.0,399.0,383.0,1.202905,-0.322871,11.0,174.502695,210.187959,154.208257,1.987851,358.756616,356.768765,-0.371691,-1.666184,82.0,0.848486,0.86626,0.189015,0.332403,1.426649,1.094245,0.619191,1.268933,1.0,1.692,1.0,233.0,233.0,0.0,233.0,233.0,0.0,0.0,0.0


In [37]:
X_Local_Features.head()

Unnamed: 0,FirstPassFixDurMean,FirstPassFixDurSD,FirstPassFixProp,RegFixDurMean,RegFixDurSD,RegFixProp,SingleFixDurMean,SingleFixDurSD,SingleFixProp,NoWordFixDurMean,NoWordFixDurSD,NoWordFixProp,GazeFixDurMean,GazeFixDurSD,GazeFixProp,WordSkipProp,propCrossLineSaccades,readingDepth,WordLenToFixDurCorr,FreqToFixDurCorr,NumSynsToFixDurCorr,HypDepthToFixDurCorr,EndOfClauseFixDurMean,EndOfClauseFixDurSD,EndOfClauseFixProp
0,261.0,102.0,0.55,0.0,0.0,0.0,290.0,83.0,0.45,164.0,80.0,0.55,163.0,89.0,0.45,0.636364,0.1,153.0,-0.375,-0.223,0.404,-0.56,0.0,0.0,0.0
1,209.0,120.0,0.82,133.0,0.0,0.09,208.0,128.0,0.73,134.0,0.0,0.09,200.0,23.0,0.18,1.384615,0.2,60.0,0.058,0.0,-0.078,-0.657,0.0,0.0,0.0
2,200.0,107.0,0.69,0.0,0.0,0.0,192.0,111.0,0.62,183.0,93.0,0.38,183.0,93.0,0.38,1.142857,0.166667,67.0,-0.096,0.159,-0.092,-0.234,0.0,0.0,0.0
3,176.0,94.0,0.86,108.0,35.0,0.14,192.0,96.0,0.71,117.0,47.0,0.14,316.0,283.0,0.14,1.5,0.076923,82.0,-0.054,-0.575,-0.117,-0.059,0.0,0.0,0.0
4,190.0,41.0,1.0,233.0,0.0,0.08,190.0,41.0,1.0,183.0,0.0,0.08,0.0,0.0,0.0,1.041667,0.181818,67.0,0.66,-0.849,-0.344,0.554,0.0,0.0,0.0


In [38]:
X_Context_Features.head()

Unnamed: 0,StartTime(ms),EndTime(ms),Length(ms),StartTimestamp,EndTimestamp,StartTimeGMT,EndTimeGMT,ValidityRate,PageFixations,WindowFixations,PageBlinks,WindowBlinks,BottomWindowBound_Page,TopWindowBound_Page,BottomWindowBound_Session,TopWindowBound_Session
0,1294808.0,1322271.0,27.463,1382120000000.0,1382120000000.0,46:38.0,47:05.5,0.873786,100.0,11.0,7.0,0.0,12650.0,16650.0,1307458.0,1311458.0
1,1274590.0,1294808.0,20.218,1382120000000.0,1382120000000.0,46:17.8,46:38.0,0.839242,72.0,11.0,4.0,0.0,12650.0,16650.0,1287240.0,1291240.0
2,1251125.0,1274590.0,23.465,1382120000000.0,1382120000000.0,45:54.3,46:17.8,0.762784,73.0,13.0,4.0,0.0,12650.0,16650.0,1263775.0,1267775.0
3,1227726.0,1251125.0,23.399,1382120000000.0,1382120000000.0,45:30.9,45:54.3,0.805556,74.0,14.0,9.0,3.0,12650.0,16650.0,1240376.0,1244376.0
4,1207674.0,1227726.0,20.052,1382120000000.0,1382120000000.0,45:10.9,45:30.9,0.807149,66.0,12.0,9.0,1.0,4508.0,8508.0,1212182.0,1216182.0


# List of Models

The following Weka implementations (with default hyperparameters) were used: 

1. bagging (with REPTree as a base learner); 
2. Bayes net; 
3. naïve Bayes; 
4. logistic regression; 
5. simple logistic regression; 
6. SMO (SVM); 
7. SPegasos (SVM); 
8. voted perceptron; 
9. k-nearest neighbors;
10. conjunctive rule; 
11. decision table; 
12. JRip; 
13. ridor; 
14. decision stump; 
15. AdaBoost; 
16. C4.5 decision tree; 
17. grafted C4.5 decision tree;
18. REPTree; 
19. random forest; and 
20. random tree.

# Model Building using Combination of Features

**Six additional parameters were varied:**
- type of mind wandering report;
- window size;
- minimum number of fixations;
- feature types;
- feature selection;
- outlier treatment/sampling.
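
The sections that follow repeat the same split, fit, and score steps for each model and feature set. As a rough sketch of that loop, assuming synthetic data from `make_classification` in place of `df` and hypothetical column slices standing in for the feature sets (the helper name `kappa_for` is also made up here):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real windows: 400 rows, 12 features (assumption).
X, y = make_classification(n_samples=400, n_features=12, n_informative=4, random_state=0)

# Hypothetical feature sets, expressed as column slices of X.
feature_sets = {
    "global": slice(0, 6),
    "local": slice(6, 12),
    "global+local": slice(0, 12),
}

# Factories so that every run starts from a fresh, unfitted model.
models = {
    "logistic regression": lambda: LogisticRegression(max_iter=1000),
    "naive Bayes": lambda: GaussianNB(),
}

def kappa_for(make_model, X_subset, y):
    """Split 75/25, fit on the training part, return kappa on the test part."""
    x_tr, x_te, y_tr, y_te = train_test_split(X_subset, y, test_size=0.25, random_state=0)
    model = make_model().fit(x_tr, y_tr)
    return cohen_kappa_score(y_te, model.predict(x_te))

results = {
    (model_name, set_name): kappa_for(make_model, X[:, cols], y)
    for model_name, make_model in models.items()
    for set_name, cols in feature_sets.items()
}
for key, kappa in results.items():
    print(key, round(kappa, 3))
```

Keeping the split inside one helper makes it harder to pair predictions from one feature set with test labels from another.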

## Logistic Regression

### Logistic Regression for Global Features

In [39]:
print(X_Global_Features.shape)

(4077, 47)


In [40]:
y=df['NumberOfReports']
print(y.shape)

(4077,)


In [41]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X_Global_Features, y, test_size=0.25, random_state=0)

In [42]:
from sklearn.linear_model import LogisticRegression
logisticRegr = LogisticRegression()
logisticRegr.fit(x_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


LogisticRegression()
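
The warning above suggests two remedies: scaling the data or raising `max_iter`. A minimal sketch of both together, assuming synthetic data with two artificially stretched columns in place of `X_Global_Features` (the name `scaled_logreg` and the value `max_iter=1000` are illustrative choices, not settings from the study):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); the real features are not used here.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
# Stretch two columns to mimic features measured on very different scales.
X[:, 0] *= 1e6
X[:, 1] *= 1e3

# Standardize every feature, then fit logistic regression with a higher iteration cap.
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_logreg.fit(X, y)

# n_iter_ holds the number of solver iterations actually used.
n_iter = scaled_logreg.named_steps["logisticregression"].n_iter_[0]
print(n_iter)
```

Because the scaler sits inside the pipeline, it is fit on the training data only when the pipeline is fit on `x_train`, and the same transformation is reused at prediction time.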

In [43]:
y_pred = logisticRegr.predict(x_test)

Kappa values of 0, 1, > 0, and < 0 indicate chance, perfect, above chance, and below chance agreement, respectively.
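
To make that scale concrete, kappa can be computed by hand as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the label frequencies. The sketch below is a plain-Python illustration with toy labels; the function name `cohen_kappa` is made up here, and the notebook itself relies on sklearn's `cohen_kappa_score`.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length label sequences.

    Undefined (division by zero) when chance agreement is exactly 1,
    e.g. when both sequences contain a single identical label.
    """
    n = len(y_true)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)

y_true = [0, 0, 0, 0, 1, 1]
always_zero = [0, 0, 0, 0, 0, 0]   # predicts only the majority class
perfect = list(y_true)

print(cohen_kappa(y_true, always_zero))
print(cohen_kappa(y_true, perfect))
```

The always-zero predictor is right on four of six labels, yet its kappa is 0 because that agreement is exactly what the label frequencies would produce by chance. This is one way a kappa of exactly 0.0 can arise: a model that predicts only the majority class.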

In [44]:
from sklearn.metrics import cohen_kappa_score
cohen_score = cohen_kappa_score(y_test, y_pred)
print(cohen_score)

0.003067080876158479


### Logistic Regression for Local Features

In [45]:
print(X_Local_Features.shape)
print(y.shape)

(4077, 25)
(4077,)


In [46]:
x_train1, x_test1, y_train1, y_test1 = train_test_split(X_Local_Features, y, test_size=0.25, random_state=0)

In [47]:
logisticRegr1 = LogisticRegression()
logisticRegr1.fit(x_train1, y_train1)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


LogisticRegression()

In [48]:
y_pred1 = logisticRegr1.predict(x_test1)

In [49]:
cohen_score1 = cohen_kappa_score(y_test1, y_pred1)
print(cohen_score1)

0.015482541380254733


### Logistic Regression for Global and Local Features

In [50]:
X_Global_Local_Features=df[['FixDurN','FixDurMed','FixDurSD','FixDurMin','FixDurMax','FixDurRange','FixDurSkew','FixDurKur','FxDisp','SacDurN','SacDurMed','SacDurMean','SacDurSD','SacDurMin','SacDurMax','SacDurRange','SacDurSkew','SacDurKur','SacAngAbsN','SacAngAbsMed','SacAngAbsMean','SacAngAbsSD','SacAngAbsMin','SacAngAbsMax','SacAngAbsRange','SacAngAbsSkew','SacAngAbsKur','PupilDiametersZN','PupilDiametersZMed','PupilDiametersZMean','PupilDiametersZSD','PupilDiametersZMin','PupilDiametersZMax','PupilDiametersZRange','PupilDiametersZSkew','PupilDiametersZKur','horizontalSaccadeProp','FxSacRatio','BlinkDurN','BlinkDurMed','BlinkDurMean','BlinkDurSD','BlinkDurMin','BlinkDurMax','BlinkDurRange','BlinkDurSkew','BlinkDurKur','FirstPassFixDurMean','FirstPassFixDurSD','FirstPassFixProp','RegFixDurMean','RegFixDurSD','RegFixProp','SingleFixDurMean','SingleFixDurSD','SingleFixProp','NoWordFixDurMean','NoWordFixDurSD','NoWordFixProp','GazeFixDurMean','GazeFixDurSD','GazeFixProp','WordSkipProp','propCrossLineSaccades','readingDepth','WordLenToFixDurCorr','FreqToFixDurCorr','NumSynsToFixDurCorr','HypDepthToFixDurCorr','EndOfClauseFixDurMean','EndOfClauseFixDurSD','EndOfClauseFixProp']]

In [51]:
print(X_Global_Local_Features.shape)
print(y.shape)

(4077, 72)
(4077,)


In [52]:
x_train2, x_test2, y_train2, y_test2 = train_test_split(X_Global_Local_Features, y, test_size=0.25, random_state=0)

In [53]:
logisticRegr2 = LogisticRegression()
logisticRegr2.fit(x_train2, y_train2)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


LogisticRegression()

In [54]:
y_pred2 = logisticRegr2.predict(x_test2)

In [55]:
cohen_score2 = cohen_kappa_score(y_test2, y_pred2)
print(cohen_score2)

0.001242040100151831


### Logistic Regression for Global, Local and Contextual Features

In [56]:
X_Global_Local_Context_Features=df[['FixDurN','FixDurMed','FixDurSD','FixDurMin','FixDurMax','FixDurRange','FixDurSkew','FixDurKur','FxDisp','SacDurN','SacDurMed','SacDurMean','SacDurSD','SacDurMin','SacDurMax','SacDurRange','SacDurSkew','SacDurKur','SacAngAbsN','SacAngAbsMed','SacAngAbsMean','SacAngAbsSD','SacAngAbsMin','SacAngAbsMax','SacAngAbsRange','SacAngAbsSkew','SacAngAbsKur','PupilDiametersZN','PupilDiametersZMed','PupilDiametersZMean','PupilDiametersZSD','PupilDiametersZMin','PupilDiametersZMax','PupilDiametersZRange','PupilDiametersZSkew','PupilDiametersZKur','horizontalSaccadeProp','FxSacRatio','BlinkDurN','BlinkDurMed','BlinkDurMean','BlinkDurSD','BlinkDurMin','BlinkDurMax','BlinkDurRange','BlinkDurSkew','BlinkDurKur','FirstPassFixDurMean','FirstPassFixDurSD','FirstPassFixProp','RegFixDurMean','RegFixDurSD','RegFixProp','SingleFixDurMean','SingleFixDurSD','SingleFixProp','NoWordFixDurMean','NoWordFixDurSD','NoWordFixProp','GazeFixDurMean','GazeFixDurSD','GazeFixProp','WordSkipProp','propCrossLineSaccades','readingDepth','WordLenToFixDurCorr','FreqToFixDurCorr','NumSynsToFixDurCorr','HypDepthToFixDurCorr','EndOfClauseFixDurMean','EndOfClauseFixDurSD','EndOfClauseFixProp','StartTime(ms)','EndTime(ms)','Length(ms)','StartTimestamp','EndTimestamp','ValidityRate','PageFixations','WindowFixations','PageBlinks','WindowBlinks','BottomWindowBound_Page','TopWindowBound_Page','BottomWindowBound_Session','TopWindowBound_Session']]

In [57]:
print(X_Global_Local_Context_Features.shape)
print(y.shape)

(4077, 86)
(4077,)


In [58]:
x_train3, x_test3, y_train3, y_test3 = train_test_split(X_Global_Local_Context_Features, y, test_size=0.25, random_state=0)

In [59]:
logisticRegr3 = LogisticRegression()
logisticRegr3.fit(x_train3, y_train3)

LogisticRegression()

In [60]:
y_pred3 = logisticRegr3.predict(x_test3)

In [61]:
cohen_score3 = cohen_kappa_score(y_test3, y_pred3)
print(cohen_score3)

0.0


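None of the four logistic regression models performs meaningfully above chance: kappa is 0.003 for global features, 0.015 for local features, 0.001 for global and local features combined, and 0.0 once context features are added. Local features give the highest kappa, but every value is close to 0, so these runs do not show that any feature set is informative. The convergence warnings above also mean that the first three fits stopped before the solver had converged.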

## SVM

In [62]:
from sklearn.svm import SVC

### SVM for Global Features

In [63]:
clf1 = SVC(gamma='auto')
clf1.fit(x_train, y_train)
y_pred_SVM1=clf1.predict(x_test)
cohen_scoreSVM = cohen_kappa_score(y_test, y_pred_SVM1)
print(cohen_scoreSVM)

0.0


### SVM for Local Features

In [64]:
clf2 = SVC(gamma='auto')
clf2.fit(x_train1, y_train1)
y_pred_SVM2=clf2.predict(x_test1)
cohen_scoreSVM2 = cohen_kappa_score(y_test1, y_pred_SVM2)
print(cohen_scoreSVM2)

0.0


### SVM for Global and Local Features

In [65]:
clf3 = SVC(gamma='auto')
clf3.fit(x_train2, y_train2)
y_pred_SVM3=clf3.predict(x_test2)
cohen_scoreSVM3 = cohen_kappa_score(y_test2, y_pred_SVM3)
print(cohen_scoreSVM3)

0.0


### SVM for Global, Local and Contextual Features

In [66]:
clf4 = SVC(gamma='auto')
clf4.fit(x_train3, y_train3)
y_pred4=clf4.predict(x_test3)
cohen_score4 = cohen_kappa_score(y_test3, y_pred4)
print(cohen_score4)

0.0


## Naive Bayes

GaussianNB implements the Gaussian Naive Bayes algorithm for classification. The likelihood of each feature, given the class, is assumed to be Gaussian.
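
For a single feature value $x_i$ and class $y$, that Gaussian likelihood can be written as follows, where $\mu_y$ and $\sigma_y^2$ denote the mean and variance of the feature within class $y$, estimated from the training data (the symbols are chosen here for illustration):

$$
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)
$$

The "naive" part is the further assumption that features are independent given the class, so the per-feature likelihoods are multiplied together.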

In [67]:
from sklearn.naive_bayes import GaussianNB

### Naive Bayes for Global Features

In [68]:
clf_NB1 = GaussianNB()
clf_NB1.fit(x_train, y_train)
y_pred_NB1=clf_NB1.predict(x_test)
cohen_scoreNB1 = cohen_kappa_score(y_test, y_pred_NB1)
print(cohen_scoreNB1)

0.072811189654999


### Naive Bayes for Local Features

In [69]:
clf_NB2 = GaussianNB()
clf_NB2.fit(x_train1, y_train1)
y_pred_NB2=clf_NB2.predict(x_test1)
cohen_scoreNB2 = cohen_kappa_score(y_test1, y_pred_NB2)
print(cohen_scoreNB2)

-0.004329637076863069


### Naive Bayes for Global and Local Features

In [70]:
clf_NB3 = GaussianNB()
clf_NB3.fit(x_train2, y_train2)
y_pred_NB3=clf_NB3.predict(x_test2)
cohen_scoreNB3 = cohen_kappa_score(y_test2, y_pred_NB3)
print(cohen_scoreNB3)

0.03582157305170974


## Random Forest

A random forest is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. With bootstrap=True (the default), each tree is fit on a sample drawn with replacement; by default that sample has the same size as the original input.

In [71]:
from sklearn.ensemble import RandomForestClassifier

### Random Forest for Global Features

In [72]:
clf_RF1 = RandomForestClassifier()
clf_RF1.fit(x_train, y_train)
y_pred_RF1=clf_RF1.predict(x_test)
cohen_scoreRF1 = cohen_kappa_score(y_test, y_pred_RF1)
print(cohen_scoreRF1)

0.06340768687808818


### Random Forest for Local Features

In [73]:
clf_RF2 = RandomForestClassifier()
clf_RF2.fit(x_train1, y_train1)
y_pred_RF2=clf_RF2.predict(x_test1)
cohen_scoreRF2 = cohen_kappa_score(y_test1, y_pred_RF2)
print(cohen_scoreRF2)

0.017785590569500798


### Random Forest for Global and Local Features

In [74]:
clf_RF3 = RandomForestClassifier()
clf_RF3.fit(x_train2, y_train2)
y_pred_RF3=clf_RF3.predict(x_test2)
cohen_scoreRF3 = cohen_kappa_score(y_test2, y_pred_RF3)
print(cohen_scoreRF3)

0.08836114277172613


## AdaBoost

An AdaBoost [1] classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

This class implements the algorithm known as AdaBoost-SAMME [2].
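
The reweighting step described above can be sketched in plain Python. This is an illustration of one SAMME round under a learning rate of 1, not sklearn's internal code; the function name `samme_reweight` and the toy weights are made up for the example.

```python
import math

def samme_reweight(weights, misclassified, n_classes=2):
    """One SAMME round: return (alpha, new normalized sample weights).

    Assumes the weighted error is strictly between 0 and 1.
    """
    # Weighted error of the current weak classifier.
    err = sum(w for w, wrong in zip(weights, misclassified) if wrong) / sum(weights)
    # Classifier weight; the log(n_classes - 1) term vanishes for two classes.
    alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
    # Boost the weight of misclassified samples, leave the others unchanged.
    updated = [w * math.exp(alpha) if wrong else w
               for w, wrong in zip(weights, misclassified)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

weights = [0.2] * 5
misclassified = [False, False, False, False, True]
alpha, new_weights = samme_reweight(weights, misclassified)
print(alpha, new_weights)
```

After one round the single misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.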

In [75]:
from sklearn.ensemble import AdaBoostClassifier

### AdaBoost for Global Features

In [76]:
clf_ADA = AdaBoostClassifier()
clf_ADA.fit(x_train, y_train)
y_pred_ADA1=clf_ADA.predict(x_test)
cohen_scoreADA1 = cohen_kappa_score(y_test, y_pred_ADA1)
print(cohen_scoreADA1)

0.0


### AdaBoost for Local Features

In [77]:
clf_ADA1 = AdaBoostClassifier()
clf_ADA1.fit(x_train1, y_train1)
y_pred_ADA2=clf_ADA1.predict(x_test1)
cohen_scoreADA2 = cohen_kappa_score(y_test1, y_pred_ADA2)
print(cohen_scoreADA2)

0.0


### AdaBoost for Global and Local Features

In [78]:
clf_ADA2 = AdaBoostClassifier()
clf_ADA2.fit(x_train2, y_train2)
y_pred_ADA3=clf_ADA2.predict(x_test2)
cohen_scoreADA3 = cohen_kappa_score(y_test2, y_pred_ADA3)
print(cohen_scoreADA3)

0.0


## k-nearest neighbors

Classifier implementing the k-nearest neighbors vote.
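
As a rough sketch of what that vote means, the plain-Python function below finds the k closest training points by Euclidean distance and returns their most common label. The function name `knn_predict` and the toy points are hypothetical; `KNeighborsClassifier` also defaults to five neighbors, but its implementation is more elaborate than this.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Label the query by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (1.1, 1.0), k=3))
print(knn_predict(train_points, train_labels, (5.1, 5.0), k=3))
```

Because the vote depends on raw distances, features with large numeric ranges dominate the neighbor search unless the data is scaled first.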

In [79]:
from sklearn.neighbors import KNeighborsClassifier

### k-nearest neighbors for Global Features

In [80]:
clf_KNN1 = KNeighborsClassifier()
clf_KNN1.fit(x_train, y_train)
y_pred_KNN1=clf_KNN1.predict(x_test)
cohen_score_KNN1 = cohen_kappa_score(y_test, y_pred_KNN1)
print(cohen_score_KNN1)

0.0022604185790342513


### k-nearest neighbors for Local Features

In [81]:
clf_KNN2 = KNeighborsClassifier()
clf_KNN2.fit(x_train1, y_train1)
y_pred_KNN2=clf_KNN2.predict(x_test1)
cohen_score_KNN2 = cohen_kappa_score(y_test1, y_pred_KNN2)
print(cohen_score_KNN2)

0.07159487776484286


### k-nearest neighbors for Global and Local Features

In [82]:
clf_KNN3 = KNeighborsClassifier()
clf_KNN3.fit(x_train2, y_train2)
y_pred_KNN3=clf_KNN3.predict(x_test2)
cohen_score_KNN3 = cohen_kappa_score(y_test2, y_pred_KNN3)
print(cohen_score_KNN3)

-0.02331245650661118


### k-nearest neighbors for Global, Local and Contextual Features

In [83]:
clf_KNN4 = KNeighborsClassifier()
clf_KNN4.fit(x_train3, y_train3)
y_pred_KNN4=clf_KNN4.predict(x_test3)
cohen_score_KNN4 = cohen_kappa_score(y_test3, y_pred_KNN4)
print(cohen_score_KNN4)

0.1689931532296587


## Bagging

A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.

This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, the algorithm is known as Pasting. If samples are drawn with replacement, the method is known as Bagging. When random subsets of the dataset are drawn as random subsets of the features, the method is known as Random Subspaces. Finally, when base estimators are built on subsets of both samples and features, the method is known as Random Patches.
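
The four variants map onto `BaggingClassifier` parameters roughly as sketched below. The data is synthetic, and the 0.5 fractions are arbitrary illustrative values, not settings used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

variants = {
    # Subsets of samples, drawn without replacement.
    "pasting": dict(max_samples=0.5, bootstrap=False),
    # Subsets of samples, drawn with replacement.
    "bagging": dict(max_samples=0.5, bootstrap=True),
    # All samples, random subsets of the features.
    "random subspaces": dict(max_samples=1.0, bootstrap=False, max_features=0.5),
    # Subsets of both samples and features.
    "random patches": dict(max_samples=0.5, max_features=0.5),
}

train_accuracy = {}
for name, params in variants.items():
    clf = BaggingClassifier(n_estimators=10, random_state=0, **params)
    clf.fit(X, y)
    train_accuracy[name] = clf.score(X, y)
    print(name, round(train_accuracy[name], 3))
```

The cells below call `BaggingClassifier()` with no arguments, which corresponds to the plain Bagging variant with decision trees as base estimators.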

In [84]:
from sklearn.ensemble import BaggingClassifier

### Bagging for Global Features

In [85]:
#clf = BaggingClassifier(base_estimator=SVC(),n_estimators=10, random_state=0).fit(X, y)
clf_BAG1 = BaggingClassifier()
clf_BAG1.fit(x_train, y_train)
y_pred_clf_BAG1=clf_BAG1.predict(x_test)
cohen_score_clf_BAG1 = cohen_kappa_score(y_test, y_pred_clf_BAG1)
print(cohen_score_clf_BAG1)

0.09934033043682688


### Bagging for Local Features

In [86]:
clf_BAG2 = BaggingClassifier()
clf_BAG2.fit(x_train1, y_train1)
y_pred_clf_BAG2=clf_BAG2.predict(x_test1)
cohen_score_clf_BAG2 = cohen_kappa_score(y_test1, y_pred_clf_BAG2)
print(cohen_score_clf_BAG2)

-0.024340373172813123


### Bagging for Global and Local Features

In [87]:
clf_BAG3 = BaggingClassifier()
clf_BAG3.fit(x_train2, y_train2)
y_pred_clf_BAG3=clf_BAG3.predict(x_test2)
cohen_score_clf_BAG3 = cohen_kappa_score(y_test2, y_pred_clf_BAG3)
print(cohen_score_clf_BAG3)

0.06581031652314173


### Bagging for Global, Local and Contextual Features

In [88]:
clf_BAG4 = BaggingClassifier()
clf_BAG4.fit(x_train3, y_train3)
y_pred_clf_BAG4=clf_BAG4.predict(x_test3)
cohen_score_clf_BAG4 = cohen_kappa_score(y_test3, y_pred_clf_BAG4)
print(cohen_score_clf_BAG4)

0.9004613816940629
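
A kappa of 0.90 is far above every gaze-only result in this notebook, which makes it worth checking whether a context column encodes the label directly. In the `df.head()` output above, the one window with `NumberOfReports` = 1 has page window bounds of 4508 and 8508, while the four windows without a report all have 12650 and 16650. The sketch below rebuilds just those five rows (values copied from that output) and groups the bounds by label. Five rows cannot establish a pattern; running the same `groupby` on the full `df` would show whether it holds.

```python
import pandas as pd

# The five windows shown by df.head(), restricted to the label and two context columns.
head = pd.DataFrame({
    "NumberOfReports": [0.0, 0.0, 0.0, 0.0, 1.0],
    "BottomWindowBound_Page": [12650.0, 12650.0, 12650.0, 12650.0, 4508.0],
    "TopWindowBound_Page": [16650.0, 16650.0, 16650.0, 16650.0, 8508.0],
})

# Range of each bound within each label value.
summary = (
    head.groupby("NumberOfReports")[["BottomWindowBound_Page", "TopWindowBound_Page"]]
    .agg(["min", "max"])
)
print(summary)
```

If the ranges do not overlap between labels on the full data, a tree can separate the classes with a single split on that column, and the score would reflect how the windows were constructed rather than anything about gaze.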


## Decision Tree

DecisionTreeClassifier is a class capable of performing multi-class classification on a dataset.

As with other classifiers, DecisionTreeClassifier takes as input two arrays: an array X, sparse or dense, of shape [n_samples, n_features] holding the training samples, and an array Y of integer values, of shape [n_samples], holding the class labels for the training samples.
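
A minimal illustration of those two arrays, using two made-up training samples with two features each:

```python
from sklearn import tree

# X has shape [n_samples, n_features]; Y has shape [n_samples].
X = [[0, 0], [1, 1]]
Y = [0, 1]

clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

# Predict the class of a new sample.
prediction = clf.predict([[2.0, 2.0]])
print(prediction)
```

The cells below pass pandas objects (`x_train`, `y_train`) instead of lists, which sklearn accepts in the same way.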

In [89]:
from sklearn import tree

### Decision Tree for Global Features

In [90]:
clf_Decision_Tree1 = tree.DecisionTreeClassifier()
clf_Decision_Tree1.fit(x_train,y_train)
y_pred_clf_Decision_Tree1=clf_Decision_Tree1.predict(x_test)
cohen_score_clf_Decision_Tree1 = cohen_kappa_score(y_test, y_pred_clf_Decision_Tree1)
print(cohen_score_clf_Decision_Tree1)

0.0452425416041875


### Decision Tree for Local Features

In [91]:
clf_Decision_Tree2 = tree.DecisionTreeClassifier()
clf_Decision_Tree2.fit(x_train1,y_train1)
y_pred_clf_Decision_Tree2=clf_Decision_Tree2.predict(x_test1)
cohen_score_clf_Decision_Tree2 = cohen_kappa_score(y_test1, y_pred_clf_Decision_Tree2)
print(cohen_score_clf_Decision_Tree2)

0.018987901514079586


### Decision Tree for Global and Local Features

In [92]:
clf_Decision_Tree3 = tree.DecisionTreeClassifier()
clf_Decision_Tree3.fit(x_train2,y_train2)
y_pred_clf_Decision_Tree3=clf_Decision_Tree3.predict(x_test2)
cohen_score_clf_Decision_Tree3 = cohen_kappa_score(y_test2, y_pred_clf_Decision_Tree3)
print(cohen_score_clf_Decision_Tree3)

0.08410450911673861


### Decision Tree for Global, Local and Contextual Features

In [93]:
clf_Decision_Tree4 = tree.DecisionTreeClassifier()
clf_Decision_Tree4.fit(x_train3,y_train3)
y_pred_clf_Decision_Tree4=clf_Decision_Tree4.predict(x_test3)
cohen_score_clf_Decision_Tree4 = cohen_kappa_score(y_test3, y_pred_clf_Decision_Tree4)
print(cohen_score_clf_Decision_Tree4)

0.8743923927913324


## Voted Perceptron

The Voted Perceptron (Freund and Schapire, 1999) is a variant that uses multiple weighted perceptrons. The algorithm starts a new perceptron every time an example is wrongly classified, initializing its weight vector with the final weights of the last perceptron.
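
A plain-Python sketch of that idea, assuming labels in {-1, +1} and a constant first feature acting as the bias. Every name here (`train_voted_perceptron`, `predict_voted`, the toy data) is made up for illustration; scikit-learn does not ship a voted perceptron.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sign(z):
    # Ties at zero are broken toward +1 (an arbitrary choice for this sketch).
    return 1 if z >= 0 else -1

def train_voted_perceptron(X, y, epochs=10):
    """Return a list of (weight_vector, survival_count) pairs."""
    w = [0.0] * len(X[0])
    survived = 0
    voters = []
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * dot(w, x_i) <= 0:
                # Mistake: retire the current perceptron with its vote count,
                # then start a new one from the updated weights.
                voters.append((w, survived))
                w = [w_j + y_i * x_j for w_j, x_j in zip(w, x_i)]
                survived = 1
            else:
                survived += 1
    voters.append((w, survived))
    return voters

def predict_voted(voters, x):
    # Each perceptron votes with weight equal to the number of examples it survived.
    return sign(sum(count * sign(dot(w, x)) for w, count in voters))

X = [(1, 2.0, 1.0), (1, 1.5, 2.0), (1, -1.0, -1.5), (1, -2.0, -1.0)]
y = [1, 1, -1, -1]
voters = train_voted_perceptron(X, y)
print([predict_voted(voters, x_i) for x_i in X])
```

Perceptrons that classify many examples correctly before their first mistake get a larger say in the final vote than short-lived ones.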

**The cells below use scikit-learn's `Perceptron`, which is the classic perceptron trained with stochastic gradient descent (SGD), not the voted variant:**

**Stochastic Gradient Descent**

Gradient descent is the process of minimizing a function by following the gradient of the cost function.

This involves knowing the form of the cost as well as the derivative so that from a given point you know the gradient and can move in that direction, e.g. downhill towards the minimum value.

In machine learning, we can use a technique called stochastic gradient descent, which evaluates and updates the weights at every iteration, to minimize the error of a model on our training data.

The way this optimization algorithm works is that each training instance is shown to the model one at a time. The model makes a prediction for a training instance, the error is calculated and the model is updated in order to reduce the error for the next prediction.

This procedure can be used to find the set of weights in a model that result in the smallest error for the model on the training data.

For the Perceptron algorithm, the weights (w) are updated at each iteration using the equation:

w = w + learning_rate * (expected - predicted) * x

where w is the weight being optimized, learning_rate is a learning rate that you must configure (e.g. 0.01), (expected - predicted) is the prediction error of the model on the current training instance, and x is the input value.
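
The update rule can be sketched in a few lines of plain Python. This is an illustration with 0/1 labels and a separate bias term, trained on the logical OR function; the names `predict` and `train_perceptron` are made up here and are not part of sklearn.

```python
def predict(weights, bias, x):
    # Step activation: output 1 when the weighted sum is non-negative.
    activation = bias + sum(w * x_j for w, x_j in zip(weights, x))
    return 1 if activation >= 0 else 0

def train_perceptron(X, y, learning_rate=0.01, epochs=20):
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for x_i, expected in zip(X, y):
            # error is 0 when the prediction is right, so nothing changes.
            error = expected - predict(weights, bias, x_i)
            bias += learning_rate * error
            # w = w + learning_rate * (expected - predicted) * x
            weights = [w + learning_rate * error * x_j
                       for w, x_j in zip(weights, x_i)]
    return weights, bias

# Logical OR: linearly separable, so the perceptron can fit it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 1]
weights, bias = train_perceptron(X, y)
print([predict(weights, bias, x_i) for x_i in X])
```

Each instance updates the weights immediately, which is what makes the procedure stochastic rather than batch gradient descent.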

In [94]:
from sklearn.linear_model import Perceptron

#Same As SGD:
#SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None)

###  Voted Perceptron for Global Features

In [95]:
clf_Perceptron1 = Perceptron()
clf_Perceptron1.fit(x_train, y_train)
y_pred_clf_Perceptron1=clf_Perceptron1.predict(x_test)
cohen_score_clf_Perceptron1 = cohen_kappa_score(y_test, y_pred_clf_Perceptron1)
print(cohen_score_clf_Perceptron1)

0.007914878339344233


###  Voted Perceptron for Local Features

In [96]:
clf_Perceptron2 = Perceptron()
clf_Perceptron2.fit(x_train1, y_train1)
y_pred_clf_Perceptron2=clf_Perceptron2.predict(x_test1)
cohen_score_clf_Perceptron2 = cohen_kappa_score(y_test1, y_pred_clf_Perceptron2)
print(cohen_score_clf_Perceptron2)

0.0


###  Voted Perceptron for Global and Local Features

In [97]:
clf_Perceptron3 = Perceptron()
clf_Perceptron3.fit(x_train2, y_train2)
y_pred_clf_Perceptron3=clf_Perceptron3.predict(x_test2)
cohen_score_clf_Perceptron3 = cohen_kappa_score(y_test2, y_pred_clf_Perceptron3)
print(cohen_score_clf_Perceptron3)

0.06851553675710753


###  Voted Perceptron for Global, Local and Contextual Features

In [98]:
clf_Perceptron4 = Perceptron()
clf_Perceptron4.fit(x_train3, y_train3)
y_pred_clf_Perceptron4=clf_Perceptron4.predict(x_test3)
cohen_score_clf_Perceptron4 = cohen_kappa_score(y_test3, y_pred_clf_Perceptron4)
print(cohen_score_clf_Perceptron4)

0.0
