# Imports

In [1]:
%matplotlib inline

import scipy as sp
from scipy import stats
import numpy as np
import pandas as pd

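# Project-local analysis module; note that its name shadows the standard-library `statistics` module.
import statistics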

# Load Data

In [2]:
tracing_df = statistics.load_tracing_features()
recording_df = statistics.load_recording_relations()
recording_df = statistics.associate_recordings(recording_df, tracing_df)
gaze_dfs = statistics.load_gaze_features()
statistics.check_gaze_recording_associations(recording_df, gaze_dfs)
statistics.compute_gaze_features(gaze_dfs)
gaze_df = statistics.combine_gaze_features(gaze_dfs)
full_df = statistics.combine_all_features(recording_df, gaze_df)
full_df.columns

Index(['id', 'subjectNumber', 'scenarioNumber', 'newAfterOld', 'scenarioType',
       'displayType', 'sensorPlacementTime', 'ppvStartTime', 'ccStartTime',
       'inSpO2TargetRangeDuration', 'inSpO2LooseTargetRangeDuration',
       'inSpO2TargetRangeStartTime', 'aboveSpO2TargetRangeDuration',
       'belowSpO2TargetRangeDuration', 'inFiO2TargetRangeDuration',
       'inFiO2TargetRangeStartTime', 'aboveFiO2TargetRangeDuration',
       'belowFiO2TargetRangeDuration', 'spO2SignedErrorIntegral',
       'spO2UnsignedErrorIntegral', 'spO2SquaredErrorIntegral',
       'fiO2LargeAdjustments', 'code', 'visitDuration_fiO2Dial',
       'visitDuration_infant', 'visitDuration_monitorApgarTimer',
       'visitDuration_monitorBlank', 'visitDuration_monitorFiO2',
       'visitDuration_monitorFull', 'visitDuration_monitorGraph',
       'visitDuration_monitorHeartRate', 'visitDuration_monitorSpO2',
       'visitDuration_spO2ReferenceTable',
       'visitDuration_warmerInstrumentPanel', 'visitDuration_co

# Pairing

### Scenario Type Pairing

In [3]:
scenario_pairing = statistics.build_pairing(full_df, 'scenarioType')

In [4]:
scenario_pairing.describe()

Pairing against scenarioType:
  0: easy vs. 1: hard
  28 0 vs. 1 pairs.
  Paired t-test alternative hypotheses:
    Ha left-tailed (diff < 0): mean 0 - mean 1 < 0
    Ha two-tailed (|diff| > 0): mean 0 - mean 1 != 0
    Ha right-tailed (diff > 0): mean 0 - mean 1 > 0
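
The three alternative hypotheses listed above can be illustrated with SciPy directly. This is a minimal sketch on synthetic data, not the implementation inside the project's `statistics` module (which is not shown here); the arrays `condition_0` and `condition_1` are made-up stand-ins for the paired outcome values, and the `alternative` argument of `scipy.stats.ttest_rel` assumes SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic paired measurements for 28 hypothetical subjects (not study data).
condition_0 = rng.normal(loc=10.0, scale=4.0, size=28)
condition_1 = condition_0 + rng.normal(loc=1.0, scale=3.0, size=28)

# Run one paired t-test per alternative hypothesis on
# diff = condition_0 - condition_1.
p_values = {
    alternative: stats.ttest_rel(
        condition_0, condition_1, alternative=alternative
    ).pvalue
    for alternative in ('less', 'two-sided', 'greater')
}

for alternative, p in p_values.items():
    print(f'{alternative}: p = {p:.3f}')
```

The two one-tailed p-values sum to 1, and the two-tailed p-value is twice the smaller of them. The outputs below follow the same pattern (for example, 0.164 and 0.836, with a two-tailed value of 0.327).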


In [5]:
statistics.test_gaze_count_outcomes(scenario_pairing)

visitCount_infant:
  mean diff = -2.179; stdev diff = 11.342
  Paired t-test:
    |diff| > 0: p = 0.327
   ~diff < 0: p = 0.164
    diff > 0: p = 0.836
visitCount_warmerInstrumentPanel:
  mean diff = 1.464; stdev diff = 10.622
  Paired t-test:
    |diff| > 0: p = 0.480
    diff < 0: p = 0.760
    diff > 0: p = 0.240
visitCount_fiO2Dial:
  mean diff = 1.929; stdev diff = 5.910
  Paired t-test:
   ~|diff| > 0: p = 0.101
    diff < 0: p = 0.949
   *diff > 0: p = 0.051
visitCount_spO2ReferenceTable:
  mean diff = 3.321; stdev diff = 7.000
  Paired t-test:
  **|diff| > 0: p = 0.020
    diff < 0: p = 0.990
  **diff > 0: p = 0.010
visitCount_monitorFull:
  mean diff = 3.357; stdev diff = 12.007
  Paired t-test:
   ~|diff| > 0: p = 0.158
    diff < 0: p = 0.921
   *diff > 0: p = 0.079
visitCount_monitorBlank:
  mean diff = 0.464; stdev diff = 11.268
  Paired t-test:
    |diff| > 0: p = 0.832
    diff < 0: p = 0.584
    diff > 0: p = 0.416
visitCount_monitorApgarTimer:
  mean diff = 0.286; stde

Observations:

* Subjects look at the FiO2 dial maybe more frequently, SpO2 reference table more frequently, the monitor maybe more frequently, the heart rate maybe less frequently, the SpO2 number more frequently, and the combined SpO2 elements more frequently in the easy scenario than in the hard scenario.
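
The `**`, `*`, and `~` prefixes in the test output are not documented in this notebook. Judging only from the printed p-values, they appear to mark p < 0.05, p < 0.1, and p < 0.2 respectively, and the "maybe" wording in the observations seems to be used mostly for the weaker markers. The sketch below encodes that reading; the function name `significance_marker` and the exact thresholds (including whether the comparisons are strict) are assumptions, not the project's code.

```python
def significance_marker(p):
    """Return the marker the outputs appear to use (thresholds are inferred)."""
    if p < 0.05:
        return '**'
    if p < 0.1:
        return '*'
    if p < 0.2:
        return '~'
    return ''


# p-values taken from the scenario type pairing output above.
for p in (0.010, 0.051, 0.164, 0.327):
    print(f'{significance_marker(p):>2} p = {p:.3f}')
```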

### Display Type Pairing

In [6]:
display_pairing = statistics.build_pairing(full_df, 'displayType')

In [7]:
display_pairing.describe()

Pairing against displayType:
  0: minimal vs. 1: full
  32 0 vs. 1 pairs.
  Paired t-test alternative hypotheses:
    Ha left-tailed (diff < 0): mean 0 - mean 1 < 0
    Ha two-tailed (|diff| > 0): mean 0 - mean 1 != 0
    Ha right-tailed (diff > 0): mean 0 - mean 1 > 0


In [8]:
statistics.test_gaze_count_outcomes(display_pairing)

visitCount_infant:
  mean diff = 1.281; stdev diff = 8.614
  Paired t-test:
    |diff| > 0: p = 0.414
    diff < 0: p = 0.793
    diff > 0: p = 0.207
visitCount_warmerInstrumentPanel:
  mean diff = -0.156; stdev diff = 8.931
  Paired t-test:
    |diff| > 0: p = 0.923
    diff < 0: p = 0.462
    diff > 0: p = 0.538
visitCount_fiO2Dial:
  mean diff = 2.594; stdev diff = 6.118
  Paired t-test:
  **|diff| > 0: p = 0.025
    diff < 0: p = 0.988
  **diff > 0: p = 0.012
visitCount_spO2ReferenceTable:
  mean diff = 1.312; stdev diff = 5.451
  Paired t-test:
   ~|diff| > 0: p = 0.190
    diff < 0: p = 0.905
   *diff > 0: p = 0.095
visitCount_monitorFull:
  mean diff = 4.625; stdev diff = 12.267
  Paired t-test:
  **|diff| > 0: p = 0.044
    diff < 0: p = 0.978
  **diff > 0: p = 0.022
visitCount_monitorBlank:
  mean diff = 8.906; stdev diff = 12.943
  Paired t-test:
  **|diff| > 0: p = 0.001
    diff < 0: p = 1.000
  **diff > 0: p = 0.000
visitCount_monitorApgarTimer:
  mean diff = 12.875; stdev

Observations:

* Subjects look at the FiO2 dial less frequently, the SpO2 reference table maybe less frequently, the monitor less frequently, the blank parts of the monitor less frequently, the Apgar timer less frequently, the heart rate less frequently, the SpO2 number less frequently, the combined FiO2 elements more frequently, and the combined SpO2 elements more frequently in the full display than in the minimal display.

### Display Type Pairing, Split by Scenario

In [9]:
scenario_display_pairings = {
    scenario: statistics.build_pairing(scenario_subset, 'displayType')
    for (scenario, scenario_subset) in enumerate(scenario_pairing)
}
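
Splitting by scenario and then pairing on display type amounts to computing, within each scenario type, one minimal-minus-full difference per subject. The internals of `statistics.build_pairing` are not shown in this notebook, so the following is only a hypothetical pandas equivalent: `toy_df` and `paired_differences` are illustrative names, the values are made up, and only the column names come from `full_df`.

```python
import pandas as pd

# Toy stand-in for full_df; the values are invented for illustration.
toy_df = pd.DataFrame({
    'subjectNumber': [1, 1, 1, 1, 2, 2, 2, 2],
    'scenarioType': [0, 0, 1, 1, 0, 0, 1, 1],
    'displayType': [0, 1, 0, 1, 0, 1, 0, 1],
    'visitCount_infant': [12, 10, 20, 17, 8, 9, 15, 11],
})


def paired_differences(df, factor, outcome):
    """Pivot to one row per subject and return level 0 minus level 1 of `factor`."""
    wide = df.pivot(index='subjectNumber', columns=factor, values=outcome)
    return wide[0] - wide[1]


# One series of per-subject differences for each scenario type.
diffs_by_scenario = {
    scenario: paired_differences(subset, 'displayType', 'visitCount_infant')
    for scenario, subset in toy_df.groupby('scenarioType')
}

for scenario, diffs in diffs_by_scenario.items():
    print(scenario, diffs.tolist())
```

Each resulting series holds the differences that a paired t-test would then be run on, which is why each scenario split has half as many rows per display type as the full data set.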

#### Easy Scenarios

In [10]:
scenario_display_pairings[0].describe()

Pairing against displayType:
  0: minimal vs. 1: full
  14 0 vs. 1 pairs.
  Paired t-test alternative hypotheses:
    Ha left-tailed (diff < 0): mean 0 - mean 1 < 0
    Ha two-tailed (|diff| > 0): mean 0 - mean 1 != 0
    Ha right-tailed (diff > 0): mean 0 - mean 1 > 0


In [11]:
statistics.test_gaze_count_outcomes(scenario_display_pairings[0])

visitCount_infant:
  mean diff = -0.357; stdev diff = 9.347
  Paired t-test:
    |diff| > 0: p = 0.893
    diff < 0: p = 0.446
    diff > 0: p = 0.554
visitCount_warmerInstrumentPanel:
  mean diff = -1.214; stdev diff = 7.803
  Paired t-test:
    |diff| > 0: p = 0.584
    diff < 0: p = 0.292
    diff > 0: p = 0.708
visitCount_fiO2Dial:
  mean diff = 2.357; stdev diff = 6.183
  Paired t-test:
   ~|diff| > 0: p = 0.193
    diff < 0: p = 0.904
   *diff > 0: p = 0.096
visitCount_spO2ReferenceTable:
  mean diff = -0.714; stdev diff = 5.861
  Paired t-test:
    |diff| > 0: p = 0.668
    diff < 0: p = 0.334
    diff > 0: p = 0.666
visitCount_monitorFull:
  mean diff = 1.786; stdev diff = 12.084
  Paired t-test:
    |diff| > 0: p = 0.603
    diff < 0: p = 0.698
    diff > 0: p = 0.302
visitCount_monitorBlank:
  mean diff = 8.214; stdev diff = 7.370
  Paired t-test:
  **|diff| > 0: p = 0.001
    diff < 0: p = 0.999
  **diff > 0: p = 0.001
visitCount_monitorApgarTimer:
  mean diff = 11.000; stde

Observations:

* Subjects look at the FiO2 dial maybe less frequently, the blank parts of the monitor less frequently, the Apgar timer less frequently, the heart rate maybe less frequently, the combined FiO2 elements more frequently, and the combined SpO2 elements more frequently in the full display than in the minimal display.

#### Hard Scenarios

In [12]:
scenario_display_pairings[1].describe()

Pairing against displayType:
  0: minimal vs. 1: full
  14 0 vs. 1 pairs.
  Paired t-test alternative hypotheses:
    Ha left-tailed (diff < 0): mean 0 - mean 1 < 0
    Ha two-tailed (|diff| > 0): mean 0 - mean 1 != 0
    Ha right-tailed (diff > 0): mean 0 - mean 1 > 0


In [13]:
statistics.test_gaze_count_outcomes(scenario_display_pairings[1])

visitCount_infant:
  mean diff = 1.571; stdev diff = 6.422
  Paired t-test:
    |diff| > 0: p = 0.394
    diff < 0: p = 0.803
   ~diff > 0: p = 0.197
visitCount_warmerInstrumentPanel:
  mean diff = 0.000; stdev diff = 10.296
  Paired t-test:
    |diff| > 0: p = 1.000
    diff < 0: p = 0.500
    diff > 0: p = 0.500
visitCount_fiO2Dial:
  mean diff = 2.214; stdev diff = 5.401
  Paired t-test:
   ~|diff| > 0: p = 0.163
    diff < 0: p = 0.918
   *diff > 0: p = 0.082
visitCount_spO2ReferenceTable:
  mean diff = 1.643; stdev diff = 3.108
  Paired t-test:
   *|diff| > 0: p = 0.079
    diff < 0: p = 0.961
  **diff > 0: p = 0.039
visitCount_monitorFull:
  mean diff = 4.357; stdev diff = 10.913
  Paired t-test:
   ~|diff| > 0: p = 0.174
    diff < 0: p = 0.913
   *diff > 0: p = 0.087
visitCount_monitorBlank:
  mean diff = 10.000; stdev diff = 16.886
  Paired t-test:
   *|diff| > 0: p = 0.052
    diff < 0: p = 0.974
  **diff > 0: p = 0.026
visitCount_monitorApgarTimer:
  mean diff = 10.857; stde

Observations:

* Subjects look at the FiO2 dial maybe less frequently, the SpO2 reference table less frequently, the monitor maybe less frequently, the blank parts of the monitor less frequently, the Apgar timer less frequently, the heart rate less frequently, the SpO2 number maybe less frequently, the combined FiO2 elements more frequently, and the combined SpO2 elements more frequently in the full display than in the minimal display.

#### Summary

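* In both easy and hard scenarios, subjects look at the blank parts of the monitor and the Apgar timer less frequently, and at the combined FiO2 and combined SpO2 elements more frequently, in the full display than in the minimal display. Only the hard scenarios additionally show fewer looks at the SpO2 reference table.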

### Scenario Order Pairing, Split by Scenario

In [14]:
scenario_order_pairings = {
    0: statistics.build_pairing(scenario_pairing[0], 'scenarioNumber', values=(1, 4), check_validity=False),
    1: statistics.build_pairing(scenario_pairing[1], 'scenarioNumber', values=(2, 3), check_validity=False),
}

#### Easy Scenarios

In [15]:
scenario_order_pairings[0].describe()

Pairing against scenarioNumber:
  0: first vs. 1: second
  14 0 vs. 1 pairs.
  Paired t-test alternative hypotheses:
    Ha left-tailed (diff < 0): mean 0 - mean 1 < 0
    Ha two-tailed (|diff| > 0): mean 0 - mean 1 != 0
    Ha right-tailed (diff > 0): mean 0 - mean 1 > 0


In [16]:
statistics.test_gaze_count_outcomes(scenario_order_pairings[0])

visitCount_infant:
  mean diff = 1.786; stdev diff = 9.182
  Paired t-test:
    |diff| > 0: p = 0.496
    diff < 0: p = 0.752
    diff > 0: p = 0.248
visitCount_warmerInstrumentPanel:
  mean diff = 2.929; stdev diff = 7.334
  Paired t-test:
   ~|diff| > 0: p = 0.174
    diff < 0: p = 0.913
   *diff > 0: p = 0.087
visitCount_fiO2Dial:
  mean diff = 0.929; stdev diff = 6.552
  Paired t-test:
    |diff| > 0: p = 0.618
    diff < 0: p = 0.691
    diff > 0: p = 0.309
visitCount_spO2ReferenceTable:
  mean diff = 0.286; stdev diff = 5.897
  Paired t-test:
    |diff| > 0: p = 0.864
    diff < 0: p = 0.568
    diff > 0: p = 0.432
visitCount_monitorFull:
  mean diff = -0.929; stdev diff = 12.180
  Paired t-test:
    |diff| > 0: p = 0.788
    diff < 0: p = 0.394
    diff > 0: p = 0.606
visitCount_monitorBlank:
  mean diff = -1.500; stdev diff = 10.933
  Paired t-test:
    |diff| > 0: p = 0.629
    diff < 0: p = 0.315
    diff > 0: p = 0.685
visitCount_monitorApgarTimer:
  mean diff = -3.000; stde

No significant differences here.

#### Hard Scenarios

In [17]:
scenario_order_pairings[1].describe()

Pairing against scenarioNumber:
  0: first vs. 1: second
  14 0 vs. 1 pairs.
  Paired t-test alternative hypotheses:
    Ha left-tailed (diff < 0): mean 0 - mean 1 < 0
    Ha two-tailed (|diff| > 0): mean 0 - mean 1 != 0
    Ha right-tailed (diff > 0): mean 0 - mean 1 > 0


In [18]:
statistics.test_gaze_count_outcomes(scenario_order_pairings[1])

visitCount_infant:
  mean diff = -0.571; stdev diff = 6.587
  Paired t-test:
    |diff| > 0: p = 0.759
    diff < 0: p = 0.380
    diff > 0: p = 0.620
visitCount_warmerInstrumentPanel:
  mean diff = -5.000; stdev diff = 9.000
  Paired t-test:
   *|diff| > 0: p = 0.066
  **diff < 0: p = 0.033
    diff > 0: p = 0.967
visitCount_fiO2Dial:
  mean diff = -1.071; stdev diff = 5.738
  Paired t-test:
    |diff| > 0: p = 0.513
    diff < 0: p = 0.256
    diff > 0: p = 0.744
visitCount_spO2ReferenceTable:
  mean diff = -1.214; stdev diff = 3.299
  Paired t-test:
    |diff| > 0: p = 0.207
   ~diff < 0: p = 0.104
    diff > 0: p = 0.896
visitCount_monitorFull:
  mean diff = -5.214; stdev diff = 10.530
  Paired t-test:
   *|diff| > 0: p = 0.098
  **diff < 0: p = 0.049
    diff > 0: p = 0.951
visitCount_monitorBlank:
  mean diff = -9.143; stdev diff = 17.365
  Paired t-test:
   *|diff| > 0: p = 0.080
  **diff < 0: p = 0.040
    diff > 0: p = 0.960
visitCount_monitorApgarTimer:
  mean diff = 4.143; s

Observations:

* Participants in scenario 2 maybe look less at the instrument panel of the warmer, less at the monitor, and less at the blank parts of the monitor than in scenario 3.
* No other significant differences.

#### Summary

* There doesn't seem to be a learning or adaptation effect when we look at gaze counts. This is unlike the results for gaze durations.