# Control Analysis

This notebook examines the controlled trial that Stephen recorded to investigate a discrepancy: in ~20% of trials, the RealEye timeseries terminates after the corresponding Tobii timeseries. Given the testing protocol, this should not be possible.
As we cannot re-run the trials, we need to investigate the cause in order to salvage this data.

## Sequence comparison
Simply put: do all our invariants hold?


**Invariants** (Trial-level)
1. The Tobii timeseries starts before the RealEye timeseries
2. The Tobii timeseries ends after the RealEye timeseries


**Invariants** (Dataset-level)

3. A given RealEye timeseries is of a (relatively) fixed length  <!--TODO: what is that length?-->
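
Until the expected length is pinned down, one way to make invariant 3 checkable is to compare each record's duration against the dataset median. The following is a minimal sketch under that assumption: `flag_duration_outliers` and the 10% tolerance are hypothetical names and values chosen for illustration, not part of `RevChem` or the testing protocol.

```python
from datetime import datetime, timedelta
from statistics import median


def flag_duration_outliers(
    spans: list[tuple[datetime, datetime]],
    rel_tolerance: float = 0.10,  # hypothetical threshold, not from the protocol
) -> list[int]:
    """Return indices of spans whose duration differs from the median
    duration by more than rel_tolerance (as a fraction of the median)."""
    durations = [(end - start).total_seconds() for start, end in spans]
    typical = median(durations)
    return [
        i
        for i, d in enumerate(durations)
        if abs(d - typical) > rel_tolerance * typical
    ]


# Synthetic spans for illustration only (not taken from the dataset).
t0 = datetime(2025, 1, 1, 12, 0, 0)
example_spans = [
    (t0, t0 + timedelta(seconds=300)),
    (t0, t0 + timedelta(seconds=305)),
    (t0, t0 + timedelta(seconds=298)),
    (t0, t0 + timedelta(seconds=2)),  # a suspiciously short record
]
outliers = flag_duration_outliers(example_spans)
```

For a RealEye record dataframe, each span could be built from `df["timestamp"].min()` and `df["timestamp"].max()`, the same calls the trial-level checks use.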

In [1]:
from pathlib import Path
import polars as pl


from RevChem.realeye import iter_parse_raw_data
from RevChem.tobii import (
    find_tobii_realeye_df_pairs,
    read_realeye_raw_gazes_csv,
    unroll_realeye_dataframe_into_record_dataframes,
)

In [2]:
RAW_DATA_ROOT = "../../RevChemData/2025-05-14-Data_Export"
TOBII_ROOT = f"{RAW_DATA_ROOT}/Tobii-All-Snapshot"
REALEYE_ROOT = f"{RAW_DATA_ROOT}/RealEye"

In [3]:
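def check_tobii_starts_before_realeye(tobii_df: pl.DataFrame, realeye_df: pl.DataFrame) -> bool:
    """Invariant 1: the first Tobii sample is strictly earlier than the first RealEye sample."""
    return tobii_df["timestamp"].min() < realeye_df["timestamp"].min()

def check_tobii_ends_after_realeye(tobii_df: pl.DataFrame, realeye_df: pl.DataFrame) -> bool:
    """Invariant 2: the last Tobii sample is strictly later than the last RealEye sample."""
    return tobii_df["timestamp"].max() > realeye_df["timestamp"].max()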
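
To see how often each invariant fails across the dataset, the checks can be tallied over all pairs. This is a sketch only: `tally_violations` is a hypothetical helper, and it assumes the pairs are available as `(tobii, realeye)` tuples. How `find_tobii_realeye_df_pairs` actually yields them is not shown in this notebook. The demo below uses plain `(start, end)` tuples as stand-ins for dataframes.

```python
from collections import Counter
from collections.abc import Callable, Iterable
from typing import Any


def tally_violations(
    pairs: Iterable[tuple[Any, Any]],
    checks: dict[str, Callable[[Any, Any], bool]],
) -> Counter[str]:
    """Count, per named check, how many (tobii, realeye) pairs fail it."""
    violations: Counter[str] = Counter()
    for tobii, realeye in pairs:
        for name, check in checks.items():
            if not check(tobii, realeye):
                violations[name] += 1
    return violations


# Stand-in data: each series is a (start, end) tuple in arbitrary time units.
demo_pairs = [
    ((0, 10), (1, 9)),   # both invariants hold
    ((0, 10), (1, 12)),  # RealEye ends after Tobii: invariant 2 fails
]
demo_checks = {
    "starts_before": lambda tobii, realeye: tobii[0] < realeye[0],
    "ends_after": lambda tobii, realeye: tobii[1] > realeye[1],
}
demo_result = tally_violations(demo_pairs, demo_checks)
```

With real data, the two `check_*` functions defined in this cell could be passed in place of the lambdas.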


In [4]:
!ls "{TOBII_ROOT}/1.Realeye1,2,3 2025-05-05_Stephen-Kathy-Control.tsv"


../../RevChemData/2025-05-14-Data_Export/Tobii-All-Snapshot/1.Realeye1,2,3 2025-05-05_Stephen-Kathy-Control.tsv


In [5]:
all_re_dfs = unroll_realeye_dataframe_into_record_dataframes(read_realeye_raw_gazes_csv(Path(REALEYE_ROOT, "raw-gazes.csv")))

In [12]:
from datetime import datetime

In [17]:
dfs_aligned_with_control = [
    df
    for df in all_re_dfs
    if ((time := df["timestamp"][0]).date() == datetime(2025, 5, 5).date())
    and time.hour == 20
]

In [None]:
# 2025-06-
[(len(df), df.head(1)) for df in dfs_aligned_with_control]

[(32,
  shape: (1, 3)
  ┌─────────────────────────┬─────┬─────┐
  │ timestamp               ┆ X   ┆ Y   │
  │ ---                     ┆ --- ┆ --- │
  │ datetime[μs, UTC]       ┆ i64 ┆ i64 │
  ╞═════════════════════════╪═════╪═════╡
  │ 2025-05-05 20:21:40 UTC ┆ 813 ┆ 522 │
  └─────────────────────────┴─────┴─────┘),
 (7285,
  shape: (1, 3)
  ┌─────────────────────────┬─────┬─────┐
  │ timestamp               ┆ X   ┆ Y   │
  │ ---                     ┆ --- ┆ --- │
  │ datetime[μs, UTC]       ┆ i64 ┆ i64 │
  ╞═════════════════════════╪═════╪═════╡
  │ 2025-05-05 20:21:41 UTC ┆ 884 ┆ 514 │
  └─────────────────────────┴─────┴─────┘)]

### Note for those who follow
In my run, which lasts almost exactly 7:53 (and really starts at 0:40):
- Calibration takes almost exactly a minute, given no errors or recalibration
- I start RealEye calibration at 20:16:13, yet RealEye recorded the start time as roughly 5 minutes later (20:21:41)
    - My test spans timestamps [1:53, 7:24], which makes it 5 minutes 31 seconds
- When I next see the desktop screen it's 20:21:51.
- That's 10 seconds after the start time RealEye recorded (20:21:41).
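
The arithmetic behind these notes can be reproduced with `datetime`. This sketch assumes the wall-clock times noted by hand and the RealEye timestamps refer to the same clock and timezone, which the notebook does not confirm. The variable names are illustrative.

```python
from datetime import datetime, timedelta

# Test window inside the run, as mm:ss offsets from the notes.
test_start = timedelta(minutes=1, seconds=53)
test_end = timedelta(minutes=7, seconds=24)
test_duration = test_end - test_start

# Wall-clock events on 2025-05-05.
calibration_started = datetime(2025, 5, 5, 20, 16, 13)
realeye_first_sample = datetime(2025, 5, 5, 20, 21, 41)  # first row of the 7285-row record
desktop_visible = datetime(2025, 5, 5, 20, 21, 51)

# Delay between starting calibration and RealEye's recorded start.
calibration_to_realeye_start = realeye_first_sample - calibration_started

# Positive when the desktop reappears after RealEye's first sample.
desktop_vs_realeye_start = desktop_visible - realeye_first_sample

print(test_duration)
print(calibration_to_realeye_start)
print(desktop_vs_realeye_start)
```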


## Heatmap inspection
Having looked at the timeseries themselves, we now spot-check the visualization against what I remember doing during the run:
* While the triangle was on screen, I tried to stay focused on it
* When the ocean acidification stimulus was displayed, I looked all over the place after looking at the answer
* On the organic molecule, I believe I traced the structure of the molecule with my eyes