# 01 Map Sessions

## Overview

This notebook matches experiment CSV files with their corresponding EEG recording files based on Unix timestamps. The workflow includes:

1. **File Discovery**: Scans the Data directory for experiment CSV files and EEG CSV files
2. **Timestamp Extraction**: Reads the start time of each file and derives its end time (from trial onsets for experiment files, from the last row for EEG files)
3. **Time Matching**: Compares timestamps to find the best EEG recording match for each experiment session
4. **Validation**: Calculates time offset and coverage to verify match quality
5. **Mapping Export**: Creates a session mapping CSV linking experiment files to EEG recordings

**Input**: 
- Experiment CSV files: `01_human-llm-alignment_YYYY-MM-DD_HHhMM.SS.mmm.csv`
- EEG CSV files: `EEG_data_<unix_timestamp>.csv` (for example `EEG_data_1763373596.csv`)

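**Output**: `session_mapping.csv` with the following main columns:
- `experiment_file`: Name of the experiment CSV file
- `eeg_file`: Name of the matched EEG CSV file, several names joined by ` + ` when segments are combined, or `NO MATCH`
- `time_offset_min`: Experiment start minus EEG start, in minutes (negative when the EEG recording starts after the experiment)
- `coverage`: Percentage of the experiment window covered by the EEG recording, stored as a string such as `60.3%`

The file also contains `experiment_date`, `n_trials`, `exp_duration_min`, `overlap_min`, `is_complete`, `is_combined`, and `n_segments`.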

**Note**: For the EEG files, this notebook reads only the header, the first data row, and the last line (by seeking to the end of the file), which is roughly 10-100x faster than loading entire files. Experiment CSVs are loaded in full with pandas.
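The seek-to-the-end idea in the note can be sketched as a small standalone helper. This is a minimal sketch, not the notebook's own function: `read_last_line` and its `chunk_size` parameter are hypothetical names. It opens the file in binary mode and doubles the read window until the last line is complete.

```python
import os


def read_last_line(path, chunk_size=2048):
    """Return the last non-empty line of a text file without reading it all.

    Hypothetical helper for illustration; assumes UTF-8 encoded CSV files.
    """
    with open(path, "rb") as f:  # binary mode: byte offsets are well defined
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        read_size = min(chunk_size, file_size)
        while True:
            f.seek(file_size - read_size)
            lines = f.read(read_size).splitlines()
            # Drop trailing blank lines (e.g. a file that ends with "\n\n").
            while lines and not lines[-1].strip():
                lines.pop()
            # The last line is complete once the window also holds part of an
            # earlier line, or once the window spans the whole file.
            if len(lines) > 1 or read_size == file_size:
                return lines[-1].decode("utf-8") if lines else ""
            read_size = min(read_size * 2, file_size)
```

Only the tail of the file is read, so the cost depends on the length of the last line and not on the size of the recording.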

## 1. Import Libraries


In [1]:
import pandas as pd
import numpy as np
from pathlib import Path
from datetime import datetime
import glob

## 2. Find all Files

In [2]:
data_dir = Path('./Data')
asc_files_dir = Path('./asc_files')

# Find all experiment CSV files (with complete data) and ASC eye-tracking files
exp_files = sorted([f for f in data_dir.glob('01_human-llm-alignment_*.csv')])
eye_files = sorted([g for g in asc_files_dir.glob('*.asc')])

# Find all EEG and Eyetracking files
eeg_csv_files = sorted(data_dir.glob('EEG_data_*.csv'))
eyetracking_edf_files = sorted(data_dir.glob('*.EDF'))

print(f"Found:")
print(f"  Experiment CSVs: {len(exp_files)}")
print(f"  EEG CSV Files: {len(eeg_csv_files)}")
print(f"  Eye-Tracking EDF Files: {len(eyetracking_edf_files)}")
print(f"  Eye-Tracking ASC Files: {len(eye_files)}")

Found:
  Experiment CSVs: 103
  EEG CSV Files: 29
  Eye-Tracking EDF Files: 5
  Eye-Tracking ASC Files: 13
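
The experiment filenames carry a timestamp in the `YYYY-MM-DD_HHhMM.SS.mmm` pattern listed under Input, so a rough start time can be read from the name alone as a sanity check against `el_recording.started_Unix`. The sketch below assumes that pattern holds for every file. `parse_experiment_stamp` is a hypothetical helper, and the result is a naive datetime in whatever local time the recording computer used.

```python
import re
from datetime import datetime

# Pattern from the Input description:
# 01_human-llm-alignment_YYYY-MM-DD_HHhMM.SS.mmm.csv
_STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}_\d{2}h\d{2}\.\d{2}\.\d{3})\.csv$")


def parse_experiment_stamp(filename):
    """Return the timestamp embedded in an experiment filename, or None."""
    match = _STAMP.search(filename)
    if match is None:
        return None
    # %f accepts the three-digit millisecond field and stores it as microseconds.
    return datetime.strptime(match.group(1), "%Y-%m-%d_%Hh%M.%S.%f")


stamp = parse_experiment_stamp("01_human-llm-alignment_2025-11-17_11h36.44.912.csv")
print(stamp)
```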


## 3. Extract Timestamps from Experiment Files

In [3]:
def get_experiment_timestamps(csv_file):
    """Extract start and end time from experiment CSV."""
    try:
        df = pd.read_csv(csv_file, engine='python', on_bad_lines='skip')
        
        # Get el_recording.started_Unix timestamp
        if 'el_recording.started_Unix' not in df.columns:
            return None
            
        start_unix = df['el_recording.started_Unix'].dropna()
        if len(start_unix) == 0:
            return None
            
        start = start_unix.iloc[0]
        
        # Determine number of trials from ScenarioLoop (more accurate than counting rows)
        n_trials = 0
        if 'ScenarioLoop.thisN' in df.columns:
            max_trial_n = df['ScenarioLoop.thisN'].dropna()
            if len(max_trial_n) > 0:
                # thisN is 0-indexed, so max + 1 = total trials
                n_trials = int(max_trial_n.max()) + 1
        
        # Fall back to counting AI_Response events if ScenarioLoop not available
        if n_trials == 0 and 'AI_Response.started' in df.columns:
            ai_response_times = df['AI_Response.started'].dropna()
            n_trials = len(ai_response_times)
        
        # Skip files with very few trials (likely test runs)
        if n_trials < 10:
            return None
        
        # Calculate duration and end time
        if 'AI_Response.started' in df.columns:
            ai_response_times = df['AI_Response.started'].dropna()
            if len(ai_response_times) > 0:
                duration = ai_response_times.max()
            else:
                duration = 3000  # default 50 min estimate
        else:
            duration = 3000  # default estimate
            
        end = start + duration + 300  # +5 minutes buffer
        
        return {
            'file': csv_file.name,
            'start_unix': start,
            'end_unix': end,
            'duration': duration,
            'n_trials': n_trials,
            'date': datetime.fromtimestamp(start).strftime('%Y-%m-%d %H:%M:%S')
        }
        
    except Exception as e:
        print(f"Error at {csv_file.name}: {e}")
        return None

# Collect info for all experiment files
exp_info = []
for f in exp_files:
    info = get_experiment_timestamps(f)
    if info:
        exp_info.append(info)

df_exp = pd.DataFrame(exp_info)
print(f"\nComplete experiments: {len(df_exp)}")

# Show all experiments (not just first 10)
df_exp


Complete experiments: 20


Unnamed: 0,file,start_unix,end_unix,duration,n_trials,date
0,01_human-llm-alignment_2025-11-17_11h36.44.912...,1763376000.0,1763378000.0,2158.411714,50,2025-11-17 11:38:44
1,01_human-llm-alignment_2025-11-20_13h16.37.791...,1763641000.0,1763644000.0,2247.667509,39,2025-11-20 13:17:08
2,01_human-llm-alignment_2025-11-20_15h02.07.504...,1763647000.0,1763650000.0,2642.800302,49,2025-11-20 15:04:29
3,01_human-llm-alignment_2025-11-20_16h25.12.833...,1763652000.0,1763655000.0,2036.464338,46,2025-11-20 16:25:37
4,01_human-llm-alignment_2025-11-24_14h13.05.529...,1763990000.0,1763994000.0,3471.866908,49,2025-11-24 14:15:13
5,01_human-llm-alignment_2025-11-27_09h44.35.888...,1764233000.0,1764234000.0,619.979068,16,2025-11-27 09:48:52
6,01_human-llm-alignment_2025-11-27_10h18.29.349...,1764236000.0,1764237000.0,1088.306706,20,2025-11-27 10:25:33
7,01_human-llm-alignment_2025-11-27_10h49.01.727...,1764237000.0,1764239000.0,1625.223342,50,2025-11-27 10:49:15
8,01_human-llm-alignment_2025-11-27_11h29.15.053...,1764239000.0,1764241000.0,1723.697768,50,2025-11-27 11:29:51
9,01_human-llm-alignment_2025-11-27_12h50.01.880...,1764244000.0,1764247000.0,2194.582026,47,2025-11-27 12:51:00
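
The trial count and the session window used above can be written out on plain lists. The function names `count_trials` and `estimate_window` are hypothetical, and the inputs are assumed to have missing values removed already. Like the notebook, the sketch treats the largest `AI_Response.started` value as the session duration in seconds, which assumes those onsets are measured from the start of the session.

```python
def count_trials(this_n_values, ai_response_onsets):
    """Trial count: max(thisN) + 1 (thisN is 0-indexed), else the number of onsets."""
    if this_n_values:
        return int(max(this_n_values)) + 1
    return len(ai_response_onsets)


def estimate_window(start_unix, ai_response_onsets,
                    default_duration=3000.0, buffer_s=300.0):
    """Return (start, end) in Unix seconds, with a buffer added to the end."""
    duration = max(ai_response_onsets) if ai_response_onsets else default_duration
    return start_unix, start_unix + duration + buffer_s


print(count_trials([0, 1, 2, 49], []))
print(estimate_window(1000.0, [10.0, 2158.0]))
```

Because the 300-second buffer is part of the window, the coverage computed later is relative to a window five minutes longer than the `duration` column.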


## 4. Extract EEG Timestamps

In [5]:
def get_eeg_csv_timestamps(csv_file):
    """Extract start and end time from EEG CSV."""
    try:
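        # Read only the header, the first data row, and the tail of the file
        # (much faster than loading the whole file).
        # Binary mode is used because seeking to an arbitrary byte offset
        # is not reliable in text mode.
        with open(csv_file, 'rb') as f:
            # First line (header)
            header = f.readline().decode('utf-8').strip().split(',')
            # Second line (first data row)
            first_line = f.readline().decode('utf-8').strip().split(',')
            
            # Jump to the end of the file to get its size
            f.seek(0, 2)
            file_size = f.tell()
            
            # Read the last ~2000 bytes (should contain several lines)
            offset = min(2000, file_size)
            f.seek(file_size - offset)
            tail = f.read().decode('utf-8', errors='ignore')
            # Ignore blank lines so a trailing newline cannot become the "last line"
            lines = [ln for ln in tail.splitlines() if ln.strip()]
            last_line = lines[-1].strip().split(',')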
        
        # Find Time column index
        time_idx = header.index('Time')
        
        start_time = float(first_line[time_idx])
        end_time = float(last_line[time_idx])
        
        return {
            'file': csv_file.name,
            'start_unix': start_time,
            'end_unix': end_time,
            'duration': end_time - start_time,
            'date': datetime.fromtimestamp(start_time).strftime('%Y-%m-%d %H:%M:%S')
        }
    except Exception as e:
        print(f"Error at {csv_file.name}: {e}")
        return None

# Collect EEG info
eeg_info = []
for f in eeg_csv_files:
    info = get_eeg_csv_timestamps(f)
    if info:
        eeg_info.append(info)

df_eeg = pd.DataFrame(eeg_info)
print(f"\nEEG CSV files: {len(df_eeg)}")
df_eeg


EEG CSV files: 29


Unnamed: 0,file,start_unix,end_unix,duration,date
0,EEG_data_1763373596.csv,1763374000.0,1763377000.0,3808.189084,2025-11-17 10:59:57
1,EEG_data_1763640940.csv,1763641000.0,1763643000.0,2106.895494,2025-11-20 13:15:41
2,EEG_data_1763647289.csv,1763647000.0,1763648000.0,443.182222,2025-11-20 15:01:30
3,EEG_data_1763652280.csv,1763652000.0,1763652000.0,110.268764,2025-11-20 16:24:41
4,EEG_data_1763989917.csv,1763990000.0,1763990000.0,293.910738,2025-11-24 14:11:58
5,EEG_data_1763990890.csv,1763991000.0,1763993000.0,2062.123921,2025-11-24 14:28:10
6,EEG_data_1764232442.csv,1764232000.0,1764234000.0,1310.406297,2025-11-27 09:34:03
7,EEG_data_1764235080.csv,1764235000.0,1764239000.0,3514.616303,2025-11-27 10:18:01
8,EEG_data_1764239330.csv,1764239000.0,1764240000.0,310.408256,2025-11-27 11:28:50
9,EEG_data_1764244046.csv,1764244000.0,1764247000.0,2472.474455,2025-11-27 12:47:27
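
In the table above, the number in each EEG filename looks like a Unix timestamp close to the recording start (for example `EEG_data_1763373596.csv` starts at 10:59:57). Assuming that holds in general, the filename can serve as a cheap cross-check on the first `Time` value. The helper names and the 5-second tolerance below are hypothetical.

```python
import re


def eeg_filename_timestamp(filename):
    """Return the integer embedded in 'EEG_data_<number>.csv', or None."""
    match = re.fullmatch(r"EEG_data_(\d+)\.csv", filename)
    return int(match.group(1)) if match else None


def filename_agrees(filename, first_time, tolerance_s=5.0):
    """True if the filename number is within tolerance_s of the first Time value."""
    stamp = eeg_filename_timestamp(filename)
    return stamp is not None and abs(first_time - stamp) <= tolerance_s


print(eeg_filename_timestamp("EEG_data_1763373596.csv"))
```

A file that fails this check may have a shifted clock or a truncated first row, and is worth inspecting by hand before matching.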


## 5. Match Experiment ↔ EEG Based on Timestamps

In [6]:
def find_matching_eeg(exp_row, df_eeg, min_coverage_percent=10, allow_multiple=True):
    """Find matching EEG file(s) for an experiment.
    
    Args:
        exp_row: Experiment info
        df_eeg: DataFrame with EEG info
        min_coverage_percent: Minimum coverage percentage to consider a match (default: 10%)
        allow_multiple: If True, can combine multiple EEG segments if there are gaps
    
    Returns:
        dict with match info, or None if no match found
    """
    exp_start = exp_row['start_unix']
    exp_end = exp_row['end_unix']
    exp_duration = exp_end - exp_start
    
    # Find EEG files whose time range overlaps with the experiment
    matches = []
    for idx, eeg_row in df_eeg.iterrows():
        eeg_start = eeg_row['start_unix']
        eeg_end = eeg_row['end_unix']
        
        # Calculate overlap
        overlap_start = max(exp_start, eeg_start)
        overlap_end = min(exp_end, eeg_end)
        overlap_duration = max(0, overlap_end - overlap_start)
        
        coverage_percent = (overlap_duration / exp_duration) * 100
        
        # Require minimum coverage
        if coverage_percent >= min_coverage_percent:
            # Calculate time offset (negative = EEG starts after experiment)
            time_offset = exp_start - eeg_start
            
            matches.append({
                'eeg_file': eeg_row['file'],
                'offset_seconds': time_offset,
                'offset_minutes': time_offset / 60,
                'coverage_percent': coverage_percent,
                'coverage': f'{coverage_percent:.1f}%',
                'coverage_seconds': overlap_duration,
                'overlap_minutes': overlap_duration / 60,
                'is_complete': eeg_end >= exp_end,
                'eeg_start': eeg_start,
                'eeg_end': eeg_end
            })
    
    if not matches:
        return None
    
    # If allow_multiple and no single file covers >80%, try combining multiple segments
    if allow_multiple:
        best_single = max(matches, key=lambda x: x['coverage_seconds'])
        
        if best_single['coverage_percent'] < 80:
            # Try to find adjacent EEG files that can be combined
            matches_sorted = sorted(matches, key=lambda x: x['eeg_start'])
            
            if len(matches_sorted) > 1:
                # Check if we can combine consecutive segments to improve coverage
                total_coverage = sum([m['coverage_seconds'] for m in matches_sorted])
                combined_coverage_percent = (total_coverage / exp_duration) * 100
                
                if combined_coverage_percent > best_single['coverage_percent']:
                    # Return combined info
                    eeg_files = [m['eeg_file'] for m in matches_sorted]
                    return {
                        'eeg_file': ' + '.join(eeg_files),  # Mark as combined
                        'offset_minutes': matches_sorted[0]['offset_minutes'],
                        'coverage': f'{combined_coverage_percent:.1f}%',
                        'coverage_seconds': total_coverage,
                        'overlap_minutes': total_coverage / 60,
                        'is_complete': matches_sorted[-1]['is_complete'],
                        'is_combined': True,
                        'n_segments': len(eeg_files)
                    }
        
        return best_single
    else:
        # Choose match with best coverage (longest overlap)
        return max(matches, key=lambda x: x['coverage_seconds'])

# Match all experiments
session_map = []
for idx, exp in df_exp.iterrows():
    match = find_matching_eeg(exp, df_eeg, min_coverage_percent=15, allow_multiple=True)
    session_map.append({
        'experiment_file': exp['file'],
        'experiment_date': exp['date'],
        'n_trials': exp['n_trials'],
        'exp_duration_min': exp['duration'] / 60,
        'eeg_file': match['eeg_file'] if match else 'NO MATCH',
        'time_offset_min': match['offset_minutes'] if match else None,
        'coverage': match['coverage'] if match else None,
        'overlap_min': match['overlap_minutes'] if match else None,
        'is_complete': match['is_complete'] if match else False,
        'is_combined': match.get('is_combined', False) if match else False,
        'n_segments': match.get('n_segments', 1) if match else None
    })

df_sessions = pd.DataFrame(session_map)
print(f"\nSession Mapping (≥15% coverage, with multiple segment support):")
print(f"  Matched: {df_sessions['eeg_file'].ne('NO MATCH').sum()}")
print(f"  Unmatched: {df_sessions['eeg_file'].eq('NO MATCH').sum()}")
print(f"  Combined (multiple segments): {df_sessions['is_combined'].sum()}")
print(f"  Complete coverage: {df_sessions['is_complete'].sum()}")

# Show combined sessions
combined_sessions = df_sessions[df_sessions['is_combined'] == True]
if len(combined_sessions) > 0:
    print(f"\n⚠️  Sessions using multiple EEG segments:")
    for idx, row in combined_sessions.iterrows():
        print(f"    {row['experiment_file']}: {row['eeg_file']} ({row['coverage']})")

df_sessions



Session Mapping (≥15% coverage, with multiple segment support):
  Matched: 17
  Unmatched: 3
  Combined (multiple segments): 4
  Complete coverage: 1

⚠️  Sessions using multiple EEG segments:
    01_human-llm-alignment_2025-12-01_14h10.59.014.csv: EEG_data_1764594621.csv + EEG_data_1764597232.csv (54.7%)
    01_human-llm-alignment_2025-12-01_16h21.26.160.csv: EEG_data_1764602406.csv + EEG_data_1764604070.csv (45.0%)
    01_human-llm-alignment_2025-12-04_13h15.42.235.csv: EEG_data_1764850359.csv + EEG_data_1764851971.csv (62.5%)
    01_human-llm-alignment_2025-12-04_14h43.42.571.csv: EEG_data_1764855733.csv + EEG_data_1764856407.csv (62.6%)


Unnamed: 0,experiment_file,experiment_date,n_trials,exp_duration_min,eeg_file,time_offset_min,coverage,overlap_min,is_complete,is_combined,n_segments
0,01_human-llm-alignment_2025-11-17_11h36.44.912...,2025-11-17 11:38:44,50,35.973529,EEG_data_1763373596.csv,38.780035,60.3%,24.689783,False,False,1.0
1,01_human-llm-alignment_2025-11-20_13h16.37.791...,2025-11-20 13:17:08,39,37.461125,EEG_data_1763640940.csv,1.459425,79.3%,33.6555,False,False,1.0
2,01_human-llm-alignment_2025-11-20_15h02.07.504...,2025-11-20 15:04:29,49,44.046672,NO MATCH,,,,False,False,
3,01_human-llm-alignment_2025-11-20_16h25.12.833...,2025-11-20 16:25:37,46,33.941072,NO MATCH,,,,False,False,
4,01_human-llm-alignment_2025-11-24_14h13.05.529...,2025-11-24 14:15:13,49,57.864448,EEG_data_1763990890.csv,-12.952562,54.7%,34.368732,False,False,1.0
5,01_human-llm-alignment_2025-11-27_09h44.35.888...,2025-11-27 09:48:52,16,10.332984,EEG_data_1764232442.csv,14.825637,45.7%,7.014468,False,False,1.0
6,01_human-llm-alignment_2025-11-27_10h18.29.349...,2025-11-27 10:25:33,20,18.138445,EEG_data_1764235080.csv,7.53676,100.0%,23.138445,True,False,1.0
7,01_human-llm-alignment_2025-11-27_10h49.01.727...,2025-11-27 10:49:15,50,27.087056,EEG_data_1764235080.csv,31.227853,85.2%,27.349085,False,False,1.0
8,01_human-llm-alignment_2025-11-27_11h29.15.053...,2025-11-27 11:29:51,50,28.728296,NO MATCH,,,,False,False,
9,01_human-llm-alignment_2025-11-27_12h50.01.880...,2025-11-27 12:51:00,47,36.576367,EEG_data_1764244046.csv,3.559874,90.6%,37.648034,False,False,1.0
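
The matching function sums the overlap of each EEG segment separately, which would count time twice if two EEG files overlapped each other. A variant that measures the union of the segments is sketched below. It is an illustration under that assumption, not the notebook's method, and `coverage_percent` is a hypothetical name.

```python
def coverage_percent(exp_start, exp_end, eeg_segments):
    """Percent of [exp_start, exp_end] covered by the union of EEG segments.

    eeg_segments is an iterable of (start, end) pairs in Unix seconds.
    """
    # Clip each segment to the experiment window and drop non-overlapping ones.
    clipped = sorted(
        (max(s, exp_start), min(e, exp_end))
        for s, e in eeg_segments
        if min(e, exp_end) > max(s, exp_start)
    )
    covered = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # A gap: close the current merged interval and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching: extend the current merged interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / (exp_end - exp_start)


print(coverage_percent(0, 100, [(0, 40), (30, 60)]))
```

For segments that do not overlap each other, the result equals the summed overlap used in `find_matching_eeg`.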


## 6. Save Session Mapping

In [7]:
# Save as CSV
df_sessions.to_csv('./session_mapping.csv', index=False)
print("Session mapping saved: ./session_mapping.csv")

# Show only matched sessions
df_matched = df_sessions[df_sessions['eeg_file'] != 'NO MATCH'].copy()
print(f"\n{len(df_matched)} matched sessions for analysis:")
df_matched

Session mapping saved: ./session_mapping.csv

17 matched sessions for analysis:


Unnamed: 0,experiment_file,experiment_date,n_trials,exp_duration_min,eeg_file,time_offset_min,coverage,overlap_min,is_complete,is_combined,n_segments
0,01_human-llm-alignment_2025-11-17_11h36.44.912...,2025-11-17 11:38:44,50,35.973529,EEG_data_1763373596.csv,38.780035,60.3%,24.689783,False,False,1.0
1,01_human-llm-alignment_2025-11-20_13h16.37.791...,2025-11-20 13:17:08,39,37.461125,EEG_data_1763640940.csv,1.459425,79.3%,33.6555,False,False,1.0
4,01_human-llm-alignment_2025-11-24_14h13.05.529...,2025-11-24 14:15:13,49,57.864448,EEG_data_1763990890.csv,-12.952562,54.7%,34.368732,False,False,1.0
5,01_human-llm-alignment_2025-11-27_09h44.35.888...,2025-11-27 09:48:52,16,10.332984,EEG_data_1764232442.csv,14.825637,45.7%,7.014468,False,False,1.0
6,01_human-llm-alignment_2025-11-27_10h18.29.349...,2025-11-27 10:25:33,20,18.138445,EEG_data_1764235080.csv,7.53676,100.0%,23.138445,True,False,1.0
7,01_human-llm-alignment_2025-11-27_10h49.01.727...,2025-11-27 10:49:15,50,27.087056,EEG_data_1764235080.csv,31.227853,85.2%,27.349085,False,False,1.0
9,01_human-llm-alignment_2025-11-27_12h50.01.880...,2025-11-27 12:51:00,47,36.576367,EEG_data_1764244046.csv,3.559874,90.6%,37.648034,False,False,1.0
10,01_human-llm-alignment_2025-12-01_09h17.38.489...,2025-12-01 09:18:20,43,46.146333,EEG_data_1764576993.csv,1.784101,58.3%,29.793352,False,False,1.0
11,01_human-llm-alignment_2025-12-01_10h17.52.885...,2025-12-01 10:19:04,50,58.361553,EEG_data_1764580647.csv,1.603107,31.2%,19.787109,False,False,1.0
12,01_human-llm-alignment_2025-12-01_12h45.09.945...,2025-12-01 12:49:43,49,33.514366,EEG_data_1764589375.csv,6.794998,79.9%,30.775026,False,False,1.0
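
Downstream notebooks need one EEG file per row, while combined sessions store several names joined by ` + `. One way to expand the mapping is sketched below using only the standard library. The sample rows and the `expand_mapping` helper are made up for illustration.

```python
import csv
import io


def expand_mapping(rows):
    """Yield (experiment_file, eeg_file) pairs, one per EEG segment."""
    for row in rows:
        if row["eeg_file"] == "NO MATCH":
            continue  # nothing to load for unmatched sessions
        for eeg_file in row["eeg_file"].split(" + "):
            yield row["experiment_file"], eeg_file


# Hypothetical stand-in for session_mapping.csv
sample = io.StringIO(
    "experiment_file,eeg_file\n"
    "exp_a.csv,EEG_data_1.csv\n"
    "exp_b.csv,NO MATCH\n"
    "exp_c.csv,EEG_data_2.csv + EEG_data_3.csv\n"
)
pairs = list(expand_mapping(csv.DictReader(sample)))
print(pairs)
```

With the real file, `csv.DictReader(open('./session_mapping.csv', newline=''))` would take the place of the in-memory sample.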


In [8]:
print(f"\n{'='*80}")
print("DIAGNOSTIC: Why are some experiments excluded?")
print(f"{'='*80}")

print(f"\n✓ Total Experiment CSVs found: {len(exp_files)}")

# Check which ones were filtered out by n_trials < 10
print(f"\n[Filter 1] Checking for test runs (n_trials < 10)...")
excluded_by_trials = []
for f in exp_files:
    try:
        df = pd.read_csv(f, engine='python', on_bad_lines='skip')
        n_trials = 0
        if 'ScenarioLoop.thisN' in df.columns:
            max_trial_n = df['ScenarioLoop.thisN'].dropna()
            if len(max_trial_n) > 0:
                n_trials = int(max_trial_n.max()) + 1
        
        if n_trials == 0 and 'AI_Response.started' in df.columns:
            ai_response_times = df['AI_Response.started'].dropna()
            n_trials = len(ai_response_times)
        
        if n_trials < 10:
            excluded_by_trials.append({
                'file': f.name,
                'n_trials': n_trials,
                'reason': 'Test run (< 10 trials)'
            })
    except Exception:
        # Unreadable files are skipped in this diagnostic
        pass

if excluded_by_trials:
    print(f"  ⚠️ {len(excluded_by_trials)} experiments excluded (test runs):")
    for item in excluded_by_trials:
        print(f"    - {item['file']}: {item['n_trials']} trials")
else:
    print(f"  ✓ No test runs detected")

print(f"\n✓ Experiments passed trial filter: {len(df_exp)}")

# Check coverage distribution
print(f"\n[Filter 2] Coverage distribution (15% minimum required)...")
df_sessions_with_coverage = df_sessions.copy()
df_sessions_with_coverage['coverage_num'] = pd.to_numeric(
    df_sessions_with_coverage['coverage'].str.rstrip('%'), errors='coerce'
)

# All sessions with coverage info
coverage_available = df_sessions_with_coverage[df_sessions_with_coverage['eeg_file'] != 'NO MATCH'].copy()

print(f"\n  Sessions with EEG data found:")
print(f"    ✓ ≥15% coverage: {(coverage_available['coverage_num'] >= 15).sum()}")
print(f"    ⚠️  10-15% coverage: {((coverage_available['coverage_num'] >= 10) & (coverage_available['coverage_num'] < 15)).sum()}")
print(f"    ⚠️  5-10% coverage: {((coverage_available['coverage_num'] >= 5) & (coverage_available['coverage_num'] < 10)).sum()}")
print(f"    ❌ <5% coverage: {(coverage_available['coverage_num'] < 5).sum()}")

# Show NO MATCH experiments
unmatched = df_sessions[df_sessions['eeg_file'] == 'NO MATCH']
if len(unmatched) > 0:
    print(f"\n  ❌ {len(unmatched)} experiments without matching EEG:")
    for idx, row in unmatched.iterrows():
        print(f"    - {row['experiment_file']} ({row['n_trials']} trials, {row['exp_duration_min']:.1f} min)")
else:
    print(f"\n  ✓ All experiments have matching EEG")

# Calculate how many sessions would be available at different thresholds
print(f"\n" + "="*80)
print("SENSITIVITY ANALYSIS: Sessions available at different coverage thresholds")
print("="*80)

thresholds = [5, 10, 15, 20, 30, 50]
for thresh in thresholds:
    # Create temp matching with different threshold
    session_map_temp = []
    for idx, exp in df_exp.iterrows():
        match = find_matching_eeg(exp, df_eeg, min_coverage_percent=thresh)
        session_map_temp.append({
            'experiment_file': exp['file'],
            'eeg_file': match['eeg_file'] if match else 'NO MATCH',
        })
    
    df_temp = pd.DataFrame(session_map_temp)
    n_matched = df_temp['eeg_file'].ne('NO MATCH').sum()
    print(f"  Threshold ≥{thresh:2d}%: {n_matched:2d} sessions")

print(f"\n✓ Current setting (15%): {df_sessions['eeg_file'].ne('NO MATCH').sum()} sessions")
print(f"✓ Sessions with 100% coverage: {(coverage_available['coverage_num'] == 100).sum()}")

print(f"\n💡 RECOMMENDATION:")
print(f"   If you need more sessions, lower the threshold to 10% or check why")
print(f"   some experiments don't have matching EEG files.")
print(f"   Current EEG files available: {len(df_eeg)}")



DIAGNOSTIC: Why are some experiments excluded?

✓ Total Experiment CSVs found: 103

[Filter 1] Checking for test runs (n_trials < 10)...
  ⚠️ 83 experiments excluded (test runs):
    - 01_human-llm-alignment_2025-11-12_09h11.37.052.csv: 1 trials
    - 01_human-llm-alignment_2025-11-12_09h15.36.432.csv: 1 trials
    - 01_human-llm-alignment_2025-11-12_09h22.17.806.csv: 1 trials
    - 01_human-llm-alignment_2025-11-12_09h27.02.139.csv: 1 trials
    - 01_human-llm-alignment_2025-11-12_09h36.01.606.csv: 1 trials
    - 01_human-llm-alignment_2025-11-12_09h38.08.675.csv: 1 trials
    - 01_human-llm-alignment_2025-11-12_09h39.11.326.csv: 1 trials
    - 01_human-llm-alignment_2025-11-12_09h42.55.821.csv: 1 trials
    - 01_human-llm-alignment_2025-11-12_09h47.57.807.csv: 2 trials
    - 01_human-llm-alignment_2025-11-12_09h50.34.823.csv: 2 trials
    - 01_human-llm-alignment_2025-11-12_17h02.54.154.csv: 2 trials
    - 01_human-llm-alignment_2025-11-12_17h09.31.602.csv: 1 trials
    - 01_h
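
The coverage column is stored as a string such as `60.3%`, so the diagnostic converts it to a number before counting sessions per threshold. A pandas-free sketch of that step follows, with the hypothetical helpers `parse_coverage` and `count_at_thresholds`. It only counts sessions that already have a stored coverage value. Unmatched sessions have none, which is why the sensitivity analysis above re-runs `find_matching_eeg` for each threshold.

```python
def parse_coverage(value):
    """Turn '60.3%' into 60.3; return None for missing values."""
    if not value:
        return None
    return float(value.rstrip("%"))


def count_at_thresholds(coverages, thresholds=(5, 10, 15, 20, 30, 50)):
    """Count how many stored coverage values reach each threshold."""
    values = [parse_coverage(c) for c in coverages]
    return {t: sum(v is not None and v >= t for v in values) for t in thresholds}


# Illustrative values only
print(count_at_thresholds(["60.3%", "79.3%", None, "45.7%", "100.0%"]))
```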

## 7. Summary

Next steps:
1. **Preprocessing**: Run all matched EEG files through the preprocessing pipeline (01-04)
2. **ERP Analysis**: Calculate ERPs for each session (05_ERP_Analysis)
3. **Grand Average**: Combine all sessions for group ERPs
4. **Statistics**: Condition comparisons across all sessions

In [10]:

# Update session_mapping.csv to include the Dec 4 11:40 session with combined EEG files
print("\n" + "="*80)
print("Updating session_mapping.csv with recovered session:")
print("="*80)

# Work on a copy of the in-memory mapping
df_sessions_updated = df_sessions.copy()

# Find the Dec 4 11:40 session (currently "NO MATCH")
dec4_session = df_sessions_updated[
    (df_sessions_updated['experiment_file'] == '01_human-llm-alignment_2025-12-04_11h37.33.132.csv')
]

if len(dec4_session) > 0:
    idx = dec4_session.index[0]
    
    # Update with the 3 EEG files
    df_sessions_updated.loc[idx, 'eeg_file'] = 'EEG_data_1764844642.csv + EEG_data_1764846902.csv + EEG_data_1764847136.csv'
    df_sessions_updated.loc[idx, 'time_offset_min'] = (1764844850.88 - 1764844642.0) / 60  # exp_start - eeg_start
    df_sessions_updated.loc[idx, 'coverage'] = '98.3%'
    df_sessions_updated.loc[idx, 'overlap_min'] = 39.7
    df_sessions_updated.loc[idx, 'is_complete'] = True
    df_sessions_updated.loc[idx, 'is_combined'] = True
    df_sessions_updated.loc[idx, 'n_segments'] = 3
    
    print(f"\n✓ Updated session:")
    print(f"  Experiment: {df_sessions_updated.loc[idx, 'experiment_file']}")
    print(f"  EEG files: {df_sessions_updated.loc[idx, 'eeg_file']}")
    print(f"  Coverage: {df_sessions_updated.loc[idx, 'coverage']}")
    print(f"  Combined segments: {df_sessions_updated.loc[idx, 'n_segments']}")
    
    # Save updated mapping
    df_sessions_updated.to_csv('./session_mapping.csv', index=False)
    print(f"\n✓ Updated session_mapping.csv saved!")
    
    # Show summary
    print(f"\nUpdated Summary:")
    print(f"  Matched: {df_sessions_updated['eeg_file'].ne('NO MATCH').sum()}")
    print(f"  Unmatched: {df_sessions_updated['eeg_file'].eq('NO MATCH').sum()}")
    print(f"  Combined (multiple segments): {df_sessions_updated['is_combined'].sum()}")
else:
    print("❌ Could not find Dec 4 11:40 session!")




Updating session_mapping.csv with recovered session:

✓ Updated session:
  Experiment: 01_human-llm-alignment_2025-12-04_11h37.33.132.csv
  EEG files: EEG_data_1764844642.csv + EEG_data_1764846902.csv + EEG_data_1764847136.csv
  Coverage: 98.3%
  Combined segments: 3.0

✓ Updated session_mapping.csv saved!

Updated Summary:
  Matched: 17
  Unmatched: 3
  Combined (multiple segments): 5
