# Reiss Lab Analysis Pipeline v3.0

**Subject-Level Analysis for Vowel, Consonant, and CRM Experiments**

This notebook provides a comprehensive, production-quality analysis pipeline for processing pilot data from Reiss Lab experiments. It merges and enhances previous analysis versions by focusing on data density, statistical rigor, and reproducibility.

**Key Features:**
- **Configuration-Driven:** CRM conditions are assigned via a hard-coded dictionary, eliminating interactive prompts and ensuring reproducibility.
- **Data-Dense Visualizations:** All plots are static and designed to be information-rich, suitable for publication.
- **Statistical Rigor:** All reported metrics include sample sizes (n) and error margins (95% CI or SEM), and statistical tests (ANOVA/t-tests) provide full output.
- **Comprehensive Exploratory Analysis:** Includes phonetic feature analysis, reaction time analysis, talker-specific performance, granular CRM masking effects, and session-level temporal analysis.

## 1. Setup and Configuration

This section imports all necessary libraries and defines the configuration for the analysis. Key configurations, such as the CRM condition mapping and phonetic feature definitions, are hard-coded here to ensure reproducibility.
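
Because each CRM file index should belong to exactly one condition, it can help to validate the mapping before any data is loaded. The sketch below inverts a dictionary of the same shape as `CRM_CONDITION_MAP` and raises if an index is listed more than once. `invert_condition_map` is a hypothetical helper, not part of the pipeline, and the example values are illustrative only.

```python
def invert_condition_map(condition_map):
    """Return {file_index: condition}, raising if an index is listed more than once."""
    inverted = {}
    for condition, indices in condition_map.items():
        for idx in indices:
            if idx in inverted:
                raise ValueError(
                    f"CRM file index {idx} is assigned to both "
                    f"'{inverted[idx]}' and '{condition}'"
                )
            inverted[idx] = condition
    return inverted


# Same shape as CRM_CONDITION_MAP; the values are illustrative only.
example_map = {'BM': [1, 2, 3, 8], 'CI': [4, 6, 10], 'HA': [5, 7, 9]}
index_to_condition = invert_condition_map(example_map)
print(index_to_condition[8])
print(index_to_condition[10])
```

Running a check like this once after editing the dictionary catches copy-paste mistakes that would otherwise silently assign a run to the wrong condition.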

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
import os
import re

# Configure plotting style
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = [12, 7]

In [None]:
# === CONFIGURATION ===

# CRM Condition Mapping: Assign CRM file indices to conditions.
# The user should edit this dictionary based on their experimental log.
# Example: 'BM': [1, 2, 3, 8] means _crm_1.txt, _crm_2.txt, _crm_3.txt, and _crm_8.txt are Bimodal.
CRM_CONDITION_MAP = {
    'BM': [1, 2, 3, 8],
    'CI': [4, 6, 10],
    'HA': [5, 7, 9]
}

# Phonetic Feature Map for Consonants
# (Voicing, Place, Manner)
FEATURE_MAP = {
    'b': (1, 'Bilabial', 'Plosive'),
    'd': (1, 'Alveolar', 'Plosive'),
    'g': (1, 'Velar',    'Plosive'),
    'p': (0, 'Bilabial', 'Plosive'),
    't': (0, 'Alveolar', 'Plosive'),
    'k': (0, 'Velar',    'Plosive'),
    'm': (1, 'Bilabial', 'Nasal'),
    'n': (1, 'Alveolar', 'Nasal'),
    'f': (0, 'Labiodental', 'Fricative'),
    'v': (1, 'Labiodental', 'Fricative'),
    's': (0, 'Alveolar',    'Fricative'),
    'z': (1, 'Alveolar',    'Fricative'),
    '#': (0, 'Palato-Alveolar', 'Fricative'), # 'sh'
    '_': (1, 'Palato-Alveolar', 'Fricative'), # 'zh'
    '%': (0, 'Palato-Alveolar', 'Affricate'), # 'ch'
    '$': (1, 'Palato-Alveolar', 'Affricate')  # 'j'
}

## 2. Data Loading and Preprocessing

This section defines functions to load and preprocess data for each experiment type (Vowel, Consonant, CRM). The main `load_subject_data` function orchestrates this process, taking a subject ID and data path as input and returning a dictionary of DataFrames. The CRM loading process automatically assigns conditions based on the `CRM_CONDITION_MAP` defined in the configuration section, ensuring a reproducible workflow without manual input.
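
The loaders all follow the same pattern: read a headerless, whitespace-delimited text file, attach column names, and map numeric IDs to labels. The sketch below reproduces that pattern on an in-memory string instead of a file. The column names and vowel map are copied from `_load_vowel_data`; the three trials are made up for illustration.

```python
import io
import pandas as pd

# Three made-up trials in the assumed vowel-file layout:
# talker_id  vowel_id  response_id  score  rt
raw_text = """\
1 1 1 1 0.84
2 4 5 0 1.32
5 6 6 1 0.77
"""

vowel_cols = ['talker_id', 'vowel_id', 'response_id', 'score', 'rt']
vowel_map = {1: 'AE', 2: 'AH', 3: 'AW', 4: 'EH', 5: 'IH', 6: 'IY', 7: 'OO', 8: 'UH', 9: 'UW'}

# sep=r'\s+' splits on any run of spaces or tabs
df = pd.read_csv(io.StringIO(raw_text), sep=r'\s+', header=None, names=vowel_cols)
df['vowel_label'] = df['vowel_id'].map(vowel_map)
df['response_label'] = df['response_id'].map(vowel_map)

print(df[['vowel_label', 'response_label', 'score']])
```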

In [None]:
# --- Helper Functions ---
def _load_vowel_data(base_path, subject_id):
    vowel_cols = ['talker_id', 'vowel_id', 'response_id', 'score', 'rt']
    vowel_map = {1: 'AE', 2: 'AH', 3: 'AW', 4: 'EH', 5: 'IH', 6: 'IY', 7: 'OO', 8: 'UH', 9: 'UW'}
    dfs = []
    for cond in ['BM', 'CI']:
        fpath = os.path.join(base_path, f'{subject_id}_vow9_{cond}_0.txt')
        if os.path.exists(fpath):
            df = pd.read_csv(fpath, sep='\\s+', header=None, names=vowel_cols)
            df['condition'] = cond
            dfs.append(df)
    if not dfs:
        return None
    df_vowel = pd.concat(dfs, ignore_index=True)
    df_vowel['vowel_label'] = df_vowel['vowel_id'].map(vowel_map)
    df_vowel['response_label'] = df_vowel['response_id'].map(vowel_map)
    return df_vowel

def _load_consonant_data(base_path, subject_id):
    cons_cols = ['talker_id', 'consonant_id', 'response_id', 'score', 'rt']
    cons_map = {1: '#', 2: '_', 3: 'b', 4: 'd', 5: 'f', 6: 'g', 7: 'k', 8: 'm', 9: 'n', 10: '%', 11: 'p', 12: 's', 13: 't', 14: 'v', 15: 'z', 16: '$'}
    fpath = os.path.join(base_path, f'{subject_id}_cons_BM_n_0.out')
    if not os.path.exists(fpath):
        return None
    df_consonant = pd.read_csv(fpath, sep='\\s+', header=None, names=cons_cols)
    df_consonant['consonant_label'] = df_consonant['consonant_id'].map(cons_map)
    df_consonant['response_label'] = df_consonant['response_id'].map(cons_map)
    return df_consonant

def _load_crm_data(base_path, subject_id, condition_map):
    crm_cols = ['run', 'target_color', 'response_color', 'target_number', 'response_number', 'snr', 'rt']
    crm_files = sorted([f for f in os.listdir(base_path) if '_crm_' in f and f.endswith('.txt')])
    
    if not crm_files:
        return None, None

    # Invert the condition map for easy lookup
    file_idx_to_condition = {idx: cond for cond, indices in condition_map.items() for idx in indices}

    crm_data, crm_summary_list = [], []
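    for f in crm_files:
        fpath = os.path.join(base_path, f)
        talker, m1, m2 = parse_crm_header(fpath)
        masker_type = get_masker_type(talker, m1, m2)
        # Look up the condition by the index embedded in the filename (e.g. 3 in
        # '_crm_3.txt'), as CRM_CONDITION_MAP documents. The position in the sorted
        # list is unreliable because '_crm_10.txt' sorts before '_crm_2.txt'.
        idx_match = re.search(r'_crm_(\d+)\.txt$', f)
        file_idx = int(idx_match.group(1)) if idx_match else None
        condition = file_idx_to_condition.get(file_idx, 'Unknown')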

        try:
            df_temp = pd.read_csv(fpath, sep='\\s+', header=None, skiprows=2, names=crm_cols, on_bad_lines='skip')
            df_temp = df_temp[pd.to_numeric(df_temp['run'], errors='coerce').notna()].astype(float)
            srt, sd, revs = calculate_srt(df_temp)

            df_temp['filename'] = f
            df_temp['condition'] = condition
            df_temp['masker_type'] = masker_type
            df_temp['talker_gender'] = get_gender(talker)
            df_temp['masker_gender'] = get_gender(m1) # Assuming both maskers are same gender
            crm_data.append(df_temp)

            crm_summary_list.append({
                'filename': f, 'condition': condition, 'masker_type': masker_type,
                'srt': srt, 'sd': sd, 'reversals': revs, 
                'talker_gender': get_gender(talker)
            })
        except Exception as e:
            print(f"Error processing {f}: {e}")

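    if not crm_data:
        # Every CRM file failed to parse; report as missing instead of failing in pd.concat
        return None, None
    df_crm = pd.concat(crm_data, ignore_index=True)
    df_crm_summary = pd.DataFrame(crm_summary_list)
    return df_crm, df_crm_summary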

# --- Main Data Loading Orchestrator ---
def load_subject_data(subject_id, data_path, crm_map):
    base_path = os.path.join(data_path, subject_id)
    if not os.path.isdir(base_path):
        print(f"Directory not found: {base_path}")
        return None
    
    print(f'--- Loading Data for Subject: {subject_id} ---')
    data = {}
    data['vowel'] = _load_vowel_data(base_path, subject_id)
    data['consonant'] = _load_consonant_data(base_path, subject_id)
    data['crm'], data['crm_summary'] = _load_crm_data(base_path, subject_id, crm_map)
    
    print('--- Load Summary ---')
    for name, df in data.items():
        status = f'{len(df)} records loaded' if df is not None else 'Not found'
        print(f'{name.capitalize():<12}: {status}')
    print('--------------------\n')
    return data

# --- Utility functions needed by CRM loader ---
def parse_crm_header(filepath):
    try:
        with open(filepath, 'r') as f:
            header = f.readline()
        # Single backslashes: in a raw string, '\\d' would match a literal backslash
        match = re.search(r'Talker (\d+), Maskers (\d+) and (\d+)', header)
        if match:
            return int(match.group(1)), int(match.group(2)), int(match.group(3))
    except Exception:
        pass
    return None, None, None

def get_gender(talker_id):
    if talker_id is None: return 'U'
    return 'M' if talker_id <= 3 else 'F'

def get_masker_type(talker, masker1, masker2):
    if talker is None: return 'unknown'
    t_gen, m1_gen, m2_gen = get_gender(talker), get_gender(masker1), get_gender(masker2)
    return 'same' if t_gen == m1_gen == m2_gen else 'different' if m1_gen == m2_gen else 'mixed'

def calculate_srt(df_run):
    if df_run.empty: return np.nan, np.nan, 0
    snr = df_run['snr'].values
    correct = ((df_run['target_color'] == df_run['response_color']) & 
               (df_run['target_number'] == df_run['response_number'])).values
    reversals = []
    if len(correct) < 2: return np.nan, np.nan, 0
    prev = correct[0]
    for i in range(1, len(correct)):
        if correct[i] != prev:
            reversals.append(snr[i])
        prev = correct[i]
    
    if len(reversals) >= 5:
        calc_revs = reversals[4:14] if len(reversals) >= 14 else reversals[4:]
        return np.mean(calc_revs), np.std(calc_revs, ddof=1), len(reversals)
    return np.nan, np.nan, len(reversals)

# --- Execute Data Loading ---
SUBJECT_ID = 'CI148'  # CHANGE THIS TO YOUR SUBJECT ID
DATA_PATH = '/app/data/Data' # CHANGE THIS TO THE PARENT 'Data' DIRECTORY

# Default to None so later cells can test availability even if loading fails
df_vowel = df_consonant = df_crm = df_crm_summary = None

try:
    # Load all data for the specified subject (returns None if the directory is missing)
    subject_data = load_subject_data(SUBJECT_ID, DATA_PATH, CRM_CONDITION_MAP) or {}
    # For convenience, unpack the dataframes into their own variables
    df_vowel = subject_data.get('vowel')
    df_consonant = subject_data.get('consonant')
    df_crm = subject_data.get('crm')
    df_crm_summary = subject_data.get('crm_summary')
except Exception as e:
    print(f'An error occurred during data loading: {e}')
    print('Please ensure SUBJECT_ID and DATA_PATH are set correctly.')

## 3. Vowel and Consonant Analysis

This section analyzes phoneme identification performance, including overall accuracy, confusion matrices, phonetic feature transmission for consonants, reaction times, and performance by talker.
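
Feature-level scoring credits a response for each phonetic feature it shares with the target, so a listener can miss the phoneme and still get voicing, place, or manner right. The sketch below scores four made-up trials against a subset of the notebook's `FEATURE_MAP`. Note that this is percent correct per feature, as in section 3.2, not an information-transmission measure.

```python
import pandas as pd

# Subset of the notebook's FEATURE_MAP: (voicing, place, manner)
feature_map = {
    'b': (1, 'Bilabial', 'Plosive'),
    'p': (0, 'Bilabial', 'Plosive'),
    'd': (1, 'Alveolar', 'Plosive'),
    'm': (1, 'Bilabial', 'Nasal'),
}
feature_names = ['voicing', 'place', 'manner']

# Made-up (target, response) pairs, for illustration only
trials = pd.DataFrame({
    'target':   ['b', 'b', 'd', 'm'],
    'response': ['b', 'p', 'b', 'b'],
})

# Whole-phoneme accuracy: only the first trial is an exact match
phoneme_pct = float((trials['target'] == trials['response']).mean() * 100)

# Per-feature accuracy: compare one feature at a time
feature_pct = {}
for pos, name in enumerate(feature_names):
    target_vals = trials['target'].map(lambda c: feature_map[c][pos])
    response_vals = trials['response'].map(lambda c: feature_map[c][pos])
    feature_pct[name] = float((target_vals == response_vals).mean() * 100)

print(phoneme_pct)
print(feature_pct)
```

In this toy set each error breaks exactly one feature, so every feature scores higher than the whole-phoneme accuracy.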

In [None]:
# --- 3.1 Vowel Analysis ---

if df_vowel is not None:
    print('--- Vowel Identification Analysis ---')
    # Overall Accuracy
    vowel_accuracy = df_vowel['score'].agg(['mean', 'sem', 'count'])
    print(f"\nOverall Accuracy: {vowel_accuracy['mean']*100:.2f}% (SEM={vowel_accuracy['sem']*100:.2f}, n={int(vowel_accuracy['count'])})\n")

    # Confusion Matrix
    vowel_labels = [v for k, v in sorted({1: 'AE', 2: 'AH', 3: 'AW', 4: 'EH', 5: 'IH', 6: 'IY', 7: 'OO', 8: 'UH', 9: 'UW'}.items())]
    cm = pd.crosstab(df_vowel['vowel_label'], df_vowel['response_label'], normalize='index').reindex(index=vowel_labels, columns=vowel_labels, fill_value=0)
    
    fig, ax = plt.subplots(1, 2, figsize=(18, 7))
    sns.heatmap(cm, annot=True, fmt='.2f', cmap='viridis', ax=ax[0])
    ax[0].set_title('Vowel Confusion Matrix (Probability)')
    
    # Accuracy per Vowel
    per_vowel_stats = df_vowel.groupby('vowel_label')['score'].agg(['mean', 'sem', 'count']).reindex(vowel_labels)
    per_vowel_stats['mean'] *= 100
    per_vowel_stats['sem'] *= 100
    sns.barplot(x=per_vowel_stats.index, y=per_vowel_stats['mean'], ax=ax[1], palette='viridis')
    ax[1].errorbar(x=per_vowel_stats.index, y=per_vowel_stats['mean'], yerr=per_vowel_stats['sem'] * 1.96, fmt='none', c='black', capsize=5)
    ax[1].set_title(f'Per-Vowel Accuracy (n={len(df_vowel)})')
    ax[1].set_ylabel('% Correct (with 95% CI)')
    ax[1].set_xlabel('Vowel')
    plt.tight_layout()
    plt.show()
else:
    print('Vowel data not available.')

In [None]:
# --- 3.2 Consonant Phonetic Feature Analysis ---
if df_consonant is not None:
    print('\n--- Consonant Feature Transmission Analysis ---')
    
    def get_features(label):
        return FEATURE_MAP.get(label, (np.nan, np.nan, np.nan))

    df_cons_features = df_consonant.copy()
    df_cons_features[['target_voicing', 'target_place', 'target_manner']] = df_cons_features['consonant_label'].apply(get_features).apply(pd.Series)
    df_cons_features[['response_voicing', 'response_place', 'response_manner']] = df_cons_features['response_label'].apply(get_features).apply(pd.Series)
    
    features = ['voicing', 'place', 'manner']
    feature_accuracy = {}
    for feature in features:
        correct = (df_cons_features[f'target_{feature}'] == df_cons_features[f'response_{feature}']).sum()
        total = df_cons_features[f'target_{feature}'].notna().sum()
        feature_accuracy[feature.capitalize()] = (correct / total) * 100 if total > 0 else 0

    feature_df = pd.Series(feature_accuracy).reset_index()
    feature_df.columns = ['Feature', 'Accuracy']

    plt.figure(figsize=(8, 5))
    sns.barplot(x='Feature', y='Accuracy', data=feature_df, palette='colorblind')
    plt.title(f'Consonant Feature Transmission (n={len(df_consonant)})')
    plt.ylabel('% Correct')
    plt.ylim(0, 100)
    plt.show()
else:
    print('Consonant data not available.')

In [None]:
# --- 3.3 Reaction Time and Talker Performance Analysis ---

def plot_rt_and_talker(df, experiment_name):
    if df is None or df.empty:
        print(f'{experiment_name} data not available for this analysis.')
        return
    
    fig, ax = plt.subplots(1, 2, figsize=(18, 6))

    # Reaction Time Distribution
    sns.histplot(df['rt'], kde=True, ax=ax[0])
    mean_rt, std_rt = df['rt'].mean(), df['rt'].std()
    ax[0].axvline(mean_rt, color='r', linestyle='--', label=f'Mean: {mean_rt:.2f}s')
    ax[0].set_title(f'{experiment_name} Reaction Time (n={len(df)})\nMean={mean_rt:.2f}s, Std={std_rt:.2f}s')
    ax[0].legend()

    # Talker Performance (Accuracy)
    talker_stats = df.groupby('talker_id')['score'].agg(['mean', 'sem', 'count'])
    talker_stats['mean'] *= 100
    talker_stats['sem'] *= 100
    sns.barplot(x=talker_stats.index, y=talker_stats['mean'], ax=ax[1], palette='coolwarm')
    ax[1].errorbar(x=range(len(talker_stats)), y=talker_stats['mean'], yerr=talker_stats['sem'] * 1.96, fmt='none', c='black', capsize=5)
    ax[1].set_title(f'{experiment_name} Accuracy by Talker')
    ax[1].set_ylabel('% Correct (with 95% CI)')
    ax[1].set_xlabel('Talker ID')
    
    # Add talker gender for context if possible
    talker_genders = {tid: get_gender(tid) for tid in talker_stats.index}
    ax[1].set_xticklabels([f"{tid} ({talker_genders[tid]})" for tid in talker_stats.index])
    
    plt.tight_layout()
    plt.show()

print('\n--- Reaction Time & Talker Performance ---')
plot_rt_and_talker(df_vowel, 'Vowel')
plot_rt_and_talker(df_consonant, 'Consonant')

## 4. Comprehensive CRM Analysis

This section provides a detailed analysis of the Coordinate Response Measure (CRM) task, focusing on Speech Reception Thresholds (SRTs) across various conditions. It includes statistical tests to determine the significance of these differences and granular analysis of error patterns and reaction times.
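
The SRT used throughout this section is the mean SNR at the staircase reversals, after discarding the first four and using at most the next ten, which is the rule `calculate_srt` applies. The sketch below restates that rule on plain arrays so it can be checked by hand. `srt_from_track` is a hypothetical name, the track is made up, and the standard deviation that `calculate_srt` also returns is omitted.

```python
import numpy as np

def srt_from_track(snr, correct, n_skip=4, n_use=10):
    """Mean SNR at reversals, skipping the first n_skip and using at most n_use."""
    snr = np.asarray(snr, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # A reversal is any trial whose correctness differs from the previous trial
    changed = correct[1:] != correct[:-1]
    reversal_snrs = snr[1:][changed]
    used = reversal_snrs[n_skip:n_skip + n_use]
    if used.size == 0:
        return np.nan, int(reversal_snrs.size)
    return float(used.mean()), int(reversal_snrs.size)


# Made-up 1-up/1-down track in 2 dB steps
snr     = [0, -2, -4, -2, -4, -2, -4, -6, -4, -6]
correct = [1,  1,  0,  1,  0,  1,  1,  0,  1,  1]

srt, n_reversals = srt_from_track(snr, correct)
print(srt, n_reversals)
```

Here the reversals fall at SNRs of -4, -2, -4, -2, -6, and -4 dB; dropping the first four leaves -6 and -4, so the estimate is -5 dB from six reversals.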

In [None]:
# --- 4.1 SRT Breakdown and Statistical Testing ---

if df_crm_summary is not None:
    print('--- CRM Speech Reception Threshold (SRT) Analysis ---')
    
    # Filter out runs with no valid SRT
    df_srt = df_crm_summary.dropna(subset=['srt'])
    
    fig, ax = plt.subplots(1, 3, figsize=(20, 6))

    # SRT by Condition
    sns.barplot(data=df_srt, x='condition', y='srt', ax=ax[0], errorbar=('ci', 95), capsize=.1)
    ax[0].set_title(f'SRT by Listening Condition (n={len(df_srt)})')
    ax[0].set_ylabel('SRT (dB SNR) with 95% CI')

    # SRT by Masker Type
    sns.barplot(data=df_srt, x='masker_type', y='srt', ax=ax[1], errorbar=('ci', 95), capsize=.1)
    ax[1].set_title(f'SRT by Masker Type (n={len(df_srt)})')
    ax[1].set_ylabel('')

    # SRT by Interaction
    sns.pointplot(data=df_srt, x='condition', y='srt', hue='masker_type', ax=ax[2], dodge=True, capsize=.1)
    ax[2].set_title('Interaction between Condition and Masker Type')
    ax[2].set_ylabel('')
    plt.tight_layout()
    plt.show()

    # --- Statistical Analysis (ANOVA) ---
    print('\n--- Two-Way ANOVA: SRT ~ Condition * Masker Type (with interaction) ---')
    model = ols('srt ~ C(condition) * C(masker_type)', data=df_srt).fit()
    anova_table = sm.stats.anova_lm(model, typ=2)
    print(anova_table)

    # --- T-Tests for specific comparisons ---
    print('\n--- Post-hoc T-Tests for Masker Type within each Condition ---')
    for condition in df_srt['condition'].unique():
        cond_df = df_srt[df_srt['condition'] == condition]
        same_gender = cond_df[cond_df['masker_type'] == 'same']['srt']
        diff_gender = cond_df[cond_df['masker_type'] == 'different']['srt']
        if len(same_gender) > 1 and len(diff_gender) > 1:
            ttest_res = stats.ttest_ind(same_gender, diff_gender)
            print(f"Condition '{condition}': t-statistic={ttest_res.statistic:.2f}, p-value={ttest_res.pvalue:.3f}")
        else:
            print(f"Condition '{condition}': Not enough data for t-test.")
else:
    print('CRM data not available.')

In [None]:
# --- 4.2 Granular Talker-Masker Gender Analysis ---
if df_crm_summary is not None:
    print('\n--- Granular Analysis of Talker-Masker Gender Effects ---')
    df_srt = df_crm_summary.dropna(subset=['srt']).copy()
    
    # Create a specific interaction term for plotting
    df_srt['gender_combo'] = df_srt['talker_gender'] + ' talker / ' + df_srt['masker_type'] + ' maskers'
    
    # F-MM vs M-FF comparison
    f_mm_srt = df_srt[(df_srt['talker_gender'] == 'F') & (df_srt['masker_type'] == 'different')]['srt']
    m_ff_srt = df_srt[(df_srt['talker_gender'] == 'M') & (df_srt['masker_type'] == 'different')]['srt']

    if not f_mm_srt.empty and not m_ff_srt.empty:
        f_mm_mean, f_mm_sem = f_mm_srt.mean(), f_mm_srt.sem()
        m_ff_mean, m_ff_sem = m_ff_srt.mean(), m_ff_srt.sem()
        print(f"\nFemale Talker / Male Maskers (F-MM): SRT = {f_mm_mean:.2f} (SEM={f_mm_sem:.2f}, n={len(f_mm_srt)})")
        print(f"Male Talker / Female Maskers (M-FF): SRT = {m_ff_mean:.2f} (SEM={m_ff_sem:.2f}, n={len(m_ff_srt)})")
        
        if len(f_mm_srt) > 1 and len(m_ff_srt) > 1:
            ttest_res_gender = stats.ttest_ind(f_mm_srt, m_ff_srt)
            print(f"T-test (F-MM vs M-FF): t-statistic={ttest_res_gender.statistic:.2f}, p-value={ttest_res_gender.pvalue:.3f}")
    
    # Plotting all gender combos
    plt.figure(figsize=(10, 6))
    sns.barplot(data=df_srt, x='gender_combo', y='srt', errorbar=('ci', 95), capsize=.1, palette='muted')
    plt.title(f'SRT by Talker-Masker Gender Combination (n={len(df_srt)})')
    plt.ylabel('SRT (dB SNR) with 95% CI')
    plt.xlabel('Gender Combination')
    plt.xticks(rotation=45, ha='right')
    plt.show()
else:
    print('CRM data not available.')

In [None]:
# --- 4.3 CRM Error and Reaction Time Analysis ---

if df_crm is not None:
    print('\n--- CRM Error Type & Reaction Time Analysis ---')
    # Calculate correctness for each component
    df_crm['color_correct'] = df_crm['target_color'] == df_crm['response_color']
    df_crm['number_correct'] = df_crm['target_number'] == df_crm['response_number']
    df_crm['overall_correct'] = df_crm['color_correct'] & df_crm['number_correct']

    def get_error_type(row):
        if row['overall_correct']: return 'Correct'
        if not row['color_correct'] and row['number_correct']: return 'Color Error'
        if row['color_correct'] and not row['number_correct']: return 'Number Error'
        return 'Both Error'
    
    df_crm['error_type'] = df_crm.apply(get_error_type, axis=1)
    
    fig, ax = plt.subplots(1, 2, figsize=(18, 6))

    # Error type breakdown
    error_dist = df_crm['error_type'].value_counts(normalize=True).mul(100)
    sns.barplot(x=error_dist.index, y=error_dist.values, ax=ax[0], palette='pastel')
    ax[0].set_title(f'CRM Error Type Distribution (n={len(df_crm)})')
    ax[0].set_ylabel('% of Trials')

    # RT vs SNR
    sns.scatterplot(data=df_crm, x='snr', y='rt', hue='overall_correct', ax=ax[1], alpha=0.5)
    ax[1].set_title('Reaction Time vs. SNR')
    ax[1].set_xlabel('SNR (dB)')
    ax[1].set_ylabel('Reaction Time (s)')
    plt.show()
else:
    print('CRM data not available.')

In [None]:
# --- 4.4 Adaptive Track Visualization ---

if df_crm is not None:
    print('\n--- Visualization of Adaptive Staircase Tracks ---')
    filenames = sorted(df_crm['filename'].unique())
    n_files = len(filenames)
    n_cols = 3
    n_rows = (n_files + n_cols - 1) // n_cols

    fig, axes = plt.subplots(n_rows, n_cols, figsize=(5 * n_cols, 4 * n_rows), squeeze=False)
    axes = axes.flatten()

    for i, filename in enumerate(filenames):
        run_data = df_crm[df_crm['filename'] == filename]
        run_summary = df_crm_summary[df_crm_summary['filename'] == filename].iloc[0]
        
        ax = axes[i]
        sns.lineplot(x=range(len(run_data)), y=run_data['snr'], marker='o', ax=ax, label='SNR Track')
        ax.axhline(run_summary['srt'], ls='--', color='red', label=f"SRT: {run_summary['srt']:.2f} dB")
        ax.set_title(f"{filename} ({run_summary['condition']}, {run_summary['masker_type']})", fontsize=10)
        ax.set_ylabel('SNR (dB)')
        ax.set_xlabel('Trial Number')
        ax.legend()

    for i in range(n_files, len(axes)):
        axes[i].set_visible(False)
    
    plt.tight_layout()
    plt.show()
else:
    print('CRM data not available.')

## 5. Cross-Experiment and Session-Level Analysis

This final section examines relationships between different tasks and looks for performance trends across the entire experimental session.
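
The session-level trend is a rolling mean over consecutive trials. In pandas, `window` sets how many trials are averaged and `min_periods` sets how many must be available before a value is reported. The sketch below uses a made-up score series with a small window so the arithmetic is easy to follow; the notebook itself uses `window=50, min_periods=10`.

```python
import pandas as pd

# Made-up 0/1 trial scores, in presentation order
scores = pd.Series([1, 0, 1, 1, 0, 0, 1, 1])

# Percent correct over the last 4 trials, reported once at least 2 are available
rolling_acc = scores.rolling(window=4, min_periods=2).mean() * 100
print(rolling_acc.round(1).tolist())
```

The first value is NaN because only one trial is available. The second averages trials 1 and 2 (50%), and from the fourth trial onward each value covers a full four-trial window.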

In [None]:
# --- 5.1 Correlation Analysis: Phoneme vs. Sentence Performance ---

if df_vowel is not None and df_crm_summary is not None:
    print('--- Correlation Analysis ---')
    
    # Get overall scores
    overall_vowel_acc = df_vowel['score'].mean() * 100
    overall_cons_acc = df_consonant['score'].mean() * 100 if df_consonant is not None else np.nan
    avg_srt = df_crm_summary['srt'].mean()

    # Create a summary for this subject
    correlation_data = {
        'Vowel Accuracy (%)': overall_vowel_acc,
        'Consonant Accuracy (%)': overall_cons_acc,
        'Average SRT (dB)': avg_srt
    }
    
    # Note: Correlation is not meaningful for a single subject, but this structure
    # can be extended to a multi-subject dataframe.
    print("Performance Summary for this Subject:")
    for key, value in correlation_data.items():
        print(f'- {key}: {value:.2f}')
        
    # Placeholder for future multi-subject regression plot
    fig, ax = plt.subplots(1, 2, figsize=(12, 5))
    ax[0].scatter(overall_vowel_acc, avg_srt, s=100, label=SUBJECT_ID)
    ax[0].set_title('Vowel Accuracy vs. Average SRT (Example)')
    ax[0].set_xlabel('Vowel Accuracy (%)')
    ax[0].set_ylabel('Average SRT (dB)')
    ax[0].grid(True)

    ax[1].scatter(overall_cons_acc, avg_srt, s=100, label=SUBJECT_ID)
    ax[1].set_title('Consonant Accuracy vs. Average SRT (Example)')
    ax[1].set_xlabel('Consonant Accuracy (%)')
    ax[1].set_ylabel('Average SRT (dB)')
    ax[1].grid(True)
    
    fig.suptitle('Example Cross-Task Correlation (for multi-subject analysis)')
    plt.show()
else:
    print('Data for correlation analysis not available.')

In [None]:
# --- 5.2 Temporal Analysis (Fatigue/Learning) ---
if df_vowel is not None and df_crm is not None:
    print('\n--- Session-Level Temporal Analysis ---')

    # Combine all trials into a single timeline, assuming file order is chronological
    # This is a simplified approach; a more robust way would use timestamps if available.
    df_vowel['task'] = 'Vowel'
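    # Guard the assignment itself: indexing into None would raise a TypeError
    if df_consonant is not None:
        df_consonant['task'] = 'Consonant'
    df_crm['task'] = 'CRM'
    # Cast to int so 'score' stays numeric (0/1) when concatenated with the phoneme tasks
    df_crm['score'] = ((df_crm['target_color'] == df_crm['response_color']) & (df_crm['target_number'] == df_crm['response_number'])).astype(int)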
    
    # Create a global trial index
    df_vowel['global_trial'] = range(len(df_vowel))
    if df_consonant is not None:
        df_consonant['global_trial'] = range(len(df_vowel), len(df_vowel) + len(df_consonant))
    df_crm['global_trial'] = range(len(df_vowel) + (len(df_consonant) if df_consonant is not None else 0), len(df_vowel) + (len(df_consonant) if df_consonant is not None else 0) + len(df_crm))
    
    # Concatenate only the tasks that were loaded (avoids concatenating an empty DataFrame)
    trial_cols = ['global_trial', 'score', 'rt', 'task']
    task_frames = [df for df in (df_vowel, df_consonant, df_crm) if df is not None]
    all_trials = pd.concat([df[trial_cols] for df in task_frames], ignore_index=True)
    
    # Calculate rolling accuracy
    all_trials['rolling_acc'] = all_trials['score'].rolling(window=50, min_periods=10).mean() * 100
    all_trials['rolling_rt'] = all_trials['rt'].rolling(window=50, min_periods=10).mean()
    
    fig, ax1 = plt.subplots(figsize=(15, 6))
    
    # Plot rolling accuracy
    sns.lineplot(data=all_trials, x='global_trial', y='rolling_acc', color='blue', ax=ax1, label='Rolling Accuracy (50 trials)')
    ax1.set_xlabel('Global Trial Number (Chronological)')
    ax1.set_ylabel('Rolling Accuracy (%)', color='blue')
    ax1.tick_params(axis='y', labelcolor='blue')
    ax1.set_ylim(0, 100)
    
    # Create a second y-axis for rolling RT
    ax2 = ax1.twinx()
    sns.lineplot(data=all_trials, x='global_trial', y='rolling_rt', color='red', ax=ax2, label='Rolling RT (50 trials)')
    ax2.set_ylabel('Rolling Reaction Time (s)', color='red')
    ax2.tick_params(axis='y', labelcolor='red')
    
    # Add vertical lines to delineate tasks
    if df_consonant is not None:
        ax1.axvline(x=len(df_vowel), color='k', linestyle='--', alpha=0.5, label='Task Change')
    ax1.axvline(x=len(df_vowel) + (len(df_consonant) if df_consonant is not None else 0), color='k', linestyle='--', alpha=0.5)
    
    plt.title('Session-Level Performance Trends (Learning/Fatigue)')
    fig.legend(loc="upper right", bbox_to_anchor=(1,1), bbox_transform=ax1.transAxes)
    plt.show()

else:
    print('Data for temporal analysis not available.')