# Comprehensive Analysis Pipeline for CI148 Data
## CRM, Vowel, and Consonant Perception Analysis

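This notebook analyzes three perceptual measures from the CI148 data set:
1. **CRM (Coordinate Response Measure)**: speech reception thresholds (SRTs) for a target talker masked by two competing talkers of the same or different gender
2. **Vowel Perception**: 9-vowel identification in bimodal (BM) and CI-alone (CI) listening conditions
3. **Consonant Perception**: consonant identification (the file parser is still a placeholder)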

### Key Research Questions:
- How much voice-gender release from masking (VGRM) does the listener show, i.e., how much lower is the SRT when the maskers' gender differs from the target's?
- Does the masker-gender effect differ between male and female targets?
- Do vowel and consonant perception patterns differ?
- What are the relationships between different perceptual measures?

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import ttest_ind, ttest_rel, f_oneway, mannwhitneyu, wilcoxon, kruskal
from scipy.stats import pearsonr, spearmanr
import glob
import re
import warnings
from pathlib import Path
import itertools
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

warnings.filterwarnings('ignore')

# Set style for publication-quality figures
sns.set_style('whitegrid')
sns.set_context('paper', font_scale=1.2)
plt.rcParams['figure.dpi'] = 150
plt.rcParams['savefig.dpi'] = 300

## 1. Data Loading Functions

In [2]:
def parse_crm_file(filepath):
    """Parse one CRM run file; return its SRT summary and gender condition, or None if no SRT line is found."""
    with open(filepath, 'r') as f:
        lines = f.readlines()
    
    # Extract header information
    header = lines[0]
    talker = int(re.search(r'Talker (\d+)', header).group(1))
    maskers_match = re.search(r'Maskers (\d+) and (\d+)', header)
    masker1 = int(maskers_match.group(1))
    masker2 = int(maskers_match.group(2))
    
    # Determine genders (0-3 = male, 4-7 = female)
    target_gender = 'F' if talker >= 4 else 'M'
    masker1_gender = 'F' if masker1 >= 4 else 'M'
    masker2_gender = 'F' if masker2 >= 4 else 'M'
    masker_genders = ''.join(sorted([masker1_gender, masker2_gender]))
    
    # Classify condition
    if target_gender == 'M' and masker_genders == 'MM':
        condition = 'M-MM'  # Same gender
    elif target_gender == 'F' and masker_genders == 'FF':
        condition = 'F-FF'  # Same gender
    elif target_gender == 'M' and masker_genders == 'FF':
        condition = 'M-FF'  # Different gender
    elif target_gender == 'F' and masker_genders == 'MM':
        condition = 'F-MM'  # Different gender
    else:
        condition = f'{target_gender}-{masker_genders}'  # Mixed maskers
    
    # Parse data
    data = []
    for line in lines[2:]:
        if 'SRT' in line:
            srt_match = re.search(r'SRT.*?([-\d.]+) dB.*?SD.*?([-\d.]+) dB', line)
            if srt_match:
                srt = float(srt_match.group(1))
                sd = float(srt_match.group(2))
                return {
                    'file': filepath.name,
                    'target': talker,
                    'masker1': masker1,
                    'masker2': masker2,
                    'target_gender': target_gender,
                    'masker_genders': masker_genders,
                    'condition': condition,
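                    # Runs with one male and one female masker are labeled 'mixed',
                    # not 'different', so they do not contaminate the VGRM contrast
                    'gender_match': {'M-MM': 'same', 'F-FF': 'same',
                                     'M-FF': 'different', 'F-MM': 'different'}.get(condition, 'mixed'),
                    'srt': srt,
                    'sd': sd,
                    'n_trials': len(data)  # trial lines parsed before the SRT line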
                }
        elif line.strip() and not line.startswith('Run'):
            parts = line.split()
            if len(parts) >= 6:
                try:
                    run = int(parts[0])
                    col_target = int(parts[1])
                    col_response = int(parts[2])
                    num_target = int(parts[3])
                    num_response = int(parts[4])
                    snr = float(parts[5])
                    correct = (col_target == col_response) and (num_target == num_response)
                    data.append({
                        'run': run,
                        'col_target': col_target,
                        'col_response': col_response,
                        'num_target': num_target,
                        'num_response': num_response,
                        'snr': snr,
                        'correct': correct
                    })
                except ValueError:
                    continue
    return None

def parse_vowel_file(filepath):
    """Parse vowel identification data: one trial per line (trial, target, response, correct, RT in s)."""
    data = []
    with open(filepath, 'r') as f:
        for line in f:
            parts = line.strip().split()
            if len(parts) == 5:
                trial = int(parts[0])
                target = int(parts[1])
                response = int(parts[2])
                correct = int(parts[3])
                rt = float(parts[4])
                data.append({
                    'trial': trial,
                    'target': target,
                    'response': response,
                    'correct': correct,
                    'rt': rt
                })
    return pd.DataFrame(data)

def parse_consonant_file(filepath):
    """Parse consonant identification data.

    Placeholder: the consonant file format has not been specified yet, so this
    returns an empty DataFrame and no consonant analysis is run below.
    """
    return pd.DataFrame()
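The gender-classification rule inside `parse_crm_file` can be isolated and checked on its own. The sketch below assumes the same talker coding as the parser (talkers 0-3 male, 4-7 female). It labels runs with one male and one female masker as `'mixed'` rather than `'different'`. `talker_gender` and `classify_crm_condition` are hypothetical helper names and are not part of the original pipeline.

```python
def talker_gender(talker_id):
    """Map a CRM talker index to 'M' or 'F' (assumes 0-3 male, 4-7 female)."""
    if not 0 <= talker_id <= 7:
        raise ValueError(f"unexpected talker index: {talker_id}")
    return 'F' if talker_id >= 4 else 'M'


def classify_crm_condition(target, masker1, masker2):
    """Return (condition label, gender_match) for one CRM run."""
    target_gender = talker_gender(target)
    # Sorted so that (M, F) and (F, M) both become 'FM'
    masker_genders = ''.join(sorted([talker_gender(masker1), talker_gender(masker2)]))
    condition = f'{target_gender}-{masker_genders}'  # e.g. 'M-MM', 'F-MM', 'M-FM'
    if masker_genders == 'FM':
        gender_match = 'mixed'  # one male and one female masker
    elif masker_genders[0] == target_gender:
        gender_match = 'same'
    else:
        gender_match = 'different'
    return condition, gender_match


print(classify_crm_condition(1, 5, 6))
```

For talker 1 with maskers 5 and 6, this returns `('M-FF', 'different')`. Keeping the rule in one small function makes it easy to test independently of the file format.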

## 2. Load and Process All Data

In [3]:
# Load CRM data
crm_files = glob.glob('CI148_crm_*.txt')
crm_data = []

for filepath in crm_files:
    result = parse_crm_file(Path(filepath))
    if result:
        crm_data.append(result)

crm_df = pd.DataFrame(crm_data)

# Load vowel data
vowel_bm = parse_vowel_file('CI148_vow9_BM_0.txt')
vowel_ci = parse_vowel_file('CI148_vow9_CI_0.txt')
vowel_bm['condition'] = 'Bimodal'
vowel_ci['condition'] = 'CI'
vowel_df = pd.concat([vowel_bm, vowel_ci], ignore_index=True)

print("Data Loading Summary:")
print(f"CRM files loaded: {len(crm_df)}")
print(f"Vowel trials: {len(vowel_df)} ({len(vowel_bm)} BM, {len(vowel_ci)} CI)")
print("\nCRM Conditions:")
print(crm_df['condition'].value_counts())

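If the vowel files are not in the working directory, `parse_vowel_file` raises `FileNotFoundError` and the remaining cells cannot run. The sketch below is a more defensive loader. It assumes the `CI148_vow9_<BM|CI>_<n>.txt` naming of the two hard-coded file names. Treating the trailing number as a session index is a guess. `find_vowel_files` and `require_conditions` are hypothetical helpers.

```python
import re
from pathlib import Path

# Assumed naming: CI148_vow9_<BM|CI>_<n>.txt (trailing number treated as a session index)
VOWEL_FILE_PATTERN = re.compile(r'^CI148_vow9_(BM|CI)_(\d+)\.txt$')


def find_vowel_files(data_dir='.'):
    """Return {(condition, session): Path} for every matching vowel file in data_dir."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"data directory not found: {data_dir.resolve()}")
    found = {}
    for path in sorted(data_dir.iterdir()):
        match = VOWEL_FILE_PATTERN.match(path.name)
        if match:
            found[(match.group(1), int(match.group(2)))] = path
    return found


def require_conditions(found, conditions=('BM', 'CI')):
    """Raise a readable error if any condition has no file at all."""
    missing = [c for c in conditions if not any(key[0] == c for key in found)]
    if missing:
        raise FileNotFoundError(f"no vowel file found for condition(s): {', '.join(missing)}")
```

In this notebook the loading step would become `files = find_vowel_files('.')`, then `require_conditions(files)`, then `parse_vowel_file(files[('BM', 0)])`. The error message then names the missing condition instead of a single hard-coded path.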

## 3. Primary Analysis: Voice-Gender Release from Masking (VGRM)

In [None]:
# Calculate VGRM for each target gender
def calculate_vgrm(df):
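    """Calculate voice-gender release from masking (VGRM).

    VGRM = mean SRT with same-gender maskers - mean SRT with different-gender maskers,
    so a positive VGRM means the listener benefits from a masker-gender difference.
    """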
    vgrm_results = {}
    
    # For male targets
    m_same = df[df['condition'] == 'M-MM']['srt'].values
    m_diff = df[df['condition'] == 'M-FF']['srt'].values
    if len(m_same) > 0 and len(m_diff) > 0:
        vgrm_results['Male_Target'] = {
            'same_gender_srt': m_same.mean(),
            'diff_gender_srt': m_diff.mean(),
            'vgrm': m_same.mean() - m_diff.mean(),
            'n_same': len(m_same),
            'n_diff': len(m_diff)
        }
    
    # For female targets
    f_same = df[df['condition'] == 'F-FF']['srt'].values
    f_diff = df[df['condition'] == 'F-MM']['srt'].values
    if len(f_same) > 0 and len(f_diff) > 0:
        vgrm_results['Female_Target'] = {
            'same_gender_srt': f_same.mean(),
            'diff_gender_srt': f_diff.mean(),
            'vgrm': f_same.mean() - f_diff.mean(),
            'n_same': len(f_same),
            'n_diff': len(f_diff)
        }
    
    # Overall VGRM
    same = df[df['gender_match'] == 'same']['srt'].values
    diff = df[df['gender_match'] == 'different']['srt'].values
    if len(same) > 0 and len(diff) > 0:
        vgrm_results['Overall'] = {
            'same_gender_srt': same.mean(),
            'diff_gender_srt': diff.mean(),
            'vgrm': same.mean() - diff.mean(),
            'n_same': len(same),
            'n_diff': len(diff)
        }
    
    return vgrm_results

vgrm_results = calculate_vgrm(crm_df)

# Display results
print("\n" + "="*60)
print("VOICE-GENDER RELEASE FROM MASKING (VGRM) ANALYSIS")
print("="*60)
for target, results in vgrm_results.items():
    print(f"\n{target}:")
    print(f"  Same-gender SRT: {results['same_gender_srt']:.2f} dB (n={results['n_same']})")
    print(f"  Diff-gender SRT: {results['diff_gender_srt']:.2f} dB (n={results['n_diff']})")
    print(f"  VGRM: {results['vgrm']:.2f} dB")
    print(f"  {'BENEFIT' if results['vgrm'] > 0 else 'NO BENEFIT'} from different-gender maskers")
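VGRM is a difference of two means, VGRM = mean SRT(same-gender) - mean SRT(different-gender). Positive values mean that different-gender maskers lowered the SRT. With only a few runs per condition, a confidence interval shows how stable that difference is. The sketch below adds a percentile bootstrap, a technique the original pipeline does not use. It treats runs as independent. `vgrm_bootstrap` is a hypothetical helper, and the SRT values in the test are made up.

```python
import numpy as np


def vgrm_bootstrap(same_srt, diff_srt, n_boot=5000, ci=95, seed=0):
    """Return (VGRM, lower, upper): VGRM = mean(same) - mean(diff), with a percentile bootstrap CI.

    Each group is resampled with replacement independently (runs treated as independent).
    """
    same = np.asarray(same_srt, dtype=float)
    diff = np.asarray(diff_srt, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one bootstrap replicate of the group's runs; take its mean
    same_boot = rng.choice(same, size=(n_boot, same.size), replace=True).mean(axis=1)
    diff_boot = rng.choice(diff, size=(n_boot, diff.size), replace=True).mean(axis=1)
    deltas = same_boot - diff_boot
    alpha = (100 - ci) / 2
    lower, upper = np.percentile(deltas, [alpha, 100 - alpha])
    return same.mean() - diff.mean(), lower, upper
```

Applied to this notebook's data, the call would be `vgrm_bootstrap(crm_df.loc[crm_df['gender_match'] == 'same', 'srt'], crm_df.loc[crm_df['gender_match'] == 'different', 'srt'])`. With very few runs, the bootstrap interval tends to be too narrow, so treat it as descriptive.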

## 4. Statistical Testing for VGRM

In [None]:
# Statistical tests for VGRM
print("\n" + "="*60)
print("STATISTICAL TESTING FOR VGRM")
print("="*60)

# Test overall gender match effect
same_srt = crm_df[crm_df['gender_match'] == 'same']['srt']
diff_srt = crm_df[crm_df['gender_match'] == 'different']['srt']

if len(same_srt) > 0 and len(diff_srt) > 0:
    # Parametric test
    t_stat, t_pval = ttest_ind(same_srt, diff_srt)
    # Non-parametric test
    u_stat, u_pval = mannwhitneyu(same_srt, diff_srt, alternative='two-sided')
    
    print("\nOverall Same vs Different Gender Maskers:")
    print(f"  Independent t-test: t={t_stat:.3f}, p={t_pval:.4f}")
    print(f"  Mann-Whitney U: U={u_stat:.1f}, p={u_pval:.4f}")
    print(f"  Effect size (Cohen's d): {(same_srt.mean() - diff_srt.mean()) / np.sqrt((same_srt.var() + diff_srt.var()) / 2):.3f}")

# Test for specific contrasts
contrasts = [
    ('M-MM', 'M-FF', 'Male Target VGRM'),
    ('F-FF', 'F-MM', 'Female Target VGRM'),
    ('M-MM', 'F-FF', 'Same-Gender: Male vs Female'),
    ('M-FF', 'F-MM', 'Different-Gender: Male vs Female')
]

for cond1, cond2, label in contrasts:
    data1 = crm_df[crm_df['condition'] == cond1]['srt']
    data2 = crm_df[crm_df['condition'] == cond2]['srt']
    
    if len(data1) > 0 and len(data2) > 0:
        t_stat, t_pval = ttest_ind(data1, data2)
        print(f"\n{label}:")
        print(f"  {cond1}: {data1.mean():.2f} ± {data1.std():.2f} dB")
        print(f"  {cond2}: {data2.mean():.2f} ± {data2.std():.2f} dB")
        print(f"  Difference: {data1.mean() - data2.mean():.2f} dB")
        print(f"  t-test: t={t_stat:.3f}, p={t_pval:.4f}")
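The t-tests and Mann-Whitney U test above rest on only a few runs per condition. An exact permutation test of the mean SRT difference makes no distributional assumption, and with small groups it can enumerate every relabeling. This is a sketch of that alternative, not part of the original analysis. `exact_permutation_test` is a hypothetical name.

```python
from itertools import combinations

import numpy as np


def exact_permutation_test(x, y):
    """Two-sided exact permutation test for a difference in means.

    Enumerates every split of the pooled values into groups of len(x) and len(y);
    the p-value is the fraction of splits whose |mean difference| is at least the observed one.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = abs(x.mean() - y.mean())
    count = 0
    total = 0
    for idx in combinations(range(pooled.size), x.size):
        mask = np.zeros(pooled.size, dtype=bool)
        mask[list(idx)] = True
        stat = abs(pooled[mask].mean() - pooled[~mask].mean())
        # Small tolerance so floating-point noise does not drop ties
        if stat >= observed - 1e-12:
            count += 1
        total += 1
    return observed, count / total
```

The number of splits grows combinatorially. For example, 10 runs per group gives 184,756 splits, which is still fast. For larger groups, recent SciPy releases (1.8 and later) provide `scipy.stats.permutation_test`, which can sample permutations at random instead.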

## 5. Comprehensive Visualization Suite

In [None]:
# Create comprehensive visualization
fig = plt.figure(figsize=(16, 12))

# 1. SRT by condition
ax1 = plt.subplot(3, 3, 1)
condition_order = ['M-MM', 'M-FF', 'F-FF', 'F-MM']
sns.boxplot(data=crm_df, x='condition', y='srt', order=condition_order, ax=ax1)
ax1.set_title('SRT by Target-Masker Configuration')
ax1.set_ylabel('SRT (dB)')
ax1.set_xlabel('Condition')
ax1.axhline(y=0, color='gray', linestyle='--', alpha=0.5)

# 2. VGRM comparison
ax2 = plt.subplot(3, 3, 2)
vgrm_data = pd.DataFrame([
    {'Target': 'Male', 'Masker': 'Same', 'SRT': crm_df[crm_df['condition'] == 'M-MM']['srt'].mean()},
    {'Target': 'Male', 'Masker': 'Different', 'SRT': crm_df[crm_df['condition'] == 'M-FF']['srt'].mean()},
    {'Target': 'Female', 'Masker': 'Same', 'SRT': crm_df[crm_df['condition'] == 'F-FF']['srt'].mean()},
    {'Target': 'Female', 'Masker': 'Different', 'SRT': crm_df[crm_df['condition'] == 'F-MM']['srt'].mean()}
])
sns.barplot(data=vgrm_data, x='Target', y='SRT', hue='Masker', ax=ax2)
ax2.set_title('VGRM by Target Gender')
ax2.set_ylabel('SRT (dB)')
ax2.legend(title='Masker Gender')

# 3. Variability analysis
ax3 = plt.subplot(3, 3, 3)
sns.barplot(data=crm_df, x='condition', y='sd', order=condition_order, ax=ax3)
ax3.set_title('Response Variability by Condition')
ax3.set_ylabel('Standard Deviation (dB)')
ax3.set_xlabel('Condition')

# 4. Vowel accuracy comparison
ax4 = plt.subplot(3, 3, 4)
vowel_acc = vowel_df.groupby(['condition', 'target'])['correct'].mean().reset_index()
sns.barplot(data=vowel_acc, x='target', y='correct', hue='condition', ax=ax4)
ax4.set_title('Vowel Identification by Target')
ax4.set_ylabel('Proportion Correct')
ax4.set_xlabel('Target Vowel')
ax4.legend(title='Condition')

# 5. Confusion matrix for vowels (Bimodal)
ax5 = plt.subplot(3, 3, 5)
confusion_bm = pd.crosstab(vowel_bm['target'], vowel_bm['response'], normalize='index')
sns.heatmap(confusion_bm, annot=True, fmt='.2f', cmap='YlOrRd', ax=ax5, cbar_kws={'label': 'Proportion'})
ax5.set_title('Vowel Confusion Matrix (Bimodal)')
ax5.set_xlabel('Response')
ax5.set_ylabel('Target')

# 6. Confusion matrix for vowels (CI)
ax6 = plt.subplot(3, 3, 6)
confusion_ci = pd.crosstab(vowel_ci['target'], vowel_ci['response'], normalize='index')
sns.heatmap(confusion_ci, annot=True, fmt='.2f', cmap='YlOrRd', ax=ax6, cbar_kws={'label': 'Proportion'})
ax6.set_title('Vowel Confusion Matrix (CI)')
ax6.set_xlabel('Response')
ax6.set_ylabel('Target')

# 7. Reaction time analysis
ax7 = plt.subplot(3, 3, 7)
sns.violinplot(data=vowel_df, x='condition', y='rt', ax=ax7)
ax7.set_title('Reaction Time Distribution')
ax7.set_ylabel('Reaction Time (s)')
ax7.set_xlabel('Condition')

# 8. SRT vs. run variability
# (Correlating SRTs between conditions needs paired observations, e.g. several listeners;
#  each CRM file here contributes to only one condition, so those correlations are undefined.)
ax8 = plt.subplot(3, 3, 8)
sns.scatterplot(data=crm_df, x='sd', y='srt', hue='condition', ax=ax8)
ax8.set_title('SRT vs. Run SD')
ax8.set_xlabel('Run SD (dB)')
ax8.set_ylabel('SRT (dB)')

# 9. Overall performance summary
ax9 = plt.subplot(3, 3, 9)
summary_data = pd.DataFrame({
    'Measure': ['CRM Same', 'CRM Diff', 'Vowel BM', 'Vowel CI'],
    'Performance': [
        -crm_df[crm_df['gender_match'] == 'same']['srt'].mean(),
        -crm_df[crm_df['gender_match'] == 'different']['srt'].mean(),
        vowel_bm['correct'].mean() * 100,
        vowel_ci['correct'].mean() * 100
    ],
    'Type': ['CRM', 'CRM', 'Vowel', 'Vowel']
})
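# CRM bars are -SRT in dB and vowel bars are % correct, so heights are only
# comparable within a measure type
sns.barplot(data=summary_data, x='Measure', y='Performance', hue='Type', dodge=False, ax=ax9)
ax9.set_title('Overall Performance Summary')
ax9.set_ylabel('-SRT (dB) or % correct\n(higher is better)')
ax9.legend().remove()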

plt.tight_layout()
plt.show()
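`pd.crosstab` only creates rows and columns for values that actually occur. If a vowel was never given as a response, the bimodal and CI heatmaps can therefore end up with different shapes or a non-square layout. The sketch below fixes the label set before normalizing. It assumes the vowel codes are known in advance (1-9 in the usage below, which is an assumption about the coding). `aligned_confusion` is a hypothetical helper.

```python
import pandas as pd


def aligned_confusion(df, labels, normalize=True):
    """Square target x response confusion matrix over a fixed label set.

    Cells for never-used responses are filled with 0; rows are normalized to proportions
    when normalize=True (rows with no trials stay all-zero).
    """
    labels = list(labels)
    counts = pd.crosstab(df['target'], df['response'])
    counts = counts.reindex(index=labels, columns=labels, fill_value=0)
    if not normalize:
        return counts
    row_totals = counts.sum(axis=1)
    # Divide each row by its total; use 1 for empty rows to avoid division by zero
    return counts.div(row_totals.where(row_totals > 0, 1), axis=0)
```

Used as `sns.heatmap(aligned_confusion(vowel_bm, labels=range(1, 10)), ...)`, both matrices share identical axes, so cells can be compared directly or subtracted.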

## 6. Vowel Perception Deep Dive

In [None]:
print("\n" + "="*60)
print("VOWEL PERCEPTION ANALYSIS")
print("="*60)

# Overall accuracy
print("\nOverall Accuracy:")
print(f"  Bimodal: {vowel_bm['correct'].mean():.3f} ({vowel_bm['correct'].sum()}/{len(vowel_bm)})")
print(f"  CI: {vowel_ci['correct'].mean():.3f} ({vowel_ci['correct'].sum()}/{len(vowel_ci)})")

# Statistical comparison
chi2_stat = stats.chi2_contingency(pd.crosstab(vowel_df['condition'], vowel_df['correct']))
print(f"\nChi-square test: χ²={chi2_stat[0]:.3f}, p={chi2_stat[1]:.4f}")

# Per-vowel analysis
print("\nPer-Vowel Accuracy:")
vowel_summary = (vowel_df.groupby(['target', 'condition'])['correct']
                 .agg(['mean', 'count'])
                 .unstack('condition'))
# Per-vowel CI-alone minus bimodal accuracy (negative values favor bimodal)
vowel_summary[('CI_Benefit', '')] = vowel_summary[('mean', 'CI')] - vowel_summary[('mean', 'Bimodal')]
print(vowel_summary.round(3))

# Identify the most common error for each target vowel
def most_confused_pairs(df, top_n=5):
    """Return up to top_n (target, most common wrong response, error rate) tuples, highest rate first."""
    pairs = []
    for target in sorted(df['target'].unique()):  # vowel codes taken from the data
        target_data = df[df['target'] == target]
        errors = target_data.loc[target_data['correct'] == 0, 'response'].value_counts()
        if len(errors) > 0:
            pairs.append((target, errors.index[0], errors.iloc[0] / len(target_data)))
    return sorted(pairs, key=lambda x: x[2], reverse=True)[:top_n]

for condition_name, condition_data in [('Bimodal', vowel_bm), ('CI', vowel_ci)]:
    print(f"\nMost Confused Vowel Pairs ({condition_name}):")
    for target, response, rate in most_confused_pairs(condition_data):
        print(f"  Vowel {target} → {response}: {rate:.3f}")
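The per-vowel table compares bimodal and CI-alone accuracy, but each vowel has only a handful of trials. Fisher's exact test on each vowel's 2x2 table (condition x correct/incorrect) is one way to flag which differences exceed sampling noise. This is an added sketch, not part of the original analysis. With nine tests, read the p-values with a multiple-comparison correction in mind. `per_vowel_fisher` is a hypothetical helper.

```python
import pandas as pd
from scipy.stats import fisher_exact


def per_vowel_fisher(df, cond_a='Bimodal', cond_b='CI'):
    """Fisher's exact test of accuracy (cond_a vs cond_b) for each target vowel.

    df needs columns 'condition', 'target', and 'correct' (0/1).
    """
    rows = []
    for target, group in df.groupby('target'):
        a = group.loc[group['condition'] == cond_a, 'correct']
        b = group.loc[group['condition'] == cond_b, 'correct']
        if len(a) == 0 or len(b) == 0:
            continue  # vowel not tested in both conditions
        # Rows: condition; columns: correct, incorrect
        table = [[int(a.sum()), int(len(a) - a.sum())],
                 [int(b.sum()), int(len(b) - b.sum())]]
        odds_ratio, p_value = fisher_exact(table)
        rows.append({'target': target, f'acc_{cond_a}': a.mean(), f'acc_{cond_b}': b.mean(),
                     'odds_ratio': odds_ratio, 'p_value': p_value})
    return pd.DataFrame(rows)
```

Calling `per_vowel_fisher(vowel_df)` returns one row per vowel. An infinite odds ratio simply means that one cell of the table was zero.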

## 7. Reaction Time Analysis

In [None]:
print("\n" + "="*60)
print("REACTION TIME ANALYSIS")
print("="*60)

# Overall RT statistics
print("\nOverall Reaction Times:")
print(f"  Bimodal: {vowel_bm['rt'].mean():.3f} ± {vowel_bm['rt'].std():.3f} s")
print(f"  CI: {vowel_ci['rt'].mean():.3f} ± {vowel_ci['rt'].std():.3f} s")

t_stat, p_val = ttest_ind(vowel_bm['rt'], vowel_ci['rt'])
print(f"  t-test: t={t_stat:.3f}, p={p_val:.4f}")

# RT by accuracy
print("\nRT by Accuracy:")
for condition in ['Bimodal', 'CI']:
    data = vowel_df[vowel_df['condition'] == condition]
    correct_rt = data[data['correct'] == 1]['rt'].mean()
    incorrect_rt = data[data['correct'] == 0]['rt'].mean()
    print(f"  {condition}:")
    print(f"    Correct: {correct_rt:.3f} s")
    print(f"    Incorrect: {incorrect_rt:.3f} s")
    print(f"    Difference: {incorrect_rt - correct_rt:.3f} s")

# RT correlations
print("\nRT-Accuracy Correlations:")
for condition in ['Bimodal', 'CI']:
    data = vowel_df[vowel_df['condition'] == condition]
    per_vowel = data.groupby('target').agg({'rt': 'mean', 'correct': 'mean'})
    if len(per_vowel) > 2:
        r, p = pearsonr(per_vowel['rt'], per_vowel['correct'])
        print(f"  {condition}: r={r:.3f}, p={p:.4f}")
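Reaction times are usually right-skewed, so a pooled-variance t-test on raw RTs can be driven by a few slow trials. A common alternative, sketched here as an assumption rather than the author's method, is Welch's t-test on log-transformed RTs alongside a rank-based test. `compare_rt` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind


def compare_rt(rt_a, rt_b):
    """Compare two RT samples (seconds, all > 0) on the log scale and by ranks."""
    rt_a = np.asarray(rt_a, dtype=float)
    rt_b = np.asarray(rt_b, dtype=float)
    if (rt_a <= 0).any() or (rt_b <= 0).any():
        raise ValueError("RTs must be positive to take logs")
    log_a, log_b = np.log(rt_a), np.log(rt_b)
    # Welch's t-test does not assume equal variances
    t_stat, t_p = ttest_ind(log_a, log_b, equal_var=False)
    u_stat, u_p = mannwhitneyu(rt_a, rt_b, alternative='two-sided')
    return {
        # Ratio of geometric means: > 1 means condition A is slower on average
        'geo_mean_ratio': float(np.exp(log_a.mean() - log_b.mean())),
        'welch_t': float(t_stat), 'welch_p': float(t_p),
        'mannwhitney_u': float(u_stat), 'mannwhitney_p': float(u_p),
    }
```

`compare_rt(vowel_ci['rt'], vowel_bm['rt'])` reports the CI/bimodal RT ratio on a multiplicative scale. That scale is often easier to interpret for skewed timing data than a raw difference in seconds.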

## 8. Advanced Exploratory Analyses

In [None]:
print("\n" + "="*60)
print("ADVANCED EXPLORATORY ANALYSES")
print("="*60)

# 1. Learning/Adaptation Effects
print("\n1. LEARNING/ADAPTATION EFFECTS:")
# Split each condition's trials, in presentation order, into n_blocks equal-sized blocks
n_blocks = 4
for condition_name, condition_data in [('Bimodal', vowel_bm), ('CI', vowel_ci)]:
    # Block labels 1..n_blocks; leftover trials are spread over the blocks
    # instead of forming an extra, partial block
    condition_data['block'] = np.arange(len(condition_data)) * n_blocks // len(condition_data) + 1
    block_acc = condition_data.groupby('block')['correct'].mean()
    print(f"\n{condition_name} - Accuracy by block:")
    for block, acc in block_acc.items():
        print(f"  Block {block}: {acc:.3f}")
    
    # Test for a linear trend in accuracy across blocks
    if len(block_acc) > 1:
        slope, intercept, r_value, p_value, std_err = stats.linregress(block_acc.index, block_acc.values)
        print(f"  Linear trend: slope={slope:.4f}, r={r_value:.3f}, p={p_value:.4f}")

# 2. Response Bias Analysis
print("\n2. RESPONSE BIAS ANALYSIS:")
for condition_name, condition_data in [('Bimodal', vowel_bm), ('CI', vowel_ci)]:
    response_freq = condition_data['response'].value_counts()
    target_freq = condition_data['target'].value_counts()
    # Align on vowel code; a vowel never given as a response counts as 0, not NaN
    bias = response_freq.sub(target_freq, fill_value=0)
    print(f"\n{condition_name} - Response bias (response freq - target freq):")
    print(bias.sort_values(ascending=False).head().to_string())

# 3. Speed-Accuracy Trade-off
print("\n3. SPEED-ACCURACY TRADE-OFF:")
for condition_name, condition_data in [('Bimodal', vowel_bm), ('CI', vowel_ci)]:
    # Median split on RT
    median_rt = condition_data['rt'].median()
    fast_trials = condition_data[condition_data['rt'] < median_rt]
    slow_trials = condition_data[condition_data['rt'] >= median_rt]
    print(f"\n{condition_name}:")
    print(f"  Fast trials: {fast_trials['correct'].mean():.3f} accuracy, {fast_trials['rt'].mean():.3f}s RT")
    print(f"  Slow trials: {slow_trials['correct'].mean():.3f} accuracy, {slow_trials['rt'].mean():.3f}s RT")
    print(f"  Accuracy difference: {slow_trials['correct'].mean() - fast_trials['correct'].mean():.3f}")

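# 4. Spectral Distance Analysis (simplified vowel space)
print("\n4. SPECTRAL DISTANCE EFFECTS:")
# PLACEHOLDER layout: vowel codes are placed on an arbitrary 3x3 grid. The code-to-vowel
# mapping is not documented here, so the row labels below are assumptions; substitute
# measured F1/F2 values for the 9 vowels before interpreting these distances.
vowel_positions = {
    1: (0, 0), 2: (1, 0), 3: (2, 0),  # assumed front vowels
    4: (0, 1), 5: (1, 1), 6: (2, 1),  # assumed central vowels
    7: (0, 2), 8: (1, 2), 9: (2, 2)   # assumed back vowels
}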

for condition_name, condition_data in [('Bimodal', vowel_bm), ('CI', vowel_ci)]:
    errors = condition_data[condition_data['correct'] == 0]
    if len(errors) > 0:
        distances = []
        for _, row in errors.iterrows():
            if row['target'] in vowel_positions and row['response'] in vowel_positions:
                pos1 = vowel_positions[row['target']]
                pos2 = vowel_positions[row['response']]
                dist = np.sqrt((pos1[0]-pos2[0])**2 + (pos1[1]-pos2[1])**2)
                distances.append(dist)
        if distances:
            print(f"\n{condition_name} - Error distances in vowel space:")
            print(f"  Mean distance: {np.mean(distances):.3f}")
            print(f"  Median distance: {np.median(distances):.3f}")

# 5. Information Transfer Analysis
print("\n5. INFORMATION TRANSFER:")
from scipy.stats import entropy

for condition_name, condition_data in [('Bimodal', vowel_bm), ('CI', vowel_ci)]:
    # Joint target-response probabilities (the whole confusion matrix sums to 1)
    joint = pd.crosstab(condition_data['target'], condition_data['response'], normalize='all')
    
    # Marginal probabilities
    p_target = joint.sum(axis=1).values
    p_response = joint.sum(axis=0).values
    
    # Entropies in bits (scipy's entropy uses the natural log unless base is given)
    h_target = entropy(p_target, base=2)
    h_response = entropy(p_response, base=2)
    
    # Transmitted information T = sum p(x,y) * log2[p(x,y) / (p(x) p(y))] over non-empty cells
    p_xy = joint.values
    nz = p_xy > 0
    t_bits = np.sum(p_xy[nz] * np.log2(p_xy[nz] / np.outer(p_target, p_response)[nz]))
    
    percent_correct = condition_data['correct'].mean()
    print(f"\n{condition_name}:")
    print(f"  Target entropy: {h_target:.3f} bits")
    print(f"  Response entropy: {h_response:.3f} bits")
    print(f"  Percent correct: {percent_correct:.3f}")
    print(f"  Transmitted information: {t_bits:.3f} bits ({100 * t_bits / h_target:.1f}% of target entropy)")
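Transmitted information is the mutual information between target and response, T = sum over cells of p(x,y) * log2[p(x,y) / (p(x) p(y))]. Relative transmitted information divides T by the target entropy H(x). This measure differs from percent correct. A listener who consistently swaps two vowels scores 0% correct but transmits all of H(x), because the response still determines the target. The sketch below computes T from a count matrix and checks the limiting cases. `relative_info_transfer` is a hypothetical helper.

```python
import numpy as np


def relative_info_transfer(counts):
    """Return (T in bits, T / H(target)) for a target x response count matrix."""
    counts = np.asarray(counts, dtype=float)
    joint = counts / counts.sum()
    p_target = joint.sum(axis=1)
    p_response = joint.sum(axis=0)
    nz = joint > 0  # empty cells contribute 0 to the sum
    expected = np.outer(p_target, p_response)
    t_bits = float(np.sum(joint[nz] * np.log2(joint[nz] / expected[nz])))
    p_t = p_target[p_target > 0]
    h_target = float(-np.sum(p_t * np.log2(p_t)))
    return t_bits, t_bits / h_target


print(relative_info_transfer(10 * np.eye(3)))    # perfect identification
print(relative_info_transfer(np.ones((3, 3))))   # responses independent of the target
```

Perfect identification of three equally frequent vowels transmits log2(3) bits, which is 100% of H(x). Uniform guessing transmits 0 bits. With few trials per cell, T is biased upward, so small nonzero values should not be overinterpreted.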

## 9. Predictive Modeling of Performance

In [None]:
print("\n" + "="*60)
print("PREDICTIVE MODELING")
print("="*60)

# Feature engineering for vowel data
for condition_data in [vowel_bm, vowel_ci]:
    # Add trial number
    condition_data['trial_num'] = condition_data.index + 1
    # Add normalized trial number
    condition_data['trial_norm'] = condition_data['trial_num'] / len(condition_data)
    # Add RT z-score
    condition_data['rt_zscore'] = (condition_data['rt'] - condition_data['rt'].mean()) / condition_data['rt'].std()

# Logistic regression for accuracy prediction
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

print("\nLogistic Regression - Predicting Vowel Accuracy:")
for condition_name, condition_data in [('Bimodal', vowel_bm), ('CI', vowel_ci)]:
    # Features: the vowel code is categorical, so one-hot encode it instead of
    # treating it as a numeric predictor
    vowel_dummies = pd.get_dummies(condition_data['target'], prefix='vowel', dtype=float)
    X = pd.concat([vowel_dummies, condition_data[['trial_norm', 'rt_zscore']].fillna(0)], axis=1)
    features = list(X.columns)
    y = condition_data['correct']
    
    # Model
    model = LogisticRegression(random_state=42, max_iter=1000)
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    
    print(f"\n{condition_name}:")
    print(f"  Cross-validated accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
    print(f"  Majority-class baseline: {max(y.mean(), 1 - y.mean()):.3f}")
    
    # Coefficients (log-odds of a correct response)
    model.fit(X, y)
    print("  Coefficients (log-odds):")
    for feat, coef in zip(features, model.coef_[0]):
        print(f"    {feat}: {coef:.3f}")
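A cross-validated accuracy close to the majority-class rate says little on its own. scikit-learn's `permutation_test_score` refits the same model on label-shuffled copies of the data to estimate how often chance alone reaches the observed accuracy. The sketch below wraps it for a logistic model. `accuracy_vs_chance` is a hypothetical helper, and with few trials per condition the test has limited power. The test data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score


def accuracy_vs_chance(X, y, n_permutations=200, seed=42):
    """Return (CV accuracy, mean accuracy on shuffled labels, permutation p-value).

    The p-value is (number of shuffles scoring >= observed + 1) / (n_permutations + 1).
    """
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    score, perm_scores, p_value = permutation_test_score(
        LogisticRegression(max_iter=1000), X, y,
        cv=cv, scoring='accuracy', n_permutations=n_permutations, random_state=seed)
    return score, perm_scores.mean(), p_value
```

In the loop above, `accuracy_vs_chance(X, y)` would be called with the same `X` and `y` as `cross_val_score`. A p-value near 1 / (n_permutations + 1) means that no shuffled model matched the real one.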

## 10. Comprehensive Summary and Clinical Implications

In [None]:
print("\n" + "="*60)
print("COMPREHENSIVE SUMMARY")
print("="*60)

# Key findings summary
summary = {}

# VGRM
if 'Overall' in vgrm_results:
    summary['VGRM_benefit'] = vgrm_results['Overall']['vgrm']
    summary['VGRM_interpretation'] = 'PRESENT' if vgrm_results['Overall']['vgrm'] > 0 else 'ABSENT'

# Vowel performance
summary['vowel_bm_accuracy'] = vowel_bm['correct'].mean()
summary['vowel_ci_accuracy'] = vowel_ci['correct'].mean()
summary['vowel_ci_benefit'] = vowel_ci['correct'].mean() - vowel_bm['correct'].mean()

# RT
summary['rt_bm'] = vowel_bm['rt'].mean()
summary['rt_ci'] = vowel_ci['rt'].mean()
summary['rt_difference'] = vowel_ci['rt'].mean() - vowel_bm['rt'].mean()

print("\nKEY FINDINGS:")
print(f"1. Voice-Gender Release from Masking (VGRM):")
if 'VGRM_benefit' in summary:
    print(f"   - Overall VGRM: {summary['VGRM_benefit']:.2f} dB")
    print(f"   - Interpretation: {summary['VGRM_interpretation']}")
    print(f"   - Clinical implication: {'Patient may benefit from acoustic pitch cues' if summary['VGRM_benefit'] > 0 else 'Limited use of acoustic pitch cues'}")

print(f"\n2. Vowel Perception:")
print(f"   - Bimodal accuracy: {summary['vowel_bm_accuracy']:.3f}")
print(f"   - CI accuracy: {summary['vowel_ci_accuracy']:.3f}")
print(f"   - CI benefit/deficit: {summary['vowel_ci_benefit']:.3f}")
print(f"   - Clinical implication: {'CI provides clearer vowel perception' if summary['vowel_ci_benefit'] > 0 else 'Bimodal provides better vowel perception'}")

print(f"\n3. Processing Speed:")
print(f"   - Bimodal RT: {summary['rt_bm']:.3f} s")
print(f"   - CI RT: {summary['rt_ci']:.3f} s")
print(f"   - Difference: {summary['rt_difference']:.3f} s")
print(f"   - Clinical implication: {'CI requires more processing time' if summary['rt_difference'] > 0 else 'Bimodal requires more processing time'}")

# Generate clinical recommendations
print("\nCLINICAL RECOMMENDATIONS:")
recommendations = []

if 'VGRM_benefit' in summary and summary['VGRM_benefit'] > 2:
    recommendations.append("- Strong VGRM benefit suggests good use of acoustic pitch cues")
    recommendations.append("- Consider preserving acoustic hearing if possible")
elif 'VGRM_benefit' in summary and summary['VGRM_benefit'] < 0.5:
    recommendations.append("- Minimal VGRM suggests limited acoustic pitch processing")
    recommendations.append("- Bilateral CI may not result in significant pitch-based segregation loss")

if summary['vowel_ci_benefit'] > 0.1:
    recommendations.append("- CI provides clearer vowel perception than bimodal")
elif summary['vowel_ci_benefit'] < -0.1:
    recommendations.append("- Bimodal configuration provides better vowel perception")

for rec in recommendations:
    print(rec)

# Export summary
summary_df = pd.DataFrame([summary])
summary_df.to_csv('CI148_comprehensive_summary.csv', index=False)
print("\nSummary exported to: CI148_comprehensive_summary.csv")

## 11. Additional Exploratory Analyses Suggestions

The following analyses could extend this pipeline:

### Suggested Analyses:

1. **Temporal Dynamics of Fusion**
   - Analyze if confusion patterns change over the course of the experiment
   - Look for evidence of perceptual learning or fatigue

2. **Cross-Task Correlations**
   - Correlate CRM performance with vowel accuracy
   - Test whether better speech-on-speech performance predicts vowel identification (this needs data from multiple listeners)

3. **Error Pattern Clustering**
   - Use clustering algorithms to identify systematic error patterns
   - Group vowels by confusion patterns

4. **Asymmetric Confusion Analysis**
   - Test if confusion is bidirectional (A→B same as B→A)
   - Identify one-way confusion patterns

5. **Psychometric Function Fitting**
   - Fit sigmoid functions to CRM performance vs SNR
   - Extract slope parameters as measure of perceptual precision

6. **Sequential Dependencies**
   - Analyze if previous trial affects current trial performance
   - Look for priming or contrast effects

7. **Individual Difference Markers**
   - Create composite scores for different abilities
   - Develop a perceptual profile

8. **Signal Detection Theory Analysis**
   - Calculate d' and criterion for each vowel
   - Separate sensitivity from response bias

9. **Spectral Weighting Analysis**
   - Infer which acoustic features drive confusions
   - Model feature weights from error patterns

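10. **Bimodal Integration Efficiency**
    - Predict optimal bimodal performance from each ear alone (this requires a hearing-aid-alone condition in addition to CI-alone)
    - Compare the prediction to observed bimodal performance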
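The signal detection analysis in item 8 needs a hit rate and a false-alarm rate per vowel. A one-vs-rest treatment is one common way to get these from a single confusion matrix. A hit is responding vowel k when k was presented, a false alarm is responding k when another vowel was presented, and d' = z(H) - z(F). The sketch below applies a log-linear correction (add 0.5 to each count and 1 to each total) so that rates of 0 or 1 do not produce infinite z-scores. Both the one-vs-rest framing and the correction are assumptions, not choices made in this notebook. `one_vs_rest_dprime` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm


def one_vs_rest_dprime(df):
    """d' and criterion c for each target vowel, treating identification as k-vs-not-k detection.

    df needs 'target' and 'response' columns. Uses the log-linear correction:
    rate = (count + 0.5) / (n + 1).
    """
    rows = []
    for k in sorted(df['target'].unique()):
        signal = df[df['target'] == k]
        noise = df[df['target'] != k]
        hits = (signal['response'] == k).sum()
        false_alarms = (noise['response'] == k).sum()
        hit_rate = (hits + 0.5) / (len(signal) + 1)
        fa_rate = (false_alarms + 0.5) / (len(noise) + 1)
        z_h, z_f = norm.ppf(hit_rate), norm.ppf(fa_rate)
        rows.append({'target': k, 'hit_rate': hit_rate, 'fa_rate': fa_rate,
                     'dprime': z_h - z_f, 'criterion': -0.5 * (z_h + z_f)})
    return pd.DataFrame(rows)
```

Running `one_vs_rest_dprime(vowel_bm)` and `one_vs_rest_dprime(vowel_ci)` separates how well each vowel is distinguished (d') from how readily it is chosen (criterion). A negative criterion indicates a liberal bias toward that response.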
