# Topological Feature Comparison Pipeline

This notebook compares aggregated topological features between the medication-on (medOn) and medication-off (medOff) states in patients with Parkinson's disease.

## Analysis Goals
- Compare medOn vs medOff states using topological features
- Examine differences between dominant and non-dominant hemispheres
- Identify which features (H0, H1, H2, H3) are most discriminative
- Analyze hold vs resting state differences

## 1. Import Libraries

In [1]:
# Standard libraries
import pickle
import numpy as np
import pandas as pd
from pathlib import Path
import os

# Statistical analysis
from scipy import stats
from scipy.stats import ttest_rel, wilcoxon, mannwhitneyu

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Set plotting style
plt.style.use('default')
sns.set_palette("husl")

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.precision', 4)

print("✓ Libraries imported successfully")

✓ Libraries imported successfully


## 2. Define Patient IDs and Data Paths

In [2]:
# Base directory
BASE_DIR = Path('.')

# Patient IDs organized by hold type
# holdL: Subject raised LEFT arm (RIGHT hemisphere is dominant)
# holdR: Subject raised RIGHT arm (LEFT hemisphere is dominant)
PATIENTS_HOLD_L = ['0cGdk9', '2IU8mi', 'AB2PeX', 'AbzsOg', 'FYbcap', 
                   'PuPVlx', 'QZTsn6', 'dCsWjQ', 'gNX5yb', 'i4oK0F']

PATIENTS_HOLD_R = ['2IhVOz', 'BYJoWR', 'VopvKx', 'jyC0j3']

# All patients
ALL_PATIENTS = PATIENTS_HOLD_L + PATIENTS_HOLD_R

print(f"Total patients: {len(ALL_PATIENTS)}")
print(f"  - holdL (left arm raised): {len(PATIENTS_HOLD_L)}")
print(f"  - holdR (right arm raised): {len(PATIENTS_HOLD_R)}")

Total patients: 14
  - holdL (left arm raised): 10
  - holdR (right arm raised): 4
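The comments in the cell above encode a contralateral rule: the hemisphere opposite the raised arm is treated as dominant. A minimal sketch of that rule as a lookup, assuming `hold_type` only takes the two values `'holdL'` and `'holdR'` stored in each file's `_metadata`; `DOMINANT_HEMISPHERE` and `hemisphere_roles` are hypothetical names, not part of the aggregation pipeline.

```python
# Hypothetical helper: contralateral mapping from hold type to hemisphere roles.
# Assumes hold_type is exactly 'holdL' or 'holdR'.
DOMINANT_HEMISPHERE = {
    'holdL': 'right',  # left arm raised -> right hemisphere dominant
    'holdR': 'left',   # right arm raised -> left hemisphere dominant
}

def hemisphere_roles(hold_type):
    """Return (dominant, nondominant) hemisphere names for a hold type."""
    if hold_type not in DOMINANT_HEMISPHERE:
        raise ValueError(f"Unknown hold type: {hold_type!r}")
    dominant = DOMINANT_HEMISPHERE[hold_type]
    nondominant = 'left' if dominant == 'right' else 'right'
    return dominant, nondominant
```

For example, `hemisphere_roles('holdL')` returns `('right', 'left')`, matching the comment for the holdL group.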


## 3. Load Aggregated Features - Single Patient Example

First, let's load one patient's data to understand the structure.

In [3]:
# Load aggregated features for one patient (example: i4oK0F)
patient_id = 'i4oK0F'
patient_dir = BASE_DIR / patient_id

# Find aggregated features file
agg_files = list(patient_dir.glob('aggregated_features*.pkl'))
if agg_files:
    agg_file = agg_files[0]
    print(f"Loading: {agg_file}")
    
    with open(agg_file, 'rb') as f:
        patient_data = pickle.load(f)
    
    # Check structure
    print(f"\nTop-level keys: {list(patient_data.keys())}")
    
    if '_metadata' in patient_data:
        print(f"\nMetadata: {patient_data['_metadata']}")
    
    if 'medOn' in patient_data:
        print(f"\nmedOn keys: {list(patient_data['medOn'].keys())}")
        
        # Show example feature keys
        if 'left_hold' in patient_data['medOn']:
            print(f"\nleft_hold feature keys (sample):")
            feature_keys = list(patient_data['medOn']['left_hold'].keys())
            print(feature_keys[:10])  # Show first 10
    
    print("\n✓ Data loaded successfully")
else:
    print(f"No aggregated features found for {patient_id}")

Loading: i4oK0F/aggregated_features_holdL.pkl

Top-level keys: ['medOn', 'medOff', '_metadata']

Metadata: {'hold_type': 'holdL', 'hold_suffix': '_holdL'}

medOn keys: ['left_hold', 'left_resting', 'right_hold', 'right_resting', 'dominant_hold', 'nondominant_hold', 'dominant_resting', 'nondominant_resting']

left_hold feature keys (sample):
['hemisphere', 'condition', 'persistence_entropy_mean', 'persistence_entropy_std', 'h0_feature_count_mean', 'h0_feature_count_std', 'h0_avg_lifespan_mean', 'h0_avg_lifespan_std', 'h0_max_lifespan_mean', 'h0_max_lifespan_std']

✓ Data loaded successfully
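The `persistence_entropy_*` keys summarize a whole persistence diagram in one number. The aggregation code is not shown in this notebook, so the following is an assumption: the common definition takes each finite interval's lifespan `l_i = d_i - b_i`, normalizes by the total `L = sum(l_i)` to get `p_i = l_i / L`, and computes the Shannon entropy `E = -sum(p_i * log(p_i))`. A small NumPy sketch of that definition (natural log; the actual pipeline may use another base or treat infinite bars differently):

```python
import numpy as np

def persistence_entropy(diagram):
    """Shannon entropy of normalized lifespans for one persistence diagram.

    diagram: array-like of (birth, death) rows.
    Infinite and zero-length intervals are dropped (an assumption, not the pipeline's rule).
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    lifespans = diagram[:, 1] - diagram[:, 0]
    lifespans = lifespans[np.isfinite(lifespans) & (lifespans > 0)]
    if lifespans.size == 0:
        return 0.0
    p = lifespans / lifespans.sum()
    return float(-(p * np.log(p)).sum())

# Four intervals of equal length give the maximum entropy for n = 4, i.e. log(4) ≈ 1.3863
print(persistence_entropy([[0.0, 1.0], [0.5, 1.5], [2.0, 3.0], [1.0, 2.0]]))
```

Under this definition the maximum value for `n` bars is `log(n)`, so entropy partly tracks the number of features; that is worth keeping in mind when reading it alongside the `h*_feature_count_*` columns.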


## 4. Extract Features - medOn State

In [4]:
# Extract medOn features using the dominant/nondominant mapping (dominant = hemisphere contralateral to the raised arm)
if 'medOn' in patient_data:
    # Dominant hemisphere (active during hold task)
    medOn_dominant_hold = patient_data['medOn'].get('dominant_hold', {})
    medOn_dominant_resting = patient_data['medOn'].get('dominant_resting', {})
    
    # Non-dominant hemisphere
    medOn_nondominant_hold = patient_data['medOn'].get('nondominant_hold', {})
    medOn_nondominant_resting = patient_data['medOn'].get('nondominant_resting', {})
    
    # Original left/right naming (for reference)
    medOn_left_hold = patient_data['medOn'].get('left_hold', {})
    medOn_left_resting = patient_data['medOn'].get('left_resting', {})
    medOn_right_hold = patient_data['medOn'].get('right_hold', {})
    medOn_right_resting = patient_data['medOn'].get('right_resting', {})
    
    print("✓ medOn features extracted")
    print(f"  Dominant hold features: {len(medOn_dominant_hold)}")
    print(f"  Non-dominant hold features: {len(medOn_nondominant_hold)}")

✓ medOn features extracted
  Dominant hold features: 59
  Non-dominant hold features: 59
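Each condition dictionary holds 59 entries, mixing metadata (`hemisphere`, `condition`) with `*_mean`/`*_std` statistics. To see how those entries split across homology dimensions, the keys can be grouped by prefix. A minimal sketch, assuming per-dimension keys start with `h0_`, `h1_`, and so on, as in the sample key list from Section 3; `count_keys_by_dimension` is a hypothetical helper.

```python
import re
from collections import Counter

def count_keys_by_dimension(feature_dict):
    """Count feature keys per homology-dimension prefix (h0_, h1_, ...).

    Keys without an hN_ prefix (e.g. persistence_entropy_mean, hemisphere)
    are counted under 'other'.
    """
    counts = Counter()
    for key in feature_dict:
        match = re.match(r'h(\d+)_', key)
        counts[f'H{match.group(1)}' if match else 'other'] += 1
    return dict(sorted(counts.items()))
```

In this notebook it would be called as `count_keys_by_dimension(medOn_dominant_hold)`.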


## 5. Extract Features - medOff State

In [5]:
# Extract medOff features using the dominant/nondominant mapping (dominant = hemisphere contralateral to the raised arm)
if 'medOff' in patient_data:
    # Dominant hemisphere (active during hold task)
    medOff_dominant_hold = patient_data['medOff'].get('dominant_hold', {})
    medOff_dominant_resting = patient_data['medOff'].get('dominant_resting', {})
    
    # Non-dominant hemisphere
    medOff_nondominant_hold = patient_data['medOff'].get('nondominant_hold', {})
    medOff_nondominant_resting = patient_data['medOff'].get('nondominant_resting', {})
    
    # Original left/right naming (for reference)
    medOff_left_hold = patient_data['medOff'].get('left_hold', {})
    medOff_left_resting = patient_data['medOff'].get('left_resting', {})
    medOff_right_hold = patient_data['medOff'].get('right_hold', {})
    medOff_right_resting = patient_data['medOff'].get('right_resting', {})
    
    print("✓ medOff features extracted")
    print(f"  Dominant hold features: {len(medOff_dominant_hold)}")
    print(f"  Non-dominant hold features: {len(medOff_nondominant_hold)}")

✓ medOff features extracted
  Dominant hold features: 59
  Non-dominant hold features: 59
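Sections 4 and 5 repeat the same eight `.get()` calls for each medication state. A more compact alternative is a nested dictionary keyed by state and condition. This is a sketch only: `extract_conditions` and `CONDITION_KEYS` are hypothetical names, and the key list is copied from the `medOn keys` output in Section 3.

```python
CONDITION_KEYS = [
    'dominant_hold', 'dominant_resting', 'nondominant_hold', 'nondominant_resting',
    'left_hold', 'left_resting', 'right_hold', 'right_resting',
]

def extract_conditions(patient_data, states=('medOn', 'medOff'), keys=CONDITION_KEYS):
    """Return {state: {condition_key: feature_dict}}, using {} for anything missing."""
    return {
        state: {key: patient_data.get(state, {}).get(key, {}) for key in keys}
        for state in states
    }
```

With `features = extract_conditions(patient_data)`, the expression `features['medOff']['dominant_hold']` holds the same dictionary as `medOff_dominant_hold`.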


## 6. Helper Functions for Safe Formatting

In [8]:
def safe_format(value, format_spec='.4f', default='N/A'):
    """
    Safely format a value, returning default if value is None or not numeric.
    
    Args:
        value: Value to format
        format_spec: Format specification (e.g., '.4f', '.2f')
        default: Default value if formatting fails
    
    Returns:
        str: Formatted string
    """
    if value is None or value == 'N/A':
        return default
    try:
        # Use format() with the spec string
        return format(value, format_spec)
    except (ValueError, TypeError):
        return default

def safe_diff(val1, val2, format_spec='.4f', default='N/A'):
    """
    Safely compute and format difference between two values.
    
    Args:
        val1: First value
        val2: Second value
        format_spec: Format specification
        default: Default value if computation fails
    
    Returns:
        str: Formatted difference
    """
    if val1 is None or val2 is None or val1 == 'N/A' or val2 == 'N/A':
        return default
    try:
        diff = val1 - val2
        return format(diff, format_spec)
    except (ValueError, TypeError):
        return default

print("✓ Helper functions defined")

✓ Helper functions defined
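`safe_format` and `safe_diff` treat `None` and `'N/A'` as missing, but a `NaN` (common after pandas operations) passes through and prints as `nan`. They also report only absolute differences, which are hard to compare across features on different scales: for patient i4oK0F, persistence entropy is around 6.5 while H1 feature counts are around 200. A sketch of a relative-change helper that treats `None`, `'N/A'`, and `NaN` alike; `safe_pct_change` is a hypothetical name.

```python
import math
import numbers

def safe_pct_change(new, ref, format_spec='.2f', default='N/A'):
    """Format 100 * (new - ref) / ref as a percentage string.

    Returns `default` if either value is None, 'N/A', non-numeric, NaN,
    or if the reference is zero. NumPy scalars count as numeric.
    """
    values = (new, ref)
    if not all(isinstance(v, numbers.Real) and not isinstance(v, bool) for v in values):
        return default
    if any(math.isnan(v) for v in values) or ref == 0:
        return default
    return format(100.0 * (new - ref) / ref, format_spec) + '%'
```

Using the rounded H1 feature counts from the single-patient comparison (medOn 219.40, medOff 203.40), `safe_pct_change(219.40, 203.40)` gives `'7.87%'`.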


## 7. Compare Key Features - Single Patient

In [9]:
# Compare persistence entropy between medOn and medOff
if medOn_dominant_hold and medOff_dominant_hold:
    print(f"Patient: {patient_id}")
    print(f"Hold type: {patient_data.get('_metadata', {}).get('hold_type', 'unknown')}\n")
    
    # Persistence entropy comparison
    print("Persistence Entropy (Mean):")
    medOn_pe = medOn_dominant_hold.get('persistence_entropy_mean')
    medOff_pe = medOff_dominant_hold.get('persistence_entropy_mean')
    print(f"  medOn  dominant hold:  {safe_format(medOn_pe, '.4f')}")
    print(f"  medOff dominant hold:  {safe_format(medOff_pe, '.4f')}")
    print(f"  Difference: {safe_diff(medOn_pe, medOff_pe, '.4f')}\n")
    
    # Feature count comparison (H1 dimension)
    print("H1 Feature Count (Mean):")
    medOn_h1 = medOn_dominant_hold.get('h1_feature_count_mean')
    medOff_h1 = medOff_dominant_hold.get('h1_feature_count_mean')
    print(f"  medOn  dominant hold:  {safe_format(medOn_h1, '.2f')}")
    print(f"  medOff dominant hold:  {safe_format(medOff_h1, '.2f')}")
    print(f"  Difference: {safe_diff(medOn_h1, medOff_h1, '.2f')}\n")
    
    # Lifespan comparison (H1 dimension)
    print("H1 Average Lifespan (Mean):")
    medOn_h1_life = medOn_dominant_hold.get('h1_avg_lifespan_mean')
    medOff_h1_life = medOff_dominant_hold.get('h1_avg_lifespan_mean')
    print(f"  medOn  dominant hold:  {safe_format(medOn_h1_life, '.4f')}")
    print(f"  medOff dominant hold:  {safe_format(medOff_h1_life, '.4f')}")
    print(f"  Difference: {safe_diff(medOn_h1_life, medOff_h1_life, '.4f')}")
    
    # Show all available H1 features
    h1_features = [k for k in medOn_dominant_hold.keys() if 'h1_' in k.lower()]
    if h1_features:
        print(f"\nAvailable H1 features: {h1_features}")

Patient: i4oK0F
Hold type: holdL

Persistence Entropy (Mean):
  medOn  dominant hold:  6.4704
  medOff dominant hold:  6.4688
  Difference: 0.0016

H1 Feature Count (Mean):
  medOn  dominant hold:  219.40
  medOff dominant hold:  203.40
  Difference: 16.00

H1 Average Lifespan (Mean):
  medOn  dominant hold:  0.2259
  medOff dominant hold:  0.1816
  Difference: 0.0442

Available H1 features: ['h1_feature_count_mean', 'h1_feature_count_std', 'h1_avg_lifespan_mean', 'h1_avg_lifespan_std', 'h1_max_lifespan_mean', 'h1_max_lifespan_std', 'h1_std_lifespan_mean', 'h1_std_lifespan_std', 'h1_avg_birth_mean', 'h1_avg_birth_std', 'h1_avg_death_mean', 'h1_avg_death_std']


In [None]:
## 8. Load All Patients - Batch Loading Function
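The body of this cell is not included in this export, although later cells rely on a dictionary `all_patient_data` that maps each patient ID to its unpickled aggregated features. A minimal sketch of such a loader, reusing the glob pattern from Section 3 and assuming one `aggregated_features*.pkl` file per patient directory; `load_all_patients` is a hypothetical name, and it sorts the matches so the choice is deterministic if several files exist (Section 3 takes the first unsorted match).

```python
import pickle
from pathlib import Path

def load_all_patients(patient_ids, base_dir=Path('.')):
    """Load aggregated features for each patient; skip patients without a file.

    Returns {patient_id: data}, where data has the structure shown in Section 3
    ('medOn', 'medOff', '_metadata').
    """
    loaded = {}
    for pid in patient_ids:
        # sorted() makes the file choice deterministic when several files match
        matches = sorted((Path(base_dir) / pid).glob('aggregated_features*.pkl'))
        if not matches:
            print(f"✗ {pid}: no aggregated features file")
            continue
        with open(matches[0], 'rb') as f:
            loaded[pid] = pickle.load(f)
        print(f"✓ {pid}: {matches[0].name}")
    return loaded
```

In this notebook the call would be `all_patient_data = load_all_patients(ALL_PATIENTS, BASE_DIR)`.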

In [None]:
## 9. Load All Available Patients

In [None]:
## 10. Create Comparison DataFrame - Dominant Hemisphere Hold Task

# Extract features for dominant hemisphere during hold task
comparison_data_dominant_hold = []

for patient_id, data in all_patient_data.items():
    hold_type = data.get('_metadata', {}).get('hold_type', 'unknown')
    
    # medOn features
    if 'medOn' in data and 'dominant_hold' in data['medOn']:
        medOn_features = data['medOn']['dominant_hold']
        row = {
            'patient_id': patient_id,
            'hold_type': hold_type,
            'medication_state': 'medOn',
            **medOn_features,
            # Set after unpacking: the feature dict has its own 'hemisphere' and
            # 'condition' keys, which would otherwise overwrite these labels
            'hemisphere': 'dominant',
            'condition': 'hold',
        }
        comparison_data_dominant_hold.append(row)
    
    # medOff features
    if 'medOff' in data and 'dominant_hold' in data['medOff']:
        medOff_features = data['medOff']['dominant_hold']
        row = {
            'patient_id': patient_id,
            'hold_type': hold_type,
            'medication_state': 'medOff',
            **medOff_features,
            'hemisphere': 'dominant',
            'condition': 'hold',
        }
        comparison_data_dominant_hold.append(row)

df_dominant_hold = pd.DataFrame(comparison_data_dominant_hold)

print(f"✓ Created comparison dataframe: {df_dominant_hold.shape}")
print(f"\nColumns (first 15): {df_dominant_hold.columns.tolist()[:15]}")

# Display first few rows with available columns
display_cols = ['patient_id', 'hold_type', 'medication_state', 'hemisphere', 'persistence_entropy_mean']
# Add h1 feature count if available
if 'h1_feature_count_mean' in df_dominant_hold.columns:
    display_cols.append('h1_feature_count_mean')

print(f"\nFirst few rows:")
display(df_dominant_hold[display_cols].head(10))

In [None]:
## 11. Create Comparison DataFrame - Non-Dominant Hemisphere Hold Task

# Extract features for non-dominant hemisphere during hold task
comparison_data_nondominant_hold = []

for patient_id, data in all_patient_data.items():
    hold_type = data.get('_metadata', {}).get('hold_type', 'unknown')
    
    # medOn features
    if 'medOn' in data and 'nondominant_hold' in data['medOn']:
        medOn_features = data['medOn']['nondominant_hold']
        row = {
            'patient_id': patient_id,
            'hold_type': hold_type,
            'medication_state': 'medOn',
            **medOn_features,
            # Set after unpacking: the feature dict has its own 'hemisphere' and
            # 'condition' keys, which would otherwise overwrite these labels
            'hemisphere': 'nondominant',
            'condition': 'hold',
        }
        comparison_data_nondominant_hold.append(row)
    
    # medOff features
    if 'medOff' in data and 'nondominant_hold' in data['medOff']:
        medOff_features = data['medOff']['nondominant_hold']
        row = {
            'patient_id': patient_id,
            'hold_type': hold_type,
            'medication_state': 'medOff',
            **medOff_features,
            'hemisphere': 'nondominant',
            'condition': 'hold',
        }
        comparison_data_nondominant_hold.append(row)

df_nondominant_hold = pd.DataFrame(comparison_data_nondominant_hold)

print(f"✓ Created comparison dataframe: {df_nondominant_hold.shape}")

# Display first few rows with available columns
display_cols = ['patient_id', 'hold_type', 'medication_state', 'hemisphere', 'persistence_entropy_mean']
# Add h1 feature count if available
if 'h1_feature_count_mean' in df_nondominant_hold.columns:
    display_cols.append('h1_feature_count_mean')

print(f"\nFirst few rows:")
display(df_nondominant_hold[display_cols].head(10))
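Sections 10 and 11 differ only in the condition key they read (`dominant_hold` vs `nondominant_hold`). A sketch of one function that builds the same long-format rows for any hemisphere/condition pair; `build_condition_df` is a hypothetical name. It sets the explicit `hemisphere`/`condition` labels after unpacking the feature dict, so the dict's own keys of the same name cannot overwrite them.

```python
import pandas as pd

def build_condition_df(all_patient_data, hemisphere, condition, states=('medOn', 'medOff')):
    """One row per (patient, medication state) for the '<hemisphere>_<condition>' key."""
    key = f'{hemisphere}_{condition}'
    rows = []
    for patient_id, data in all_patient_data.items():
        hold_type = data.get('_metadata', {}).get('hold_type', 'unknown')
        for state in states:
            features = data.get(state, {}).get(key)
            if not features:
                continue  # state or condition missing for this patient
            rows.append({
                'patient_id': patient_id,
                'hold_type': hold_type,
                'medication_state': state,
                **features,
                # Explicit labels win over the feature dict's own keys
                'hemisphere': hemisphere,
                'condition': condition,
            })
    return pd.DataFrame(rows)
```

For example, `build_condition_df(all_patient_data, 'nondominant', 'resting')` would produce the resting-state counterpart of `df_nondominant_hold`.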

In [None]:
## 12. Create Combined DataFrame - All Conditions
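The code for this cell is missing from this export, yet Section 14 expects a `df_all_hold` DataFrame that can be grouped by `medication_state` and `hemisphere`. A plausible construction, assuming it simply stacks the dominant and non-dominant hold tables from Sections 10 and 11 (resting conditions would need analogous tables); `combine_frames` is a hypothetical helper.

```python
import pandas as pd

def combine_frames(*frames):
    """Stack long-format comparison tables, skipping None and empty ones."""
    non_empty = [df for df in frames if df is not None and not df.empty]
    if not non_empty:
        return pd.DataFrame()
    # Columns missing from one table become NaN in the combined result
    return pd.concat(non_empty, ignore_index=True, sort=False)
```

In this notebook: `df_all_hold = combine_frames(df_dominant_hold, df_nondominant_hold)`.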

In [None]:
## 13. Quick Statistical Comparison - Persistence Entropy
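This cell's code is also missing. Because each patient is recorded in both medication states, a paired test fits the design; `ttest_rel` and `wilcoxon` are already imported in Section 1. A sketch, assuming a long-format table such as `df_dominant_hold` with one row per patient and state; patients missing either state are dropped. `paired_medication_test` is a hypothetical name, and with at most 14 pairs the p-values are best read as exploratory.

```python
import pandas as pd
from scipy.stats import ttest_rel, wilcoxon

def paired_medication_test(df, feature='persistence_entropy_mean'):
    """Paired medOn vs medOff comparison of one feature across patients."""
    wide = df.pivot_table(index='patient_id', columns='medication_state',
                          values=feature, aggfunc='mean')
    if not {'medOn', 'medOff'}.issubset(wide.columns):
        raise ValueError("Need both medOn and medOff rows to form pairs")
    wide = wide[['medOn', 'medOff']].dropna()  # keep complete pairs only
    t_res = ttest_rel(wide['medOn'], wide['medOff'])
    w_res = wilcoxon(wide['medOn'], wide['medOff'])
    return {
        'n_pairs': len(wide),
        'mean_diff_on_minus_off': float((wide['medOn'] - wide['medOff']).mean()),
        'ttest_stat': float(t_res.statistic),
        'ttest_p': float(t_res.pvalue),
        'wilcoxon_stat': float(w_res.statistic),
        'wilcoxon_p': float(w_res.pvalue),
    }
```

In this notebook: `paired_medication_test(df_dominant_hold)` or `paired_medication_test(df_nondominant_hold)`.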

In [None]:
## 14. Summary Statistics Table

# Create summary statistics for key features
if not df_all_hold.empty:
    # Feature names follow the lowercase, underscore-separated convention of aggregate_features.py
    key_features = [
        'persistence_entropy_mean',
        'h0_feature_count_mean', 'h1_feature_count_mean', 'h2_feature_count_mean', 'h3_feature_count_mean',
        'h1_avg_lifespan_mean', 'h2_avg_lifespan_mean',
        'h1_max_lifespan_mean', 'h2_max_lifespan_mean'
    ]
    
    # Filter to features that exist
    available_features = [f for f in key_features if f in df_all_hold.columns]
    
    if available_features:
        summary = df_all_hold.groupby(['medication_state', 'hemisphere'])[available_features].agg(['mean', 'std', 'count'])
        print("\nSummary Statistics by Medication State and Hemisphere")
        print("="*80)
        display(summary)
        
        print(f"\n✓ Showing statistics for {len(available_features)} features")
    else:
        print("⚠ No key features available in dataframe")
        print(f"Available columns: {df_all_hold.columns.tolist()[:20]}")
else:
    print("⚠ No data available for summary")
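Group means and standard deviations describe each medication state separately but ignore the pairing within patients. One way to approach the goal of finding the most discriminative features is to rank them by a paired standardized effect: the mean of the per-patient medOn − medOff differences divided by their standard deviation (often written d_z). A sketch under that choice; `rank_paired_effects` is hypothetical, and it should receive one hemisphere's table at a time (for example `df_dominant_hold`), since `pivot_table` would otherwise average the two hemispheres together. With this few patients the ranking is descriptive only.

```python
import numpy as np
import pandas as pd

def rank_paired_effects(df, features):
    """Rank features by |d_z|, where d_z = mean(diff) / sd(diff), diff = medOn - medOff."""
    rows = []
    for feature in features:
        if feature not in df.columns:
            continue
        wide = df.pivot_table(index='patient_id', columns='medication_state',
                              values=feature, aggfunc='mean')
        if not {'medOn', 'medOff'}.issubset(wide.columns):
            continue
        diffs = (wide['medOn'] - wide['medOff']).dropna()
        if len(diffs) < 2:
            continue
        sd = diffs.std(ddof=1)
        if not np.isfinite(sd) or sd == 0:
            continue  # d_z undefined for constant differences
        rows.append({'feature': feature, 'n_pairs': len(diffs),
                     'mean_diff': diffs.mean(), 'd_z': diffs.mean() / sd})
    ranked = pd.DataFrame(rows, columns=['feature', 'n_pairs', 'mean_diff', 'd_z'])
    if ranked.empty:
        return ranked
    order = ranked['d_z'].abs().sort_values(ascending=False).index
    return ranked.loc[order].reset_index(drop=True)
```

In this notebook: `rank_paired_effects(df_dominant_hold, available_features)`.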

In [None]:
## 15. Save Processed DataFrames
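The save cell's code is missing as well. A sketch that writes each table to both CSV (portable and human-readable) and pickle (preserves dtypes); the output directory name `processed` and the helper `save_tables` are assumptions, not names from the original pipeline.

```python
from pathlib import Path
import pandas as pd

def save_tables(tables, out_dir=Path('processed')):
    """Write each non-empty DataFrame in `tables` ({name: df}) as CSV and pickle.

    Returns the names of the tables that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in tables.items():
        if df is None or df.empty:
            continue
        df.to_csv(out_dir / f'{name}.csv', index=False)
        df.to_pickle(out_dir / f'{name}.pkl')
        written.append(name)
    return written
```

In this notebook: `save_tables({'dominant_hold': df_dominant_hold, 'nondominant_hold': df_nondominant_hold, 'all_hold': df_all_hold})`.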

## 16. Next Steps

Now that the data is loaded and organized, you can:

1. **Visual Analysis**: Create plots comparing medOn vs medOff
2. **Statistical Testing**: Apply the 10 analysis pathways from ANALYSIS_METHODOLOGY.md
3. **Feature Selection**: Identify the most discriminative features
4. **Lateralization**: Analyze dominant vs non-dominant hemisphere differences
5. **Homology Dimension Analysis**: Compare H0, H1, H2, H3 contributions

See `ANALYSIS_METHODOLOGY.md` for detailed analysis strategies.

---

# Part 2: Two-Patient Comparison Pipeline

This section allows flexible comparison between any two patients across medication states, hemispheres, and task conditions.

## 17. Select Patients for Comparison

In [None]:
# Select two patients to compare
# Change these IDs to compare different patients
patient1 = 'i4oK0F'  # First patient
patient2 = 'QZTsn6'  # Second patient

# Verify patients are loaded
print(f"Available patients: {list(all_patient_data.keys())}\n")

if patient1 in all_patient_data:
    print(f"✓ {patient1} loaded")
    print(f"  Hold type: {all_patient_data[patient1].get('_metadata', {}).get('hold_type', 'unknown')}")
else:
    print(f"✗ {patient1} not available")

if patient2 in all_patient_data:
    print(f"✓ {patient2} loaded")
    print(f"  Hold type: {all_patient_data[patient2].get('_metadata', {}).get('hold_type', 'unknown')}")
else:
    print(f"✗ {patient2} not available")

## 18. Helper Function for Two-Patient Comparison

In [11]:
import numbers

def compare_two_patients(pid1, pid2, med_state='medOn', hemisphere='dominant', condition='hold', 
                          feature_list=None):
    """
    Compare features between two patients for a specific condition.
    
    Args:
        pid1, pid2: Patient IDs
        med_state: 'medOn' or 'medOff'
        hemisphere: 'dominant', 'nondominant', 'left', or 'right'
        condition: 'hold' or 'resting'
        feature_list: List of features to compare (None = all common features)
    
    Returns:
        pandas.DataFrame: One row per feature; the 'difference' column is
        pid2 - pid1 (None if either value is non-numeric). None on error.
    """
    missing = [pid for pid in (pid1, pid2) if pid not in all_patient_data]
    if missing:
        print(f"Error: patient(s) not available: {missing}")
        return None
    
    # Get the key for accessing features
    key = f"{hemisphere}_{condition}"
    
    # Extract features for both patients
    features1 = all_patient_data[pid1].get(med_state, {}).get(key, {})
    features2 = all_patient_data[pid2].get(med_state, {}).get(key, {})
    
    if not features1 or not features2:
        print(f"Error: Features not available for {med_state}/{hemisphere}/{condition}")
        return None
    
    # If no feature list provided, use all common features
    if feature_list is None:
        feature_list = [k for k in features1.keys() if k in features2 and k not in ['hemisphere', 'condition']]
    
    # Build comparison table
    comparison = []
    for feature in feature_list:
        val1 = features1.get(feature)
        val2 = features2.get(feature)
        
        # Compute difference only if both values are real numbers
        # (numbers.Real also covers NumPy numeric scalars; bools are excluded)
        diff = None
        if all(isinstance(v, numbers.Real) and not isinstance(v, bool) for v in (val1, val2)):
            diff = val2 - val1
        
        comparison.append({
            'feature': feature,
            f'{pid1}': val1,
            f'{pid2}': val2,
            'difference': diff
        })
    
    df_comp = pd.DataFrame(comparison)
    return df_comp

print("✓ Comparison function defined")

✓ Comparison function defined


## 19. Compare Dominant Hemisphere - medOn Hold Task

In [None]:
# Compare dominant hemisphere features during hold task (medOn)
print(f"Comparing {patient1} vs {patient2} - Dominant Hemisphere, medOn, Hold Task")
print("="*80)

# Select key features for comparison
key_features = [
    'persistence_entropy_mean',
    'h0_feature_count_mean', 'h1_feature_count_mean', 'h2_feature_count_mean',
    'h1_avg_lifespan_mean', 'h2_avg_lifespan_mean',
    'h1_max_lifespan_mean', 'h2_max_lifespan_mean'
]

comp_dominant_medOn = compare_two_patients(
    patient1, patient2, 
    med_state='medOn', 
    hemisphere='dominant', 
    condition='hold',
    feature_list=key_features
)

if comp_dominant_medOn is not None:
    display(comp_dominant_medOn)
    
    # Highlight largest differences
    if 'difference' in comp_dominant_medOn.columns:
        comp_dominant_medOn['abs_diff'] = comp_dominant_medOn['difference'].abs()
        top_diff = comp_dominant_medOn.nlargest(3, 'abs_diff')
        print(f"\n✓ Top 3 largest differences:")
        display(top_diff[['feature', f'{patient1}', f'{patient2}', 'difference']])

## 20. Compare Non-Dominant Hemisphere - medOn Hold Task

In [None]:
# Compare non-dominant hemisphere features during hold task (medOn)
print(f"Comparing {patient1} vs {patient2} - Non-Dominant Hemisphere, medOn, Hold Task")
print("="*80)

comp_nondominant_medOn = compare_two_patients(
    patient1, patient2, 
    med_state='medOn', 
    hemisphere='nondominant', 
    condition='hold',
    feature_list=key_features
)

if comp_nondominant_medOn is not None:
    display(comp_nondominant_medOn)
    
    # Highlight largest differences
    if 'difference' in comp_nondominant_medOn.columns:
        comp_nondominant_medOn['abs_diff'] = comp_nondominant_medOn['difference'].abs()
        top_diff = comp_nondominant_medOn.nlargest(3, 'abs_diff')
        print(f"\n✓ Top 3 largest differences:")
        display(top_diff[['feature', f'{patient1}', f'{patient2}', 'difference']])

## 21. Compare Dominant Hemisphere - medOff Hold Task

In [None]:
# Compare dominant hemisphere features during hold task (medOff)
print(f"Comparing {patient1} vs {patient2} - Dominant Hemisphere, medOff, Hold Task")
print("="*80)

comp_dominant_medOff = compare_two_patients(
    patient1, patient2, 
    med_state='medOff', 
    hemisphere='dominant', 
    condition='hold',
    feature_list=key_features
)

if comp_dominant_medOff is not None:
    display(comp_dominant_medOff)
    
    # Highlight largest differences
    if 'difference' in comp_dominant_medOff.columns:
        comp_dominant_medOff['abs_diff'] = comp_dominant_medOff['difference'].abs()
        top_diff = comp_dominant_medOff.nlargest(3, 'abs_diff')
        print(f"\n✓ Top 3 largest differences:")
        display(top_diff[['feature', f'{patient1}', f'{patient2}', 'difference']])

## 22. Visualize Two-Patient Comparison

In [None]:
# Visualize comparison for dominant hemisphere, medOn
if comp_dominant_medOn is not None and not comp_dominant_medOn.empty:
    # Select features for visualization
    viz_features = ['persistence_entropy_mean', 'h1_feature_count_mean', 
                    'h2_feature_count_mean', 'h1_avg_lifespan_mean']
    
    df_viz = comp_dominant_medOn[comp_dominant_medOn['feature'].isin(viz_features)].copy()
    
    if not df_viz.empty:
        # Create subplots for each feature
        n_features = len(df_viz)
        fig, axes = plt.subplots(2, 2, figsize=(12, 8))
        axes = axes.flatten()
        
        for idx, (_, row) in enumerate(df_viz.iterrows()):
            if idx < len(axes):
                ax = axes[idx]
                feature_name = row['feature']
                values = [row[patient1], row[patient2]]
                colors = ['#3498db', '#e74c3c']
                
                ax.bar([patient1, patient2], values, color=colors, alpha=0.7)
                ax.set_title(feature_name, fontsize=10, fontweight='bold')
                ax.set_ylabel('Value')
                ax.tick_params(axis='x', rotation=45)
                ax.grid(axis='y', alpha=0.3)
        
        # Hide unused subplots
        for idx in range(n_features, len(axes)):
            axes[idx].set_visible(False)
        
        plt.suptitle(f'Feature Comparison: {patient1} vs {patient2}\nDominant Hemisphere, medOn, Hold Task',
                     fontsize=14, fontweight='bold')
        plt.tight_layout()
        plt.show()
    else:
        print("No features available for visualization")

## 23. Single Patient: medOn vs medOff Comparison (Both Hemispheres)

In [None]:
# Compare medOn vs medOff for a single patient across both hemispheres
selected_patient = patient1  # Set to patient2 (or any loaded patient ID) as needed

if selected_patient in all_patient_data:
    print(f"Patient: {selected_patient}")
    print(f"Hold type: {all_patient_data[selected_patient].get('_metadata', {}).get('hold_type', 'unknown')}")
    print("="*80)
    
    # Create comparison dataframe
    comparison_data = []
    
    for hemisphere in ['dominant', 'nondominant']:
        for med_state in ['medOn', 'medOff']:
            features = all_patient_data[selected_patient].get(med_state, {}).get(f'{hemisphere}_hold', {})
            
            if features:
                row = {
                    'hemisphere': hemisphere,
                    'med_state': med_state,
                    'persistence_entropy': features.get('persistence_entropy_mean'),
                    'h1_count': features.get('h1_feature_count_mean'),
                    'h2_count': features.get('h2_feature_count_mean'),
                    'h1_avg_lifespan': features.get('h1_avg_lifespan_mean'),
                    'h2_avg_lifespan': features.get('h2_avg_lifespan_mean')
                }
                comparison_data.append(row)
    
    df_single_patient = pd.DataFrame(comparison_data)
    
    if not df_single_patient.empty:
        print(f"\n{selected_patient} - Feature Comparison Table:")
        display(df_single_patient)
        
        # Calculate differences (medOn - medOff)
        print(f"\nDifferences (medOn - medOff):")
        for hemisphere in ['dominant', 'nondominant']:
            print(f"\n{hemisphere.upper()} Hemisphere:")
            medOn_data = df_single_patient[(df_single_patient['hemisphere'] == hemisphere) & 
                                           (df_single_patient['med_state'] == 'medOn')]
            medOff_data = df_single_patient[(df_single_patient['hemisphere'] == hemisphere) & 
                                            (df_single_patient['med_state'] == 'medOff')]
            
            if not medOn_data.empty and not medOff_data.empty:
                for col in ['persistence_entropy', 'h1_count', 'h2_count', 'h1_avg_lifespan']:
                    medOn_val = medOn_data[col].values[0]
                    medOff_val = medOff_data[col].values[0]
                    # pd.notna rejects both None and NaN (missing features become NaN in the DataFrame)
                    if pd.notna(medOn_val) and pd.notna(medOff_val):
                        diff = medOn_val - medOff_val
                        print(f"  {col}: {safe_format(diff, '.4f')}")
    else:
        print(f"No data available for {selected_patient}")
else:
    print(f"Patient {selected_patient} not loaded")

## 24. Visualize Single Patient: medOn vs medOff

In [None]:
# Visualize medOn vs medOff for the patient selected in Section 23
if not df_single_patient.empty:
    features_to_plot = ['persistence_entropy', 'h1_count', 'h2_count', 'h1_avg_lifespan']
    hemispheres = ['dominant', 'nondominant']
    
    def get_value(hemisphere, med_state, feature):
        """Return the stored value, or NaN if that hemisphere/state is missing (instead of plotting 0)."""
        mask = (df_single_patient['hemisphere'] == hemisphere) & (df_single_patient['med_state'] == med_state)
        values = df_single_patient.loc[mask, feature].values
        return values[0] if len(values) > 0 else np.nan
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    axes = axes.flatten()
    
    x = np.arange(len(hemispheres))  # Dominant, Nondominant
    width = 0.35
    
    for ax, feature in zip(axes, features_to_plot):
        # Grouped bars: medOn to the left of each tick, medOff to the right
        medOn_vals = [get_value(h, 'medOn', feature) for h in hemispheres]
        medOff_vals = [get_value(h, 'medOff', feature) for h in hemispheres]
        ax.bar(x - width/2, medOn_vals, width, label='medOn', color='#2ecc71', alpha=0.8)
        ax.bar(x + width/2, medOff_vals, width, label='medOff', color='#e74c3c', alpha=0.8)
        
        ax.set_xlabel('Hemisphere')
        ax.set_ylabel('Value')
        ax.set_title(feature.replace('_', ' ').title(), fontweight='bold')
        ax.set_xticks(x)
        ax.set_xticklabels(['Dominant', 'Non-Dominant'])
        ax.legend()
        ax.grid(axis='y', alpha=0.3)
    
    plt.suptitle(f'{selected_patient} - medOn vs medOff Comparison (Hold Task)', 
                 fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()
else:
    print("No data available for visualization")

## 25. Custom Flexible Comparison

Use this cell to create custom comparisons with different parameters.

In [10]:
# Custom comparison - modify these parameters as needed
custom_patient1 = 'i4oK0F'
custom_patient2 = '0cGdk9'
custom_med_state = 'medOn'      # 'medOn' or 'medOff'
custom_hemisphere = 'dominant'   # 'dominant', 'nondominant', 'left', or 'right'
custom_condition = 'hold'        # 'hold' or 'resting'

# Custom feature list (optional - set to None for all features)
custom_features = [
    'persistence_entropy_mean',
    'h0_feature_count_mean',
    'h1_feature_count_mean',
    'h2_feature_count_mean',
    'h3_feature_count_mean',
    'h1_avg_lifespan_mean',
    'h1_max_lifespan_mean'
]

print(f"Custom Comparison: {custom_patient1} vs {custom_patient2}")
print(f"Parameters: {custom_med_state} | {custom_hemisphere} hemisphere | {custom_condition} task")
print("="*80)

custom_comp = compare_two_patients(
    custom_patient1, 
    custom_patient2,
    med_state=custom_med_state,
    hemisphere=custom_hemisphere,
    condition=custom_condition,
    feature_list=custom_features
)

if custom_comp is not None:
    display(custom_comp)
else:
    print("Comparison failed - check that both patients have data for the specified parameters")

## 26. Summary: Available Comparisons

This notebook provides multiple comparison pipelines:

### 1. **Multi-Patient Analysis** (Sections 8-15)
- Loads all available patient data
- Creates comprehensive DataFrames for all patients
- Compares features across the entire cohort
- Exports aggregated comparison tables

### 2. **Two-Patient Direct Comparison** (Sections 17-22)
- Compare any two patients side-by-side
- Includes both dominant and non-dominant hemispheres
- Covers medOn and medOff states
- Highlights largest feature differences

### 3. **Single Patient Medication State Comparison** (Sections 23-24)
- Compare medOn vs medOff within a single patient
- Includes both hemispheres
- Visualizes differences between medication states

### 4. **Custom Flexible Comparison** (Section 25)
- Fully customizable parameters:
  - Any two patients
  - Any medication state (medOn/medOff)
  - Any hemisphere (dominant/nondominant/left/right)
  - Any condition (hold/resting)
  - Custom feature selection

### Key Variables to Modify:
- `patient1`, `patient2`: Set patient IDs for comparison
- `selected_patient`: Choose patient for medOn vs medOff analysis
- `custom_patient1`, `custom_patient2`: Custom comparison patient IDs
- `custom_med_state`, `custom_hemisphere`, `custom_condition`: Custom comparison parameters

### Available Features:
- **Persistence Entropy**: Overall topological complexity
- **H0, H1, H2, H3 Feature Counts**: Number of topological features per dimension
- **Average Lifespans**: Mean persistence of features
- **Max Lifespans**: Longest-living features
- **Standard Deviations**: Variability measures for all features

### Next Steps:
1. Run statistical tests (t-tests, Wilcoxon) on feature differences
2. Apply dimensionality reduction (PCA) for multi-feature analysis
3. Create correlation matrices between features
4. Implement distance-based comparisons using Wasserstein/Bottleneck metrics
5. Apply analysis pathways from ANALYSIS_METHODOLOGY.md