# Tutorial 3: Multi-Modal Processing

Welcome to the multi-modal processing tutorial! In this notebook, you'll learn how to work with multiple data streams simultaneously—a key capability for modern neuroscience research.

## What You'll Learn

1. Understanding multi-modal data (EEG + video + physiological signals)
2. Creating multi-modal pipelines
3. Synchronizing data streams
4. Fusion strategies for combining modalities
5. Multi-modal classification
6. Analyzing cross-modal relationships

## Why Multi-Modal?

Combining multiple data sources provides:
- **Richer representations**: Capture complementary information
- **Better accuracy**: Multiple evidence sources improve predictions
- **Robustness**: Redundancy helps when one modality is noisy
- **Deeper insights**: Understand cross-modal relationships

## 1. Setup and Imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix
import asyncio

# NeurOS imports
from neuros.agents.multimodal_orchestrator import MultiModalOrchestrator
from neuros.agents.fusion_agent import FusionAgent
from neuros.drivers.mock_driver import MockDriver
from neuros.drivers.video_driver import VideoDriver
from neuros.drivers.gsr_driver import GSRDriver
from neuros.models.simple_classifier import SimpleClassifier
from neuros.datasets.allen_loader import load_simulated_allen_data

# Set random seed
np.random.seed(42)

print("✓ Imports successful!")

## 2. Understanding Multi-Modal Data

Let's simulate a multi-modal experiment where we're studying emotional responses:
- **EEG**: Brain activity (64 channels, 250 Hz)
- **Video**: Facial expressions (30 fps)
- **GSR**: Galvanic skin response (physiological arousal, 10 Hz)

**Task**: Classify emotional states (neutral, happy, sad) using all modalities.

In [None]:
# Generate simulated multi-modal data
n_samples = 150
n_classes = 3  # neutral, happy, sad

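# EEG stand-in: the simulated loader's 64 "neurons" play the role of 64 EEG channels,
# each with 5 time bins (a 20 ms window at 250 Hz, i.e. 5 samples x 4 ms)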
eeg_data = load_simulated_allen_data(
    n_samples=n_samples,
    n_neurons=64,
    n_classes=n_classes,
    noise_level=0.3
)
X_eeg = eeg_data['features']
y_labels = eeg_data['labels']

# Video features: Facial action units (12 features)
X_video = np.random.randn(n_samples, 12)
# Add class-specific patterns
for i in range(n_classes):
    mask = (y_labels == i)
    X_video[mask] += np.random.randn(12) * 2  # Class-specific shift

# GSR features: skin conductance level (SCL) and response (SCR), 2 features
X_gsr = np.random.randn(n_samples, 2)
for i in range(n_classes):
    mask = (y_labels == i)
    X_gsr[mask, 0] += i * 1.5  # Shift SCL only, by class: neutral +0, happy +1.5, sad +3.0

print(f"EEG shape: {X_eeg.shape}")
print(f"Video shape: {X_video.shape}")
print(f"GSR shape: {X_gsr.shape}")
print(f"Labels shape: {y_labels.shape}")
print(f"\nClass distribution: {np.bincount(y_labels)}")

## 3. Visualize Multi-Modal Data

In [None]:
# Plot sample from each modality for each class
fig, axes = plt.subplots(3, 3, figsize=(15, 12))
class_names = ['Neutral', 'Happy', 'Sad']

for class_idx in range(3):
    # Get first sample of this class
    sample_idx = np.where(y_labels == class_idx)[0][0]
    
    # EEG
    ax = axes[class_idx, 0]
    eeg_sample = X_eeg[sample_idx].reshape(64, -1)
    im = ax.imshow(eeg_sample, aspect='auto', cmap='RdBu_r')
    ax.set_title(f'{class_names[class_idx]} - EEG')
    ax.set_ylabel('Channel')
    ax.set_xlabel('Time')
    plt.colorbar(im, ax=ax)
    
    # Video (facial action units)
    ax = axes[class_idx, 1]
    ax.bar(range(12), X_video[sample_idx])
    ax.set_title(f'{class_names[class_idx]} - Facial AUs')
    ax.set_xlabel('Action Unit')
    ax.set_ylabel('Activation')
    ax.axhline(y=0, color='k', linestyle='--', alpha=0.3)
    
    # GSR
    ax = axes[class_idx, 2]
    ax.bar(['SCL', 'SCR'], X_gsr[sample_idx])
    ax.set_title(f'{class_names[class_idx]} - GSR')
    ax.set_ylabel('Value (simulated, a.u.)')

plt.tight_layout()
plt.show()

## 4. Train/Test Split

In [None]:
from sklearn.model_selection import train_test_split

# Split data
indices = np.arange(n_samples)
train_idx, test_idx = train_test_split(
    indices, test_size=0.3, random_state=42, stratify=y_labels
)

X_eeg_train, X_eeg_test = X_eeg[train_idx], X_eeg[test_idx]
X_video_train, X_video_test = X_video[train_idx], X_video[test_idx]
X_gsr_train, X_gsr_test = X_gsr[train_idx], X_gsr[test_idx]
y_train, y_test = y_labels[train_idx], y_labels[test_idx]

print(f"Training samples: {len(train_idx)}")
print(f"Test samples: {len(test_idx)}")

## 5. Single-Modality Baselines

First, let's see how well each modality performs individually:

In [None]:
from neuros.pipeline import Pipeline

results = {}

# EEG-only
eeg_model = SimpleClassifier(model_type='svm')
eeg_pipeline = Pipeline(driver=MockDriver(channels=64), model=eeg_model)
eeg_pipeline.train(X_eeg_train, y_train)
y_pred_eeg = eeg_pipeline.predict(X_eeg_test)
results['EEG'] = accuracy_score(y_test, y_pred_eeg)

# Video-only
video_model = SimpleClassifier(model_type='random_forest')
video_pipeline = Pipeline(driver=VideoDriver(), model=video_model)
video_pipeline.train(X_video_train, y_train)
y_pred_video = video_pipeline.predict(X_video_test)
results['Video'] = accuracy_score(y_test, y_pred_video)

# GSR-only
gsr_model = SimpleClassifier(model_type='svm')
gsr_pipeline = Pipeline(driver=GSRDriver(), model=gsr_model)
gsr_pipeline.train(X_gsr_train, y_train)
y_pred_gsr = gsr_pipeline.predict(X_gsr_test)
results['GSR'] = accuracy_score(y_test, y_pred_gsr)

# Display results
print("Single-Modality Baselines:")
for modality, acc in results.items():
    print(f"  {modality:10s}: {acc:.2%}")
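To reproduce a single-modality baseline without NeurOS, a plain scikit-learn version might look like the sketch below. It assumes `SimpleClassifier(model_type='svm')` behaves roughly like an RBF-kernel SVM, which this tutorial does not spell out. The scaler is included because SVMs are sensitive to feature scale. The data is synthetic so the cell runs on its own.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for one modality: 90 samples, 12 features, 3 classes
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 12)) + y[:, None] * 1.0  # class-dependent shift

# Scaler and SVM live in one pipeline, so each CV fold fits the scaler
# on its own training portion only (no leakage from the held-out fold)
baseline = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
scores = cross_val_score(baseline, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.2%} ± {scores.std():.2%}")
```

Cross-validation averages over five splits, so with only 150 samples it usually gives a steadier estimate than a single 70/30 split.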

## 6. Multi-Modal Fusion Strategies

### 6.1 Early Fusion (Feature-Level)

Concatenate all features before classification:

In [None]:
# Concatenate all features
X_train_early = np.concatenate([X_eeg_train, X_video_train, X_gsr_train], axis=1)
X_test_early = np.concatenate([X_eeg_test, X_video_test, X_gsr_test], axis=1)

print(f"Early fusion feature shape: {X_train_early.shape}")

# Train on fused features
early_fusion_model = SimpleClassifier(model_type='svm')
early_pipeline = Pipeline(driver=MockDriver(), model=early_fusion_model)
early_pipeline.train(X_train_early, y_train)
y_pred_early = early_pipeline.predict(X_test_early)
results['Early Fusion'] = accuracy_score(y_test, y_pred_early)

print(f"\nEarly Fusion Accuracy: {results['Early Fusion']:.2%}")
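One caveat with plain concatenation: the EEG block has far more columns than the video (12) and GSR (2) blocks, and the blocks may sit on different scales, so a distance-based model such as an SVM can end up dominated by a single modality. A common remedy is to standardize each modality separately, fitting the scaler on training data only, before concatenating. The sketch below uses scikit-learn's `StandardScaler` on small synthetic arrays; `early_fuse` is a hypothetical helper, not a NeurOS API.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def early_fuse(train_blocks, test_blocks):
    """Standardize each modality on its training data, then concatenate.

    Hypothetical helper for illustration; not part of NeurOS.
    """
    train_scaled, test_scaled = [], []
    for X_tr, X_te in zip(train_blocks, test_blocks):
        scaler = StandardScaler().fit(X_tr)  # mean/std from training data only
        train_scaled.append(scaler.transform(X_tr))
        test_scaled.append(scaler.transform(X_te))
    return np.concatenate(train_scaled, axis=1), np.concatenate(test_scaled, axis=1)


# Toy example: a 4-column block on a large scale next to a 2-column block
rng = np.random.default_rng(0)
A_tr, A_te = rng.normal(0, 100, (20, 4)), rng.normal(0, 100, (5, 4))
B_tr, B_te = rng.normal(0, 1, (20, 2)), rng.normal(0, 1, (5, 2))
X_tr, X_te = early_fuse([A_tr, B_tr], [A_te, B_te])
print(X_tr.shape, X_te.shape)
```

In the notebook this would be called as `early_fuse([X_eeg_train, X_video_train, X_gsr_train], [X_eeg_test, X_video_test, X_gsr_test])`. Standardizing equalizes scales but not column counts; reducing a large block (for example with PCA) before fusion is one option if EEG still dominates.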

### 6.2 Late Fusion (Decision-Level)

Train one model per modality and combine their predicted class probabilities:

In [None]:
# Get probability predictions from each modality
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Retrain with scikit-learn classifiers that provide predict_proba
eeg_clf = SVC(probability=True, random_state=42)
video_clf = RandomForestClassifier(random_state=42)
gsr_clf = SVC(probability=True, random_state=42)

eeg_clf.fit(X_eeg_train, y_train)
video_clf.fit(X_video_train, y_train)
gsr_clf.fit(X_gsr_train, y_train)

# Get probability predictions
proba_eeg = eeg_clf.predict_proba(X_eeg_test)
proba_video = video_clf.predict_proba(X_video_test)
proba_gsr = gsr_clf.predict_proba(X_gsr_test)

# Average probabilities across modalities (soft voting)
proba_late = (proba_eeg + proba_video + proba_gsr) / 3
y_pred_late = np.argmax(proba_late, axis=1)
results['Late Fusion (Voting)'] = accuracy_score(y_test, y_pred_late)

print(f"Late Fusion Accuracy: {results['Late Fusion (Voting)']:.2%}")
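Averaging predicted probabilities, as the cell above does, is usually called *soft voting*. The alternative, *hard* (majority) voting, lets each model cast one vote for its top class. A small sketch of both on made-up probability arrays, chosen so that the two rules disagree on the first sample (the function names are illustrative, not NeurOS APIs):

```python
import numpy as np


def soft_vote(probas):
    """Average class probabilities across models, then pick the top class."""
    return np.argmax(np.mean(probas, axis=0), axis=1)


def hard_vote(probas, n_classes):
    """Each model votes for its argmax class; ties go to the lowest class index."""
    votes = np.argmax(probas, axis=2)  # shape (n_models, n_samples)
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    return np.argmax(counts, axis=0)   # counts has shape (n_classes, n_samples)


# Shape (n_models=3, n_samples=2, n_classes=3)
probas = np.array([
    [[0.90, 0.05, 0.05], [0.2, 0.5, 0.3]],  # model 1: very confident on sample 0
    [[0.40, 0.50, 0.10], [0.1, 0.2, 0.7]],  # model 2
    [[0.45, 0.50, 0.05], [0.3, 0.3, 0.4]],  # model 3
])
print(soft_vote(probas))     # [0 2]
print(hard_vote(probas, 3))  # [1 2]
```

Soft voting lets one confident model outweigh two lukewarm ones, which helps when probabilities are reasonably calibrated. Note that `predict_proba` columns follow each classifier's `classes_` order, so argmax indices map straight to labels here only because the labels are 0, 1, 2 and all appear in the training set.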

### 6.3 Weighted Fusion

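Weight each modality by its individual performance. The weights must be estimated without touching the test set; here each modality's accuracy is estimated with 5-fold cross-validation on the training data:

In [None]:
from sklearn.model_selection import cross_val_score

# Estimate each modality's reliability with 5-fold cross-validation on the
# training set only; reusing results['EEG'] etc. would leak test labels
cv_acc = np.array([
    cross_val_score(SVC(random_state=42), X_eeg_train, y_train, cv=5).mean(),
    cross_val_score(RandomForestClassifier(random_state=42), X_video_train, y_train, cv=5).mean(),
    cross_val_score(SVC(random_state=42), X_gsr_train, y_train, cv=5).mean(),
])
weights = cv_acc / cv_acc.sum()  # Normalize so the weights sum to 1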

print(f"Modality weights: EEG={weights[0]:.3f}, Video={weights[1]:.3f}, GSR={weights[2]:.3f}")

# Weighted average of probabilities
proba_weighted = (
    weights[0] * proba_eeg +
    weights[1] * proba_video +
    weights[2] * proba_gsr
)
y_pred_weighted = np.argmax(proba_weighted, axis=1)
results['Weighted Fusion'] = accuracy_score(y_test, y_pred_weighted)

print(f"\nWeighted Fusion Accuracy: {results['Weighted Fusion']:.2%}")
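Written out, the weighted rule is as follows, with $a_m$ the accuracy estimate for modality $m$ ($M = 3$ modalities here, and $a_m$ should come from data the test set never touches) and $p_m(c \mid x_m)$ that modality's predicted probability for class $c$:

$$
w_m = \frac{a_m}{\sum_{k=1}^{M} a_k}, \qquad
\hat{p}(c \mid x) = \sum_{m=1}^{M} w_m \, p_m(c \mid x_m), \qquad
\hat{y} = \arg\max_{c} \, \hat{p}(c \mid x)
$$

Because the weights sum to 1, the fused scores still form a probability distribution. Accuracy-proportional weights also tend to stay close to uniform: accuracies of 0.9, 0.8 and 0.5 give weights of about 0.41, 0.36 and 0.23, not far from the equal weights of 1/3 in the plain late-fusion average.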

## 7. Compare All Approaches

In [None]:
# Create comparison plot
fig, ax = plt.subplots(figsize=(12, 6))

methods = list(results.keys())
accuracies = list(results.values())

colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b']
bars = ax.bar(methods, accuracies, color=colors)

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{height:.1%}',
            ha='center', va='bottom', fontsize=10, fontweight='bold')

ax.set_ylabel('Accuracy', fontsize=12)
ax.set_title('Multi-Modal Fusion Comparison', fontsize=14, fontweight='bold')
ax.set_ylim([0, 1])
ax.axhline(y=1/3, color='r', linestyle='--', label='Chance (33.3%)', alpha=0.5)
ax.legend()
plt.xticks(rotation=15, ha='right')
plt.tight_layout()
plt.show()

print("\n📊 Summary:")
single_modalities = ['EEG', 'Video', 'GSR']
fusion_methods = ['Early Fusion', 'Late Fusion (Voting)', 'Weighted Fusion']
best_single = max(results[m] for m in single_modalities)
best_fusion = max(results[m] for m in fusion_methods)
print(f"Best single modality: {best_single:.2%}")
print(f"Best fusion method: {best_fusion:.2%}")
# The '+' format flag prints the sign, so a fusion loss shows up as negative
print(f"Gain from fusion (best fusion minus best single): {best_fusion - best_single:+.2%}")
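With only 45 test samples, each prediction moves accuracy by about 2.2 percentage points (1/45), so small gaps between methods may be noise. One way to gauge this is a paired bootstrap over test samples: resample the test indices with replacement and recompute the accuracy difference each time, using the same indices for both methods. This is a generic resampling sketch, not something NeurOS provides; `paired_bootstrap_diff` is a hypothetical helper.

```python
import numpy as np


def paired_bootstrap_diff(y_true, pred_a, pred_b, n_boot=2000, seed=0):
    """Observed accuracy(pred_a) - accuracy(pred_b) and a 95% percentile interval."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    rng = np.random.default_rng(seed)
    correct_a = (pred_a == y_true).astype(float)
    correct_b = (pred_b == y_true).astype(float)
    n = len(y_true)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # same resampled indices for both methods
        diffs[i] = correct_a[idx].mean() - correct_b[idx].mean()
    return correct_a.mean() - correct_b.mean(), np.percentile(diffs, [2.5, 97.5])


# Toy example: method A is right on 40/45 samples, method B on 38/45
y_true = np.zeros(45, dtype=int)
pred_a = np.r_[np.zeros(40, dtype=int), np.ones(5, dtype=int)]
pred_b = np.r_[np.zeros(38, dtype=int), np.ones(7, dtype=int)]
diff, (lo, hi) = paired_bootstrap_diff(y_true, pred_a, pred_b)
print(f"Observed difference: {diff:+.2%}, 95% CI [{lo:+.2%}, {hi:+.2%}]")
```

In the notebook you could call, for example, `paired_bootstrap_diff(y_test, y_pred_weighted, y_pred_video)`. If the interval comfortably includes 0, this test set does not give strong evidence that fusion helps.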

## 8. Using NeurOS FusionAgent

NeurOS provides a `FusionAgent` that accepts a dictionary of per-modality feature arrays and fuses them according to `fusion_strategy` (here `'weighted'`):

In [None]:
from neuros.agents.fusion_agent import FusionAgent

# Create fusion agent
fusion_agent = FusionAgent(
    modalities=['eeg', 'video', 'gsr'],
    fusion_strategy='weighted'
)

# Prepare data dictionary
train_data = {
    'eeg': X_eeg_train,
    'video': X_video_train,
    'gsr': X_gsr_train
}

test_data = {
    'eeg': X_eeg_test,
    'video': X_video_test,
    'gsr': X_gsr_test
}

# Train fusion agent
fusion_agent.train(train_data, y_train)

# Predict
y_pred_agent = fusion_agent.predict(test_data)
agent_accuracy = accuracy_score(y_test, y_pred_agent)

print(f"FusionAgent Accuracy: {agent_accuracy:.2%}")
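This tutorial does not show what `FusionAgent` does internally. For intuition only, here is a minimal, hypothetical stand-in with the same dictionary-in, labels-out shape that implements weighted late fusion with scikit-learn. It is not the NeurOS implementation, and the class name `WeightedLateFusion` is invented for this sketch.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


class WeightedLateFusion:
    """Hypothetical sketch (not NeurOS): one classifier per modality,
    probabilities averaged with weights proportional to CV accuracy."""

    def __init__(self, base_estimator, cv=5):
        self.base_estimator = base_estimator
        self.cv = cv

    def train(self, data, y):
        self.modalities_ = sorted(data)  # fixed modality order
        self.models_ = {}
        scores = []
        for m in self.modalities_:
            # Reliability estimate from cross-validation on training data only
            cv_acc = cross_val_score(clone(self.base_estimator), data[m], y, cv=self.cv)
            scores.append(cv_acc.mean())
            self.models_[m] = clone(self.base_estimator).fit(data[m], y)
        self.weights_ = np.array(scores) / np.sum(scores)
        return self

    def predict(self, data):
        proba = sum(w * self.models_[m].predict_proba(data[m])
                    for m, w in zip(self.modalities_, self.weights_))
        # Every model was fit on the same labels, so classes_ order matches
        classes = self.models_[self.modalities_[0]].classes_
        return classes[np.argmax(proba, axis=1)]


# Toy usage: one informative modality and one pure-noise modality
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 20)
data = {
    'a': rng.normal(size=(60, 4)) + y[:, None],  # class-dependent shift
    'b': rng.normal(size=(60, 3)),               # no class information
}
agent = WeightedLateFusion(LogisticRegression(max_iter=1000)).train(data, y)
print(dict(zip(agent.modalities_, agent.weights_.round(3))))
```

The noise modality still receives a sizeable weight, because its cross-validated accuracy sits near chance (about 1/3) rather than zero; subtracting chance before normalizing is one possible refinement.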

## 9. Analyze Cross-Modal Correlations

In [None]:
from scipy.stats import pearsonr

# Crude cross-modal summary: collapse each modality to one value per sample
# (the mean over its features), then correlate these values across samples
eeg_avg = X_eeg_train.mean(axis=1)
video_avg = X_video_train.mean(axis=1)
gsr_avg = X_gsr_train.mean(axis=1)

# Compute correlations
corr_eeg_video, p1 = pearsonr(eeg_avg, video_avg)
corr_eeg_gsr, p2 = pearsonr(eeg_avg, gsr_avg)
corr_video_gsr, p3 = pearsonr(video_avg, gsr_avg)

# Create correlation matrix
corr_matrix = np.array([
    [1.0, corr_eeg_video, corr_eeg_gsr],
    [corr_eeg_video, 1.0, corr_video_gsr],
    [corr_eeg_gsr, corr_video_gsr, 1.0]
])

# Plot correlation matrix
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, fmt='.3f', 
            xticklabels=['EEG', 'Video', 'GSR'],
            yticklabels=['EEG', 'Video', 'GSR'],
            cmap='coolwarm', center=0, vmin=-1, vmax=1,
            square=True, ax=ax)
ax.set_title('Cross-Modal Correlations', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n🔗 Cross-Modal Correlations:")
print(f"  EEG ↔ Video: r={corr_eeg_video:.3f} (p={p1:.4f})")
print(f"  EEG ↔ GSR:   r={corr_eeg_gsr:.3f} (p={p2:.4f})")
print(f"  Video ↔ GSR: r={corr_video_gsr:.3f} (p={p3:.4f})")

## 10. Multi-Modal Orchestrator for Real-Time

For real-time multi-modal processing, use the `MultiModalOrchestrator`:

In [None]:
# Create multi-modal orchestrator
orchestrator = MultiModalOrchestrator(
    modalities=['eeg', 'video', 'gsr'],
    sync_window_ms=100  # 100ms synchronization window
)

# Register drivers
orchestrator.register_driver('eeg', MockDriver(channels=64))
orchestrator.register_driver('video', VideoDriver())
orchestrator.register_driver('gsr', GSRDriver())

# Register fusion agent
orchestrator.register_fusion(fusion_agent)

print("✓ Multi-modal orchestrator configured")
print(f"  Modalities: {orchestrator.modalities}")
print(f"  Sync window: {orchestrator.sync_window_ms}ms")
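The orchestrator's internals are not shown here, but the core idea of a synchronization window can be sketched in NumPy: put every stream on a shared clock, cut the timeline into fixed windows (100 ms, matching `sync_window_ms` above), and summarize each stream within each window. At the rates from Section 2, a 100 ms window holds 25 EEG samples, 3 video frames and 1 GSR sample. The helper `window_average` is hypothetical; it assumes timestamps in milliseconds on a common clock and uses the window mean as the summary, which is only one possible choice (using the latest frame is another).

```python
import numpy as np


def window_average(timestamps, values, t_start, n_windows, window_ms):
    """Average all samples whose timestamp falls in each window of length window_ms.

    Hypothetical helper (not a NeurOS API). Windows with no samples return NaN.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    values = np.asarray(values, dtype=float).reshape(len(timestamps), -1)
    idx = np.floor((timestamps - t_start) / window_ms).astype(int)
    keep = (idx >= 0) & (idx < n_windows)  # drop samples outside the range
    idx, values = idx[keep], values[keep]
    sums = np.zeros((n_windows, values.shape[1]))
    np.add.at(sums, idx, values)           # unbuffered per-window sums
    counts = np.bincount(idx, minlength=n_windows)[:, None]
    with np.errstate(invalid='ignore'):
        return sums / counts               # 0 / 0 gives NaN for empty windows


window_ms, n_windows = 100.0, 10  # 1 s of data in 100 ms windows

# Timestamps in ms on a shared clock (real streams would also have jitter)
t_eeg = np.arange(250) * 1000 / 250   # 250 Hz
t_video = np.arange(30) * 1000 / 30   # 30 fps
t_gsr = np.arange(10) * 1000 / 10     # 10 Hz

# Toy signals: 64 EEG channels, 12 facial AUs, 2 GSR features
rng = np.random.default_rng(0)
eeg = rng.normal(size=(250, 64))
video = rng.normal(size=(30, 12))
gsr = rng.normal(size=(10, 2))

aligned = np.concatenate([
    window_average(t_eeg, eeg, 0.0, n_windows, window_ms),
    window_average(t_video, video, 0.0, n_windows, window_ms),
    window_average(t_gsr, gsr, 0.0, n_windows, window_ms),
], axis=1)
print(aligned.shape)  # (10, 78): one fused feature row per 100 ms window
```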

## Summary

Congratulations! You've learned multi-modal processing with NeurOS:

✅ Understanding multi-modal data types  
✅ Training single-modality baselines  
✅ Early fusion (feature-level)  
✅ Late fusion (decision-level)  
✅ Weighted fusion strategies  
✅ Using NeurOS FusionAgent  
✅ Analyzing cross-modal correlations  
✅ Real-time multi-modal orchestration  

## Key Takeaways

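1. **Multi-modal fusion often improves accuracy** - Combining modalities frequently beats the best single modality, but not always; a noisy modality can drag a naive fusion down, so always compare against single-modality baselines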
2. **Different fusion strategies work for different tasks** - Try early, late, and weighted fusion
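3. **Modality weighting matters** - Weighting by each modality's individual performance can help, provided that performance is estimated on training or validation data, never on the test set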
4. **Cross-modal correlations provide insights** - Understand relationships between data streams
5. **Synchronization is critical** - Align timestamps across modalities for real-time

## Next Steps

- **Tutorial 4**: Custom Models - Build your own neural decoders
- **Tutorial 5**: Benchmarking - Compare methods systematically
- **Advanced**: Attention-based fusion, modal dropout, hierarchical processing

## Exercises

1. Try different fusion strategies (max pooling, attention mechanisms)
2. Add a fourth modality (e.g., heart rate)
3. Implement modal dropout for robustness testing
4. Analyze which modalities contribute most to each class
5. Build a hierarchical fusion model (early + late fusion combined)

Happy multi-modal processing! 🧠🎥📊