# Multi-Modal Fusion with Attention

This notebook demonstrates how to combine multiple data modalities (EEG, video, motion sensors) using NeurOS's attention-based fusion model.

**What you'll learn:**
- Combining EEG, video, and motion sensor data
- Using AttentionFusionModel for intelligent modality weighting
- Interpreting attention weights to understand model decisions
- Comparing fusion strategies (concatenation vs. attention)
- Running multi-modal pipelines in real time

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from neuros.models import AttentionFusionModel, CompositeModel, ModelRegistry
from neuros.pipeline import MultiModalPipeline
from neuros.drivers import MockDriver, VideoDriver, MotionSensorDriver

print("✓ Imports successful")

## 1. Generate Synthetic Multi-Modal Data

We'll simulate a scenario where we're trying to detect emotional states using:
- **EEG** (8 channels, 40 features after band-power extraction)
- **Video** (facial features, 128-dim embedding)
- **Motion** (accelerometer/gyroscope, 12 features)

Classes:
- 0: Neutral
- 1: Happy  
- 2: Stressed

In [None]:
def generate_multimodal_data(n_samples=300):
    """
    Generate synthetic multi-modal emotion recognition data.
    
    Returns
    -------
    X_eeg : np.ndarray (n_samples, 40)
        EEG band-power features
    X_video : np.ndarray (n_samples, 128)
        Video facial features  
    X_motion : np.ndarray (n_samples, 12)
        Motion sensor features
    y : np.ndarray (n_samples,)
        Labels (0=neutral, 1=happy, 2=stressed)
    """
    X_eeg = []
    X_video = []
    X_motion = []
    y = []
    
    for _ in range(n_samples):
        label = np.random.randint(0, 3)
        
        if label == 0:  # Neutral
            # Low arousal in EEG
            eeg = np.random.randn(40) * 0.5
            # Neutral facial expression (low variance)
            video = np.random.randn(128) * 0.3
            # Minimal movement
            motion = np.random.randn(12) * 0.2
            
        elif label == 1:  # Happy
            # Increased frontal alpha in EEG
            eeg = np.random.randn(40) * 0.7
            eeg[:8] += 1.5  # First 8 features boosted (stand-in for frontal alpha)
            # Smile in video (positive values in specific features)
            video = np.random.randn(128) * 0.5
            video[50:70] += 2.0  # "Smile" features
            # Moderate movement
            motion = np.random.randn(12) * 0.5 + 0.3
            
        else:  # Stressed (label == 2)
            # High beta activity in EEG
            eeg = np.random.randn(40) * 1.0
            eeg[24:32] += 2.0  # Beta band boosted
            # Tense facial features
            video = np.random.randn(128) * 0.6
            video[20:40] += 1.5  # "Tension" features
            # Fidgeting (high variance in motion)
            motion = np.random.randn(12) * 1.2
        
        X_eeg.append(eeg)
        X_video.append(video)
        X_motion.append(motion)
        y.append(label)
    
    return (np.array(X_eeg), np.array(X_video), 
            np.array(X_motion), np.array(y))

# Generate dataset
print("Generating synthetic multi-modal data...")
X_eeg, X_video, X_motion, y = generate_multimodal_data(n_samples=300)

print(f"✓ Generated {len(y)} samples")
print(f"  EEG features: {X_eeg.shape[1]}")
print(f"  Video features: {X_video.shape[1]}")
print(f"  Motion features: {X_motion.shape[1]}")
print(f"  Total features: {X_eeg.shape[1] + X_video.shape[1] + X_motion.shape[1]}")
print(f"  Class distribution: {np.bincount(y)}")
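
`generate_multimodal_data` draws from NumPy's global random state, so the dataset (and every accuracy reported later) changes from run to run. The sketch below shows one way to make the draws repeatable with `np.random.default_rng`. `sample_modalities` is a hypothetical helper, not part of NeurOS; it only mirrors the per-class scales and offsets used above.

```python
import numpy as np

def sample_modalities(label, rng):
    """Draw one (eeg, video, motion) triple for a class label from a seeded generator."""
    # Per-class noise scales for (EEG, video, motion), copied from the generator above.
    scales = {0: (0.5, 0.3, 0.2), 1: (0.7, 0.5, 0.5), 2: (1.0, 0.6, 1.2)}
    s_eeg, s_video, s_motion = scales[label]
    eeg = rng.standard_normal(40) * s_eeg
    video = rng.standard_normal(128) * s_video
    motion = rng.standard_normal(12) * s_motion
    if label == 1:    # Happy
        eeg[:8] += 1.5
        video[50:70] += 2.0
        motion += 0.3
    elif label == 2:  # Stressed
        eeg[24:32] += 2.0
        video[20:40] += 1.5
    return eeg, video, motion

# Two generators with the same seed produce identical samples.
a = sample_modalities(1, np.random.default_rng(0))
b = sample_modalities(1, np.random.default_rng(0))
same = all(np.array_equal(x, y) for x, y in zip(a, b))
print("Reproducible:", same)
```

Passing the generator explicitly keeps the seed in one place, which makes it easier to compare models on exactly the same synthetic data.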

## 2. Prepare Data for Fusion Models

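Concatenate the per-modality features into a single feature vector per sample, in the order EEG, video, motion. The order matters: it matches the `modality_dims=[40, 128, 12]` argument passed to the attention model in Section 4.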

In [None]:
from sklearn.model_selection import train_test_split

# Concatenate all modalities
X_concat = np.concatenate([X_eeg, X_video, X_motion], axis=1)
print(f"Concatenated feature shape: {X_concat.shape}")

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(
    X_concat, y, test_size=0.2, random_state=42, stratify=y
)

print(f"\nTraining set: {len(X_train)} samples")
print(f"Test set: {len(X_test)} samples")
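
Because the modalities are concatenated in a fixed order, each block can be recovered from the combined matrix by its column offsets. The sketch below illustrates that layout with plain NumPy. `split_modalities` is a hypothetical helper, and whether `AttentionFusionModel` slices its input this way internally is an assumption.

```python
import numpy as np

def split_modalities(X, modality_dims):
    """Slice a concatenated feature matrix back into per-modality blocks."""
    # Column indices where each block ends, excluding the final one.
    boundaries = np.cumsum(modality_dims)[:-1]
    return np.split(X, boundaries, axis=1)

modality_dims = [40, 128, 12]  # EEG, video, motion (same order as the concatenation)
X_demo = np.arange(2 * sum(modality_dims), dtype=float).reshape(2, -1)

eeg_block, video_block, motion_block = split_modalities(X_demo, modality_dims)
print(eeg_block.shape, video_block.shape, motion_block.shape)
```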

## 3. Train Baseline Model (Simple Concatenation)

First, let's try a baseline approach: just concatenate all features and train a classifier.

In [None]:
from neuros.models import SimpleClassifier
from sklearn.metrics import accuracy_score, classification_report

# Train baseline model
baseline_model = SimpleClassifier()
baseline_model.train(X_train, y_train)

# Evaluate
y_pred_baseline = baseline_model.predict(X_test)
baseline_accuracy = accuracy_score(y_test, y_pred_baseline)

print("Baseline Model (Simple Concatenation)")
print(f"Test Accuracy: {baseline_accuracy:.2%}\n")
print(classification_report(y_test, y_pred_baseline, 
                          target_names=['Neutral', 'Happy', 'Stressed']))
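
To put the baseline accuracy in context, it helps to know the floor: the accuracy of always predicting the most frequent class, which is roughly one third for three balanced classes. A minimal sketch follows. `majority_class_accuracy` is a hypothetical helper, and the labels are random placeholders rather than results from this notebook.

```python
import numpy as np

def majority_class_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    majority_label = np.bincount(y_train).argmax()
    return float(np.mean(y_test == majority_label))

# Placeholder labels for illustration only (three classes, seeded).
rng = np.random.default_rng(0)
y_train_demo = rng.integers(0, 3, size=240)
y_test_demo = rng.integers(0, 3, size=60)

floor = majority_class_accuracy(y_train_demo, y_test_demo)
print(f"Majority-class accuracy: {floor:.2%}")
```

In the notebook itself, the same function could be called with `y_train` and `y_test` from the split above.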

## 4. Train Attention Fusion Model

Now let's use `AttentionFusionModel`, which learns to weight each modality's contribution.

In [None]:
# Create attention fusion model
attention_model = AttentionFusionModel(
    modality_dims=[40, 128, 12],  # EEG, Video, Motion
    n_classes=3,
    fusion_dim=64,
    attention_type="learned",
    dropout=0.3,
)

print("Training Attention Fusion Model...\n")
attention_model.train(X_train, y_train)

# Evaluate
y_pred_attention = attention_model.predict(X_test)
attention_accuracy = accuracy_score(y_test, y_pred_attention)

print("\nAttention Fusion Model")
print(f"Test Accuracy: {attention_accuracy:.2%}\n")
print(classification_report(y_test, y_pred_attention,
                          target_names=['Neutral', 'Happy', 'Stressed']))

# Compare with baseline
improvement = (attention_accuracy - baseline_accuracy) * 100
print(f"\nImprovement over baseline: {improvement:+.1f} percentage points")
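
For intuition about what "learning to weight each modality" can mean, here is a generic attention-fusion forward pass in NumPy: each modality is projected to a shared `fusion_dim`, scored, and combined with softmax weights. This is an illustrative sketch with random, untrained parameters. It is not NeurOS's implementation, and the names `attention_fuse`, `projections`, and `score_vector` are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(blocks, projections, score_vector):
    """Fuse per-modality blocks into one vector per sample using softmax weights."""
    # Project every modality into the shared space: (n_samples, n_modalities, fusion_dim).
    embedded = np.stack(
        [np.tanh(x @ w) for x, w in zip(blocks, projections)], axis=1
    )
    scores = embedded @ score_vector   # one scalar score per sample and modality
    weights = softmax(scores, axis=1)  # each row sums to 1 across modalities
    fused = (weights[:, :, None] * embedded).sum(axis=1)
    return fused, weights

rng = np.random.default_rng(0)
modality_dims, fusion_dim, n_samples = [40, 128, 12], 64, 5
blocks = [rng.standard_normal((n_samples, d)) for d in modality_dims]
projections = [rng.standard_normal((d, fusion_dim)) / np.sqrt(d) for d in modality_dims]
score_vector = rng.standard_normal(fusion_dim) / np.sqrt(fusion_dim)

fused, weights = attention_fuse(blocks, projections, score_vector)
print(fused.shape, weights.shape)
```

In a trained model the projections and scoring vector would be learned jointly with the classifier, so the weights reflect which modality helps the prediction for a given sample.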

## 5. Interpret Attention Weights

One key advantage of attention-based fusion is interpretability. Let's see which modalities the model relies on.

In [None]:
# Get attention weights for test samples
attention_weights = attention_model.get_attention_weights(X_test)

# Average attention per class
class_names = ['Neutral', 'Happy', 'Stressed']
modality_names = ['EEG', 'Video', 'Motion']

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for cls in range(3):
    cls_mask = y_test == cls
    cls_attention = attention_weights[cls_mask].mean(axis=0)
    
    axes[cls].bar(modality_names, cls_attention, color=['#1f77b4', '#ff7f0e', '#2ca02c'])
    axes[cls].set_title(f"{class_names[cls]}\nAverage Attention Weights")
    axes[cls].set_ylabel('Attention Weight')
    axes[cls].set_ylim(0, 1)
    axes[cls].grid(axis='y', alpha=0.3)
    
    # Add value labels on bars
    for i, v in enumerate(cls_attention):
        axes[cls].text(i, v + 0.02, f"{v:.2f}", ha='center', va='bottom')

plt.tight_layout()
plt.show()

print("\nInterpretation:")
print("- Higher weights indicate the model relies more on that modality")
print("- Different classes may show different modality preferences")
print("- This helps us understand what signals are most informative")
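
The bar charts fix the y-axis to the range 0 to 1, which assumes each sample's weights are non-negative and sum to 1 across modalities. The sketch below checks that assumption before averaging and returns the per-class means as a table. `per_class_mean_attention` is a hypothetical helper, the `(n_samples, n_modalities)` layout is inferred from the indexing above, and the Dirichlet draws are placeholder weights rather than model output.

```python
import numpy as np

def per_class_mean_attention(weights, labels, n_classes):
    """Return an (n_classes, n_modalities) table of mean attention weights."""
    if np.any(weights < 0) or not np.allclose(weights.sum(axis=1), 1.0):
        raise ValueError("expected non-negative rows that sum to 1")
    return np.stack([weights[labels == c].mean(axis=0) for c in range(n_classes)])

# Placeholder weights for illustration: Dirichlet draws are valid distributions.
rng = np.random.default_rng(0)
weights_demo = rng.dirichlet(alpha=[2.0, 3.0, 1.0], size=60)
labels_demo = np.arange(60) % 3  # every class is represented

table = per_class_mean_attention(weights_demo, labels_demo, n_classes=3)
for class_name, row in zip(["Neutral", "Happy", "Stressed"], table):
    print(class_name, np.round(row, 2))
```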

## 6. Analyze Per-Sample Attention

Let's look at how attention varies across individual samples.

In [None]:
# Plot attention distribution
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for mod_idx, mod_name in enumerate(modality_names):
    for cls in range(3):
        cls_mask = y_test == cls
        cls_attention = attention_weights[cls_mask, mod_idx]
        axes[mod_idx].hist(cls_attention, alpha=0.5, label=class_names[cls], bins=15)
    
    axes[mod_idx].set_title(f"{mod_name} Attention Distribution")
    axes[mod_idx].set_xlabel('Attention Weight')
    axes[mod_idx].set_ylabel('Frequency')
    axes[mod_idx].legend()
    axes[mod_idx].grid(alpha=0.3)

plt.tight_layout()
plt.show()
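
A compact way to summarize how concentrated each sample's attention is, alongside the histograms, is the Shannon entropy of its weight vector: near zero when one modality dominates, and `log(3)` when all three are weighted equally. A minimal sketch, assuming rows of weights sum to 1; `attention_entropy` is a hypothetical helper.

```python
import numpy as np

def attention_entropy(weights, eps=1e-12):
    """Shannon entropy (in nats) of each row of attention weights."""
    w = np.clip(weights, eps, 1.0)  # avoid log(0) for weights that are exactly zero
    return -(w * np.log(w)).sum(axis=1)

uniform = np.full((1, 3), 1 / 3)          # all modalities weighted equally
peaked = np.array([[0.98, 0.01, 0.01]])   # one modality dominates

print("uniform:", attention_entropy(uniform), "max possible:", np.log(3))
print("peaked: ", attention_entropy(peaked))
```

Applied to `attention_weights`, a histogram of these entropies per class would show whether the model commits to one modality or hedges across several.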

## 7. Save Models to Registry

In [None]:
registry = ModelRegistry()

# Save baseline model
registry.save(
    baseline_model,
    name="emotion_baseline",
    version="1.0.0",
    metrics={"accuracy": float(baseline_accuracy)},
    tags=["emotion", "baseline", "multimodal"],
)

# Save attention model
registry.save(
    attention_model,
    name="emotion_attention_fusion",
    version="1.0.0",
    metrics={
        "accuracy": float(attention_accuracy),
        "improvement_over_baseline": float(improvement),
    },
    hyperparameters={
        "fusion_dim": 64,
        "attention_type": "learned",
        "dropout": 0.3,
    },
    tags=["emotion", "attention", "multimodal", "production"],
)

print("✓ Models saved to registry")

# List all multimodal models
multimodal_models = registry.search(tags=["multimodal"])
print(f"\nMulti-modal models in registry: {len(multimodal_models)}")
for m in multimodal_models:
    print(f"  {m.name} v{m.version}: {m.metrics.get('accuracy', 0):.2%}")

## 8. Real-Time Multi-Modal Pipeline

Demonstrate how to run a real-time multi-modal pipeline.

In [None]:
import asyncio

# Create multi-modal pipeline
# Note: In production, replace mock drivers with real hardware
pipeline = MultiModalPipeline(
    drivers=[
        MockDriver(sampling_rate=250.0, channels=8),  # EEG
        MockDriver(sampling_rate=30.0, channels=128), # Video (30 fps)
        MockDriver(sampling_rate=100.0, channels=12), # Motion sensors
    ],
    model=attention_model,
)

print("Running multi-modal pipeline for 3 seconds...")
print("(Using mock drivers for demonstration)\n")

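# Top-level `await` works in Jupyter/IPython. In a plain script, wrap this
# call in a coroutine and run it with asyncio.run(...).
metrics = await pipeline.run(duration=3.0)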

print("\n✓ Multi-modal pipeline complete!")
print(f"  Throughput: {metrics['throughput']:.1f} samples/sec")
print(f"  Mean latency: {metrics['mean_latency']*1000:.2f} ms")
print(f"  Total samples: {metrics['samples']}")
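
The three drivers run at different rates (250 Hz, 30 Hz, 100 Hz), so their samples have to be aligned before they can form one feature vector. One common approach is to take the slowest stream as the clock and pair each of its timestamps with the most recent sample from the faster streams. The sketch below shows that idea with NumPy. It is a generic illustration, not a description of how `MultiModalPipeline` synchronizes streams, and `latest_index` is a hypothetical helper.

```python
import numpy as np

def latest_index(timestamps, query_times):
    """Index of the most recent sample at or before each query time."""
    idx = np.searchsorted(timestamps, query_times, side="right") - 1
    return np.clip(idx, 0, len(timestamps) - 1)

duration = 3.0
t_eeg = np.arange(int(duration * 250)) / 250.0     # 250 Hz
t_video = np.arange(int(duration * 30)) / 30.0     # 30 fps, used as the clock
t_motion = np.arange(int(duration * 100)) / 100.0  # 100 Hz

eeg_idx = latest_index(t_eeg, t_video)
motion_idx = latest_index(t_motion, t_video)
print(len(t_video), "aligned frames")
print("EEG indices for the first frames:   ", eeg_idx[:4])
print("Motion indices for the first frames:", motion_idx[:4])
```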

## 9. Modality Ablation Study

Train the baseline classifier on each modality alone, then compare against the fused models to see how much each modality contributes by itself.

In [None]:
ablation_results = {}

# Test with individual modalities
modality_data = {
    'EEG only': X_eeg,
    'Video only': X_video,
    'Motion only': X_motion,
}

for name, X_modality in modality_data.items():
    # Split data
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_modality, y, test_size=0.2, random_state=42, stratify=y
    )
    
    # Train model
    model = SimpleClassifier()
    model.train(X_tr, y_tr)
    
    # Evaluate
    y_pred = model.predict(X_te)
    acc = accuracy_score(y_te, y_pred)
    ablation_results[name] = acc

# Add full model results
ablation_results['All modalities (baseline)'] = baseline_accuracy
ablation_results['All modalities (attention)'] = attention_accuracy

# Plot results
plt.figure(figsize=(10, 6))
names = list(ablation_results.keys())
accs = list(ablation_results.values())
colors = ['#d62728', '#d62728', '#d62728', '#1f77b4', '#2ca02c']
bars = plt.barh(names, accs, color=colors, alpha=0.7)

# Highlight best result
best_idx = np.argmax(accs)
bars[best_idx].set_color('#2ca02c')
bars[best_idx].set_alpha(1.0)

plt.xlabel('Test Accuracy')
plt.title('Modality Ablation Study')
plt.xlim(0, 1)
plt.grid(axis='x', alpha=0.3)

# Add value labels
for i, (name, acc) in enumerate(zip(names, accs)):
    plt.text(acc + 0.01, i, f"{acc:.2%}", va='center')

plt.tight_layout()
plt.show()

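# Summarize using the measured results rather than assuming which model wins
single = {k: v for k, v in ablation_results.items() if 'only' in k}
best_single_name, best_single_acc = max(single.items(), key=lambda kv: kv[1])
best_name, best_acc = max(ablation_results.items(), key=lambda kv: kv[1])

print("\nKey Findings:")
print(f"- Best single modality: {best_single_name} ({best_single_acc:.2%})")
print(f"- Attention fusion vs. best single modality: {(attention_accuracy - best_single_acc) * 100:+.1f} pp")
print(f"- Best overall: {best_name} ({best_acc:.2%})")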
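
The study above trains on one modality at a time. A complementary view is leave-one-out ablation: remove a single modality from the concatenated features and retrain on the rest. The sketch below only builds the reduced feature matrices. `drop_modality` is a hypothetical helper, and `X_demo` is a placeholder for `X_concat`.

```python
import numpy as np

def drop_modality(X, modality_dims, drop_index):
    """Return X without the columns belonging to one modality."""
    edges = np.concatenate([[0], np.cumsum(modality_dims)])  # block boundaries
    keep = np.ones(X.shape[1], dtype=bool)
    keep[edges[drop_index]:edges[drop_index + 1]] = False
    return X[:, keep]

modality_dims = [40, 128, 12]  # EEG, video, motion
modality_labels = ["EEG", "Video", "Motion"]
X_demo = np.arange(sum(modality_dims), dtype=float)[None, :]  # one placeholder row

reduced_widths = []
for i, label in enumerate(modality_labels):
    reduced = drop_modality(X_demo, modality_dims, i)
    reduced_widths.append(reduced.shape[1])
    print(f"Without {label}: {reduced.shape[1]} features")
```

Each reduced matrix can then go through the same split, train, and evaluate loop used for the single-modality runs.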

## 10. Next Steps

Now that you've worked through multi-modal fusion, here are some directions to explore:

1. **Add more modalities:** Try combining EEG + fNIRS + EOG + EMG
2. **Experiment with attention types:** Try "self" attention for cross-modality interactions
3. **Real-world data:** Apply to actual multi-modal BCI datasets
4. **Online adaptation:** Update attention weights in real time
5. **Explainability:** Use attention weights to understand model decisions

See other notebooks:
- `03_advanced_attention.ipynb` - Cross-modal attention mechanisms
- `04_real_time_fusion.ipynb` - Production deployment strategies
- `05_interpretability.ipynb` - Understanding fusion decisions