# LLM-EEG Framework - Phase 3: Feature Extraction & Classification

This notebook demonstrates the complete Phase 3 implementation for the LLM-EEG framework,
focused on feature extraction and classification for motor imagery EEG signals.

## Overview

**Phase 3 Components:**
- **Feature Extractors**: CSP, Band Power, Time Domain
- **Feature Pipeline**: Modular feature extraction with multiple extractors
- **Classifiers**: LDA, SVM, EEGNet
- **Evaluation**: Cross-validation, metrics, model comparison

**Building on Phase 2:**
- Data loading (BCICIV2aLoader)
- Preprocessing (Bandpass, Notch, Normalization)
- PyTorch datasets

**Performance Targets:**
- Subject-dependent accuracy: >85%
- Subject-independent accuracy: >70%
- Cohen's Kappa: >0.80
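
The subject-dependent accuracy and kappa targets describe the same operating point when the four classes are balanced, because chance agreement is then 1/4. A small sketch of that arithmetic follows; the helper name `kappa_from_accuracy` is illustrative and not part of the framework.

```python
def kappa_from_accuracy(accuracy: float, n_classes: int = 4) -> float:
    """Cohen's kappa assuming the true labels are evenly split across classes."""
    chance = 1.0 / n_classes  # expected agreement p_e under balanced true labels
    return (accuracy - chance) / (1.0 - chance)


for target in (0.85, 0.70):
    print(f"accuracy={target:.2f} -> kappa={kappa_from_accuracy(target):.2f}")
```

Under that assumption, 85% accuracy corresponds to a kappa of 0.80 and 70% accuracy to a kappa of 0.60.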

---

## Step 1: Environment Setup

In [None]:
# Step 1.1: Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Step 1.2: Clone the repository
!git clone https://github.com/erlika/llm-eeg.git
%cd llm-eeg

In [None]:
# Step 1.3: Install dependencies
!pip install -q numpy scipy mne torch scikit-learn matplotlib seaborn pandas

In [None]:
# Step 1.4: Add src to Python path and verify imports
import sys
sys.path.insert(0, '/content/llm-eeg')

# Verify Phase 3 imports
from src.features import (
    CSPExtractor, BandPowerExtractor, TimeDomainExtractor,
    FeatureExtractorFactory, FeatureExtractionPipeline,
    create_csp_extractor, create_motor_imagery_pipeline
)
from src.classifiers import (
    LDAClassifier, SVMClassifier, EEGNetClassifier,
    ClassifierFactory, create_lda_classifier, create_svm_classifier,
    create_eegnet_classifier, list_available_classifiers
)

print("✅ Phase 3 modules imported successfully!")
print(f"\nAvailable classifiers: {list_available_classifiers()}")

## Step 2: Load and Preprocess Data (Phase 2 Review)

In [None]:
# Step 2.1: Configure data paths
import os
import numpy as np

# Update this path to your Google Drive location
DATA_DIR = '/content/drive/MyDrive/BCI_Data/dataset_2a'

# Alternative paths:
# DATA_DIR = '/content/drive/MyDrive/BCI_Competition_IV_2a'

if os.path.exists(DATA_DIR):
    files = os.listdir(DATA_DIR)
    mat_files = [f for f in files if f.endswith('.mat')]
    print(f"✅ Found {len(mat_files)} MAT files in {DATA_DIR}")
else:
    print(f"❌ Directory not found: {DATA_DIR}")
    print("Please update DATA_DIR to your dataset location")

In [None]:
# Step 2.2: Load data using Phase 2 BCICIV2aLoader
from scipy.io import loadmat
from src.core.data_types import EEGData, EventMarker

class BCICIV2aLoader:
    """Data loader for BCI Competition IV-2a dataset."""
    
    def __init__(self, sampling_rate=250, include_eog=False, trial_duration=4.0, trial_offset=0.0):
        self.sampling_rate = sampling_rate
        self.n_eeg_channels = 22
        self.include_eog = include_eog
        self.trial_duration = trial_duration
        self.trial_offset = trial_offset
        self.class_mapping = {1: 'left_hand', 2: 'right_hand', 3: 'feet', 4: 'tongue'}
        self.eeg_channel_names = [
            'Fz', 'FC3', 'FC1', 'FCz', 'FC2', 'FC4',
            'C5', 'C3', 'C1', 'Cz', 'C2', 'C4', 'C6',
            'CP3', 'CP1', 'CPz', 'CP2', 'CP4', 'P1', 'Pz', 'P2', 'POz'
        ]
        
    def load(self, file_path):
        mat_data = loadmat(file_path, struct_as_record=False, squeeze_me=True)
        data_array = mat_data['data']
        
        all_signals = []
        all_events = []
        sample_offset = 0
        
        for run_idx in range(len(data_array)):
            run = data_array[run_idx]
            signals = run.X
            n_samples = signals.shape[0]
            all_signals.append(signals)
            
            if hasattr(run, 'y') and hasattr(run.y, '__len__') and len(run.y) > 0:
                for start, label in zip(run.trial, run.y):
                    event = EventMarker(
                        sample=int(start) + sample_offset,
                        code=768 + int(label),
                        label=self.class_mapping.get(int(label), f'class_{label}')
                    )
                    all_events.append(event)
            sample_offset += n_samples
        
        signals = np.vstack(all_signals).T[:self.n_eeg_channels, :]
        
        return EEGData(
            signals=signals,
            sampling_rate=self.sampling_rate,
            channel_names=self.eeg_channel_names,
            events=all_events
        )
    
    def extract_trials(self, eeg_data, duration=None, offset=None):
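        # Explicit None checks so that a caller-supplied 0.0 is not silently replaced
        duration = self.trial_duration if duration is None else duration
        offset = self.trial_offset if offset is None else offset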
        samples_per_trial = int(duration * self.sampling_rate)
        offset_samples = int(offset * self.sampling_rate)
        
        trials, labels = [], []
        for event in eeg_data.events:
            start = event.sample + offset_samples
            end = start + samples_per_trial
            if start < 0 or end > eeg_data.signals.shape[1]:
                continue
            trials.append(eeg_data.signals[:, start:end])
            labels.append(event.code - 769)  # event codes 769..772 map to class indices 0..3
        
        return np.array(trials), np.array(labels)

print("✅ BCICIV2aLoader ready")

In [None]:
# Step 2.3: Load Subject A01 Training Data
loader = BCICIV2aLoader()

subject_file = os.path.join(DATA_DIR, 'A01T.mat')
eeg_data = loader.load(subject_file)
X, y = loader.extract_trials(eeg_data)

print(f"\n=== Subject A01 Data ===")
print(f"Trials shape: {X.shape}")
print(f"Labels shape: {y.shape}")
print(f"Classes: {np.unique(y)}")
print(f"Samples per class: {[np.sum(y==c) for c in range(4)]}")

In [None]:
# Step 2.4: Preprocess Data
from src.preprocessing import create_standard_pipeline

pipeline = create_standard_pipeline(
    sampling_rate=250,
    notch_freq=50.0,
    low_freq=8.0,
    high_freq=30.0,
    normalize_method='zscore'
)
pipeline.initialize()

X_processed = pipeline.process(X)
print(f"\n=== Preprocessed Data ===")
print(f"Shape: {X_processed.shape}")
print(f"Range: [{X_processed.min():.2f}, {X_processed.max():.2f}]")

In [None]:
# Step 2.5: Split Data
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_processed, y, test_size=0.2, random_state=42, stratify=y
)

print(f"\n=== Data Split ===")
print(f"Train: {X_train.shape}")
print(f"Test: {X_test.shape}")

## Step 3: CSP Feature Extraction

Common Spatial Pattern (CSP) is one of the most widely used spatial filtering techniques for motor imagery EEG classification.

**How CSP works:**
1. Learn spatial filters that maximize the variance of one class while minimizing the variance of the other
2. Project EEG data through these filters
3. Compute log-variance features
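
The three steps above can be sketched for the two-class case with NumPy and SciPy. This is a minimal illustration, not the `CSPExtractor` implementation: the function names, the trace normalization, and the synthetic data are assumptions, and the 4-class data in this notebook would additionally need a multi-class strategy such as one-vs-rest.

```python
import numpy as np
from scipy.linalg import eigh


def fit_csp_binary(X_a, X_b, n_components=4):
    """Learn CSP filters from two classes of trials shaped (trials, channels, samples)."""
    def mean_cov(X):
        # Trace-normalized spatial covariance, averaged over trials
        covs = [(t @ t.T) / np.trace(t @ t.T) for t in X]
        return np.mean(covs, axis=0)

    C_a, C_b = mean_cov(X_a), mean_cov(X_b)
    # Generalized eigenproblem C_a w = lambda (C_a + C_b) w; eigenvalues are ascending
    eigvals, eigvecs = eigh(C_a, C_a + C_b)
    half = n_components // 2
    # Smallest eigenvalues favor class B variance, largest favor class A variance
    picks = np.r_[np.arange(half), np.arange(len(eigvals) - half, len(eigvals))]
    return eigvecs[:, picks].T  # (n_components, n_channels)


def csp_log_variance(W, X):
    """Project trials through the filters and return normalized log-variance features."""
    projected = np.einsum('fc,tcs->tfs', W, X)
    var = projected.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))


# Synthetic check: class A is strong on channel 0, class B on channel 1
rng = np.random.default_rng(0)
X_a = rng.standard_normal((20, 8, 500))
X_a[:, 0] *= 3.0
X_b = rng.standard_normal((20, 8, 500))
X_b[:, 1] *= 3.0

W = fit_csp_binary(X_a, X_b)
feats_a, feats_b = csp_log_variance(W, X_a), csp_log_variance(W, X_b)
print(W.shape, feats_a.shape)
```

Filters are taken from both ends of the eigenvalue spectrum because those are the directions in which the variance ratio between the two classes is most extreme.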

In [None]:
# Step 3.1: Create and fit CSP extractor
from src.features import CSPExtractor, create_csp_extractor

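# Create CSP with 6 components. Classic CSP is a two-class method; how the filters are
# shared across the 4 classes here depends on CSPExtractor's multi-class strategy.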
csp = create_csp_extractor(n_components=6, sampling_rate=250)

# Fit and extract features
X_train_csp = csp.fit_extract(X_train, y_train)
X_test_csp = csp.extract(X_test)

print(f"\n=== CSP Feature Extraction ===")
print(f"Original shape: {X_train.shape}")
print(f"CSP features shape: {X_train_csp.shape}")
print(f"Feature names: {csp.get_feature_names()[:6]}")
print(f"\nFilters shape: {csp.get_spatial_filters().shape}")
print(f"Patterns shape: {csp.get_spatial_patterns().shape}")

In [None]:
# Step 3.2: Visualize CSP Spatial Patterns
import matplotlib.pyplot as plt
import numpy as np

patterns = csp.get_spatial_patterns()
n_patterns = min(6, patterns.shape[0])

fig, axes = plt.subplots(2, 3, figsize=(12, 8))
axes = axes.flatten()

for i in range(n_patterns):
    ax = axes[i]
    pattern = patterns[i]
    
    # Simple bar plot of channel weights
    ax.bar(range(len(pattern)), pattern)
    ax.set_title(f'CSP Pattern {i+1}')
    ax.set_xlabel('Channel')
    ax.set_ylabel('Weight')
    ax.axhline(y=0, color='k', linestyle='-', linewidth=0.5)

plt.tight_layout()
plt.suptitle('CSP Spatial Patterns', y=1.02, fontsize=14)
plt.show()

In [None]:
# Step 3.3: CSP Feature Visualization
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Feature distribution by class
ax1 = axes[0]
for class_idx in range(4):
    class_mask = y_train == class_idx
    ax1.scatter(
        X_train_csp[class_mask, 0], 
        X_train_csp[class_mask, 1],
        label=f'Class {class_idx}',
        alpha=0.6
    )
ax1.set_xlabel('CSP Feature 1')
ax1.set_ylabel('CSP Feature 2')
ax1.set_title('CSP Features: First 2 Components')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Box plot of features per class
ax2 = axes[1]
feature_data = [X_train_csp[y_train == c, 0] for c in range(4)]
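# Set tick labels separately: boxplot's `labels=` was renamed `tick_labels` in Matplotlib 3.9
ax2.boxplot(feature_data)
ax2.set_xticklabels(['Left Hand', 'Right Hand', 'Feet', 'Tongue'])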
ax2.set_xlabel('Class')
ax2.set_ylabel('CSP Feature 1')
ax2.set_title('CSP Feature 1 Distribution by Class')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Step 4: Band Power Feature Extraction

Band power features measure the spectral power in frequency bands relevant to motor imagery:
- **Mu (8-12 Hz)**: Sensorimotor rhythm
- **Beta (12-30 Hz)**: Motor planning and execution
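
As a rough sketch of what a band power feature is, the snippet below estimates the power spectral density with Welch's method and averages it within each band. It is not the `BandPowerExtractor` implementation: the function name, the one-second segment length, the use of the mean PSD per band, and the channel-major feature ordering are all assumptions made for illustration.

```python
import numpy as np
from scipy.signal import welch


def log_band_power(trials, bands, sampling_rate=250):
    """Return log band power, shape (n_trials, n_channels * n_bands), channel-major."""
    # One-second Welch segments give 1 Hz frequency resolution
    freqs, psd = welch(trials, fs=sampling_rate, nperseg=sampling_rate, axis=-1)
    per_band = []
    for low, high in bands.values():
        in_band = (freqs >= low) & (freqs < high)
        per_band.append(np.log(psd[..., in_band].mean(axis=-1)))
    # Stack to (trials, channels, bands), then flatten channels and bands together
    return np.stack(per_band, axis=-1).reshape(len(trials), -1)


# Synthetic check: a 10 Hz rhythm plus weak noise should dominate the mu band
rng = np.random.default_rng(0)
t = np.arange(1000) / 250.0
trials = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal((5, 2, 1000))
bands = {'mu': (8, 12), 'beta_low': (12, 20), 'beta_high': (20, 30)}

features = log_band_power(trials, bands)
print(features.shape)
```

With two channels and three bands, each trial yields six features; the mu entries are the largest here because the synthetic signal oscillates at 10 Hz.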

In [None]:
# Step 4.1: Band Power Extraction
from src.features import BandPowerExtractor, create_band_power_extractor

# Create band power extractor for motor imagery bands
bands = {
    'mu': (8, 12),
    'beta_low': (12, 20),
    'beta_high': (20, 30)
}

bp = create_band_power_extractor(
    bands=bands,
    sampling_rate=250,
    average_channels=False,
    log=True
)

# Extract features
X_train_bp = bp.extract(X_train)
X_test_bp = bp.extract(X_test)

print(f"\n=== Band Power Features ===")
print(f"Feature shape: {X_train_bp.shape}")
print(f"Features per trial: {X_train_bp.shape[1]} (22 channels x 3 bands)")

In [None]:
# Step 4.2: Visualize Band Power
import matplotlib.pyplot as plt

# Average band power across trials per class
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
class_names = ['Left Hand', 'Right Hand', 'Feet', 'Tongue']

for class_idx, ax in enumerate(axes.flatten()):
    class_mask = y_train == class_idx
    class_bp = X_train_bp[class_mask].mean(axis=0)
    
    # Reshape to (channels, bands)
    n_channels = 22
    n_bands = 3
    bp_reshaped = class_bp.reshape(n_channels, n_bands)
    
    im = ax.imshow(bp_reshaped.T, aspect='auto', cmap='RdBu_r')
    ax.set_xlabel('Channel')
    ax.set_ylabel('Frequency Band')
    ax.set_yticks([0, 1, 2])
    ax.set_yticklabels(['Mu', 'Beta Low', 'Beta High'])
    ax.set_title(f'{class_names[class_idx]}')
    plt.colorbar(im, ax=ax)

plt.suptitle('Average Band Power by Class', y=1.02, fontsize=14)
plt.tight_layout()
plt.show()

## Step 4.3: Time Domain Feature Extraction

Time domain features capture statistical and temporal properties of EEG signals:
- **Statistical**: Mean, variance, skewness, kurtosis, RMS
- **Hjorth Parameters**: Activity, mobility, complexity
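
The Hjorth parameters can be computed from the variances of a signal and its first two finite differences. The sketch below uses the standard definitions; the function name `hjorth_parameters` is illustrative, and `TimeDomainExtractor` may differ in details such as how the derivative is scaled.

```python
import numpy as np


def hjorth_parameters(x):
    """Hjorth activity, mobility and complexity along the last (time) axis."""
    dx = np.diff(x, axis=-1)    # first derivative (finite difference)
    ddx = np.diff(dx, axis=-1)  # second derivative
    activity = np.var(x, axis=-1)
    mobility = np.sqrt(np.var(dx, axis=-1) / activity)
    complexity = np.sqrt(np.var(ddx, axis=-1) / np.var(dx, axis=-1)) / mobility
    return activity, mobility, complexity


rng = np.random.default_rng(0)
t = np.arange(1000) / 250.0
sine = np.sin(2 * np.pi * 10 * t)
noise = rng.standard_normal(1000)

_, _, complexity_sine = hjorth_parameters(sine)
_, _, complexity_noise = hjorth_parameters(noise)
print(f"sine: {complexity_sine:.2f}, noise: {complexity_noise:.2f}")
```

Complexity is close to 1 for a pure sine and larger for white noise, which is why it serves as a measure of how far a signal departs from a single oscillation.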

In [None]:
# Step 4.3: Time Domain Features
from src.features import TimeDomainExtractor, create_time_domain_extractor

# Create time domain extractor with Hjorth parameters
td = create_time_domain_extractor(
    features=['mean', 'variance', 'rms', 'hjorth_activity', 'hjorth_mobility', 'hjorth_complexity'],
    sampling_rate=250
)

# Extract features
X_train_td = td.extract(X_train)
X_test_td = td.extract(X_test)

print(f"\n=== Time Domain Features ===")
print(f"Feature shape: {X_train_td.shape}")
print(f"Features: 22 channels × 6 features = 132")

## Step 5: Feature Pipeline

Combine multiple feature extractors into a single pipeline for comprehensive feature representation.

In [None]:
# Step 5.1: Create Feature Pipeline
from src.features import FeatureExtractionPipeline, create_motor_imagery_pipeline

# Create motor imagery optimized pipeline
# (named feature_pipeline so it does not overwrite the preprocessing `pipeline` from Step 2.4)
feature_pipeline = create_motor_imagery_pipeline(
    n_csp_components=6,
    sampling_rate=250
)

# Fit and extract
X_train_features = feature_pipeline.fit_extract(X_train, y_train)
X_test_features = feature_pipeline.extract(X_test)

print(f"\n=== Feature Pipeline ===")
print(f"Combined features shape: {X_train_features.shape}")
print(feature_pipeline.summary())

## Step 5.2: Feature Pipeline - Modular Design & Usage Scenarios

The Feature Pipeline supports multiple modes for combining extractors:

| Scenario | Description | Example |
|----------|-------------|---------|
| Single Extractor | Only CSP | `CSPExtractor(n_components=6)` |
| Multiple Extractors | CSP + Band Power | Concatenate features |
| Sequential | Filter → CSP | Process in sequence |
| Parallel | CSP ∥ Band Power | Extract simultaneously, then combine |
| Conditional | Different extractor based on SNR | For APA Agent (Phase 4) |

### Pipeline Modes
- `'concatenate'`: Combine features side by side (default)
- `'sequential'`: Output of one extractor feeds into next
- `'parallel'`: Extract in parallel, then merge
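
To make the `'concatenate'` mode concrete, here is a toy stand-in that applies each extractor to the same trials and stacks the resulting feature matrices column-wise. All three classes are hypothetical and exist only for this illustration; the real `FeatureExtractionPipeline` may handle fitting, naming, and validation differently.

```python
import numpy as np


class LogVarianceExtractor:
    """Toy extractor: per-channel log-variance."""
    def fit(self, X, y=None):
        return self

    def extract(self, X):
        return np.log(X.var(axis=2))


class PeakToPeakExtractor:
    """Toy extractor: per-channel peak-to-peak amplitude."""
    def fit(self, X, y=None):
        return self

    def extract(self, X):
        return X.max(axis=2) - X.min(axis=2)


class ConcatPipeline:
    """Hypothetical stand-in illustrating 'concatenate' mode only."""
    def __init__(self):
        self.extractors = []

    def add_extractor(self, extractor, name):
        self.extractors.append((name, extractor))
        return self

    def fit_extract(self, X, y=None):
        for _, extractor in self.extractors:
            extractor.fit(X, y)
        return self.extract(X)

    def extract(self, X):
        # Each extractor returns (n_trials, n_features_i); stack them side by side
        return np.hstack([extractor.extract(X) for _, extractor in self.extractors])


rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3, 200))
toy = ConcatPipeline().add_extractor(LogVarianceExtractor(), 'logvar')
toy.add_extractor(PeakToPeakExtractor(), 'ptp')
combined = toy.fit_extract(X)
print(combined.shape)
```

Three channels and two extractors give six columns per trial, with the first extractor's features occupying the leading columns.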

In [None]:
# Step 5.2: Feature Pipeline Usage Scenarios
from src.features import FeatureExtractionPipeline, FeatureExtractorFactory

# ============================================
# Scenario 1: Manual Pipeline Construction
# ============================================
pipeline_manual = FeatureExtractionPipeline(mode='concatenate')
pipeline_manual.add_extractor(
    CSPExtractor(n_components=6, sampling_rate=250), 
    name='csp'
)
pipeline_manual.add_extractor(
    BandPowerExtractor(
        bands={'mu': (8, 12), 'beta': (12, 30)},
        sampling_rate=250
    ), 
    name='band_power'
)

# Fit and extract
X_train_combined = pipeline_manual.fit_extract(X_train, y_train)
X_test_combined = pipeline_manual.extract(X_test)

print("=== Scenario 1: Manual Pipeline ===")
print(f"CSP features: 6")
print(f"Band Power features: 22 channels × 2 bands = 44")
print(f"Combined shape: {X_train_combined.shape}")

# ============================================
# Scenario 2: Factory-based Creation
# ============================================
csp_from_factory = FeatureExtractorFactory.create('csp', n_components=4)
bp_from_factory = FeatureExtractorFactory.create('band_power', 
    bands={'mu': (8, 12)}, 
    sampling_rate=250
)

print("\n=== Scenario 2: Factory Creation ===")
print(f"Available extractors: {FeatureExtractorFactory.list_available()}")

# ============================================
# Scenario 3: Config-driven Pipeline (for Agents)
# ============================================
pipeline_config = {
    'extractors': [
        {'type': 'csp', 'params': {'n_components': 6}},
        {'type': 'band_power', 'params': {'bands': {'mu': (8, 12), 'beta': (12, 30)}}}
    ],
    'mode': 'concatenate'
}

pipeline_from_config = FeatureExtractionPipeline.from_config(pipeline_config)
print("\n=== Scenario 3: Config-driven ===")
print(f"Pipeline created from config: {pipeline_from_config}")

# ============================================
# Scenario 4: Adaptive Pipeline (Phase 4 Preview)
# ============================================
def select_pipeline_by_quality(signal_quality):
    """
    Select feature pipeline based on signal quality.
    This is a preview of Phase 4 APA integration.
    
    Args:
        signal_quality: dict with 'snr' key
        
    Returns:
        Configured FeatureExtractionPipeline
    """
    if signal_quality.get('snr', 0) > 10:
        # High quality: use more components
        return create_motor_imagery_pipeline(n_csp_components=8, sampling_rate=250)
    else:
        # Low quality: fewer parameters, more robust
        return create_motor_imagery_pipeline(n_csp_components=4, sampling_rate=250)

print("\n=== Scenario 4: Adaptive Pipeline (Phase 4 Preview) ===")
print("Pipeline selection based on signal quality defined")
print("- High SNR (>10): 8 CSP components")
print("- Low SNR (≤10): 4 CSP components")

## Step 6: Classification with LDA

Linear Discriminant Analysis (LDA) is a classic and effective classifier for CSP features.

In [None]:
# Step 6.1: Train LDA Classifier
from src.classifiers import create_lda_classifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Create and train LDA on CSP features
lda = create_lda_classifier(n_classes=4)
lda.fit(X_train_csp, y_train)

# Predict
y_pred_lda = lda.predict(X_test_csp)
y_prob_lda = lda.predict_proba(X_test_csp)

# Evaluate
acc_lda = accuracy_score(y_test, y_pred_lda)

print(f"\n=== CSP + LDA Results ===")
print(f"Accuracy: {acc_lda:.4f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_lda, target_names=['Left', 'Right', 'Feet', 'Tongue']))

In [None]:
# Step 6.2: Confusion Matrix
import seaborn as sns

cm = confusion_matrix(y_test, y_pred_lda)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Left', 'Right', 'Feet', 'Tongue'],
            yticklabels=['Left', 'Right', 'Feet', 'Tongue'])
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title(f'CSP + LDA Confusion Matrix (Accuracy: {acc_lda:.2%})')
plt.show()

## Step 7: Classification with SVM

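A Support Vector Machine (SVM) with an RBF kernel can model non-linear decision boundaries that LDA cannot, which sometimes yields higher accuracy. On low-dimensional CSP features the two are often comparable, so both should be compared on held-out data.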
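
One way to check whether the extra flexibility pays off is to cross-validate both classifiers on the same features. The sketch below uses scikit-learn estimators directly rather than the framework's wrappers, and the feature matrix is synthetic, so the printed scores say nothing about real EEG; `X_demo` and `y_demo` are placeholder names.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for CSP features: 4 classes, 6 features, shifted class means
rng = np.random.default_rng(0)
y_demo = np.repeat(np.arange(4), 50)
X_demo = rng.standard_normal((200, 6)) + y_demo[:, None] * 0.8

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = {
    'LDA': LinearDiscriminantAnalysis(),
    'SVM (RBF)': SVC(kernel='rbf', C=1.0, gamma='scale'),
}
cv_scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=cv)
    for name, model in models.items()
}
for name, scores in cv_scores.items():
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Using the same `StratifiedKFold` object for both models keeps the folds identical, so the comparison is not confounded by different splits.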

In [None]:
# Step 7.1: Train SVM Classifier
from src.classifiers import create_svm_classifier

# Create and train SVM with RBF kernel
svm = create_svm_classifier(kernel='rbf', C=1.0, gamma='scale', n_classes=4)
svm.fit(X_train_csp, y_train)

# Predict
y_pred_svm = svm.predict(X_test_csp)
acc_svm = accuracy_score(y_test, y_pred_svm)

print(f"\n=== CSP + SVM Results ===")
print(f"Accuracy: {acc_svm:.4f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_svm, target_names=['Left', 'Right', 'Feet', 'Tongue']))
print(f"\nSupport Vectors per class: {svm.n_support_}")

## Step 8: EEGNet Deep Learning Classifier

EEGNet is a compact CNN designed specifically for EEG classification. It can learn directly from raw/preprocessed EEG without manual feature extraction.

In [None]:
# Step 8.1: Create and Train EEGNet
import torch
from src.classifiers import create_eegnet_classifier

# Check device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

# Create EEGNet
eegnet = create_eegnet_classifier(
    n_classes=4,
    n_channels=22,
    n_samples=1000,  # 4 seconds at 250 Hz
    F1=8,
    D=2,
    dropout_rate=0.5,
    learning_rate=0.001,
    device=device
)

print(f"\n=== EEGNet Model ===")
print(f"Parameters: {eegnet.count_parameters()}")

In [None]:
# Step 8.2: Train EEGNet
# Split training data for validation
X_train_dl, X_val_dl, y_train_dl, y_val_dl = train_test_split(
    X_train, y_train, test_size=0.15, random_state=42, stratify=y_train
)

# Train
print("\n=== Training EEGNet ===")
eegnet.fit(
    X_train_dl.astype(np.float32), 
    y_train_dl,
    validation_data=(X_val_dl.astype(np.float32), y_val_dl),
    epochs=50,
    batch_size=32,
    verbose=1
)

In [None]:
# Step 8.3: Evaluate EEGNet
y_pred_eegnet = eegnet.predict(X_test.astype(np.float32))
acc_eegnet = accuracy_score(y_test, y_pred_eegnet)

print(f"\n=== EEGNet Results ===")
print(f"Accuracy: {acc_eegnet:.4f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_eegnet, target_names=['Left', 'Right', 'Feet', 'Tongue']))

In [None]:
# Step 8.4: Plot Training History
history = eegnet.get_training_history()

if history:
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Loss
    ax1 = axes[0]
    ax1.plot(history['train_loss'], label='Train Loss')
    ax1.plot(history['val_loss'], label='Val Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.set_title('EEGNet Training Loss')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Accuracy
    ax2 = axes[1]
    ax2.plot(history['train_accuracy'], label='Train Acc')
    ax2.plot(history['val_accuracy'], label='Val Acc')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.set_title('EEGNet Training Accuracy')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

## Step 9: Model Comparison

In [None]:
# Step 9.1: Compare All Models
from sklearn.metrics import cohen_kappa_score

results = {
    'Model': ['CSP + LDA', 'CSP + SVM', 'EEGNet'],
    'Accuracy': [acc_lda, acc_svm, acc_eegnet],
    'Kappa': [
        cohen_kappa_score(y_test, y_pred_lda),
        cohen_kappa_score(y_test, y_pred_svm),
        cohen_kappa_score(y_test, y_pred_eegnet)
    ]
}

import pandas as pd
results_df = pd.DataFrame(results)
results_df = results_df.sort_values('Accuracy', ascending=False)

print("\n=== Model Comparison ===")
print(results_df.to_string(index=False))

# Visualization
fig, ax = plt.subplots(figsize=(10, 5))
x = np.arange(len(results['Model']))
width = 0.35

bars1 = ax.bar(x - width/2, results['Accuracy'], width, label='Accuracy', color='steelblue')
bars2 = ax.bar(x + width/2, results['Kappa'], width, label='Kappa', color='darkorange')

ax.set_ylabel('Score')
ax.set_title('Model Comparison: Accuracy & Kappa')
ax.set_xticks(x)
ax.set_xticklabels(results['Model'])
# Draw the target line before building the legend so its label is included
ax.axhline(y=0.85, color='green', linestyle='--', alpha=0.5, label='Target (85%)')
ax.legend()
ax.set_ylim([0, 1])
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

## Step 10: Cross-Subject Evaluation (LOSO)
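
Leave-one-subject-out (LOSO) evaluation holds out all trials of one subject for testing and trains on the remaining subjects, repeating this once per subject. A minimal sketch of the index bookkeeping follows, assuming all subjects' trials have already been loaded and stacked into one array with a parallel array of subject IDs; `loso_splits` is an illustrative helper, not a framework function.

```python
import numpy as np


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for leave-one-subject-out CV."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        is_test = subject_ids == subject
        yield subject, np.flatnonzero(~is_test), np.flatnonzero(is_test)


# Three hypothetical subjects with two trials each
subject_ids = ['A01', 'A01', 'A02', 'A02', 'A03', 'A03']
folds = list(loso_splits(subject_ids))
for subject, train_idx, test_idx in folds:
    print(subject, train_idx.tolist(), test_idx.tolist())
```

Loading each subject once and indexing per fold avoids re-reading every file for each held-out subject.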

In [None]:
# Step 10.1: Leave-One-Subject-Out Cross-Validation
def run_loso_cv(data_dir, subjects, use_csp_lda=True):
    """
    Run Leave-One-Subject-Out cross-validation.
    
    Args:
        data_dir: Path to dataset
        subjects: List of subject IDs
        use_csp_lda: If True, use CSP+LDA; else use EEGNet
        
    Returns:
        Dict with results per subject and aggregate metrics
    """
    loader = BCICIV2aLoader()
    preproc = create_standard_pipeline(sampling_rate=250, notch_freq=50.0, low_freq=8.0, high_freq=30.0)
    preproc.initialize()
    
    results = []
    
    for test_subject in subjects:
        print(f"\nTest Subject: {test_subject}")
        
        # Collect train and test data
        X_train_all, y_train_all = [], []
        X_test_sub, y_test_sub = None, None
        
        for subject in subjects:
            file_path = os.path.join(data_dir, f"{subject}T.mat")
            if not os.path.exists(file_path):
                continue
            
            eeg_data = loader.load(file_path)
            X, y = loader.extract_trials(eeg_data)
            X = preproc.process(X)
            
            if subject == test_subject:
                X_test_sub, y_test_sub = X, y
            else:
                X_train_all.append(X)
                y_train_all.append(y)
        
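        # Skip the fold if the held-out subject's file or all training files are missing
        if X_test_sub is None or not X_train_all:
            print(f"  Skipping {test_subject}: data file(s) not found")
            continue

        X_train_all = np.concatenate(X_train_all, axis=0)
        y_train_all = np.concatenate(y_train_all, axis=0)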
        
        # Train and evaluate
        if use_csp_lda:
            csp = create_csp_extractor(n_components=6, sampling_rate=250)
            X_train_feat = csp.fit_extract(X_train_all, y_train_all)
            X_test_feat = csp.extract(X_test_sub)
            
            clf = create_lda_classifier(n_classes=4)
            clf.fit(X_train_feat, y_train_all)
            y_pred = clf.predict(X_test_feat)
        else:
            clf = create_eegnet_classifier(
                n_classes=4, n_channels=22, n_samples=1000, device='cpu'
            )
            clf.fit(X_train_all.astype(np.float32), y_train_all, epochs=30, verbose=0)
            y_pred = clf.predict(X_test_sub.astype(np.float32))
        
        acc = accuracy_score(y_test_sub, y_pred)
        kappa = cohen_kappa_score(y_test_sub, y_pred)
        
        results.append({
            'subject': test_subject,
            'accuracy': acc,
            'kappa': kappa
        })
        print(f"  Accuracy: {acc:.4f}, Kappa: {kappa:.4f}")
    
    # Aggregate
    accs = [r['accuracy'] for r in results]
    kappas = [r['kappa'] for r in results]
    
    return {
        'results': results,
        'mean_accuracy': np.mean(accs),
        'std_accuracy': np.std(accs),
        'mean_kappa': np.mean(kappas),
        'std_kappa': np.std(kappas)
    }

print("✅ LOSO CV function defined")

In [None]:
# Step 10.2: Run LOSO Cross-Validation
# Uncomment to run (takes several minutes)
# subjects = ['A01', 'A02', 'A03', 'A04', 'A05', 'A06', 'A07', 'A08', 'A09']
# loso_results = run_loso_cv(DATA_DIR, subjects, use_csp_lda=True)
# print(f"\n=== LOSO Results (CSP + LDA) ===")
# print(f"Mean Accuracy: {loso_results['mean_accuracy']:.4f} ± {loso_results['std_accuracy']:.4f}")
# print(f"Mean Kappa: {loso_results['mean_kappa']:.4f} ± {loso_results['std_kappa']:.4f}")

## Step 11: Results Export & Sharing System

The results export system supports:
1. **AI Assistant Sharing**: Copy-paste ready markdown format
2. **Experiment Tracking**: Log and compare experiments
3. **Drive Export**: Save to Google Drive for persistence

In [None]:
# Step 11.1: ResultsExporter Class
from datetime import datetime
import json
import os
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix, accuracy_score, cohen_kappa_score

class ResultsExporter:
    """
    Export results in shareable formats for AI assistant collaboration.
    
    Features:
    - Markdown format for copy-paste sharing
    - JSON format for programmatic parsing
    - CSV export for Drive storage
    - Confusion matrix visualization
    
    Usage:
        exporter = ResultsExporter('phase3_experiment_1')
        exporter.log_config({'csp_components': 6, 'classifier': 'lda'})
        exporter.log_metric('accuracy', 0.85, subset='test')
        exporter.log_confusion_matrix(y_true, y_pred, ['Left', 'Right', 'Feet', 'Tongue'])
        
        # Share with AI
        exporter.print_share_ready()
        
        # Save to Drive
        exporter.save_to_drive('/content/drive/MyDrive/llm-eeg/experiments')
    """
    
    def __init__(self, experiment_name: str):
        self.experiment_name = experiment_name
        self.timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        self.date_str = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        self.config = {}
        self.metrics = {'train': {}, 'val': {}, 'test': {}}
        self.confusion_matrices = {}
        self.observations = []
        
    def log_config(self, config: dict):
        """Log experiment configuration."""
        self.config.update(config)
        
    def log_metric(self, name: str, value: float, subset: str = 'test'):
        """Log a metric value for train/val/test subset."""
        if subset not in self.metrics:
            self.metrics[subset] = {}
        # Store a plain float so to_json() also works for NumPy scalars such as np.float32
        self.metrics[subset][name] = float(value)
        
    def log_confusion_matrix(self, y_true, y_pred, class_names: list):
        """Log confusion matrix."""
        cm = confusion_matrix(y_true, y_pred)
        self.confusion_matrices['test'] = {
            'matrix': cm.tolist(),
            'class_names': class_names
        }
        
    def add_observation(self, observation: str):
        """Add an observation note."""
        self.observations.append(observation)
        
    def get_metric(self, name: str, subset: str = 'test'):
        """Get a logged metric value."""
        return self.metrics.get(subset, {}).get(name, None)
    
    def to_markdown(self) -> str:
        """
        Generate markdown format for AI assistant sharing.
        
        Returns:
            Markdown string ready for copy-paste
        """
        md = []
        md.append(f"## Experiment: {self.experiment_name}")
        md.append(f"**Date:** {self.date_str}")
        
        # Config
        if self.config:
            config_str = ', '.join([f"{k}={v}" for k, v in self.config.items()])
            md.append(f"**Config:** {config_str}")
        
        md.append("")
        md.append("### Results")
        
        # Results table
        if any(self.metrics.values()):
            md.append("| Metric | Train | Val | Test |")
            md.append("|--------|-------|-----|------|")
            
            all_metrics = set()
            for subset_metrics in self.metrics.values():
                all_metrics.update(subset_metrics.keys())
            
            for metric in sorted(all_metrics):
                train_val = self.metrics['train'].get(metric, '-')
                val_val = self.metrics['val'].get(metric, '-')
                test_val = self.metrics['test'].get(metric, '-')
                
                if isinstance(train_val, float):
                    train_val = f"{train_val:.4f}"
                if isinstance(val_val, float):
                    val_val = f"{val_val:.4f}"
                if isinstance(test_val, float):
                    test_val = f"{test_val:.4f}"
                    
                md.append(f"| {metric} | {train_val} | {val_val} | {test_val} |")
        
        # Confusion Matrix
        if 'test' in self.confusion_matrices:
            cm_data = self.confusion_matrices['test']
            cm = cm_data['matrix']
            names = cm_data['class_names']
            
            md.append("")
            md.append("### Confusion Matrix (Test)")
            header = "| | " + " | ".join(names) + " |"
            md.append(header)
            # Three dashes per column, including the leading row-label column
            md.append("|---" * (len(names) + 1) + "|")
            
            for i, row in enumerate(cm):
                row_str = f"| **{names[i]}** | " + " | ".join(map(str, row)) + " |"
                md.append(row_str)
        
        # Observations
        if self.observations:
            md.append("")
            md.append("### Observations")
            for obs in self.observations:
                md.append(f"- {obs}")
        
        return "\n".join(md)
    
    def to_json(self) -> str:
        """Generate JSON format for programmatic parsing."""
        data = {
            'experiment_name': self.experiment_name,
            'timestamp': self.timestamp,
            'date': self.date_str,
            'config': self.config,
            'metrics': self.metrics,
            'confusion_matrices': self.confusion_matrices,
            'observations': self.observations
        }
        return json.dumps(data, indent=2)
    
    def to_csv(self, path: str):
        """Save metrics to CSV."""
        rows = []
        for subset, metrics in self.metrics.items():
            for metric, value in metrics.items():
                rows.append({
                    'experiment': self.experiment_name,
                    'subset': subset,
                    'metric': metric,
                    'value': value
                })
        df = pd.DataFrame(rows)
        df.to_csv(path, index=False)
        print(f"Saved to {path}")
    
    def save_to_drive(self, drive_path: str):
        """Save all results to Google Drive."""
        exp_dir = os.path.join(drive_path, f"exp_{self.timestamp}")
        os.makedirs(exp_dir, exist_ok=True)
        
        # Save JSON
        with open(os.path.join(exp_dir, 'results.json'), 'w') as f:
            f.write(self.to_json())
        
        # Save CSV
        self.to_csv(os.path.join(exp_dir, 'metrics.csv'))
        
        # Save Markdown
        with open(os.path.join(exp_dir, 'summary.md'), 'w') as f:
            f.write(self.to_markdown())
        
        print(f"\n✅ Results saved to: {exp_dir}")
        return exp_dir
    
    def print_share_ready(self):
        """Print copy-paste ready format for AI assistant."""
        print("=" * 60)
        print("📋 COPY THIS TO SHARE WITH AI ASSISTANT:")
        print("=" * 60)
        print(self.to_markdown())
        print("=" * 60)


# Demo usage
print("✅ ResultsExporter class defined")
print("\nExample usage:")
print('  exporter = ResultsExporter("phase3_csp_lda")')
print('  exporter.log_config({"csp_components": 6})')
print('  exporter.log_metric("accuracy", 0.85)')
print('  exporter.print_share_ready()')

In [None]:
# Step 11.2: ExperimentTracker Class
class ExperimentTracker:
    """
    Track and compare multiple experiments.
    
    Usage:
        tracker = ExperimentTracker('/content/drive/MyDrive/llm-eeg/experiments')
        exp_id = tracker.new_experiment('csp_lda_test', {'csp': 6, 'clf': 'lda'})
        tracker.log_results(exp_id, {'accuracy': 0.85, 'kappa': 0.80})
        tracker.compare_experiments(['exp1', 'exp2'])
    """
    
    def __init__(self, save_dir: str = '/content/drive/MyDrive/llm-eeg/experiments'):
        self.save_dir = save_dir
        self.experiments = {}
        self._load_existing()
        
    def _load_existing(self):
        """Load existing experiment log if available."""
        log_path = os.path.join(self.save_dir, 'experiment_log.json')
        if os.path.exists(log_path):
            with open(log_path, 'r') as f:
                self.experiments = json.load(f)
    
    def _save_log(self):
        """Save experiment log."""
        os.makedirs(self.save_dir, exist_ok=True)
        log_path = os.path.join(self.save_dir, 'experiment_log.json')
        with open(log_path, 'w') as f:
            json.dump(self.experiments, f, indent=2)
    
    def new_experiment(self, name: str, config: dict) -> str:
        """
        Start a new experiment.
        
        Args:
            name: Experiment name
            config: Configuration dict
            
        Returns:
            experiment_id: Unique ID for this experiment
        """
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        exp_id = f"{name}_{timestamp}"
        
        self.experiments[exp_id] = {
            'name': name,
            'config': config,
            'timestamp': timestamp,
            'date': datetime.now().isoformat(),
            'results': {}
        }
        self._save_log()
        return exp_id
    
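    def log_results(self, experiment_id: str, results: dict):
        """Log results for an experiment created with new_experiment()."""
        if experiment_id not in self.experiments:
            # Fail loudly instead of silently dropping the results
            raise KeyError(f"Unknown experiment_id: {experiment_id}")
        self.experiments[experiment_id]['results'].update(results)
        self._save_log()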
    
    def compare_experiments(self, experiment_ids: list = None) -> pd.DataFrame:
        """
        Compare multiple experiments.
        
        Args:
            experiment_ids: List of experiment IDs to compare. 
                          If None, compare all.
        
        Returns:
            DataFrame with comparison
        """
        if experiment_ids is None:
            experiment_ids = list(self.experiments.keys())
        
        rows = []
        for exp_id in experiment_ids:
            if exp_id in self.experiments:
                exp = self.experiments[exp_id]
                row = {
                    'experiment_id': exp_id,
                    'name': exp['name'],
                    'date': exp.get('date', 'N/A')
                }
                row.update(exp.get('config', {}))
                row.update(exp.get('results', {}))
                rows.append(row)
        
        df = pd.DataFrame(rows)
        return df
    
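    def get_best_experiment(self, metric: str = 'accuracy') -> dict:
        """
        Find the best experiment by a metric (higher is better).
        
        Returns None if no experiment has logged a numeric value for `metric`.
        """
        best_id = None
        best_value = -float('inf')
        
        for exp_id, exp in self.experiments.items():
            value = exp.get('results', {}).get(metric)
            # Skip experiments that never logged this metric or logged a non-number
            if not isinstance(value, (int, float)):
                continue
            if value > best_value:
                best_value = value
                best_id = exp_id
        
        if best_id is not None:
            return {'id': best_id, 'value': best_value, **self.experiments[best_id]}
        return None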
    
    def export_all(self) -> str:
        """Export all experiments as markdown."""
        md = ["# Experiment History\n"]
        
        for exp_id, exp in self.experiments.items():
            md.append(f"## {exp['name']}")
            md.append(f"- **ID:** {exp_id}")
            md.append(f"- **Date:** {exp.get('date', 'N/A')}")
            md.append(f"- **Config:** {exp.get('config', {})}")
            md.append(f"- **Results:** {exp.get('results', {})}")
            md.append("")
        
        return "\n".join(md)


print("✅ ExperimentTracker class defined")
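
Because `_save_log` writes plain JSON, the log can also be inspected without instantiating the tracker. The sketch below assumes the layout produced by `new_experiment` and `log_results` (one entry per experiment ID with `name`, `config`, and `results` keys). The helper name `rank_experiments` and the sample entries are hypothetical and exist only for illustration.

```python
import json
import os
import tempfile


def rank_experiments(log_path, metric="accuracy"):
    """Return (experiment_id, value) pairs sorted best-first by `metric`.

    Experiments that never logged `metric` are skipped.
    """
    with open(log_path, "r") as f:
        experiments = json.load(f)

    scored = [
        (exp_id, exp["results"][metric])
        for exp_id, exp in experiments.items()
        if metric in exp.get("results", {})
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative log with made-up values (not real results).
sample_log = {
    "demo_a_20250101_000000": {"name": "demo_a", "config": {}, "results": {"accuracy": 0.61}},
    "demo_b_20250101_000100": {"name": "demo_b", "config": {}, "results": {"accuracy": 0.72}},
    "demo_c_20250101_000200": {"name": "demo_c", "config": {}, "results": {}},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "experiment_log.json")
    with open(demo_path, "w") as f:
        json.dump(sample_log, f, indent=2)
    ranking = rank_experiments(demo_path)

print(ranking)
```

On Colab, `log_path` would point at `experiment_log.json` inside the tracker's `save_dir`.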

In [None]:
# Step 11.3: Export Current Results with ResultsExporter
from sklearn.metrics import cohen_kappa_score

# Create exporter for this experiment
exporter = ResultsExporter('phase3_subject_A01')

# Log configuration
exporter.log_config({
    'subject': 'A01',
    'csp_components': 6,
    'classifiers': 'LDA, SVM, EEGNet',
    'preprocessing': 'Notch(50Hz) + Bandpass(8-30Hz) + ZScore'
})

# Log metrics for best model (CSP + LDA as example)
exporter.log_metric('accuracy', acc_lda, 'test')
exporter.log_metric('kappa', cohen_kappa_score(y_test, y_pred_lda), 'test')

# Log confusion matrix
exporter.log_confusion_matrix(y_test, y_pred_lda, ['Left', 'Right', 'Feet', 'Tongue'])

# Add observations
best_model = 'CSP+LDA' if acc_lda >= max(acc_svm, acc_eegnet) else ('CSP+SVM' if acc_svm >= acc_eegnet else 'EEGNet')
exporter.add_observation(f"Best model: {best_model}")
exporter.add_observation(f"Target (85%): {'Achieved ✅' if max(acc_lda, acc_svm, acc_eegnet) >= 0.85 else 'Not yet ❌'}")

# Print shareable format
exporter.print_share_ready()
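
The nested conditional used to pick `best_model` gets harder to read as more classifiers are added. One alternative, sketched here with placeholder accuracies rather than real results, keeps the scores in a dict and uses `max` with a key function. Ties resolve to the first entry, which matches the LDA-first preference of the `>=` comparisons above. `pick_best_model` is a hypothetical helper name.

```python
def pick_best_model(accuracies):
    """Return (name, accuracy) of the highest-scoring model.

    `max` returns the first maximal item, so on ties the model listed
    first in `accuracies` wins (dicts preserve insertion order).
    """
    name = max(accuracies, key=accuracies.get)
    return name, accuracies[name]


# Placeholder values for illustration only; substitute acc_lda, acc_svm, acc_eegnet.
demo_accuracies = {"CSP+LDA": 0.70, "CSP+SVM": 0.70, "EEGNet": 0.65}
best_name, best_acc = pick_best_model(demo_accuracies)

print(f"Best model: {best_name} ({best_acc:.2%})")
print(f"Target (85%): {'Achieved' if best_acc >= 0.85 else 'Not yet'}")
```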

In [None]:
# Step 11.4: Model Comparison Helper
def compare_models(results_dict: dict) -> pd.DataFrame:
    """
    Compare multiple models and generate shareable table.
    
    Args:
        results_dict: Dict of {model_name: {'y_pred': array, 'accuracy': float, ...}}
        
    Returns:
        Comparison DataFrame
    """
    comparison = []
    for name, data in results_dict.items():
        row = {'Model': name}
        # Keep scalar summary fields; raw predictions do not belong in the table
        row.update({k: v for k, v in data.items() if k != 'y_pred'})
        comparison.append(row)
    
    df = pd.DataFrame(comparison)
    if 'accuracy' in df.columns:
        df = df.sort_values('accuracy', ascending=False)
    
    # Print markdown format (DataFrame.to_markdown requires the `tabulate` package)
    print("\n📊 MODEL COMPARISON (Copy-paste ready):\n")
    print(df.to_markdown(index=False))
    
    return df

# Compare all models
all_results = {
    'CSP + LDA': {
        'accuracy': acc_lda,
        'kappa': cohen_kappa_score(y_test, y_pred_lda),
        'train_time': 'Fast'
    },
    'CSP + SVM': {
        'accuracy': acc_svm,
        'kappa': cohen_kappa_score(y_test, y_pred_svm),
        'train_time': 'Fast'
    },
    'EEGNet': {
        'accuracy': acc_eegnet,
        'kappa': cohen_kappa_score(y_test, y_pred_eegnet),
        'train_time': 'Medium'
    }
}

comparison_df = compare_models(all_results)
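
Both the comparison table and the Phase 3 targets rely on Cohen's kappa, which corrects accuracy for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected from the label marginals. The pure-Python sketch below is only meant to make that formula concrete. The notebook itself uses `sklearn.metrics.cohen_kappa_score`, and the labels here are toy values, not experiment output.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    # Observed agreement: fraction of trials where prediction matches the label.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: product of marginal frequencies, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Both sequences use a single identical class; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Toy 4-class example (0=Left, 1=Right, 2=Feet, 3=Tongue), two trials per class.
toy_true = [0, 0, 1, 1, 2, 2, 3, 3]
toy_pred = [0, 0, 1, 1, 2, 3, 3, 3]
toy_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
toy_kappa = cohen_kappa(toy_true, toy_pred)
print(f"accuracy = {toy_accuracy:.3f}, kappa = {toy_kappa:.4f}")
```

With four balanced classes and balanced predictions, p_e is 0.25, so an accuracy of 0.85 corresponds to a kappa of 0.80. The accuracy and kappa targets stated in the overview therefore describe roughly the same performance level.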

## Step 12: Quick Reference & Summary

### Results Summary

Print a comprehensive summary to share with an AI assistant for analysis.

In [None]:
# Step 12.1: Generate Summary for AI Assistant
def generate_summary():
    """Generate shareable summary."""
    summary = f"""
# Phase 3 Results Summary

## Dataset
- Subject: A01
- Trials: {X.shape[0]} ({X_train.shape[0]} train, {X_test.shape[0]} test)
- Channels: {X.shape[1]}
- Samples: {X.shape[2]} (4s @ 250Hz)
- Classes: 4 (Left Hand, Right Hand, Feet, Tongue)

## Feature Extraction
- CSP: {X_train_csp.shape[1]} features ({csp._n_components} components)
- Band Power: 3 bands (Mu, Beta-Low, Beta-High)

## Classification Results
| Model | Accuracy | Kappa |
|-------|----------|-------|
| CSP + LDA | {acc_lda:.4f} | {cohen_kappa_score(y_test, y_pred_lda):.4f} |
| CSP + SVM | {acc_svm:.4f} | {cohen_kappa_score(y_test, y_pred_svm):.4f} |
| EEGNet | {acc_eegnet:.4f} | {cohen_kappa_score(y_test, y_pred_eegnet):.4f} |

## Observations
- Best performing model: {'CSP+LDA' if acc_lda >= max(acc_svm, acc_eegnet) else 'CSP+SVM' if acc_svm >= acc_eegnet else 'EEGNet'}
- Target (85%): {'Achieved ✅' if max(acc_lda, acc_svm, acc_eegnet) >= 0.85 else 'Not yet achieved ❌'}
"""
    return summary

print(generate_summary())

In [None]:
# Step 12.2: Quick Reference Card
print("""
╔══════════════════════════════════════════════════════════════╗
║                    QUICK REFERENCE CARD                       ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║  📊 VIEW RESULTS:                                            ║
║     exporter.print_share_ready()                             ║
║                                                              ║
║  📋 COPY FOR AI ASSISTANT:                                   ║
║     print(exporter.to_markdown())                            ║
║                                                              ║
║  💾 SAVE TO DRIVE:                                           ║
║     exporter.save_to_drive('/content/drive/MyDrive/llm-eeg') ║
║                                                              ║
║  🔄 COMPARE MODELS:                                          ║
║     compare_models({'lda': lda_res, 'eegnet': eegnet_res})   ║
║                                                              ║
║  📈 EXPERIMENT TRACKING:                                     ║
║     tracker = ExperimentTracker(save_dir)                    ║
║     tracker.compare_experiments(['exp1', 'exp2'])            ║
║                                                              ║
║  🎯 FEATURE EXTRACTION:                                      ║
║     csp = create_csp_extractor(n_components=6)               ║
║     X_csp = csp.fit_extract(X_train, y_train)                ║
║                                                              ║
║  🤖 CLASSIFICATION:                                          ║
║     lda = create_lda_classifier(n_classes=4)                 ║
║     lda.fit(X_csp, y_train)                                  ║
║     predictions = lda.predict(X_test_csp)                    ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝

📁 GOOGLE DRIVE FOLDER STRUCTURE:
─────────────────────────────────
Google Drive/
└── MyDrive/
    └── llm-eeg/
        ├── data/
        │   └── BCI_Competition_IV_2a/   # Dataset
        ├── experiments/
        │   ├── experiment_log.json      # All experiments
        │   └── exp_YYYYMMDD_HHMMSS/     # Per experiment
        │       ├── results.json
        │       ├── metrics.csv
        │       └── summary.md
        ├── models/                       # Saved models
        │   ├── csp_model.pkl
        │   └── lda_model.pkl
        └── reports/
            └── comparison_report.md
""")
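
The classes above create the directories they write to on demand (via `os.makedirs`); the remaining folders in the diagram can be prepared up front. A minimal sketch follows, assuming the directory names from the diagram. `ensure_drive_layout` is a hypothetical helper, and the demo runs against a temporary directory rather than Google Drive.

```python
import os
import tempfile

# Sub-folders taken from the diagram above, relative to the project root.
LAYOUT = (
    os.path.join("data", "BCI_Competition_IV_2a"),
    "experiments",
    "models",
    "reports",
)


def ensure_drive_layout(root):
    """Create the project sub-folders under `root` and return their paths."""
    created = []
    for rel_path in LAYOUT:
        full_path = os.path.join(root, rel_path)
        os.makedirs(full_path, exist_ok=True)  # no error if it already exists
        created.append(full_path)
    return created


# Demo in a throwaway directory; on Colab, pass '/content/drive/MyDrive/llm-eeg'.
with tempfile.TemporaryDirectory() as tmp_root:
    paths = ensure_drive_layout(tmp_root)
    all_exist = all(os.path.isdir(p) for p in paths)

print(f"Created {len(paths)} folders, all present: {all_exist}")
```

Note that Step 2.1 reads the dataset from its own `DATA_DIR`, so the `data/` folder here only follows the diagram's suggestion and may need adjusting to match where the dataset actually lives.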

## Phase 3 Complete!

### Summary

You have completed Phase 3 (Feature Extraction & Classification), which covered:

1. **CSP Feature Extraction**: 6 components, spatial patterns visualization
2. **Band Power Features**: Mu, Beta frequency bands
3. **Feature Pipeline**: Modular multi-extractor pipeline
4. **LDA Classification**: Linear discriminant analysis on CSP features
5. **SVM Classification**: RBF kernel SVM for non-linear boundaries
6. **EEGNet**: End-to-end deep learning classifier
7. **Model Comparison**: Accuracy and Kappa metrics
8. **Cross-Validation**: LOSO setup for subject-independent evaluation
9. **Results Export**: JSON, CSV, and Markdown export for sharing with an AI assistant
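
Item 8 refers to leave-one-subject-out (LOSO) cross-validation: each subject is held out once as the test set while the model trains on all remaining subjects. A dependency-free sketch of the index split is shown below, assuming one subject label per trial. The labels are placeholders that follow the `A01` naming used above, and `loso_splits` is a hypothetical helper name.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    `subject_ids` holds one subject label per trial, in trial order.
    """
    # dict.fromkeys keeps first-seen order while removing duplicates.
    for held_out in dict.fromkeys(subject_ids):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Placeholder labels: three subjects with two trials each.
demo_subjects = ["A01", "A01", "A02", "A02", "A03", "A03"]
folds = list(loso_splits(demo_subjects))

for held_out, train_idx, test_idx in folds:
    print(f"test on {held_out}: train={train_idx} test={test_idx}")
```

scikit-learn's `LeaveOneGroupOut` produces the same kind of split when the subject labels are passed as `groups`. In either case, feature extractors such as CSP should be fitted on the training indices only, so that no information from the held-out subject leaks into training.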

### Next Steps: Phase 4 - Agent System

- Adaptive Preprocessing Agent (APA)
- Decision Validation Agent (DVA)
- Q-learning policy optimization
- Cross-trial learning

---

### Quick Reference

```python
# CSP Feature Extraction
csp = create_csp_extractor(n_components=6, sampling_rate=250)
X_csp = csp.fit_extract(X_train, y_train)
X_test_csp = csp.extract(X_test)

# LDA Classification
lda = create_lda_classifier(n_classes=4)
lda.fit(X_csp, y_train)
predictions = lda.predict(X_test_csp)

# EEGNet
eegnet = create_eegnet_classifier(n_classes=4, n_channels=22, n_samples=1000)
eegnet.fit(X_train, y_train, epochs=50)
predictions = eegnet.predict(X_test)
```