# 01 - Loading Data for Neural Decoding

This notebook demonstrates how to load pre-processed neuroimaging data
for classification-based decoding analysis.

**Contents:**
1. Loading fMRI data (NIfTI)
2. Loading EEG data (MNE Epochs)
3. Loading behavioral data (CSV)
4. Multimodal data fusion
5. Creating synthetic data for testing
6. Inspecting a `DecodingDataset`

In [None]:
import sys
sys.path.insert(0, '..')

import numpy as np
import pandas as pd
from pathlib import Path

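# Neural decoding imports (project modules on the path added above)
# NOTE: Python's built-in `io` module is already loaded at interpreter startup,
# so a top-level local package named `io` is shadowed and the first import below
# fails with "No module named 'io.loaders'; 'io' is not a package". If that
# happens, import `loaders` through the project's parent package or rename it.
from io.loaders import FMRILoader, EEGLoader, BehaviorLoader, MultimodalLoader
from core.dataset import DecodingDataset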

## 1. Loading fMRI Data

For fMRI, you need:
- Pre-processed 4D NIfTI file (one volume per trial/timepoint)
- Brain mask (binary NIfTI)
- Events file (CSV) with trial labels

In [None]:
# Example: Load fMRI data
# (Replace paths with your actual data)

fmri_loader = FMRILoader()

# Paths to your data
data_path = "sub-01_task_bold.nii.gz"      # 4D NIfTI
mask_path = "brain_mask.nii.gz"            # Binary mask
events_path = "events.csv"                 # Trial events

# Load dataset
# fmri_dataset = fmri_loader.load(
#     data_path=data_path,
#     mask_path=mask_path,
#     events_path=events_path,
#     label_column="condition",  # Column with class labels
#     run_column="run",          # Column for CV grouping
#     standardize=True
# )

# print(f"Loaded fMRI data:")
# print(f"  Samples: {fmri_dataset.n_samples}")
# print(f"  Features (voxels): {fmri_dataset.n_features}")
# print(f"  Classes: {fmri_dataset.class_names}")
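
Masking converts the 4D image into the samples-by-features matrix a classifier needs. Each in-mask voxel becomes a feature, and each volume becomes a sample. The sketch below shows this with plain NumPy on random arrays. It assumes that `standardize=True` means z-scoring each voxel across volumes, which has not been checked against `FMRILoader`. With real files, the arrays would come from nibabel, for example `nib.load(data_path).get_fdata()`.

```python
import numpy as np

# Random stand-ins for a 4D BOLD series and a binary brain mask. With real files:
# bold = nib.load(data_path).get_fdata(); mask = nib.load(mask_path).get_fdata() > 0
rng = np.random.default_rng(0)
bold = rng.standard_normal((10, 12, 8, 40))    # (x, y, z, n_volumes)
mask = np.zeros((10, 12, 8), dtype=bool)
mask[2:8, 3:9, 2:6] = True                      # 6 * 6 * 4 = 144 in-mask voxels

# A 3D boolean mask indexes the three spatial axes and keeps the time axis,
# giving (n_voxels, n_volumes); transpose to (n_samples, n_features).
X_bold = bold[mask].T

# Assumed meaning of standardize=True: z-score each voxel across volumes.
X_bold = (X_bold - X_bold.mean(axis=0)) / X_bold.std(axis=0)

print(X_bold.shape)  # (40, 144)
```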

### Expected Events File Format

```csv
trial,onset,duration,condition,run
1,0.0,2.0,face,1
2,4.0,2.0,house,1
3,8.0,2.0,face,1
...
```
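
Reading this file with pandas shows one plausible mapping from `label_column` and `run_column` to arrays. String conditions become integer class labels, and runs become cross-validation groups. The sketch uses only the three rows shown above. The alphabetical label order produced by `np.unique` is an assumption, and `FMRILoader` may encode classes differently.

```python
from io import StringIO

import numpy as np
import pandas as pd

# The three rows shown above; with a real file use pd.read_csv("events.csv").
events = pd.read_csv(StringIO(
    "trial,onset,duration,condition,run\n"
    "1,0.0,2.0,face,1\n"
    "2,4.0,2.0,house,1\n"
    "3,8.0,2.0,face,1\n"
))

# label_column -> integer class labels; np.unique sorts the names alphabetically.
class_names, y_events = np.unique(events["condition"].to_numpy(), return_inverse=True)
# run_column -> group labels that keep trials from the same run together in CV.
run_groups = events["run"].to_numpy()

print(list(class_names), y_events, run_groups)  # ['face', 'house'] [0 1 0] [1 1 1]
```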

### ROI-Based Loading

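For ROI-based analysis, use an atlas with integer region labels to summarize the voxels in each region (mean, median, or standard deviation). This gives one feature per region instead of one per voxel.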

In [None]:
# Example: Load fMRI with ROI features

# roi_dataset = fmri_loader.load_roi(
#     data_path="sub-01_task_bold.nii.gz",
#     atlas_path="harvard_oxford.nii.gz",  # Atlas with integer labels
#     events_path="events.csv",
#     label_column="condition",
#     run_column="run",
#     aggregation="mean"  # "mean", "median", or "std"
# )

# print(f"ROI features: {roi_dataset.n_features}")

## 2. Loading EEG Data

For EEG, load MNE Epochs files (.fif format).

In [None]:
# Example: Load EEG epochs

eeg_loader = EEGLoader()

# Load epochs file
# eeg_dataset = eeg_loader.load(
#     epochs_path="sub-01-epo.fif",
#     time_window=(0.1, 0.5),  # Focus on 100-500ms
#     channels=None,           # All channels (or list of names)
#     flatten=True             # Flatten channels x time to 1D
# )

# print(f"EEG data:")
# print(f"  Samples: {eeg_dataset.n_samples}")
# print(f"  Features: {eeg_dataset.n_features}")
# print(f"  Classes: {eeg_dataset.class_names}")
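
The `time_window` and `flatten` options can be illustrated on a raw array in MNE's epochs layout, `(n_epochs, n_channels, n_times)`. This is the layout that `epochs.get_data()` returns, with the matching time axis in `epochs.times`. The sketch assumes two things: the window is inclusive at both ends, and flattening is row-major, so all time points of the first channel come first. `EEGLoader` may differ on either point.

```python
import numpy as np

# Hypothetical stand-ins for epochs.get_data() and epochs.times.
sfreq = 100.0                                              # sampling rate (Hz)
n_epochs, n_channels, n_times = 30, 64, 100
times = np.round(-0.2 + np.arange(n_times) / sfreq, 6)     # -0.2 s to 0.79 s
epochs_data = np.random.default_rng(0).standard_normal((n_epochs, n_channels, n_times))

# time_window=(0.1, 0.5): keep samples inside the window (inclusive).
tmin, tmax = 0.1, 0.5
in_window = (times >= tmin) & (times <= tmax)
windowed = epochs_data[:, :, in_window]                    # (30, 64, 41)

# flatten=True: each epoch becomes one vector of channel-by-time features.
X_eeg = windowed.reshape(n_epochs, -1)                     # (30, 64 * 41)
print(windowed.shape, X_eeg.shape)
```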

### Time-Resolved Loading

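For time-resolved decoding, build one dataset per time point from a short window of samples. Then train and evaluate a separate decoder on each dataset to see when the conditions become distinguishable.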

In [None]:
# Example: Load for time-resolved analysis

# datasets_by_time = eeg_loader.load_time_resolved(
#     epochs_path="sub-01-epo.fif",
#     time_points=np.arange(-0.1, 0.6, 0.05),  # Every 50ms
#     window_size=0.05  # 50ms windows
# )

# print(f"Created {len(datasets_by_time)} time-point datasets")
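
Time-resolved decoding cross-validates a separate classifier at each time point. The sketch below uses simulated epochs in which class 1 receives a positive offset from 0.2 s onward. Accuracy should therefore stay near chance before 0.2 s and rise afterwards. The sketch assumes each window starts at its time point and averages each channel within the window. `load_time_resolved` may instead center the windows or keep individual samples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Simulated epochs: 60 trials x 16 channels x 80 samples at 100 Hz (-0.1 to 0.69 s).
rng = np.random.default_rng(0)
sfreq, n_epochs, n_channels, n_times = 100.0, 60, 16, 80
times = np.round(-0.1 + np.arange(n_times) / sfreq, 6)
labels = np.repeat([0, 1], n_epochs // 2)
epochs_data = rng.standard_normal((n_epochs, n_channels, n_times))
# Class 1 gets a +1 offset on every channel from 0.2 s onward.
epochs_data[:, :, times >= 0.2] += labels[:, None, None]

time_points = np.round(-0.1 + 0.05 * np.arange(14), 2)    # -0.10, -0.05, ..., 0.55 s
window_size = 0.05
clf = LogisticRegression(max_iter=1000)

scores = []
for t in time_points:
    # Assumed window convention: [t, t + window_size), i.e. 5 samples here.
    in_win = (times >= t) & (times < np.round(t + window_size, 6))
    X_t = epochs_data[:, :, in_win].mean(axis=2)           # one value per channel
    scores.append(cross_val_score(clf, X_t, labels, cv=5).mean())
scores = np.array(scores)

for t, s in zip(time_points, scores):
    print(f"{t:+.2f} s  accuracy = {s:.2f}")
```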

## 3. Loading Behavioral Data

Load trial-wise behavioral measures from a CSV file. You can decode from behavior alone or add the behavioral features to neural data (see Section 4).

In [None]:
# Example: Load behavioral data

behavior_loader = BehaviorLoader()

# behavior_dataset = behavior_loader.load(
#     csv_path="behavior.csv",
#     feature_columns=["reaction_time", "accuracy", "confidence"],
#     label_column="condition",
#     group_column="subject",  # For CV grouping
#     standardize=True
# )

# print(f"Behavioral features: {behavior_dataset.feature_names}")
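
The following pandas sketch shows what a loader plausibly does with these arguments. It uses a small hypothetical table with the column names from the call above. It is an illustration, not `BehaviorLoader`'s implementation.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical four-trial table; with a real file use pd.read_csv("behavior.csv").
behavior = pd.read_csv(StringIO(
    "subject,condition,reaction_time,accuracy,confidence\n"
    "1,face,0.52,1,4\n"
    "1,house,0.61,1,3\n"
    "2,face,0.48,0,2\n"
    "2,house,0.70,1,5\n"
))
feature_columns = ["reaction_time", "accuracy", "confidence"]

X_beh = behavior[feature_columns].to_numpy(dtype=float)
X_beh = (X_beh - X_beh.mean(axis=0)) / X_beh.std(axis=0)   # z-score each column
y_beh = behavior["condition"].to_numpy()                    # label_column
groups_beh = behavior["subject"].to_numpy()                 # group_column
```

Z-scoring the whole table at once, as here, lets the statistics of the test folds influence the scaling. Fitting a `StandardScaler` on each training fold, for example inside a scikit-learn `Pipeline`, avoids this.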

## 4. Multimodal Data Fusion

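Early fusion concatenates the features of all modalities into a single feature vector per trial. This only works if every dataset contains the same trials in the same order.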

In [None]:
# Example: Early fusion (feature concatenation)

multimodal_loader = MultimodalLoader()

# Combine fMRI, EEG, and behavior
# fused_dataset = multimodal_loader.early_fusion(
#     datasets=[fmri_dataset, eeg_dataset, behavior_dataset],
#     normalize=True  # Z-score each modality before fusion
# )

# print(f"Fused features: {fused_dataset.n_features}")
# print(f"Modalities: {fused_dataset.metadata['modalities']}")
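
Here is a minimal early-fusion sketch. It assumes each modality is already a trials-by-features NumPy array with rows in the same trial order. The helpers `zscore` and `early_fusion` are hypothetical stand-ins, not the `MultimodalLoader` API, and the feature counts are arbitrary.

```python
import numpy as np

def zscore(X):
    """Z-score each column; zero-variance columns are centered but not scaled."""
    sd = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(sd == 0, 1.0, sd)

def early_fusion(blocks, normalize=True):
    """Concatenate per-modality feature matrices that share the same trial order."""
    n_trials = {b.shape[0] for b in blocks}
    if len(n_trials) != 1:
        raise ValueError(f"modalities disagree on number of trials: {sorted(n_trials)}")
    if normalize:
        blocks = [zscore(b) for b in blocks]   # normalize each modality separately
    return np.hstack(blocks)

rng = np.random.default_rng(0)
X_fmri = rng.standard_normal((100, 1000))
X_eeg = rng.standard_normal((100, 2624))
X_beh = rng.standard_normal((100, 3))
X_fused = early_fusion([X_fmri, X_eeg, X_beh])
print(X_fused.shape)  # (100, 3627)
```

Z-scoring puts every feature on the same scale. It does not, however, stop a block of thousands of features from outnumbering three behavioral ones, so the larger modalities can still dominate the fused vector.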

## 5. Creating Synthetic Data for Testing

Generate synthetic data to test the pipeline.

In [None]:
# Create synthetic fMRI-like data
from sklearn.datasets import make_classification

# Generate classification data
X, y = make_classification(
    n_samples=100,
    n_features=1000,  # ~voxels
    n_informative=50,
    n_redundant=50,
    n_classes=2,
    random_state=42
)

# Create groups (runs)
groups = np.repeat(np.arange(1, 6), 20)  # 5 runs, 20 trials each

# Create DecodingDataset
synthetic_dataset = DecodingDataset(
    X=X,
    y=y,
    groups=groups,
    feature_names=[f"voxel_{i}" for i in range(1000)],
    class_names=["class_A", "class_B"],
    metadata={"synthetic": True},
    modality="fmri"
)

print("Synthetic dataset created:")
print(f"  Shape: {synthetic_dataset.X.shape}")
print(f"  Classes: {synthetic_dataset.class_names}")
print(f"  Groups: {np.unique(synthetic_dataset.groups)}")

## 6. Inspecting DecodingDataset

In [None]:
# Dataset properties
print(f"Number of samples: {synthetic_dataset.n_samples}")
print(f"Number of features: {synthetic_dataset.n_features}")
print(f"Number of classes: {synthetic_dataset.n_classes}")
print(f"Class counts: {synthetic_dataset.class_counts}")
print(f"Is balanced: {synthetic_dataset.is_balanced}")
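
`class_counts` and `is_balanced` can be reproduced from a label vector with `np.unique`. The sketch below treats a dataset as balanced only when all class counts are exactly equal. That definition is an assumption, and `DecodingDataset` may allow a tolerance.

```python
import numpy as np

y_example = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 1])   # hypothetical labels
labels, counts = np.unique(y_example, return_counts=True)
class_counts = dict(zip(labels.tolist(), counts.tolist()))
is_balanced = counts.min() == counts.max()              # strict equality (assumed)
print(class_counts, is_balanced)                        # {0: 4, 1: 6} False
```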

In [None]:
# Keep only class_A trials (integer label 0)
class_a = synthetic_dataset.get_subset(class_labels=[0])
print(f"Class A samples: {class_a.n_samples}")

In [None]:
# Hold out run 5 as the test set and train on runs 1-4
train_data, test_data = synthetic_dataset.split_by_group(test_groups=[5])
print(f"Train samples: {train_data.n_samples}")
print(f"Test samples: {test_data.n_samples}")
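
Holding out a run amounts to a boolean mask over the group labels. Here is a NumPy sketch on hypothetical arrays shaped like the synthetic dataset.

```python
import numpy as np

# Hypothetical arrays shaped like the synthetic dataset: 5 runs x 20 trials.
X_demo = np.arange(100 * 3, dtype=float).reshape(100, 3)
y_demo = np.tile([0, 1], 50)
groups_demo = np.repeat(np.arange(1, 6), 20)

test_mask = np.isin(groups_demo, [5])        # True for trials in held-out runs
X_train, y_train = X_demo[~test_mask], y_demo[~test_mask]
X_test, y_test = X_demo[test_mask], y_demo[test_mask]

print(len(y_train), len(y_test))             # 80 20
```

Because whole runs are held out, no test trial shares a run with a training trial. This keeps run-specific signal from inflating test accuracy.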

In [None]:
# Convert to sklearn format
X_sk, y_sk, groups_sk = synthetic_dataset.to_sklearn()
print(f"Sklearn arrays: X={X_sk.shape}, y={y_sk.shape}")
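
The arrays from `to_sklearn()` can be passed directly to scikit-learn. For example, leave-one-run-out cross-validation repeats the hold-out split above once per run. The cell below regenerates the same synthetic data so that it runs on its own. The scaler-plus-logistic-regression pipeline is an arbitrary choice for illustration, not a decoder prescribed by this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same recipe as the synthetic dataset above, regenerated so this cell stands alone.
X_sk, y_sk = make_classification(n_samples=100, n_features=1000, n_informative=50,
                                 n_redundant=50, n_classes=2, random_state=42)
groups_sk = np.repeat(np.arange(1, 6), 20)   # 5 runs, 20 trials each

# The scaler is refit on the training runs of every fold, so no test data leaks in.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logo_scores = cross_val_score(clf, X_sk, y_sk, groups=groups_sk, cv=LeaveOneGroupOut())
print(logo_scores.round(2), logo_scores.mean().round(2))
```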

## Next Steps

- **02_extract_features.ipynb**: Feature extraction and selection
- **03_train_classifier.ipynb**: Training decoders
- **04_cross_validation.ipynb**: Cross-validation strategies