# Tutorial 6: NWB Integration & Real-World Data

**Level**: Intermediate  
**Time**: 35-45 minutes  
**Prerequisites**: Tutorial 1, Tutorial 2

## Overview

In this tutorial, you'll learn how to:

1. **Work with NWB Format** - Neurodata Without Borders standard
2. **Load Real Datasets** - Import existing NWB files
3. **Export to NWB** - Save your neurOS data
4. **BIDS Compliance** - Brain Imaging Data Structure conventions
5. **Metadata Management** - Organize experimental data
6. **Share & Publish** - Prepare data for repositories

## Key Concepts

- **NWB (Neurodata Without Borders)**: Community data standard for storing and sharing neurophysiology recordings
- **BIDS (Brain Imaging Data Structure)**: Standard for naming and organizing dataset files and folders
- **Metadata**: Experimental details, subject info, equipment
- **Data Provenance**: Tracking processing history
- **Interoperability**: Working with other tools

---

## Section 1: Understanding the NWB Format

NWB is built on HDF5 and provides a standardized way to store neurophysiology data.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import tempfile
import os

# neurOS imports
from neuros.drivers import MockDriver
from neuros.models import SimpleClassifier
from neuros.pipeline import Pipeline

# NWB support (optional dependency)
try:
    from pynwb import NWBFile, NWBHDF5IO
    from pynwb.ecephys import ElectricalSeries, LFP
    from pynwb.behavior import BehavioralTimeSeries
    from pynwb.device import Device
    from pynwb.file import Subject
    PYNWB_AVAILABLE = True
except ImportError:
    PYNWB_AVAILABLE = False
    print("⚠ pynwb not installed. Install with: pip install pynwb")
    print("  This tutorial will show structure without actual NWB I/O.")

print(f"NWB Support: {'✓ Available' if PYNWB_AVAILABLE else '✗ Not installed'}")
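
The optional-dependency check above can also be done without importing the package. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of neurOS or PyNWB.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# Report which optional packages are installed in this environment
optional = {name: has_module(name) for name in ("pynwb", "h5py")}
for name, found in optional.items():
    print(f"{name}: {'available' if found else 'missing'}")
```

This keeps start-up fast when the package is large, at the cost of not detecting a broken installation that fails at import time.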

### NWB File Structure

An NWB file contains:

```
NWB File
├── General Metadata
│   ├── Session info (experimenter, lab, institution)
│   ├── Subject info (species, age, sex)
│   └── Devices (hardware used)
├── Acquisition
│   ├── Raw data (ElectricalSeries)
│   └── Behavioral data
├── Processing
│   ├── Filtered data
│   ├── LFP (Local Field Potential)
│   └── Spike trains
└── Analysis
    ├── Processed results
    └── Model predictions
```
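
As a rough orientation, the sections in the tree above correspond to attributes on a `pynwb.NWBFile` object. The sketch below only builds a lookup table of those attribute names (it does not open a file); the helper `describe_sections` is hypothetical.

```python
# Section of the tree above -> attribute name on a pynwb.NWBFile object
NWB_SECTIONS = {
    "General Metadata / Subject": "subject",
    "General Metadata / Devices": "devices",
    "Acquisition": "acquisition",
    "Processing": "processing",
    "Analysis": "analysis",
}


def describe_sections(sections: dict) -> list:
    """Format each section as 'label -> nwbfile.<attribute>'."""
    return [f"{label:<28} -> nwbfile.{attr}" for label, attr in sections.items()]


for line in describe_sections(NWB_SECTIONS):
    print(line)
```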

---

## Section 2: Creating an NWB File from Scratch

Let's create a complete NWB file with mock EEG data.

In [None]:
if PYNWB_AVAILABLE:
    def create_example_nwb_file(filepath: str) -> None:
        """
        Create a complete example NWB file.
        """
        # Create NWB file with metadata
        nwbfile = NWBFile(
            session_description='Motor imagery BCI session with 4-class classification',
            identifier='neuros_tutorial_session_001',
            session_start_time=datetime.now().astimezone(),  # timezone-aware local time
            experimenter='Dr. Neural Researcher',
            lab='BCI Lab',
            institution='University of NeuroScience',
            experiment_description='Motor imagery classification using EEG',
            session_id='session_001',
            keywords=['BCI', 'motor imagery', 'EEG', 'classification']
        )
        
        # Add subject information
        nwbfile.subject = Subject(
            subject_id='P001',
            age='P25Y',  # ISO 8601 duration (25 years)
            sex='M',
            species='Homo sapiens',
            description='Healthy volunteer'
        )
        
        # Add device information
        device = nwbfile.create_device(
            name='ActiChamp',
            description='BrainProducts ActiChamp 64-channel EEG',
            manufacturer='Brain Products GmbH'
        )
        
        # Create electrode group
        electrode_group = nwbfile.create_electrode_group(
            name='EEG_electrodes',
            description='64-channel 10-20 system',
            location='scalp',
            device=device
        )
        
        # Add electrode locations
        n_channels = 64
        for i in range(n_channels):
            nwbfile.add_electrode(
                id=i,
                x=0.0,  # In practice, use real coordinates
                y=0.0,
                z=0.0,
                imp=float('nan'),  # Impedance
                location=f'Channel_{i+1}',
                filtering='none',
                group=electrode_group
            )
        
        # Create electrode table region
        electrode_table_region = nwbfile.create_electrode_table_region(
            region=list(range(n_channels)),
            description='all electrodes'
        )
        
        # Generate mock EEG data
        sampling_rate = 250.0  # Hz
        duration = 10.0  # seconds
        n_samples = int(sampling_rate * duration)
        
        # Simulate the whole recording as Gaussian noise; trials are defined further down
        data = np.random.randn(n_samples, n_channels) * 50  # microvolts
        
        # Add raw data as ElectricalSeries
        raw_eeg = ElectricalSeries(
            name='raw_eeg',
            data=data,
            electrodes=electrode_table_region,
            starting_time=0.0,
            rate=sampling_rate,
            description='Raw EEG data',
            comments='Acquired during motor imagery task',
            conversion=1e-6  # Convert to Volts
        )
        
        nwbfile.add_acquisition(raw_eeg)
        
        # Add trial information
        nwbfile.add_trial_column('trial_type', 'motor imagery class')
        nwbfile.add_trial_column('accuracy', 'classification accuracy')
        
        trial_types = ['left_hand', 'right_hand', 'feet', 'tongue']
        trial_duration = duration / 4
        
        for i, trial_type in enumerate(trial_types):
            nwbfile.add_trial(
                start_time=i * trial_duration,
                stop_time=(i + 1) * trial_duration,
                trial_type=trial_type,
                accuracy=0.75 + np.random.rand() * 0.2  # Mock accuracy
            )
        
        # Save to file
        with NWBHDF5IO(filepath, 'w') as io:
            io.write(nwbfile)
        
        print(f"✓ NWB file created: {filepath}")
        print(f"  Channels: {n_channels}")
        print(f"  Samples: {n_samples}")
        print(f"  Duration: {duration}s")
        print(f"  Trials: {len(trial_types)}")
    
    # Create example file
    temp_dir = tempfile.mkdtemp()
    nwb_filepath = os.path.join(temp_dir, 'example_session.nwb')
    create_example_nwb_file(nwb_filepath)
    
else:
    print("\nExample NWB file structure (conceptual):")
    print("""   
    NWBFile(
        session_description='Motor imagery BCI session',
        experimenter='Dr. Researcher',
        devices=['ActiChamp 64-ch EEG'],
        electrodes=[Ch1, Ch2, ..., Ch64],
        acquisition={'raw_eeg': ElectricalSeries},
        trials=[Trial1, Trial2, Trial3, Trial4]
    )
    """)
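
Two metadata fields in the cell above have format conventions worth knowing: PyNWB works best with a timezone-aware `session_start_time`, and the recommended format for `Subject.age` is an ISO 8601 duration. Here is a minimal sketch of helpers for both, using only the standard library. The names `aware_now` and `age_iso8601` are hypothetical, and the age helper covers whole years only.

```python
from datetime import datetime, timezone


def aware_now() -> datetime:
    """Current time as a timezone-aware datetime in UTC."""
    return datetime.now(timezone.utc)


def age_iso8601(years: int) -> str:
    """Format an age in whole years as an ISO 8601 duration, e.g. 25 -> 'P25Y'."""
    if years < 0:
        raise ValueError("age must be non-negative")
    return f"P{years}Y"


start = aware_now()
print(start.isoformat())   # ends with +00:00 because the value carries a timezone
print(age_iso8601(25))     # P25Y
```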

---

## Section 3: Reading and Exploring NWB Files

Load and inspect an existing NWB file.

In [None]:
if PYNWB_AVAILABLE:
    def explore_nwb_file(filepath: str) -> None:
        """
        Explore the contents of an NWB file.
        """
        with NWBHDF5IO(filepath, 'r') as io:
            nwbfile = io.read()
            
            print("NWB File Contents:")
            print("="*60)
            
            # Session info
            print("\n📋 Session Information:")
            print(f"  Description: {nwbfile.session_description}")
            print(f"  Start time: {nwbfile.session_start_time}")
            print(f"  Experimenter: {nwbfile.experimenter}")
            print(f"  Lab: {nwbfile.lab}")
            print(f"  Institution: {nwbfile.institution}")
            
            # Subject info
            if nwbfile.subject:
                print("\n👤 Subject Information:")
                print(f"  ID: {nwbfile.subject.subject_id}")
                print(f"  Age: {nwbfile.subject.age}")
                print(f"  Sex: {nwbfile.subject.sex}")
                print(f"  Species: {nwbfile.subject.species}")
            
            # Devices
            print("\n🔧 Devices:")
            for device_name, device in nwbfile.devices.items():
                print(f"  {device_name}: {device.description}")
            
            # Electrodes
            print(f"\n📡 Electrodes: {len(nwbfile.electrodes)} channels")
            
            # Acquisition
            print("\n📊 Acquisition Data:")
            for name, data in nwbfile.acquisition.items():
                if hasattr(data, 'data'):
                    shape = data.data.shape
                    rate = data.rate if hasattr(data, 'rate') else 'N/A'
                    print(f"  {name}: shape={shape}, rate={rate} Hz")
            
            # Trials
            if nwbfile.trials:
                print(f"\n🎯 Trials: {len(nwbfile.trials)} trials")
                print(f"  Columns: {list(nwbfile.trials.colnames)}")
    
    # Explore the file we just created
    explore_nwb_file(nwb_filepath)
    
else:
    print("Install pynwb to explore NWB files: pip install pynwb")

### Extract and Visualize Data

In [None]:
if PYNWB_AVAILABLE:
    with NWBHDF5IO(nwb_filepath, 'r') as io:
        nwbfile = io.read()
        
        # Extract EEG data
        raw_eeg = nwbfile.acquisition['raw_eeg']
        data = raw_eeg.data[:]
        sampling_rate = raw_eeg.rate
        
        # Time axis
        time = np.arange(data.shape[0]) / sampling_rate
        
        # Extract trial information
        trials_df = nwbfile.trials.to_dataframe()
    
    # Visualize
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8))
    
    # Plot first 4 channels
    for i in range(4):
        ax1.plot(time, data[:, i] + i*100, linewidth=0.5, label=f'Ch {i+1}')
    
    # Mark trial boundaries
    colors = ['red', 'green', 'blue', 'orange']
    for idx, trial in trials_df.iterrows():
        ax1.axvline(trial['start_time'], color=colors[idx], linestyle='--', 
                    alpha=0.5, linewidth=2)
        ax1.text(trial['start_time'], 350, trial['trial_type'], 
                rotation=90, fontsize=9)
    
    ax1.set_xlabel('Time (s)')
    ax1.set_ylabel('Channels (offset for visibility)')
    ax1.set_title('Raw EEG Data with Trial Markers')
    ax1.legend(loc='upper right')
    ax1.grid(True, alpha=0.3)
    
    # Plot trial accuracies
    trial_types = trials_df['trial_type'].values
    accuracies = trials_df['accuracy'].values
    
    bars = ax2.bar(range(len(trial_types)), accuracies, 
                   color=colors[:len(trial_types)])
    ax2.set_xticks(range(len(trial_types)))
    ax2.set_xticklabels(trial_types)
    ax2.set_ylabel('Classification Accuracy')
    ax2.set_title('Per-Trial Classification Accuracy')
    ax2.set_ylim([0, 1])
    ax2.grid(True, alpha=0.3, axis='y')
    
    # Add value labels
    for bar in bars:
        height = bar.get_height()
        ax2.text(bar.get_x() + bar.get_width()/2., height,
                f'{height:.1%}',
                ha='center', va='bottom')
    
    plt.tight_layout()
    plt.show()
    
    print(f"\nData Statistics:")
    print(f"  Shape: {data.shape}")
    print(f"  Sampling rate: {sampling_rate} Hz")
    print(f"  Duration: {len(time)/sampling_rate:.1f}s")
    print(f"  Mean accuracy: {accuracies.mean():.1%}")
    
else:
    print("Visualization requires pynwb installation")
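
The `ElectricalSeries` created earlier stores values in microvolts and sets `conversion=1e-6`, so readers are expected to multiply the stored values by the conversion factor to obtain volts. The sketch below shows that arithmetic on a small array. The function `to_volts` is hypothetical, and the additive `offset` argument is an assumption that only applies to PyNWB versions that expose an offset on time series.

```python
import numpy as np


def to_volts(raw: np.ndarray, conversion: float = 1.0, offset: float = 0.0) -> np.ndarray:
    """Scale stored values to volts: raw * conversion + offset."""
    return raw.astype(np.float64) * conversion + offset


# Two samples x two channels, stored in microvolts
raw_uv = np.array([[50.0, -25.0], [10.0, 0.0]])
volts = to_volts(raw_uv, conversion=1e-6)
print(volts)
```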

---

## Section 4: Integrating neurOS with NWB

Use NWB data directly in neurOS pipelines.

In [None]:
class NWBDataLoader:
    """
    Load NWB data for use in neurOS pipelines.
    """
    
    def __init__(self, nwb_filepath: str):
        self.filepath = nwb_filepath
        self.nwbfile = None
    
    def load_eeg_data(self, series_name: str = 'raw_eeg') -> dict:
        """
        Load EEG data from NWB file.
        
        Returns
        -------
        dict
            Dictionary with 'data', 'sampling_rate', 'time', 'metadata'
        """
        if not PYNWB_AVAILABLE:
            raise ImportError("pynwb is required")
        
        with NWBHDF5IO(self.filepath, 'r') as io:
            nwbfile = io.read()
            
            # Extract EEG series
            eeg_series = nwbfile.acquisition[series_name]
            data = eeg_series.data[:]
            sampling_rate = eeg_series.rate
            
            # Create time axis
            time = np.arange(data.shape[0]) / sampling_rate
            
            # Extract metadata
            metadata = {
                'session_description': nwbfile.session_description,
                'experimenter': nwbfile.experimenter,
                'session_start_time': str(nwbfile.session_start_time),
                'n_channels': data.shape[1],
                'n_samples': data.shape[0],
                'duration_s': data.shape[0] / sampling_rate
            }
            
            # Extract trials if available
            trials = None
            if nwbfile.trials:
                trials = nwbfile.trials.to_dataframe()
        
        return {
            'data': data,
            'sampling_rate': sampling_rate,
            'time': time,
            'metadata': metadata,
            'trials': trials
        }
    
    def prepare_for_pipeline(self, window_size: int = 250) -> tuple:
        """
        Prepare NWB data for neurOS pipeline.
        
        Returns
        -------
        tuple
            (X, y, metadata) ready for training/prediction
        """
        data_dict = self.load_eeg_data()
        data = data_dict['data']
        trials = data_dict['trials']
        sampling_rate = data_dict['sampling_rate']
        
        # Extract windows based on trials
        X_windows = []
        y_labels = []
        
        if trials is not None:
            # Map each trial type to an integer class label (order of first appearance)
            label_map = {name: i for i, name in enumerate(trials['trial_type'].unique())}

            for _, trial in trials.iterrows():
                start_idx = int(trial['start_time'] * sampling_rate)
                end_idx = int(trial['stop_time'] * sampling_rate)

                # Extract window
                window = data[start_idx:end_idx, :]

                # Simple feature: per-channel mean across time
                features = window.mean(axis=0)

                X_windows.append(features)
                y_labels.append(label_map[trial['trial_type']])
        
        X = np.array(X_windows)
        y = np.array(y_labels)
        
        return X, y, data_dict['metadata']

if PYNWB_AVAILABLE:
    # Load NWB data
    loader = NWBDataLoader(nwb_filepath)
    X, y, metadata = loader.prepare_for_pipeline()
    
    print("Prepared for neurOS Pipeline:")
    print(f"  Feature matrix shape: {X.shape}")
    print(f"  Labels shape: {y.shape}")
    print(f"  Unique classes: {np.unique(y)}")
    print(f"\nMetadata: {metadata}")
    
else:
    # Use mock data if NWB not available
    X = np.random.randn(4, 64)
    y = np.array([0, 1, 2, 3])
    print("Using mock data (pynwb not installed)")
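
The trial-based windowing used by the loader can be illustrated without an NWB file. The sketch below assumes trials are given as plain dictionaries with `start_time`, `stop_time`, and `trial_type` keys (mirroring the trial table columns); the function `epoch_trials` is hypothetical and uses the same per-channel mean as a deliberately simple feature.

```python
import numpy as np


def epoch_trials(data: np.ndarray, trials: list, sampling_rate: float):
    """Cut (samples x channels) data into trials and return mean features and labels."""
    class_names, features, labels = [], [], []
    for trial in trials:
        # Convert trial times in seconds to sample indices
        start = int(round(trial["start_time"] * sampling_rate))
        stop = int(round(trial["stop_time"] * sampling_rate))
        window = data[start:stop, :]
        if window.shape[0] == 0:
            continue  # trial lies outside the recording
        if trial["trial_type"] not in class_names:
            class_names.append(trial["trial_type"])
        features.append(window.mean(axis=0))
        labels.append(class_names.index(trial["trial_type"]))
    return np.array(features), np.array(labels), class_names


rng = np.random.default_rng(0)
demo_data = rng.normal(size=(1000, 8))  # 10 s at 100 Hz, 8 channels
demo_trials = [
    {"start_time": 0.0, "stop_time": 2.0, "trial_type": "left_hand"},
    {"start_time": 2.0, "stop_time": 4.0, "trial_type": "right_hand"},
    {"start_time": 4.0, "stop_time": 6.0, "trial_type": "left_hand"},
]
X_demo, y_demo, names = epoch_trials(demo_data, demo_trials, sampling_rate=100.0)
print(X_demo.shape, y_demo, names)
```

Labels come from the trial type rather than the trial index, so repeated trials of the same class share a label.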

### Train Pipeline on NWB Data

In [None]:
# Create pipeline
driver = MockDriver(n_channels=X.shape[1], sampling_rate=250)
model = SimpleClassifier(model_type='logistic')

pipeline = Pipeline(driver=driver, model=model)

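# Evaluate and train on NWB data
from sklearn.model_selection import cross_val_score

# Stratified cross-validation needs at least two samples per class
_, class_counts = np.unique(y, return_counts=True)
n_folds = int(min(3, class_counts.min()))

if n_folds >= 2:
    # Assumes the model follows the scikit-learn estimator interface
    scores = cross_val_score(model, X, y, cv=n_folds)
    print(f"\nCross-validation accuracy: {scores.mean():.2%} (+/- {scores.std():.2%})")
else:
    print("\nToo few samples per class for cross-validation; skipping CV")

# Fit on the full dataset so the model can make predictions later
model.train(X, y)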

print("\n✓ neurOS pipeline successfully trained on NWB data!")
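
With only a handful of trials, it helps to check how many cross-validation folds the labels can support before calling a splitter. Stratified k-fold splitting generally needs at least as many samples per class as there are folds. The sketch below computes a usable fold count with the standard library; the helper `usable_folds` is hypothetical.

```python
from collections import Counter


def usable_folds(labels, requested: int = 3) -> int:
    """Return the number of stratified folds the labels support (0 if CV is not possible)."""
    labels = list(labels)
    if not labels:
        return 0
    smallest_class = min(Counter(labels).values())
    folds = min(requested, smallest_class)
    return folds if folds >= 2 else 0


print(usable_folds([0, 1, 2, 3]))           # 0: one sample per class
print(usable_folds([0, 0, 1, 1]))           # 2
print(usable_folds([0] * 5 + [1] * 4))      # 3
```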

---

## Section 5: Exporting neurOS Results to NWB

Save your processing pipeline results in NWB format.

In [None]:
if PYNWB_AVAILABLE:
    def export_results_to_nwb(
        original_nwb_path: str,
        output_path: str,
        predictions: np.ndarray,
        model_name: str = 'SimpleClassifier'
    ) -> None:
        """
        Export neurOS processing results back to NWB format.
        """
        # Load original file
        with NWBHDF5IO(original_nwb_path, 'r') as io:
            nwbfile_in = io.read()
            
            # Create new file with original metadata
            nwbfile_out = NWBFile(
                session_description=nwbfile_in.session_description + ' [Processed]',
                identifier=nwbfile_in.identifier + '_processed',
                session_start_time=nwbfile_in.session_start_time,
                experimenter=nwbfile_in.experimenter,
                lab=nwbfile_in.lab,
                institution=nwbfile_in.institution
            )
            
            # Copy subject info
            if nwbfile_in.subject:
                nwbfile_out.subject = Subject(
                    subject_id=nwbfile_in.subject.subject_id,
                    age=nwbfile_in.subject.age,
                    sex=nwbfile_in.subject.sex,
                    species=nwbfile_in.subject.species
                )
            
            # Create processing module
            processing_module = nwbfile_out.create_processing_module(
                name='neuros_analysis',
                description=f'neurOS pipeline analysis with {model_name}'
            )
            
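            # Add predictions as a generic TimeSeries
            # (BehavioralTimeSeries is a container for TimeSeries objects
            # and does not accept data directly)
            from pynwb import TimeSeries

            pred_series = TimeSeries(
                name='model_predictions',
                description=f'Predictions from {model_name}',
                data=predictions.reshape(-1, 1),
                # Placeholder timestamps: one per prediction, using the prediction index
                timestamps=np.arange(len(predictions), dtype=float),
                unit='class_label'
            )

            processing_module.add(pred_series)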
        
        # Write output file
        with NWBHDF5IO(output_path, 'w') as io:
            io.write(nwbfile_out)
        
        print(f"✓ Results exported to: {output_path}")
    
    # Get predictions
    predictions = model.predict(X)
    
    # Export
    output_nwb_path = os.path.join(temp_dir, 'processed_results.nwb')
    export_results_to_nwb(nwb_filepath, output_nwb_path, predictions)
    
else:
    print("NWB export requires pynwb installation")
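
Data provenance, listed under Key Concepts, can be recorded alongside the exported file. The following is a minimal sketch that builds a JSON-serializable record with a checksum of the input file, the model name, and a UTC timestamp. The record layout and the helper names are assumptions for illustration, not an NWB or neurOS convention.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large recordings do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance(input_path: str, model_name: str, parameters: dict) -> dict:
    """Describe which input and which model settings produced a result."""
    return {
        "input_file": os.path.basename(input_path),
        "input_sha256": sha256_of_file(input_path),
        "model": model_name,
        "parameters": parameters,
        "created": datetime.now(timezone.utc).isoformat(),
    }


# Demonstrate on a small throwaway file
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as f:
        f.write(b"example bytes")
    record = build_provenance(demo_path, "SimpleClassifier", {"model_type": "logistic"})

print(json.dumps(record, indent=2))
```

Such a record could be written as a sidecar JSON file next to the processed `.nwb` file or stored in a free-text notes field.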

---

## Section 6: BIDS-Compatible Organization

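Organize your data following BIDS naming conventions. Note that the BIDS EEG specification lists EDF, BrainVision, EEGLAB, and BDF as recording formats, so the `.nwb` files in this layout follow the BIDS naming scheme but may not pass a strict BIDS validator.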

### BIDS Directory Structure

```
my_bci_study/
├── dataset_description.json
├── participants.tsv
├── README
├── sub-01/
│   └── ses-001/
│       └── eeg/
│           ├── sub-01_ses-001_task-motorimagery_eeg.nwb
│           ├── sub-01_ses-001_task-motorimagery_events.tsv
│           └── sub-01_ses-001_task-motorimagery_channels.tsv
├── sub-02/
│   └── ses-001/
│       └── eeg/
│           └── ...
└── derivatives/
    └── neuros/
        ├── sub-01/
        │   └── ses-001/
        │       └── sub-01_ses-001_predictions.nwb
        └── sub-02/
            └── ...
```
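
File names in the tree above are built from `key-value` entities joined by underscores, followed by a suffix and an extension. A small sketch of a name builder follows; the function `bids_filename` is hypothetical and handles only the `sub`, `ses`, and `task` entities used in this tutorial.

```python
from typing import Optional


def bids_filename(subject: str, session: str, suffix: str, extension: str,
                  task: Optional[str] = None) -> str:
    """Build a BIDS-style file name such as sub-01_ses-001_task-x_eeg.nwb."""
    labels = [subject, session] + ([task] if task else [])
    for label in labels:
        # BIDS labels are restricted to letters and digits
        if not label.isalnum():
            raise ValueError(f"label must be alphanumeric: {label!r}")
    entities = [f"sub-{subject}", f"ses-{session}"]
    if task:
        entities.append(f"task-{task}")
    entities.append(suffix)
    return "_".join(entities) + extension


print(bids_filename("01", "001", "eeg", ".nwb", task="motorimagery"))
print(bids_filename("01", "001", "events", ".tsv", task="motorimagery"))
```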

In [None]:
import json

def create_bids_structure(base_dir: str) -> None:
    """
    Create a BIDS-style directory structure with its top-level metadata files.
    """
    # Create directories
    os.makedirs(os.path.join(base_dir, 'sub-01', 'ses-001', 'eeg'), exist_ok=True)
    os.makedirs(os.path.join(base_dir, 'derivatives', 'neuros'), exist_ok=True)
    
    # Create dataset_description.json
    dataset_desc = {
        "Name": "Motor Imagery BCI Study",
        "BIDSVersion": "1.6.0",
        "DatasetType": "raw",
        "Authors": ["Dr. Neural Researcher"],
        "License": "CC0",
        "ReferencesAndLinks": [
            "https://github.com/your-user/neuros2"
        ]
    }
    
    with open(os.path.join(base_dir, 'dataset_description.json'), 'w') as f:
        json.dump(dataset_desc, f, indent=2)
    
    # Create participants.tsv
    participants_data = (
        "participant_id\tage\tsex\tgroup\n"
        "sub-01\t25\tM\tcontrol\n"
        "sub-02\t28\tF\tcontrol\n"
    )
    
    with open(os.path.join(base_dir, 'participants.tsv'), 'w') as f:
        f.write(participants_data)
    
    # Create README
    readme = """
# Motor Imagery BCI Study

This dataset contains EEG recordings during motor imagery tasks.

## Task Description

Participants performed motor imagery of:
- Left hand movement
- Right hand movement
- Feet movement
- Tongue movement

## Data Acquisition

- Device: BrainProducts ActiChamp
- Channels: 64 (10-20 system)
- Sampling rate: 250 Hz

## Analysis

Data were processed using neurOS (https://github.com/your-user/neuros2).
    """
    
    with open(os.path.join(base_dir, 'README'), 'w') as f:
        f.write(readme.strip())
    
    print(f"✓ BIDS structure created at: {base_dir}")
    print("\nStructure:")
    for root, dirs, files in os.walk(base_dir):
        level = root.replace(base_dir, '').count(os.sep)
        indent = ' ' * 2 * level
        print(f"{indent}{os.path.basename(root)}/")
        subindent = ' ' * 2 * (level + 1)
        for file in files:
            print(f"{subindent}{file}")

# Create BIDS structure
bids_dir = os.path.join(temp_dir, 'bids_dataset')
create_bids_structure(bids_dir)
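
The directory tree shown earlier also lists an `events.tsv` file per recording, which the function above does not create. BIDS events files are tab-separated and require `onset` and `duration` columns in seconds. The sketch below renders such a table from trial dictionaries; the function `events_tsv` and the input layout are assumptions for illustration.

```python
import csv
import io


def events_tsv(trials: list) -> str:
    """Render trials as BIDS-style events.tsv content (onset, duration, trial_type)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["onset", "duration", "trial_type"])
    for trial in trials:
        duration = trial["stop_time"] - trial["start_time"]
        writer.writerow([f"{trial['start_time']:.3f}", f"{duration:.3f}", trial["trial_type"]])
    return buffer.getvalue()


demo_events = [
    {"start_time": 0.0, "stop_time": 2.5, "trial_type": "left_hand"},
    {"start_time": 2.5, "stop_time": 5.0, "trial_type": "right_hand"},
]
print(events_tsv(demo_events))
```

The returned string can be written to a file named like `sub-01_ses-001_task-motorimagery_events.tsv` inside the `eeg/` folder.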

---

## Summary

In this tutorial, you learned:

✅ **NWB Format** - Structure and benefits  
✅ **Create NWB Files** - From scratch with metadata  
✅ **Read NWB Files** - Load and explore existing data  
✅ **neurOS Integration** - Use NWB data in pipelines  
✅ **Export Results** - Save processing outputs to NWB  
✅ **BIDS Compliance** - Organize datasets properly  

## Key Benefits of NWB

1. **Standardization** - Common format across labs
2. **Metadata** - Rich experimental context
3. **Interoperability** - Works with many tools
4. **Versioning** - Track data provenance
5. **Sharing** - Easy publication to repositories

## Best Practices

- **Document everything** - Rich metadata is key
- **Use standard names** - Follow NWB conventions
- **Version control** - Track changes to data
- **Test early** - Validate NWB files frequently
- **Share openly** - Use platforms like DANDI Archive

## Resources

- **NWB Homepage**: https://www.nwb.org/
- **PyNWB Docs**: https://pynwb.readthedocs.io/
- **BIDS Specification**: https://bids.neuroimaging.io/
- **DANDI Archive**: https://dandiarchive.org/

## Next Steps

- Load real NWB datasets from DANDI
- Integrate with other tools (MNE-Python, FieldTrip)
- Publish your datasets
- Contribute to NWB extensions

---

**Questions or feedback?** Open an issue on GitHub or check the docs at https://neuros.readthedocs.io

**🎉 Congratulations!** You've completed all 6 neurOS tutorials!