### Automated Detection and Extraction of Seizure Events from Expert EEG Annotations

This code processes EEG seizure annotation data to detect seizure segments in expert-labeled datasets (here, the "CSA" expert annotations). It uses **Pandas** to read annotation CSV files that hold per-second seizure indicators for multiple patients, one column per patient. The resulting structured seizure event tables support downstream EEG analysis such as model training or clinical review. Each segment is mapped from seconds to EEG sample indices, and the same functions can be applied to annotations from other experts.

#### Key Functionalities:

1. **Data Loading and Preparation:**

   * Imports necessary libraries for data handling (`pandas`, `numpy`), EEG processing (`mne`), and visualization (`matplotlib`).
   * Reads seizure annotation CSV files, replacing missing values with zeros.
   * The data is organized as columns representing individual patients with per-second seizure labels (binary or character-coded).

2. **Seizure Detection Functions (`find_seizures` and `find_seizures_d`):**

   * These functions scan through the annotation arrays to detect continuous seizure periods based on the presence of seizure labels (either binary `1` or character `'d'`).
   * For each detected seizure segment, the code computes:

     * Duration in seconds,
     * Start and end times (in seconds and samples, assuming a sampling rate of 256 Hz),
     * The corresponding patient and expert source.
   * The functions output seizure segments as a DataFrame for further analysis or saving.

3. **Data Processing Pipeline:**

   * Iterates over provided data arrays (here only CSA expert data is processed).
   * Aggregates seizure segments from all patients into a single DataFrame.
   * Saves the detected seizure chunks into CSV files for external use or later analysis.
   * Displays the final seizure summary DataFrames.
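
The segment arithmetic described in step 2 can be illustrated on a toy label vector. This is a minimal sketch, not the notebook's own implementation: the helper name `label_runs` and the toy data are hypothetical, and the 256 Hz sampling rate is the assumption stated above.

```python
import numpy as np

SFREQ = 256  # sampling rate in Hz, as assumed in this notebook


def label_runs(labels, target=1):
    """Return (start_sec, end_sec) pairs for each run of `target`; both ends inclusive."""
    mask = np.asarray(labels) == target
    # Pad with False on both sides so runs touching either edge are still closed.
    padded = np.concatenate(([False], mask, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)       # 0 -> 1 transitions
    ends = np.flatnonzero(edges == -1) - 1    # 1 -> 0 transitions, shifted to the last 1
    return list(zip(starts.tolist(), ends.tolist()))


toy = [0, 1, 1, 1, 0, 0, 1, 1]  # hypothetical per-second labels for one patient
segments = []
for start, end in label_runs(toy):
    segments.append({
        "from_sec": start,
        "to_sec": end,
        "seizure_duration": end - start + 1,
        "from_sample": start * SFREQ,
        "to_sample": (end + 1) * SFREQ,  # exclusive end: one past the last seizure sample
    })
```

With this convention a seizure covering seconds 1 to 3 spans samples 256 up to (but not including) 1024, so `to_sample - from_sample` always equals `seizure_duration * 256`.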



In [None]:
import pandas as pd
import os
import h5py
import mne
import numpy as np
import glob
from pathlib import Path
import matplotlib.pyplot as plt
import re
from scipy import signal
import scipy.io
from mne import Epochs
from mne.preprocessing import ICA  # Import ICA here

## Updated Annotations Preprocessing
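
Before extracting segments, it can help to confirm that the annotation table holds only the expected labels once missing values are filled. The following is a sketch on a small in-memory table, assuming one column per patient; the column names and values are hypothetical and the real `CSA.csv` layout may differ.

```python
import io

import pandas as pd

# Hypothetical two-patient annotation table; the second column has one missing label.
raw = io.StringIO("1,2\n0,1\n1,1\n1,\n")
toy = pd.read_csv(raw)
toy = toy.fillna(0.0)  # a missing label is treated as "no seizure"

# Every remaining value should be 0 or 1 before run detection.
unexpected = ~toy.isin([0, 1])
n_unexpected = int(unexpected.to_numpy().sum())

# Total annotated seizure seconds per patient column.
seizure_seconds = toy.sum()
```

One consequence of filling with zeros is that a missing label becomes indistinguishable from an annotated non-seizure second, which matters if missing values mark the end of a shorter recording.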

In [None]:
csa = pd.read_csv("/Users/geletawsahle/Desktop/mlflow_eeg/scripts/CSA.csv")
# data = [Exp_A_data, Exp_A_data, Exp_A_data]
csa = csa.fillna(0.0)
csa.head(2)

In [None]:
data = [csa]
csa

In [14]:
def find_seizures(expert, data, sfreq=256):
    """
    Find contiguous seizure chunks in the per-second annotations of one expert
    (A, B, C, or CSA).

    Each column of `data` is one patient; a value of 1 marks a seizure second.
    Seconds are zero-based, `to_sec` is inclusive, and `to_sample` is the
    exclusive end (one past the last sample of the final seizure second).
    `sfreq` is the sampling rate in Hz (256 by default).
    """
    columns = ["expert", "infant", "seizure_duration", "from_sec", "to_sec", "from_sample", "to_sample"]
    rows = []  # collect rows in a list and build the DataFrame once at the end

    def add_chunk(patient, start, end):
        rows.append({
            "expert": expert,
            "infant": patient,
            "seizure_duration": end - start + 1,
            "from_sec": start,                # zero-based start second
            "to_sec": end,                    # last seizure second (inclusive)
            "from_sample": start * sfreq,     # first sample of the start second
            "to_sample": (end + 1) * sfreq,   # +1 to include the last full second
        })

    for patient in data.columns:  # Loop through each patient column
        # Convert to a numeric array; non-numeric labels become NaN
        labels = pd.to_numeric(data[patient], errors="coerce").to_numpy()
        seizure_start = None

        for i, label in enumerate(labels):
            if label == 1 and seizure_start is None:
                seizure_start = i
            elif label == 0 and seizure_start is not None:
                # A 0 closes the current chunk at the previous second
                add_chunk(patient, seizure_start, i - 1)
                seizure_start = None

        # Check if seizure continues to the end of the recording
        if seizure_start is not None:
            add_chunk(patient, seizure_start, len(labels) - 1)

    return pd.DataFrame(rows, columns=columns)

In [None]:
# Process each annotation table in `data`; index 0 holds the CSA annotations
for i in range(0, len(data)):
    if i == 0:
        expert = "CSA"
        seizures_df_expert_CSA = find_seizures(expert, data[i])
    # Uncomment and update the following if needed for other experts
    # elif i == 1:
    #     expert = "A"
    #     seizures_df_expert_A = find_seizures(expert, data[i])
    # elif i == 2:
    #     expert = "B"
    #     seizures_df_expert_B = find_seizures(expert, data[i])
    # elif i == 3:
    #     expert = "C"
    #     seizures_df_expert_C = find_seizures(expert, data[i])

# Collect the seizure DataFrames into a list
dfs = [seizures_df_expert_CSA]  # Add others here if uncommented

# Concatenate the DataFrames vertically into one final DataFrame
seizures_df = pd.concat(dfs, axis=0, ignore_index=True)

# Save to CSV
seizures_df.to_csv("csa_seizures_chunks_zerosec_starts.csv", index=False)

# Display the final DataFrame
seizures_df

In [None]:
updated = pd.read_csv("csa_seizures_chunks_zerosec_starts.csv")
updated

## Annotation differences
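
The cells in this section look for runs of the character label `'d'` rather than the binary `1`. As a sketch of how such labels can be matched once a column mixes numeric fills and strings, the example below groups consecutive `'d'` seconds with a run identifier. The toy column and the variable names are hypothetical, and 256 Hz is the sampling rate assumed elsewhere in this notebook.

```python
import pandas as pd

SFREQ = 256  # Hz, as assumed in this notebook

# Hypothetical column mixing numeric fills and 'd' markers (note the trailing space).
col = pd.Series([0.0, "d", "d ", 0.0, 1, "d"])
is_d = col.astype(str).str.strip() == "d"

# A new run starts wherever the flag changes; cumsum turns the changes into run ids.
run_id = (is_d != is_d.shift()).cumsum()

runs = []
for _, seg in is_d[is_d].groupby(run_id[is_d]):
    runs.append((int(seg.index[0]), int(seg.index[-1])))  # inclusive seconds

table = pd.DataFrame([
    {
        "from_sec": s,
        "to_sec": e,
        "seizure_duration": e - s + 1,
        "from_sample": s * SFREQ,
        "to_sample": (e + 1) * SFREQ,  # exclusive end
        "status": "d",
    }
    for s, e in runs
])
```

Stripping whitespace before the comparison means `'d '` and `' d'` count as `'d'`, while the numeric `1` in the toy column is not treated as part of a `'d'` run.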

In [None]:
csad = pd.read_csv("/Users/geletawsahle/Desktop/mlflow_eeg/scripts/xd.csv")
csad.head(2)

In [None]:
csad = csad.fillna(0.0)
csad

In [15]:
def find_seizures_d(expert, data, sfreq=256):
    """
    Find contiguous chunks of seconds labelled 'd' in the annotations of one
    expert (A, B, C, or CSA).

    Zero-based indexing: 0 sec corresponds to the first second. `to_sec` is
    inclusive and `to_sample` is the exclusive end of the chunk.
    `sfreq` is the sampling rate in Hz (256 by default).
    """
    columns = ['Expert', 'Id', 'seizure_duration', 'from_sec', 'to_sec', 'from_sample', 'to_sample', 'status']
    rows = []  # collect rows in a list and build the DataFrame once at the end

    def add_chunk(patient, start, end):
        rows.append({
            "Expert": expert,
            "Id": patient,
            "status": 'd',
            "seizure_duration": end - start + 1,
            "from_sec": start,                # zero-based start second
            "to_sec": end,                    # last 'd' second (inclusive)
            "from_sample": start * sfreq,     # first sample of the start second
            "to_sample": (end + 1) * sfreq,   # include the full last second
        })

    for patient in data.columns:  # Loop through each patient column
        # Compare labels as stripped strings so ' d' and 'd ' also match
        labels = data[patient].astype(str).str.strip().to_numpy()
        seizure_start = None

        for i, label in enumerate(labels):
            is_d = label == 'd'
            if is_d and seizure_start is None:
                seizure_start = i
            elif not is_d and seizure_start is not None:
                # Any label other than 'd' closes the current chunk
                add_chunk(patient, seizure_start, i - 1)
                seizure_start = None

        # If the chunk continues to the end of the recording
        if seizure_start is not None:
            add_chunk(patient, seizure_start, len(labels) - 1)

    return pd.DataFrame(rows, columns=columns)


In [16]:
data = [csad]

In [None]:
# Process data using updated 0-based logic
for i in range(len(data)):
    if i == 0:
        expert = "CSA"
        seizures_df_expert_CSA = find_seizures_d(expert, data[i])

# Combine DataFrames
dfs = [seizures_df_expert_CSA]
seizures_dff = pd.concat(dfs, axis=0, ignore_index=True)

# Save the updated dataset to CSV
seizures_dff.to_csv("csa_seizures_chunks_with_d.csv", index=False)

# Display the final DataFrame
seizures_dff