# Sleep Spindle Detection
## Data Preparation

This notebook covers the preprocessing steps for a sleep study dataset focused on sleep spindles. The spindle labels are processed first to identify periods where a spindle is marked as present (`1`) or absent (`0`). Optional preprocessing steps, included in the `prepare_data()` function, can then be applied to the data before training the model. Epochs are created around the marked events, each lasting 2.5 seconds (from 0.5 s before to 2 s after the event marker), a window chosen to match the duration of sleep spindles reported in the literature.
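
As a quick check of the epoch length, the sketch below computes the duration and the number of samples per epoch for the window used in this notebook (`tmin=-0.5`, `tmax=2`). The sampling rate of 250 Hz is an assumption made for illustration, since the actual rate of the recordings is not stated here, and `epoch_window` is a hypothetical helper. The extra sample reflects that MNE epochs include both endpoints of the window.

```python
def epoch_window(tmin, tmax, sfreq):
    """Return (duration in seconds, samples per epoch) for an epoch window."""
    duration = tmax - tmin
    # Both endpoints of the window are counted, hence the extra sample.
    n_samples = int(round(duration * sfreq)) + 1
    return duration, n_samples


# sfreq=250.0 is an assumed sampling rate, not taken from the dataset.
duration, n_samples = epoch_window(tmin=-0.5, tmax=2.0, sfreq=250.0)
print(duration, n_samples)  # 2.5 626
```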


In [None]:
import mne  
import pandas as pd  
import numpy as np
import utils
import feature_extraction
from scipy.signal import detrend


#### Data Preparation Function

The `prepare_data` function loads the raw EEG recordings, optionally applies preprocessing steps, and extracts epochs around the event markers built by `prepare_labels_events`. Preprocessing of this kind usually includes filtering to remove noise and artifacts, normalizing the data, and aligning the data with event markers. In this notebook the filtering steps are optional and left commented out, while normalization is applied to the epochs.
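
To make the event format concrete, here is a minimal sketch of the array that `mne.Epochs` expects: an integer array of shape `(n_events, 3)` whose columns are the sample index, the value before the event, and the event id. The sample indices and spindle flags below are made-up values for illustration, not data from the study.

```python
import numpy as np

# Hypothetical annotations: sample index of each marked window and whether
# a spindle was scored there (1) or not (0).
sample_idx = np.array([100, 850, 1600, 2350])
spindle_present = np.array([0, 1, 1, 0])

# MNE-style events: one row per event, with columns
# (sample index, value before the event, event id).
events = np.column_stack((
    sample_idx,
    np.zeros(len(sample_idx), dtype=int),
    spindle_present,
)).astype(int)

print(events.shape)  # (4, 3)
```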


In [1]:
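def prepare_labels_events(labels_df, labels=None):
    """Build binary labels and an MNE-style events array from the label table.

    `labels` names two columns: labels[0] marks class 0, labels[1] marks class 1.
    """
    if not labels or len(labels) < 2:
        raise ValueError("Choose two label columns to include in the analysis")

    labels_df = labels_df[['Timestamp', 'Epochs'] + list(labels)]
    labels_events = np.zeros(len(labels_df), dtype=int)
    labels_events[labels_df[labels[0]] == 1] = 0  # Assign 0 when labels[0] is 1
    labels_events[labels_df[labels[1]] == 1] = 1  # Assign 1 when labels[1] is 1 (takes precedence)
    time_events = labels_df['Timestamp'].values
    # Columns: event time (MNE expects integer sample indices), previous value, event id
    events = np.column_stack((time_events, np.full(len(time_events), 1, dtype=int), labels_events))
    return {'Labels': labels_events, 'Events': events}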

def prepare_data(filepath_raw, filepath_labels, labels=None):
    # Loading raw EEG data and creating Raw object
    raw, raw_data, labels_df = utils.load_and_preprocess_data(filepath_raw, filepath_labels)

    # Optional preprocessing steps (uncomment to enable)

    # raw._data = detrend(raw.get_data())
    # raw._data = utils.vdm_raw(raw)
    # raw = raw.notch_filter(50)
    # raw.filter(11, 15)
    # raw = raw.copy().crop(tmin=start_crop, tmax=end_crop)
    # labels_df = process_eeg_events(labels_df, start_crop, end_crop, raw.info['sfreq'])

    raw_data = raw.get_data()
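    # prepare_labels_events returns a dict, so read its entries by key
    label_data = prepare_labels_events(labels_df, labels)
    labels_events, events = label_data['Labels'], label_data['Events']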

    # Create epochs around the events (0.5 s before to 2 s after each marker)
    epochs = mne.Epochs(raw,
                        events, 
                        tmin=-0.5, tmax=2,
                        baseline=None, preload=True)
    
    epochs = utils.normalize_epochs(epochs)
        
    return {'Epochs': epochs, 'Labels': labels_events}
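
The implementation of `utils.normalize_epochs` is not shown in this notebook. One common choice is to z-score each channel of each epoch along the time axis, and the sketch below illustrates that idea on a plain NumPy array of shape `(n_epochs, n_channels, n_times)`. The function `zscore_epochs` is a hypothetical stand-in and may differ from what `utils.normalize_epochs` actually does.

```python
import numpy as np


def zscore_epochs(data, eps=1e-12):
    """Z-score each channel of each epoch along the time axis."""
    mean = data.mean(axis=-1, keepdims=True)
    std = data.std(axis=-1, keepdims=True)
    # eps guards against division by zero on flat channels.
    return (data - mean) / (std + eps)


# Synthetic data standing in for epochs.get_data(): 4 epochs, 2 channels.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=3.0, size=(4, 2, 626))
normalized = zscore_epochs(data)
print(normalized.shape)  # (4, 2, 626)
```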


#### Combining Data

The epoch data are combined with the features extracted in the previous section. The combined array must respect the model's input format, `(n_samples, n_features, n_timestamps)`.
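
As an illustration of the target shape, the sketch below stacks epoch data with additional per-timepoint features along the feature axis. The array sizes and the name `extra_features` are assumptions for the example; how `feature_extraction` actually assembles its output is not shown here.

```python
import numpy as np

n_epochs, n_channels, n_extra, n_times = 6, 2, 3, 626

# Placeholder arrays: raw epoch data and hypothetical extracted features,
# both indexed as (n_samples, n_features, n_timestamps).
epoch_data = np.zeros((n_epochs, n_channels, n_times))
extra_features = np.ones((n_epochs, n_extra, n_times))

# Concatenate along axis 1 so channels and extra features sit side by side.
X = np.concatenate([epoch_data, extra_features], axis=1)
print(X.shape)  # (6, 5, 626)
```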

In [None]:
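def processed_data(raw_filepaths, label_filepaths, labels, fmin, fmax):
    """Prepare each recording and extract features (fmin and fmax are passed on to the feature extraction)."""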
    all_epochs = []
    all_labels = []

    # Loop through each file path, prepare data, and collect epochs and labels
    for raw_filepath, label_filepath in zip(raw_filepaths, label_filepaths):
        data = prepare_data(raw_filepath, label_filepath, labels)
        all_epochs.append(data['Epochs'])
        all_labels.append(data['Labels'])

    # Combine epochs and labels from all datasets
    combined_epochs = mne.concatenate_epochs(all_epochs)
    combined_labels = np.concatenate(all_labels)

    # Extract features for all combined epochs
    X = feature_extraction.get_raw_feature_all(all_epochs, fmin, fmax)

    return X, combined_labels
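
Before training, it can help to confirm that the feature array and the labels line up and to look at the class balance. The helper below, `check_dataset`, is a hypothetical sketch that assumes `X` has one entry per label along its first axis and that labels are `0` or `1`, as in this notebook.

```python
import numpy as np


def check_dataset(X, y):
    """Verify that X and y have matching lengths and count both classes."""
    if X.shape[0] != len(y):
        raise ValueError(f"X has {X.shape[0]} samples but y has {len(y)} labels")
    counts = np.bincount(y, minlength=2)
    return {"n_samples": len(y), "no_spindle": int(counts[0]), "spindle": int(counts[1])}


# Toy arrays standing in for the output of processed_data.
X = np.zeros((5, 3, 626))
y = np.array([0, 1, 1, 0, 1])
summary = check_dataset(X, y)
print(summary)  # {'n_samples': 5, 'no_spindle': 2, 'spindle': 3}
```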