
# Sleep Spindle Study

## Building Model

In this notebook, we build a model to detect sleep spindles in EEG data. Spindle detection is an important step in sleep analysis because spindles are a hallmark of non-REM stage 2 (N2) sleep.
        


## Imports

We import the libraries needed to process the data, build the model, and evaluate its performance.
        

In [1]:

import json

import matplotlib.pyplot as plt
import mne
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, train_test_split
from sklearn.utils.class_weight import compute_class_weight
# Import everything from tensorflow.keras to avoid mixing it with standalone keras
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

# Project-local helper modules
import data_preparation
import feature_extraction
import utils
        


### Load data

We use the `processed_data` function from the previous step to load the concatenated raw recordings with their corresponding preprocessing and features applied.

In [2]:
X, labels = data_preparation.processed_data(["../dataset/train_S002_night1_hackathon_raw.mat",
                                            "../dataset/train_S003_night5_hackathon_raw.mat"],
                                            ["../dataset/train_S002_labeled.csv",
                                            "../dataset/train_S003_labeled.csv"],
                                            labels=["SS0", "SS1"], fmin=11, fmax=15)
        

Creating RawArray with float64 data, n_channels=1, n_times=4965399
    Range : 0 ... 4965398 =      0.000 ... 19861.592 secs
Ready.
['SS0', 'SS1']
Not setting metadata
1191 matching events found
No baseline correction applied
0 projection items activated
Using data from preloaded Raw for 1191 events and 626 original time points ...
0 bad epochs dropped




Creating RawArray with float64 data, n_channels=1, n_times=5772730
    Range : 0 ... 5772729 =      0.000 ... 23090.916 secs
Ready.
['SS0', 'SS1']
Not setting metadata
1050 matching events found
No baseline correction applied
0 projection items activated
Using data from preloaded Raw for 1050 events and 626 original time points ...
0 bad epochs dropped




Not setting metadata
2241 matching events found
No baseline correction applied
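
To make the preprocessing concrete, here is a minimal sketch of what band-pass filtering to the 11-15 Hz band and cutting fixed-length epochs might look like. It is not the implementation of `data_preparation.processed_data`, whose internals are not shown in this notebook: the helpers `bandpass` and `cut_epochs`, the Butterworth filter, and the synthetic signal are all assumptions made for illustration. Only the 250 Hz sampling rate and the 626 samples per epoch are taken from the loader output above.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(signal, fmin, fmax, sfreq, order=4):
    """Zero-phase Butterworth band-pass (hypothetical helper, not from the project)."""
    sos = butter(order, [fmin, fmax], btype="bandpass", fs=sfreq, output="sos")
    return sosfiltfilt(sos, signal)


def cut_epochs(signal, onsets, n_samples):
    """Cut fixed-length windows starting at each onset (hypothetical helper)."""
    onsets = np.asarray(onsets)
    # Drop onsets whose window would run past the end of the recording
    valid = onsets[onsets + n_samples <= signal.size]
    windows = np.stack([signal[start:start + n_samples] for start in valid])
    # Add a trailing feature axis: (n_epochs, n_samples, 1), the layout an LSTM expects
    return windows[..., np.newaxis]


sfreq = 250.0                      # sampling rate implied by the loader output
t = np.arange(5000) / sfreq        # 20 s of synthetic data
# Synthetic signal: a 12 Hz component (inside the band) plus a 2 Hz component (outside)
raw_signal = np.sin(2 * np.pi * 12 * t) + np.sin(2 * np.pi * 2 * t)

filtered = bandpass(raw_signal, fmin=11, fmax=15, sfreq=sfreq)
demo_epochs = cut_epochs(filtered, onsets=[0, 1000, 2000, 4900], n_samples=626)
print(demo_epochs.shape)  # the onset at 4900 is dropped because its window does not fit
```

The trailing axis of size 1 corresponds to the single EEG channel, which is why the model below can read its input shape from `X.shape[1]` and `X.shape[2]`.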





### Model

The chosen model is an LSTM: each sample is a time window of EEG, and LSTMs are well suited to sequential, time-dependent data. We evaluate it with 5-fold cross-validation, splitting the data into five parts and, in each round, training on four of them and testing on the remaining one.
        

In [3]:
kfold = KFold(n_splits=5)
for fold_no, (train, test) in enumerate(kfold.split(X), start=1):
    # Define the model architecture: four stacked LSTM layers with dropout
    model = Sequential()
    model.add(LSTM(50, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
    model.add(LSTM(50, return_sequences=True))
    model.add(Dropout(0.4))
    model.add(LSTM(20, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(20))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='sigmoid'))

    # Compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    # Generate class weights from the training labels of the current split only
    classes = np.unique(labels[train])
    class_weights = compute_class_weight('balanced', classes=classes, y=labels[train])
    class_weight_dict = {int(c): w for c, w in zip(classes, class_weights)}

    # Stop once the training loss has not improved for 5 epochs and restore the best weights
    early_stopping = EarlyStopping(monitor='loss', patience=5, verbose=1, restore_best_weights=True)

    # Fit data to model
    history = model.fit(X[train], labels[train], epochs=50, class_weight=class_weight_dict, callbacks=[early_stopping])

    # Evaluate on the held-out fold, then save the model, its history, and its metrics
    perf_metrics = utils.evaluate_model(model, X[test], labels[test])
    utils.save_model(model, history, perf_metrics, fold_no)



Epoch 1/50
...
Epoch 34/50
 1/56 [..............................] - ETA: 7s - loss: 0.0790 - accuracy: 0.9688

KeyboardInterrupt: 
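
The fold splitting and class weighting used in the loop above can be seen more easily on a toy example. The sketch below is an illustration only: `toy_X` and `toy_labels` are made-up arrays, not the study data. It shows which indices `KFold` holds out in each round and checks that scikit-learn's `'balanced'` weights equal `n_samples / (n_classes * count_per_class)` when computed on the training labels of each fold.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.utils.class_weight import compute_class_weight

# Toy, heavily imbalanced labels (made up for illustration): 8 negatives, 2 positives
toy_labels = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1])
toy_X = np.zeros((toy_labels.size, 4, 1))  # (samples, time steps, features)

fold_weights = []
for fold_no, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(toy_X), start=1):
    y_train = toy_labels[train_idx]
    classes = np.unique(y_train)

    # Weights computed from the training part of this fold only
    weights = compute_class_weight("balanced", classes=classes, y=y_train)

    # Same quantity by hand: n_samples / (n_classes * samples_in_class)
    manual = y_train.size / (classes.size * np.bincount(y_train)[classes])
    assert np.allclose(weights, manual)

    fold_weights.append(dict(zip(classes.tolist(), weights.tolist())))
    print(f"fold {fold_no}: test indices {test_idx.tolist()}, class weights {fold_weights[-1]}")
```

Because `KFold` does not shuffle by default, each test fold is a contiguous block of samples. With recordings concatenated one after another, that means a fold can be dominated by a single recording; passing `shuffle=True` or using `StratifiedKFold` are alternatives worth considering.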

### Visualize plots and metrics

We now assess the performance of the model.

First, we plot the training and validation accuracy and loss for each fold of every saved configuration.

In [None]:
import os

# List the saved metric files to check which configurations are available
print(os.listdir("./ressources/models/metrics"))

# Filename prefixes of the saved model configurations to compare
filenames = [
    "SS_bp4_35Pre_0Features_LSTM_",
    "SS_0Pre_0Features_LSTM_",
    "SS_detrend_Pre_0Features_LSTM_",
    "SS_bp11_15Pre_0Features_LSTM_",
    "SS_VDM1_3Pre_0Features_LSTM_"
]

# Plot the per-fold history (5 folds) of each configuration
for filename in filenames:
    utils.plot_fold_history(filename, 5)

The performance of each fold is printed, along with the average performance across the cross-validation folds.

In [None]:
performance = utils.print_performances("SS_bp4_35Pre_0Features_LSTM_", 1)
print(performance)
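
As a rough illustration of the per-fold and averaged figures described above, the sketch below computes binary classification metrics for each fold and then averages them. It does not reproduce `utils.print_performances`, whose implementation is not shown here: `binary_metrics`, `average_folds`, and the toy predictions are hypothetical and are not results from this study.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / y_true.size
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def average_folds(fold_metrics):
    """Average each metric over the folds (hypothetical helper)."""
    return {key: float(np.mean([m[key] for m in fold_metrics])) for key in fold_metrics[0]}


# Made-up (true labels, thresholded predictions) pairs standing in for two folds
toy_folds = [
    ([1, 0, 0, 1], [1, 0, 1, 1]),
    ([0, 0, 1, 0], [0, 0, 1, 0]),
]

per_fold = [binary_metrics(y_true, y_pred) for y_true, y_pred in toy_folds]
for fold_no, metrics in enumerate(per_fold, start=1):
    print(f"fold {fold_no}: {metrics}")
print("average:", average_folds(per_fold))
```

Since spindle windows are typically much rarer than non-spindle windows, precision, recall, and F1 tend to be more informative than accuracy alone when comparing folds.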