# Decoding Pipeline: Alpha Activity
This decoding pipeline focuses on classifying alpha band activity and includes several optimization techniques. To accommodate this, the notebook is divided into multiple parts.

The sLDA cells include an `ica` parameter that controls whether the ICA-cleaned data are loaded; the BT-LDA cells always use the ICA-cleaned data. The steps are carried out in the following order: first, Power Spectral Density (PSD) is used for feature extraction, combined with shrinkage LDA (sLDA) for classification. Then, Common Spatial Patterns (CSP) is used for feature extraction, again followed by sLDA classification. Afterward, both the PSD and CSP feature extraction methods are repeated, this time using BT-LDA for classification.
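
To illustrate the PSD feature step, the following minimal sketch computes one alpha band-power feature per channel with Welch's method. The function name `alpha_band_power`, the 120 Hz sampling rate, and the synthetic signal are assumptions made for this example only; the cells in this notebook load the sampling rate and the recordings from the per-subject files.

```python
import numpy as np
from scipy.signal import welch

def alpha_band_power(X, fs, fmin=8.0, fmax=12.0):
    """Mean Welch PSD within [fmin, fmax] for every trial and channel.

    X has shape (trials, channels, samples); the result has shape (trials, channels).
    """
    nperseg = int(fs // 2)  # half-second segments, i.e. 2 Hz frequency resolution
    freqs, psd = welch(X, fs=fs, nperseg=nperseg, scaling='density', axis=-1)
    band = (freqs >= fmin) & (freqs <= fmax)
    return psd[..., band].mean(axis=-1)  # average over the frequency bins in the band

# Synthetic data (hypothetical): a 10 Hz sinusoid on channel 0, weak noise elsewhere
rng = np.random.default_rng(0)
fs = 120  # assumed sampling rate for this example
t = np.arange(4 * fs) / fs
X = 0.1 * rng.standard_normal((5, 3, t.size))
X[:, 0, :] += np.sin(2 * np.pi * 10 * t)

features = alpha_band_power(X, fs)
print(features.shape)  # (5, 3)
```

In this toy example the first channel should show much higher alpha power than the other two, which is the kind of contrast the classifier relies on.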

The code for this alpha decoding pipeline builds upon an original implementation by Radovan Vodila, who developed the CSP-based approach. Juliette van Lohuizen extended the pipeline by implementing the PSD-based method, incorporating ICA, applying BT-LDA, and making several minor adjustments.

In [1]:
# -- GENERAL FUNCTIONS AND IMPORT --
import numpy as np
from scipy.signal import butter, sosfilt, hilbert
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from mne.time_frequency import psd_array_multitaper
from sklearn.covariance import LedoitWolf
from sklearn.metrics import accuracy_score
from mne.decoding import CSP
from os.path import join
import sys
import mne
import os
import pandas as pd
from scipy.signal import welch
from toeplitzlda.classification import ToeplitzLDA

mne.set_log_level('warning')

# Set directories
decoding_results_dir = '/Users/juliette/Desktop/thesis/results/alpha'

# Predefine functions for bandpass filtering and Hilbert transform (when using CSP)
def bandpass_filter(data, lowcut, highcut, fs, order=4):
    """
    Apply a causal Butterworth bandpass filter along the last (time) axis.
    """
    sos = butter(order, [lowcut, highcut], btype='band', fs=fs, output='sos')
    return sosfilt(sos, data, axis=-1)

def compute_average_hilbert_amplitude(data):
    """
    Compute the log of the time-averaged Hilbert envelope.

    Expects data of shape trials x components x samples and returns trials x components.
    """
    analytic = hilbert(data, axis=2)
    amplitude = np.abs(analytic)
    mean_amplitude = amplitude.mean(axis=2)
    return np.log(mean_amplitude)

# Define subjects
subjects = [
    "VPpdia", "VPpdib", "VPpdic", "VPpdid", "VPpdie", "VPpdif", "VPpdig", "VPpdih",
    "VPpdii", "VPpdij", "VPpdik", "VPpdil", "VPpdim", "VPpdin", "VPpdio", "VPpdip",
    "VPpdiq", "VPpdir", "VPpdis", "VPpdit", "VPpdiu", "VPpdiv", "VPpdiw", "VPpdix",
    "VPpdiy", "VPpdiz", "VPpdiza", "VPpdizb", "VPpdizc"
    ]

# sLDA pipelines
First, the pipelines using sLDA are presented: PSD is used as the feature extraction method, followed by CSP. The CSP pipeline is based on the code provided by Radovan Vodila. sLDA is applied to both feature extraction methods here; BT-LDA follows in the next part. Setting the parameter `ica` to `True` or `False` controls whether ICA is applied.
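
The effect of the `ica` flag on data loading can be summarized in a small helper. This is a sketch only: `build_data_path` and the `base_dir` argument are hypothetical names, while the file-name pattern and the `alpha_ICA` subfolder follow the cells in this notebook.

```python
import os

def build_data_path(base_dir, subject, task, ica):
    """Return the path of the preprocessed .npz file for one subject."""
    if ica:
        # ICA-cleaned data live in a subfolder and carry an _ICA suffix
        return os.path.join(base_dir, "alpha_ICA",
                            f"sub-{subject}_task-{task}_alpha_ICA.npz")
    return os.path.join(base_dir, f"sub-{subject}_task-{task}_alpha.npz")

path_ica = build_data_path("alpha_preprocessing", "VPpdia", "covert", ica=True)
path_raw = build_data_path("alpha_preprocessing", "VPpdia", "covert", ica=False)
print(path_ica)
print(path_raw)
```

Keeping the path logic in one place would avoid repeating the same `if`/`else` block in every cell.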

## PSD + sLDA

In [10]:
# For alpha we are only interested in covert
task = "covert"
ica = False

# Frequency bin for PSD
min_bin = 8
max_bin = 12

# Initialize results storage
results = []

# Loop through subjects
for subject in subjects:
    print("Subject:", subject)
    
    if ica is True:
        file_dir = os.path.join('/Users/juliette/Desktop/thesis/preprocessing/alpha_preprocessing/alpha_ICA')
        file_path = os.path.join(file_dir, f"sub-{subject}_task-{task}_alpha_ICA.npz")
    else:
        file_dir = os.path.join('/Users/juliette/Desktop/thesis/preprocessing/alpha_preprocessing')
        file_path = os.path.join(file_dir, f"sub-{subject}_task-{task}_alpha.npz")

    # Check if file exists
    if not os.path.exists(file_path):
        print(f"File not found: {file_path}")
        continue
    
    # Load data
    npz_data = np.load(file_path)
    X = npz_data['X']  # EEG data: trials x channels x samples
    y = npz_data['y']  # Labels: trials
    fs = npz_data['fs']  # Sampling frequency
    fs = fs.flatten()[0]   # turn array to integer
    
    # Apply LDA     
    lda = LDA(solver="lsqr", covariance_estimator=LedoitWolf())

    # Cross-validation
    fold_accuracies = []
    n_folds = 4
    n_trials = X.shape[0] // n_folds
    folds = np.repeat(np.arange(n_folds), n_trials)
    
    X = bandpass_filter(X, 8, 12, fs=fs)
    X = X[:, :, 120:-120] # Remove edge artifacts

    for i_fold in range(n_folds):
        # Train-test split
        X_trn, y_trn = X[folds != i_fold], y[folds != i_fold]
        X_tst, y_tst = X[folds == i_fold], y[folds == i_fold]
        
        # Segment length for Welch's method: half a second of samples (fs // 2)
        nperseg = int(fs // 2)

        # Compute the PSD with Welch's method along the time axis;
        # each PSD array has shape trials x channels x frequencies
        freqs, psd_trn = welch(X_trn, fs=fs, nperseg=nperseg, scaling='density', axis=-1)
        _, psd_tst = welch(X_tst, fs=fs, nperseg=nperseg, scaling='density', axis=-1)

        # Keep only the bins between min_bin and max_bin and average over them,
        # which gives one alpha band-power feature per channel
        alpha_bins = (min_bin <= freqs) & (freqs <= max_bin)
        psd_features_trn = psd_trn[:, :, alpha_bins].mean(axis=2)
        psd_features_tst = psd_tst[:, :, alpha_bins].mean(axis=2)

        # Train LDA
        lda.fit(psd_features_trn, y_trn)
        
        # Predict and compute accuracy
        y_pred = lda.predict(psd_features_tst)
        accuracy = accuracy_score(y_tst, y_pred)
        fold_accuracies.append(accuracy)

    # Compute subject-level results
    accuracy = np.round(np.mean(fold_accuracies), 2)
    se = np.round(np.std(fold_accuracies) / np.sqrt(n_folds), 2)
    results.append((subject, accuracy, se))
    print("Accuracy:", accuracy)

# Convert results to a structured numpy array
results_array = np.array(
    results, dtype=[('subject', 'U10'), ('accuracy', 'f4'), ('standard_error', 'f4')]
)

# Save results
os.makedirs(decoding_results_dir, exist_ok=True)
        
if ica is True:
    results_save_path = os.path.join(decoding_results_dir, f"{task}_alpha_PSD_ICA_results.npy")     
else:
    results_save_path = join(decoding_results_dir, f"{task}_alpha_PSD_results.npy")    
    
np.save(results_save_path, results_array)

# Overall results
overall_accuracy = np.round(results_array['accuracy'].mean(), 2)
overall_se = np.round(results_array['standard_error'].mean(), 2)
print(f"Overall LDA accuracy with PSD: {overall_accuracy:.2f} ± {overall_se:.2f}")

Subject: VPpdia
Accuracy: 0.66
Subject: VPpdib
Accuracy: 0.84
Subject: VPpdic
Accuracy: 0.79
Subject: VPpdid
Accuracy: 0.65
Subject: VPpdie
Accuracy: 0.76
Subject: VPpdif
Accuracy: 0.88
Subject: VPpdig
Accuracy: 0.84
Subject: VPpdih
Accuracy: 0.88
Subject: VPpdii
Accuracy: 0.79
Subject: VPpdij
Accuracy: 0.79
Subject: VPpdik
Accuracy: 0.66
Subject: VPpdil
Accuracy: 0.96
Subject: VPpdim
Accuracy: 0.95
Subject: VPpdin
Accuracy: 0.88
Subject: VPpdio
Accuracy: 0.82
Subject: VPpdip
Accuracy: 0.6
Subject: VPpdiq
Accuracy: 0.84
Subject: VPpdir
Accuracy: 0.78
Subject: VPpdis
Accuracy: 0.9
Subject: VPpdit
Accuracy: 1.0
Subject: VPpdiu
Accuracy: 0.5
Subject: VPpdiv
Accuracy: 0.95
Subject: VPpdiw
Accuracy: 0.89
Subject: VPpdix
Accuracy: 0.88
Subject: VPpdiy
Accuracy: 0.76
Subject: VPpdiz
Accuracy: 0.7
Subject: VPpdiza
Accuracy: 0.78
Subject: VPpdizb
Accuracy: 0.92
Subject: VPpdizc
Accuracy: 0.59
Overall LDA accuracy with PSD: 0.80 ± 0.04
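
The cross-validation in the cell above builds the fold labels with `np.repeat(np.arange(n_folds), n_trials)`, which yields `n_folds * (X.shape[0] // n_folds)` labels. If the number of trials is not a multiple of `n_folds`, that vector is shorter than `X` and the boolean indexing raises an error. A minimal sketch of contiguous fold labels that cover every trial is given below; `make_contiguous_folds` is a hypothetical helper, not part of the original pipeline.

```python
import numpy as np

def make_contiguous_folds(n_trials, n_folds):
    """Assign each trial to one of n_folds contiguous, near-equal blocks."""
    # np.array_split tolerates remainders: the first blocks get one extra trial
    blocks = np.array_split(np.arange(n_trials), n_folds)
    folds = np.empty(n_trials, dtype=int)
    for i_fold, block in enumerate(blocks):
        folds[block] = i_fold
    return folds

folds = make_contiguous_folds(10, 4)
print(folds)  # [0 0 0 1 1 1 2 2 3 3]
```

When the trial count is divisible by `n_folds`, this gives the same labels as the `np.repeat` construction, so the reported results would not change for those subjects.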


## CSP + sLDA

In [2]:
# For alpha we are only interested in covert
task = "covert"
ica = False # Still have to run this one

# Number of CSP components for feature extraction
n_comp = 4

# Initialize results storage
results = []

# Loop through subjects
for subject in subjects:
    print(subject)
    
    if ica is True:
        file_dir = os.path.join('/Users/juliette/Desktop/thesis/preprocessing/alpha_preprocessing/alpha_ICA')
        file_path = os.path.join(file_dir, f"sub-{subject}_task-{task}_alpha_ICA.npz")
    else:
        file_dir = os.path.join('/Users/juliette/Desktop/thesis/preprocessing/alpha_preprocessing')
        file_path = os.path.join(file_dir, f"sub-{subject}_task-{task}_alpha.npz")
    
    # Check if file exists
    if not os.path.exists(file_path):
        print(f"File not found: {file_path}")
        continue
    
    # Load data
    npz_data = np.load(file_path)
    X = npz_data['X']  # EEG data: trials x channels x samples
    y = npz_data['y']  # Labels: trials
    fs = npz_data['fs']  # Sampling frequency
    fs = fs.flatten()[0]   # turn array to integer

    X = bandpass_filter(X, 8, 12, fs=fs)  # Bandpass filter for alpha band
    X = X[:, :, 120:-120]  # Remove edge artifacts
 
    # Initialize CSP and LDA
    csp = CSP(n_components=n_comp, reg=0.01, log=None, transform_into='csp_space')
    lda = LDA(solver="lsqr", covariance_estimator=LedoitWolf())

    # Cross-validation
    fold_accuracies = []
    n_folds = 4
    n_trials = X.shape[0] // n_folds
    folds = np.repeat(np.arange(n_folds), n_trials)

    for i_fold in range(n_folds):
        # Train-test split
        X_trn, y_trn = X[folds != i_fold], y[folds != i_fold]
        X_tst, y_tst = X[folds == i_fold], y[folds == i_fold]
        
        # CSP and LDA
        csp.fit(X_trn, y_trn)
        X_trn_csp = compute_average_hilbert_amplitude(csp.transform(X_trn))
        lda.fit(X_trn_csp, y_trn)
        X_tst_csp = compute_average_hilbert_amplitude(csp.transform(X_tst))
        
        # Predict and compute accuracy
        y_pred = lda.predict(X_tst_csp)
        accuracy = accuracy_score(y_tst, y_pred)
        fold_accuracies.append(accuracy)

    # Compute subject-level results
    accuracy = np.round(np.mean(fold_accuracies), 2)
    se = np.round(np.std(fold_accuracies) / np.sqrt(n_folds), 2)
    results.append((subject, accuracy, se))
    print("Accuracy:", accuracy)

# Convert results to a structured numpy array
results_array = np.array(
    results, dtype=[('subject', 'U10'), ('accuracy', 'f4'), ('standard_error', 'f4')]
)    

# Save results
os.makedirs(decoding_results_dir, exist_ok=True)
        
if ica is True:
    results_save_path = join(decoding_results_dir, f"{task}_alpha_{n_comp}-comp_CSP_ICA_results.npy")     
else:
    results_save_path = join(decoding_results_dir, f"{task}_alpha_{n_comp}-comp_CSP_results.npy")  
    
np.save(results_save_path, results_array)

# Overall results
overall_accuracy = np.round(results_array['accuracy'].mean(), 2)
overall_se = np.round(results_array['standard_error'].mean(), 2)
print(f"Overall LDA accuracy with CSP: {overall_accuracy:.2f} ± {overall_se:.2f}")

VPpdia
Accuracy: 0.64
VPpdib
Accuracy: 0.98
VPpdic
Accuracy: 0.92
VPpdid
Accuracy: 0.72
VPpdie
Accuracy: 0.6
VPpdif
Accuracy: 0.95
VPpdig
Accuracy: 0.89
VPpdih
Accuracy: 0.96
VPpdii
Accuracy: 1.0
VPpdij
Accuracy: 0.71
VPpdik
Accuracy: 0.57
VPpdil
Accuracy: 0.98
VPpdim
Accuracy: 1.0
VPpdin
Accuracy: 0.86
VPpdio
Accuracy: 0.88
VPpdip
Accuracy: 0.78
VPpdiq
Accuracy: 0.94
VPpdir
Accuracy: 0.88
VPpdis
Accuracy: 0.98
VPpdit
Accuracy: 1.0
VPpdiu
Accuracy: 0.56
VPpdiv
Accuracy: 0.99
VPpdiw
Accuracy: 0.86
VPpdix
Accuracy: 1.0
VPpdiy
Accuracy: 0.81
VPpdiz
Accuracy: 0.82
VPpdiza
Accuracy: 0.98
VPpdizb
Accuracy: 0.94
VPpdizc
Accuracy: 0.5
Overall LDA accuracy with CSP: 0.85 ± 0.03


# BT-LDA pipelines
Next, the pipelines using BT-LDA are presented. They follow the same order, but the ICA-cleaned data are now always used. The pipelines are otherwise nearly identical to the sLDA versions: the only difference is the classifier, which is now BT-LDA instead of sLDA and amounts to essentially a single changed line of code. The pipelines could therefore be merged into one parameterized routine in future work.
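
One possible way to streamline the pipelines is a single cross-validation routine that accepts a feature function and any classifier exposing `fit` and `predict`, so that sLDA and BT-LDA differ only in the object passed in. The sketch below is an assumption about how this could look, not the implementation used here: `run_cv`, `log_variance`, and `NearestMeanClassifier` are hypothetical names, the classifier is a simple stand-in for illustration, and the data are synthetic.

```python
import numpy as np

class NearestMeanClassifier:
    """Stand-in classifier (hypothetical); any object with fit/predict works."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        dists = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]

def run_cv(X, y, extract_features, make_classifier, n_folds=4):
    """Contiguous-block cross-validation; returns mean accuracy and its standard error."""
    folds = np.arange(X.shape[0]) * n_folds // X.shape[0]
    accuracies = []
    for i_fold in range(n_folds):
        trn, tst = folds != i_fold, folds == i_fold
        clf = make_classifier()  # fresh classifier for every fold
        clf.fit(extract_features(X[trn]), y[trn])
        y_pred = clf.predict(extract_features(X[tst]))
        accuracies.append(np.mean(y_pred == y[tst]))
    return np.mean(accuracies), np.std(accuracies) / np.sqrt(n_folds)

def log_variance(data):
    """Toy feature: log variance over time, one value per trial and channel."""
    return np.log(data.var(axis=2))

# Synthetic data (hypothetical): class 1 trials have a larger amplitude
rng = np.random.default_rng(0)
y = np.tile([0, 1], 20)  # 40 trials with alternating labels
X = rng.standard_normal((40, 3, 50))
X[y == 1] *= 3.0

acc, se = run_cv(X, y, log_variance, NearestMeanClassifier)
print(f"accuracy: {acc:.2f} ± {se:.2f}")
```

A feature extractor that must itself be fitted on the training fold, such as CSP, would need an additional fit step inside the loop; the stateless `extract_features` argument only covers the PSD-style case.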

## PSD + BT-LDA

In [8]:
# Parameters
task = "covert"
min_bin = 8
max_bin = 12

# Initialize results storage
results = []

# Loop through subjects
for subject in subjects:
    print("Subject:", subject)
    file_dir = os.path.join('/Users/juliette/Desktop/thesis/preprocessing/alpha_preprocessing/alpha_ICA')
    file_path = os.path.join(file_dir, f"sub-{subject}_task-{task}_alpha_ICA.npz")

    # Check if file exists
    if not os.path.exists(file_path):
        print(f"File not found: {file_path}")
        continue

    # Load data
    npz_data = np.load(file_path)
    X = npz_data['X']  # EEG data: trials x channels x samples
    y = npz_data['y']  # Labels: trials
    fs = npz_data['fs']  # Sampling frequency
    fs = fs.flatten()[0]   # turn array to integer
    n_channels = X.shape[1] 
    
    # Initialize Toeplitz LDA
    toeplitz = ToeplitzLDA(n_channels=n_channels)

    # Cross-validation
    fold_accuracies = []
    n_folds = 4
    n_trials = X.shape[0] // n_folds
    folds = np.repeat(np.arange(n_folds), n_trials)
    
    # Preprocess data
    X = bandpass_filter(X, 8, 12, fs=fs)
    X = X[:, :, 120:-120] # Remove edge artifacts

    for i_fold in range(n_folds):
        # Train-test split
        X_trn, y_trn = X[folds != i_fold], y[folds != i_fold]
        X_tst, y_tst = X[folds == i_fold], y[folds == i_fold]
        
        # Segment length for Welch's method: half a second of samples (fs // 2)
        nperseg = int(fs // 2)

        # Compute the PSD with Welch's method along the time axis;
        # each PSD array has shape trials x channels x frequencies
        freqs, psd_full_trn = welch(X_trn, fs=fs, nperseg=nperseg, scaling='density', axis=-1)
        _, psd_full_tst = welch(X_tst, fs=fs, nperseg=nperseg, scaling='density', axis=-1)

        # Keep only the bins between min_bin and max_bin and average over them,
        # which gives one alpha band-power feature per channel
        alpha_bins = (min_bin <= freqs) & (freqs <= max_bin)
        psd_trn = psd_full_trn[:, :, alpha_bins].mean(axis=2)
        psd_tst = psd_full_tst[:, :, alpha_bins].mean(axis=2)
        

        # Fit Toeplitz LDA
        toeplitz.fit(psd_trn, y_trn)
        
        # Predict and compute accuracy
        y_pred = toeplitz.predict(psd_tst)
        accuracy = accuracy_score(y_tst, y_pred)
        fold_accuracies.append(accuracy)

    # Compute subject-level results
    accuracy = np.round(np.mean(fold_accuracies), 2)
    se = np.round(np.std(fold_accuracies) / np.sqrt(n_folds), 2)
    results.append((subject, accuracy, se))
    print("Accuracy:", accuracy)

# Convert results to a structured numpy array
results_array = np.array(
    results, dtype=[('subject', 'U10'), ('accuracy', 'f4'), ('standard_error', 'f4')]
)

# Save results
os.makedirs(decoding_results_dir, exist_ok=True)
results_save_path = join(decoding_results_dir, f"{task}_alpha_PSD_BLT_ICA_results.npy")     
np.save(results_save_path, results_array)

# Overall results
overall_accuracy = np.round(results_array['accuracy'].mean(), 2)
overall_se = np.round(results_array['standard_error'].mean(), 2)
print(f"Overall LDA accuracy with PSD: {overall_accuracy:.2f} ± {overall_se:.2f}")

Subject: VPpdia
Accuracy: 0.72
Subject: VPpdib
Accuracy: 0.85
Subject: VPpdic
Accuracy: 0.77
Subject: VPpdid
Accuracy: 0.8
Subject: VPpdie
Accuracy: 0.76
Subject: VPpdif
Accuracy: 0.9
Subject: VPpdig
Accuracy: 0.86
Subject: VPpdih
Accuracy: 0.91
Subject: VPpdii
Accuracy: 0.91
Subject: VPpdij
Accuracy: 0.74
Subject: VPpdik
Accuracy: 0.66
Subject: VPpdil
Accuracy: 0.96
Subject: VPpdim
Accuracy: 0.96
Subject: VPpdin
Accuracy: 0.84
Subject: VPpdio
Accuracy: 0.85
Subject: VPpdip
Accuracy: 0.56
Subject: VPpdiq
Accuracy: 0.88
Subject: VPpdir
Accuracy: 0.8
Subject: VPpdis
Accuracy: 0.92
Subject: VPpdit
Accuracy: 0.98
Subject: VPpdiu
Accuracy: 0.56
Subject: VPpdiv
Accuracy: 0.91
Subject: VPpdiw
Accuracy: 0.89
Subject: VPpdix
Accuracy: 0.94
Subject: VPpdiy
Accuracy: 0.75
Subject: VPpdiz
Accuracy: 0.72
Subject: VPpdiza
Accuracy: 0.81
Subject: VPpdizb
Accuracy: 0.88
Subject: VPpdizc
Accuracy: 0.77
Overall LDA accuracy with PSD: 0.82 ± 0.04


## CSP + BT-LDA

In [9]:
task = "covert"
decoding_results_dir = '/Users/juliette/Desktop/thesis/results/alpha/alpha_ICA'
# Number of CSP components for feature extraction
n_comp = 4

# Initialize results storage
results = []

# Loop through subjects
for subject in subjects:
    print("Subject:", subject)
    file_dir = os.path.join('/Users/juliette/Desktop/thesis/preprocessing/alpha_preprocessing/alpha_ICA')
    file_path = os.path.join(file_dir, f"sub-{subject}_task-{task}_alpha_ICA.npz")
    
    # Check if file exists
    if not os.path.exists(file_path):
        print(f"File not found: {file_path}")
        continue
    
    # Load data
    npz_data = np.load(file_path)
    X = npz_data['X']  # EEG data: trials x channels x samples
    y = npz_data['y']  # Labels: trials
    fs = npz_data['fs']  # Sampling frequency
    fs = fs.flatten()[0]   # turn array to integer

    X = bandpass_filter(X, 8, 12, fs=fs)  # Bandpass filter for alpha band
    X = X[:, :, 120:-120]  # Remove edge artifacts
 
    # Initialize CSP and Block-Toeplitz LDA (reg=0.01 is needed to avoid a rank-deficiency error)
    csp = CSP(n_components=n_comp, reg=0.01, log=None, transform_into='csp_space')
    toeplitz = ToeplitzLDA(n_channels=n_comp)
    
    # Cross-validation
    fold_accuracies = []
    n_folds = 4
    n_trials = X.shape[0] // n_folds
    folds = np.repeat(np.arange(n_folds), n_trials)

    for i_fold in range(n_folds):
        # Train-test split
        X_trn, y_trn = X[folds != i_fold], y[folds != i_fold] 
        X_tst, y_tst = X[folds == i_fold], y[folds == i_fold]

        # CSP and BlockToeplitz LDA
        csp.fit(X_trn, y_trn)
        X_trn_csp = compute_average_hilbert_amplitude(csp.transform(X_trn))
        toeplitz.fit(X_trn_csp, y_trn)
        X_tst_csp = compute_average_hilbert_amplitude(csp.transform(X_tst))
        
        # Predict and compute accuracy
        y_pred = toeplitz.predict(X_tst_csp)
        accuracy = accuracy_score(y_tst, y_pred)
        fold_accuracies.append(accuracy)

    # Compute subject-level results
    accuracy = np.round(np.mean(fold_accuracies), 2)
    se = np.round(np.std(fold_accuracies) / np.sqrt(n_folds), 2)
    results.append((subject, accuracy, se))
    print("Accuracy:", accuracy)

# Convert results to a structured numpy array
results_array = np.array(
    results, dtype=[('subject', 'U10'), ('accuracy', 'f4'), ('standard_error', 'f4')]
)

# Save results
os.makedirs(decoding_results_dir, exist_ok=True)
results_save_path = join(decoding_results_dir, f"{task}_alpha_{n_comp}-comp_CSP_BLT_ICA_results.npy")     
np.save(results_save_path, results_array)


# Overall results
overall_accuracy = np.round(results_array['accuracy'].mean(), 2)
overall_se = np.round(results_array['standard_error'].mean(), 2)
print(f"Overall LDA accuracy with CSP: {overall_accuracy:.2f} ± {overall_se:.2f}")

Subject: VPpdia
Accuracy: 0.6
Subject: VPpdib
Accuracy: 0.99
Subject: VPpdic
Accuracy: 0.92
Subject: VPpdid
Accuracy: 0.75
Subject: VPpdie
Accuracy: 0.61
Subject: VPpdif
Accuracy: 0.95
Subject: VPpdig
Accuracy: 0.89
Subject: VPpdih
Accuracy: 0.95
Subject: VPpdii
Accuracy: 0.99
Subject: VPpdij
Accuracy: 0.68
Subject: VPpdik
Accuracy: 0.57
Subject: VPpdil
Accuracy: 0.98
Subject: VPpdim
Accuracy: 1.0
Subject: VPpdin
Accuracy: 0.84
Subject: VPpdio
Accuracy: 0.86
Subject: VPpdip
Accuracy: 0.82
Subject: VPpdiq
Accuracy: 0.92
Subject: VPpdir
Accuracy: 0.82
Subject: VPpdis
Accuracy: 0.98
Subject: VPpdit
Accuracy: 1.0
Subject: VPpdiu
Accuracy: 0.55
Subject: VPpdiv
Accuracy: 0.96
Subject: VPpdiw
Accuracy: 0.94
Subject: VPpdix
Accuracy: 1.0
Subject: VPpdiy
Accuracy: 0.85
Subject: VPpdiz
Accuracy: 0.86
Subject: VPpdiza
Accuracy: 0.96
Subject: VPpdizb
Accuracy: 0.94
Subject: VPpdizc
Accuracy: 0.56
Overall LDA accuracy with CSP: 0.85 ± 0.03


