# Classifier Model with PSD Alpha Features from EEG Data

**Description**:\
Develop a prediction model from the training dataset obtained with the PSD method from MNE. In this approach, only the features from the Alpha frequency band are selected, for all channels and windows. The objective is to evaluate different classifier models and compare their results, as in the previous work.

For this section, all the required datasets are stored in the `Training Dataset` directory; they are mainly the PSD feature-extraction results. For further detail, see the notebook `Feature Extraction PSD.ipynb` in the `Feature Extraction` directory.

**Author**: Elmo Chavez\
**Date**: November 20, 2023

## Libraries

In [1]:
import pandas as pd
import numpy as np
import os
import sys

# Add the parent directory (where eeg_mne.py is expected to live) to the import path
path_eeg_mne = os.path.abspath('..')
sys.path.append(path_eeg_mne)
import eeg_mne

## Read the Dataset

In [2]:
path_training = '../Training Dataset/'

# Participants Dataset preselected in Feature Extraction step 1
file_participants_selected = 'Participants_Selected.csv'
df_participants_selected = pd.read_csv(path_training+file_participants_selected)

# PSD features with all channels
file_psd_features_all = 'PSD_Features-All_Channels.csv'
df_features_all = pd.read_csv(path_training+file_psd_features_all)

# PSD features with only FP1 channel
file_psd_features_fp1 = 'PSD_Features-FP1_Channel.csv'
df_features_fp1 = pd.read_csv(path_training+file_psd_features_fp1)

## Exploratory Data Analysis

**Brief Summary about the Participants Selected**

In [3]:
df_participants_selected

Unnamed: 0,participant_id,Gender,Age,Group,MMSE,time_max,points,sfreq,flag
0,sub-001,0,57,0,16,599.798,299900,500.0,True
1,sub-002,0,78,0,22,793.098,396550,500.0,True
2,sub-003,1,70,0,14,306.098,153050,500.0,False
3,sub-004,0,67,0,20,706.098,353050,500.0,True
4,sub-005,1,70,0,22,804.098,402050,500.0,True
...,...,...,...,...,...,...,...,...,...
83,sub-084,0,71,1,24,652.098,326050,500.0,True
84,sub-085,1,64,1,26,560.058,280030,500.0,True
85,sub-086,1,49,1,26,578.798,289400,500.0,True
86,sub-087,1,73,1,24,602.758,301380,500.0,True


Remove the participants that are not flagged. The flag was set using these criteria:

- Healthy Control participants are excluded
- Participants with a maximum recorded time of less than 540 seconds are excluded
- Classes are balanced to 22 samples per group (Alzheimer's Disease and Frontotemporal Dementia)

In [4]:
df_participants_selected = df_participants_selected[df_participants_selected['flag']==True].reset_index(drop=True)
df_participants_selected.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44 entries, 0 to 43
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   participant_id  44 non-null     object 
 1   Gender          44 non-null     int64  
 2   Age             44 non-null     int64  
 3   Group           44 non-null     int64  
 4   MMSE            44 non-null     int64  
 5   time_max        44 non-null     float64
 6   points          44 non-null     int64  
 7   sfreq           44 non-null     float64
 8   flag            44 non-null     bool   
dtypes: bool(1), float64(2), int64(5), object(1)
memory usage: 2.9+ KB


In [5]:
df_participants_selected.groupby('Group')['participant_id'].count()

Group
0    22
1    22
Name: participant_id, dtype: int64
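
The selection criteria listed earlier were applied in a previous feature-extraction step, and that code is not shown in this notebook. As a rough sketch of how such a flag could be derived with pandas (the group code `2` for healthy controls, the helper name `flag_participants`, and the choice to keep the longest recordings when balancing are all assumptions made for illustration, not the original implementation):

```python
import pandas as pd

def flag_participants(df, min_seconds=540, per_group=22, control_code=2):
    """Hypothetical helper: boolean flag following the three criteria."""
    # Criteria 1 and 2: drop healthy controls and recordings that are too short
    eligible = df[(df['Group'] != control_code) & (df['time_max'] >= min_seconds)]
    # Criterion 3: keep at most `per_group` participants per group
    # (assumption: prefer the longest recordings)
    kept = (eligible.sort_values('time_max', ascending=False)
                    .groupby('Group')
                    .head(per_group))
    return df.index.isin(kept.index)

# Toy data, not the real participants table
toy = pd.DataFrame({
    'participant_id': ['s1', 's2', 's3', 's4', 's5'],
    'Group':          [0, 0, 1, 1, 2],
    'time_max':       [600.0, 300.0, 700.0, 650.0, 800.0],
})
toy['flag'] = flag_participants(toy, per_group=1)
print(toy)
```

With `per_group=1`, only `s1` and `s3` stay flagged: `s2` is too short, `s5` is a control, and `s4` is dropped by the balancing step.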

**Features Extracted using PSD Method from MNE for All the Channels**

In [6]:
eeg_mne.Dataset_Features_Summary(df_features_all)

Total Features: 6274
Windows: 11 -> ['w0', 'w1', 'w10', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8', 'w9']
Channels: 19 -> ['Cz', 'T5', 'F7', 'F4', 'T6', 'O1', 'F8', 'O2', 'C3', 'Fz', 'Fp2', 'T4', 'P3', 'Pz', 'T3', 'P4', 'F3', 'Fp1', 'C4']
Frequency Bands: 5 -> ['delta', 'theta', 'beta', 'alpha', 'gamma']
Features: 6 -> ['average power', 'total power', 'relative power', 'std dev', 'spectral entropy', 'peak to peak']


**Features Extracted using PSD Method from MNE for the _FP1_ Channel**

In [7]:
eeg_mne.Dataset_Features_Summary(df_features_fp1)

Total Features: 334
Windows: 11 -> ['w0', 'w1', 'w10', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8', 'w9']
Channels: 1 -> ['Fp1']
Frequency Bands: 5 -> ['delta', 'theta', 'beta', 'alpha', 'gamma']
Features: 6 -> ['average power', 'total power', 'relative power', 'std dev', 'spectral entropy', 'peak to peak']
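
The two totals are consistent with one column per window, channel, frequency band and feature combination, plus four additional columns. A quick check of that arithmetic follows. That the four extra columns are non-feature columns (for example the participant identifier and the target) is an assumption, since the summary does not list them:

```python
windows, bands, features = 11, 5, 6

def expected_columns(channels, extra=4):
    # One column per window x channel x band x feature, plus `extra` columns
    # (the value 4 is inferred from the reported totals, not documented)
    return windows * channels * bands * features + extra

print(expected_columns(19))  # all channels -> 6274
print(expected_columns(1))   # Fp1 only     -> 334
```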


## Preselect only the Alpha Freq. Band Features

In [8]:
freq_bands_filtered = ['alpha']
features_filtered = []

In [9]:
df_features_all = eeg_mne.filter_Features(df_features_all, freq_bands_filtered, features_filtered)
eeg_mne.Dataset_Features_Summary(df_features_all)

Number of Features Returned: 1258
Total Features: 1258
Windows: 11 -> ['w0', 'w1', 'w10', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8', 'w9']
Channels: 19 -> ['Cz', 'T5', 'F7', 'F4', 'T6', 'O1', 'F8', 'O2', 'C3', 'Fz', 'Fp2', 'T4', 'P3', 'Pz', 'T3', 'P4', 'F3', 'Fp1', 'C4']
Frequency Bands: 1 -> ['alpha']
Features: 6 -> ['average power', 'total power', 'relative power', 'std dev', 'spectral entropy', 'peak to peak']


In [10]:
df_features_fp1 = eeg_mne.filter_Features(df_features_fp1, freq_bands_filtered, features_filtered)
eeg_mne.Dataset_Features_Summary(df_features_fp1)

Number of Features Returned: 70
Total Features: 70
Windows: 11 -> ['w0', 'w1', 'w10', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8', 'w9']
Channels: 1 -> ['Fp1']
Frequency Bands: 1 -> ['alpha']
Features: 6 -> ['average power', 'total power', 'relative power', 'std dev', 'spectral entropy', 'peak to peak']
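
`eeg_mne.filter_Features` is a project helper whose implementation is not shown here. A minimal sketch of a band filter over column names is given below, assuming a hypothetical naming scheme `window_channel_band_feature` and hypothetical identifier columns; the real column names and the helper's behavior may differ:

```python
import pandas as pd

def band_of(column):
    # Assumed naming scheme: '<window>_<channel>_<band>_<feature>'
    parts = column.split('_')
    return parts[2] if len(parts) == 4 else None

def filter_feature_columns(df, freq_bands, id_cols=('participant_id', 'Group')):
    """Hypothetical stand-in: keep id columns and columns of the given bands."""
    keep = [c for c in df.columns if c in id_cols or band_of(c) in freq_bands]
    return df[keep]

# Toy frame with made-up values
toy = pd.DataFrame({
    'participant_id': ['sub-001'],
    'Group': [0],
    'w0_Fp1_alpha_total power': [1.0],
    'w0_Fp1_theta_total power': [2.0],
    'w1_Fp1_alpha_std dev': [3.0],
})
filtered = filter_feature_columns(toy, ['alpha'])
print(list(filtered.columns))
```

Only the identifier columns and the two `alpha` columns remain; the `theta` column is dropped.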


## Predictions with Cross-Validation

### All Channels

In [11]:
df_results_cv_allch = eeg_mne.eeg_classifier_cv(df=df_features_all, feature_id='participant_id', target='Group', feature_extraction='PSD', channels='All')
df_results_cv_allch.head(10)

Running: Support Vector
Running: Random Forest
Running: XGBoost
Running: LigthGBM
Running: AdaBoost


Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
0,PSD,All,Support Vector,KFold,anova,0.547222,0.48498,0.545
1,PSD,All,Support Vector,KFold,mutual_info_classif,0.452778,0.371084,0.535
2,PSD,All,Support Vector,KFold,chi2,0.461111,0.369744,0.55
3,PSD,All,Support Vector,StratifiedKFold,anova,0.452778,0.401084,0.465
4,PSD,All,Support Vector,StratifiedKFold,mutual_info_classif,0.408333,0.34421,0.44
5,PSD,All,Support Vector,StratifiedKFold,chi2,0.480556,0.358881,0.525
6,PSD,All,Support Vector,StratifiedShuffleSplit,anova,0.444444,0.327622,0.44
7,PSD,All,Support Vector,StratifiedShuffleSplit,mutual_info_classif,0.488889,0.377622,0.505
8,PSD,All,Support Vector,StratifiedShuffleSplit,chi2,0.444444,0.307692,0.5
9,PSD,All,Random Forest,KFold,anova,0.433333,0.338205,0.515
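
`eeg_mne.eeg_classifier_cv` is a project helper whose internals are not shown in this notebook. Judging only by the result columns, each row pairs a classifier with a cross-validation splitter and a feature-selection scorer. A minimal scikit-learn sketch of one such combination (Support Vector, StratifiedKFold, ANOVA) on synthetic data is given below. The number of selected features, the number of folds and the scaling step are assumptions, not the helper's actual settings:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 44 samples, 2 roughly balanced classes, many features
X, y = make_classification(n_samples=44, n_features=200, n_informative=10,
                           weights=[0.5, 0.5], random_state=0)

# Selection lives inside the pipeline so it is refit on each training fold,
# which avoids leaking test-fold information into the chosen features
model = make_pipeline(StandardScaler(),
                      SelectKBest(score_func=f_classif, k=20),
                      SVC())

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv,
                        scoring=['accuracy', 'f1', 'roc_auc'])

for name in ('accuracy', 'f1', 'roc_auc'):
    print(name, round(scores[f'test_{name}'].mean(), 3))
```

Swapping `f_classif` for `mutual_info_classif` or `chi2` covers the other two scorers. Note that scikit-learn's `chi2` rejects negative inputs, so it needs a non-negative scaling such as `MinMaxScaler` instead of `StandardScaler`.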


Show the Top 20 results

In [12]:
df_results_cv_allch.sort_values('AUC',ascending=False).head(20)

Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
21,PSD,All,XGBoost,StratifiedKFold,anova,0.680556,0.67368,0.68
24,PSD,All,XGBoost,StratifiedShuffleSplit,anova,0.644444,0.632944,0.65
40,PSD,All,AdaBoost,StratifiedKFold,mutual_info_classif,0.566667,0.529315,0.565
19,PSD,All,XGBoost,KFold,mutual_info_classif,0.525,0.515815,0.556667
2,PSD,All,Support Vector,KFold,chi2,0.461111,0.369744,0.55
0,PSD,All,Support Vector,KFold,anova,0.547222,0.48498,0.545
14,PSD,All,Random Forest,StratifiedKFold,chi2,0.544444,0.509784,0.545
38,PSD,All,AdaBoost,KFold,chi2,0.527778,0.50987,0.535
1,PSD,All,Support Vector,KFold,mutual_info_classif,0.452778,0.371084,0.535
39,PSD,All,AdaBoost,StratifiedKFold,anova,0.522222,0.441192,0.53


### FP1 Channel

In [13]:
df_results_cv_fp1 = eeg_mne.eeg_classifier_cv(df=df_features_fp1, feature_id='participant_id', target='Group', feature_extraction='PSD', channels='Fp1')
df_results_cv_fp1.head(10)

Running: Support Vector
Running: Random Forest
Running: XGBoost
Running: LigthGBM
Running: AdaBoost


Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
0,PSD,Fp1,Support Vector,KFold,anova,0.461111,0.369744,0.55
1,PSD,Fp1,Support Vector,KFold,mutual_info_classif,0.502778,0.46368,0.57
2,PSD,Fp1,Support Vector,KFold,chi2,0.461111,0.369744,0.55
3,PSD,Fp1,Support Vector,StratifiedKFold,anova,0.480556,0.358881,0.525
4,PSD,Fp1,Support Vector,StratifiedKFold,mutual_info_classif,0.594444,0.556322,0.61
5,PSD,Fp1,Support Vector,StratifiedKFold,chi2,0.480556,0.358881,0.525
6,PSD,Fp1,Support Vector,StratifiedShuffleSplit,anova,0.444444,0.307692,0.5
7,PSD,Fp1,Support Vector,StratifiedShuffleSplit,mutual_info_classif,0.488889,0.432772,0.505
8,PSD,Fp1,Support Vector,StratifiedShuffleSplit,chi2,0.444444,0.307692,0.5
9,PSD,Fp1,Random Forest,KFold,anova,0.547222,0.527143,0.586667


In [14]:
df_results_cv_fp1.sort_values('AUC',ascending=False).head(20)

Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
4,PSD,Fp1,Support Vector,StratifiedKFold,mutual_info_classif,0.594444,0.556322,0.61
9,PSD,Fp1,Random Forest,KFold,anova,0.547222,0.527143,0.586667
18,PSD,Fp1,XGBoost,KFold,anova,0.566667,0.552222,0.58
1,PSD,Fp1,Support Vector,KFold,mutual_info_classif,0.502778,0.46368,0.57
17,PSD,Fp1,Random Forest,StratifiedShuffleSplit,chi2,0.577778,0.536984,0.57
15,PSD,Fp1,Random Forest,StratifiedShuffleSplit,anova,0.555556,0.552222,0.565
0,PSD,Fp1,Support Vector,KFold,anova,0.461111,0.369744,0.55
2,PSD,Fp1,Support Vector,KFold,chi2,0.461111,0.369744,0.55
5,PSD,Fp1,Support Vector,StratifiedKFold,chi2,0.480556,0.358881,0.525
43,PSD,Fp1,AdaBoost,StratifiedShuffleSplit,mutual_info_classif,0.533333,0.514351,0.525


Top 5 results by AUC for each of the two approaches (All Channels and FP1 Channel)

In [17]:
df_results_cv = pd.concat([df_results_cv_allch, df_results_cv_fp1], ignore_index=True)
df_results_cv['feature_extraction'] = 'PSD - Alpha Features'
df_results_cv_sorted = df_results_cv.sort_values('AUC',ascending=False)
df_results_cv_sorted.groupby(['channels']).head(5)

Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
21,PSD - Alpha Features,All,XGBoost,StratifiedKFold,anova,0.680556,0.67368,0.68
24,PSD - Alpha Features,All,XGBoost,StratifiedShuffleSplit,anova,0.644444,0.632944,0.65
49,PSD - Alpha Features,Fp1,Support Vector,StratifiedKFold,mutual_info_classif,0.594444,0.556322,0.61
54,PSD - Alpha Features,Fp1,Random Forest,KFold,anova,0.547222,0.527143,0.586667
63,PSD - Alpha Features,Fp1,XGBoost,KFold,anova,0.566667,0.552222,0.58
46,PSD - Alpha Features,Fp1,Support Vector,KFold,mutual_info_classif,0.502778,0.46368,0.57
62,PSD - Alpha Features,Fp1,Random Forest,StratifiedShuffleSplit,chi2,0.577778,0.536984,0.57
40,PSD - Alpha Features,All,AdaBoost,StratifiedKFold,mutual_info_classif,0.566667,0.529315,0.565
19,PSD - Alpha Features,All,XGBoost,KFold,mutual_info_classif,0.525,0.515815,0.556667
2,PSD - Alpha Features,All,Support Vector,KFold,chi2,0.461111,0.369744,0.55
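
If the goal is the single best configuration per classifier within each approach, rather than the overall top rows, one option is `groupby(...).idxmax()`. The sketch below uses a few rows copied from the results above; the variable names are illustrative:

```python
import pandas as pd

rows = [
    ('All', 'XGBoost', 'StratifiedKFold', 'anova', 0.68),
    ('All', 'XGBoost', 'StratifiedShuffleSplit', 'anova', 0.65),
    ('All', 'AdaBoost', 'StratifiedKFold', 'mutual_info_classif', 0.565),
    ('Fp1', 'Support Vector', 'StratifiedKFold', 'mutual_info_classif', 0.61),
    ('Fp1', 'Support Vector', 'KFold', 'mutual_info_classif', 0.57),
    ('Fp1', 'Random Forest', 'KFold', 'anova', 0.586667),
]
sample = pd.DataFrame(rows, columns=['channels', 'classifier',
                                     'cross-validation', 'feature-selection', 'AUC'])

# Index label of the highest-AUC row inside each (channels, classifier) group
best_idx = sample.groupby(['channels', 'classifier'])['AUC'].idxmax()
best = sample.loc[best_idx].sort_values('AUC', ascending=False)
print(best)
```

This keeps one row per classifier and approach, so the second XGBoost row and the second Support Vector row are dropped.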


## Save Results

In [19]:
file_results_cv = 'Results PSD Alpha - Cross-Validation.csv'
df_results_cv.to_csv(path_training+file_results_cv)