# Classifier Model with PSD Total Power Features from EEG Data

**Description**:\
Develop a prediction model with the training dataset obtained with the PSD Method from MNE, on this approach the intention is selecting only the Total Power for all the channels, windows and Frequency Bands. The objective is to evaluate different classifier models and measure the results to compare them like the work done before.

For this section, all the dataset required are stored in the Training Dataset directory, which are mainly the PSD Feature Extraction results. For futher understanding, you can take a look at the notebook `Feature Extraction PSD.ipynb` in Feature Extraction the `Feature Extraction` directory.

**Author**: Elmo Chavez\
**Date**: November 20, 2023

## Libraries

In [1]:
import pandas as pd
import numpy as np
import os
import sys

path_eeg_mne = os.path.abspath(os.path.join(os.path.dirname('eeg_mne.py'), '..'))
sys.path.append(path_eeg_mne)
import eeg_mne

## Read the Dataset

In [2]:
path_training = '../Training Dataset/'

# Participants Dataset preselected in Feature Extraction step 1
file_participants_selected = 'Participants_Selected.csv'
df_participants_selected = pd.read_csv(path_training+file_participants_selected)

# PSD features with only FP1 channel
file_psd_features_all = 'PSD_Features-All_Channels.csv'
df_features_all = pd.read_csv(path_training+file_psd_features_all)

# PSD features with only FP1 channel
file_psd_features_fp1 = 'PSD_Features-FP1_Channel.csv'
df_features_fp1 = pd.read_csv(path_training+file_psd_features_fp1)

## Exploratory Data Analysis

**Brief Summary about the Participants Selected**

In [3]:
df_participants_selected

Unnamed: 0,participant_id,Gender,Age,Group,MMSE,time_max,points,sfreq,flag
0,sub-001,0,57,0,16,599.798,299900,500.0,True
1,sub-002,0,78,0,22,793.098,396550,500.0,True
2,sub-003,1,70,0,14,306.098,153050,500.0,False
3,sub-004,0,67,0,20,706.098,353050,500.0,True
4,sub-005,1,70,0,22,804.098,402050,500.0,True
...,...,...,...,...,...,...,...,...,...
83,sub-084,0,71,1,24,652.098,326050,500.0,True
84,sub-085,1,64,1,26,560.058,280030,500.0,True
85,sub-086,1,49,1,26,578.798,289400,500.0,True
86,sub-087,1,73,1,24,602.758,301380,500.0,True


Remove Participants not flagged:\
    - Participants with Healthy Control\
    - Participants with maximum recorded time less than 540 seconds\
    - Balancing classes to 22 samples for each group (Alzheimer Disease and Frototemporal Dementia)

In [4]:
df_participants_selected = df_participants_selected[df_participants_selected['flag']==True].reset_index(drop=True)
df_participants_selected.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44 entries, 0 to 43
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   participant_id  44 non-null     object 
 1   Gender          44 non-null     int64  
 2   Age             44 non-null     int64  
 3   Group           44 non-null     int64  
 4   MMSE            44 non-null     int64  
 5   time_max        44 non-null     float64
 6   points          44 non-null     int64  
 7   sfreq           44 non-null     float64
 8   flag            44 non-null     bool   
dtypes: bool(1), float64(2), int64(5), object(1)
memory usage: 2.9+ KB


In [5]:
df_participants_selected.groupby('Group')['participant_id'].count()

Group
0    22
1    22
Name: participant_id, dtype: int64

**Features Extracted using PSD Method from MNE for All the Channels**

In [6]:
eeg_mne.Dataset_Features_Summary(df_features_all)

Total Features: 6274
Windows: 11 -> ['w0', 'w1', 'w10', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8', 'w9']
Channels: 19 -> ['F7', 'Pz', 'T3', 'P4', 'Fp2', 'O2', 'C3', 'F4', 'C4', 'Fp1', 'T5', 'O1', 'Fz', 'Cz', 'F3', 'F8', 'T6', 'P3', 'T4']
Frequency Bands: 5 -> ['alpha', 'beta', 'gamma', 'delta', 'theta']
Features: 6 -> ['total power', 'spectral entropy', 'average power', 'relative power', 'std dev', 'peak to peak']


**Features Extracted using PSD Method from MNE for the _FP1_ Channel**

In [7]:
eeg_mne.Dataset_Features_Summary(df_features_fp1)

Total Features: 334
Windows: 11 -> ['w0', 'w1', 'w10', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8', 'w9']
Channels: 1 -> ['Fp1']
Frequency Bands: 5 -> ['alpha', 'beta', 'gamma', 'delta', 'theta']
Features: 6 -> ['total power', 'spectral entropy', 'average power', 'relative power', 'std dev', 'peak to peak']


## Preselect only the Total Power Features

In [8]:
freq_bands_filtered = []
features_filtered = ['total_power', 'std']

In [9]:
df_features_all = eeg_mne.filter_Features(df_features_all, freq_bands_filtered, features_filtered)
eeg_mne.Dataset_Features_Summary(df_features_all)

Number of Features Returned: 2094
Total Features: 2094
Windows: 11 -> ['w0', 'w1', 'w10', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8', 'w9']
Channels: 19 -> ['F7', 'Pz', 'T3', 'P4', 'Fp2', 'O2', 'C3', 'F4', 'C4', 'Fp1', 'T5', 'O1', 'Fz', 'Cz', 'F3', 'F8', 'T6', 'P3', 'T4']
Frequency Bands: 5 -> ['alpha', 'beta', 'gamma', 'delta', 'theta']
Features: 2 -> ['total power', 'std dev']


In [10]:
df_features_fp1 = eeg_mne.filter_Features(df_features_fp1, freq_bands_filtered, features_filtered)
eeg_mne.Dataset_Features_Summary(df_features_fp1)

Number of Features Returned: 114
Total Features: 114
Windows: 11 -> ['w0', 'w1', 'w10', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8', 'w9']
Channels: 1 -> ['Fp1']
Frequency Bands: 5 -> ['alpha', 'beta', 'gamma', 'delta', 'theta']
Features: 2 -> ['total power', 'std dev']


## Predictions with Cross-Validation

### All Channels

In [11]:
df_results_cv_allch = eeg_mne.eeg_classifier_cv(df=df_features_fp1, feature_id='participant_id', target='Group', feature_extraction='PSD', channels='All')
df_results_cv_allch.head(10)

Running: Support Vector
Running: Random Forest
Running: XGBoost
Running: LigthGBM
Running: AdaBoost


Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
0,PSD,All,Support Vector,KFold,anova,0.455556,0.361322,0.521667
1,PSD,All,Support Vector,KFold,mutual_info_classif,0.455556,0.366667,0.525
2,PSD,All,Support Vector,KFold,chi2,0.461111,0.369744,0.55
3,PSD,All,Support Vector,StratifiedKFold,anova,0.522222,0.416667,0.54
4,PSD,All,Support Vector,StratifiedKFold,mutual_info_classif,0.5,0.428462,0.52
5,PSD,All,Support Vector,StratifiedKFold,chi2,0.480556,0.358881,0.525
6,PSD,All,Support Vector,StratifiedShuffleSplit,anova,0.577778,0.495425,0.57
7,PSD,All,Support Vector,StratifiedShuffleSplit,mutual_info_classif,0.555556,0.473333,0.58
8,PSD,All,Support Vector,StratifiedShuffleSplit,chi2,0.444444,0.307692,0.5
9,PSD,All,Random Forest,KFold,anova,0.388889,0.305766,0.461667


Show the Top 20 results

In [12]:
df_results_cv_allch.sort_values('AUC',ascending=False).head(20)

Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
19,PSD,All,XGBoost,KFold,mutual_info_classif,0.613889,0.607013,0.626667
24,PSD,All,XGBoost,StratifiedShuffleSplit,anova,0.622222,0.620556,0.625
18,PSD,All,XGBoost,KFold,anova,0.633333,0.603162,0.618333
11,PSD,All,Random Forest,KFold,chi2,0.547222,0.512511,0.611667
25,PSD,All,XGBoost,StratifiedShuffleSplit,mutual_info_classif,0.6,0.58974,0.605
23,PSD,All,XGBoost,StratifiedKFold,chi2,0.588889,0.582006,0.605
20,PSD,All,XGBoost,KFold,chi2,0.566667,0.553117,0.588333
7,PSD,All,Support Vector,StratifiedShuffleSplit,mutual_info_classif,0.555556,0.473333,0.58
6,PSD,All,Support Vector,StratifiedShuffleSplit,anova,0.577778,0.495425,0.57
2,PSD,All,Support Vector,KFold,chi2,0.461111,0.369744,0.55


### FP1 Channel

In [13]:
df_results_cv_fp1 = eeg_mne.eeg_classifier_cv(df=df_features_all, feature_id='participant_id', target='Group', feature_extraction='PSD', channels='Fp1')
df_results_cv_fp1.head(10)

Running: Support Vector
Running: Random Forest
Running: XGBoost
Running: LigthGBM
Running: AdaBoost


Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
0,PSD,Fp1,Support Vector,KFold,anova,0.5,0.441707,0.485
1,PSD,Fp1,Support Vector,KFold,mutual_info_classif,0.455556,0.381667,0.523333
2,PSD,Fp1,Support Vector,KFold,chi2,0.461111,0.369744,0.55
3,PSD,Fp1,Support Vector,StratifiedKFold,anova,0.566667,0.521234,0.58
4,PSD,Fp1,Support Vector,StratifiedKFold,mutual_info_classif,0.455556,0.362664,0.47
5,PSD,Fp1,Support Vector,StratifiedKFold,chi2,0.480556,0.358881,0.525
6,PSD,Fp1,Support Vector,StratifiedShuffleSplit,anova,0.688889,0.681494,0.68
7,PSD,Fp1,Support Vector,StratifiedShuffleSplit,mutual_info_classif,0.533333,0.452967,0.54
8,PSD,Fp1,Support Vector,StratifiedShuffleSplit,chi2,0.444444,0.307692,0.5
9,PSD,Fp1,Random Forest,KFold,anova,0.411111,0.289744,0.5


In [14]:
df_results_cv_fp1.sort_values('AUC',ascending=False).head(20)

Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
6,PSD,Fp1,Support Vector,StratifiedShuffleSplit,anova,0.688889,0.681494,0.68
24,PSD,Fp1,XGBoost,StratifiedShuffleSplit,anova,0.644444,0.629654,0.635
17,PSD,Fp1,Random Forest,StratifiedShuffleSplit,chi2,0.577778,0.558333,0.6
11,PSD,Fp1,Random Forest,KFold,chi2,0.525,0.479394,0.595
3,PSD,Fp1,Support Vector,StratifiedKFold,anova,0.566667,0.521234,0.58
2,PSD,Fp1,Support Vector,KFold,chi2,0.461111,0.369744,0.55
7,PSD,Fp1,Support Vector,StratifiedShuffleSplit,mutual_info_classif,0.533333,0.452967,0.54
21,PSD,Fp1,XGBoost,StratifiedKFold,anova,0.522222,0.508889,0.535
5,PSD,Fp1,Support Vector,StratifiedKFold,chi2,0.480556,0.358881,0.525
1,PSD,Fp1,Support Vector,KFold,mutual_info_classif,0.455556,0.381667,0.523333


Best performance from each Classifier for the two approaches (All Channels and FP1 Channel)

In [15]:
df_results_cv = pd.concat([df_results_cv_allch, df_results_cv_fp1], ignore_index=True)
df_results_cv['feature_extraction'] = 'PSD only Alpha and Theta Bands'
df_results_cv_sorted = df_results_cv.sort_values('AUC',ascending=False)
df_results_cv_sorted.groupby(['channels']).head(5)

Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
51,PSD only Alpha and Theta Bands,Fp1,Support Vector,StratifiedShuffleSplit,anova,0.688889,0.681494,0.68
69,PSD only Alpha and Theta Bands,Fp1,XGBoost,StratifiedShuffleSplit,anova,0.644444,0.629654,0.635
19,PSD only Alpha and Theta Bands,All,XGBoost,KFold,mutual_info_classif,0.613889,0.607013,0.626667
24,PSD only Alpha and Theta Bands,All,XGBoost,StratifiedShuffleSplit,anova,0.622222,0.620556,0.625
18,PSD only Alpha and Theta Bands,All,XGBoost,KFold,anova,0.633333,0.603162,0.618333
11,PSD only Alpha and Theta Bands,All,Random Forest,KFold,chi2,0.547222,0.512511,0.611667
25,PSD only Alpha and Theta Bands,All,XGBoost,StratifiedShuffleSplit,mutual_info_classif,0.6,0.58974,0.605
62,PSD only Alpha and Theta Bands,Fp1,Random Forest,StratifiedShuffleSplit,chi2,0.577778,0.558333,0.6
56,PSD only Alpha and Theta Bands,Fp1,Random Forest,KFold,chi2,0.525,0.479394,0.595
48,PSD only Alpha and Theta Bands,Fp1,Support Vector,StratifiedKFold,anova,0.566667,0.521234,0.58


## Save Results

In [16]:
file_results_cv = 'Results PSD Total Power - Cross-Validation.csv'
df_results_cv.to_csv(path_training+file_results_cv)