# Classifier Model with PSD Features from EEG Data without using Frequency Bands

**Description**:\
Build prediction models on the training dataset of PSD features computed with MNE. The objective is to evaluate several classifiers and compare their results between two feature sets: one built from all channels and one built only from the frontopolar channel Fp1.

For this section, all the required datasets are stored in the `Training Dataset` directory; they are mainly the results of the PSD feature extraction. For further detail, see the notebook [Feature Extraction PSD no Freq-Bands](https://github.com/sobieddch90/mcd-udg-tfm-eeg-classification/blob/main/Modeling/2.%20EEG%20Classifier%20-%20PSD%20Features%20no%20Freq-Bands.ipynb) in the `Feature Extraction` directory.

**Author**: Elmo Chavez\
**Date**: November 25, 2023

## Libraries

In [1]:
import pandas as pd
import numpy as np
import os
import sys

path_eeg_mne = os.path.abspath(os.path.join(os.path.dirname('eeg_mne.py'), '..'))
sys.path.append(path_eeg_mne)
import eeg_mne

## Read the Dataset

In [2]:
path_training = '../Training Dataset/'

# Participants Dataset preselected in Feature Extraction step 1
file_participants_selected = 'Participants_Selected.csv'
df_participants_selected = pd.read_csv(path_training+file_participants_selected)

# PSD features with all channels
file_psd_features_all = 'PSD_Features-All_Channels_no_FreqBands.csv'
df_features_all = pd.read_csv(path_training+file_psd_features_all)

# PSD features with only FP1 channel
file_psd_features_fp1 = 'PSD_Features-FP1_Channel_no_FreqBands.csv'
df_features_fp1 = pd.read_csv(path_training+file_psd_features_fp1)

## Exploratory Data Analysis

**Brief Summary about the Participants Selected**

In [3]:
df_participants_selected

Unnamed: 0,participant_id,Gender,Age,Group,MMSE,time_max,points,sfreq,flag
0,sub-001,0,57,0,16,599.798,299900,500.0,True
1,sub-002,0,78,0,22,793.098,396550,500.0,True
2,sub-003,1,70,0,14,306.098,153050,500.0,False
3,sub-004,0,67,0,20,706.098,353050,500.0,True
4,sub-005,1,70,0,22,804.098,402050,500.0,True
...,...,...,...,...,...,...,...,...,...
83,sub-084,0,71,1,24,652.098,326050,500.0,True
84,sub-085,1,64,1,26,560.058,280030,500.0,True
85,sub-086,1,49,1,26,578.798,289400,500.0,True
86,sub-087,1,73,1,24,602.758,301380,500.0,True


Keep only the flagged participants. The following participants were left unflagged in the earlier step:
- Healthy Control participants
- Participants with a maximum recorded time of less than 540 seconds
- Participants dropped to balance the classes at 22 samples per group (Alzheimer's Disease and Frontotemporal Dementia)

In [4]:
df_participants_selected = df_participants_selected[df_participants_selected['flag']].reset_index(drop=True)
df_participants_selected.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44 entries, 0 to 43
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   participant_id  44 non-null     object 
 1   Gender          44 non-null     int64  
 2   Age             44 non-null     int64  
 3   Group           44 non-null     int64  
 4   MMSE            44 non-null     int64  
 5   time_max        44 non-null     float64
 6   points          44 non-null     int64  
 7   sfreq           44 non-null     float64
 8   flag            44 non-null     bool   
dtypes: bool(1), float64(2), int64(5), object(1)
memory usage: 2.9+ KB


In [5]:
df_participants_selected.groupby('Group')['participant_id'].count()

Group
0    22
1    22
Name: participant_id, dtype: int64
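
The `flag` column itself is produced in the earlier feature extraction step and is not shown in this notebook. As a rough illustration only, the sketch below shows one way such a flag could be derived from the three selection criteria. The toy table, the string group labels (`AD`, `FTD`, `HC`), and the names `MIN_SECONDS` and `N_PER_GROUP` are assumptions made for the example, not the author's actual code.

```python
import pandas as pd

# Toy participants table; values and group labels are illustrative assumptions.
participants = pd.DataFrame({
    "participant_id": [f"sub-{i:03d}" for i in range(1, 11)],
    "Group": ["AD", "AD", "AD", "AD", "FTD", "FTD", "FTD", "HC", "HC", "HC"],
    "time_max": [600.0, 300.0, 700.0, 800.0, 650.0, 560.0, 500.0, 610.0, 620.0, 630.0],
})

MIN_SECONDS = 540  # minimum recording length taken from the criteria above
N_PER_GROUP = 2    # the notebook balances to 22; 2 keeps the toy example small

# Drop healthy controls and recordings that are too short.
eligible = participants[
    (participants["Group"] != "HC") & (participants["time_max"] >= MIN_SECONDS)
]

# Draw the same number of participants from each remaining group.
balanced_ids = (
    eligible.groupby("Group")["participant_id"]
    .sample(n=N_PER_GROUP, random_state=0)
)

# Flag the retained participants and keep only those rows.
participants["flag"] = participants["participant_id"].isin(balanced_ids)
selected = participants[participants["flag"]].reset_index(drop=True)
print(selected.groupby("Group")["participant_id"].count())
```

Fixing `random_state` keeps the random undersampling reproducible between runs.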

**Features Extracted using PSD Method from MNE for All the Channels**

In [6]:
eeg_mne.Dataset_Features_Summary(df_features_all)

Total Features: 1049
Windows: 11 -> ['w0', 'w1', 'w10', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8', 'w9']
Channels: 19 -> ['Cz', 'Fp1', 'O2', 'C3', 'F4', 'T5', 'T6', 'Fz', 'F3', 'T4', 'Fp2', 'P4', 'F7', 'O1', 'T3', 'C4', 'Pz', 'P3', 'F8']
Frequency Bands: 5 -> ['std', 'average', 'spectral', 'total', 'peak']
Features: 5 -> ['power', 'dev', 'v', 'entropy', 'to peak']


**Features Extracted using PSD Method from MNE for the _FP1_ Channel**

In [7]:
eeg_mne.Dataset_Features_Summary(df_features_fp1)

Total Features: 59
Windows: 11 -> ['w0', 'w1', 'w10', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8', 'w9']
Channels: 1 -> ['Fp1']
Frequency Bands: 5 -> ['std', 'average', 'spectral', 'total', 'peak']
Features: 4 -> ['power', 'dev', 'entropy', 'to peak']


## Predictions with Cross-Validation

### All Channels

In [8]:
df_results_cv_allch = eeg_mne.eeg_classifier_cv(df=df_features_all, feature_id='participant_id', target='Group', feature_extraction='PSD', channels='All')
df_results_cv_allch.head(10)

Running: Support Vector
Running: Random Forest
Running: XGBoost
Running: LigthGBM
Running: AdaBoost


Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
0,PSD,All,Support Vector,KFold,anova,0.519444,0.487532,0.52
1,PSD,All,Support Vector,KFold,mutual_info_classif,0.386111,0.277622,0.475
2,PSD,All,Support Vector,KFold,chi2,0.461111,0.369744,0.55
3,PSD,All,Support Vector,StratifiedKFold,anova,0.427778,0.377772,0.445
4,PSD,All,Support Vector,StratifiedKFold,mutual_info_classif,0.5,0.4379,0.525
5,PSD,All,Support Vector,StratifiedKFold,chi2,0.480556,0.358881,0.525
6,PSD,All,Support Vector,StratifiedShuffleSplit,anova,0.533333,0.497989,0.54
7,PSD,All,Support Vector,StratifiedShuffleSplit,mutual_info_classif,0.488889,0.379487,0.535
8,PSD,All,Support Vector,StratifiedShuffleSplit,chi2,0.444444,0.307692,0.5
9,PSD,All,Random Forest,KFold,anova,0.411111,0.289744,0.5
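
The internals of `eeg_mne.eeg_classifier_cv` are not shown here. Judging only from the columns of its output, it appears to evaluate every combination of classifier, cross-validation strategy, and feature-selection method. The sketch below reproduces that kind of grid with scikit-learn on synthetic data. The pipeline layout, the choice of `k=10`, the `MinMaxScaler` step (added because `chi2` needs non-negative inputs), and the reduced classifier list are all assumptions for illustration, not the project's implementation.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif
from sklearn.model_selection import (
    KFold,
    StratifiedKFold,
    StratifiedShuffleSplit,
    cross_validate,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the PSD feature table (44 samples, two classes).
X, y = make_classification(n_samples=44, n_features=30, n_informative=5, random_state=0)

classifiers = {
    "Support Vector": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
cv_strategies = {
    "KFold": KFold(n_splits=5, shuffle=True, random_state=0),
    "StratifiedKFold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    "StratifiedShuffleSplit": StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0),
}
selectors = {"anova": f_classif, "mutual_info_classif": mutual_info_classif, "chi2": chi2}

rows = []
for clf_name, clf in classifiers.items():
    for cv_name, cv in cv_strategies.items():
        for sel_name, score_func in selectors.items():
            # Scaling and selection sit inside the pipeline so that they are
            # fitted on the training folds only, which avoids leakage.
            pipeline = Pipeline([
                ("scale", MinMaxScaler()),
                ("select", SelectKBest(score_func=score_func, k=10)),
                ("clf", clf),
            ])
            scores = cross_validate(
                pipeline, X, y, cv=cv, scoring=["accuracy", "f1", "roc_auc"]
            )
            rows.append({
                "classifier": clf_name,
                "cross-validation": cv_name,
                "feature-selection": sel_name,
                "accuracy": scores["test_accuracy"].mean(),
                "f1_score": scores["test_f1"].mean(),
                "AUC": scores["test_roc_auc"].mean(),
            })

results_sketch = pd.DataFrame(rows)
print(results_sketch.sort_values("AUC", ascending=False).head())
```

With only 44 participants, each test fold holds roughly nine samples, so fold-to-fold variance is high and small differences between configurations should be read with caution.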


Show the top 20 results sorted by AUC

In [9]:
df_results_cv_allch.sort_values('AUC',ascending=False).head(20)

Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
11,PSD,All,Random Forest,KFold,chi2,0.572222,0.535,0.645
24,PSD,All,XGBoost,StratifiedShuffleSplit,anova,0.6,0.57619,0.6
2,PSD,All,Support Vector,KFold,chi2,0.461111,0.369744,0.55
6,PSD,All,Support Vector,StratifiedShuffleSplit,anova,0.533333,0.497989,0.54
17,PSD,All,Random Forest,StratifiedShuffleSplit,chi2,0.533333,0.531111,0.54
7,PSD,All,Support Vector,StratifiedShuffleSplit,mutual_info_classif,0.488889,0.379487,0.535
18,PSD,All,XGBoost,KFold,anova,0.563889,0.511339,0.531667
25,PSD,All,XGBoost,StratifiedShuffleSplit,mutual_info_classif,0.533333,0.52132,0.525
4,PSD,All,Support Vector,StratifiedKFold,mutual_info_classif,0.5,0.4379,0.525
5,PSD,All,Support Vector,StratifiedKFold,chi2,0.480556,0.358881,0.525


### FP1 Channel

In [10]:
df_results_cv_fp1 = eeg_mne.eeg_classifier_cv(df=df_features_fp1, feature_id='participant_id', target='Group', feature_extraction='PSD', channels='Fp1')
df_results_cv_fp1.head(10)

Running: Support Vector
Running: Random Forest
Running: XGBoost
Running: LigthGBM
Running: AdaBoost


Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
0,PSD,Fp1,Support Vector,KFold,anova,0.386111,0.277622,0.475
1,PSD,Fp1,Support Vector,KFold,mutual_info_classif,0.475,0.417532,0.551667
2,PSD,Fp1,Support Vector,KFold,chi2,0.461111,0.369744,0.55
3,PSD,Fp1,Support Vector,StratifiedKFold,anova,0.380556,0.268376,0.425
4,PSD,Fp1,Support Vector,StratifiedKFold,mutual_info_classif,0.494444,0.428077,0.525
5,PSD,Fp1,Support Vector,StratifiedKFold,chi2,0.480556,0.358881,0.525
6,PSD,Fp1,Support Vector,StratifiedShuffleSplit,anova,0.444444,0.350849,0.47
7,PSD,Fp1,Support Vector,StratifiedShuffleSplit,mutual_info_classif,0.4,0.326538,0.425
8,PSD,Fp1,Support Vector,StratifiedShuffleSplit,chi2,0.444444,0.307692,0.5
9,PSD,Fp1,Random Forest,KFold,anova,0.366667,0.294228,0.44


In [11]:
df_results_cv_fp1.sort_values('AUC',ascending=False).head(20)

Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
11,PSD,Fp1,Random Forest,KFold,chi2,0.547222,0.501061,0.62
22,PSD,Fp1,XGBoost,StratifiedKFold,mutual_info_classif,0.563889,0.546775,0.58
18,PSD,Fp1,XGBoost,KFold,anova,0.547222,0.526912,0.555
1,PSD,Fp1,Support Vector,KFold,mutual_info_classif,0.475,0.417532,0.551667
2,PSD,Fp1,Support Vector,KFold,chi2,0.461111,0.369744,0.55
17,PSD,Fp1,Random Forest,StratifiedShuffleSplit,chi2,0.533333,0.523117,0.54
4,PSD,Fp1,Support Vector,StratifiedKFold,mutual_info_classif,0.494444,0.428077,0.525
5,PSD,Fp1,Support Vector,StratifiedKFold,chi2,0.480556,0.358881,0.525
42,PSD,Fp1,AdaBoost,StratifiedShuffleSplit,anova,0.488889,0.422644,0.515
40,PSD,Fp1,AdaBoost,StratifiedKFold,mutual_info_classif,0.477778,0.361282,0.515


Top five results by AUC for each of the two approaches (All Channels and FP1 Channel)

In [12]:
df_results_cv = pd.concat([df_results_cv_allch, df_results_cv_fp1], ignore_index=True)
df_results_cv['feature_extraction'] = 'PSD without Freq-Bands'
df_results_cv_sorted = df_results_cv.sort_values('AUC',ascending=False)
df_results_cv_sorted.groupby(['channels']).head(5)

Unnamed: 0,feature_extraction,channels,classifier,cross-validation,feature-selection,accuracy,f1_score,AUC
11,PSD without Freq-Bands,All,Random Forest,KFold,chi2,0.572222,0.535,0.645
56,PSD without Freq-Bands,Fp1,Random Forest,KFold,chi2,0.547222,0.501061,0.62
24,PSD without Freq-Bands,All,XGBoost,StratifiedShuffleSplit,anova,0.6,0.57619,0.6
67,PSD without Freq-Bands,Fp1,XGBoost,StratifiedKFold,mutual_info_classif,0.563889,0.546775,0.58
63,PSD without Freq-Bands,Fp1,XGBoost,KFold,anova,0.547222,0.526912,0.555
46,PSD without Freq-Bands,Fp1,Support Vector,KFold,mutual_info_classif,0.475,0.417532,0.551667
47,PSD without Freq-Bands,Fp1,Support Vector,KFold,chi2,0.461111,0.369744,0.55
2,PSD without Freq-Bands,All,Support Vector,KFold,chi2,0.461111,0.369744,0.55
17,PSD without Freq-Bands,All,Random Forest,StratifiedShuffleSplit,chi2,0.533333,0.531111,0.54
6,PSD without Freq-Bands,All,Support Vector,StratifiedShuffleSplit,anova,0.533333,0.497989,0.54


## Save Results

In [13]:
file_results_cv = 'Results PSD no Freq-Bands - Cross-Validation.csv'
df_results_cv.to_csv(path_training+file_results_cv)
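
Note that `to_csv` writes the DataFrame index as an unnamed first column by default, and `read_csv` then loads that column as `Unnamed: 0`. The short sketch below, which uses a temporary directory and a toy frame rather than the project files, shows the difference that `index=False` makes.

```python
import os
import tempfile

import pandas as pd

# Toy results frame; the real notebook saves df_results_cv instead.
df = pd.DataFrame({"classifier": ["Random Forest", "XGBoost"], "AUC": [0.645, 0.6]})

with tempfile.TemporaryDirectory() as tmp_dir:
    with_index = os.path.join(tmp_dir, "with_index.csv")
    without_index = os.path.join(tmp_dir, "without_index.csv")

    df.to_csv(with_index)                   # default: index written as first column
    df.to_csv(without_index, index=False)   # index omitted from the file

    cols_with_index = pd.read_csv(with_index).columns.tolist()
    cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'classifier', 'AUC']
print(cols_without_index)  # ['classifier', 'AUC']
```

Alternatively, a file saved with the default settings can be read back with `pd.read_csv(path, index_col=0)` to restore the index instead of gaining an extra column.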