## Description
This notebook contains the code used to classify four levels of cognitive load using features obtained from the entire period
of the working memory (WM) encoding stage. The models used are KNN, random forest, and SVM. Grid search was used to optimize each model's hyperparameters. Dimensionality analysis was done by applying PCA to the data.

In [1]:
import numpy as np
import pandas as pd
import mne
from numpy import load
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report, confusion_matrix, f1_score, accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC


## Importing our data
Our datasets include PSD data from the theta, alpha, and beta bands, as well as PSD from a horizontal stack of these frequency bands.
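
The horizontal stack can be pictured as a column-wise concatenation of the three per-band feature matrices. A minimal sketch with synthetic arrays, assuming 64 features per band (an assumption inferred from the `_64` and `_192` model names used later, not something the notebook states):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_features = 10, 64  # hypothetical sizes, for illustration only

# Synthetic stand-ins for the per-band PSD matrices (rows = trials)
theta = rng.random((n_trials, n_features))
alpha = rng.random((n_trials, n_features))
beta = rng.random((n_trials, n_features))

# Horizontal stack: concatenate the bands along the feature axis
stacked = np.hstack([theta, alpha, beta])
print(stacked.shape)  # (10, 192)
```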

In [3]:
data_stack=pd.read_csv(r"C:\Users\Felix\Documents\Summer 2020\Research_summer\Full datasetML\Time_segmented\First interval\stack_data.csv",header=None)
data_thetha=pd.read_csv(r"C:\Users\Felix\Documents\Summer 2020\Research_summer\Full datasetML\thetha_data_NO_20.csv",header=None)
data_alpha=pd.read_csv(r"C:\Users\Felix\Documents\Summer 2020\Research_summer\Full datasetML\alpha_data_NO_20.csv",header=None)
data_beta=pd.read_csv(r"C:\Users\Felix\Documents\Summer 2020\Research_summer\Full datasetML\beta_data_NO_20.csv",header=None)
labels=pd.read_csv(r"C:\Users\Felix\Documents\Summer 2020\Research_summer\Full datasetML\All_Cl_labels.csv")
labels=np.ravel(labels)
scaler= MinMaxScaler()
## Scaling our data
data_stack=pd.DataFrame(MinMaxScaler().fit_transform(data_stack))
data_thetha=pd.DataFrame(MinMaxScaler().fit_transform(data_thetha))
data_alpha=pd.DataFrame(MinMaxScaler().fit_transform(data_alpha))
data_beta=pd.DataFrame(MinMaxScaler().fit_transform(data_beta))
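
The cell above fits each `MinMaxScaler` on the full dataset, so samples that later end up in the test split influence the scaling ranges. One possible alternative, sketched here on synthetic data (the array sizes and `random_state` are illustrative assumptions), fits the scaler on the training split only and reuses it for the test split:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.random((100, 8)) * 50.0     # synthetic features
y = rng.integers(1, 5, size=100)    # four synthetic load levels (1 to 4)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

# Learn the per-feature min and max from the training split only
scaler = MinMaxScaler().fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)  # may fall slightly outside [0, 1]
```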


### KNN model optimization with grid search
Cognitive load (CL) classification results from the best model are shown.

In [13]:
## Grid search for KNN
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
x_train, x_test, y_train, y_test = train_test_split(data_stack, labels, test_size=0.20)
param_grid= {"n_neighbors":range(1,100)}
grid=GridSearchCV(KNeighborsClassifier(),param_grid,refit=True,cv=5, verbose=2,n_jobs=-1)
grid.fit(x_train, y_train)
modelknn_192=grid.best_estimator_
print("knn_192 model:\n",modelknn_192)
modelknn_192.fit(x_train,y_train)

y_pred=modelknn_192.predict(x_test)

F1=classification_report(y_test,y_pred)
print(F1)

Fitting 5 folds for each of 99 candidates, totalling 495 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:  3.4min
[Parallel(n_jobs=-1)]: Done 130 tasks      | elapsed: 33.2min
[Parallel(n_jobs=-1)]: Done 333 tasks      | elapsed: 84.8min
[Parallel(n_jobs=-1)]: Done 495 out of 495 | elapsed: 128.1min finished


knn_192 model:
 KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=13, p=2,
                     weights='uniform')
              precision    recall  f1-score   support

           1       0.84      0.89      0.86      2185
           2       0.86      0.87      0.86      2199
           3       0.87      0.87      0.87      2194
           4       0.88      0.83      0.86      2222

    accuracy                           0.86      8800
   macro avg       0.86      0.86      0.86      8800
weighted avg       0.86      0.86      0.86      8800
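
After fitting, `GridSearchCV` exposes the selected hyperparameters and the mean cross-validated score directly, which can be quicker to read than the printed estimator. A small sketch on synthetic data (the `make_classification` dataset and the reduced `n_neighbors` range are stand-ins, not the notebook's data or grid):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic four-class problem standing in for the stacked PSD features
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, n_classes=4, random_state=0
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": range(1, 16)}, cv=5)
grid.fit(x_train, y_train)

print(grid.best_params_)                      # selected n_neighbors
print(round(grid.best_score_, 3))             # mean CV accuracy of the best candidate
print(round(grid.score(x_test, y_test), 3))   # accuracy on the held-out split
```

With `refit=True` (the default), `best_estimator_` has already been refit on the whole training split, so calling `.fit` on it again, as the cell above does, repeats work without changing the hyperparameters.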



### Grid search for random forest
1. Individual frequency bands

In [14]:
from sklearn.ensemble import RandomForestClassifier,BaggingClassifier
from sklearn.model_selection import GridSearchCV
x_train, x_test, y_train, y_test = train_test_split(data_beta, labels, test_size=0.20)
# Create the parameter grid based on the results of random search
param_grid = {
    'bootstrap': [True],
    'max_depth': [80, 90, 100, 110],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4, 5],
    'min_samples_split': [8, 10, 12],
    'n_estimators': [100, 200, 300, 1000]
}
# Create a base model
rf = RandomForestClassifier()
# Instantiate the grid search model
grid= GridSearchCV(estimator = rf, param_grid = param_grid,
                          cv = 5, n_jobs = -1, verbose = 2)
grid.fit(x_train,y_train)
modelrf_64=grid.best_estimator_
print("rf_64 model:\n",modelrf_64)
modelrf_64.fit(x_train,y_train)

y_pred=modelrf_64.predict(x_test)

F1=classification_report(y_test,y_pred)
print(F1)


Fitting 5 folds for each of 288 candidates, totalling 1440 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:   36.3s
[Parallel(n_jobs=-1)]: Done 130 tasks      | elapsed:  9.0min
[Parallel(n_jobs=-1)]: Done 333 tasks      | elapsed: 27.1min
[Parallel(n_jobs=-1)]: Done 616 tasks      | elapsed: 49.5min
[Parallel(n_jobs=-1)]: Done 981 tasks      | elapsed: 79.7min
[Parallel(n_jobs=-1)]: Done 1440 out of 1440 | elapsed: 119.9min finished


rf_64 model:
 RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=100, max_features=3,
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=3, min_samples_split=8,
                       min_weight_fraction_leaf=0.0, n_estimators=1000,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)
              precision    recall  f1-score   support

           1       0.85      0.82      0.83      2179
           2       0.83      0.84      0.84      2227
           3       0.85      0.85      0.85      2194
           4       0.82      0.84      0.83      2200

    accuracy                           0.84      8800
   macro avg       0.84      0.84      0.84      8800
weighted avg       0.84      0.84      0.84      8800
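
The candidate and fit counts reported in the log follow from the size of the parameter grid. As a quick check, `ParameterGrid` enumerates the same combinations that `GridSearchCV` iterates over; the grid below is copied from the cell above:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'bootstrap': [True],
    'max_depth': [80, 90, 100, 110],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4, 5],
    'min_samples_split': [8, 10, 12],
    'n_estimators': [100, 200, 300, 1000]
}

# 1 * 4 * 2 * 3 * 3 * 4 combinations, each fitted once per CV fold
n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * 5  # cv=5
print(n_candidates, n_fits)  # 288 1440
```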



### Grid search for random forest
2. Stacked band powers

In [15]:
from sklearn.ensemble import RandomForestClassifier,BaggingClassifier
from sklearn.model_selection import GridSearchCV
x_train, x_test, y_train, y_test = train_test_split(data_stack, labels, test_size=0.20)
# Create the parameter grid based on the results of random search
param_grid = {
    'bootstrap': [True],
    'max_depth': [80, 90, 100, 110],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4, 5],
    'min_samples_split': [8, 10, 12],
    'n_estimators': [100, 200, 300, 1000]
}
# Create a base model
rf = RandomForestClassifier()
# Instantiate the grid search model
grid= GridSearchCV(estimator = rf, param_grid = param_grid,
                          cv = 5, n_jobs = -1, verbose = 2)
grid.fit(x_train,y_train)
modelrf_192=grid.best_estimator_
print("rf_192 model:\n",modelrf_192)
modelrf_192.fit(x_train,y_train)

y_pred=modelrf_192.predict(x_test)

F1=classification_report(y_test,y_pred)
print(F1)


Fitting 5 folds for each of 288 candidates, totalling 1440 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:   34.8s
[Parallel(n_jobs=-1)]: Done 130 tasks      | elapsed:  8.4min
[Parallel(n_jobs=-1)]: Done 333 tasks      | elapsed: 25.7min
[Parallel(n_jobs=-1)]: Done 616 tasks      | elapsed: 47.5min
[Parallel(n_jobs=-1)]: Done 981 tasks      | elapsed: 76.3min
[Parallel(n_jobs=-1)]: Done 1440 out of 1440 | elapsed: 115.3min finished


rf_192 model:
 RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=80, max_features=3,
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=3, min_samples_split=8,
                       min_weight_fraction_leaf=0.0, n_estimators=1000,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)
              precision    recall  f1-score   support

           1       0.91      0.91      0.91      2249
           2       0.90      0.90      0.90      2204
           3       0.91      0.89      0.90      2204
           4       0.90      0.91      0.91      2143

    accuracy                           0.91      8800
   macro avg       0.91      0.91      0.91      8800
weighted avg       0.91      0.91      0.91      8800



## Grid search for SVM

In [26]:
x_train_15, x_test_15, y_train_15, y_test_15 = train_test_split(data_stack, labels, test_size=0.20)
param_grid= {"C":[0.001,0.1,1,10,100],"gamma":[1,0.5,0.1,0.01],
            "kernel":["rbf","linear","poly"]}
grid=GridSearchCV(SVC(),param_grid,refit=True,verbose=2,n_jobs=-1)
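grid.fit(x_train_15, y_train_15)
# Keep the refit best estimator; later cells refer to it as modelsvm192
modelsvm192=grid.best_estimator_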
y_pred_15=grid.predict(x_test_15)
print(classification_report(y_test_15,y_pred_15))
print(accuracy_score(y_test_15,y_pred_15))

Fitting 5 folds for each of 60 candidates, totalling 300 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed: 41.1min
[Parallel(n_jobs=-1)]: Done 130 tasks      | elapsed: 326.6min
[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed: 592.5min finished


              precision    recall  f1-score   support

           1       0.93      0.94      0.94      2228
           2       0.94      0.94      0.94      2179
           3       0.93      0.93      0.93      2262
           4       0.95      0.94      0.94      2131

    accuracy                           0.94      8800
   macro avg       0.94      0.94      0.94      8800
weighted avg       0.94      0.94      0.94      8800

0.9384090909090909
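
In the grid above, `gamma` has no effect when `kernel="linear"`, so 15 of the 60 candidates repeat a linear model that has already been evaluated. A possible way to avoid those fits, sketched here as a suggestion rather than as the configuration that produced the results above, is to pass a list of grids so that `gamma` is only varied for the kernels that use it:

```python
from sklearn.model_selection import ParameterGrid

C_values = [0.001, 0.1, 1, 10, 100]
gamma_values = [1, 0.5, 0.1, 0.01]

# Grid as used in the cell above: every kernel is crossed with every gamma
full_grid = {"C": C_values, "gamma": gamma_values,
             "kernel": ["rbf", "linear", "poly"]}

# Alternative: gamma is only searched for the kernels that depend on it
split_grid = [
    {"kernel": ["rbf", "poly"], "C": C_values, "gamma": gamma_values},
    {"kernel": ["linear"], "C": C_values},
]

n_full = len(ParameterGrid(full_grid))
n_split = len(ParameterGrid(split_grid))
print(n_full, n_split)  # 60 45
```

`GridSearchCV` accepts the list form directly as its `param_grid` argument.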


## PCA and Dimension reduction results
The following code shows the classification results when PCA with different numbers of components is applied to our datasets.
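
One way to judge which component counts are worth testing is to look at the cumulative explained variance of the principal components. The sketch below uses synthetic data from `make_classification` and a 95% threshold, both of which are illustrative assumptions rather than choices made in this notebook:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Synthetic feature matrix standing in for a PSD dataset
X, _ = make_classification(
    n_samples=400, n_features=20, n_informative=8, n_classes=4, random_state=0
)

pca = PCA().fit(X)  # keep all components
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 95%
n_95 = int(np.searchsorted(cumulative, 0.95) + 1)
print(n_95, np.round(cumulative[:5], 3))
```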

In [18]:
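def pca_modeling(data,n_features,model):
    """Project `data` onto `n_features` principal components, fit `model` on an
    80/20 split, and return the classification report for the test split."""
    # PCA is fit on all samples before the split; `labels` comes from the notebook's global scope
    data_new=PCA(n_features).fit_transform(data)
    x_train, x_test, y_train, y_test = train_test_split(data_new, labels, test_size=0.20)
    model.fit(x_train,y_train)
    y_pred=model.predict(x_test)
    return classification_report(y_test,y_pred)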
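
`pca_modeling` fits PCA on all samples before splitting, so the test samples contribute to the projection. A possible variant, sketched below, wraps PCA and the classifier in a `Pipeline` so that the projection is learned from the training split only. The function name `pca_pipeline_f1`, the fixed `random_state`, the macro-F1 summary, and the synthetic data are all hypothetical choices for illustration:

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

def pca_pipeline_f1(data, labels, n_components, model, random_state=0):
    """Return the macro F1 on a held-out split, with PCA fit on the training split only."""
    x_train, x_test, y_train, y_test = train_test_split(
        data, labels, test_size=0.20, random_state=random_state
    )
    # clone() leaves the caller's estimator untouched between runs
    pipe = Pipeline([("pca", PCA(n_components)), ("clf", clone(model))])
    pipe.fit(x_train, y_train)
    return f1_score(y_test, pipe.predict(x_test), average="macro")

# Synthetic four-class data standing in for one of the PSD datasets
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, n_classes=4, random_state=0
)
scores = {n: pca_pipeline_f1(X, y, n, KNeighborsClassifier(n_neighbors=5))
          for n in (2, 5, 10, 20)}
for n, score in scores.items():
    print(n, round(score, 3))
```

Fixing `random_state` means every component count is evaluated on the same split, which makes the scores directly comparable.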
         

### PCA on alpha and theta band PSD for SVM classifier

In [None]:

# All other SVC parameters are left at their scikit-learn defaults
svm=SVC(C=10000, gamma=1, kernel='rbf')
F1_SVM5=pca_modeling(data_alpha,5,svm)
F1_SVM16=pca_modeling(data_alpha,16,svm)
F1_SVM32=pca_modeling(data_alpha,32,svm)
F1_SVM50=pca_modeling(data_alpha,50,svm)
F1_SVM64=pca_modeling(data_alpha,64,svm)
print(F1_SVM5,F1_SVM16,F1_SVM32,F1_SVM50,F1_SVM64)

In [20]:
F1_SVM5=pca_modeling(data_thetha,5,svm)
F1_SVM16=pca_modeling(data_thetha,16,svm)
F1_SVM32=pca_modeling(data_thetha,32,svm)
F1_SVM50=pca_modeling(data_thetha,50,svm)
F1_SVM64=pca_modeling(data_thetha,64,svm)
print(F1_SVM5,F1_SVM16,F1_SVM32,F1_SVM50,F1_SVM64)

              precision    recall  f1-score   support

           1       0.38      0.44      0.41      2194
           2       0.36      0.38      0.37      2177
           3       0.40      0.44      0.42      2216
           4       0.46      0.30      0.36      2213

    accuracy                           0.39      8800
   macro avg       0.40      0.39      0.39      8800
weighted avg       0.40      0.39      0.39      8800
               precision    recall  f1-score   support

           1       0.54      0.66      0.59      2173
           2       0.60      0.63      0.62      2178
           3       0.68      0.55      0.61      2180
           4       0.63      0.57      0.60      2269

    accuracy                           0.60      8800
   macro avg       0.61      0.60      0.60      8800
weighted avg       0.61      0.60      0.60      8800
               precision    recall  f1-score   support

           1       0.69      0.78      0.73      2237
           2       0.

## PCA on Theta band PSD for KNN classifier

In [21]:
knn=modelknn_64
F1_SVM5=pca_modeling(data_thetha,5,knn)
F1_SVM16=pca_modeling(data_thetha,16,knn)
F1_SVM32=pca_modeling(data_thetha,32,knn)
F1_SVM50=pca_modeling(data_thetha,50,knn)
F1_SVM64=pca_modeling(data_thetha,64,knn)
print(F1_SVM5,F1_SVM16,F1_SVM32,F1_SVM50,F1_SVM64)

              precision    recall  f1-score   support

           1       0.41      0.47      0.44      2167
           2       0.43      0.45      0.44      2261
           3       0.46      0.42      0.44      2203
           4       0.47      0.42      0.44      2169

    accuracy                           0.44      8800
   macro avg       0.44      0.44      0.44      8800
weighted avg       0.44      0.44      0.44      8800
               precision    recall  f1-score   support

           1       0.62      0.64      0.63      2203
           2       0.63      0.68      0.66      2200
           3       0.65      0.64      0.64      2217
           4       0.66      0.60      0.63      2180

    accuracy                           0.64      8800
   macro avg       0.64      0.64      0.64      8800
weighted avg       0.64      0.64      0.64      8800
               precision    recall  f1-score   support

           1       0.70      0.75      0.72      2183
           2       0.

### PCA on beta band PSD for KNN classifier


In [28]:
knn=modelknn_64
F1_SVM5=pca_modeling(data_beta,5,knn)
F1_SVM16=pca_modeling(data_beta,16,knn)
F1_SVM32=pca_modeling(data_beta,32,knn)
F1_SVM50=pca_modeling(data_beta,50,knn)
F1_SVM64=pca_modeling(data_beta,64,knn)
print(F1_SVM5,F1_SVM16,F1_SVM32,F1_SVM50,F1_SVM64)

              precision    recall  f1-score   support

           1       0.42      0.44      0.43      2290
           2       0.42      0.47      0.45      2102
           3       0.49      0.47      0.48      2245
           4       0.46      0.39      0.43      2163

    accuracy                           0.45      8800
   macro avg       0.45      0.45      0.44      8800
weighted avg       0.45      0.45      0.45      8800
               precision    recall  f1-score   support

           1       0.59      0.65      0.62      2260
           2       0.61      0.64      0.62      2207
           3       0.64      0.63      0.63      2160
           4       0.66      0.59      0.63      2173

    accuracy                           0.63      8800
   macro avg       0.63      0.63      0.63      8800
weighted avg       0.63      0.63      0.63      8800
               precision    recall  f1-score   support

           1       0.68      0.72      0.70      2228
           2       0.

## PCA on beta band PSD for Random Forest

In [27]:
rf=modelrf_64
F1_SVM5=pca_modeling(data_beta,5,rf)
F1_SVM16=pca_modeling(data_beta,16,rf)
F1_SVM32=pca_modeling(data_beta,32,rf)
F1_SVM50=pca_modeling(data_beta,50,rf)
F1_SVM64=pca_modeling(data_beta,64,rf)
print(F1_SVM5,F1_SVM16,F1_SVM32,F1_SVM50,F1_SVM64)

              precision    recall  f1-score   support

           1       0.45      0.44      0.44      2214
           2       0.46      0.48      0.47      2194
           3       0.49      0.51      0.50      2209
           4       0.45      0.42      0.44      2183

    accuracy                           0.46      8800
   macro avg       0.46      0.46      0.46      8800
weighted avg       0.46      0.46      0.46      8800
               precision    recall  f1-score   support

           1       0.65      0.62      0.63      2229
           2       0.66      0.68      0.67      2243
           3       0.68      0.67      0.67      2188
           4       0.65      0.66      0.66      2140

    accuracy                           0.66      8800
   macro avg       0.66      0.66      0.66      8800
weighted avg       0.66      0.66      0.66      8800
               precision    recall  f1-score   support

           1       0.73      0.73      0.73      2230
           2       0.

### PCA on alpha band PSD for Random Forest

In [25]:
rf=modelrf_64
F1_SVM5=pca_modeling(data_alpha,5,rf)
F1_SVM16=pca_modeling(data_alpha,16,rf)
F1_SVM32=pca_modeling(data_alpha,32,rf)
F1_SVM50=pca_modeling(data_alpha,50,rf)
F1_SVM64=pca_modeling(data_alpha,64,rf)
print(F1_SVM5,F1_SVM16,F1_SVM32,F1_SVM50,F1_SVM64)

              precision    recall  f1-score   support

           1       0.43      0.47      0.45      2125
           2       0.44      0.45      0.45      2270
           3       0.45      0.47      0.46      2143
           4       0.43      0.37      0.39      2262

    accuracy                           0.44      8800
   macro avg       0.44      0.44      0.44      8800
weighted avg       0.44      0.44      0.44      8800
               precision    recall  f1-score   support

           1       0.64      0.65      0.64      2157
           2       0.66      0.66      0.66      2274
           3       0.66      0.65      0.65      2144
           4       0.64      0.64      0.64      2225

    accuracy                           0.65      8800
   macro avg       0.65      0.65      0.65      8800
weighted avg       0.65      0.65      0.65      8800
               precision    recall  f1-score   support

           1       0.71      0.72      0.71      2134
           2       0.

## PCA on stacked band powers for SVM classifier

In [34]:
modelsvmstack=modelsvm192
F1_SVM5=pca_modeling(data_stack,5,modelsvmstack)
F1_SVM16=pca_modeling(data_stack,16,modelsvmstack)
F1_SVM32=pca_modeling(data_stack,32,modelsvmstack)
F1_SVM50=pca_modeling(data_stack,50,modelsvmstack)
F1_SVM64=pca_modeling(data_stack,64,modelsvmstack)
F1_SVM128=pca_modeling(data_stack,128,modelsvmstack)
F1_SVM192=pca_modeling(data_stack,192,modelsvmstack)
print(F1_SVM5,F1_SVM16,F1_SVM32,F1_SVM50,F1_SVM64,F1_SVM128,F1_SVM192)

              precision    recall  f1-score   support

           1       0.48      0.34      0.40      2243
           2       0.42      0.49      0.45      2210
           3       0.43      0.49      0.46      2225
           4       0.42      0.43      0.42      2122

    accuracy                           0.44      8800
   macro avg       0.44      0.44      0.43      8800
weighted avg       0.44      0.44      0.43      8800
               precision    recall  f1-score   support

           1       0.69      0.65      0.67      2171
           2       0.70      0.71      0.71      2237
           3       0.70      0.73      0.71      2191
           4       0.68      0.68      0.68      2201

    accuracy                           0.69      8800
   macro avg       0.69      0.69      0.69      8800
weighted avg       0.69      0.69      0.69      8800
               precision    recall  f1-score   support

           1       0.82      0.82      0.82      2230
           2       0.

## PCA on stacked band for KNN 

In [35]:
modelknnstack=modelknn_192
F1_SVM5=pca_modeling(data_stack,5,modelknnstack)
F1_SVM16=pca_modeling(data_stack,16,modelknnstack)
F1_SVM32=pca_modeling(data_stack,32,modelknnstack)
F1_SVM50=pca_modeling(data_stack,50,modelknnstack)
F1_SVM64=pca_modeling(data_stack,64,modelknnstack)
F1_SVM128=pca_modeling(data_stack,128,modelknnstack)
F1_SVM192=pca_modeling(data_stack,192,modelknnstack)
print(F1_SVM5,F1_SVM16,F1_SVM32,F1_SVM50,F1_SVM64,F1_SVM128,F1_SVM192)

              precision    recall  f1-score   support

           1       0.44      0.49      0.47      2217
           2       0.47      0.50      0.49      2189
           3       0.51      0.50      0.51      2175
           4       0.48      0.41      0.44      2219

    accuracy                           0.47      8800
   macro avg       0.48      0.47      0.47      8800
weighted avg       0.48      0.47      0.47      8800
               precision    recall  f1-score   support

           1       0.62      0.72      0.67      2149
           2       0.67      0.67      0.67      2193
           3       0.74      0.70      0.72      2246
           4       0.70      0.62      0.66      2212

    accuracy                           0.68      8800
   macro avg       0.68      0.68      0.68      8800
weighted avg       0.68      0.68      0.68      8800
               precision    recall  f1-score   support

           1       0.73      0.79      0.76      2268
           2       0.

## PCA on stacked band for Random Forest

In [36]:
modelrfstack=modelrf_192
F1_SVM5=pca_modeling(data_stack,5,modelrfstack)
F1_SVM16=pca_modeling(data_stack,16,modelrfstack)
F1_SVM32=pca_modeling(data_stack,32,modelrfstack)
F1_SVM50=pca_modeling(data_stack,50,modelrfstack)
F1_SVM64=pca_modeling(data_stack,64,modelrfstack)
F1_SVM128=pca_modeling(data_stack,128,modelrfstack)
F1_SVM192=pca_modeling(data_stack,192,modelrfstack)
print(F1_SVM5,F1_SVM16,F1_SVM32,F1_SVM50,F1_SVM64,F1_SVM128,F1_SVM192)

              precision    recall  f1-score   support

           1       0.47      0.46      0.46      2202
           2       0.49      0.50      0.50      2215
           3       0.53      0.52      0.52      2171
           4       0.47      0.47      0.47      2212

    accuracy                           0.49      8800
   macro avg       0.49      0.49      0.49      8800
weighted avg       0.49      0.49      0.49      8800
               precision    recall  f1-score   support

           1       0.69      0.67      0.68      2152
           2       0.69      0.71      0.70      2220
           3       0.73      0.73      0.73      2263
           4       0.68      0.69      0.69      2165

    accuracy                           0.70      8800
   macro avg       0.70      0.70      0.70      8800
weighted avg       0.70      0.70      0.70      8800
               precision    recall  f1-score   support

           1       0.77      0.76      0.77      2128
           2       0.