# Project Description
This notebook contains the code for the Group 1 MANU465 Capstone Project. All of the code and data used throughout the project are available at [this GitHub repository](https://github.com/AlbertoMussali-UBC/MANU465_Team1_Project).
## Authors

|       Name      | Student ID |
|:---------------:|:----------:|
|   Anant Goyal   |  46894325  |
| Alberto Mussali |  50684182  |
|    Musa Habib   |  25899808  |
| Sadul Bombuwala |  76343292  |

## Overview
### Motivation
Possibly the most incredible aspect of Machine Learning and Artificial Intelligence is the ability to mimic human-like decision making. Upon learning how Artificial Neural Networks are constructed and how they work, we became intrigued by the workings of the neural networks within our brains. We see this project as a means to further study and understand the complexities of our minds. Collecting and analyzing brainwave data is something we have never had the opportunity to do, and we hope that over the course of this project we will gain a valuable understanding of how human brainwave data can be used for research in the world of Artificial Intelligence.

### Goals and Objective
**Objective: To be able to use machine learning models to predict whether or not a person is fatigued, based on brainwave data.**

The Muse 2 is a multi-sensor meditation device that provides real-time feedback on brain activity, heart rate, breathing, and body movements to help users build a consistent meditation practice. When paired with the Mind Monitor phone application, one can view and analyze the neural oscillation readings picked up by the Muse 2, and use this headband for purposes beyond meditation.

> [https://choosemuse.com/muse-2/](https://choosemuse.com/muse-2/)  
> [https://mind-monitor.com/](https://mind-monitor.com/)

Our project aims to use these tools to collect data on the variations in brain activity when an individual is in a sleep-deprived state of mind, and to compare this to when they are well rested. Using this data, we plan to build a machine learning model that accurately predicts the level of sleep deprivation of an individual, based on the brainwave data passed into the model.

## Challenges
Because only a small amount of data was compiled, all of the trained models are observed to overfit the dataset. This can be remedied in the future by collecting more data over a longer timeframe (approximately 1 year) from multiple subjects in both states.

# General Setup

## Fix Random State

In [1]:
SEED = 55;

## Importing Libraries

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
import tensorflow.keras as kr
import seaborn as sns
import os

# Importing the Raw Data

In [3]:
%%time


# Get Current Working directory and append the data relative dir
cwd = os.getcwd()
notTiredDir = cwd + r"\Data\Raw\NotTired"
tiredDir = cwd + r"\Data\Raw\Tired"

# Hold file locations
filesTired=[];
filesNotTired=[];

#Populate file location arrays
for file in os.listdir(notTiredDir):
    if file.endswith('.csv'):
        filesNotTired.append(os.path.join(notTiredDir, file))
for file in os.listdir(tiredDir):
    if file.endswith('.csv'):
        filesTired.append(os.path.join(tiredDir, file))

#Test reading files by changing num
num=6;
sample = pd.read_csv(filesNotTired[num])
sample 

Wall time: 11.9 ms


Unnamed: 0,TimeStamp,Delta_TP9,Delta_AF7,Delta_AF8,Delta_TP10,Theta_TP9,Theta_AF7,Theta_AF8,Theta_TP10,Alpha_TP9,...,Gyro_X,Gyro_Y,Gyro_Z,HeadBandOn,HSI_TP9,HSI_AF7,HSI_AF8,HSI_TP10,Battery,Elements
0,2021-11-01 17:54:38.045,1.110457,-0.382196,0.082630,0.743808,0.455723,-0.523256,0.086015,0.487615,0.493558,...,4.134674,-5.824432,-1.510315,1.0,1.0,2.0,1.0,1.0,70.0,
1,2021-11-01 17:54:39.045,0.904642,-0.382196,0.236881,0.613098,0.313527,-0.523256,0.171247,0.546970,0.538756,...,4.329071,-2.990723,-1.644897,1.0,1.0,2.0,1.0,1.0,70.0,
2,2021-11-01 17:54:39.187,,,,,,,,,,...,,,,,,,,,,/muse/elements/jaw_clench
3,2021-11-01 17:54:40.045,0.652124,-0.382196,0.462323,0.410327,0.293693,-0.523256,0.267178,0.466408,0.343593,...,5.622559,-5.099182,-0.732727,1.0,1.0,2.0,1.0,1.0,70.0,
4,2021-11-01 17:54:41.043,0.558608,-0.382196,0.502156,0.877835,0.281408,-0.523256,0.337400,0.469669,0.381862,...,4.882355,-3.536530,-1.652374,1.0,1.0,2.0,1.0,1.0,70.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
156,2021-11-01 17:56:57.042,1.011459,-0.382196,0.502156,0.955036,0.456557,-0.523256,0.337400,0.439388,0.575279,...,5.510406,-7.880554,-2.257996,1.0,1.0,4.0,2.0,1.0,70.0,
157,2021-11-01 17:56:57.111,,,,,,,,,,...,,,,,,,,,,/muse/elements/blink
158,2021-11-01 17:56:57.852,,,,,,,,,,...,,,,,,,,,,/muse/elements/blink
159,2021-11-01 17:56:58.042,1.011459,-0.382196,0.502156,0.955036,0.456557,-0.523256,0.337400,0.439388,0.575279,...,4.844971,-6.190796,-2.781372,1.0,1.0,4.0,4.0,1.0,70.0,


In [4]:
#Mini-Summary of Block
print(f"> {len(filesNotTired)} files were added from the NOT TIRED category")
print(f"> {len(filesTired)} files were added from the TIRED category\n")

> 31 files were added from the NOT TIRED category
> 20 files were added from the TIRED category
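
`os.listdir` returns entries in arbitrary order, so an index such as `num` can point at a different file on another machine. The following is a minimal sketch of a sorted, platform-independent listing. `list_csv_files` is a hypothetical helper, and the temporary folder only stands in for the project's `Data` directory.

```python
from pathlib import Path
import tempfile

def list_csv_files(folder):
    """Return the .csv files in `folder` as strings, in sorted order."""
    return sorted(str(p) for p in Path(folder).glob("*.csv"))

# Build a throwaway folder that mimics the Data/Raw/NotTired layout.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "Data" / "Raw" / "NotTired"
    raw.mkdir(parents=True)
    (raw / "b.csv").write_text("x\n1\n")
    (raw / "a.csv").write_text("x\n2\n")
    (raw / "notes.txt").write_text("ignored")

    found = list_csv_files(raw)
    names = [Path(f).name for f in found]

print(names)
```

Building paths with `Path(...) / "Data" / "Raw"` avoids hard-coding the Windows `\` separator.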



## Available Features

In [5]:
print("Features generated by the Muse 2 headband:")

pd.DataFrame(sample.columns).T

Features generated by the Muse 2 headband:


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,29,30,31,32,33,34,35,36,37,38
0,TimeStamp,Delta_TP9,Delta_AF7,Delta_AF8,Delta_TP10,Theta_TP9,Theta_AF7,Theta_AF8,Theta_TP10,Alpha_TP9,...,Gyro_X,Gyro_Y,Gyro_Z,HeadBandOn,HSI_TP9,HSI_AF7,HSI_AF8,HSI_TP10,Battery,Elements


## Raw Data Structure

In [6]:
#quick view of data
sample.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 161 entries, 0 to 160
Data columns (total 39 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   TimeStamp        161 non-null    object 
 1   Delta_TP9        141 non-null    float64
 2   Delta_AF7        141 non-null    float64
 3   Delta_AF8        141 non-null    float64
 4   Delta_TP10       141 non-null    float64
 5   Theta_TP9        141 non-null    float64
 6   Theta_AF7        141 non-null    float64
 7   Theta_AF8        141 non-null    float64
 8   Theta_TP10       141 non-null    float64
 9   Alpha_TP9        141 non-null    float64
 10  Alpha_AF7        141 non-null    float64
 11  Alpha_AF8        141 non-null    float64
 12  Alpha_TP10       141 non-null    float64
 13  Beta_TP9         141 non-null    float64
 14  Beta_AF7         141 non-null    float64
 15  Beta_AF8         141 non-null    float64
 16  Beta_TP10        141 non-null    float64
 17  Gamma_TP9       

# Data Preprocessing

Note: Initial preprocessing here was done to tailor the data so it could be passed to Jordan Bird's function 'EEG_feature_extraction' (https://github.com/AlbertoMussali-UBC/MANU465_Team1_Project)

## Creating the RAW Dataset

In [None]:
%%time
## Extract columns 21-25 (plus the TimeStamp in column 0) from all files;
## these are the only 5 features relevant for use in the EEG_feature_extraction function.

rowsTired=[];
for f in filesTired:
    df = pd.read_csv(f)  #read each file once instead of once per row
    for r in range(df.shape[0]):
        rowsTired.append(df.iloc[r,[0, 21,22,23,24,25]])

rowsNotTired=[];
for f in filesNotTired:
    df = pd.read_csv(f)  #read each file once instead of once per row
    for r in range(df.shape[0]):
        rowsNotTired.append(df.iloc[r,[0, 21,22,23,24,25]])
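
The per-row loop above can also be written as one positional column selection per file. This is a minimal sketch, not the code the project ran. `extract_columns` is a hypothetical helper, and the in-memory CSV strings stand in for Mind Monitor recordings.

```python
import io
import pandas as pd

def extract_columns(sources, cols):
    """Read each CSV once and keep only the positional columns in `cols`."""
    frames = [pd.read_csv(src).iloc[:, cols] for src in sources]
    return pd.concat(frames, ignore_index=True)

# Two tiny stand-in "recordings" (made-up values, three columns each).
csv_a = (
    "TimeStamp,A,B\n"
    "2021-11-01 17:54:38.045,1.0,2.0\n"
    "2021-11-01 17:54:39.045,3.0,4.0\n"
)
csv_b = "TimeStamp,A,B\n2021-11-01 17:55:00.000,5.0,6.0\n"

demo = extract_columns([io.StringIO(csv_a), io.StringIO(csv_b)], cols=[0, 2])
print(demo)
```

In this notebook the equivalent call would be `extract_columns(filesTired, [0, 21, 22, 23, 24, 25])`, which returns a DataFrame directly and makes the later `pd.DataFrame(rowsTired)` conversion unnecessary.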



In [None]:
#Convert to DataFrames:

data_NT = pd.DataFrame(rowsNotTired);
original_NT = data_NT.copy();
data_NT

In [None]:
data_T = pd.DataFrame(rowsTired);
original_T = data_T.copy();
data_T

In [None]:
#quick check of DataFrames

print(f"Not Tired Data size is: \t{data_NT.shape}", f"\nTired Data size is: \t\t{data_T.shape}")

## Remove Empty Rows

Remove NaN values associated with blinking and jaw clenching

In [None]:
data_T = data_T.dropna()

In [None]:
data_NT = data_NT.dropna()

## Convert Datetime Column to Timestamps
Required for compatibility with EEG_feature_extraction function

In [None]:
from datetime import datetime

ind = 0;
for time in data_T.iloc[:, 0]:
    tmstmp = datetime.strptime(str(time), '%Y-%m-%d %H:%M:%S.%f').timestamp()
    data_T.iat[ind, 0] = (tmstmp);
    ind=ind+1;
    
ind = 0;
for time in data_NT.iloc[:, 0]:
    tmstmp = datetime.strptime(str(time), '%Y-%m-%d %H:%M:%S.%f').timestamp()
    data_NT.iat[ind, 0] = (tmstmp);
    ind=ind+1;
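
The same conversion can be sketched without an explicit loop by using `pd.to_datetime`. One caveat: `datetime.timestamp()` interprets naive datetimes in the machine's local time zone, whereas the sketch below treats them as UTC. The absolute values can therefore differ by a constant offset, while the spacing between samples is unchanged. The two timestamps are copied from the sample output shown earlier.

```python
import pandas as pd

stamps = pd.Series(["2021-11-01 17:54:38.045", "2021-11-01 17:54:39.045"])

# Parse all strings at once using the same format as the loop above.
parsed = pd.to_datetime(stamps, format="%Y-%m-%d %H:%M:%S.%f")

# Seconds since the Unix epoch, treating the naive timestamps as UTC.
epoch_seconds = (parsed - pd.Timestamp("1970-01-01")) / pd.Timedelta(seconds=1)

# Spacing between the two consecutive samples, in seconds.
spacing = epoch_seconds.diff().iloc[1]
print(spacing)
```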



In [None]:
#quick check 
data_NT.head()

## Save Data to File

**Alternative STARTING POINT once data collection is finalized**

Note: this step was done to skip Section 3.1 which would take very long to run each time

In [None]:
savelocT = cwd + r"\Data\Preprocessed\Tired.csv"
savelocNT = cwd + r"\Data\Preprocessed\NotTired.csv"

if os.path.exists(savelocT):
    os.remove(savelocT)
    
if os.path.exists(savelocNT):
    os.remove(savelocNT)

data_T.to_csv(savelocT,  mode='w', index = False)
data_NT.to_csv(savelocNT,mode='w', index = False)


## EEG Feature Generation
Execution of the function

In [None]:
from eegFG import EEG_feature_extraction as FG

#tried various combinations of Nsamp and Perio
#This combination was optimal
Nsamp = 50;
Perio = 6;

xT, yT = FG.generate_feature_vectors_from_samples(file_path=savelocT,
                                         nsamples=Nsamp, 
                                         period=Perio,
                                         #state=data_NT.iloc[:,-1],
                                         slide_percent=0.05,
                                         remove_redundant=False, 
                                         cols_to_ignore=None)
xT.shape

In [None]:
Nsamp = 50;
Perio = 5;

xNT, yNT = FG.generate_feature_vectors_from_samples(file_path=savelocNT,
                                         nsamples=Nsamp, 
                                         period=Perio,
                                         #state=data_NT.iloc[:,-1],
                                         slide_percent=0.06,
                                         remove_redundant=False, 
                                         cols_to_ignore=None)
xNT.shape

> **The following code was used to optimize feature generation in Jordan Bird's method and can be ignored for now**

> ```python
> 
> %%time
> 
> from importlib import reload
> 
> flaggity=False
> 
> tmp_results=[]
> thresh = 95;
> for ns in range(50,256,1):
>     if (flaggity==True):
>         break;
>     for p in range(3,8):
>         
>         try:
>             reload(FG);
>             xT, yT = FG.generate_feature_vectors_from_samples(file_path=savelocT,
>                                  nsamples=ns, 
>                                  period=p,
>                                  #state=data_NT.iloc[:,-1],
>                                  slide_percent=0.01,
>                                  remove_redundant=False, 
>                                  cols_to_ignore=None)
>             
>             xNT, yNT = FG.generate_feature_vectors_from_samples(file_path=savelocNT,
>                                  nsamples=ns, 
>                                  period=p,
>                                  #state=data_NT.iloc[:,-1],
>                                  slide_percent=0.01,
>                                  remove_redundant=False, 
>                                  cols_to_ignore=None)
>         
>         except (UnboundLocalError):
>             continue;
>             
>         
>         if (xNT.shape[1] == xT.shape[1]):
>             print('Cols match!', xT.shape, xNT.shape)
>             if (xNT.shape[0] >= thresh and xT.shape[0] >= thresh):
>                 print('Thresh met.')
>                 tmp_results.append((ns,p,xNT.shape[0],xT.shape[0],xNT.shape[1]))
>                 flaggity=True;
>                 break;
>                 
>                 
> tmp_results
> 
> ```

In [None]:
#some quick checks

X_NT = pd.DataFrame(np.real(xNT))
X_NT.columns = np.hstack((['TimeStamp'], yNT))
X_NT.describe()

In [None]:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
    display(X_NT.head())

In [None]:
X_T = pd.DataFrame(np.real(xT))
X_T.columns = np.hstack((['TimeStamp'], yT))
X_T.describe()

In [None]:
# Drop TimeStamps as they are not needed anymore

X_T=X_T.iloc[:,1:];
X_NT=X_NT.iloc[:,1:];

## Attach Labels for Each Class

In [None]:
#Stack ones or zeros for each class [0 = NotTired, 1 = Tired]
X_T = pd.DataFrame(np.hstack((X_T.to_numpy(),   np.ones((X_T.shape[0], 1)))))
X_NT= pd.DataFrame(np.hstack((X_NT.to_numpy(), np.zeros((X_NT.shape[0], 1)))))

In [None]:
#Add label heading
X_T.columns  = np.hstack((yT, ['Target']))
X_NT.columns = np.hstack((yNT, ['Target']))

## Check Column Coherency

In [None]:
#Ensure Data is Coherent (same number of columns for X_T and X_NT)

print(X_T.shape[1], X_NT.shape[1])

if (X_T.shape[1] == X_NT.shape[1]):
    dataset = np.vstack((X_T, X_NT))
    dataset = pd.DataFrame(dataset)
    print('Columns are coherent')
else:
    print('NOT COHERENT')


## Randomize the Dataset

In [None]:
dataset.columns = np.hstack((yT, ['Target']))
dataset = dataset.sample(frac = 1, random_state = SEED).reset_index(drop=True)  #seeded so the shuffle is reproducible
dataset.head()

## Separating Input and Output

In [None]:
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

#Y labeled for plotting or result check
def label(n):
    if (n==0):
        return 'Not Tired'
    return 'Tired'

y_labeled = list(map(label, y));

## Splitting Dataset into the Training and Test Sets

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = SEED)
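
A row-level random split can place feature vectors from the same recording on both sides of the split. If windows from one recording resemble each other (an assumption that this notebook does not test), the test accuracy will look better than it would on a new recording. The sketch below shows a group-aware split with `GroupShuffleSplit`. The `groups` array is hypothetical: the preprocessing above concatenates all files, so a recording id per window would have to be tracked separately.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(55)
X_demo = rng.normal(size=(12, 4))        # 12 stand-in feature vectors
y_demo = np.array([0] * 6 + [1] * 6)     # 0 = Not Tired, 1 = Tired
groups = np.repeat(np.arange(6), 2)      # hypothetical recording id per window

# Hold out 2 whole recordings (groups) for testing.
splitter = GroupShuffleSplit(n_splits=1, test_size=2, random_state=55)
train_idx, test_idx = next(splitter.split(X_demo, y_demo, groups=groups))

print("train recordings:", sorted(set(groups[train_idx])))
print("test recordings: ", sorted(set(groups[test_idx])))
```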

## Feature Scaling
Note for future: see if this step is not needed as we rescale the PCs anyway

In [None]:
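from sklearn.preprocessing import StandardScaler
scX = StandardScaler();
scX.fit(X_train); #Fit to training data only
#Apply the same scaling to every split so PCA sees consistently scaled data
X_train = scX.transform(X_train)
X_test  = scX.transform(X_test)
x = scX.transform(x)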
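
Fitting the scaler on the training split only keeps test-set statistics out of the preprocessing. A minimal sketch on synthetic stand-in data (the arrays below are made up) shows the pattern, with every split transformed by the same fitted scaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(55)
train = rng.normal(loc=10.0, scale=3.0, size=(80, 5))
test = rng.normal(loc=10.0, scale=3.0, size=(20, 5))

scaler = StandardScaler().fit(train)   # mean and std come from the training split only
train_s = scaler.transform(train)
test_s = scaler.transform(test)        # reuses the training statistics

print(train_s.mean(axis=0).round(6))   # approximately zero for every feature
print(test_s.mean(axis=0).round(3))    # close to zero, but not exactly zero
```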

In [None]:
#quick view of scaled data
pd.DataFrame(x)

# Principal Component Analysis
## Calculate Principal Components

In [None]:
from sklearn.decomposition import PCA

information = 225; #15^2=225;
PrinCom=PCA(n_components=information, random_state = SEED)
PrinCom.fit(X_train)

#save as new variable for PCs so as to not tamper with old variable
Z_train = PrinCom.transform(X_train);
Z_test  = PrinCom.transform(X_test);
Z       = PrinCom.transform(x)

print('Train set shape = ',Z_train.shape, '\nTest set shape  = ',Z_test.shape)


pd.DataFrame(Z).describe() #Data No longer Standard
print(f"Using the first {Z.shape[1]} Principal Components explains {np.round(PrinCom.explained_variance_ratio_.sum() * 100,5)}% of the variance in the training data.")
pd.DataFrame(Z)
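
The notebook fixes the number of components at 225 so that each row can be reshaped into a 15x15 image. For comparison, the sketch below shows how the cumulative explained variance ratio can be used to see how many components a given variance target needs. The data is synthetic, and the 99% target is an illustrative assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(55)
# Stand-in data: 200 samples, 30 features driven by 5 latent factors plus noise.
latent = rng.normal(size=(200, 5))
data = latent @ rng.normal(size=(5, 30)) + 0.05 * rng.normal(size=(200, 30))

pca = PCA(random_state=55).fit(data)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the target.
target = 0.99
n_keep = int(np.searchsorted(cumulative, target) + 1)
print(n_keep, cumulative[n_keep - 1])
```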

## Scaling the Principal Components

In [None]:
## Ignore for now - Ask Ahmad later [Al&Mus]
# scZ = StandardScaler();
# scZ.fit(Z_train);
# Z = scZ.transform(Z)
# Z_train = scZ.transform(Z_train)
# Z_test = scZ.transform(Z_test)
# pd.DataFrame(Z).head()

scZ = StandardScaler();
scZ.fit(Z);
Z = scZ.transform(Z)
pd.DataFrame(Z).head()

# Image Creation

## Rescaling and Reshaping

In [None]:
## Scale all the PCA components to 0-255 (8-bit greyscale image range)


def gen_images(data):
    images=[];
    for r in range(0,data.shape[0]): #Cycle over rows
        pixels=[];
        mini=min(data[r,:])
        maxi=max(data[r,:])

        for c in range(0,225): #Cycle over cols
            curPixel = data[r,c]
            pixels.append((((curPixel - mini) / (maxi - mini)) * 255.9).astype(np.uint8))

        #once cols are done running add the image to the images[] array
        img = np.reshape(pixels, (15,15)); #reshape into a square image
        images.append(img)
        
    return images;
   
#Generate images from each data split 
all_images  = gen_images(Z)
x_train_img = gen_images(Z_train)
x_test_img  = gen_images(Z_test)

#Get number of rows in each data split
height_total = Z.shape[0];
height_train = Z_train.shape[0];
height_test =  Z_test.shape[0];

#Reshape into input shape for CNN models
#15,15,1 indicates a 15x15 pixel greyscale image
x_train_img = np.array(x_train_img).reshape(height_train,15,15,1)
x_test_img  = np.array(x_test_img).reshape(height_test,15,15,1)
all_images  = np.array(all_images).reshape(height_total,15,15,1)
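
The per-pixel loop in `gen_images` can also be sketched as a vectorized NumPy operation. `rows_to_images` is a hypothetical alternative, not the function used above, and it differs in two ways: it rounds to the nearest integer on a 0-255 scale instead of truncating after multiplying by 255.9, and it guards against a constant row, which would otherwise divide by zero.

```python
import numpy as np

def rows_to_images(data, side=15):
    """Min-max scale each row to 0-255 and reshape it to (side, side, 1)."""
    data = np.asarray(data, dtype=float)[:, : side * side]
    lo = data.min(axis=1, keepdims=True)
    hi = data.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)   # constant rows map to all zeros
    scaled = (data - lo) / span * 255.0
    return np.rint(scaled).astype(np.uint8).reshape(-1, side, side, 1)

# Stand-in for four rows of 225 principal components.
demo_images = rows_to_images(np.random.default_rng(55).normal(size=(4, 225)))
print(demo_images.shape, demo_images.dtype)
```

The returned array already has the `(n, 15, 15, 1)` shape that the CNN models expect, so no separate reshape step is needed.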

## Saving Images

In [None]:
import imageio

# Relative paths to saved folders
TiredImgFolder = cwd + r"\Data\GeneratedImages\Tired"
NotTiredImgFolder = cwd + r"\Data\GeneratedImages\Not Tired"

# Clear the folders
import glob

files = glob.glob(TiredImgFolder + r"\*")
for f in files:
    os.remove(f)

files = glob.glob(NotTiredImgFolder + r"\*")
for f in files:
    os.remove(f)   


ctr1=0;
ctr2=0;
for img in all_images.reshape(height_total,15,15):
    
    if (y[ctr1+ctr2] == 0): #Not Tired
        fstr = NotTiredImgFolder + r"\img_" + str(ctr1) + r".png"
        imageio.imwrite(fstr, img[:, :], dpi=(300,300))
        ctr1+=1; #Counter
    else:
        fstr = TiredImgFolder + r"\img_" + str(ctr2) + r".png"
        imageio.imwrite(fstr, img[:, :], dpi=(300,300))
        ctr2+=1;

## Image Feature Generation

### Datagen Definitions
Define an image data generator that applies random shear and zoom to the saved images. This produces additional image variants and so increases the amount of input available to the CNN.

In [None]:
from keras.preprocessing.image import ImageDataGenerator

#Relative path to the image folder
genimgsPath = cwd + r"\Data\GeneratedImages"

r  = 1       #rescale
sr = 0.2     #shear range
zr = 0.2     #zoom range
hf = False   #horizontal flip

ValidationSplit = 0.2

imageGenerator = ImageDataGenerator(rescale = r,
                                   shear_range = sr,
                                   zoom_range = zr,
                                   horizontal_flip = hf,
                                   validation_split=ValidationSplit)

### Generating Features

In [None]:
imgs_train = imageGenerator.flow_from_directory(genimgsPath,
                                                target_size = (15, 15),
                                                batch_size = 32,
                                                subset="training",        #creates training subset
                                                class_mode='binary',      #0/1 labels to match the single sigmoid output unit
                                                shuffle=True,
                                                color_mode="grayscale")

imgs_test = imageGenerator.flow_from_directory(genimgsPath,
                                               target_size = (15, 15),
                                               batch_size = 32,
                                               subset="validation",       #creates test subset
                                               class_mode='binary',       #0/1 labels to match the single sigmoid output unit
                                               shuffle=True,
                                               color_mode="grayscale")
print(imgs_test.class_indices)

## Image Examples

In [None]:
#change n to see a different data image
n = 10
sns.heatmap(all_images.reshape(height_total,15,15)[n], cmap='gray');
print(f'This image is for the: \"{y_labeled[n]}\" class.')

In [None]:
#nth row of data
n=252
sns.heatmap(all_images.reshape(height_total,15,15)[n], cmap='gray');
print(f'This image is for the: \"{y_labeled[n]}\" class.')

# Data Exploration

## General Correlation Matrix for Principal Components

In [None]:
corr_mat = pd.DataFrame(Z).corr(method='pearson');
#mask = np.triu(np.ones_like(corr_mat, dtype=bool));
plt.figure(dpi=300);
plt.subplots(figsize=(21,21));
plt.title("Pearson's R Correlation Matrix for the top 225 Principal Components", fontsize=20);
sns.heatmap(corr_mat, annot=False, lw=0, linecolor='white', cmap='inferno');
#print('Too many features to visualize at once!')

In [None]:
corr_mat = pd.DataFrame(Z[:,0:25]).corr(method='pearson');
mask = np.triu(np.ones_like(corr_mat, dtype=bool));
plt.figure(dpi=300);
plt.subplots(figsize=(10,8));
plt.title("Pearson's R Correlation Matrix for the top 25 Principal Components", fontsize=12);
sns.heatmap(corr_mat, annot=False, lw=0.2, linecolor='white', cmap='inferno', mask=mask);
#print('Too many features to visualize at once!')
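
Principal components are uncorrelated by construction on the data the PCA was fitted to. The off-diagonal structure in the heatmaps above therefore comes only from rows outside the fitting set. The sketch below checks this property on synthetic stand-in data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(55)
# Correlated stand-in features: random data mixed by a random matrix.
data = rng.normal(size=(300, 8)) @ rng.normal(size=(8, 8))

scores = PCA(n_components=5, random_state=55).fit_transform(data)
corr = np.corrcoef(scores, rowvar=False)

# Largest absolute correlation between two different components.
max_off_diag = np.abs(corr - np.eye(5)).max()
print(max_off_diag)
```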

## Plotting the Principal Components

In [None]:
p1=4;
p2=100;
ax1 = sns.scatterplot(x=Z[:,p1], y=Z[:,p2], hue=y_labeled);
ax1.set(title='Principal Components',
        ylabel=f'Principal Component {p2}',
        xlabel=f'Principal Component {p1}');

## PC Distributions

In [None]:
pc_title=[];
for i in range(1,26):  #25 titles, one per column of Z25
    pc_title.append(f'Principal Component {i}');

Z25 = Z[:,0:25]  

import warnings
warnings.filterwarnings('ignore')
with warnings.catch_warnings():      #Catch warnings in code section
    warnings.simplefilter("ignore")
    
    plt.subplots(figsize=(10,10));
    ax = plt.gca();
    pd.DataFrame(Z25).hist(bins=30, figsize=(1,1), grid=False, layout=(5,5), sharex=False, ax=ax, alpha=0.5);
    plt.tight_layout();

# ML Models

## Definitions

In [None]:
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.metrics import accuracy_score
from sklearn import model_selection
from sklearn.model_selection import StratifiedKFold

#Callbacks
from keras.callbacks import EarlyStopping
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10, restore_best_weights=True)

## Basic ANN Model

In [None]:
#array to hold model info (str: name, model: model, data_to_take: z/img)
models = []; 

In [None]:
def build_basicANN(optimizer='adam', epochs=100, batch_size=50, neurons=225):
    
    #Initializing ANN
    m= tf.keras.models.Sequential()
    
    #Add input layer
    m.add(tf.keras.layers.Dense(units=neurons, activation='relu'))
    
    #Add hidden layer
    m.add(tf.keras.layers.Dense(units=(neurons//2), activation='relu'))  #integer division keeps units an int
    
    #Add output layer
    m.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
    
    #Compiling ANN
    m.compile(optimizer = optimizer, loss = 'binary_crossentropy', metrics = ['accuracy'])
    
    #Return compiled, unfitted model
    return m;

In [None]:
%%time

#Build Model, Using defaults
## mANNBasic = build_basicANN() 
mANNBasic = (KerasClassifier(build_fn=build_basicANN, epochs=100, batch_size=50, optimizer='adam', verbose=0));

#Training ANN
hist_ANNBasic = mANNBasic.fit(Z_train, y_train, batch_size = 100, epochs = 100, verbose=0)

models.append(('ANN Basic', mANNBasic, 'z'))

In [None]:
print(f'Accuracy of the unoptimized Basic ANN model = {round(accuracy_score(y_true=y_test, y_pred=mANNBasic.predict(x=Z_test)) * 100,3)}%')
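
Accuracy alone can hide how each class is handled, and the file counts printed earlier (31 Not Tired, 20 Tired) suggest the classes may not be balanced. The sketch below shows a confusion matrix and a per-class recall on made-up labels. In the notebook, `y_test` and `mANNBasic.predict(Z_test)` would take the place of the demo arrays.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels: 0 = Not Tired, 1 = Tired.
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])
accuracy = np.trace(cm) / cm.sum()
recall_tired = cm[1, 1] / cm[1].sum()

print(cm)
print(f"accuracy = {accuracy:.2f}, recall (Tired) = {recall_tired:.2f}")
```

In this toy case the accuracy is 0.70 while only half of the Tired samples are detected.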

## Basic CNN Model
See report for info on how we defined "Basic" vs "Advanced" CNN

In [None]:
# Random-ish architecture

def build_basicCNN(optimizer='adam', epochs=100, batch_size=50, neurons=225):
    
    m = tf.keras.models.Sequential()
    m.add(tf.keras.layers.Conv2D(filters=neurons, kernel_size=3, activation='relu', input_shape=[15, 15, 1]))
    m.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
    m.add(tf.keras.layers.Conv2D(filters=neurons//2, kernel_size=3, activation='relu'))  #integer division keeps filters an int
    m.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
    m.add(tf.keras.layers.Flatten())
    m.add(tf.keras.layers.Dense(units=neurons, activation='relu'))
    m.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
    m.compile(optimizer = optimizer, loss = 'binary_crossentropy', metrics = ['accuracy'])

    return m;


In [None]:
%%time

#Build model using defaults
#mCNNBasic = build_basicCNN()
mCNNBasic = (KerasClassifier(build_fn=build_basicCNN, epochs=100, batch_size=50, optimizer='adam', verbose=0));

### ORIGINAL
hist_CNNBasic = mCNNBasic.fit(x=x_train_img,
                              y=y_train, 
                              batch_size = 50,
                              epochs = 100, 
                              verbose=0,
                              callbacks=es,
                              validation_data=(x_test_img, y_test))


models.append(('CNN Basic', mCNNBasic, 'img'))

In [None]:
print(f'Accuracy of the unoptimized Basic CNN model = {round(accuracy_score(y_true=y_test, y_pred=mCNNBasic.predict(x=x_test_img)) * 100,3)}%')

In [None]:
%%time
#### DATAGEN -- DOES NOT USE KERASCLASSIFIER DUE TO ERROR
mCNNBasic2 = build_basicCNN()
hist_CNNBasic2 = mCNNBasic2.fit(
                               x=imgs_train,
                               #y=y_train, 
                               batch_size = 50,
                               epochs = 100, 
                               verbose=0,
                               callbacks=es,
                               validation_data=imgs_test
                               )

## Advanced CNN Model

In [None]:
def build_advancedCNN(optimizer='adam', epochs=100, batch_size=50, neurons=225):
    #params
    initFilt = neurons;
    initUnits= neurons;
    
    #model
    m = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(filters=initFilt, kernel_size=3, activation='relu', input_shape=[15,15,1]),
        tf.keras.layers.Conv2D(filters=initFilt//2, kernel_size=3, activation='relu'),  #integer division keeps filters an int
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=initUnits),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=1, activation='sigmoid')
    ])

    m.compile(optimizer = optimizer, loss = 'binary_crossentropy', metrics = ['accuracy'])
    
    return m;

In [None]:
%%time

# build using defaults
#mCNNAdvanced = build_advancedCNN()
mCNNAdvanced = (KerasClassifier(build_fn=build_advancedCNN, epochs=100, batch_size=50, optimizer='adam', verbose=0));

#fit
hist_CNNAdvanced = mCNNAdvanced.fit(x_train_img,
                      y=y_train, 
                      batch_size = 50,
                      epochs = 100, 
                      verbose=0,
                      callbacks=es,
                      validation_data=(x_test_img, y_test))

models.append(('CNN Advanced', mCNNAdvanced, 'img'))

In [None]:
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR) #ignore warnings

print(f'Accuracy of the unoptimized Advanced CNN model = {round(accuracy_score(y_true=y_test, y_pred=mCNNAdvanced.predict(x=x_test_img)) * 100,3)}%')

In [None]:
%%time
#### DATAGEN -- DOES NOT USE KERASCLASSIFIER DUE TO ERROR
mCNNAdvanced2 = build_advancedCNN()
hist_CNNAdvanced2 = mCNNAdvanced2.fit(
                               x=imgs_train,
                               #y=y_train, 
                               batch_size = 50,
                               epochs = 100, 
                               verbose=0,
                               callbacks=es,
                               validation_data=imgs_test
                               )

## Random Forest Model

In [None]:
%%time
from sklearn.ensemble import RandomForestClassifier
RFCmodel = RandomForestClassifier(n_estimators=100, random_state=SEED); #N_estimators and criterion can be optimized.
RFCmodel.fit(Z_train, y_train);
models.append(('RF', RFCmodel, 'z'));

## Logistic Regression Model

In [None]:
from sklearn.linear_model import LogisticRegression
LRmodel = LogisticRegression(solver='newton-cg');
LRmodel.fit(Z_train, y_train);
models.append(('LR',LRmodel, 'z'));

# Performance Comparison
## Via K-Fold Cross-Validation
### For SKLearn Models

In [None]:
%%time
#Suppress warnings for non-convergent ANN models
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

# Number of splits to make.
N = 3;


CV_results = [];
scoring = 'accuracy';

trun=0;
for tp in models:
    
    #Check whether model uses Z dataset or images for training
    mode = tp[2];
    
    if (mode == 'z'):
        kfold = StratifiedKFold(n_splits=N, shuffle=True, random_state=SEED)
        #kfold = model_selection.KFold(n_splits=N);
        CVinternal_results = model_selection.cross_val_score(tp[1], Z, y, cv=kfold, scoring=scoring);
        CV_results.append((CVinternal_results));
        
    if (mode == 'img'):
        kfold = StratifiedKFold(n_splits=N, shuffle=True, random_state=SEED)
        #kfold = model_selection.KFold(n_splits=N);
        CVinternal_results = model_selection.cross_val_score(tp[1], all_images, y, cv=kfold, scoring=scoring);
        CV_results.append((CVinternal_results));    
    
    print(f'run#{trun} for model \"{tp[0]}\" returned {CVinternal_results}')
    trun+=1;
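
In the loop above, the scaler and the PCA were fitted before cross-validation, so the held-out rows of each fold have already influenced the preprocessing. One way to avoid this is to wrap the preprocessing and the classifier in a scikit-learn pipeline, so that every fold refits all steps on its own training rows. This is a minimal sketch on synthetic data. The component count and the choice of Logistic Regression are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(55)
X_demo = rng.normal(size=(90, 20))
y_demo = np.array([0] * 45 + [1] * 45)
X_demo[y_demo == 1] += 1.0               # shift class 1 so there is something to learn

# Scaler and PCA are refitted inside every fold, on that fold's training rows only.
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=5, random_state=55),
    LogisticRegression(max_iter=1000),
)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=55)
scores = cross_val_score(pipe, X_demo, y_demo, cv=cv, scoring="accuracy")

print(scores, scores.mean())
```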


### For Keras Models
$\color{red}{NOTE:}$ The following was not implemented because all of the developed ML models overfit the dataset. The results of K-Fold CV are of little use until more data is compiled.

In [None]:
names = [];
for tp in models:
    names.append(tp[0]);
    
CVdf = pd.DataFrame(CV_results).T;
CVdf.columns = names;
CVdf.T

ax2 = sns.boxplot(data=CVdf, palette='Spectral')
ax2.set(xlabel = "ML Algorithm",
       ylabel = 'Accuracy',
       title = f"ML Algorithm Accuracy Comparison \nfor Cross-Validation with {N} Splits");
sns.despine(ax=ax2,offset=5, trim=False)
ax2.plot();
plt.ylim(0.95,1);

# Conclusion

With the availability of sophisticated technology such as the Muse 2, it is valuable to understand how brainwave data is impacted by the level of fatigue experienced by a person. Our study aimed to determine whether a machine learning model can accurately predict if a person is fatigued or not simply by analyzing their brainwaves.

After collecting data for both fatigued and non-fatigued individuals, randomizing and scaling the data, and training five different machine learning models on the dataset, we found that all of our models (an Artificial Neural Network, two different Convolutional Neural Networks, a Random Forest, and a Logistic Regression) predicted whether a person was fatigued with an accuracy of 100% or very close to it. Although these results appear surprisingly promising, machine learning models tend to overfit small datasets, which can produce an accuracy of 100%, as is the case in our study. For this reason, it is difficult to compare the accuracy of one model to another.

Future research into the correlation between brain activity and state of fatigue should aim to apply data collected over the span of many months. Furthermore, while our research focused on fatigue resulting from sleep deprivation, this can be expanded to include a wide range of both physical and mental fatigue.