# MANU 465 Capstone Project

The following is the code for MANU 465 EEG Group #9 Capstone Project. The code is written in Python and uses the following libraries:

* Pandas
* NumPy
* Matplotlib
* Seaborn
* Scikit-learn
* TensorFlow (Keras)


## Objective

Our objective is to use machine learning to determine whether a person is left- or right-handed based on brainwave (EEG) data.

## Setup

### Import Libraries

In [174]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
import tensorflow.keras as kr
import seaborn as sns
import os

### Authors

In [175]:
d = {'Josiah': 7412148, 'Muyang Li': 000000, 'Sofi': 0, 'Marti': 0, 'Lisa': 0, 'Simon': 0}    # please add your name and student ID
Authors = pd.DataFrame(d.items(), columns=['Name', 'Student ID'])
Authors

Unnamed: 0,Name,Student ID
0,Josiah,7412148
1,Muyang Li,0
2,Sofi,0
3,Marti,0
4,Lisa,0
5,Simon,0


## Importing Raw Data

Get the current working directory and append the relative path of each data folder.


In [176]:
cwd = os.getcwd()
LHD_dir = os.path.join(cwd, "Data Collection", "Unprocessed Data", "Left Handed Dominant")
RHD_dir = os.path.join(cwd, "Data Collection", "Unprocessed Data", "Right Hand Dominant")

# Hold file locations
leftHandDominant    = []
rightHandDominant   = []

#Populate file location arrays
for file in os.listdir(LHD_dir):
    if file.endswith('.csv'):
        leftHandDominant.append(os.path.join(LHD_dir, file))
for file in os.listdir(RHD_dir):
    if file.endswith('.csv'):
        rightHandDominant.append(os.path.join(RHD_dir, file))
            
#Test reading files by changing num
num = 9
sample = pd.read_csv(leftHandDominant[num])
sample.head()

Unnamed: 0,TimeStamp,Delta_TP9,Delta_AF7,Delta_AF8,Delta_TP10,Theta_TP9,Theta_AF7,Theta_AF8,Theta_TP10,Alpha_TP9,...,HSI_AF7,HSI_AF8,HSI_TP10,Battery,Elements,Participant,Test,Gender,English,Dominance
0,14:57.0,1.056179,1.324996,1.302308,0.964805,0.591862,0.955451,0.752519,1.228747,0.730552,...,1.0,1.0,2.0,50.0,,215.0,LHC,Male,No,Left
1,14:57.6,,,,,,,,,,...,,,,,/muse/elements/blink,215.0,LHC,Male,No,Left
2,14:58.0,1.056179,1.324996,1.252481,0.964805,0.591862,0.955451,0.880058,1.228747,0.730552,...,1.0,1.0,2.0,50.0,,215.0,LHC,Male,No,Left
3,14:59.1,1.056179,0.859274,1.064056,0.964805,0.591862,0.56391,0.899856,1.228747,0.730552,...,1.0,1.0,2.0,50.0,,215.0,LHC,Male,No,Left
4,15:00.0,,,,,,,,,,...,,,,,/muse/elements/blink,215.0,LHC,Male,No,Left
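
The directory scan above can also be written with `pathlib`. The following is a minimal sketch, assuming the same folder layout as in the cell above. The helper name `list_csv_files` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def list_csv_files(directory):
    """Return the sorted paths of all .csv files directly inside `directory`."""
    return sorted(
        str(p) for p in Path(directory).iterdir()
        if p.is_file() and p.suffix == ".csv"
    )


# Folder names copied from the cell above; adjust them if your layout differs.
base = Path.cwd() / "Data Collection" / "Unprocessed Data"
for name in ("Left Handed Dominant", "Right Hand Dominant"):
    folder = base / name
    if folder.is_dir():  # skip silently when the data is not present
        print(name, len(list_csv_files(folder)))
```

Sorting makes the file order deterministic, whereas `os.listdir` returns entries in arbitrary order. An index such as `num = 9` may therefore select a different file than it does in the cell above.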


### Summary of Unprocessed Data

In [177]:
#Mini-Summary of Block
print(f"> {len(leftHandDominant)} files were added from the LHD category")
print(f"> {len(rightHandDominant)} files were added from the RHD category\n")

> 72 files were added from the LHD category
> 321 files were added from the RHD category



### MUSE Features

Features generated by the Muse 2 headband

In [178]:
pd.DataFrame(sample.columns[0:39]).T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,29,30,31,32,33,34,35,36,37,38
0,TimeStamp,Delta_TP9,Delta_AF7,Delta_AF8,Delta_TP10,Theta_TP9,Theta_AF7,Theta_AF8,Theta_TP10,Alpha_TP9,...,Gyro_X,Gyro_Y,Gyro_Z,HeadBandOn,HSI_TP9,HSI_AF7,HSI_AF8,HSI_TP10,Battery,Elements


Features added from our data collection

In [179]:
pd.DataFrame(sample.columns[39:]).T

Unnamed: 0,0,1,2,3,4
0,Participant,Test,Gender,English,Dominance


### Raw Data Summary

In [180]:
sample.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 44 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   TimeStamp        19 non-null     object 
 1   Delta_TP9        16 non-null     float64
 2   Delta_AF7        16 non-null     float64
 3   Delta_AF8        16 non-null     float64
 4   Delta_TP10       16 non-null     float64
 5   Theta_TP9        16 non-null     float64
 6   Theta_AF7        16 non-null     float64
 7   Theta_AF8        16 non-null     float64
 8   Theta_TP10       16 non-null     float64
 9   Alpha_TP9        16 non-null     float64
 10  Alpha_AF7        16 non-null     float64
 11  Alpha_AF8        16 non-null     float64
 12  Alpha_TP10       16 non-null     float64
 13  Beta_TP9         16 non-null     float64
 14  Beta_AF7         16 non-null     float64
 15  Beta_AF8         16 non-null     float64
 16  Beta_TP10        16 non-null     float64
 17  Gamma_TP9        1

## Data Preprocessing

Here we process the data to make it more usable for our machine learning models and to pass it to Jordan Bird's code, 'EEG_Feature_Extraction.py'.

### Raw Dataset

Extract the TimeStamp column plus columns 21-25 from all files. Columns 21-25 (RAW_TP9, RAW_AF7, RAW_AF8, RAW_TP10, and AUX_RIGHT) are the only 5 features relevant to the EEG_feature_extraction function.

In [181]:
pd.DataFrame(sample.columns[21:25]).T

Unnamed: 0,0,1,2,3
0,RAW_TP9,RAW_AF7,RAW_AF8,RAW_TP10


In [182]:
rowsLHD = []
rowsRHD = []

# Read each file once, then collect TimeStamp (column 0) and columns 21-25 row by row
for f in leftHandDominant:
    df = pd.read_csv(f, encoding = "ISO-8859-1")
    for r in range(df.shape[0]):
        rowsLHD.append(df.iloc[r, [0, 21, 22, 23, 24, 25]])

for f in rightHandDominant:
    df = pd.read_csv(f, encoding = "ISO-8859-1")
    for r in range(df.shape[0]):
        rowsRHD.append(df.iloc[r, [0, 21, 22, 23, 24, 25]])
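
The row-by-row extraction above can also be done one file at a time. The following is a minimal sketch, assuming every file contains the six column names shown in the notebook output. The helper name `load_raw_columns` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Column names as displayed in the notebook output (TimeStamp plus columns 21-25).
RAW_COLUMNS = ["TimeStamp", "RAW_TP9", "RAW_AF7", "RAW_AF8", "RAW_TP10", "AUX_RIGHT"]


def load_raw_columns(paths):
    """Read each CSV once, keep only RAW_COLUMNS, and stack the results."""
    frames = [
        # usecols keeps the file's column order, so reorder explicitly afterwards.
        pd.read_csv(path, encoding="ISO-8859-1", usecols=RAW_COLUMNS)[RAW_COLUMNS]
        for path in paths
    ]
    if not frames:
        return pd.DataFrame(columns=RAW_COLUMNS)
    # Keep each file's own row index, as the row-by-row loop does.
    return pd.concat(frames)
```

With this helper, `data_LHD = load_raw_columns(leftHandDominant)` would produce the six-column frame directly, without the intermediate list of rows.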

### Convert to Dataframe

For the LHD and RHD datasets, we need to convert the data to a dataframe so that we can use the EEG_feature_extraction function.

For the LHD

In [242]:
# keep only the first 6 columns

data_LHD = pd.DataFrame(rowsLHD)
data_LHD = data_LHD.iloc[:, 0:6]
original_data_LHD = data_LHD.copy()
data_LHD

Unnamed: 0,TimeStamp,RAW_TP9,RAW_AF7,RAW_AF8,RAW_TP10,AUX_RIGHT
0,41:40.0,1297.033,1120.1465,1171.7216,1573.4432,768.3883
1,41:41.1,213.95604,913.0403,732.1245,65.27473,672.49084
2,41:42.1,988.7912,672.49084,1021.4286,1385.2748,757.1062
3,41:43.1,0.0,616.8865,1621.7949,1129.8169,843.3333
4,41:44.1,120.87912,0.0,0.0,0.0,399.70697
...,...,...,...,...,...,...
40,,768.791209,782.893773,683.369963,,
41,,,,,,
42,,,,,,
43,,,,,,


And for the RHD

In [184]:
data_RHD = pd.DataFrame(rowsRHD)
data_RHD = data_RHD.iloc[:, 0:6]
original_data_RHD = data_RHD.copy()
data_RHD

Unnamed: 0,TimeStamp,RAW_TP9,RAW_AF7,RAW_AF8,RAW_TP10,AUX_RIGHT
0,35:45.4,886.4469,493.99268,1622.1978,0.0,983.1502
1,35:46.4,784.5055,770.40295,1622.1978,741.39197,836.4835
2,35:47.4,814.7253,950.9158,602.381,151.90475,781.685
3,35:48.4,858.6447,833.663,212.34433,1466.6666,826.0073
4,35:49.4,819.9634,847.3626,466.1905,904.57874,743.4066
...,...,...,...,...,...,...
29,,,,,,
30,,826.4103,283.663,691.02563,,
31,,788.93774,864.2857,830.84247,,
32,,784.90845,860.2564,911.8315,,


Check dataframes

In [185]:
print(f"LHD Data size is: \t{data_LHD.shape}", f"\nRHD Data size is: \t\t{data_RHD.shape}")

LHD Data size is: 	(3273, 6) 
RHD Data size is: 		(13706, 6)


### Remove Empty Rows

Remove every row that contains at least one NaN value from the dataframes

In [186]:
data_LHD = data_LHD.dropna()
data_RHD = data_RHD.dropna()

Check dataframes again

In [187]:
print(f"LHD Data size is: \t{data_LHD.shape}", f"\nRHD Data size is: \t\t{data_RHD.shape}")

LHD Data size is: 	(1132, 6) 
RHD Data size is: 		(5498, 6)
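
With its default arguments, `dropna()` removes a row as soon as any of its six columns is missing. This is why the row counts fall from 3273 to 1132 (LHD) and from 13706 to 5498 (RHD). The small sketch below uses made-up values, not the project data, to show the difference between the default behaviour and `how="all"`.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values: a complete row, a partial row, an empty row.
toy = pd.DataFrame(
    {
        "TimeStamp": ["41:40.0", None, None],
        "RAW_TP9": [1297.033, 768.79, np.nan],
        "AUX_RIGHT": [768.39, np.nan, np.nan],
    }
)

complete = toy.dropna()            # default how="any": drop rows with any NaN
not_empty = toy.dropna(how="all")  # drop only rows where every value is NaN

print(len(toy), len(complete), len(not_empty))
```

Only the complete row survives the default call, while `how="all"` also keeps the partially filled row.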


### Convert TimeStamp Column to Datetime

Needed for compatibility with the EEG_feature_extraction function.

In [244]:
import datetime as dt

ind = 0
for time in data_LHD.iloc[:, 0]:
    # parse the 'MM:SS.f' string into a datetime object
    print(type(time))
    date_time = dt.datetime.strptime(str(time), '%M:%S.%f')
    print(type(date_time))
    data_LHD.iloc[ind, 0] = date_time
    ind += 1 
    
    
# ind = 0
# for time in data_RHD.iloc[:, 0]:
#     # convert to datetime object from string in format: 00:00.0
#     data_RHD.iloc[ind, 0] = dt.datetime.strptime(str(time), '%M:%S.%f')
#     ind += 1

<class 'datetime.datetime'>


ValueError: time data '1900-01-01 00:41:40' does not match format '%M:%S.%f'
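
The `ValueError` above shows that the value being parsed was already a `datetime`: its string form is `1900-01-01 00:41:40`, which cannot match `'%M:%S.%f'`. One likely cause is running the cell a second time on a column that was already converted. The following is a minimal sketch of a conversion that is safe to re-run, assuming every unconverted entry is a string in `MM:SS.f` form. The helper name `parse_timestamp` and the demo frame are hypothetical.

```python
import datetime as dt

import pandas as pd


def parse_timestamp(value):
    """Parse an 'MM:SS.f' string; return datetime values unchanged."""
    if isinstance(value, dt.datetime):
        return value  # already converted, e.g. the cell was run before
    return dt.datetime.strptime(str(value), "%M:%S.%f")


# Made-up frame: one unconverted string and one already converted value.
demo = pd.DataFrame(
    {"TimeStamp": ["41:40.0", dt.datetime(1900, 1, 1, 0, 41, 41, 100000)]}
)
demo["TimeStamp"] = demo["TimeStamp"].map(parse_timestamp)
print(demo["TimeStamp"].tolist())
```

Applied to the project data, this would read `data_LHD['TimeStamp'] = data_LHD['TimeStamp'].map(parse_timestamp)`, and likewise for `data_RHD`. Rows with a missing TimeStamp would still need to be dropped first.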

Quick check of the dataframes

In [189]:
data_LHD

Unnamed: 0,TimeStamp,RAW_TP9,RAW_AF7,RAW_AF8,RAW_TP10,AUX_RIGHT
0,41:40.0,1297.033,1120.1465,1171.7216,1573.4432,768.3883
1,41:41.1,213.95604,913.0403,732.1245,65.27473,672.49084
2,41:42.1,988.7912,672.49084,1021.4286,1385.2748,757.1062
3,41:43.1,0.0,616.8865,1621.7949,1129.8169,843.3333
4,41:44.1,120.87912,0.0,0.0,0.0,399.70697
...,...,...,...,...,...,...
16,20:52.6,809.48718,752.673993,813.113553,814.725275,772.820513
18,20:53.6,820.769231,766.776557,894.908425,761.538462,830.43956
21,20:54.6,778.864469,782.087912,640.25641,808.681319,729.304029
23,20:55.6,835.274725,858.241758,819.56044,881.611722,626.959707
