# Author: Makayla McKibben
## Course: DSC550 Data Mining
## Assignment: Term Project Milestone 3
## Date: 11.04.2024

Global awareness and acceptance of mental health conditions have increased rapidly in the last few years. Mental illness affects a great many people, yet schizophrenia remains poorly understood by the general public. Most television shows and movies portray schizophrenia in a dramatized way, implying that everyone with the condition is dangerous or violent and has no regard for others. Many people would be surprised by how far that is from the truth and by how many 'regular' people live with schizophrenia. Regardless, whether you are directly affected or have a spouse, relative, or friend who lives with a mental illness, its effects touch nearly everyone.

During the EDA phase of this project we will use several datasets that address the need for repeated testing. The datasets are EEG (electroencephalogram) recordings, some from healthy people and some from people with schizophrenia. There are subtle differences between the EEGs of those with and those without schizophrenia. The goal is to find a reliable, broadly applicable way to determine whether someone has schizophrenia with improved accuracy. Many current models are hand-tailored to their datasets and so do not perform well on new data; this project aims to develop a model that generalizes widely while maintaining high accuracy.

In [3]:
# Import relevant libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import string
from sklearn import preprocessing
import warnings
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.porter import PorterStemmer
from os import listdir
from os.path import isfile, join
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [4]:
!pip install modin[ray]



In [5]:
pip install -U ipywidgets

Note: you may need to restart the kernel to use updated packages.


In [6]:
# Remove future warning
warnings.simplefilter(action='ignore', category=FutureWarning)

The goal of this exercise is to find out whether patients with schizophrenia show specific EEG patterns, distinct from those of healthy participants, that can reliably indicate a schizophrenia spectrum illness.

In [8]:
# Create list of files for healthy batch 'b' datasets
healthy_lst_b = [f for f in listdir('EEG Schizophrenia/EEG Healthy/EEG Healthy B') if isfile(join('EEG Schizophrenia/EEG Healthy/EEG Healthy B', f))]

In [9]:
# Create list of files for schizophrenic batch 'b' datasets
schizo_lst_b = [f for f in listdir('EEG Schizophrenia/EEG Schizo/EEG Schizo B') if isfile(join('EEG Schizophrenia/EEG Schizo/EEG Schizo B', f))]

In [10]:
# Read each healthy b file into a DataFrame and collect them in a list
healthy_lst_dfs_b = []
for file in healthy_lst_b:
    path = join('EEG Schizophrenia/EEG Healthy/EEG Healthy B', file)
    # The files are whitespace-delimited with no header row
    temp_hp = pd.read_csv(path, header = None, sep = r'\s+')
    healthy_lst_dfs_b.append(temp_hp)

In [11]:
# Read each schizo b file into a DataFrame and collect them in a list
schizo_lst_dfs_b = []
for file in schizo_lst_b:
    path = join('EEG Schizophrenia/EEG Schizo/EEG Schizo B', file)
    # The files are whitespace-delimited with no header row
    temp_schizo = pd.read_csv(path, header = None, sep = r'\s+')
    schizo_lst_dfs_b.append(temp_schizo)

In [12]:
# Create list of files for healthy batch 'a' datasets
healthy_lst_a = [f for f in listdir('EEG Schizophrenia/EEG Healthy/EEG Healthy A') if isfile(join('EEG Schizophrenia/EEG Healthy/EEG Healthy A', f))]

In [13]:
# Create list of files for schizophrenic batch 'a' datasets
schizo_lst_a = [f for f in listdir('EEG Schizophrenia/EEG Schizo/EEG Schizo A') if isfile(join('EEG Schizophrenia/EEG Schizo/EEG Schizo A', f))]

In [14]:
# Read each healthy a file into a DataFrame and collect them in a list
healthy_lst_dfs_a = []
for file in healthy_lst_a:
    path = join('EEG Schizophrenia/EEG Healthy/EEG Healthy A', file)
    temp_hp = pd.read_csv(path)
    healthy_lst_dfs_a.append(temp_hp)
# Show the shape of the last file read
print(temp_hp.shape)


(216250, 19)


In [15]:
# Read each schizo a file into a DataFrame and collect them in a list
schizo_lst_dfs_a = []
for file in schizo_lst_a:
    path = join('EEG Schizophrenia/EEG Schizo/EEG Schizo A', file)
    temp_schiz = pd.read_csv(path)
    schizo_lst_dfs_a.append(temp_schiz)
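
The four loading cells above follow the same pattern, so they could be folded into one helper. The sketch below is one possible version, not the notebook's own code: `load_eeg_folder` is a hypothetical name, and it assumes every file in a folder can be read with the same `read_csv` options. It passes `sep=r'\s+'` for whitespace-delimited files, since recent pandas releases deprecate `delim_whitespace=True`.

```python
from pathlib import Path

import pandas as pd


def load_eeg_folder(folder, **read_csv_kwargs):
    """Read every regular file in `folder` into a list of DataFrames.

    Hypothetical helper: files are read in sorted name order so that
    repeated runs return the subjects in the same sequence.
    """
    paths = sorted(p for p in Path(folder).iterdir() if p.is_file())
    return [pd.read_csv(p, **read_csv_kwargs) for p in paths]


# Example calls mirroring the cells above (paths as used in this notebook):
# healthy_lst_dfs_b = load_eeg_folder(
#     'EEG Schizophrenia/EEG Healthy/EEG Healthy B', header=None, sep=r'\s+')
# healthy_lst_dfs_a = load_eeg_folder('EEG Schizophrenia/EEG Healthy/EEG Healthy A')
```

Sorting the paths is a design choice: `listdir` returns names in arbitrary order, so a fixed order makes it easier to match a row in a later summary table back to its file.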

In [16]:
# Create header list for b datasets
electrode_pos_b = ['F7', 'F3', 'F4', 'F8', 'T3', 'C3', 'Cz', 'C4', 'T4', 'T5', 'P3', 'Pz', 'P4', 'T6', 'O1', 'O2']

In [17]:
# Create empty lists
healthy_b = []
healthy_b_mean = []
scaler = preprocessing.StandardScaler()
# Scale each healthy b recording, reshape it to one column per electrode, and collect the results
for df in healthy_lst_dfs_b:
    # fit_transform refits the scaler on every file, so each recording is scaled independently
    scaled = scaler.fit_transform(df)
    reshaped = pd.DataFrame(scaled.reshape(7680, len(electrode_pos_b)), columns = electrode_pos_b)
    healthy_b.append(reshaped)
    # Find the mean of each electrode column and append to the flat list
    for column in reshaped:
        healthy_b_mean.append(reshaped[column].mean())

In [18]:
# Create empty lists
schizo_b = []
schizo_b_mean = []
# Create the scaler here so this cell does not depend on an earlier cell having run
scaler = preprocessing.StandardScaler()
# Scale each schizophrenic b recording, reshape it to one column per electrode, and collect the results
for df in schizo_lst_dfs_b:
    scaled = scaler.fit_transform(df)
    reshaped = pd.DataFrame(scaled.reshape(7680, len(electrode_pos_b)), columns = electrode_pos_b)
    schizo_b.append(reshaped)
    # Find the mean of each electrode column and append to the flat list
    for column in reshaped:
        schizo_b_mean.append(reshaped[column].mean())
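
How the single column in each b file should be folded into 16 electrode columns depends on how the file orders its samples, which this notebook does not state. `reshape(7680, len(electrode_pos_b))` fills row by row, so it is only correct if the file interleaves the channels (one value per electrode for each time step). If the file instead stores all 7680 samples of the first electrode, then all of the second, and so on (an assumption worth checking against the dataset's documentation), the data would need to be reshaped to `(16, 7680)` and transposed. A toy sketch with 3 channels and 4 samples, using made-up channel names, shows the difference:

```python
import numpy as np
import pandas as pd

n_channels, n_samples = 3, 4          # toy sizes standing in for 16 and 7680
channels = ['ch0', 'ch1', 'ch2']      # hypothetical channel names

# Channel-major column: all samples of ch0, then ch1, then ch2.
# Values are 10*channel + sample so the origin of each number is visible.
flat = np.array([10 * c + s for c in range(n_channels) for s in range(n_samples)])

# Row-by-row fill, as in the cells above: mixes channels within a column.
row_fill = pd.DataFrame(flat.reshape(n_samples, n_channels), columns=channels)

# Channel-major fill: one row per channel, then transpose to samples x channels.
channel_major = pd.DataFrame(flat.reshape(n_channels, n_samples).T, columns=channels)

print(row_fill['ch0'].tolist())       # [0, 3, 12, 21]
print(channel_major['ch0'].tolist())  # [0, 1, 2, 3]
```

Only the second layout keeps every value of `ch0` in the `ch0` column when the input is channel-major; with interleaved input the first layout is the right one.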

In [19]:
# Turn the flat list of means into a table with one row per subject (45 here) and one column per electrode
schizo_means_b = pd.DataFrame(schizo_b_mean)
schizo_means_b = pd.DataFrame(schizo_means_b.values.reshape(-1, len(electrode_pos_b)), columns = electrode_pos_b)
schizo_means_b

Unnamed: 0,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2
0,0.062508,0.05589,0.039356,0.031961,0.00734,-0.001046,-0.013607,-0.020776,-0.034788,-0.045534,-0.055916,-0.05144,-0.028383,-0.011717,0.024642,0.041508
1,-0.010461,-0.004568,0.005993,0.01622,0.048791,0.057951,0.052119,0.039004,0.021699,0.013961,-0.010793,-0.025108,-0.046094,-0.052717,-0.056956,-0.049041
2,0.007,0.007243,0.00122,-0.000867,-0.001182,0.001879,0.00734,0.010072,0.014667,0.012229,-0.00143,-0.008079,-0.014714,-0.016698,-0.012128,-0.006551
3,-0.005213,-0.003884,0.002828,0.006935,0.017678,0.020956,0.023642,0.017097,-0.006178,-0.011561,-0.010913,-0.006697,-0.00866,-0.011751,-0.012791,-0.011487
4,0.011827,0.010201,0.004216,0.000479,-0.005485,-0.007247,-0.005221,-0.005584,-0.010845,-0.008544,-0.001832,-0.001975,0.00185,0.000109,0.006985,0.011064
5,-0.024552,-0.013543,0.016129,0.03167,0.050851,0.052446,0.041801,0.032983,0.013317,0.004268,-0.017845,-0.029535,-0.041782,-0.042198,-0.037914,-0.036095
6,0.00113,0.010401,0.005249,0.001769,-0.003636,-0.004169,-0.003709,-0.00175,0.005295,0.003922,0.010547,0.009888,0.004522,-0.004322,-0.019072,-0.016066
7,-0.004819,-0.000434,0.000351,-0.002177,0.006785,0.019025,0.034672,0.039446,0.016157,0.000627,-0.021013,-0.024625,-0.018619,-0.015949,-0.013928,-0.0155
8,0.005854,0.010955,0.006176,0.003006,-0.000246,0.001309,0.00317,0.002277,0.003394,0.003271,0.004825,0.005452,-0.002585,-0.009679,-0.019442,-0.017737
9,0.008115,0.012181,0.019801,0.010837,-0.029893,-0.049808,-0.055766,-0.041334,0.005091,0.027401,0.055018,0.05038,0.014862,-0.005172,-0.013846,-0.007866


In [20]:
# Turn the flat list of means into a table with one row per subject (39 here) and one column per electrode
healthy_means_b = pd.DataFrame(healthy_b_mean)
healthy_means_b = pd.DataFrame(healthy_means_b.values.reshape(-1, len(electrode_pos_b)), columns = electrode_pos_b)
healthy_means_b

Unnamed: 0,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2
0,0.016793,0.011504,0.005642,0.002418,-0.00419,-0.011146,-0.02616,-0.03199,-0.026718,-0.015509,0.009407,0.011436,0.005287,0.00679,0.020797,0.025639
1,-0.05771,-0.078287,-0.087469,-0.069711,-0.028648,-0.020001,-0.004383,0.012903,0.045175,0.052449,0.058157,0.063719,0.065156,0.051194,0.007889,-0.010433
2,0.009372,0.009894,0.001414,-0.009085,-0.03193,-0.046348,-0.059883,-0.055121,-0.034367,-0.016168,0.027101,0.044945,0.057101,0.052658,0.031109,0.019308
3,-0.027958,-0.028439,-0.025369,-0.021895,-0.015201,-0.011558,-0.001286,0.006946,0.030256,0.037405,0.03878,0.035986,0.013734,0.002146,-0.013862,-0.019685
4,0.003996,0.006633,0.006587,0.006426,0.003427,0.004019,0.004377,0.002421,-0.002308,-0.004197,-0.01545,-0.013365,-0.007886,-0.00404,0.004033,0.005326
5,0.014804,0.045977,0.087394,0.096146,0.088983,0.0786,0.057506,0.045807,0.020053,0.00248,-0.04436,-0.07097,-0.112246,-0.120884,-0.104852,-0.084439
6,-0.026477,-0.028226,-0.010539,0.0028,0.017735,0.013145,0.002116,-0.007046,-0.018773,-0.013695,0.001905,0.008497,0.018623,0.025282,0.013135,0.001518
7,0.015399,0.023005,0.039137,0.046663,0.04726,0.033278,-0.013657,-0.031632,-0.041912,-0.037352,-0.014151,-0.007074,-0.012953,-0.018562,-0.016703,-0.010745
8,-0.022514,-0.019569,-0.002157,0.006349,0.000767,-0.010652,-0.032213,-0.030766,-0.006061,0.009606,0.027548,0.028712,0.022735,0.017845,0.00917,0.0012
9,0.004349,-0.004816,-0.014523,-0.016278,-0.023092,-0.024158,-0.017754,-0.012927,-0.002052,-0.000536,0.006999,0.011606,0.018726,0.024571,0.025861,0.024024
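
Building these tables from a flat list and a reshape works, but it relies on the means having been appended in exactly electrode order for every subject. One alternative, sketched below on toy frames with made-up values, is to collect one `Series` of column means per subject and let pandas assemble the table, so the row count and column labels follow the data.

```python
import pandas as pd

# Toy stand-ins for the per-subject DataFrames held in a list such as schizo_b
subjects = [
    pd.DataFrame({'F7': [1.0, 3.0], 'F3': [2.0, 4.0]}),
    pd.DataFrame({'F7': [0.0, 2.0], 'F3': [10.0, 20.0]}),
    pd.DataFrame({'F7': [5.0, 5.0], 'F3': [-1.0, 1.0]}),
]

# DataFrame.mean() returns one mean per column; each Series becomes a row
means = pd.DataFrame([df.mean() for df in subjects])
print(means.shape)            # (3, 2): one row per subject, one column per electrode
print(means['F7'].tolist())   # [2.0, 1.0, 5.0]
```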


In [21]:
# Create empty lists
healthy_a = []
healthy_a_mean = []
# Scale each healthy a recording and append it to the list
for df in healthy_lst_dfs_a:
    df = scaler.fit_transform(df)
    df = pd.DataFrame(df)
    healthy_a.append(df)
    # Find the mean of each scaled column and append to the flat list
    for column in df:
        healthy_a_mean.append(df[column].mean())
# Create a df for all the healthy a datasets
# Note: this concatenates the unscaled originals in healthy_lst_dfs_a and
# overwrites the list of scaled frames built above
healthy_a = pd.concat(healthy_lst_dfs_a)
healthy_a

Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,F4,C4,P4,F3,C3,P3,Fz,Cz,Pz
0,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09
1,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09
2,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09
3,4.612154e-07,4.612154e-07,3.083103e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-3.033103e-07
4,4.612154e-07,4.612154e-07,4.612154e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,-3.033103e-07,2.500000e-09,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-1.504051e-07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
216245,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09
216246,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09
216247,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09
216248,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09


In [22]:
# Create list of electrode positions
electrode_pos_a = healthy_a.columns.tolist()

In [23]:
# Turn the flat list of means into a table with one row per subject (14 here) and one column per electrode
healthy_means_a = pd.DataFrame(healthy_a_mean)
healthy_means_a = pd.DataFrame(healthy_means_a.values.reshape(-1, len(electrode_pos_a)), columns = electrode_pos_a)
healthy_means_a

Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,F4,C4,P4,F3,C3,P3,Fz,Cz,Pz
0,3.625688e-18,2.246083e-17,-1.7022300000000002e-17,-1.236344e-17,-3.0276800000000005e-17,9.724834e-18,-2.058654e-18,-5.77652e-18,-1.1499270000000001e-17,2.383583e-17,9.079584e-18,8.296067000000001e-18,2.68854e-18,1.8727600000000002e-17,1.0062820000000001e-17,-1.54399e-17,-1.516337e-17,-1.788263e-17,-2.3136810000000003e-17
1,-8.37035e-18,-5.9342030000000004e-18,-1.343004e-17,6.148928e-18,-5.559411e-18,3.685452e-18,-4.575583e-18,-1.013499e-17,7.003921e-18,-1.767768e-17,-1.022869e-17,-2.023876e-17,2.030122e-17,2.250312e-17,1.7396580000000002e-17,4.903526e-18,1.923931e-17,-3.101402e-17,-2.201902e-18
2,5.122154e-18,-3.3731260000000002e-18,-1.1243750000000002e-17,-7.121044e-18,6.0435170000000004e-18,-1.030677e-17,1.417962e-17,2.873404e-18,5.809272e-18,-5.715575e-18,6.933648e-18,8.37035e-18,6.996113e-18,2.139436e-18,5.559411e-18,-2.6360350000000003e-17,8.37035e-18,1.3554970000000001e-17,-2.623542e-17
3,-6.606127e-18,-2.08938e-18,-1.3903590000000002e-17,-7.374281e-19,3.3798789999999996e-19,4.516747e-18,5.684342e-18,8.818411e-18,8.61101e-18,9.571203e-18,-3.898383e-18,-1.2259740000000002e-17,1.5025100000000002e-17,7.343555e-18,-1.0846340000000001e-17,4.608926e-20,-8.618691e-18,-6.3603180000000004e-18,-1.013196e-17
4,6.255784e-18,-9.353600000000001e-18,2.168471e-17,1.418078e-17,5.571557e-18,5.173052e-18,-2.6196090000000004e-17,-2.5474270000000002e-17,3.970017e-18,-3.50384e-18,1.70831e-17,7.654312e-18,-6.6768460000000004e-18,-6.736998e-18,-2.3850180000000003e-17,-2.201555e-17,3.248195e-18,5.233204e-18,-3.3083469999999997e-19
5,3.42283e-18,-8.801562e-18,-1.4669270000000002e-17,5.867708e-18,-1.1246440000000001e-17,2.0048000000000003e-17,-4.889755999999999e-19,0.0,1.3202340000000002e-17,-1.2713370000000001e-17,-4.889755999999999e-19,5.134244e-18,1.9559030000000002e-18,-1.2224390000000001e-17,4.889755999999999e-19,5.378732e-18,-1.2224390000000001e-17,-1.9559030000000002e-18,7.334635000000001e-18
6,1.264922e-18,6.715018999999999e-19,-8.963770000000001e-18,1.432017e-17,4.997224e-18,1.8895750000000002e-18,6.37146e-18,-1.0947040000000001e-17,8.901305e-19,-2.6516520000000002e-17,6.808717e-18,-9.838283999999998e-19,-2.2940380000000002e-17,-6.07475e-18,1.3664280000000001e-17,5.840505e-18,-1.7583980000000002e-17,-2.436147e-18,3.123265e-19
7,3.466824e-18,-7.43337e-18,-2.6079260000000002e-18,-4.349146e-18,-5.075305e-18,1.063472e-17,6.574472e-18,9.682121e-19,-1.063472e-17,1.1651730000000002e-17,2.810938e-18,-4.84106e-18,-2.248751e-18,3.092032e-18,-2.2799830000000002e-17,-5.372015e-18,-1.202457e-18,9.479109000000001e-18,8.401582e-18
8,-2.2391910000000003e-17,1.6487730000000003e-17,1.7398480000000002e-17,-1.1211660000000001e-17,6.87774e-18,-8.385189e-18,1.614227e-17,-3.768624e-18,-1.1400090000000002e-17,-1.0159580000000002e-17,1.3253000000000002e-17,2.025636e-17,7.00336e-18,1.4886070000000002e-17,-2.266671e-17,-7.317412e-18,-3.705814e-18,-1.1086040000000001e-17,2.323985e-18
9,1.480988e-17,2.511992e-17,-5.95199e-18,1.0451030000000002e-17,1.0833389999999999e-19,8.730435000000001e-18,-6.7421900000000005e-18,-7.678959999999999e-19,1.739715e-17,-2.063521e-18,1.8958430000000003e-17,-4.550023e-18,-1.8518720000000002e-17,-1.752778e-17,9.447351e-18,1.09226e-17,8.089991e-18,-2.2660900000000002e-17,1.911774e-20
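
Every value in this table is on the order of 1e-17, which is zero up to floating-point error. That is expected rather than informative: standardizing a column subtracts its own mean, so the mean of each scaled column is zero by construction, for healthy and schizophrenic recordings alike. The sketch below reproduces the effect on synthetic data (the random values and offsets are made up for illustration) and applies the same formula as `StandardScaler`, `(x - mean) / std` with the population standard deviation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for one recording: 1000 samples, 3 channels with different offsets
raw = pd.DataFrame(rng.normal(loc=[5.0, -2.0, 0.5], scale=[1.0, 3.0, 0.1], size=(1000, 3)),
                   columns=['Fp1', 'Cz', 'O2'])

# Column-wise standardization (population std, ddof=0, as StandardScaler uses)
scaled = (raw - raw.mean()) / raw.std(ddof=0)

print(raw.mean().round(2).tolist())          # close to the offsets 5.0, -2.0, 0.5
print(np.allclose(scaled.mean(), 0.0))       # True: centering removes the mean
print(np.allclose(scaled.std(ddof=0), 1.0))  # True: every column has unit variance
```

If per-electrode means are meant to serve as features for batch a, they would need to be computed before scaling, since after scaling they carry no information about the recording.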


In [24]:
# Create a df for all the healthy b datasets
healthy_b = pd.concat(healthy_b)
healthy_b

Unnamed: 0,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2
0,1.007423,1.482913,1.425500,1.073003,1.007423,1.187770,1.425500,1.310734,1.073003,1.245153,1.187770,0.597513,-0.296054,-0.353467,0.236790,0.474550
1,0.179407,-0.058324,-0.476430,-0.714161,-0.951921,-0.828927,-0.058324,-0.000940,-0.058324,-0.115707,0.056443,0.359783,0.712280,0.712280,0.294203,0.122023
2,0.417166,0.531933,0.417166,0.122023,-0.115707,-0.181287,-0.238671,-0.533814,-0.771544,-0.828927,-0.476430,-0.296054,-0.058324,-0.115707,-0.000940,-0.000940
3,-0.000940,0.056443,-0.058324,-0.181287,-0.000940,0.294203,0.597513,0.531933,0.179407,0.236790,0.654897,0.654897,0.712280,0.835244,1.245153,1.130387
4,-0.115707,-0.181287,-0.296054,-0.533814,-1.066687,-0.894508,-0.058324,0.597513,1.245153,1.482913,1.548494,1.663260,1.548494,1.482913,1.605877,1.548494
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7675,-1.601020,-0.801288,1.471772,2.719469,3.870921,3.668515,2.520360,1.571315,-0.499315,-1.398590,-2.248094,-2.599837,-3.548883,-3.621879,-2.423953,-1.501453
7676,0.396613,0.798153,1.073554,1.246117,1.571315,1.372205,0.721811,0.622269,0.323617,-0.001580,-0.024805,0.472956,0.821378,0.323617,-0.250460,-0.250460
7677,0.024966,0.074737,0.074737,-0.051351,-0.751517,-0.851059,-0.150893,0.423160,1.621086,1.770424,1.222892,0.798153,0.147758,0.048191,-0.350002,-0.801288
7678,-1.574474,-1.524703,-0.874308,-0.499315,-0.150893,-0.150893,0.273846,0.423160,-0.077897,-0.399773,-0.950626,-1.149710,-1.501453,-1.750334,-2.798921,-3.147369


In [25]:
# Create empty lists
schizo_a = []
schizo_a_mean = []
# Scale dfs then append to list
for df in schizo_lst_dfs_a:
    df = scaler.fit_transform(df)
    df = pd.DataFrame(df)
    schizo_a.append(df)
    # Find means and append to list
    for column in df:
        schizo_a_mean.append(df[column].mean())
# The combined schizophrenic 'a' DataFrame is built from schizo_lst_dfs_a in a later cell


In [26]:
# Turn the flat list of means into a table with one row per subject (14 here) and one column per electrode
schizo_means_a = pd.DataFrame(schizo_a_mean)
schizo_means_a = pd.DataFrame(schizo_means_a.values.reshape(-1, len(electrode_pos_a)), columns = electrode_pos_a)
schizo_means_a

Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,F4,C4,P4,F3,C3,P3,Fz,Cz,Pz
0,1.2041390000000002e-17,5.650707e-18,1.0763250000000002e-17,-5.650707e-18,3.19534e-18,1.365587e-17,-1.143595e-18,-1.0157820000000001e-17,1.1940480000000001e-17,-1.3067260000000001e-17,1.1772309999999998e-19,5.919788e-18,-1.315135e-17,-7.601546e-18,7.475414e-18,-5.3143550000000004e-18,-2.0458590000000002e-17,2.01811e-17,1.641396e-17
1,-1.1319040000000002e-17,1.85672e-17,8.936083e-18,-1.70282e-17,1.552644e-17,-1.0127560000000001e-17,7.645316e-18,-1.886506e-17,-6.850997e-18,1.9063640000000003e-17,-1.4893469999999998e-19,-3.2418120000000004e-17,-1.1319040000000002e-17,1.042543e-18,-3.624078e-18,-3.2070610000000004e-17,-8.638214e-18,2.680825e-18,-1.948563e-17
2,7.98992e-18,-5.587048e-18,1.394551e-17,1.5213280000000003e-17,1.1483670000000001e-17,1.728447e-17,-1.780779e-17,1.376861e-17,-1.061392e-18,-3.197074e-17,2.7389800000000002e-17,-1.5294360000000002e-17,1.963575e-17,4.923678e-18,1.612726e-17,1.1041420000000001e-17,1.112987e-18,-1.7468740000000002e-17,1.450569e-17
3,-9.293073e-18,9.623284e-18,8.113749e-18,-1.6982270000000002e-17,-1.485948e-17,7.689193e-18,-1.1227160000000001e-17,-4.953161e-18,3.066243e-19,-1.405754e-17,-1.943526e-17,-2.5945130000000003e-17,-2.8115090000000004e-17,8.774171e-18,-8.019403e-18,-1.000067e-17,-9.151555e-18,-3.868183e-18,-3.6158080000000005e-17
4,-1.4626000000000002e-17,3.864075e-18,3.2573200000000002e-18,-5.317095e-18,8.494578e-18,-1.27738e-18,-4.989766e-18,2.682498e-18,-8.302971e-18,-1.871362e-17,-1.3097140000000001e-17,-6.490688e-18,3.19345e-18,-9.692122e-18,-3.512796e-19,1.9607790000000003e-17,-1.3644020000000002e-17,2.709643e-17,-2.094904e-17
5,4.916188e-18,1.0907790000000002e-17,-1.3673150000000002e-17,6.6637390000000004e-18,1.328907e-17,1.113824e-18,5.37708e-19,1.1503110000000001e-17,2.150832e-18,6.913388999999999e-19,-1.190639e-17,-3.4566939999999995e-19,-1.6764970000000002e-17,1.152231e-18,-1.920386e-18,6.375681e-18,-1.574716e-17,6.145234e-18,-8.027213e-18
6,-4.930512e-18,3.151516e-18,-3.980306e-18,-3.093448e-18,5.643166e-18,5.194458e-18,-2.3438410000000002e-18,-1.3218420000000001e-17,2.539161e-18,1.2891120000000002e-17,-1.6195730000000002e-17,4.328715e-18,1.815949e-18,-3.821939e-18,1.0726770000000001e-17,-2.0059900000000002e-18,3.5791080000000004e-18,5.490078e-18,3.557993e-18
7,-7.768393e-18,1.513121e-18,1.90076e-17,1.3625890000000001e-17,-1.1286010000000001e-17,-3.05744e-18,7.612401e-18,2.495869e-18,1.575517e-18,-6.692049e-18,7.799590999999999e-19,-1.559918e-20,9.921080000000001e-18,1.5895570000000003e-17,1.096622e-17,-2.9178270000000004e-17,-1.006147e-17,-4.118184e-18,1.622315e-18
8,7.6750609999999995e-19,1.6141610000000002e-17,-1.3551280000000002e-17,-1.422285e-17,2.375671e-17,4.101361e-18,4.994786e-18,1.2903700000000002e-17,-1.675322e-17,-2.104646e-17,-1.952344e-17,4.077376e-18,1.1150480000000001e-17,-7.1474e-18,-1.043329e-18,-3.825538e-18,-6.907555e-18,-1.016946e-17,1.4390739999999998e-19
9,-3.9957580000000004e-18,-3.678104e-19,7.4398e-18,-1.0833690000000001e-17,3.2868870000000005e-17,-1.3876480000000001e-18,6.169183e-18,3.4549100000000004e-17,-1.646787e-17,-8.292452e-18,7.105427e-18,-9.011354e-18,-1.2104300000000001e-17,-1.5247410000000002e-17,1.5514910000000002e-17,2.1533620000000003e-17,1.681896e-17,2.0263010000000002e-17,-3.460761e-17


In [27]:
# Assign healthy status
healthy_means_a['Healthy'] = True
healthy_means_b['Healthy'] = True
schizo_means_a['Healthy'] = False
schizo_means_b['Healthy'] = False
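
Once each table carries a `Healthy` label, the four tables could be stacked into a single feature table. Batch a has 19 electrode columns and batch b has 16, so the sketch below keeps only the columns every table shares by passing `join='inner'` to `pd.concat`. The tiny frames and the names `means_a`, `means_b`, `features`, `X`, and `y` are placeholders for illustration, not the notebook's data.

```python
import pandas as pd

# Toy stand-ins for the per-subject mean tables (one row per subject)
means_a = pd.DataFrame({'Fp1': [0.1, 0.2], 'F7': [0.3, 0.4], 'Cz': [0.5, 0.6]})
means_b = pd.DataFrame({'F7': [0.7], 'Cz': [0.8]})
means_a['Healthy'] = True
means_b['Healthy'] = False

# join='inner' keeps only the columns present in every frame
features = pd.concat([means_a, means_b], join='inner', ignore_index=True)

X = features.drop(columns='Healthy')   # electrode features
y = features['Healthy']                # class label
print(sorted(features.columns))        # ['Cz', 'F7', 'Healthy']
print(features.shape)                  # (3, 3)
```

Dropping the electrodes that only one batch records loses some information; the default `join='outer'` would keep them and fill the gaps with `NaN` instead, which then has to be handled before modeling.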

In [28]:
# Create a df for all the schizophrenic a datasets
schizo_a = pd.concat(schizo_lst_dfs_a)
schizo_a

Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,F4,C4,P4,F3,C3,P3,Fz,Cz,Pz
0,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09
1,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09
2,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09
3,2.500000e-09,2.500000e-09,2.500000e-09,3.083103e-07,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,3.083103e-07,2.500000e-09,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,-6.091205e-07,-3.033103e-07,-6.091205e-07,-7.620257e-07
4,2.500000e-09,2.500000e-09,2.500000e-09,1.554051e-07,2.500000e-09,2.500000e-09,2.500000e-09,1.554051e-07,3.083103e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-3.033103e-07,-3.033103e-07,-1.504051e-07,-6.091205e-07,-4.562154e-07,-7.620257e-07,-7.620257e-07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
542495,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09
542496,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09
542497,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09
542498,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09


In [29]:
# Create a df for all the schizophrenic b datasets
schizo_b = pd.concat(schizo_b)
schizo_b

Unnamed: 0,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2
0,0.234490,0.458294,0.857948,0.985846,0.985846,0.770034,0.186537,0.010686,0.010686,0.186537,0.282443,0.058640,-0.516865,-0.740669,-0.652733,-0.341015
1,0.322427,0.146576,-0.388968,-0.604779,-0.604779,-0.652733,-0.964473,-1.052387,-0.564818,-0.205125,0.146576,0.370380,0.586191,0.546230,0.058640,-0.077228
2,0.058640,0.370380,0.586191,0.546230,0.586191,0.634145,0.945885,0.897909,-0.253078,-0.077228,0.098623,0.234490,0.146576,0.010686,-0.029275,0.058640
3,0.146576,0.186537,0.282443,0.370380,0.770034,1.033799,0.897909,0.634145,-0.165164,-0.476882,-0.876536,-0.964473,-1.052387,-1.052387,-0.828583,-0.788622
4,-0.700686,-0.516865,-0.029275,0.186537,0.586191,0.722081,0.945885,1.033799,1.481406,1.881061,2.544480,2.632416,2.232762,2.008958,1.609304,1.345539
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7675,0.516219,0.218075,-0.307199,-0.364004,-0.179440,-0.122635,-0.065860,-0.065860,-0.122635,-0.122635,0.147091,0.331655,0.218075,-0.094247,-0.250424,-0.122635
7676,-0.605343,-0.789906,-0.676326,-0.491762,-0.548537,-0.605343,-0.364004,-0.179440,0.005124,0.005124,-0.023264,0.033511,0.402639,0.700783,1.126685,1.069911
7677,0.516219,0.402639,0.147091,0.005124,-0.463375,-0.392391,0.374252,0.771767,0.913734,0.729170,0.487832,0.544606,0.459444,0.374252,0.033511,-0.151022
7678,-0.179440,-0.122635,-0.065860,-0.065860,-0.065860,-0.094247,0.218075,0.402639,0.700783,0.771767,0.771767,0.587203,-0.023264,-0.250424,-0.917665,-1.102229


In [30]:
# Add a column to denote if the patient is healthy or schizophrenic
healthy_a['Healthy'] = True
healthy_a

Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,F4,C4,P4,F3,C3,P3,Fz,Cz,Pz,Healthy
0,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,True
1,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,True
2,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,True
3,4.612154e-07,4.612154e-07,3.083103e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-3.033103e-07,True
4,4.612154e-07,4.612154e-07,4.612154e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,-3.033103e-07,2.500000e-09,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-1.504051e-07,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
216245,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,True
216246,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,True
216247,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,True
216248,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,True


In [31]:
# Add a column to denote if the patient is healthy or schizophrenic
healthy_b['Healthy'] = True
healthy_b

Unnamed: 0,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,1.007423,1.482913,1.425500,1.073003,1.007423,1.187770,1.425500,1.310734,1.073003,1.245153,1.187770,0.597513,-0.296054,-0.353467,0.236790,0.474550,True
1,0.179407,-0.058324,-0.476430,-0.714161,-0.951921,-0.828927,-0.058324,-0.000940,-0.058324,-0.115707,0.056443,0.359783,0.712280,0.712280,0.294203,0.122023,True
2,0.417166,0.531933,0.417166,0.122023,-0.115707,-0.181287,-0.238671,-0.533814,-0.771544,-0.828927,-0.476430,-0.296054,-0.058324,-0.115707,-0.000940,-0.000940,True
3,-0.000940,0.056443,-0.058324,-0.181287,-0.000940,0.294203,0.597513,0.531933,0.179407,0.236790,0.654897,0.654897,0.712280,0.835244,1.245153,1.130387,True
4,-0.115707,-0.181287,-0.296054,-0.533814,-1.066687,-0.894508,-0.058324,0.597513,1.245153,1.482913,1.548494,1.663260,1.548494,1.482913,1.605877,1.548494,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7675,-1.601020,-0.801288,1.471772,2.719469,3.870921,3.668515,2.520360,1.571315,-0.499315,-1.398590,-2.248094,-2.599837,-3.548883,-3.621879,-2.423953,-1.501453,True
7676,0.396613,0.798153,1.073554,1.246117,1.571315,1.372205,0.721811,0.622269,0.323617,-0.001580,-0.024805,0.472956,0.821378,0.323617,-0.250460,-0.250460,True
7677,0.024966,0.074737,0.074737,-0.051351,-0.751517,-0.851059,-0.150893,0.423160,1.621086,1.770424,1.222892,0.798153,0.147758,0.048191,-0.350002,-0.801288,True
7678,-1.574474,-1.524703,-0.874308,-0.499315,-0.150893,-0.150893,0.273846,0.423160,-0.077897,-0.399773,-0.950626,-1.149710,-1.501453,-1.750334,-2.798921,-3.147369,True


In [32]:
# Add a column to denote if the patient is healthy or schizophrenic
schizo_a['Healthy'] = False
schizo_a

Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,F4,C4,P4,F3,C3,P3,Fz,Cz,Pz,Healthy
0,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,False
1,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,False
2,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,False
3,2.500000e-09,2.500000e-09,2.500000e-09,3.083103e-07,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,3.083103e-07,2.500000e-09,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,-6.091205e-07,-3.033103e-07,-6.091205e-07,-7.620257e-07,False
4,2.500000e-09,2.500000e-09,2.500000e-09,1.554051e-07,2.500000e-09,2.500000e-09,2.500000e-09,1.554051e-07,3.083103e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-3.033103e-07,-3.033103e-07,-1.504051e-07,-6.091205e-07,-4.562154e-07,-7.620257e-07,-7.620257e-07,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
542495,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,False
542496,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,False
542497,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,False
542498,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,False


In [33]:
# Add a column to denote if the patient is healthy or schizophrenic
schizo_b['Healthy'] = False
schizo_b

Unnamed: 0,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,0.234490,0.458294,0.857948,0.985846,0.985846,0.770034,0.186537,0.010686,0.010686,0.186537,0.282443,0.058640,-0.516865,-0.740669,-0.652733,-0.341015,False
1,0.322427,0.146576,-0.388968,-0.604779,-0.604779,-0.652733,-0.964473,-1.052387,-0.564818,-0.205125,0.146576,0.370380,0.586191,0.546230,0.058640,-0.077228,False
2,0.058640,0.370380,0.586191,0.546230,0.586191,0.634145,0.945885,0.897909,-0.253078,-0.077228,0.098623,0.234490,0.146576,0.010686,-0.029275,0.058640,False
3,0.146576,0.186537,0.282443,0.370380,0.770034,1.033799,0.897909,0.634145,-0.165164,-0.476882,-0.876536,-0.964473,-1.052387,-1.052387,-0.828583,-0.788622,False
4,-0.700686,-0.516865,-0.029275,0.186537,0.586191,0.722081,0.945885,1.033799,1.481406,1.881061,2.544480,2.632416,2.232762,2.008958,1.609304,1.345539,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7675,0.516219,0.218075,-0.307199,-0.364004,-0.179440,-0.122635,-0.065860,-0.065860,-0.122635,-0.122635,0.147091,0.331655,0.218075,-0.094247,-0.250424,-0.122635,False
7676,-0.605343,-0.789906,-0.676326,-0.491762,-0.548537,-0.605343,-0.364004,-0.179440,0.005124,0.005124,-0.023264,0.033511,0.402639,0.700783,1.126685,1.069911,False
7677,0.516219,0.402639,0.147091,0.005124,-0.463375,-0.392391,0.374252,0.771767,0.913734,0.729170,0.487832,0.544606,0.459444,0.374252,0.033511,-0.151022,False
7678,-0.179440,-0.122635,-0.065860,-0.065860,-0.065860,-0.094247,0.218075,0.402639,0.700783,0.771767,0.771767,0.587203,-0.023264,-0.250424,-0.917665,-1.102229,False


In [34]:
# Change the index for the healthy b consolidated data
healthy_b.reset_index(inplace = True, names = 'old_index')
healthy_b

Unnamed: 0,old_index,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,0,1.007423,1.482913,1.425500,1.073003,1.007423,1.187770,1.425500,1.310734,1.073003,1.245153,1.187770,0.597513,-0.296054,-0.353467,0.236790,0.474550,True
1,1,0.179407,-0.058324,-0.476430,-0.714161,-0.951921,-0.828927,-0.058324,-0.000940,-0.058324,-0.115707,0.056443,0.359783,0.712280,0.712280,0.294203,0.122023,True
2,2,0.417166,0.531933,0.417166,0.122023,-0.115707,-0.181287,-0.238671,-0.533814,-0.771544,-0.828927,-0.476430,-0.296054,-0.058324,-0.115707,-0.000940,-0.000940,True
3,3,-0.000940,0.056443,-0.058324,-0.181287,-0.000940,0.294203,0.597513,0.531933,0.179407,0.236790,0.654897,0.654897,0.712280,0.835244,1.245153,1.130387,True
4,4,-0.115707,-0.181287,-0.296054,-0.533814,-1.066687,-0.894508,-0.058324,0.597513,1.245153,1.482913,1.548494,1.663260,1.548494,1.482913,1.605877,1.548494,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
299515,7675,-1.601020,-0.801288,1.471772,2.719469,3.870921,3.668515,2.520360,1.571315,-0.499315,-1.398590,-2.248094,-2.599837,-3.548883,-3.621879,-2.423953,-1.501453,True
299516,7676,0.396613,0.798153,1.073554,1.246117,1.571315,1.372205,0.721811,0.622269,0.323617,-0.001580,-0.024805,0.472956,0.821378,0.323617,-0.250460,-0.250460,True
299517,7677,0.024966,0.074737,0.074737,-0.051351,-0.751517,-0.851059,-0.150893,0.423160,1.621086,1.770424,1.222892,0.798153,0.147758,0.048191,-0.350002,-0.801288,True
299518,7678,-1.574474,-1.524703,-0.874308,-0.499315,-0.150893,-0.150893,0.273846,0.423160,-0.077897,-0.399773,-0.950626,-1.149710,-1.501453,-1.750334,-2.798921,-3.147369,True


In [35]:
# Rename the column to be a time reference
healthy_b.rename(columns = {'old_index' : 'time_point'}, inplace = True)
healthy_b

Unnamed: 0,time_point,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,0,1.007423,1.482913,1.425500,1.073003,1.007423,1.187770,1.425500,1.310734,1.073003,1.245153,1.187770,0.597513,-0.296054,-0.353467,0.236790,0.474550,True
1,1,0.179407,-0.058324,-0.476430,-0.714161,-0.951921,-0.828927,-0.058324,-0.000940,-0.058324,-0.115707,0.056443,0.359783,0.712280,0.712280,0.294203,0.122023,True
2,2,0.417166,0.531933,0.417166,0.122023,-0.115707,-0.181287,-0.238671,-0.533814,-0.771544,-0.828927,-0.476430,-0.296054,-0.058324,-0.115707,-0.000940,-0.000940,True
3,3,-0.000940,0.056443,-0.058324,-0.181287,-0.000940,0.294203,0.597513,0.531933,0.179407,0.236790,0.654897,0.654897,0.712280,0.835244,1.245153,1.130387,True
4,4,-0.115707,-0.181287,-0.296054,-0.533814,-1.066687,-0.894508,-0.058324,0.597513,1.245153,1.482913,1.548494,1.663260,1.548494,1.482913,1.605877,1.548494,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
299515,7675,-1.601020,-0.801288,1.471772,2.719469,3.870921,3.668515,2.520360,1.571315,-0.499315,-1.398590,-2.248094,-2.599837,-3.548883,-3.621879,-2.423953,-1.501453,True
299516,7676,0.396613,0.798153,1.073554,1.246117,1.571315,1.372205,0.721811,0.622269,0.323617,-0.001580,-0.024805,0.472956,0.821378,0.323617,-0.250460,-0.250460,True
299517,7677,0.024966,0.074737,0.074737,-0.051351,-0.751517,-0.851059,-0.150893,0.423160,1.621086,1.770424,1.222892,0.798153,0.147758,0.048191,-0.350002,-0.801288,True
299518,7678,-1.574474,-1.524703,-0.874308,-0.499315,-0.150893,-0.150893,0.273846,0.423160,-0.077897,-0.399773,-0.950626,-1.149710,-1.501453,-1.750334,-2.798921,-3.147369,True


In [36]:
# Change the index for the schizo b consolidated data
schizo_b.reset_index(inplace = True, names = 'old_index')
schizo_b

Unnamed: 0,old_index,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,0,0.234490,0.458294,0.857948,0.985846,0.985846,0.770034,0.186537,0.010686,0.010686,0.186537,0.282443,0.058640,-0.516865,-0.740669,-0.652733,-0.341015,False
1,1,0.322427,0.146576,-0.388968,-0.604779,-0.604779,-0.652733,-0.964473,-1.052387,-0.564818,-0.205125,0.146576,0.370380,0.586191,0.546230,0.058640,-0.077228,False
2,2,0.058640,0.370380,0.586191,0.546230,0.586191,0.634145,0.945885,0.897909,-0.253078,-0.077228,0.098623,0.234490,0.146576,0.010686,-0.029275,0.058640,False
3,3,0.146576,0.186537,0.282443,0.370380,0.770034,1.033799,0.897909,0.634145,-0.165164,-0.476882,-0.876536,-0.964473,-1.052387,-1.052387,-0.828583,-0.788622,False
4,4,-0.700686,-0.516865,-0.029275,0.186537,0.586191,0.722081,0.945885,1.033799,1.481406,1.881061,2.544480,2.632416,2.232762,2.008958,1.609304,1.345539,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
345595,7675,0.516219,0.218075,-0.307199,-0.364004,-0.179440,-0.122635,-0.065860,-0.065860,-0.122635,-0.122635,0.147091,0.331655,0.218075,-0.094247,-0.250424,-0.122635,False
345596,7676,-0.605343,-0.789906,-0.676326,-0.491762,-0.548537,-0.605343,-0.364004,-0.179440,0.005124,0.005124,-0.023264,0.033511,0.402639,0.700783,1.126685,1.069911,False
345597,7677,0.516219,0.402639,0.147091,0.005124,-0.463375,-0.392391,0.374252,0.771767,0.913734,0.729170,0.487832,0.544606,0.459444,0.374252,0.033511,-0.151022,False
345598,7678,-0.179440,-0.122635,-0.065860,-0.065860,-0.065860,-0.094247,0.218075,0.402639,0.700783,0.771767,0.771767,0.587203,-0.023264,-0.250424,-0.917665,-1.102229,False


In [37]:
# Rename the columns to have a time reference
schizo_b.rename(columns = {'old_index' : 'time_point'}, inplace = True)
schizo_b

Unnamed: 0,time_point,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,0,0.234490,0.458294,0.857948,0.985846,0.985846,0.770034,0.186537,0.010686,0.010686,0.186537,0.282443,0.058640,-0.516865,-0.740669,-0.652733,-0.341015,False
1,1,0.322427,0.146576,-0.388968,-0.604779,-0.604779,-0.652733,-0.964473,-1.052387,-0.564818,-0.205125,0.146576,0.370380,0.586191,0.546230,0.058640,-0.077228,False
2,2,0.058640,0.370380,0.586191,0.546230,0.586191,0.634145,0.945885,0.897909,-0.253078,-0.077228,0.098623,0.234490,0.146576,0.010686,-0.029275,0.058640,False
3,3,0.146576,0.186537,0.282443,0.370380,0.770034,1.033799,0.897909,0.634145,-0.165164,-0.476882,-0.876536,-0.964473,-1.052387,-1.052387,-0.828583,-0.788622,False
4,4,-0.700686,-0.516865,-0.029275,0.186537,0.586191,0.722081,0.945885,1.033799,1.481406,1.881061,2.544480,2.632416,2.232762,2.008958,1.609304,1.345539,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
345595,7675,0.516219,0.218075,-0.307199,-0.364004,-0.179440,-0.122635,-0.065860,-0.065860,-0.122635,-0.122635,0.147091,0.331655,0.218075,-0.094247,-0.250424,-0.122635,False
345596,7676,-0.605343,-0.789906,-0.676326,-0.491762,-0.548537,-0.605343,-0.364004,-0.179440,0.005124,0.005124,-0.023264,0.033511,0.402639,0.700783,1.126685,1.069911,False
345597,7677,0.516219,0.402639,0.147091,0.005124,-0.463375,-0.392391,0.374252,0.771767,0.913734,0.729170,0.487832,0.544606,0.459444,0.374252,0.033511,-0.151022,False
345598,7678,-0.179440,-0.122635,-0.065860,-0.065860,-0.065860,-0.094247,0.218075,0.402639,0.700783,0.771767,0.771767,0.587203,-0.023264,-0.250424,-0.917665,-1.102229,False


In [38]:
# Create a vector that can be used as a patient id (7680 samples per patient, ids start at 41)
id_lst_b2 = []
for i in range(40, 40 + len(schizo_b) // 7680):
    id_lst_b2.extend([i + 1] * 7680)
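
The patient id vector built above repeats each id once per sample in a recording. A minimal sketch of the same idea as a reusable function is shown below. The name `make_patient_ids` and its parameters are hypothetical, and the sketch assumes rows are ordered patient by patient with exactly `samples_per_patient` rows each (7680 in this notebook).

```python
def make_patient_ids(n_rows, samples_per_patient, first_id):
    """Return one patient id per row, repeating each id samples_per_patient times.

    Hypothetical helper: assumes rows are ordered patient by patient and that
    n_rows is an exact multiple of samples_per_patient.
    """
    if n_rows % samples_per_patient != 0:
        raise ValueError("n_rows must be a multiple of samples_per_patient")
    n_patients = n_rows // samples_per_patient
    ids = []
    for patient in range(first_id, first_id + n_patients):
        ids.extend([patient] * samples_per_patient)
    return ids


# Small illustration with 2 patients and 3 samples each
example_ids = make_patient_ids(n_rows=6, samples_per_patient=3, first_id=41)
print(example_ids)  # [41, 41, 41, 42, 42, 42]
```

With the notebook's data this could be called as `make_patient_ids(len(schizo_b), 7680, 41)`. The same call with a different `first_id` could label `healthy_b`, which has no `patient_id` column and therefore shows missing values after concatenation.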

In [39]:
# Assign the vector to a series in the df for patient id
schizo_b['patient_id'] = id_lst_b2
schizo_b

Unnamed: 0,time_point,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy,patient_id
0,0,0.234490,0.458294,0.857948,0.985846,0.985846,0.770034,0.186537,0.010686,0.010686,0.186537,0.282443,0.058640,-0.516865,-0.740669,-0.652733,-0.341015,False,41
1,1,0.322427,0.146576,-0.388968,-0.604779,-0.604779,-0.652733,-0.964473,-1.052387,-0.564818,-0.205125,0.146576,0.370380,0.586191,0.546230,0.058640,-0.077228,False,41
2,2,0.058640,0.370380,0.586191,0.546230,0.586191,0.634145,0.945885,0.897909,-0.253078,-0.077228,0.098623,0.234490,0.146576,0.010686,-0.029275,0.058640,False,41
3,3,0.146576,0.186537,0.282443,0.370380,0.770034,1.033799,0.897909,0.634145,-0.165164,-0.476882,-0.876536,-0.964473,-1.052387,-1.052387,-0.828583,-0.788622,False,41
4,4,-0.700686,-0.516865,-0.029275,0.186537,0.586191,0.722081,0.945885,1.033799,1.481406,1.881061,2.544480,2.632416,2.232762,2.008958,1.609304,1.345539,False,41
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
345595,7675,0.516219,0.218075,-0.307199,-0.364004,-0.179440,-0.122635,-0.065860,-0.065860,-0.122635,-0.122635,0.147091,0.331655,0.218075,-0.094247,-0.250424,-0.122635,False,85
345596,7676,-0.605343,-0.789906,-0.676326,-0.491762,-0.548537,-0.605343,-0.364004,-0.179440,0.005124,0.005124,-0.023264,0.033511,0.402639,0.700783,1.126685,1.069911,False,85
345597,7677,0.516219,0.402639,0.147091,0.005124,-0.463375,-0.392391,0.374252,0.771767,0.913734,0.729170,0.487832,0.544606,0.459444,0.374252,0.033511,-0.151022,False,85
345598,7678,-0.179440,-0.122635,-0.065860,-0.065860,-0.065860,-0.094247,0.218075,0.402639,0.700783,0.771767,0.771767,0.587203,-0.023264,-0.250424,-0.917665,-1.102229,False,85


In [40]:
# Import modin.pandas which is faster for concatenation of large datasets
import modin.pandas as pd

In [41]:
# Make consolidated means dataset a
a_means_dataset = pd.concat([healthy_means_a, schizo_means_a])

2024-11-10 18:48:35,660	INFO worker.py:1816 -- Started a local Ray instance.


In [42]:
# Make consolidated means dataset b
b_means_dataset = pd.concat([healthy_means_b, schizo_means_b])

In [43]:
# Make final consolidated a dataset
a_dataset = pd.concat([healthy_a, schizo_a])
a_dataset



Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,F4,C4,P4,F3,C3,P3,Fz,Cz,Pz,Healthy
0,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,True
1,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,True
2,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,True
3,4.612154e-07,4.612154e-07,3.083103e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-3.033103e-07,True
4,4.612154e-07,4.612154e-07,4.612154e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,-3.033103e-07,2.500000e-09,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-1.504051e-07,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
542495,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,False
542496,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,False
542497,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,False
542498,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,False


In [44]:
# Make final consolidated b dataset
b_dataset = pd.concat([healthy_b, schizo_b])
b_dataset



Unnamed: 0,time_point,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy,patient_id
0,0,1.007423,1.482913,1.425500,1.073003,1.007423,1.187770,1.425500,1.310734,1.073003,1.245153,1.187770,0.597513,-0.296054,-0.353467,0.236790,0.474550,True,
1,1,0.179407,-0.058324,-0.476430,-0.714161,-0.951921,-0.828927,-0.058324,-0.000940,-0.058324,-0.115707,0.056443,0.359783,0.712280,0.712280,0.294203,0.122023,True,
2,2,0.417166,0.531933,0.417166,0.122023,-0.115707,-0.181287,-0.238671,-0.533814,-0.771544,-0.828927,-0.476430,-0.296054,-0.058324,-0.115707,-0.000940,-0.000940,True,
3,3,-0.000940,0.056443,-0.058324,-0.181287,-0.000940,0.294203,0.597513,0.531933,0.179407,0.236790,0.654897,0.654897,0.712280,0.835244,1.245153,1.130387,True,
4,4,-0.115707,-0.181287,-0.296054,-0.533814,-1.066687,-0.894508,-0.058324,0.597513,1.245153,1.482913,1.548494,1.663260,1.548494,1.482913,1.605877,1.548494,True,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
345595,7675,0.516219,0.218075,-0.307199,-0.364004,-0.179440,-0.122635,-0.065860,-0.065860,-0.122635,-0.122635,0.147091,0.331655,0.218075,-0.094247,-0.250424,-0.122635,False,85.0
345596,7676,-0.605343,-0.789906,-0.676326,-0.491762,-0.548537,-0.605343,-0.364004,-0.179440,0.005124,0.005124,-0.023264,0.033511,0.402639,0.700783,1.126685,1.069911,False,85.0
345597,7677,0.516219,0.402639,0.147091,0.005124,-0.463375,-0.392391,0.374252,0.771767,0.913734,0.729170,0.487832,0.544606,0.459444,0.374252,0.033511,-0.151022,False,85.0
345598,7678,-0.179440,-0.122635,-0.065860,-0.065860,-0.065860,-0.094247,0.218075,0.402639,0.700783,0.771767,0.771767,0.587203,-0.023264,-0.250424,-0.917665,-1.102229,False,85.0


In [45]:
# View shapes of large datasets
a_dataset.shape, b_dataset.shape

((7215750, 20), (645120, 19))

In [46]:
# Re-import pandas as pd
import pandas as pd

In [47]:
# Prep data for train/test split
target_a_means = a_means_dataset['Healthy']
a_means_dataset.drop('Healthy', inplace = True, axis = 1)
features_a_means = a_means_dataset
target_b_means = b_means_dataset['Healthy']
b_means_dataset.drop('Healthy', inplace = True, axis = 1)
features_b_means = b_means_dataset

In [48]:
# Split the dataset a means into train and test sets
features_train_a, features_test_a, target_train_a, target_test_a = train_test_split(features_a_means, target_a_means, test_size = 0.2, random_state = 118)
# Split the dataset b means into train and test sets
features_train_b, features_test_b, target_train_b, target_test_b = train_test_split(features_b_means, target_b_means, test_size = 0.2, random_state = 8)

# pip install tensorflow

In [50]:
# Import libraries
import tensorflow as tf
from tensorflow.keras.layers import Embedding, Input, Dense, LSTM
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [51]:
# Create LSTM layer (defined here but not connected to the Dense models built below)
lstm = tf.keras.layers.LSTM(
    # One output unit. tanh is the cell activation; sigmoid is the recurrent (gate) activation
    units = 1, activation = 'tanh', recurrent_activation = 'sigmoid',
    # Use a bias vector. Initialize the kernel (input) weights matrix
    use_bias = True, kernel_initializer = 'glorot_uniform',
    # Initialize the weights matrix for the recurrent state
    recurrent_initializer = 'orthogonal',
    # Initialize the bias vector with zeros and add 1 to the forget-gate bias
    bias_initializer = 'zeros', unit_forget_bias = True,
    # No regularizers on the kernel weights, recurrent weights, bias vector, or layer output
    kernel_regularizer = None, recurrent_regularizer = None, bias_regularizer = None, activity_regularizer = None,
    # No constraints on the kernel weights, recurrent weights, or bias vector
    kernel_constraint = None, recurrent_constraint = None, bias_constraint = None,
    # Drop no units from the inputs or from the recurrent state
    dropout = 0.0, recurrent_dropout = 0.0,
    # Return the full output sequence but not the final state. Process the sequence forward.
    # Carry the last state of each batch over as the initial state of the next batch
    return_sequences = True, return_state = False, go_backwards = False, stateful = True,
    # Unroll the recurrent loop instead of using a symbolic loop
    unroll = True
)

In [52]:
# View the shapes of each dataset
a_means_dataset.shape, b_means_dataset.shape

((28, 19), (84, 16))

## Prep models/Establish parameters

In [54]:
inputs_a = Input(shape = (19,))

In [55]:
inputs_b = Input(shape = (16,))

In [56]:
layer_1a = Dense(19, activation = 'tanh')(inputs_a)

In [57]:
layer_1b = Dense(19, activation = 'tanh')(inputs_b)

In [58]:
layer_2a = Dense(19, activation = 'tanh')(layer_1a)

In [59]:
layer_2b = Dense(19, activation = 'tanh')(layer_1b)

In [60]:
layer_3a = Dense(19, activation = 'tanh')(layer_2a)

In [61]:
layer_3b = Dense(19, activation = 'tanh')(layer_2b)

In [62]:
layer_4a = Dense(19, activation = 'tanh')(layer_3a)

In [63]:
layer_4b = Dense(19, activation = 'tanh')(layer_3b)

In [64]:
layer_5a = Dense(19, activation = 'tanh')(layer_4a)

In [65]:
layer_5b = Dense(19, activation = 'tanh')(layer_4b)

In [66]:
layer_6a = Dense(19, activation = 'tanh')(layer_5a)

In [67]:
layer_6b = Dense(19, activation = 'tanh')(layer_5b)

In [68]:
layer_7a = Dense(19, activation = 'tanh')(layer_6a)

In [69]:
layer_7b = Dense(19, activation = 'tanh')(layer_6b)

In [70]:
layer_8a = Dense(19, activation = 'tanh')(layer_7a)

In [71]:
layer_8b = Dense(19, activation = 'tanh')(layer_7b)

In [72]:
output = Dense(1, activation = 'sigmoid')(layer_8a)

In [73]:
output_b = Dense(1, activation = 'sigmoid')(layer_8b)

In [74]:
model = Model(inputs = inputs_a, outputs = output)

In [75]:
model_b = Model(inputs = inputs_b, outputs = output_b)

## Compile Models

In [77]:
model.compile(optimizer = 'adam', loss = BinaryCrossentropy(), metrics = ['accuracy'])

In [78]:
model_b.compile(optimizer = 'adam', loss = BinaryCrossentropy(), metrics = ['accuracy'])

In [177]:
# Fit model a
model.fit(features_train_a, target_train_a * 1, epochs = 10, batch_size = 19, validation_data = (features_test_a, target_test_a * 1))

Epoch 1/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step - accuracy: 0.5391 - loss: 0.6931 - val_accuracy: 0.5000 - val_loss: 0.6931
Epoch 2/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.4912 - loss: 0.6931 - val_accuracy: 0.5000 - val_loss: 0.6931
Epoch 3/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.4912 - loss: 0.6932 - val_accuracy: 0.5000 - val_loss: 0.6931
Epoch 4/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.4912 - loss: 0.6932 - val_accuracy: 0.5000 - val_loss: 0.6932
Epoch 5/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.4912 - loss: 0.6932 - val_accuracy: 0.5000 - val_loss: 0.6932
Epoch 6/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.5263 - loss: 0.6930 - val_accuracy: 0.5000 - val_loss: 0.6932
Epoch 7/10
2/2 ━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x1d80173a450>

In [202]:
# Fit model b
model_b.fit(features_train_b, target_train_b * 1, epochs = 280, batch_size = 16, validation_data = (features_test_b, target_test_b * 1))

Epoch 1/280
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.9733 - loss: 0.1237 - val_accuracy: 0.7059 - val_loss: 1.0991
Epoch 2/280
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9733 - loss: 0.1299 - val_accuracy: 0.7059 - val_loss: 1.1009
Epoch 3/280
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9924 - loss: 0.0549 - val_accuracy: 0.7059 - val_loss: 1.1063
Epoch 4/280
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9889 - loss: 0.0682 - val_accuracy: 0.7059 - val_loss: 1.1002
Epoch 5/280
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9733 - loss: 0.1237 - val_accuracy: 0.7059 - val_loss: 1.0968
Epoch 6/280
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9889 - loss: 0.0620 - val_accuracy: 0.7059 - val_loss: 1.0992
Epoch 7/280
5/5 ━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x1d801059f10>

In [203]:
# View model a summary
model.summary()

In [204]:
# Predict and view model a test features
predict_a = model.predict(features_test_a)
predict_a

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step


array([[0.49586314],
       [0.49586314],
       [0.49586314],
       [0.49586314],
       [0.49586314],
       [0.49586314]], dtype=float32)
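
The training log for model a shows a loss that stays near 0.6931, and the predictions above are all about 0.496. That loss is what binary cross-entropy gives for a classifier that outputs roughly 0.5 for every sample, since -ln(0.5) = ln 2 ≈ 0.6931. The sketch below checks this arithmetic with the standard library only. The function name `binary_cross_entropy` is an illustrative assumption and is not the Keras implementation.

```python
import math


def binary_cross_entropy(label, prob):
    """Binary cross-entropy for one sample (label is 0 or 1, prob is P(label == 1))."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


# A model that always predicts exactly 0.5 has the same loss for either label
loss_at_half = binary_cross_entropy(1, 0.5)

# The constant prediction from model a, averaged over one positive and one negative sample
constant_pred = 0.49586314
balanced_loss = (binary_cross_entropy(1, constant_pred)
                 + binary_cross_entropy(0, constant_pred)) / 2

print(round(loss_at_half, 4))   # 0.6931
print(round(balanced_loss, 4))  # 0.6932
```

A loss pinned at this value, together with identical predictions for every test row, suggests the network is not separating the classes at all. One thing worth checking is the scale of the dataset a features, since the raw values shown earlier are on the order of 1e-9 to 1e-7.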

In [205]:
# Create empty list
true_false = []
# Loop and assign True/False by thresholding each predicted probability at 0.5
for pred in predict_a:
    if pred > 0.5:
        true_false.append(True)
    else:
        true_false.append(False)

In [206]:
# Set count to 0
count = 0
# Loop and increment correct predictions
for i in range(0, len(true_false)):
    if true_false[i] == target_test_a[i]:
        count += 1
    else:
        pass

In [207]:
# Calculate and print the accuracy of model a
print('Accuracy of model for dataset a of means: ', round(count / len(true_false), 4)*100, '%')

Accuracy of model for dataset a of means:  50.0 %


In [208]:
# View summary of model
model_b.summary()

In [209]:
# Calculate and view predicted values for b dataset
predict_b = model_b.predict(features_test_b)
predict_b

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step


array([[0.14919995],
       [0.981715  ],
       [0.8898362 ],
       [0.0398418 ],
       [0.87703484],
       [0.03011091],
       [0.13975725],
       [0.0315239 ],
       [0.02676319],
       [0.05832848],
       [0.8959812 ],
       [0.02379467],
       [0.12190312],
       [0.8876052 ],
       [0.04050396],
       [0.9282857 ],
       [0.02327237]], dtype=float32)

In [210]:
# Create empty list
true_false = []
# Append True if the predicted probability is at least 0.5, otherwise False
for pred in predict_b:
    if pred >= 0.5:
        true_false.append(True)
    else:
        true_false.append(False)

In [211]:
# Reset count
count = 0
# Loop and count how many values were correct guesses with the model
for i in range(0, len(true_false)):
    if true_false[i] == target_test_b[i]:
        count += 1
    else:
        pass

In [212]:
# Calculate and print the accuracy of the b model
print('Accuracy of model for dataset b of means: ', round(count / len(true_false), 4)*100, '%')

Accuracy of model for dataset b of means:  47.06 %
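
The two accuracy loops above compare `true_false[i]` with `target_test_a[i]` and `target_test_b[i]`. If the target is a pandas Series with an integer index, `target[i]` is a lookup by index label, and after `train_test_split` shuffles the rows those labels may no longer line up with positions in the prediction array. The following is a minimal sketch, under that assumption, of a strictly positional comparison using NumPy. The name `accuracy_from_probabilities` is hypothetical and the sample values are made up for illustration.

```python
import numpy as np


def accuracy_from_probabilities(probabilities, labels, threshold=0.5):
    """Fraction of rows whose thresholded probability matches the boolean label.

    Both inputs are converted to flat NumPy arrays so the comparison is
    strictly positional, regardless of any pandas index on the labels.
    """
    predicted = np.asarray(probabilities).ravel() >= threshold
    actual = np.asarray(labels).astype(bool).ravel()
    if predicted.shape != actual.shape:
        raise ValueError("probabilities and labels must have the same length")
    return float(np.mean(predicted == actual))


# Made-up values shaped like model.predict output: one probability per row
example_probs = np.array([[0.15], [0.98], [0.89], [0.04]])
example_labels = [False, True, False, False]
example_accuracy = accuracy_from_probabilities(example_probs, example_labels)
print(example_accuracy)  # 0.75
```

In the notebook this could be called as `accuracy_from_probabilities(predict_b, target_test_b)`. The loops above use `> 0.5` for model a and `>= 0.5` for model b; the sketch uses `>=` for both.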


I chose a Long Short-Term Memory (LSTM) neural network for my first round of testing because this type of model is well suited to time-series data. The LSTM layer defined above has not yet been connected to the two models that were fit; those are stacks of fully connected (Dense) layers trained on the column means. To use the model, each set of data needs to be standardized and have the same length. To achieve that, I used StandardScaler, then took the mean of each column in a given dataframe and put those values into a new dataframe. The results were not as good as I had hoped: the manual check gave 50% accuracy for dataset a (28 rows of means) and about 47% for dataset b (84 rows of means). Keras reported a validation accuracy of about 0.71 for model b in the epochs shown, so the manual figure for dataset b should be rechecked. In the next iteration I will try to use the whole dataframe from each file instead of the mean of every column. This should give the model much more data and may improve accuracy.
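
Using the whole recording for each patient, as planned above, means giving the network a three-dimensional input of shape (patients, time steps, channels), which is the layout Keras recurrent layers expect. The sketch below shows that reshaping step with NumPy on random stand-in data. The function name `to_sequences` is hypothetical, and the sizes assume dataset b's layout of 7680 rows per patient and 16 channels, with rows ordered patient by patient.

```python
import numpy as np


def to_sequences(flat_values, samples_per_patient):
    """Reshape (patients * samples, channels) into (patients, samples, channels).

    Assumes rows are ordered patient by patient, with every patient
    contributing exactly samples_per_patient consecutive rows.
    """
    flat_values = np.asarray(flat_values)
    n_rows, n_channels = flat_values.shape
    if n_rows % samples_per_patient != 0:
        raise ValueError("row count must be a multiple of samples_per_patient")
    n_patients = n_rows // samples_per_patient
    return flat_values.reshape(n_patients, samples_per_patient, n_channels)


# Random stand-in for 3 patients, 7680 time points each, 16 EEG channels
rng = np.random.default_rng(0)
stand_in = rng.normal(size=(3 * 7680, 16))
sequences = to_sequences(stand_in, samples_per_patient=7680)
print(sequences.shape)  # (3, 7680, 16)
```

One label per patient would then pair with each sequence. Sequences of 7680 steps are long for an LSTM, so splitting each recording into shorter windows may be worth trying.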