# Author: Makayla McKibben
## Course: DSC550 Data Mining
## Assignment: Term Project Milestone 2
## Date: 10.22.2024

Global awareness and acceptance of mental health conditions have increased rapidly in the last few years. Mental illness affects a great many people, yet schizophrenia remains somewhat mysterious to much of the general public. Most television shows and movies portray schizophrenia in a dramatized, dangerous way, implying that everyone with the condition is violent and has no regard for anybody else. Many people would likely be shocked by how far that is from the truth and by how many 'regular' people live with schizophrenia. Regardless, whether you are directly affected or have a spouse, relative, or friend who lives with a mental illness, the burden of mental illness touches nearly everyone.

We will use several datasets during the EDA phase of this project to address the need for repeated testing. The datasets are EEG (electroencephalogram) readings, some from healthy people and some from people with schizophrenia. There are subtle differences between the EEGs of people with and without schizophrenia. The goal is to find a reliable, broadly applicable way to determine whether someone has schizophrenia with improved accuracy. Many current models are hand-tailored to their datasets and so do not perform well on new data; we aim to develop a model with wide applicability while maintaining high accuracy.

In [3]:
# Import relevant libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import string
import warnings
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.porter import PorterStemmer
from os import listdir
from os.path import isfile, join

In [4]:
# Install modin with the Ray backend into the environment of the running kernel
%pip install "modin[ray]"



In [5]:
# Upgrade ipywidgets in the environment of the running kernel
%pip install -U ipywidgets




In [6]:
# Remove future warning
warnings.simplefilter(action='ignore', category=FutureWarning)

The goal of this exercise is to find out whether the EEGs of patients with schizophrenia contain specific patterns, absent in healthy patients, that can reliably indicate a schizophrenia spectrum illness.

In [8]:
# Create list of files for healthy batch 'a' datasets
healthy_lst_a = [f for f in listdir('EEG Schizophrenia/EEG Healthy/EEG Healthy A') if isfile(join('EEG Schizophrenia/EEG Healthy/EEG Healthy A', f))]

In [9]:
# Create list of files for healthy batch 'b' datasets
healthy_lst_b = [f for f in listdir('EEG Schizophrenia/EEG Healthy/EEG Healthy B') if isfile(join('EEG Schizophrenia/EEG Healthy/EEG Healthy B', f))]

In [10]:
# Open each healthy 'a' file from our list, add a patient id and time scale to the df, then append the df to a list
healthy_lst_dfs_a = []
for i, file in enumerate(healthy_lst_a):
    # Use 'path' rather than 'dir' so the built-in dir() is not shadowed
    path = join('EEG Schizophrenia/EEG Healthy/EEG Healthy A', file)
    temp_hp = pd.read_csv(path)
    temp_hp['patient_id'] = i + 1
    temp_hp['time_point'] = range(len(temp_hp))
    healthy_lst_dfs_a.append(temp_hp)

In [11]:
# Read each healthy 'b' file in the list (whitespace-separated, no header row) and append the df to a list
healthy_lst_dfs_b = []
for file in healthy_lst_b:
    path = join('EEG Schizophrenia/EEG Healthy/EEG Healthy B', file)
    # sep = r'\s+' is the replacement for delim_whitespace = True, which is deprecated in recent pandas
    healthy_lst_dfs_b.append(pd.read_csv(path, header = None, sep = r'\s+'))

In [12]:
# Create list of files for schizophrenic batch 'a' datasets
schizo_lst_a = [f for f in listdir('EEG Schizophrenia/EEG Schizo/EEG Schizo A') if isfile(join('EEG Schizophrenia/EEG Schizo/EEG Schizo A', f))]

In [13]:
# Create list of files for schizophrenic batch 'b' datasets
schizo_lst_b = [f for f in listdir('EEG Schizophrenia/EEG Schizo/EEG Schizo B') if isfile(join('EEG Schizophrenia/EEG Schizo/EEG Schizo B', f))]
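
`os.listdir` returns directory entries in arbitrary order, so patient ids that are assigned by position in these lists may differ between machines or runs. A minimal sketch of one way to make the order reproducible is shown below; the helper name `list_files_sorted` and the file names in the demonstration are hypothetical and not part of the original notebook.

```python
import tempfile
from os import listdir
from os.path import isfile, join
from pathlib import Path


def list_files_sorted(folder):
    """Return the names of the regular files in `folder`, in sorted order."""
    return sorted(f for f in listdir(folder) if isfile(join(folder, f)))


# Demonstration on a throwaway directory (the file names are made up)
with tempfile.TemporaryDirectory() as tmp:
    for name in ['s03.csv', 's01.csv', 's02.csv']:
        Path(tmp, name).write_text('x\n1\n')
    Path(tmp, 'subfolder').mkdir()  # directories are skipped by isfile
    demo_files = list_files_sorted(tmp)

print(demo_files)
```

With sorted names, the same file always receives the same patient id, which makes later per-patient comparisons easier to reproduce.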

In [14]:
# Open each schizophrenic 'a' file from our list, add a patient id and time scale to the df, then append the df to a list
schizo_lst_dfs_a = []
for i, file in enumerate(schizo_lst_a):
    # Use 'path' rather than 'dir' so the built-in dir() is not shadowed
    path = join('EEG Schizophrenia/EEG Schizo/EEG Schizo A', file)
    temp_schiz = pd.read_csv(path)
    temp_schiz['patient_id'] = i + 1
    temp_schiz['time_point'] = range(len(temp_schiz))
    schizo_lst_dfs_a.append(temp_schiz)

In [15]:
# Open each healthy b file in the list and append to list
i = 0
schizo_lst_dfs_b = []
for file in schizo_lst_b:
    dir = 'EEG Schizophrenia/EEG Schizo/EEG Schizo B/' + str(file)
    schizo_lst_dfs_b.append(pd.read_csv(dir, header = None, delim_whitespace = True))
    i += 1
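
The two batch 'a' loops above differ only in the folder they read. As a rough sketch, they could share one helper; `load_batch_a` and its `first_id` parameter are hypothetical names introduced here for illustration, and the tiny CSV files in the demonstration are made up.

```python
import tempfile
from os import listdir
from os.path import isfile, join
from pathlib import Path

import pandas as pd


def load_batch_a(folder, first_id=1):
    """Read every file in `folder` as CSV and tag each frame with patient_id and time_point."""
    files = sorted(f for f in listdir(folder) if isfile(join(folder, f)))
    frames = []
    for offset, name in enumerate(files):
        df = pd.read_csv(join(folder, name))
        df['patient_id'] = first_id + offset   # one id per file
        df['time_point'] = range(len(df))      # sample counter within the file
        frames.append(df)
    return frames


# Demonstration with two made-up recordings
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'p1.csv').write_text('Fp1,Fp2\n0.1,0.2\n0.3,0.4\n')
    Path(tmp, 'p2.csv').write_text('Fp1,Fp2\n0.5,0.6\n')
    demo_frames = load_batch_a(tmp, first_id=15)

print([df['patient_id'].iloc[0] for df in demo_frames])
```

Passing a different `first_id` for each group is one way to keep healthy and schizophrenic patient ids from overlapping.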

In [16]:
# Create header list for b datasets
electrode_pos_b = ['F7', 'F3', 'F4', 'F8', 'T3', 'C3', 'Cz', 'C4', 'T4', 'T5', 'P3', 'Pz', 'P4', 'T6', 'O1', 'O2']

In [17]:
# Reshape the df's in the b healthy batch and append the restructured dfs in a list
healthy_b = []
for df in healthy_lst_dfs_b:
    healthy_b.append(pd.DataFrame(df.values.reshape(7680, len(electrode_pos_b)), columns = electrode_pos_b))

In [18]:
# Reshape the df's in the b schizophrenic batch and append the restructured dfs in a list
schizo_b = []
for df in schizo_lst_dfs_b:
    schizo_b.append(pd.DataFrame(df.values.reshape(7680, len(electrode_pos_b)), columns = electrode_pos_b))
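
Whether `reshape(7680, 16)` puts each electrode in its own column depends on how the flat b files are laid out, which the notebook does not state. NumPy fills the new shape row by row, so the call is only correct if the file lists all 16 electrodes for sample 0, then all 16 for sample 1, and so on. The toy sketch below, using made-up values and hypothetical channel names, contrasts that with a file that stores one whole channel after another; it is a check worth running, not a statement about the real layout.

```python
import numpy as np
import pandas as pd

n_samples = 4
channels = ['ch1', 'ch2', 'ch3']  # hypothetical channel names

# Made-up flat recording stored channel by channel:
# all ch1 samples, then all ch2 samples, then all ch3 samples
flat = np.array([10, 11, 12, 13, 20, 21, 22, 23, 30, 31, 32, 33])

# Row-major reshape (the form used above): consecutive values fill one row
sample_major = pd.DataFrame(flat.reshape(n_samples, len(channels)), columns=channels)

# Channel-major reshape: each block of n_samples values becomes one column
channel_major = pd.DataFrame(flat.reshape(len(channels), n_samples).T, columns=channels)

print(sample_major)
print(channel_major)
```

If the values within a row of the reshaped data change as smoothly as neighbouring time samples would, that may be a hint that the file is channel-major and the second form is the appropriate one.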

In [19]:
# Create a df for all the healthy a datasets
healthy_a = pd.concat(healthy_lst_dfs_a)
healthy_a

Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,...,C4,P4,F3,C3,P3,Fz,Cz,Pz,patient_id,time_point
0,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,0
1,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,1
2,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,2
3,4.612154e-07,4.612154e-07,3.083103e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-3.033103e-07,1,3
4,4.612154e-07,4.612154e-07,4.612154e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,-3.033103e-07,...,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-1.504051e-07,1,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
216245,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,14,216245
216246,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,14,216246
216247,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,14,216247
216248,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,14,216248


In [20]:
# Create a df for all the healthy b datasets
healthy_b = pd.concat(healthy_b)
healthy_b

Unnamed: 0,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2
0,347.78,507.87,488.54,369.86,347.78,408.50,488.54,449.90,369.86,427.82,408.50,209.77,-91.08,-110.41,88.32,168.37
1,69.00,-11.04,-151.81,-231.85,-311.90,-270.49,-11.04,8.28,-11.04,-30.36,27.60,129.73,248.41,248.41,107.65,49.68
2,149.05,187.69,149.05,49.68,-30.36,-52.44,-71.76,-171.13,-251.17,-270.49,-151.81,-91.08,-11.04,-30.36,8.28,8.28
3,8.28,27.60,-11.04,-52.44,8.28,107.65,209.77,187.69,69.00,88.32,229.09,229.09,248.41,289.81,427.82,389.18
4,-30.36,-52.44,-91.08,-171.13,-350.54,-292.57,-11.04,209.77,427.82,507.87,529.95,568.59,529.95,507.87,549.27,529.95
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7675,-631.17,-310.93,599.28,1098.90,1559.98,1478.93,1019.17,639.14,-190.01,-550.11,-890.28,-1031.13,-1411.16,-1440.39,-960.70,-591.30
7676,168.75,329.54,439.82,508.92,639.14,559.41,298.97,259.11,139.52,9.30,0.00,199.32,338.84,139.52,-90.36,-90.36
7677,19.93,39.86,39.86,-10.63,-291.00,-330.86,-50.49,179.38,659.07,718.87,499.62,329.54,69.10,29.23,-130.22,-310.93
7678,-620.54,-600.61,-340.17,-190.01,-50.49,-50.49,119.59,179.38,-21.26,-150.15,-370.73,-450.45,-591.30,-690.96,-1110.85,-1250.38


In [21]:
# Create a df for all the schizophrenic a datasets
schizo_a = pd.concat(schizo_lst_dfs_a)
schizo_a

Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,...,C4,P4,F3,C3,P3,Fz,Cz,Pz,patient_id,time_point
0,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,0
1,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,1
2,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,2
3,2.500000e-09,2.500000e-09,2.500000e-09,3.083103e-07,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,3.083103e-07,2.500000e-09,...,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,-6.091205e-07,-3.033103e-07,-6.091205e-07,-7.620257e-07,1,3
4,2.500000e-09,2.500000e-09,2.500000e-09,1.554051e-07,2.500000e-09,2.500000e-09,2.500000e-09,1.554051e-07,3.083103e-07,-3.033103e-07,...,2.500000e-09,-3.033103e-07,-3.033103e-07,-1.504051e-07,-6.091205e-07,-4.562154e-07,-7.620257e-07,-7.620257e-07,1,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
542495,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542495
542496,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542496
542497,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542497
542498,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542498


In [22]:
# Create a df for all the schizophrenic b datasets
schizo_b = pd.concat(schizo_b)
schizo_b

Unnamed: 0,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2
0,108.01,208.82,388.84,446.45,446.45,349.24,86.41,7.20,7.20,86.41,129.61,28.80,-230.43,-331.24,-291.63,-151.22
1,147.62,68.41,-172.82,-270.03,-270.03,-291.63,-432.05,-471.65,-252.03,-90.01,68.41,169.22,266.43,248.43,28.80,-32.40
2,28.80,169.22,266.43,248.43,266.43,288.03,428.45,406.84,-111.61,-32.40,46.81,108.01,68.41,7.20,-10.80,28.80
3,68.41,86.41,129.61,169.22,349.24,468.05,406.84,288.03,-72.01,-212.42,-392.44,-432.05,-471.65,-471.65,-370.84,-352.84
4,-313.23,-230.43,-10.80,86.41,266.43,327.64,428.45,468.05,669.67,849.69,1148.52,1188.13,1008.11,907.30,727.28,608.47
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7675,176.17,78.81,-92.72,-111.27,-51.00,-32.45,-13.91,-13.91,-32.45,-32.45,55.63,115.90,78.81,-23.18,-74.18,-32.45
7676,-190.08,-250.35,-213.26,-152.99,-171.53,-190.08,-111.27,-51.00,9.27,9.27,0.00,18.54,139.08,236.44,375.52,356.98
7677,176.17,139.08,55.63,9.27,-143.72,-120.54,129.81,259.62,305.98,245.71,166.90,185.44,157.63,129.81,18.54,-41.72
7678,-51.00,-32.45,-13.91,-13.91,-13.91,-23.18,78.81,139.08,236.44,259.62,259.62,199.35,0.00,-74.18,-292.07,-352.34


In [23]:
# Add a column to denote if the patient is healthy or schizophrenic
healthy_a['Healthy'] = True
healthy_a

Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,...,P4,F3,C3,P3,Fz,Cz,Pz,patient_id,time_point,Healthy
0,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,0,True
1,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,1,True
2,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,2,True
3,4.612154e-07,4.612154e-07,3.083103e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,2.500000e-09,...,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-3.033103e-07,1,3,True
4,4.612154e-07,4.612154e-07,4.612154e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,-3.033103e-07,...,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-1.504051e-07,1,4,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
216245,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,14,216245,True
216246,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,14,216246,True
216247,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,14,216247,True
216248,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,14,216248,True


In [24]:
# Add a column to denote if the patient is healthy or schizophrenic
healthy_b['Healthy'] = True
healthy_b

Unnamed: 0,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,347.78,507.87,488.54,369.86,347.78,408.50,488.54,449.90,369.86,427.82,408.50,209.77,-91.08,-110.41,88.32,168.37,True
1,69.00,-11.04,-151.81,-231.85,-311.90,-270.49,-11.04,8.28,-11.04,-30.36,27.60,129.73,248.41,248.41,107.65,49.68,True
2,149.05,187.69,149.05,49.68,-30.36,-52.44,-71.76,-171.13,-251.17,-270.49,-151.81,-91.08,-11.04,-30.36,8.28,8.28,True
3,8.28,27.60,-11.04,-52.44,8.28,107.65,209.77,187.69,69.00,88.32,229.09,229.09,248.41,289.81,427.82,389.18,True
4,-30.36,-52.44,-91.08,-171.13,-350.54,-292.57,-11.04,209.77,427.82,507.87,529.95,568.59,529.95,507.87,549.27,529.95,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7675,-631.17,-310.93,599.28,1098.90,1559.98,1478.93,1019.17,639.14,-190.01,-550.11,-890.28,-1031.13,-1411.16,-1440.39,-960.70,-591.30,True
7676,168.75,329.54,439.82,508.92,639.14,559.41,298.97,259.11,139.52,9.30,0.00,199.32,338.84,139.52,-90.36,-90.36,True
7677,19.93,39.86,39.86,-10.63,-291.00,-330.86,-50.49,179.38,659.07,718.87,499.62,329.54,69.10,29.23,-130.22,-310.93,True
7678,-620.54,-600.61,-340.17,-190.01,-50.49,-50.49,119.59,179.38,-21.26,-150.15,-370.73,-450.45,-591.30,-690.96,-1110.85,-1250.38,True


In [25]:
# Add a column to denote if the patient is healthy or schizophrenic
schizo_a['Healthy'] = False
schizo_a

Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,...,P4,F3,C3,P3,Fz,Cz,Pz,patient_id,time_point,Healthy
0,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,0,False
1,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,1,False
2,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,2,False
3,2.500000e-09,2.500000e-09,2.500000e-09,3.083103e-07,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,3.083103e-07,2.500000e-09,...,-1.504051e-07,-3.033103e-07,2.500000e-09,-6.091205e-07,-3.033103e-07,-6.091205e-07,-7.620257e-07,1,3,False
4,2.500000e-09,2.500000e-09,2.500000e-09,1.554051e-07,2.500000e-09,2.500000e-09,2.500000e-09,1.554051e-07,3.083103e-07,-3.033103e-07,...,-3.033103e-07,-3.033103e-07,-1.504051e-07,-6.091205e-07,-4.562154e-07,-7.620257e-07,-7.620257e-07,1,4,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
542495,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542495,False
542496,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542496,False
542497,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542497,False
542498,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542498,False


In [26]:
# Add a column to denote if the patient is healthy or schizophrenic
schizo_b['Healthy'] = False
schizo_b

Unnamed: 0,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,108.01,208.82,388.84,446.45,446.45,349.24,86.41,7.20,7.20,86.41,129.61,28.80,-230.43,-331.24,-291.63,-151.22,False
1,147.62,68.41,-172.82,-270.03,-270.03,-291.63,-432.05,-471.65,-252.03,-90.01,68.41,169.22,266.43,248.43,28.80,-32.40,False
2,28.80,169.22,266.43,248.43,266.43,288.03,428.45,406.84,-111.61,-32.40,46.81,108.01,68.41,7.20,-10.80,28.80,False
3,68.41,86.41,129.61,169.22,349.24,468.05,406.84,288.03,-72.01,-212.42,-392.44,-432.05,-471.65,-471.65,-370.84,-352.84,False
4,-313.23,-230.43,-10.80,86.41,266.43,327.64,428.45,468.05,669.67,849.69,1148.52,1188.13,1008.11,907.30,727.28,608.47,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7675,176.17,78.81,-92.72,-111.27,-51.00,-32.45,-13.91,-13.91,-32.45,-32.45,55.63,115.90,78.81,-23.18,-74.18,-32.45,False
7676,-190.08,-250.35,-213.26,-152.99,-171.53,-190.08,-111.27,-51.00,9.27,9.27,0.00,18.54,139.08,236.44,375.52,356.98,False
7677,176.17,139.08,55.63,9.27,-143.72,-120.54,129.81,259.62,305.98,245.71,166.90,185.44,157.63,129.81,18.54,-41.72,False
7678,-51.00,-32.45,-13.91,-13.91,-13.91,-23.18,78.81,139.08,236.44,259.62,259.62,199.35,0.00,-74.18,-292.07,-352.34,False


In [27]:
# Change the index for the healthy b consolidated data
healthy_b.reset_index(inplace = True, names = 'old_index')
healthy_b

Unnamed: 0,old_index,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,0,347.78,507.87,488.54,369.86,347.78,408.50,488.54,449.90,369.86,427.82,408.50,209.77,-91.08,-110.41,88.32,168.37,True
1,1,69.00,-11.04,-151.81,-231.85,-311.90,-270.49,-11.04,8.28,-11.04,-30.36,27.60,129.73,248.41,248.41,107.65,49.68,True
2,2,149.05,187.69,149.05,49.68,-30.36,-52.44,-71.76,-171.13,-251.17,-270.49,-151.81,-91.08,-11.04,-30.36,8.28,8.28,True
3,3,8.28,27.60,-11.04,-52.44,8.28,107.65,209.77,187.69,69.00,88.32,229.09,229.09,248.41,289.81,427.82,389.18,True
4,4,-30.36,-52.44,-91.08,-171.13,-350.54,-292.57,-11.04,209.77,427.82,507.87,529.95,568.59,529.95,507.87,549.27,529.95,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
299515,7675,-631.17,-310.93,599.28,1098.90,1559.98,1478.93,1019.17,639.14,-190.01,-550.11,-890.28,-1031.13,-1411.16,-1440.39,-960.70,-591.30,True
299516,7676,168.75,329.54,439.82,508.92,639.14,559.41,298.97,259.11,139.52,9.30,0.00,199.32,338.84,139.52,-90.36,-90.36,True
299517,7677,19.93,39.86,39.86,-10.63,-291.00,-330.86,-50.49,179.38,659.07,718.87,499.62,329.54,69.10,29.23,-130.22,-310.93,True
299518,7678,-620.54,-600.61,-340.17,-190.01,-50.49,-50.49,119.59,179.38,-21.26,-150.15,-370.73,-450.45,-591.30,-690.96,-1110.85,-1250.38,True


In [28]:
# Rename the column to be a time reference
healthy_b.rename(columns = {'old_index' : 'time_point'}, inplace = True)
healthy_b

Unnamed: 0,time_point,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,0,347.78,507.87,488.54,369.86,347.78,408.50,488.54,449.90,369.86,427.82,408.50,209.77,-91.08,-110.41,88.32,168.37,True
1,1,69.00,-11.04,-151.81,-231.85,-311.90,-270.49,-11.04,8.28,-11.04,-30.36,27.60,129.73,248.41,248.41,107.65,49.68,True
2,2,149.05,187.69,149.05,49.68,-30.36,-52.44,-71.76,-171.13,-251.17,-270.49,-151.81,-91.08,-11.04,-30.36,8.28,8.28,True
3,3,8.28,27.60,-11.04,-52.44,8.28,107.65,209.77,187.69,69.00,88.32,229.09,229.09,248.41,289.81,427.82,389.18,True
4,4,-30.36,-52.44,-91.08,-171.13,-350.54,-292.57,-11.04,209.77,427.82,507.87,529.95,568.59,529.95,507.87,549.27,529.95,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
299515,7675,-631.17,-310.93,599.28,1098.90,1559.98,1478.93,1019.17,639.14,-190.01,-550.11,-890.28,-1031.13,-1411.16,-1440.39,-960.70,-591.30,True
299516,7676,168.75,329.54,439.82,508.92,639.14,559.41,298.97,259.11,139.52,9.30,0.00,199.32,338.84,139.52,-90.36,-90.36,True
299517,7677,19.93,39.86,39.86,-10.63,-291.00,-330.86,-50.49,179.38,659.07,718.87,499.62,329.54,69.10,29.23,-130.22,-310.93,True
299518,7678,-620.54,-600.61,-340.17,-190.01,-50.49,-50.49,119.59,179.38,-21.26,-150.15,-370.73,-450.45,-591.30,-690.96,-1110.85,-1250.38,True


In [29]:
# Change the index for the schizophrenic b consolidated data
schizo_b.reset_index(inplace = True, names = 'old_index')
schizo_b

Unnamed: 0,old_index,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,0,108.01,208.82,388.84,446.45,446.45,349.24,86.41,7.20,7.20,86.41,129.61,28.80,-230.43,-331.24,-291.63,-151.22,False
1,1,147.62,68.41,-172.82,-270.03,-270.03,-291.63,-432.05,-471.65,-252.03,-90.01,68.41,169.22,266.43,248.43,28.80,-32.40,False
2,2,28.80,169.22,266.43,248.43,266.43,288.03,428.45,406.84,-111.61,-32.40,46.81,108.01,68.41,7.20,-10.80,28.80,False
3,3,68.41,86.41,129.61,169.22,349.24,468.05,406.84,288.03,-72.01,-212.42,-392.44,-432.05,-471.65,-471.65,-370.84,-352.84,False
4,4,-313.23,-230.43,-10.80,86.41,266.43,327.64,428.45,468.05,669.67,849.69,1148.52,1188.13,1008.11,907.30,727.28,608.47,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
345595,7675,176.17,78.81,-92.72,-111.27,-51.00,-32.45,-13.91,-13.91,-32.45,-32.45,55.63,115.90,78.81,-23.18,-74.18,-32.45,False
345596,7676,-190.08,-250.35,-213.26,-152.99,-171.53,-190.08,-111.27,-51.00,9.27,9.27,0.00,18.54,139.08,236.44,375.52,356.98,False
345597,7677,176.17,139.08,55.63,9.27,-143.72,-120.54,129.81,259.62,305.98,245.71,166.90,185.44,157.63,129.81,18.54,-41.72,False
345598,7678,-51.00,-32.45,-13.91,-13.91,-13.91,-23.18,78.81,139.08,236.44,259.62,259.62,199.35,0.00,-74.18,-292.07,-352.34,False


In [30]:
# Rename the columns to have a time reference
schizo_b.rename(columns = {'old_index' : 'time_point'}, inplace = True)
schizo_b

Unnamed: 0,time_point,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy
0,0,108.01,208.82,388.84,446.45,446.45,349.24,86.41,7.20,7.20,86.41,129.61,28.80,-230.43,-331.24,-291.63,-151.22,False
1,1,147.62,68.41,-172.82,-270.03,-270.03,-291.63,-432.05,-471.65,-252.03,-90.01,68.41,169.22,266.43,248.43,28.80,-32.40,False
2,2,28.80,169.22,266.43,248.43,266.43,288.03,428.45,406.84,-111.61,-32.40,46.81,108.01,68.41,7.20,-10.80,28.80,False
3,3,68.41,86.41,129.61,169.22,349.24,468.05,406.84,288.03,-72.01,-212.42,-392.44,-432.05,-471.65,-471.65,-370.84,-352.84,False
4,4,-313.23,-230.43,-10.80,86.41,266.43,327.64,428.45,468.05,669.67,849.69,1148.52,1188.13,1008.11,907.30,727.28,608.47,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
345595,7675,176.17,78.81,-92.72,-111.27,-51.00,-32.45,-13.91,-13.91,-32.45,-32.45,55.63,115.90,78.81,-23.18,-74.18,-32.45,False
345596,7676,-190.08,-250.35,-213.26,-152.99,-171.53,-190.08,-111.27,-51.00,9.27,9.27,0.00,18.54,139.08,236.44,375.52,356.98,False
345597,7677,176.17,139.08,55.63,9.27,-143.72,-120.54,129.81,259.62,305.98,245.71,166.90,185.44,157.63,129.81,18.54,-41.72,False
345598,7678,-51.00,-32.45,-13.91,-13.91,-13.91,-23.18,78.81,139.08,236.44,259.62,259.62,199.35,0.00,-74.18,-292.07,-352.34,False


In [31]:
# Create vector that can be used as a patient id (each patient contributes 7680 rows)
id_lst = []
for i in range(len(healthy_b) // 7680):
    id_lst.extend([i + 1] * 7680)

In [32]:
# Assign the vector to a series in the df for patient id
healthy_b['patient_id'] = id_lst
healthy_b

Unnamed: 0,time_point,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy,patient_id
0,0,347.78,507.87,488.54,369.86,347.78,408.50,488.54,449.90,369.86,427.82,408.50,209.77,-91.08,-110.41,88.32,168.37,True,1
1,1,69.00,-11.04,-151.81,-231.85,-311.90,-270.49,-11.04,8.28,-11.04,-30.36,27.60,129.73,248.41,248.41,107.65,49.68,True,1
2,2,149.05,187.69,149.05,49.68,-30.36,-52.44,-71.76,-171.13,-251.17,-270.49,-151.81,-91.08,-11.04,-30.36,8.28,8.28,True,1
3,3,8.28,27.60,-11.04,-52.44,8.28,107.65,209.77,187.69,69.00,88.32,229.09,229.09,248.41,289.81,427.82,389.18,True,1
4,4,-30.36,-52.44,-91.08,-171.13,-350.54,-292.57,-11.04,209.77,427.82,507.87,529.95,568.59,529.95,507.87,549.27,529.95,True,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
299515,7675,-631.17,-310.93,599.28,1098.90,1559.98,1478.93,1019.17,639.14,-190.01,-550.11,-890.28,-1031.13,-1411.16,-1440.39,-960.70,-591.30,True,39
299516,7676,168.75,329.54,439.82,508.92,639.14,559.41,298.97,259.11,139.52,9.30,0.00,199.32,338.84,139.52,-90.36,-90.36,True,39
299517,7677,19.93,39.86,39.86,-10.63,-291.00,-330.86,-50.49,179.38,659.07,718.87,499.62,329.54,69.10,29.23,-130.22,-310.93,True,39
299518,7678,-620.54,-600.61,-340.17,-190.01,-50.49,-50.49,119.59,179.38,-21.26,-150.15,-370.73,-450.45,-591.30,-690.96,-1110.85,-1250.38,True,39


In [33]:
# Create vector that can be used as a patient id; ids start at 41 so they do not overlap the healthy b ids
id_lst = []
for i in range(40, 40 + len(schizo_b) // 7680):
    id_lst.extend([i + 1] * 7680)

In [34]:
# Assign the vector to a series in the df for patient id
schizo_b['patient_id'] = id_lst
schizo_b

Unnamed: 0,time_point,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy,patient_id
0,0,108.01,208.82,388.84,446.45,446.45,349.24,86.41,7.20,7.20,86.41,129.61,28.80,-230.43,-331.24,-291.63,-151.22,False,41
1,1,147.62,68.41,-172.82,-270.03,-270.03,-291.63,-432.05,-471.65,-252.03,-90.01,68.41,169.22,266.43,248.43,28.80,-32.40,False,41
2,2,28.80,169.22,266.43,248.43,266.43,288.03,428.45,406.84,-111.61,-32.40,46.81,108.01,68.41,7.20,-10.80,28.80,False,41
3,3,68.41,86.41,129.61,169.22,349.24,468.05,406.84,288.03,-72.01,-212.42,-392.44,-432.05,-471.65,-471.65,-370.84,-352.84,False,41
4,4,-313.23,-230.43,-10.80,86.41,266.43,327.64,428.45,468.05,669.67,849.69,1148.52,1188.13,1008.11,907.30,727.28,608.47,False,41
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
345595,7675,176.17,78.81,-92.72,-111.27,-51.00,-32.45,-13.91,-13.91,-32.45,-32.45,55.63,115.90,78.81,-23.18,-74.18,-32.45,False,85
345596,7676,-190.08,-250.35,-213.26,-152.99,-171.53,-190.08,-111.27,-51.00,9.27,9.27,0.00,18.54,139.08,236.44,375.52,356.98,False,85
345597,7677,176.17,139.08,55.63,9.27,-143.72,-120.54,129.81,259.62,305.98,245.71,166.90,185.44,157.63,129.81,18.54,-41.72,False,85
345598,7678,-51.00,-32.45,-13.91,-13.91,-13.91,-23.18,78.81,139.08,236.44,259.62,259.62,199.35,0.00,-74.18,-292.07,-352.34,False,85
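
The reset-index, rename, and id-vector steps above can also be expressed at concatenation time. The sketch below is one possible alternative, not the approach used in this notebook: `pd.concat` accepts `keys` and `names`, which label each input frame and its original row index. The two small frames and the starting id are made up for illustration.

```python
import pandas as pd

# Two made-up recordings with three samples each
rec1 = pd.DataFrame({'F7': [1.0, 2.0, 3.0], 'F3': [4.0, 5.0, 6.0]})
rec2 = pd.DataFrame({'F7': [7.0, 8.0, 9.0], 'F3': [0.5, 1.5, 2.5]})
recordings = [rec1, rec2]

first_id = 41  # hypothetical starting id, mirroring the offset used above

combined = (
    pd.concat(
        recordings,
        keys=list(range(first_id, first_id + len(recordings))),  # one key per recording
        names=['patient_id', 'time_point'],                      # names for the two index levels
    )
    .reset_index()  # turn both index levels into ordinary columns
)
combined['Healthy'] = False

print(combined)
```

Because the ids come from the list of frames rather than from a fixed row count, this form does not rely on every recording having exactly 7680 rows.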


In [35]:
# Rebind pd to modin's pandas API for the large concatenations below
import modin.pandas as pd

In [36]:
# Make final consolidated a dataset
a_dataset = pd.concat([healthy_a, schizo_a])
a_dataset

2024-10-24 05:24:29,363	INFO worker.py:1816 -- Started a local Ray instance.


Unnamed: 0,Fp2,F8,T4,T6,O2,Fp1,F7,T3,T5,O1,...,P4,F3,C3,P3,Fz,Cz,Pz,patient_id,time_point,Healthy
0,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,0,True
1,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,1,True
2,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,...,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,2.500000e-09,1,2,True
3,4.612154e-07,4.612154e-07,3.083103e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,2.500000e-09,...,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-3.033103e-07,1,3,True
4,4.612154e-07,4.612154e-07,4.612154e-07,3.083103e-07,1.554051e-07,2.500000e-09,2.500000e-09,-1.504051e-07,-1.504051e-07,-3.033103e-07,...,2.500000e-09,2.500000e-09,-1.504051e-07,-3.033103e-07,2.500000e-09,2.500000e-09,-1.504051e-07,1,4,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
542495,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542495,False
542496,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542496,False
542497,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542497,False
542498,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,...,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,-1.500000e-09,14,542498,False
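
In the output above, `patient_id` runs from 1 to 14 in both the healthy and the schizophrenic rows, and the row index restarts at 0 for the second group. Grouping by `patient_id` alone would therefore mix a healthy recording with a schizophrenic one. A minimal sketch of one way to get a unique id per recording follows; the column name `subject_id` and the toy frames are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the two groups; patient ids 1 and 2 occur in both
healthy_demo = pd.DataFrame({'Fp1': [0.1, 0.2, 0.3], 'patient_id': [1, 1, 2], 'Healthy': True})
schizo_demo = pd.DataFrame({'Fp1': [0.4, 0.5, 0.6], 'patient_id': [1, 2, 2], 'Healthy': False})

# ignore_index=True gives the combined frame a fresh, unique row index
demo = pd.concat([healthy_demo, schizo_demo], ignore_index=True)

# Number each (Healthy, patient_id) pair so every recording has its own id
demo['subject_id'] = demo.groupby(['Healthy', 'patient_id']).ngroup() + 1

print(demo)
```

The same idea could be applied with an id offset when the files are loaded, as is done for the b datasets.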


In [37]:
# Make final consolidated b dataset
b_dataset = pd.concat([healthy_b, schizo_b])
b_dataset



Unnamed: 0,time_point,F7,F3,F4,F8,T3,C3,Cz,C4,T4,T5,P3,Pz,P4,T6,O1,O2,Healthy,patient_id
0,0,347.78,507.87,488.54,369.86,347.78,408.50,488.54,449.90,369.86,427.82,408.50,209.77,-91.08,-110.41,88.32,168.37,True,1
1,1,69.00,-11.04,-151.81,-231.85,-311.90,-270.49,-11.04,8.28,-11.04,-30.36,27.60,129.73,248.41,248.41,107.65,49.68,True,1
2,2,149.05,187.69,149.05,49.68,-30.36,-52.44,-71.76,-171.13,-251.17,-270.49,-151.81,-91.08,-11.04,-30.36,8.28,8.28,True,1
3,3,8.28,27.60,-11.04,-52.44,8.28,107.65,209.77,187.69,69.00,88.32,229.09,229.09,248.41,289.81,427.82,389.18,True,1
4,4,-30.36,-52.44,-91.08,-171.13,-350.54,-292.57,-11.04,209.77,427.82,507.87,529.95,568.59,529.95,507.87,549.27,529.95,True,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
345595,7675,176.17,78.81,-92.72,-111.27,-51.00,-32.45,-13.91,-13.91,-32.45,-32.45,55.63,115.90,78.81,-23.18,-74.18,-32.45,False,85
345596,7676,-190.08,-250.35,-213.26,-152.99,-171.53,-190.08,-111.27,-51.00,9.27,9.27,0.00,18.54,139.08,236.44,375.52,356.98,False,85
345597,7677,176.17,139.08,55.63,9.27,-143.72,-120.54,129.81,259.62,305.98,245.71,166.90,185.44,157.63,129.81,18.54,-41.72,False,85
345598,7678,-51.00,-32.45,-13.91,-13.91,-13.91,-23.18,78.81,139.08,236.44,259.62,259.62,199.35,0.00,-74.18,-292.07,-352.34,False,85
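
The a and b outputs differ in two ways that matter for a model meant to work across datasets: the electrode columns are not identical, and the values are on very different scales (around 1e-7 in a, hundreds in b). The sketch below shows one way the two could be aligned. It rests on an unverified assumption that dataset a is recorded in volts and dataset b in microvolts; the toy frames and the names `shared` and `a_scaled` are made up for illustration.

```python
import pandas as pd

# Made-up miniature stand-ins for the two consolidated datasets
a_demo = pd.DataFrame({'Fp1': [1e-6, 2e-6], 'F7': [3e-6, 4e-6], 'Cz': [5e-6, 6e-6],
                       'patient_id': [1, 1], 'time_point': [0, 1], 'Healthy': [True, True]})
b_demo = pd.DataFrame({'F7': [30.0, 40.0], 'Cz': [50.0, 60.0],
                       'patient_id': [41, 41], 'time_point': [0, 1], 'Healthy': [False, False]})

meta_cols = ['patient_id', 'time_point', 'Healthy']

# Electrodes present in both datasets, in the order they appear in b_demo
shared = [c for c in b_demo.columns if c in a_demo.columns and c not in meta_cols]

# ASSUMPTION: a is in volts and b in microvolts, so a is multiplied by 1e6
a_scaled = a_demo[shared + meta_cols].copy()
a_scaled[shared] = a_scaled[shared] * 1e6

print(shared)
print(a_scaled)
```

The units should be confirmed against each dataset's documentation before any such rescaling is applied to the real data.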


In [74]:
# Re-import pandas as pd
import pandas as pd