# COGS 189 Final Project

This project aims to

Group Members:

- Stephen Gelinas (A15816513)
- Aditya Tomar (A17162996)
- Shay Samat
- Rolando Restua
- Kevin Wong

## Data Loading

We will first load and inspect the raw EEG data we collected with OpenBCI.

In [None]:
# required imports
import pandas as pd
from IPython.display import Image

In [None]:
# read EEG data
df = pd.read_csv('data/eeg.txt')
df.head()

There appear to be no missing values in the raw EEG data from the data collection process.

In [None]:
# determine if any values are missing
df.isna().sum()

The image below shows the locations of the 8 channels selected for data collection (including GND and REF).

In [None]:
Image("data/channels.png", width=400)

## Data Cleaning/Preprocessing

In [None]:
# "Other" channels that didn't collect EEG data
df[[' Other', ' Other.7']].value_counts().to_frame(name='Total Count')

In [None]:
# "Analog" channels that didn't collect EEG data
analog = [' Analog Channel 0', ' Analog Channel 1', ' Analog Channel 2']
df[analog].value_counts().to_frame(name='Total Count')

In [None]:
# drop data from these channels
dropped = [' Other', ' Other.7', ' Analog Channel 0', ' Analog Channel 1', ' Analog Channel 2']
df_cleaned = df.drop(columns=dropped)
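
The drop list above is hard-coded after inspecting the value counts by hand. As a cross-check, a minimal sketch of finding such columns programmatically is shown here. It assumes the unused channels hold a single repeated value, which the value counts above would confirm or refute. The toy DataFrame and the ` EXG Channel 0` column name are illustrative assumptions, not the recorded data.

```python
import pandas as pd

# Toy stand-in for the recording (hypothetical values and EEG column name)
toy = pd.DataFrame({
    " EXG Channel 0": [1.5, -2.0, 3.25],
    " Other": [0.0, 0.0, 0.0],
    " Analog Channel 0": [0.0, 0.0, 0.0],
})

# A column with at most one distinct value carries no signal
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=False) <= 1]

# Drop the constant columns, keeping everything else
toy_cleaned = toy.drop(columns=constant_cols)
print(constant_cols)
```

Running the same comprehension on `df` would give a list to compare against `dropped`.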

In [None]:
import pandas as pd
import numpy as np
import mne

# Reload the raw recording as a pandas DataFrame
df = pd.read_csv('data/eeg.txt')

# Estimate the sampling frequency from the timestamps (assumed to be in seconds)
sfreq = 1 / np.mean(np.diff(df[' Timestamp']))

# Convert the 8 EEG columns (columns 1-8) to an MNE RawArray
ch_names = df.columns[1:9].tolist()
ch_types = ['eeg'] * len(ch_names)
info = mne.create_info(ch_names=ch_names, sfreq=sfreq, ch_types=ch_types)
data = df[ch_names].values.T.astype(np.float32)
raw = mne.io.RawArray(data, info)

# Apply a 1 Hz high-pass filter to remove slow drifts
raw.filter(l_freq=1.0, h_freq=None)

# Apply a 40 Hz low-pass filter to remove high-frequency noise
raw.filter(l_freq=None, h_freq=40.0)

# Get the preprocessed data as a numpy array (channels x samples)
df_cleaned = raw.get_data()
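
The cell above estimates the sampling frequency from the mean timestamp difference. That estimate is pulled downward by any gap in the recording, such as dropped packets. The sketch below compares it with a median-based estimate on synthetic timestamps. The 250 Hz rate, the 0.5 s gap, and the helper name `estimate_sfreq` are assumptions made for illustration only.

```python
import numpy as np

def estimate_sfreq(timestamps):
    """Estimate the sampling rate in Hz from timestamps in seconds (hypothetical helper)."""
    dt = np.diff(np.asarray(timestamps, dtype=float))
    # The median ignores a small number of unusually long gaps
    return 1.0 / np.median(dt)

# Synthetic timestamps: 1000 samples at an assumed 250 Hz with one 0.5 s gap
t = np.arange(1000) / 250.0
t[500:] += 0.5

sfreq_mean = 1.0 / np.mean(np.diff(t))   # biased low by the gap
sfreq_median = estimate_sfreq(t)         # close to the nominal rate
print(sfreq_mean, sfreq_median)
```

If the two estimates disagree noticeably on the real timestamps, the recording probably contains gaps worth inspecting before filtering.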

In [None]:
# Get the accelerometer data (3 axes x samples)
accelerometer_data = df[[' Accel Channel 0', ' Accel Channel 1', ' Accel Channel 2']].values.T.astype(np.float32)

# Calculate the norm of the accelerometer data to get the overall acceleration
acceleration = np.linalg.norm(accelerometer_data, axis=0)

# Flag samples in the top 5% of acceleration as likely head movements
threshold = np.percentile(acceleration, 95)
bad_samples = np.where(acceleration > threshold)[0]

# Annotations take onsets and durations in seconds, not sample indices
raw.set_annotations(mne.Annotations(
    onset=bad_samples / sfreq,
    duration=np.full(len(bad_samples), 1.0 / sfreq),
    description=['bad_movement'] * len(bad_samples),
))

# interpolate_bads() only repairs bad channels, so it is not used here;
# the annotated samples are dropped when the data is extracted below.
# raw was already filtered to 1-40 Hz in the previous cell, so the filters are not repeated.

# Get the preprocessed data as a numpy array, omitting the flagged samples
df_cleaned = raw.get_data(reject_by_annotation='omit')
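
Flagging every above-threshold sample on its own can produce many one-sample entries. One possible refinement, sketched here with NumPy only, merges consecutive flagged samples into segments with an onset and a duration in seconds, which is the form annotation tools usually expect. The helper name `mask_to_segments`, the toy mask, and the 10 Hz rate are assumptions for illustration.

```python
import numpy as np

def mask_to_segments(mask, sfreq):
    """Convert a boolean per-sample mask into (onsets, durations) in seconds.

    Hypothetical helper: consecutive True samples are merged into one segment.
    """
    mask = np.asarray(mask, dtype=bool)
    # Pad with False so runs touching either end still produce both edges
    padded = np.concatenate(([False], mask, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)    # index of the first True in each run
    stops = np.flatnonzero(edges == -1)    # index just past the last True in each run
    return starts / sfreq, (stops - starts) / sfreq

# Toy mask at an assumed 10 Hz: runs of length 2, 1 and 3 samples
toy_mask = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
onsets, durations = mask_to_segments(toy_mask, sfreq=10.0)
print(onsets, durations)
```

With the real data, the mask would be `acceleration > threshold`, and the resulting arrays could be passed as the onsets and durations of the annotations.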

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Use the preprocessed EEG array (channels x samples) from the previous cell
data = df_cleaned

# Select a subset of channels and time points to visualize
channels = [0, 1, 2, 3, 4, 5, 6, 7]
time_points = np.arange(20000, 50000)

# Plot the EEG data against the sample index
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(time_points, data[channels][:, time_points].T)
ax.set_xlabel('Sample index')
ax.set_ylabel('Amplitude')
ax.set_title('Preprocessed EEG Data')
ax.legend(['Channel {}'.format(i) for i in channels])
plt.show()