EEG Motor Movement/Imagery Dataset (Sept. 9, 2009, midnight)

A set of 64-channel EEGs from subjects who performed a series of motor/imagery tasks has been contributed to PhysioNet by the developers of the BCI2000 instrumentation system for brain-computer interface research.

When using this resource, please cite the original publication: Schalk, G., McFarland, D.J., Hinterberger, T., Birbaumer, N., Wolpaw, J.R. BCI2000: A General-Purpose Brain-Computer Interface (BCI) System. IEEE Transactions on Biomedical Engineering 51(6):1034-1043, 2004.

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.

https://physionet.org/content/eegmmidb/1.0.0/S001/#files-panel

In [None]:
# import required packages (fill in the blanks!)
# you may not use all of these
!pip -q install matplotlib
!pip -q install scikit-learn
!pip -q install mne
!pip -q install numpy

In [None]:
import mne
from mne.time_frequency import psd_array_welch
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
# bunch of stuff for svm analysis
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, ConfusionMatrixDisplay

Please download S001R03.edf from the dataset and upload it to your Google Drive!

In [None]:
from google.colab import drive
# mount your drive!

In [None]:
# from the files tab, copy your specific file path
file_path = ???

In [None]:
raw = mne.io.read_raw_edf(file_path, preload=True)
# apply a band-pass filter from 1-40 Hz and print the info stored on raw

In [None]:
# https://mne.tools/stable/generated/mne.preprocessing.ICA.html
from mne.preprocessing import ICA

ica = ???

# let's fit and apply our ICA model to our raw data as usual
ica.fit(raw)
raw_clean = ica.apply(raw.copy())

I think my comments may have been a bit overwhelming last time, so I tried to keep things a lot simpler this time. Instead of me explaining all the code, I think it would be better for most of you to read through the documentation in the links I provided.

This is close to how things will work once we start our project: I won't be able to explain every code snippet in full, so it will help in the long run to get comfortable navigating library documentation to find the function you need.

In [None]:
# take a look at this and find which function will determine the event id and events. keep in mind that the data set you're working with is annotated!
# https://mne.tools/stable/api/events.html

events, event_id = ???

In [None]:
print(event_id)


In [None]:
event_dict = {'left_fist': ???, 'right_fist': ???}
# at rest = class 1
# left fist movement/imagery = class 2
# right fist movement/imagery = class 3

#https://mne.tools/1.7/generated/mne.Epochs.html
epochs = mne.???(???, ???, event_id=???, tmin=0, tmax=3, preload=True, baseline = None)
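
If it helps to picture what epoching does before you fill in the blanks, here is a minimal NumPy sketch on made-up data. It is not MNE's implementation: `slice_epochs`, the event sample positions, and the 160 Hz rate are all assumptions for illustration. The `+ 1` reflects my understanding that the sample at `tmax` is kept, so a 0 to 3 s window at 160 Hz has 481 samples.

```python
import numpy as np

def slice_epochs(data, event_samples, sfreq, tmin, tmax):
    """Cut fixed-length windows out of (n_channels, n_times) data."""
    start_offset = int(round(tmin * sfreq))
    stop_offset = int(round(tmax * sfreq)) + 1  # +1 keeps the sample at tmax
    windows = [data[:, s + start_offset : s + stop_offset] for s in event_samples]
    return np.stack(windows)

demo_sfreq = 160.0  # assumed sampling rate for the synthetic recording
rng = np.random.default_rng(0)
demo_continuous = rng.standard_normal((4, 5000))  # 4 fake channels
demo_event_samples = [672, 1328, 2000]  # hypothetical event onsets (in samples)

demo_epochs = slice_epochs(demo_continuous, demo_event_samples, demo_sfreq, tmin=0, tmax=3)
print(demo_epochs.shape)  # (3, 4, 481) -> (n_epochs, n_channels, n_times)
```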

In [None]:
X = epochs.get_data()
y = epochs.events[:, -1]

In [None]:
print(f"Shape: {X.shape}, Labels: {np.unique(y)}")
# shape should be like (n_epochs, n_channels, n_times)

In [None]:
sfreq = raw.info['sfreq']
freq_bands = {
    # what are the commonly accepted frequency ranges? look up the standard EEG frequency bands and fill them in
    'delta': (1, 4),
    'theta': (??, ??),
    'alpha': (??, ??),
    'beta':  (??, ??),
    'gamma': (??, ??),
}

In [None]:
def bandpower(data, sfreq, band):
    fmin, fmax = band
    # fmin = lower bound of the band and fmax = upper bound of the band (ex. fmin delta = 1 and fmax delta = 4)
    psd, freqs = psd_array_welch(??, ??, ??, ??, n_fft=??)
    # psd_array_welch: MNE function that does the PSD of the EEG signal using the Welch method
    # https://mne.tools/stable/generated/mne.time_frequency.psd_array_welch.html
    return np.mean(psd, axis=-1)
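
To get a feel for what "band power" means, here is a small NumPy-only sketch. Note that it uses a plain periodogram (one FFT of the whole signal), not the Welch method that `psd_array_welch` uses, and the function name, the 160 Hz rate, and the two bands are assumptions chosen for the demo. A 10 Hz sine should carry far more power in a band around 10 Hz than in a band well above it.

```python
import numpy as np

def periodogram_bandpower(signal, sfreq, band):
    """Mean power of a 1-D signal inside a frequency band (plain periodogram)."""
    fmin, fmax = band
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    power = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    in_band = (freqs >= fmin) & (freqs <= fmax)
    return power[in_band].mean()

demo_sfreq = 160.0  # assumed sampling rate for the synthetic signal
t = np.arange(0, 3, 1 / demo_sfreq)
rng = np.random.default_rng(0)
demo_signal = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

power_near_10hz = periodogram_bandpower(demo_signal, demo_sfreq, (9, 11))
power_20_to_30hz = periodogram_bandpower(demo_signal, demo_sfreq, (20, 30))
print(power_near_10hz > power_20_to_30hz)
```

Welch's method averages periodograms over overlapping segments, which gives a smoother estimate on noisy EEG than this single-FFT version.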

In [None]:
# let's make a list to hold one feature array per frequency band
freq_band_features = []

In [None]:
for name, (??, ??) in freq_bands.items():
    bp = bandpower(X, ??, (??, ??))
    freq_band_features.append(???)

In [None]:
# https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html this is usually what we use when we want to combine data into a feature matrix
X_features = np.???(freq_band_features, axis=1)
print("Feature matrix shape:", ???)
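
Here is a quick sketch of how joining along `axis=1` changes the shape, using random numbers in place of real band power. The sizes (15 epochs, 64 channels, 5 bands) are assumptions for the demo, so your own shape may differ.

```python
import numpy as np

n_epochs_demo, n_channels_demo, n_bands_demo = 15, 64, 5
rng = np.random.default_rng(0)

# one (n_epochs, n_channels) array per band, filled with fake values
demo_band_features = [
    rng.random((n_epochs_demo, n_channels_demo)) for _ in range(n_bands_demo)
]

# axis=1 places the bands side by side, so each epoch stays on its own row
demo_matrix = np.concatenate(demo_band_features, axis=1)
print(demo_matrix.shape)  # (15, 320)
```

Each row is one epoch and each column is one channel/band combination, which is the layout scikit-learn expects (samples by features).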

In [None]:
# normalize data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_features)
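
In case it's unclear what "normalize" means here: `StandardScaler` z-scores each feature column (subtract the column mean, divide by the column standard deviation). A by-hand NumPy version on made-up data, with all variable names being my own, looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)
# three fake features on very different scales
demo_features = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(50, 3))

col_mean = demo_features.mean(axis=0)
col_std = demo_features.std(axis=0)  # population std (ddof=0)
demo_scaled = (demo_features - col_mean) / col_std

print(demo_scaled.mean(axis=0).round(6))
print(demo_scaled.std(axis=0).round(6))
```

After scaling, every column has mean 0 and standard deviation 1, so no single feature dominates the distance calculations in SVM or KNN just because of its units.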

In [None]:
# https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
X_train, X_test, y_train, y_test = ???(X_scaled, y, test_size=0.3, random_state=42, stratify=y)
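
The `stratify=y` argument is worth a closer look: it keeps the class proportions the same in the train and test sets. A toy sketch with 20 made-up samples (10 per class; the data and names are assumptions for the demo):

```python
import numpy as np
from sklearn.model_selection import train_test_split

demo_X = np.arange(40, dtype=float).reshape(20, 2)  # 20 fake samples, 2 features
demo_y = np.array([2] * 10 + [3] * 10)              # balanced fake labels

demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, test_size=0.3, random_state=42, stratify=demo_y
)

# count how many samples of each class ended up in each split
print(np.unique(demo_y_train, return_counts=True))
print(np.unique(demo_y_test, return_counts=True))
```

With so few epochs per class in a single recording, an unstratified split could easily leave one class almost missing from the test set.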

In [None]:
# https://scikit-learn.org/stable/modules/svm.html
svm = SVC(kernel="linear", C=1)
# fit the svm to the training data
???(X_train, y_train)
svm_accuracy = svm.score(X_test, y_test)

# https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
knn = KNeighborsClassifier(n_neighbors=5)
???(???)
knn_accuracy = knn.score(X_test, y_test)

print(f"SVM accuracy: {??}")
print(f"KNN accuracy: {??}")
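
If you're wondering what KNN actually does when it predicts, here is a bare-bones sketch of the idea: find the k closest training points and take a majority vote. This is a simplified stand-in, not scikit-learn's implementation; `knn_predict` and the toy points are made up for the demo, and ties go to the smallest label here.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k):
    """Predict the label of one query point by majority vote of its k nearest neighbors."""
    distances = np.linalg.norm(train_X - query, axis=1)  # Euclidean distance to each point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

demo_train_X = np.array(
    [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
)
demo_train_y = np.array([2, 2, 2, 3, 3, 3])

pred_a = knn_predict(demo_train_X, demo_train_y, np.array([0.15, 0.1]), k=3)
pred_b = knn_predict(demo_train_X, demo_train_y, np.array([0.95, 1.0]), k=3)
print(pred_a, pred_b)  # 2 3
```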

In [None]:
# make it 2D for graph analysis
# https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
# read through this to understand how scikit's pca functions work! HINT: read the examples, they are very similar (the same) to our work
pca = ???
X_pca = ???

svm_2d = SVC(kernel="linear", C=1)
# fit the svm data again using our 2d composition
???

knn_2d = ???(n_neighbors=???)
# fit the knn data (really similar to what you did earlier)
???
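
For intuition on what PCA is doing when it squeezes our features down to 2D, here is a NumPy sketch that centers the data and projects it onto the top two directions of variance via an SVD. The data is random and the names are my own; scikit-learn's `PCA` may flip the sign of a component, so plots can look mirrored compared to this.

```python
import numpy as np

rng = np.random.default_rng(0)
# 30 fake samples, 6 features with very different spreads
demo_data = rng.standard_normal((30, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])

centered = demo_data - demo_data.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

demo_2d = centered @ components[:2].T  # project onto the first two components
explained = singular_values**2 / np.sum(singular_values**2)

print(demo_2d.shape)          # (30, 2)
print(explained[:2].round(3))  # share of variance kept by each of the 2 components
```

Keep in mind that a classifier trained on the 2D projection sees less information than one trained on all features, so its accuracy can differ from the full model.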

In [None]:
# Determine plot limits based on PCA points
x_min, x_max = ??[:,0].min() - 0.5, ??[:,0].max() + 0.5
y_min, y_max = ??[:,1].min() - 0.5, ??[:,1].max() + 0.5

In [None]:
# let's see what our svm boundary looks like
# https://scikit-learn.org/stable/modules/generated/sklearn.inspection.DecisionBoundaryDisplay.html
# if you're curious as to what DecisionBoundaryDisplay.from_estimator does read through scikit's description
DecisionBoundaryDisplay.from_estimator(svm_2d, X_pca, response_method="predict", alpha=0.6)
plt.scatter(X_pca[:,0], X_pca[:,1], c=y, edgecolor="k", s=50)
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)

plt.title("SVM Decision Boundary")
plt.show()

In [None]:
# let's look at our knn decision boundary too
# please research what a decision boundary is as well if you're not sure (note that we talked about hyperplanes last class)
DecisionBoundaryDisplay.from_estimator(knn_2d, X_pca, response_method="predict", alpha=0.6)
plt.scatter(X_pca[:,0], X_pca[:,1], c=y, edgecolor="k", s=50)
plt.xlim(???, ???)
plt.ylim(???, ???)

plt.title("KNN Decision Boundary")
plt.show()

In [None]:
from sklearn.model_selection import cross_val_score
# numpy is already imported as np at the top of the notebook

# Range of k values to try, limited by the size of the training set
k_range = range(1, X_train.shape[0] + 1) # Max value of k can be X_train.shape[0]
cv_scores = []

# Evaluate each k using 5-fold cross-validation
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_scaled, y, cv=5, scoring='accuracy')
    cv_scores.append(scores.mean())

# Plot accuracy vs. k
plt.figure(figsize=(8, 5))
plt.plot(k_range, cv_scores, marker='o')
plt.title("k-NN Cross-Validation Accuracy vs k")
plt.xlabel("Number of Neighbors: k")
plt.ylabel("Cross-Validated Accuracy")
plt.grid(True)
plt.show()

# Best k - use nanargmax to correctly handle potential NaN values if present
best_k = k_range[np.nanargmax(cv_scores)]
print(f"Best k from cross-validation: {best_k}")

In [None]:
# train final model with best k
best_knn = KNeighborsClassifier(n_neighbors=best_k)
best_knn.fit(X_train, y_train)

# predict on test data https://scikit-learn.org/1.3/tutorial/statistical_inference/supervised_learning.html (look at the examples!)
y_pred = best_knn.???(X_test)

In [None]:
# lets build our confusion matrix https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
# hint: y_true = y_test in our cases
cm = ???
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=np.unique(y))
# draw the matrix; without this call plt.show() would display an empty figure
disp.plot()

disp.ax_.set_title(f"Confusion Matrix (k={best_k})")
plt.show()
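
To read the plot, it helps to know how the numbers are counted. Here is a by-hand sketch on made-up labels (the helper `count_confusions` is hypothetical, not a scikit-learn function). It follows the same convention scikit-learn uses: rows are the true labels, columns are the predicted labels.

```python
import numpy as np

def count_confusions(y_true, y_pred, labels):
    """Build a confusion matrix: rows = true label, columns = predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[int(true_label)], index[int(pred_label)]] += 1
    return matrix

demo_true = np.array([2, 2, 3, 3, 3])
demo_pred = np.array([2, 3, 3, 3, 2])

demo_cm = count_confusions(demo_true, demo_pred, labels=[2, 3])
print(demo_cm)
# [[1 1]
#  [1 2]]
```

The diagonal holds the correct predictions; everything off the diagonal is a mix-up between the two classes.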

In [None]:
# classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=[str(label) for label in np.unique(y)]))
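
The numbers in the report all come from confusion-matrix counts. A small sketch with a made-up 2x2 matrix (rows = true, columns = predicted) shows where precision, recall, F1, and accuracy come from:

```python
import numpy as np

# hypothetical confusion matrix for two classes
demo_counts = np.array([[1, 1],
                        [1, 2]])

true_pos = np.diag(demo_counts)
precision = true_pos / demo_counts.sum(axis=0)  # correct / everything predicted as that class
recall = true_pos / demo_counts.sum(axis=1)     # correct / everything truly in that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_pos.sum() / demo_counts.sum()

print(precision.round(3), recall.round(3), f1.round(3), accuracy)
```

With only a handful of test epochs, each of these numbers moves a lot when a single prediction changes, so treat them as rough indicators.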