# BCI-I & IDUN Challenge Example Submission

This notebook shows how to load the competition data, train a simple model, and prepare a submission file.

Note that here we use data that has been prepared with `epochs` for each sleep event. Check `01_data_preparation.ipynb` for more details on how the data was prepared and `02_tutorial.ipynb` for how to fit and evaluate your own models.

You need to format your submission file as a `.csv` file **exactly** as shown in this example. If you decide to apply your own preprocessing steps, make sure to save the data in the same format as the original data. In particular, the number and order of epochs must match the data we've provided exactly (which should be the case if you followed the instructions in `01_data_preparation.ipynb`).
The `.csv` file must have two columns: `ID` and `Marker`. `ID` is the ID of the trial (integer, starting from 0) and `Marker` is the predicted label (the string with the marker name).


In [10]:
import numpy as np
from pathlib import Path
import pandas as pd


## Load all data
See `02_tutorial.ipynb` for more details on how to load and use the data. Here we train an SVM classifier on all training data and use it to predict labels for the test data.

In [11]:
data_dir = Path(r'')  # set this to the folder that contains the prepared .npy files

train_subjects = ["S001", "S002", "S003"]
test_subjects = ["S004",]


Xs = [np.load(data_dir / f"{subject}_X.npy") for subject in train_subjects]
Ys = [np.load(data_dir / f"{subject}_Y.npy") for subject in train_subjects]

X_test = np.load(data_dir / f"{test_subjects[0]}_X.npy")
# there are no labels for the test set


X_train = np.concatenate(Xs)
Y_train = np.concatenate(Ys)

print(X_train.shape, Y_train.shape, X_test.shape)


(2542, 7501) (2542,) (448, 7501)


## Fit a model
We fit a simple SVM model with a radial basis function kernel to all training data.
See `02_tutorial.ipynb` for more details on how to fit and evaluate your own models.

In [12]:
from sklearn.svm import SVC
from sklearn.metrics import f1_score


clf = SVC(kernel='rbf', random_state=42, class_weight="balanced")  # class_weight can be None or "balanced"
clf.fit(X_train, Y_train)

train_f1 = f1_score(Y_train, clf.predict(X_train), average='weighted')
print(f"Train F1 score: {train_f1:.4f}")




Train F1 score: 0.2302
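
The F1 score above is computed on the same epochs the classifier was fitted on, so it says little about performance on an unseen subject. One common way to get a more realistic estimate is leave-one-subject-out cross-validation. The following is a minimal sketch of that idea, not the organizers' evaluation procedure: it uses scikit-learn's `LeaveOneGroupOut` on synthetic stand-in data, and the function name `leave_one_subject_out_f1`, the demo arrays, and the labels `"A"`/`"B"` are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC


def leave_one_subject_out_f1(X, y, groups):
    """Fit on all subjects but one and return the weighted F1 per held-out subject."""
    scores = {}
    for train_idx, val_idx in LeaveOneGroupOut().split(X, y, groups):
        model = SVC(kernel="rbf", class_weight="balanced")
        model.fit(X[train_idx], y[train_idx])
        held_out = str(groups[val_idx][0])  # every row in this fold shares one subject
        scores[held_out] = f1_score(
            y[val_idx], model.predict(X[val_idx]), average="weighted"
        )
    return scores


# Synthetic stand-in data (assumption): 3 subjects, 20 epochs each, 20 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 20))
y_demo = np.tile(["A", "B"], 30)  # hypothetical labels, balanced within each subject
groups_demo = np.repeat(["S001", "S002", "S003"], 20)

cv_scores = leave_one_subject_out_f1(X_demo, y_demo, groups_demo)
print(cv_scores)
```

With the real data, a matching `groups` array could be built from the per-subject label arrays, for example `np.concatenate([np.full(len(y), s) for s, y in zip(train_subjects, Ys)])`, and passed together with `X_train` and `Y_train`.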


## Prepare a submission file
We predict the labels for the test set and prepare a submission `.csv` file.
The file must have two columns: `ID` and `Marker`. 
`ID` is the ID of the trial (integer, starting from 0) and `Marker` is the predicted label (the string with the marker name).


In [14]:
Y_test_pred = clf.predict(X_test)
print(Y_test_pred[:5])

['S' 'S' 'S' 'S' 'S']
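
The first five predictions share one label, which on its own does not show whether the classifier predicts other markers elsewhere. A quick check is to compare the share of each label in the training data with its share in the predictions. The sketch below is one hedged way to do this with pandas; the helper name `label_shares` and the toy label `"K"` are hypothetical and only serve the demo.

```python
import pandas as pd


def label_shares(train_labels, predicted_labels):
    """Return the fraction of each label in the training labels and in the predictions."""
    shares = pd.DataFrame({
        "train": pd.Series(train_labels).value_counts(normalize=True),
        "predicted": pd.Series(predicted_labels).value_counts(normalize=True),
    })
    # A label that never appears in one of the two columns shows up as NaN; report it as 0.
    return shares.fillna(0.0)


# Toy example (assumption): "K" stands in for any second marker name.
demo = label_shares(["S", "S", "K", "K"], ["S", "S", "S", "S"])
print(demo)
```

With the variables from this notebook, the call would be `label_shares(Y_train, Y_test_pred)`. A label with a sizeable training share and a predicted share of zero suggests the model has collapsed onto a subset of the classes.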


We organize the predictions in a `DataFrame` and save it as a `.csv` file. After this, you simply need to upload the `.csv` file to Kaggle and see how well you did!

In [15]:
IDs = np.arange(len(Y_test_pred))

df = pd.DataFrame({"ID": IDs, "Marker": Y_test_pred})
df.to_csv("example_submission.csv", index=False)
df.head()  # last expression in the cell, so the preview is displayed
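
Because the submission format has to match exactly, it can help to re-check the file before uploading. The following is a small sketch of such a check, based only on the rules stated in this notebook (two columns `ID` and `Marker`, IDs counting up from 0, one row per test epoch). The function name `check_submission` is an assumption, and the check is not an official validator.

```python
import numpy as np
import pandas as pd


def check_submission(df, n_epochs):
    """Return a list of format problems; an empty list means none were found."""
    if list(df.columns) != ["ID", "Marker"]:
        return [f"columns are {list(df.columns)}, expected ['ID', 'Marker']"]
    problems = []
    if len(df) != n_epochs:
        problems.append(f"found {len(df)} rows, expected {n_epochs}")
    if not np.array_equal(df["ID"].to_numpy(), np.arange(len(df))):
        problems.append("ID must count 0, 1, 2, ... in order")
    if df["Marker"].isna().any():
        problems.append("Marker contains missing values")
    return problems


# Toy examples: one well-formed frame and one whose IDs start at 1.
good = pd.DataFrame({"ID": [0, 1, 2], "Marker": ["S", "S", "S"]})
bad = pd.DataFrame({"ID": [1, 2, 3], "Marker": ["S", "S", "S"]})
print(check_submission(good, n_epochs=3))
print(check_submission(bad, n_epochs=3))
```

For the file written above, the call would be `check_submission(pd.read_csv("example_submission.csv"), len(X_test))`.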

