<h1 class="text-center">BCI - Introduction to EEG classification for a MI BCI</h1>
<h2 class="text-center">February, 2022</h2>

<br>

The purpose of this tutorial is to implement a Motor Imagery BCI, using a public dataset (Cho, H., Ahn, M., Ahn, S., Kwon, M. and Jun, S.C., 2017. EEG datasets for motor imagery brain computer interface. GigaScience.). You will use MNE to load and pre-process the data, and scikit-learn together with MNE for the classification part.

- Section I covers exploratory data analysis and epoching using MNE.
- In Section II, a first classifier is trained based on Common Spatial Patterns and Linear Discriminant Analysis.
- Section III presents some possible improvements of this baseline pipeline.
- In Section IV, the MOABB toolbox is used to implement a more elaborate pipeline.
- Section V introduces another approach based on Riemannian geometry.
- The last section (VI) is an opportunity to improve and explore other pipelines based on what you learnt today.

The code must be completed after each ❓ **Question** ❓: the place to fill in is marked with "HERE" as a comment in the code. Parameters that do not change the course of the story are marked with an "EDIT ME!" comment: you can change them right away or at the end of the section to see the effect.

You can also find some 🔴 HINTS 🔴 with associated links to documentation and useful functions.

First, we define a small hack that hides the very verbose output of some functions.

In [None]:
import os, sys
class HiddenPrints:
    def __enter__(self):
        self._original_stdout = sys.stdout
        sys.stdout = open(os.devnull, 'w')
    def __exit__(self, exc_type, exc_val, exc_tb):
        sys.stdout.close()
        sys.stdout = self._original_stdout

# I - Motor Imagery Dataset: 

**Subjects were asked to imagine the hand movement (left vs right) depending on the instruction given.** 

Five or six runs were performed during the MI experiment. After each run, we calculated the classification accuracy over one run and gave the subject feedback to increase motivation. Between each run, a maximum 4-minute break was given depending on the subject’s demands. (cf [MOABB dataset](http://moabb.neurotechx.com/docs/generated/moabb.datasets.Cho2017.html) or
[gigadb datasets](http://gigadb.org/dataset/100295))


EEG data were collected using 64 Ag/AgCl active electrodes. A 64-channel montage based on the international 10-20 system was used to record the EEG signals at a sampling rate of 512 Hz. The EEG device used in this experiment was the Biosemi ActiveTwo system. The BCI2000 system 3.0.2 was used to collect EEG data and present instructions (left hand or right hand MI). 

### We load the dataset, and plot the sensor locations:

In [None]:
import matplotlib.pyplot as plt
from moabb.datasets import Cho2017

# define and load the dataset
ds = Cho2017()
raws = ds.get_data(subjects=[1,2])
raw = raws[1]['session_0']['run_0']

# show infos
print(raw.info)

# display the montage (sensors on the scalp)
plt.rcParams['figure.dpi'] = 150
raw.plot_sensors(ch_type='eeg',show_names=True, kind='3d')
#plt.show()
raw.plot_sensors(ch_type='eeg',show_names=True)
#plt.show()

### Here we plot the EEG data, that is:
$$ \mathbb{S} = \begin{pmatrix} s_{11} & s_{12} & \ldots & s_{1T}\\
                        s_{21} & s_{22} & \ldots & s_{2T}\\
                        \cdots & & & \cdots\\
                        s_{C1} & s_{C2} & \ldots & s_{CT} \end{pmatrix} $$
with $T$ the number of time points in the considered interval $[t_{min},t_{max}]$, $C$ the number of channels.
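
To make the shape of $\mathbb{S}$ concrete, here is a minimal NumPy sketch. The random values are only a stand-in for real EEG and the interval bounds are hypothetical; with MNE, `raw.get_data()` returns an array with the same (channels, time points) layout.

```python
import numpy as np

sfreq = 512.0                # sampling rate in Hz, as quoted for this dataset
n_channels = 64              # C
t_min, t_max = 15.0, 17.0    # hypothetical 2-second interval (in seconds)

# Sample times covering [t_min, t_max); T is the number of time points
times = np.arange(int(t_min * sfreq), int(t_max * sfreq)) / sfreq

# Stand-in for the recorded signal: one row per channel, one column per time point
rng = np.random.default_rng(0)
S = rng.standard_normal((n_channels, times.size))

print(S.shape)  # (C, T)
```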

### ❓ **Question** ❓ Explore the signal and change the filtering options

In [None]:
plt.rcParams['figure.dpi'] = 150
scal = dict(eeg=1e-3)                      # EDIT ME!
raw.plot(n_channels=64, scalings=scal,
         start=15, duration=2,             # EDIT ME!
         lowpass=200, highpass=5,          # EDIT ME!
         show_scrollbars=False, show_scalebars=False)
plt.show()

### Let's show some of the studied events
#### ❓ **Question** ❓ Explore the signal around the events and change the display options

🔴 HINTS 🔴  
Along with the EEG data, there are some **markers**: triggers that correspond to events. The **markers** are synchronized with the EEG, so they can be superimposed on the signal.

We will use MNE function [`find_events`](https://mne.tools/stable/generated/mne.find_events.html).
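
To give an intuition of what event detection does, the sketch below finds the samples where a synthetic trigger channel switches to a non-zero code. It is a simplified illustration written for this tutorial (the helper `rising_edges` is hypothetical), not MNE's implementation; the three-column layout (sample, previous value, new value) mimics the events array used by MNE.

```python
import numpy as np

def rising_edges(stim):
    """Return an (n_events, 3) array: sample index, previous value, new value."""
    stim = np.asarray(stim, dtype=int)
    # Samples at which the trigger value differs from the previous sample
    change = np.flatnonzero(np.diff(stim) != 0) + 1
    # Keep only the changes that switch to a non-zero event code
    onsets = change[stim[change] != 0]
    return np.column_stack([onsets, stim[onsets - 1], stim[onsets]])

# Synthetic trigger channel: code 1 starts at sample 3, code 2 at sample 12
stim = np.zeros(20, dtype=int)
stim[3:5] = 1
stim[12:15] = 2

events = rising_edges(stim)
print(events)
```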

In [None]:
from mne import find_events

# Get the event (left / right hand) by looking at the "stim" channel.
events = find_events(raw, shortest_event=0, verbose=True)

# Display the EEG signals with the events 
scal = dict(eeg=5e-3)     # EDIT ME!
plt.rcParams['figure.dpi'] = 150
raw.plot(events=events, event_color='red', event_id=ds.event_id,
         scalings=scal, clipping=None, show_scrollbars=False, show_scalebars=False, 
         start=680,       # EDIT ME!
         duration=40,     # EDIT ME!
         n_channels=64)   # EDIT ME!
plt.show()

### Power Spectral Density (PSD)

We will use the Fourier transform of the signal to estimate its power spectral density and study it in the frequency domain.
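
As a minimal sketch of the idea, the NumPy code below computes the power spectrum of a synthetic 10 Hz oscillation buried in noise and locates its peak. The signal and its parameters are assumptions made for illustration; MNE's PSD functions are more elaborate (averaging, normalization), so this only shows the principle.

```python
import numpy as np

sfreq = 512.0
t = np.arange(0, 4.0, 1.0 / sfreq)          # 4 seconds of data
rng = np.random.default_rng(0)
# Synthetic "alpha-like" rhythm at 10 Hz plus white noise
x = np.sin(2 * np.pi * 10.0 * t) + 0.5 * rng.standard_normal(t.size)

# A window limits spectral leakage before taking the FFT
window = np.hanning(x.size)
spectrum = np.fft.rfft(x * window)
freqs = np.fft.rfftfreq(x.size, d=1.0 / sfreq)

# Squared magnitude: proportional to the PSD (no normalization applied here)
power = np.abs(spectrum) ** 2
peak_freq = freqs[np.argmax(power)]
print(peak_freq)
```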

#### ❓ **Question** ❓ Try some other filtering and cropping of the data to see how they affect the PSD

In [None]:
raw_left = raw.copy()

# crop data between tmin and tmax
tmin,tmax=0,600                                                          # EDIT ME!
raw_left.crop(tmin,tmax)                                                     

# filter data
raw_left.filter(7., 30., fir_design='firwin', skip_by_annotation='edge') # EDIT ME!

# power spectral density
raw_left.plot_psd()
#plt.show()

# topomap with power spectral densities
plt.rcParams['figure.dpi'] = 100
raw_left.plot_psd_topo()
#plt.show()

### Let's epoch the data:

The **markers** will now be used to slice data accordingly and select **epochs** of interest.


Each epoch $\mathbb{S}_i$, with $i \in \{1, \ldots, n\}$, corresponds to a time window located at a given **event**.
An epoch will produce a sample to be classified, *i.e.* a row of the matrix
$$ \mathbb{X} = \begin{pmatrix}
f_1(\mathbb{S}_1) & f_2(\mathbb{S}_1) & \ldots & f_d(\mathbb{S}_1)\\
f_1(\mathbb{S}_2) & f_2(\mathbb{S}_2) & \ldots & f_d(\mathbb{S}_2)\\
\cdots & & & \cdots\\
f_1(\mathbb{S}_n) & f_2(\mathbb{S}_n) & \ldots & f_d(\mathbb{S}_n)\\
\end{pmatrix}.$$
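
The NumPy sketch below illustrates the two steps on synthetic data: cutting epochs around event onsets, then turning each epoch into a row of $\mathbb{X}$. The helpers `cut_epochs` and `log_variance` are hypothetical, and the log-variance per channel is only one possible choice for the features $f_j$.

```python
import numpy as np

def cut_epochs(data, onsets, sfreq, tmin, tmax):
    """Slice data (C, T) into epochs (n, C, T_epoch) around each onset sample."""
    start = int(round(tmin * sfreq))
    stop = int(round(tmax * sfreq)) + 1      # include the last sample
    return np.stack([data[:, o + start:o + stop] for o in onsets])

def log_variance(epochs):
    """One feature per channel: log of the signal variance within the epoch."""
    return np.log(epochs.var(axis=2))

sfreq = 512.0
rng = np.random.default_rng(0)
data = rng.standard_normal((8, 20 * 512))    # 8 channels, 20 s of fake signal
onsets = np.array([2, 7, 12]) * 512          # hypothetical event onsets (samples)

epochs_arr = cut_epochs(data, onsets, sfreq, tmin=-1.0, tmax=4.0)
X = log_variance(epochs_arr)                 # one row per epoch
print(epochs_arr.shape, X.shape)
```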


🔴 HINTS 🔴  
- [`Epochs`](https://mne.tools/stable/auto_tutorials/epochs/10_epochs_overview.html)


In [None]:
from mne import Epochs, find_events, pick_types

def load_epoch(raws, subject_nb, event_id, fmin = 7., fmax = 35.):
    """Function to load epoched data for a specified subject"""
    
    # Work on a copy so that repeated calls do not filter the same data twice
    raw = raws[subject_nb]['session_0']['run_0'].copy()

    # Apply band-pass filter
    raw.filter(fmin, fmax, fir_design='firwin', skip_by_annotation='edge')

    # Get the event (left / right hand) by looking at the "stim" channel.
    events = find_events(raw, shortest_event=0, verbose=True)

    picks = pick_types(raw.info, meg=False, eeg=True, stim=False, eog=False,
                       exclude='bads')
    tmin, tmax = -1., 4.
    # Read epochs (train will be done only between 1 and 2s)
    # Testing will be done with a running classifier
    epochs = Epochs(raw, events, event_id, tmin, tmax, proj=True, picks=picks,
                    baseline=None, preload=True)
    labels = epochs.events[:, -1] - 1
    return epochs, labels

In [None]:
ds = Cho2017()
event_id = ds.event_id
raws = ds.get_data(subjects=[1,2])

epochs, labels = load_epoch(raws, 1, event_id)
epoch_train = epochs.copy().crop(tmin=1., tmax=2.)
epochs_data_train = epoch_train.get_data()

### Some plotting for the data epochs

In [None]:
# Show epochs
plt.rcParams['figure.figsize'] = [15, 5]
plt.rcParams['figure.dpi'] = 100
max_sample = 4
max_channel= 8
first_epoch = 98
for s in range(max_sample):
    for c in range(max_channel):
        index = s*max_channel + c + 1
        plt.subplot(max_sample, max_channel, index)
        plt.axis('off')
        plt.plot(epochs_data_train[s+first_epoch,c,:])
        title = f'E{s+first_epoch+1} C{c+1} L={labels[s+first_epoch]}'
        plt.title(title, fontsize=7)
plt.suptitle('EEG Dataset (E=epoch, C=channel, L=label)');
plt.show()

# II - A first classification pipeline: CSP + LDA
#### ❓ **Question** ❓ Assemble a first classification pipeline with:

1. Common Spatial Patterns (feature extractor) 
2. Linear Discriminant Analysis as classifier

$$ \mbox{EEG data} \rightarrow CSP \rightarrow LDA \rightarrow \mbox{prediction}$$ 

🔴 HINTS 🔴
1. [CSP](https://mne.tools/0.23/generated/mne.decoding.CSP.html): `csp = CSP(n_components=4, reg=None, log=True, norm_trace=False)`
2. [LinearDiscriminantAnalysis](https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
3. [Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) $\rightarrow$ reminder about how to use Pipeline in the Titanic BE in Section III.
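
To see what CSP computes, here is a NumPy-only sketch of the two-class case on synthetic data. It is an illustration under simplifying assumptions (no regularization, hypothetical helper names), not the `mne.decoding.CSP` implementation: spatial filters are obtained by whitening the summed class covariances and diagonalizing one class covariance, and the features are the log-variances of the filtered signals.

```python
import numpy as np

def class_cov(epochs):
    """Average spatial covariance of epochs with shape (n, C, T)."""
    return np.mean([np.cov(e) for e in epochs], axis=0)

def fit_csp(epochs, labels, n_components=4):
    """Return spatial filters of shape (n_components, C) for labels in {0, 1}."""
    c0 = class_cov(epochs[labels == 0])
    c1 = class_cov(epochs[labels == 1])
    # Whitening transform of the composite covariance c0 + c1
    evals, evecs = np.linalg.eigh(c0 + c1)
    whitener = (evecs / np.sqrt(evals)).T
    # Eigenvalues of the whitened class-0 covariance lie in [0, 1]
    lam, rot = np.linalg.eigh(whitener @ c0 @ whitener.T)
    # Most discriminative components: eigenvalues farthest from 0.5
    order = np.argsort(np.abs(lam - 0.5))[::-1][:n_components]
    return rot[:, order].T @ whitener

def csp_features(filters, epochs):
    """Log-variance of each spatially filtered signal, shape (n, n_components)."""
    projected = np.einsum('kc,nct->nkt', filters, epochs)
    return np.log(projected.var(axis=2))

# Synthetic data: class 0 is stronger on channel 0, class 1 on channel 1
rng = np.random.default_rng(0)
n_epochs, n_channels, n_times = 40, 6, 200
labels = np.repeat([0, 1], n_epochs // 2)
epochs = rng.standard_normal((n_epochs, n_channels, n_times))
epochs[labels == 0, 0, :] *= 3.0
epochs[labels == 1, 1, :] *= 3.0

filters = fit_csp(epochs, labels, n_components=2)
X = csp_features(filters, epochs)
gap = np.abs(X[labels == 0].mean(axis=0) - X[labels == 1].mean(axis=0))
print(X.shape, gap)
```

The resulting features are low-dimensional and well separated between the two classes, which is why a linear classifier such as LDA is a natural next step.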

In [None]:
import numpy as np
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from mne.decoding import CSP

In [None]:
# Assemble CSP feature extractor
# HERE

In [None]:
# Assemble a classifier: LinearDiscriminantAnalysis
# HERE

In [None]:
# Use scikit-learn Pipeline
# HERE

#### ❓ **Question** ❓: use the sklearn functions [ShuffleSplit](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ShuffleSplit.html#sklearn.model_selection.ShuffleSplit) and [cross_val_score](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html) to evaluate this classifier.

🔴 HINTS 🔴  
- `ShuffleSplit` is used to create multiple train/test splits of the data. 
- `cross_val_score` accepts an external splitter for cross-validation (such as `ShuffleSplit`).
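
To illustrate the principle of repeated random splits, the sketch below is a hand-rolled NumPy version written for this tutorial (the generator `shuffle_split` is hypothetical and the parameter values are arbitrary). It is similar in spirit to scikit-learn's `ShuffleSplit`, but it is not its implementation.

```python
import numpy as np

def shuffle_split(n_samples, n_splits=10, test_size=0.2, seed=42):
    """Yield (train_idx, test_idx) pairs drawn from independent permutations."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_size * n_samples))
    for _ in range(n_splits):
        perm = rng.permutation(n_samples)
        # First n_test shuffled indices form the test set, the rest the train set
        yield perm[n_test:], perm[:n_test]

splits = list(shuffle_split(n_samples=200, n_splits=10, test_size=0.2))
train_idx, test_idx = splits[0]
print(len(splits), train_idx.size, test_idx.size)
```

Each split is drawn independently, so a given epoch can appear in the test set of several splits; the mean score over the splits is what gets reported.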

In [None]:
scores = []
epochs_data = epochs.get_data()

In [None]:
# Define shuffle split strategy
# HERE

In [None]:
# Evaluate the resulting classifier using cross-validation
# HERE

In [None]:
# Printing the results
class_balance = np.mean(labels == labels[0])
class_balance = max(class_balance, 1. - class_balance)
print("\n\nClassification accuracy: %f / Chance level: %f \n\n" % (np.mean(scores),
                                                          class_balance))

### We display now the CSP patterns
The patterns show how strongly each learned CSP component is expressed at each electrode. We have learned 4 filters.

In [None]:
# plot CSP patterns estimated on full data for visualization
with HiddenPrints():
    csp.fit_transform(epochs_data, labels)
    
print(csp.get_params())

#plt.rcParams['axes.grid'] = False
#csp.plot_filters(epochs.info, ch_type='eeg', units='Filters (AU)', size=1.5)
#plt.show()

plt.rcParams['axes.grid'] = False
csp.plot_patterns(epochs.info, ch_type='eeg', units='Patterns (AU)', size=1.5)
plt.show()

### Test the classifier on a sliding window

Here, we would like to evaluate the predictive power of our pipeline on short time windows sliding along the epoch. It should help us determine when, and from how much data, the classification becomes reliable.

#### ❓ **Question** ❓:
- First, compute cv_split using the *split* method of the [ShuffleSplit](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ShuffleSplit.html#sklearn.model_selection.ShuffleSplit) cross-validation ```cv```.
- Then, in the loop on the couples (Training set,Validation set) of cv_split: 
    - compute X_train using the method *fit_transform* of the [Common Spatial Pattern](https://mne.tools/0.23/generated/mne.decoding.CSP.html) ```csp``` defined earlier, on the epochs of the training set ```epochs_data_train[train_idx]``` and the labels ```y_train```.
    - train the [LinearDiscriminantAnalysis](https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html) classifier ```lda``` defined earlier using its method *fit* on the features ```X_train``` and the labels ```y_train```.
    - then, in the loop on the sliding windows, compute X_test using the method *transform* of the [Common Spatial Pattern](https://mne.tools/0.23/generated/mne.decoding.CSP.html) ```csp``` just trained, on the epochs of the testing set ```epochs_data[test_idx][:, :, n:(n + w_length)]```.
    
🔴 HINTS 🔴
- `csp.fit_transform` and `csp.transform`
- `lda.fit`
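
The windowing itself can be sketched with NumPy on a synthetic array. The array sizes below are assumptions chosen to match epochs from -1 s to 4 s at 512 Hz; only the slicing and the window-centre times are shown, not the classification.

```python
import numpy as np

sfreq = 512.0
tmin = -1.0                                   # epoch start relative to the cue (s)
rng = np.random.default_rng(0)
# Fake epochs: (n_epochs, n_channels, n_times) for 5 s of data plus one sample
epochs_arr = rng.standard_normal((10, 4, 2561))

w_length = int(sfreq * 0.5)                   # window length: 0.5 s in samples
w_step = int(sfreq * 0.1)                     # step between windows: 0.1 s
w_start = np.arange(0, epochs_arr.shape[2] - w_length, w_step)

# Every window keeps all epochs and channels, and w_length time points
window_shapes = [epochs_arr[:, :, n:n + w_length].shape for n in w_start]

# Time (in seconds, relative to the cue) of the centre of each window
w_times = (w_start + w_length / 2.0) / sfreq + tmin
print(len(w_start), w_times[0], w_times[-1])
```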

In [None]:
sfreq = raw.info['sfreq']
print(f'Sampling frequency: {sfreq}')

# window length = number of samples during 0.5secs
w_length = int(sfreq * 0.5)   # running classifier: window length 
# window step size = number of samples during 0.1secs
w_step = int(sfreq * 0.1)  # running classifier: window step size
# different starting indices considering a w_length-length window and a w_step-length step
w_start = np.arange(0, epochs_data.shape[2] - w_length, w_step)

In [None]:
scores_windows = []
cv_split = cv.split(epochs_data_train)                                        # HERE



In [None]:
# for each couple Training set / Validation set
for train_idx, test_idx in cv_split:
    y_train, y_test = labels[train_idx], labels[test_idx]

    # fit CSP
    # HERE
    
    # fit LDA
    #HERE

    # running classifier: test classifier on sliding window
    score_this_window = []
    # for each time window
    for n in w_start:
        # compute the CSP components on the time window
        #HERE
        
        score_this_window.append(lda.score(X_test, y_test))
    scores_windows.append(score_this_window)

In [None]:
# Plot scores over time
w_times = (w_start + w_length / 2.) / sfreq + epochs.tmin
plt.plot(w_times, np.mean(scores_windows, 0), label='Score')
plt.axvline(0, linestyle='--', color='k', label='Onset')
plt.axhline(0.5, linestyle='-', color='k', label='Chance')
plt.xlabel('time (s)')
plt.ylabel('classification accuracy')
plt.title('Classification score over time')
plt.legend(loc='lower right')
plt.show()

### Cross-subject test
We will try to apply this learnt model on another participant. 
#### ❓ **Question** ❓: Load the data of the second participant.

🔴 HINTS 🔴  
- `load_epoch`
- `epochs.crop(tmin=.., tmax=..)`
- `epochs.get_data()`

Load data

In [None]:
# HERE

#### ❓ **Question** ❓: Evaluate the accuracy of the classifier $CSP+LDA$ trained on the first participant on the data of this second subject

🔴 HINTS 🔴  
- To slice windows : `epochs_data_s02[:][:, :, n:(n + w_length)]`
- `clf.score(X_test, y_test)` to compute directly accuracy

In [None]:
clf = Pipeline([('CSP', csp), ('LDA', lda)])
with HiddenPrints():
    # Fit classifier
    # HERE

score_this_window = []
# for each time window
for n in w_start:
    with HiddenPrints():
        # Compute cross validation score on the time window
        # HERE

In [None]:
print(score_this_window)
# Plot scores over time
w_times = (w_start + w_length / 2.) / sfreq + epochs_s02.tmin
print(len(w_times))
print(len(score_this_window))
plt.plot(w_times, score_this_window, label='Score')
plt.axvline(0, linestyle='--', color='k', label='Onset')
plt.axhline(0.5, linestyle='-', color='k', label='Chance')
plt.xlabel('time (s)')
plt.ylabel('classification accuracy')
plt.title('Classification score over time')
plt.legend(loc='lower right')
plt.show()

# III - Improve the Brain Computer Interface

### Temporal filtering
As a first approach, we band-pass filtered the data between 7 and 35 Hz. This can probably be improved. 
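
To see what a band-pass filter keeps and removes, here is a deliberately crude NumPy sketch that zeroes the FFT bins outside the band. This "brick-wall" filter is an assumption made for illustration (the helper `fft_bandpass` is hypothetical); it is not the FIR filter that MNE designs with `fir_design='firwin'`.

```python
import numpy as np

def fft_bandpass(x, sfreq, fmin, fmax):
    """Zero every FFT bin outside [fmin, fmax] along the last axis."""
    spectrum = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / sfreq)
    spectrum[..., (freqs < fmin) | (freqs > fmax)] = 0.0
    return np.fft.irfft(spectrum, n=x.shape[-1], axis=-1)

sfreq = 512.0
t = np.arange(0, 2.0, 1.0 / sfreq)
alpha = np.sin(2 * np.pi * 10.0 * t)      # inside the 7-35 Hz band
line = np.sin(2 * np.pi * 50.0 * t)       # outside the band

filtered = fft_bandpass(alpha + line, sfreq, 7.0, 35.0)
print(np.max(np.abs(filtered - alpha)))   # residual after removing the 50 Hz part
```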

#### ❓ **Question** ❓: Find another range (band) that leads to a higher mean accuracy ([cross_val_score](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html))  

New data filtering

In [None]:
# HERE

Data classification

In [None]:
# HERE

# IV - Evaluation with the [MOABB](http://moabb.neurotechx.com/docs/index.html) toolbox
## Advanced temporal filtering: **filterbank**
The **filterbank** idea is to divide and conquer: filter the data in different sub-bands, apply the same pipeline to each sub-band and finally gather the decisions.  
In our previous approach, data were filtered only in one band, the one with the best performance.  

The sub-band design will roughly follow the well-known frequency bands of human brain activity:


![Brain waves](img/brainwaves.png)  
(source: https://www.fitmind.co/blog-collection/brainwaves-in-meditation-brain-wave-frequencies).

We will take advantage again of the pipeline (CSP + LDA) as it seems to perform the best, and apply it with a Filter Bank approach. 

This time, instead of converting the data to NumPy format, we will let [**MOABB**](https://github.com/NeuroTechX/moabb) handle everything and take advantage of its evaluation functions. Therefore, we will use the [`FilterBankLeftRightImagery`](http://moabb.neurotechx.com/docs/generated/moabb.paradigms.FilterBankLeftRightImagery.html) paradigm from MOABB and the [`WithinSessionEvaluation`](http://moabb.neurotechx.com/docs/generated/moabb.evaluations.WithinSessionEvaluation.html) evaluation class.
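
The filterbank principle can be sketched with NumPy alone. The code below makes several assumptions for the sake of illustration: a crude FFT-based band-pass (hypothetical helper `fft_bandpass`), log-variance per channel as a stand-in for the per-band CSP features, and a simple concatenation of the per-band features before classification. It is not MOABB's implementation.

```python
import numpy as np

def fft_bandpass(x, sfreq, fmin, fmax):
    """Zero every FFT bin outside [fmin, fmax] along the last axis."""
    spectrum = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / sfreq)
    spectrum[..., (freqs < fmin) | (freqs > fmax)] = 0.0
    return np.fft.irfft(spectrum, n=x.shape[-1], axis=-1)

def filterbank_features(epochs, sfreq, bands):
    """Apply the same feature extractor in each band and concatenate the results."""
    feats = [np.log(fft_bandpass(epochs, sfreq, lo, hi).var(axis=-1))
             for lo, hi in bands]
    return np.concatenate(feats, axis=1)

bands = [(8, 12), (12, 16), (16, 20), (20, 24)]
rng = np.random.default_rng(0)
epochs_arr = rng.standard_normal((5, 3, 512))    # (n_epochs, n_channels, n_times)

X_fb = filterbank_features(epochs_arr, 512.0, bands)
print(X_fb.shape)                                # n_epochs x (n_bands * n_channels)
```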

In [None]:
from sklearn.pipeline import make_pipeline
from moabb.paradigms import FilterBankLeftRightImagery
from moabb.pipelines.utils import FilterBank
from moabb.evaluations import WithinSessionEvaluation

In [None]:
ds = Cho2017()
ds.subject_list = [1,2] # Use only the two first subjects

In [None]:
pipelines_fb = {}
pipelines_fb["FBCSP+LDA"] = make_pipeline(FilterBank(CSP(n_components=4)), LinearDiscriminantAnalysis())

In [None]:
filters = [[8, 12], [12, 16], [16, 20], [20, 24]]  # HERE
paradigm = FilterBankLeftRightImagery(filters=filters)
evaluation = WithinSessionEvaluation(
    paradigm=paradigm, datasets=ds, overwrite=True)
with HiddenPrints():
    results_fb = evaluation.process(pipelines_fb)
results_fb.head()

A slight improvement in performance! It may be possible to do even better with a filterbank that takes advantage of higher frequencies.

#### ❓ **Question** ❓: Evaluate the previous approach with higher frequencies.

### Cross-subjects
One of the advantages of MOABB is that it makes it possible to evaluate the previous pipeline ```pipelines_fb``` directly in the cross-subject context.

#### ❓ **Question** ❓: Use the MOABB evaluation [`CrossSubjectEvaluation`](http://moabb.neurotechx.com/docs/generated/moabb.evaluations.CrossSubjectEvaluation.html) to compute the scores in the cross-subject settings.

🔴 HINTS 🔴  
- For `evaluation = CrossSubjectEvaluation` you need a `paradigm` and a `dataset`
- Then you can run a pipeline `evaluation.process(pipeline)`
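
A cross-subject evaluation can be pictured as a leave-one-subject-out scheme: train on all subjects but one, test on the held-out subject. The NumPy sketch below only builds the index splits from a hypothetical array of subject identifiers; treating the evaluation as leave-one-subject-out is an assumption made here for illustration.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for each subject in turn."""
    subject_ids = np.asarray(subject_ids)
    for held_out in np.unique(subject_ids):
        test_idx = np.flatnonzero(subject_ids == held_out)
        train_idx = np.flatnonzero(subject_ids != held_out)
        yield held_out, train_idx, test_idx

# Hypothetical layout: 4 epochs from subject 1 followed by 4 from subject 2
subject_ids = np.repeat([1, 2], 4)
folds = list(leave_one_subject_out(subject_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```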

In [None]:
from moabb.evaluations import CrossSubjectEvaluation
# HERE

In [None]:
results_fb.head()

Performance is quite low... This is not surprising: cross-subject classification, along with cross-session classification, is one of the major challenges in BCI!

# V - Another approach: Riemannian geometry

First, make sure you uploaded the pictures *riemann.png*, *riemann_embeding.png* and *brainwaves.png* in the folder of the notebook.

For this Riemannian method, the first step is to compute a covariance matrix for each epoch. The idea is to represent an epoch by its covariance matrix instead of the raw data. It is depicted in the following picture (from P. L. C. Rodrigues, *Exploring invariances of multivariate time series via Riemannian geometry: validation on EEG data*).



![Riemann embeding](img/riemann_embeding.png)  



Then these covariance matrices are projected onto the tangent space of the manifold of SPD (Symmetric Positive-Definite) matrices; the tangent space is a flat, local approximation of the SPD manifold. 



![Riemann](img/riemann.png)



The projection turns each $C \times C$ covariance matrix into a vector of $C(C+1)/2$ coefficients. The vectors are then standardized and classified using a [Random Forest classifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html).

$$ \mbox{EEG data} \rightarrow Covariance\ matrix \rightarrow Projection\ on\ Tangent\ Space \rightarrow Standard\ Scaler \rightarrow Random\ Forest \rightarrow\mbox{prediction}$$ 
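
The first two steps of this chain can be sketched with NumPy on synthetic epochs. This is an illustration under stated assumptions, not pyRiemann's code: the helper names are hypothetical, and the reference point is the arithmetic mean of the covariances, chosen for simplicity where a Riemannian mean would normally be used.

```python
import numpy as np

def _eig_fun(mat, fun):
    """Apply fun to the eigenvalues of a symmetric matrix."""
    evals, evecs = np.linalg.eigh(mat)
    return (evecs * fun(evals)) @ evecs.T

def tangent_vectors(covs, ref):
    """Map SPD matrices (n, C, C) to vectors of length C(C+1)/2 at point ref."""
    inv_sqrt = _eig_fun(ref, lambda v: 1.0 / np.sqrt(v))
    rows, cols = np.triu_indices(ref.shape[0])
    # Off-diagonal terms are weighted by sqrt(2) to preserve the matrix norm
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    out = []
    for cov in covs:
        # Matrix logarithm of the covariance "seen from" the reference point
        log_map = _eig_fun(inv_sqrt @ cov @ inv_sqrt, np.log)
        out.append(weights * log_map[rows, cols])
    return np.array(out)

rng = np.random.default_rng(0)
epochs_arr = rng.standard_normal((12, 4, 300))       # (n_epochs, C, T)
covs = np.array([np.cov(e) for e in epochs_arr])     # one C x C matrix per epoch
ref = covs.mean(axis=0)                              # simple reference point

X_ts = tangent_vectors(covs, ref)
print(X_ts.shape)                                    # (n_epochs, C * (C + 1) / 2)
```

The reference point itself maps to the zero vector, which is one way to check the projection.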

In [None]:
from pyriemann.estimation import Covariances
from pyriemann.tangentspace import TangentSpace
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

epochs, labels = load_epoch(raws, 1, event_id, fmin=5., fmax=50.)
epoch_train = epochs.copy().crop(tmin=1., tmax=2.)

# Convert from MNE object to NumPy nd-array
epochs_data_train = epoch_train.get_data()

# Assemble feature extractor 
cov = Covariances(estimator='scm')
ts = TangentSpace()
ss = StandardScaler()

# Assemble a classifier
rf = RandomForestClassifier()

# Use scikit-learn Pipeline
clf = Pipeline([('cov', cov), ('ts', ts), ('ss', ss), ('rf', rf)])

# Evaluate the resulting classifier using cross-validation
scores = cross_val_score(clf, epochs_data_train, labels, cv=10, n_jobs=1,verbose=False)
print('Mean score:', np.mean(scores))

Quite powerful, even without any tuning! Results can probably be improved with better pre-processing or by testing other covariance matrix estimators. Here we used `"scm"`, which stands for 'Sample Covariance Matrix', the maximum likelihood estimator. 

Some regularization could be considered: `"lwf"` for Ledoit-Wolf, `"oas"` for Oracle Approximating Shrinkage or `"sch"` for Schaefer-Strimmer covariance. Some tuning of the RandomForestClassifier should also be considered. 
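
These estimators share the same principle: shrink the sample covariance towards a simpler, well-conditioned target. The NumPy sketch below uses a scaled identity as target and a shrinkage intensity `alpha` fixed by hand, which is an assumption made for illustration; Ledoit-Wolf and OAS instead estimate the intensity from the data.

```python
import numpy as np

def shrink_covariance(sample_cov, alpha):
    """Blend the sample covariance S with a scaled identity:
    (1 - alpha) * S + alpha * mu * I, with mu the mean of the diagonal of S."""
    n = sample_cov.shape[0]
    mu = np.trace(sample_cov) / n
    return (1.0 - alpha) * sample_cov + alpha * mu * np.eye(n)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 20))         # 16 channels but only 20 samples
scm = np.cov(x)                           # sample covariance matrix
shrunk = shrink_covariance(scm, alpha=0.2)

print(np.linalg.cond(scm), np.linalg.cond(shrunk))
```

With few samples per epoch, the shrunk matrix keeps the same total variance (trace) while being better conditioned than the sample covariance.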

#### ❓ **Question** ❓: Do better !!
🔴 HINTS 🔴  
- [Regularized covariance estimation](https://pyriemann.readthedocs.io/en/latest/generated/pyriemann.utils.covariance.covariances.html#pyriemann.utils.covariance.covariances) 
- [Cross-validation](https://scikit-learn.org/stable/modules/cross_validation.html#computing-cross-validated-metrics) procedure for hyper-parameters selection of the [RandomForest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
- Try another classifier (for instance gradient boosting, with scikit-learn's [GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)).

# VI - Build a better BCI using another pipeline on this Motor Imagery Dataset
#### ❓ **Question** ❓:  Improve one of the previous pipelines:
- select parameters using cross-validation (e.g. n_components of CSP),
- try other regularizations of the covariance matrices,
- use other data preprocessing, filters.

#### ❓ **Question** ❓: Try other classifiers, other features