# Welcome to the overview of the medusa.bci.erp_spellers module!

This module contains high-level classes and functions specifically designed
for ERP-based spellers. This notebook covers the main features, functions
and classes of the module through illustrative examples that show the power
of the included tools.

In this notebook you will learn:
    - What an ERP-based speller is
    - How to download an open ERP-based speller dataset and explore its files
    - How to create an instance of ERPSpellerDataset
    - Which feature extraction and decoding functions the module includes
    - How to implement an asynchronous ERP-based speller using the included models

Do not forget to check the documentation if you do not understand something!

## Introduction

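ERP-based spellers are brain-computer interfaces (BCIs) that decode the user's
intentions by detecting the event-related potentials (ERPs) elicited in the EEG
when the user attends to the desired stimulus.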


## Imports

Import the modules that will be used in this notebook.

In [13]:
# General imports
import glob
from tabulate import tabulate
import numpy as np

# Medusa imports
from medusa import components
from medusa import meeg
from medusa.bci import erp_spellers

## Download the dataset

As strong supporters of open science, we have released and adapted some
valuable datasets that can be very useful for researchers and practitioners.
These datasets can be downloaded manually from www.medusa.com/datasets/ or
using a simple API. In this case, we will use the API. Run the following cell
to download the GIB-UVa ERP dataset [1].

Each file stores an instance of medusa.components.Recording. This class
contains information about the performed experiment and the recorded
biosignals. In this case, the recordings contain an instance of
medusa.components.ERPSpellerData, which is the default class for this
experiment. Additionally, all recordings contain a medusa.meeg.EEG instance.

In [14]:
# TODO: Download dataset
# dataset_folder = os.getcwd()

## ERPSpellerDataset class

This class defines an ERP-based speller dataset: it holds all the information
required by the functions and classes of the module, and it checks for common
errors.

First, we have to define the channel set of the dataset. The signals added
to the dataset will be adapted to this channel set, discarding the remaining
EEG channels and reordering the rest if necessary. This avoids errors in
heterogeneous datasets and, believe me, saves tons of time otherwise wasted
debugging machine learning algorithms. In this case, we will use 4 EEG
channels: Fz, Cz, Pz, and Oz.
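To make the idea concrete, here is a minimal NumPy sketch of adapting a recording to a channel set. It is not Medusa's implementation: `adapt_to_channel_set` is a hypothetical helper, and it assumes signals are stored as arrays of shape (n_samples, n_channels) whose columns follow the recording's channel labels.

```python
import numpy as np

def adapt_to_channel_set(signal, rec_labels, target_labels):
    """Keep only the channels in target_labels, in that order.

    signal: array of shape (n_samples, n_channels); its columns follow rec_labels.
    Raises ValueError if a required channel is missing from the recording.
    """
    missing = [ch for ch in target_labels if ch not in rec_labels]
    if missing:
        raise ValueError('Recording lacks channels: %s' % missing)
    # Column index of each target channel in the recording
    idx = [rec_labels.index(ch) for ch in target_labels]
    return signal[:, idx]

# Toy recording: 5 channels (including P3) in a different order
rec_labels = ['Oz', 'Cz', 'P3', 'Fz', 'Pz']
signal = np.arange(10).reshape(2, 5)  # 2 samples x 5 channels
adapted = adapt_to_channel_set(signal, rec_labels, ['Fz', 'Cz', 'Pz', 'Oz'])
print(adapted)  # P3 is discarded and the columns follow Fz, Cz, Pz, Oz
```

Raising an error for missing channels, rather than silently filling them, is what surfaces inconsistent recordings early instead of during model debugging.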

We also have to define other parameters. The sample rate of the recordings
is set to 256 Hz (if a file has a different sample rate, an error will be
thrown). We also define the keys used to find the target biosignal (EEG) and
experiment data (ERPSpellerData) attributes in the recording class, as well as
the experiment mode, which is set to train because we will use this dataset to
train a model later. Check the documentation to understand the details of these
and other parameters.
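The role of the attribute keys and the sample rate check can be sketched with toy classes. `ToyEEG`, `ToyRecording` and `get_checked_biosignal` are hypothetical stand-ins for illustration, not Medusa classes; they only assume that a recording exposes its biosignal and experiment data as attributes named by the keys.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class ToyEEG:
    signal: np.ndarray
    fs: float

@dataclass
class ToyRecording:
    eeg: ToyEEG
    erpspellerdata: dict

def get_checked_biosignal(rec, biosignal_att_key, experiment_att_key, fs):
    """Look up both attributes by name and reject mismatched sample rates."""
    biosignal = getattr(rec, biosignal_att_key)
    experiment = getattr(rec, experiment_att_key)
    if biosignal.fs != fs:
        raise ValueError('Expected fs=%s Hz, got %s Hz' % (fs, biosignal.fs))
    return biosignal, experiment

rec = ToyRecording(eeg=ToyEEG(np.zeros((256, 4)), fs=256),
                   erpspellerdata={'mode': 'train'})
eeg, exp_data = get_checked_biosignal(rec, 'eeg', 'erpspellerdata', fs=256)
```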

In [15]:
cha_set = meeg.EEGChannelSet()
cha_set.set_standard_channels(l_cha=['Fz', 'Cz', 'Pz', 'Oz'])
dataset = erp_spellers.ERPSpellerDataset(channel_set=cha_set,
                                         fs=256,
                                         biosignal_att_key='eeg',
                                         experiment_att_key='erpspellerdata',
                                         experiment_mode='train')

## Add recordings to the dataset

Now, we have to add the recordings to the dataset. To do so, we list the
downloaded files and pass them to the add_recordings method of our dataset.
Note that this method accepts instances of medusa.components.Recording or a
list of paths. For convenience, we will use the second option in this case.

In [16]:
folder = 'data'
file_pattern = '*.rcp.bson'
files = glob.glob('%s/%s' % (folder, file_pattern))
dataset.add_recordings(files)

## Explore some functions and classes

Once we have defined our dataset, we can start to play! First, we extract
ERP features using the class StandardFeatureExtraction with default parameters.
This class extracts the EEG epochs that follow each stimulus onset.
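Epoch extraction can be sketched as slicing a fixed window after each onset. This is a simplified, hypothetical `extract_epochs` helper: the (0, 1000) ms window is an assumption rather than StandardFeatureExtraction's documented default, and steps such as baseline correction, normalization or decimation that a real feature extractor may apply are omitted.

```python
import numpy as np

def extract_epochs(signal, onsets, fs, w_epoch_t=(0, 1000)):
    """Cut one epoch per stimulus onset.

    signal: array of shape (n_samples, n_channels).
    onsets: stimulus onset times in seconds, relative to the signal start.
    w_epoch_t: epoch window relative to each onset, in ms (assumed values).
    Returns an array of shape (n_epochs, n_epoch_samples, n_channels).
    """
    start = int(round(w_epoch_t[0] * fs / 1000))
    stop = int(round(w_epoch_t[1] * fs / 1000))
    epochs = []
    for t in onsets:
        onset = int(round(t * fs))
        lo, hi = onset + start, onset + stop
        if lo < 0 or hi > signal.shape[0]:
            raise ValueError('Epoch window exceeds the signal limits')
        epochs.append(signal[lo:hi])
    return np.stack(epochs)

fs = 256
signal = np.arange(10 * fs * 4).reshape(10 * fs, 4)  # 10 s of a toy 4-channel signal
epochs = extract_epochs(signal, onsets=[1.0, 2.5, 4.0], fs=fs)
print(epochs.shape)  # → (3, 256, 4)
```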

Afterwards, we explore some decoding functions of the module, which translate
epoch classification scores into commands.
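For the classic row-column paradigm, translating epoch scores into a command can be sketched as follows: average the scores of the epochs in which each row and column was highlighted, score each cell as the mean of its row and column averages, and select the highest-scoring cell. This sketch assumes a single 6x6 matrix, one group per row or column, and plain averaging; decode_commands handles more general configurations (matrices, levels, units, groups and batches), and its exact scoring rule may differ. `decode_row_col` and `decode_per_seq` are hypothetical helpers.

```python
import numpy as np

N_ROWS, N_COLS = 6, 6  # assumed 6x6 row-column matrix

def decode_row_col(scores, flashed_groups, n_rows=N_ROWS, n_cols=N_COLS):
    """Select a command from epoch scores in a row-column paradigm.

    scores: (n_epochs,) classifier scores (higher means more target-like).
    flashed_groups: (n_epochs,) id of the flashed group; ids 0..n_rows-1 are
        rows and n_rows..n_rows+n_cols-1 are columns.
    Returns (row, col, cmd_scores).
    """
    group_scores = np.array([scores[flashed_groups == g].mean()
                             for g in range(n_rows + n_cols)])
    # Each cell scores the mean of its row and column averages
    cmd_scores = (group_scores[:n_rows, None] + group_scores[None, n_rows:]) / 2
    row, col = np.unravel_index(np.argmax(cmd_scores), cmd_scores.shape)
    return int(row), int(col), cmd_scores

def decode_per_seq(scores, flashed_groups, seq_idx):
    """Decode using only the first k sequences, for k = 1..n_seq."""
    n_seq = int(seq_idx.max()) + 1
    return [decode_row_col(scores[seq_idx < k], flashed_groups[seq_idx < k])[:2]
            for k in range(1, n_seq + 1)]

# Toy trial: 3 sequences, each flashing every row and column once
n_seq, n_groups = 3, N_ROWS + N_COLS
flashed_groups = np.tile(np.arange(n_groups), n_seq)
seq_idx = np.repeat(np.arange(n_seq), n_groups)
target_row, target_col = 2, 4
# Use the ERP labels themselves as scores (1 = target epoch, 0 = non-target)
scores = np.isin(flashed_groups, [target_row, N_ROWS + target_col]).astype(float)
row, col, _ = decode_row_col(scores, flashed_groups)
print(row, col)                                        # → 2 4
print(decode_per_seq(scores, flashed_groups, seq_idx))  # → [(2, 4), (2, 4), (2, 4)]
```

Restricting the epochs to the first k sequences, as `decode_per_seq` does, is what makes it possible to study accuracy as a function of the number of stimulation sequences.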

In [18]:
# Extract ERP features of the dataset using the transform_dataset of
# StandardFeatureExtraction.
feat_extractor = erp_spellers.StandardFeatureExtraction()
x, x_info = feat_extractor.transform_dataset(dataset)

# Print some info of the extracted features
data_exploration = [
    ['Runs', np.unique(x_info['run_idx']).shape[0]],
    ['Epochs', x.shape[0]],
    ['Target', np.sum(x_info['erp_labels']==1)],
    ['Non-target', np.sum(x_info['erp_labels']==0)]
]
print('\nData exploration: \n')
print(tabulate(data_exploration))

# Decode commands using the ERP labels as epoch scores
selected_commands, selected_commands_per_seq, cmd_scores = \
    erp_spellers.decode_commands(x_info['erp_labels'],
                                 x_info['paradigm_conf'],
                                 x_info['run_idx'],
                                 x_info['trial_idx'],
                                 x_info['matrix_idx'],
                                 x_info['level_idx'],
                                 x_info['unit_idx'],
                                 x_info['sequence_idx'],
                                 x_info['group_idx'],
                                 x_info['batch_idx'])

# With the ERP labels as scores the accuracy would be 100%, so we introduce
# an error in trial 0 to check that everything works fine
selected_commands[0][0][0][1] = 2
cmd_acc = erp_spellers.command_decoding_accuracy(
    selected_commands,
    x_info['spell_target']
)
print('\nCommand decoding accuracy:\n')
print('All sequences: %.2f %%' % (cmd_acc * 100))

# Compute the accuracy per number of stimulation sequences. Again, this would
# be 100% with the ERP labels as scores, so we introduce an error in trial 0,
# sequence index 14 (the 15th sequence), to check that everything works fine
selected_commands_per_seq[0][0][0][14][1] = 2
cmd_acc_per_seq = erp_spellers.command_decoding_accuracy_per_seq(
    selected_commands_per_seq,
    x_info['spell_target']
)
# tabulate expects a list of rows, so the single row is wrapped in a list
table_cmd_acc_per_seq = ['Command decoding accuracy']
table_cmd_acc_per_seq += cmd_acc_per_seq.tolist()
headers = [''] + list(range(1, len(cmd_acc_per_seq) + 1))
print(tabulate([table_cmd_acc_per_seq], headers=headers))


Data exploration: 

----------  ----
Runs           8
Epochs      8640
Target      1440
Non-target  7200
----------  ----

Command decoding accuracy:

All sequences: 97.92 %
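As a sanity check on the output above, the reported accuracy can be reproduced arithmetically. Assuming 12 stimuli per sequence (6 rows and 6 columns, 2 of them targets) and 15 sequences per trial, which is consistent with the 15 sequence headers in the cell and the printed epoch counts (48 trials × 15 sequences × 2 targets = 1440 target epochs), the dataset holds 48 trials, and a single injected error yields 47/48 ≈ 97.92 %. These paradigm parameters are inferred, not read from the dataset.

```python
import numpy as np

# Assumed paradigm parameters, inferred from the printed epoch counts
n_epochs = 8640
n_stimuli_per_seq = 12   # 6 rows + 6 columns
n_seq_per_trial = 15

n_trials = n_epochs // (n_stimuli_per_seq * n_seq_per_trial)

# Command decoding accuracy: fraction of trials whose selection matches the target
targets = np.zeros(n_trials, dtype=int)   # placeholder target command ids
selected = targets.copy()
selected[0] = 2                            # one injected error, as in the cell above
acc = np.mean(selected == targets)
print('Trials: %d, accuracy: %.2f %%' % (n_trials, acc * 100))
# → Trials: 48, accuracy: 97.92 %
```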



## Command decoding model

Although the previous functions are really powerful, and you can easily build
your own EEG processing framework for ERP-based spellers with them, there are
several built-in classes that combine them all into easy-to-use models, ready
to be used in your projects.

In this tutorial, we will use some of the most powerful processing algorithms
designed to date for each stage of the pipeline. Note that the models implement
all the stages of the signal processing pipeline using the functions available
in Medusa: (1) preprocessing, (2) feature extraction, (3) feature selection,
(4) feature classification, and (5) command decoding. Specifically, we will use
EEG-Inception, a deep convolutional neural network designed for BCI
applications. Check reference [1] for all the details of this model.
To continue with the tutorial, we recommend a Python environment with the GPU
version of TensorFlow (v2.0+) to reduce training time. If you don't have access
to such an environment, don't worry: training will still take only a few minutes.

In [None]:
# Instantiate the command decoding model based on EEG-Inception
cmd_model = erp_spellers.CMDModelEEGInception()
# Train the model
fit_results = cmd_model.fit_dataset(dataset)

print('Train command decoding accuracy per sequence: %s' %
      str(fit_results['spell_acc_per_seq']))

## Control state detection model

There is one limitation that the previous model cannot overcome. ERP-based
spellers are inherently synchronous systems: they always make a selection, even
when the user is not attending to the stimuli. This behaviour is not suitable
for real applications, where asynchronous operation is required. Can you
imagine a web browser controlled by an ERP-based speller in which you cannot
read the web page you searched for because the system keeps making selections?
Despite this limitation of synchronous systems, most approaches to date do not
address this issue. To achieve asynchronous operation, we need 2 types of models:

    - Command decoding models (CMD): these models are in charge of the
      command decoding task, detecting the ERPs within the EEG epochs to
      decode the user's intentions. The previous model solves this task.
    - Control state detection models (CSD): these models are in charge of the
      control state detection task, which detects whether users are attending
      to the stimuli or are engaged in other activities.
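Conceptually, the two models are combined at decision time: the CSD model decides whether a selection should be made at all, and only then is the CMD model's choice used. The following sketch is hypothetical: the `asynchronous_selection` function and the fixed 0.5 probability threshold are illustrative assumptions, not the decision rule used by Medusa or by the methods in references [2] to [4].

```python
import numpy as np

def asynchronous_selection(cmd_scores, control_prob, threshold=0.5):
    """Return the index of the selected command, or None in non-control state.

    cmd_scores: score per command from a CMD model (higher is better).
    control_prob: probability, from a CSD model, that the user is attending.
    threshold: assumed decision threshold on control_prob.
    """
    if control_prob < threshold:
        return None  # user not attending: make no selection
    return int(np.argmax(cmd_scores))

scores = np.array([0.1, 0.8, 0.3])
print(asynchronous_selection(scores, control_prob=0.9))  # → 1
print(asynchronous_selection(scores, control_prob=0.2))  # → None
```

Gating the CMD output this way means the speller stays silent while the user reads or rests, which is exactly the behaviour the web browser example calls for.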

It is important to understand that both models are necessary to build the
complete signal processing pipeline of an ERP-based speller. Researchers have
traditionally focused on the first type. Nevertheless, an ERP-based speller is
useless if it cannot detect whether the user is controlling the BCI application
or doing something else! Keep that in mind when designing real-life
applications. In recent years, we have worked on this issue with considerable
success. To learn more about this problem and our work in the field, check
references [2], [3] and [4].

In this tutorial, we will use the class
medusa.bci.erp_spellers.CSDModelEEGInception to achieve an asynchronous
ERP-based speller.

In [None]:
# Instantiate the control state detection model based on EEG-Inception
csd_model = erp_spellers.CSDModelEEGInception()
# Train the model
csd_fit_results = csd_model.fit_dataset(dataset)

print('Train control state accuracy per sequence: %s' %
      str(csd_fit_results['control_state_acc_per_seq']))

## Conclusion

That's all for now! You now have a comprehensive picture of the functions and
classes included in the module. As you can see, you can build the full signal
processing pipeline of an ERP-based speller in a few lines of code using Medusa!

See you in the next tutorial.

## References

Check the following references for extended information about some of the
aspects of this tutorial:

1. Santamaría-Vázquez, E., Martínez-Cagigal, V., Vaquerizo-Villar, F., &
Hornero, R. (2020). EEG-Inception: A Novel Deep Convolutional Neural Network for
Assistive ERP-based Brain-Computer Interfaces. IEEE Transactions on Neural
Systems and Rehabilitation Engineering.
2. Santamaría-Vázquez, E., Martínez-Cagigal, V., Gomez-Pilar, J., & Hornero,
R. (2019). Asynchronous Control of ERP-Based BCI Spellers Using Steady-State
Visual Evoked Potentials Elicited by Peripheral Stimuli. IEEE Transactions on
Neural Systems and Rehabilitation Engineering, 27(9), 1883-1892.
3. Martínez-Cagigal, V., Santamaría-Vázquez, E., & Hornero, R. (2019). 
Asynchronous control of P300-based brain–computer interfaces using sample 
entropy. Entropy, 21(3), 230.
4. XXX
