(prior-work)=
# Prior Work

```{admonition} EEG deep learning studies prior to 2017 offered only limited comparisons to feature-based baselines
* Few studies had an external baseline
* Decoding problems varied widely
* It remained unclear how deep learning approaches compare to well-tuned feature-based approaches
```

| Study                                                                                                                              | Decoding problem                                                                                                                     | External baseline                                                               |
|:-----------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------|
| Single-trial EEG classification of motor imagery using deep convolutional neural networks, {cite}`tang_single-trial_2017`           | Imagined movement classes, within-subject                                                                                            | FBCSP                                                                           |
| EEGNet: A Compact Convolutional Network for EEG-based Brain-Computer Interfaces, {cite}`lawhern_eegnet:_2016`                      | Oddball response (RSVP), error response (ERN), movement classes (voluntarily started and imagined)                                   |                                                                                 |
| Remembered or Forgotten? - An EEG-Based Computational Prediction Approach, {cite}`sun_remembered_2016`                             | Memory performance, within-subject                                                                                                   |                                                                                 |
| Multimodal Neural Network for Rapid Serial Visual Presentation Brain Computer Interface, {cite}`manor_multimodal_2016`             | Oddball response using RSVP and image (combined image-EEG decoding), within-subject                                                  |                                                                                 |
| A novel deep learning approach for classification of EEG motor imagery signals, {cite}`tabar_novel_2017`                           | Imagined and executed movement classes, within-subject                                                                               | FBCSP, Twin SVM, DDFBS, Bi-spectrum, RQNN                                       |
| Predicting Seizures from Electroencephalography Recordings: A Knowledge Transfer Strategy, {cite}`liang_predicting_2016`           | Seizure prediction, within-subject                                                                                                   |                                                                                 |
| EEG-based prediction of driver's cognitive performance by deep convolutional neural network, {cite}`hajinoroozi_eeg-based_2016`    | Driver performance, within- and cross-subject                                                                                        |                                                                                 |
| Deep learning for epileptic intracranial EEG data, {cite}`antoniades_deep_2016`                                                    | Epileptic discharges, cross-subject                                                                                                  |                                                                                 |
| Learning Robust Features using Deep Learning for Automatic Seizure Detection, {cite}`thodoroff_learning_2016`                      | Start of epileptic seizure, within- and cross-subject                                                                                | Hand crafted features + SVM                                                     |
| Single-trial EEG RSVP classification using convolutional neural networks, {cite}`george_single-trial_2016`                         | Oddball response (RSVP), groupwise (ConvNet trained on all subjects)                                                                 |                                                                                 |
| Wearable seizure detection using convolutional neural networks with transfer learning, {cite}`page_wearable_2016`                  | Seizure detection, cross-subject, within-subject, groupwise                                                                          | Multiple: spectral features, higher order statistics + linear-SVM, RBF-SVM, ... |
| Learning Representations from EEG with Deep Recurrent-Convolutional Neural Networks, {cite}`bashivan_learning_2016`                | Cognitive load (number of characters to memorize), cross-subject                                                                     |                                                                                 |
| Deep Feature Learning for EEG Recordings, {cite}`stober_learning_2016`                                                             | Type of music rhythm, groupwise (ensembles of leave-one-subject-out trained models, evaluated on separate test set of same subjects) |                                                                                 |
| Convolutional Neural Network for Multi-Category Rapid Serial Visual Presentation BCI, {cite}`manor_convolutional_2015`             | Oddball response (RSVP), within-subject                                                                                              |                                                                                 |
| Parallel Convolutional-Linear Neural Network for Motor Imagery Classification, {cite}`sakhavi_parallel_2015`                       | Imagined movement classes, within-subject                                                                                            |                                                                                 |
| Using Convolutional Neural Networks to Recognize Rhythm Stimuli from Electroencephalography Recordings, {cite}`stober_using_2014`  | Type of music rhythm, within-subject                                                                                                 |                                                                                 |
| Convolutional deep belief networks for feature extraction of EEG signal, {cite}`ren_convolutional_2014`                            | Imagined movement classes, within-subject                                                                                            |                                                                                 |
| Deep feature learning using target priors with applications in ECoG signal decoding for BCI, {cite}`wang_deep_2013`                | Finger flexion trajectory (regression), within-subject                                                                               |                                                                                 |
| Convolutional neural networks for P300 detection with application to brain-computer interfaces, {cite}`cecotti_convolutional_2011` | Oddball / attention response using P300 speller, within-subject                                                                      | Multiple: Linear SVM, gradient boosting, E-SVM, S-SVM, mLVQ, LDA, ...           |

% rework table into just numbers per task for overlapping tasks; mention the remaining ones separately
% in our work, we therefore looked most closely at movement-related decoding as most common
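
To put a number on "few", the external-baseline column of the table above can be tallied directly. The snippet below is a minimal sketch: the dictionary is a hand transcription of that column keyed by citation key, and the name `external_baseline` is introduced here only for illustration.

```python
# Transcribed from the table above: citation key -> external baseline.
# An empty string means the table lists no external baseline for that study.
external_baseline = {
    "tang_single-trial_2017": "FBCSP",
    "lawhern_eegnet:_2016": "",
    "sun_remembered_2016": "",
    "manor_multimodal_2016": "",
    "tabar_novel_2017": "FBCSP, Twin SVM, DDFBS, Bi-spectrum, RQNN",
    "liang_predicting_2016": "",
    "hajinoroozi_eeg-based_2016": "",
    "antoniades_deep_2016": "",
    "thodoroff_learning_2016": "Hand crafted features + SVM",
    "george_single-trial_2016": "",
    "page_wearable_2016": "Multiple: spectral features, higher order statistics + linear-SVM, RBF-SVM, ...",
    "bashivan_learning_2016": "",
    "stober_learning_2016": "",
    "manor_convolutional_2015": "",
    "sakhavi_parallel_2015": "",
    "stober_using_2014": "",
    "ren_convolutional_2014": "",
    "wang_deep_2013": "",
    "cecotti_convolutional_2011": "Multiple: Linear SVM, gradient boosting, E-SVM, S-SVM, mLVQ, LDA, ...",
}

n_studies = len(external_baseline)
with_baseline = [key for key, baseline in external_baseline.items() if baseline]

print(f"{len(with_baseline)} of {n_studies} studies list an external baseline")
for key in with_baseline:
    print(f"  {key}: {external_baseline[key]}")
```

Under this transcription, 5 of the 19 listed studies report an external baseline.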

```{admonition}  Most investigated network architectures were fairly shallow (below 4 layers)
* Unlike architectures in computer vision, most EEG DL architectures had only 1-3 convolutional layers
* Unlike architectures in computer vision, many architectures used several dense layers
```

In [None]:
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
import seaborn
import numpy as np
import re
from myst_nb import glue
seaborn.set_palette('colorblind')
seaborn.set_style('darkgrid')

%matplotlib inline
%config InlineBackend.figure_format = 'png'
#matplotlib.rcParams['figure.figsize'] = (12.0, 1.0)
matplotlib.rcParams['font.size'] = 14

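# Layer counts per architecture, written as 'convolutional/dense'.
# A range such as '3–7' means several depths were tried; remarks follow in parentheses.
ls = np.array([' 2/2 ', ' 3/1 ', ' 2/2 ', ' 3/2 ', ' 1/1 ', ' 1/2 ', ' 1/3 ',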
       ' 1–2/2 ', ' 3/1 (+ LSTM as postprocessor) ', ' 4/3 ', ' 1-3/1-3 ',
       ' 3–7/2 (+ LSTM or other temporal post-processing (see design choices)) ',
       ' 2/1 ', ' 3/3 (Spatio-temporal regularization) ',
       ' 2/2 (Final fully connected layer uses concatenated output by convolutional and fully connected layers) ',
       ' 1-2/1 ',
       '2/0 (Convolutional deep belief network, separately trained RBF-SVM classifier) ',
       ' 3/1 (Convolutional layers trained as convolutional stacked autoencoder with target prior) ',
       ' 2/2 '])

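# Split each entry into its convolutional part (before '/') and dense part (after '/').
conv_ls = [l.split('/')[0] for l in ls]
# A part is either a single number or a range like '3–7' (en dash or hyphen).
low_conv_ls = [int(re.split(r'[–-]', c)[0]) for c in conv_ls]
high_conv_ls = [int(re.split(r'[–-]', c)[-1]) for c in conv_ls]
dense_ls = [l.split('/')[1] for l in ls]
# Only the first 8 characters are parsed so that hyphens in the trailing
# remarks (e.g. 'post-processing') are not mistaken for a range.
low_dense_ls = [int(re.split(r'[–-]', c[:8])[0][:2]) for c in dense_ls]
high_dense_ls = [int(re.split(r'[–-]', c[:8])[-1][:2]) for c in dense_ls]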

all_conv_ls = np.concatenate([np.arange(low_c, high_c+1) for low_c, high_c in zip(low_conv_ls, high_conv_ls)])
all_dense_ls = np.concatenate([np.arange(low_c, high_c+1) for low_c, high_c in zip(low_dense_ls, high_dense_ls)])
bincount_conv = np.bincount(all_conv_ls)
bincount_dense = np.bincount(all_dense_ls)
rng = np.random.RandomState(98349384)
color = 'grey'
fig = plt.figure(figsize=(8,4))
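# One small marker per convolutional layer count tried by an architecture,
# jittered around x=0; a dotted line links counts tried within the same study.
for low_c, high_c in zip(low_conv_ls, high_conv_ls):
    offset = rng.randn(1) * 0.1
    tried_cs = np.arange(low_c, high_c+1)
    plt.plot([offset,] * len(tried_cs), tried_cs, marker='o', alpha=0.5, color=color, ls=':')

# Larger markers at x=0.4 scale with how many architectures used each layer count.
for i_c, n_c in enumerate(bincount_conv):
    plt.scatter(0.4, i_c, color=color, s=n_c*40)
    plt.text(0.535, i_c, str(n_c) + "x", ha='left', va='center')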

# Same plot for the dense layers, shifted to x=1 (small markers) and x=1.4 (totals).
for low_c, high_c in zip(low_dense_ls, high_dense_ls):
    offset = 1 + rng.randn(1) * 0.1
    tried_cs = np.arange(low_c, high_c+1)
    plt.plot([offset,] * len(tried_cs), tried_cs, marker='o', alpha=0.5, color=color, ls=':')

for i_c, n_c in enumerate(bincount_dense):
    plt.scatter(1.4, i_c, color=color, s=n_c*40)
    plt.text(1.535, i_c, str(n_c) + "x", ha='left', va='center')

plt.xlim(-0.5,2)
plt.xlabel("Type of layer")
plt.ylabel("Number of layers")
plt.xticks([0,1], ["Convolutional", "Dense"], rotation=45)
plt.yticks([1,2,3,4,5,6,7]);
plt.title("Number of layers in prior works' architectures", y=1.05)
glue('layernum_fig', fig)
plt.close(fig)
None

```{glue:figure} layernum_fig


*Number of layers in prior work*. Small grey markers represent individual architectures. Dotted lines connect the different numbers of layers investigated in a single study (e.g., one study investigated 3-7 convolutional layers). Larger grey markers indicate how often each layer count occurs over all studies (e.g., 9 architectures used 2 convolutional layers). Note that most architectures use only 1-3 convolutional layers.
```
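
The per-layer totals in the figure come from expanding each `conv/dense` entry into the layer counts it covers and counting how often each one occurs. The following is a minimal sketch of that step on four of the entries, not the code that produced the figure: the helpers `parse_layer_range` and `expand` are hypothetical names, and a regular expression reads only the leading number or range so that hyphens in the trailing remarks are ignored.

```python
import re
from collections import Counter


def parse_layer_range(part):
    """Return (low, high) from the leading number or range of one spec part."""
    match = re.match(r"\s*(\d+)(?:\s*[–-]\s*(\d+))?", part)
    if match is None:
        raise ValueError(f"no layer count found in {part!r}")
    low = int(match.group(1))
    # A single number is treated as a range of length one.
    high = int(match.group(2)) if match.group(2) else low
    return low, high


def expand(part):
    """List every layer count covered by one spec part, e.g. '3–7' -> 3, 4, 5, 6, 7."""
    low, high = parse_layer_range(part)
    return list(range(low, high + 1))


# Four of the entries from the plotting cell, in the same 'conv/dense' format.
specs = [
    " 2/2 ",
    " 1–2/2 ",
    " 3–7/2 (+ LSTM or other temporal post-processing (see design choices)) ",
    "2/0 (Convolutional deep belief network, separately trained RBF-SVM classifier) ",
]

conv_counts, dense_counts = Counter(), Counter()
for spec in specs:
    conv_part, dense_part = spec.split("/", 1)
    conv_counts.update(expand(conv_part))
    dense_counts.update(expand(dense_part))

print("convolutional:", sorted(conv_counts.items()))
print("dense:", sorted(dense_counts.items()))
```

Applied to all 19 entries, the same counting gives the "9x" at 2 convolutional layers mentioned in the caption.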
%:figclass: margin-caption

```{admonition}  Prior work varied widely in which design choices and training strategies were compared
* Six studies did not compare any design choices or training strategies
* The most common comparison was between different kernel sizes
* Only one study evaluated a wider range of hyperparameters for both design choices and training strategies
```
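
The tallies in the box above can be checked against the table that follows. The snippet is a sketch based on a hand transcription of which cells are filled. The set names are introduced here for illustration, and the total of 19 is the number of rows in the first table of this section.

```python
n_studies_total = 19  # rows in the first table of this section

# Citation keys with a non-empty "Design choices" cell in the table below.
design_choices = {
    "lawhern_eegnet:_2016", "tabar_novel_2017", "hajinoroozi_eeg-based_2016",
    "antoniades_deep_2016", "bashivan_learning_2016", "stober_learning_2016",
    "sakhavi_parallel_2015", "stober_using_2014", "wang_deep_2013",
    "cecotti_convolutional_2011",
}
# Citation keys with a non-empty "Training strategies" cell.
training_strategies = {
    "sun_remembered_2016", "liang_predicting_2016", "page_wearable_2016",
    "stober_learning_2016", "stober_using_2014",
}
# Design-choice cells that mention kernel sizes.
kernel_sizes = {
    "lawhern_eegnet:_2016", "tabar_novel_2017",
    "stober_learning_2016", "stober_using_2014",
}

compared_anything = design_choices | training_strategies
both_columns = sorted(design_choices & training_strategies)

print("studies in the table:", len(compared_anything))
print("studies without any comparison:", n_studies_total - len(compared_anything))
print("studies that varied kernel sizes:", len(kernel_sizes))
print("studies with entries in both columns:", both_columns)
```

Two studies have entries in both columns. Of these, only {cite}`stober_using_2014` lists a wider hyperparameter search on both sides, which corresponds to the last bullet in the box.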

| Study                                                                                                                              | Design choices                                                                                                                                                                                  | Training strategies                                                                                                                      |
|:-----------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------|
|{cite}`lawhern_eegnet:_2016`                      | Kernel sizes                                                                                                                                                                                    |                                                                                                                                          |
|{cite}`sun_remembered_2016`                            |                                                                                                                                                                                                 | Different time windows                                                                                                                   |
|{cite}`tabar_novel_2017`                           | Addition of six-layer stacked autoencoder on ConvNet features <br> Kernel sizes                                                                                                              |                                                                                                                                          |
| {cite}`liang_predicting_2016`           |                                                                                                                                                                                                 | Different subdivisions of frequency range <br>Different lengths of time crops <br>Transfer learning with auxiliary non-epilepsy datasets |
|{cite}`hajinoroozi_eeg-based_2016`    | Replacement of convolutional layers by restricted Boltzmann machines with slightly varied network architecture                                                                                  |                                                                                                                                          |
|{cite}`antoniades_deep_2016`                                                    | 1 or 2 convolutional layers                                                                                                                                                                     |                                                                                                                                          |
|{cite}`page_wearable_2016`                  |                                                                                                                                                                                                 | Cross-subject supervised training, within-subject finetuning of fully connected layers                                                   |
|{cite}`bashivan_learning_2016`                | Number of convolutional layers <br>Temporal processing of ConvNet output by max pooling, temporal convolution, LSTM or temporal convolution + LSTM                                              |                                                                                                                                          |
|{cite}`stober_learning_2016`                                                             | Kernel sizes                                                                                                                                                                                    | Pretraining first layer as convolutional autoencoder with different constraints                                                          |
|{cite}`sakhavi_parallel_2015`                       | Combination ConvNet and MLP (trained on different features) vs. only ConvNet vs. only MLP                                                                                                       |                                                                                                                                          |
|{cite}`stober_using_2014`  | Best values from automatic hyperparameter optimization: frequency cutoff, one vs two layers, kernel sizes, number of channels, pooling width                                                    | Best values from automatic hyperparameter optimization: learning rate, learning rate decay, momentum, final momentum                     |
|{cite}`wang_deep_2013`                | Partially supervised convolutional stacked autoencoder (CSA)                                                                                                                                    |                                                                                                                                          |
|{cite}`cecotti_convolutional_2011` | Electrode subset (fixed or automatically determined) <br>Using only one spatial filter <br>Different ensembling strategies                                                                      |                                                                                                                                          |




## References

```{bibliography} ./references.bib
```
