In [54]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


# Training and Hyper-parameter Optimization

## Training

* Supervised learning with given labels 
* Binary classification : 
    * 2 classes: `Left (L)` & `Right (R)`
    * `Sigmoid` activation function at the classification output layer
    * `0.5` threshold for balanced dataset
    * `BCELoss` :  Binary Cross Entropy between the target and the output
    * During Training: `80% Train set` , `20% Validation set`
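
The setup above can be sketched in plain Python to make the pieces concrete: a sigmoid output, a `0.5` decision threshold, binary cross entropy, and an 80/20 split. This is only an illustration, not the training code used for the runs; the label encoding (`1` = Right, `0` = Left), the example logits, and the helper names are assumptions.

```python
import math
import random


def sigmoid(z):
    """Map a raw network output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross entropy between probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def train_val_split(n_samples, val_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_samples * val_fraction))
    return indices[n_val:], indices[:n_val]


# Made-up logits for four trials; assumed encoding: 1 = Right, 0 = Left.
logits = [2.0, -1.0, 0.3, -0.2]
targets = [1, 0, 1, 1]

probs = [sigmoid(z) for z in logits]
preds = [1 if p >= 0.5 else 0 for p in probs]  # 0.5 threshold
loss = binary_cross_entropy(probs, targets)
accuracy = sum(int(p == t) for p, t in zip(preds, targets)) / len(targets)

train_idx, val_idx = train_val_split(100)  # 80 train, 20 validation
```

In PyTorch the loss would typically come from `torch.nn.BCELoss` applied to the sigmoid output; the hand-written version here only shows what that loss computes.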

| Subject | Sessions |
|    :----:   |   :----:  |
| 1 | [1, 3, 5, 7, 9] |
| 2 | [1, 3, 5, 7, 9] |
| 3 | [2, 3, 6, 7] |
| 4 | [1, 3, 6, 7, 9] |
| 5 | [2, 3, 5, 7] |
| 6 | [2, 3, 5, 7] |
| 7 | [1, 3, 5, 7, 9] |
| 8 | [1, 3, 5, 7, 9] |
| 9 | [1, 3, 5, 7, 9] |
| 10 | [1, 3, 5, 7, 9] |



## Hyper-parameters :

### Model Hyper-parameters: 
   
* Kernel size
* Filter length (spatial & temporal )
* Number of layers
* Batch normalization
* Dropout probability
* ...

      


### Training Hyper-parameters

* `optimizers` = ["Adam", "RMSprop", "SGD"]
*  `lrs` =  [1e-5, 1e-1] (lower and upper bound of the sampled range)
*  `batch_size`=  [32, 64, 128]
*  `input_norm` =  ["std", "minmax"]
*  `norm_axes`  :
    * `0` = Trials
    * `1` = Channels
    * `2` = Timepoints
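
To illustrate what `input_norm` and `norm_axes` control, here is a minimal NumPy sketch. It assumes the data is an array of shape `(trials, channels, timepoints)`, that `"std"` means z-scoring, and that `"minmax"` means rescaling to `[0, 1]`; the function name `normalize` and the `eps` guard are illustrative and not taken from the project code.

```python
import numpy as np


def normalize(x, method="std", axis=0, eps=1e-8):
    """Normalize x along one axis (0 = trials, 1 = channels, 2 = timepoints)."""
    if method == "std":
        mean = x.mean(axis=axis, keepdims=True)
        std = x.std(axis=axis, keepdims=True)
        return (x - mean) / (std + eps)
    if method == "minmax":
        lo = x.min(axis=axis, keepdims=True)
        hi = x.max(axis=axis, keepdims=True)
        return (x - lo) / (hi - lo + eps)
    raise ValueError(f"unknown normalization method: {method!r}")


# Synthetic stand-in for EEG data: 8 trials, 4 channels, 100 timepoints.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 100))

z_time = normalize(x, method="std", axis=2)       # each trial/channel over time
m_trials = normalize(x, method="minmax", axis=0)  # each channel/timepoint over trials
```

The choice of axis decides which statistics are shared: normalizing over axis `2` removes per-channel offsets within each trial, while axis `0` uses statistics computed across trials.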

# Shallow net
<img align="right" width="300" height="300" src="./shallow_archi.png">

## Model Hyper-parameters
* `n_filters_time` = trial.suggest_int("n_filters_time", low=20, high=60, step=10)
* `filter_time_length` = trial.suggest_int("filter_time_length", low=10, high=40, step=5)
* `n_filters_spat` = trial.suggest_int("n_filters_spat", low=20, high=60, step=10)
* `pool_time_length` = trial.suggest_int("pool_time_length", low=10, high=80, step=25)
* `pool_time_stride` = trial.suggest_int("pool_time_stride", low=15, high=45, step=15)
* `drop_prob` = trial.suggest_float("drop_prob", 0, 1)
* `batch_norm` = trial.suggest_categorical("batch_norm", [True, False])
* `batch_norm_alpha` = trial.suggest_float("batch_norm_alpha", 0, 1)   
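
As a quick check of how large this search space is, the integer ranges listed above can be enumerated in plain Python. The helper `int_grid` is hypothetical; it mirrors how a stepped integer range starting at `low` is discretized. Optuna's documentation notes that `high` is adjusted when the range is not divisible by `step`, so for `pool_time_length` the largest reachable value as listed would be 60 rather than 80.

```python
def int_grid(low, high, step):
    """Values reachable in [low, high] when stepping from low in increments of step."""
    return list(range(low, high + 1, step))


# Integer hyper-parameters of the shallow net, with the ranges listed above.
shallow_space = {
    "n_filters_time": int_grid(20, 60, 10),
    "filter_time_length": int_grid(10, 40, 5),
    "n_filters_spat": int_grid(20, 60, 10),
    "pool_time_length": int_grid(10, 80, 25),  # 80 is not reachable from 10 in steps of 25
    "pool_time_stride": int_grid(15, 45, 15),
}

# Number of distinct integer configurations (floats and booleans excluded).
n_combinations = 1
for values in shallow_space.values():
    n_combinations *= len(values)

for name, values in shallow_space.items():
    print(f"{name}: {values}")
print("integer combinations:", n_combinations)
```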

### Training Dataset 

| Runs | Trial87 |Trial26 |
| :---:        |    :----:   |  :----: |
| Subjects| [1,2,3] | [1,2,3,4,5,6,7,8,9,10] |
| Sessions |  [1,3,5,7,9] , [ 1,3,5,7,9], [ 2,3,6,7 ] | [1,3,5,7,9 ] , [ 1,3,5,7,9 ], [ 2,3,6,7 ], [ 1,3,6,7,9 ], [ 2,3,5,7 ],[ 2,3,5,7 ],[ 1,3,5,7,9 ],[ 1,3,5,7,9 ],[ 1,3,5,7,9 ],[ 1,3,5,7,9 ]|



In [55]:
%tensorboard --logdir ./final_results/shallow/filter_data/
#| Runs    | Trial87 |Trial26                 |
#| Subjects| [1,2,3] | [1,2,3,4,5,6,7,8,9,10] |

Reusing TensorBoard on port 6007 (pid 28597), started 1:12:43 ago. (Use '!kill 28597' to kill it.)

###  Input Interval

* **Dataset Imbalance** with the input data interval `[2000, 8000]`

| Runs | Trial 26   | Trial 81 | 
| - | - | - |
|Input interval| [2000, 8000]     | [2000, 7000] | 
 


In [56]:
%tensorboard --logdir ./final_results/shallow/interval1/
#| Runs         | Trial 26         | Trial 81     | 
#|Input interval| [2000, 8000]     | [2000, 7000] | 

Reusing TensorBoard on port 6008 (pid 28631), started 1:12:42 ago. (Use '!kill 28631' to kill it.)

## Top-3 Models

|                    |   Trial 81: acc~0.976   |   Trial 10: acc~0.972   |   Trial 24: acc~0.969   |
|--------------------|:-----------------------:|:-----------------------:|:-----------------------:|
|     batch_norm     |           True          |           True          |           True          |
|  batch_norm_alpha  |    0.9154121627636327   |   0.00404122604380025   |   0.30694568550541435   |
|     batch_size     |            64           |            64           |            64           |
|      drop_prob     | **0.10764970247279382** | **0.07107984516053512** | **0.31874295683818593** |
| filter_time_length |           100           |            90           |           100           |
|     input_norm     |           std           |           std           |           std           |
|         lr         |    0.0455351928752179   |   0.05304083971721983   |   0.021135555778184406  |
|   n_filters_spat   |            20           |            30           |            20           |
|   n_filters_time   |            30           |            20           |            20           |
|      norm_axes     |            1            |          **0**          |          **0**          |
|      optimizer     |           SGD           |           SGD           |           SGD           |
| pool_time_length   |            50           |            50           |            40           |
| pool_time_stride   |            45           |            45           |            30           |

In [57]:
%tensorboard --logdir ./final_results/shallow/best_3/
#|                    |   Trial 81: acc~ 0.976  |   Trial 10: acc~0.972   | Trial 24: acc~0.969     |
#|      drop_prob     | **0.10764970247279382** | **0.07107984516053512** | **0.31874295683818593** |
#|      norm_axes     |            1            |          **0**          |          **0**          |

Reusing TensorBoard on port 6009 (pid 28682), started 1:12:40 ago. (Use '!kill 28682' to kill it.)

# EEG-Net

*  The notation ``EEGNet-F1,D`` denotes the number of temporal and spatial filters to learn; e.g., EEGNet-4,2
denotes learning 4 temporal filters and 2 spatial filters per temporal filter.
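
The notation can be turned into a small calculation: with `F1` temporal filters and `D` spatial filters per temporal filter, the depthwise step produces `F1 * D` feature maps. The function name below is illustrative.

```python
def feature_map_counts(F1, D):
    """Return (maps after temporal conv, maps after depthwise spatial conv)."""
    return F1, F1 * D


# EEGNet-4,2: 4 temporal filters, 2 spatial filters per temporal filter.
temporal_maps, spatial_maps = feature_map_counts(F1=4, D=2)
print(temporal_maps, spatial_maps)
```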



<table><tr>
<td> <img src="./eegnet_archi.png" alt="Drawing" style="width: 500px;"/> </td>
<td> <img src="./eegnet_info.png" alt="Drawing" style="width: 1000px;"/> </td>
</tr></table>

* Block 1 : "we perform two convolutional steps in sequence. First, we fit F1 2D convolutional
filters of size (1; 64), with the filter length chosen to be half the sampling rate of the data (here, 128Hz), outputting F1 feature maps containing the EEG signal at different band-pass frequencies. ... We then use a Depthwise Convolution of size (C; 1) to learn a spatial filter "
* Block 2 : "we use a Separable Convolution, which is a Depthwise Convolution (here, of size (1; 16), representing 500ms of EEG activity at 32Hz) followed by F2 (1; 1) Pointwise Convolutions. ... This operation is also particularly useful for EEG signals as different feature maps may represent data at different
time-scales of information."
* Classification Block : "the features are passed directly to a softmax classification with N
units, N being the number of classes in the data. We omit the use of a dense layer for feature
aggregation prior to the softmax classification layer to reduce the number of free parameters
in the model"
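
The kernel sizes quoted above follow from simple sampling-rate arithmetic, sketched below. The function names are illustrative; the numbers (128 Hz, 32 Hz, 16 samples) are the ones given in the quoted text.

```python
def temporal_kernel_length(sampling_rate_hz):
    """Block 1: temporal filter length chosen as half the sampling rate."""
    return sampling_rate_hz // 2


def kernel_duration_ms(kernel_samples, sampling_rate_hz):
    """Duration in milliseconds covered by a kernel of the given length."""
    return 1000.0 * kernel_samples / sampling_rate_hz


block1_length = temporal_kernel_length(128)    # (1, 64) kernel at 128 Hz
block2_duration = kernel_duration_ms(16, 32)   # (1, 16) kernel at 32 Hz
print(block1_length, block2_duration)
```

If the data is sampled at a different rate, the same rule of thumb gives a different temporal kernel length, which is one reason to treat `kernel_length` as a tunable hyper-parameter.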

# EEGNet-V1 : F1=1 and D=1

* num_filters_1 = 16
* num_filters_2 = 4
* num_filters_3 = 4

## Model Hyper-parameters

* `second_kernel_size` : 
    * dim 1 : trial.suggest_int("second_kernel_size_1", low=6, high=10, step=2)
    * dim 2 : trial.suggest_int("second_kernel_size_2", low=2, high=6, step=2)
* `third_kernel_size` :
    * dim 1 : trial.suggest_int("third_kernel_size_1", low=6, high=10, step=2)
    * dim 2 : trial.suggest_int("third_kernel_size_2", low=2, high=6, step=2)

* `dropout_prob` = trial.suggest_float("dropout_rate", 0, 1)


### Normalization axis & Learning Rate (EEGNet-V1)

|                       | Trial 11 : acc~0.948 | Trial 12: acc~ 0.502 | Trial 13: acc~0.877 |
|:---------------------:|:--------------------:|:--------------------:|:-------------------:|
|       batch_size      |          128         |          128         |         128         |
|      dropout_rate     |  0.33648887482349155 |  0.24596745056760894 |  0.3012513440131098 |
|       input_norm      |          std         |          std         |         std         |
|          **lr**         | 0.005512511929038205 |  0.02386075586781167 | 0.09775722394094524 |
|      **norm_axes**      |           2          |           2          |          0          |
|       optimizer       |         Adam         |         Adam         |         Adam        |
| second_kernel_size_1: |           8          |           8          |          8          |
| second_kernel_size_2: |           4          |           4          |          4          |
|  third_kernel_size_1: |          10          |          10          |          10         |
|  third_kernel_size_2: |           2          |           2          |          2          |

In [58]:
%tensorboard --logdir ./final_results/eegnet_v1/norm_axis/
#|                       | Trial 11 : acc~0.948 | Trial 12: acc~ 0.502 | Trial 13: acc~0.877 |
#|          **lr**         | 0.005512511929038205 |  0.02386075586781167 | 0.09775722394094524 |
#|      **norm_axes**      |           2          |           2          |          0          |

Reusing TensorBoard on port 6010 (pid 28739), started 1:12:39 ago. (Use '!kill 28739' to kill it.)

## Top-3 Models (EEGNet-V1)
|                       |  Trial 14 : acc~0.9614  |  Trial 67: acc~ 0.9614  | Trial 11: acc~0.9485 |
|:---------------------:|:-----------------------:|:-----------------------:|:--------------------:|
|       batch_size      |         **128**         |          **64**         |          128         |
|      dropout_rate     |   0.028052359571746645  |    0.4891926373269993   |  0.3012513440131098  |
|       input_norm      |           std           |           std           |          std         |
|          *lr*         | **0.05802843218291968** | **0.00422359559662389** |  0.09775722394094524 |
|      *norm_axes*      |            2            |            2            |           0          |
|       optimizer       |         **SGD**         |         **Adam**        |         Adam         |
| second_kernel_size_1: |            6            |            10           |           8          |
| second_kernel_size_2: |            4            |            4            |           4          |
|  third_kernel_size_1: |            8            |            10           |          10          |
|  third_kernel_size_2: |            4            |            6            |           2          |

Linear scaling rule: when the minibatch size is multiplied by k, multiply the learning rate by k. Although we initially found large batch sizes to perform worse, we were able to close most of the gap by increasing the learning rate. The gap arises because larger batches apply smaller effective updates: gradient vectors within a batch compete and partially cancel each other.

When the right learning rate is chosen, larger batch sizes can train faster, especially when parallelized.
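
The linear scaling rule is a one-line computation. The sketch below uses made-up base values purely for illustration; they are not the settings of any trial in the tables.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: multiply the learning rate by k = new / base batch size."""
    k = new_batch_size / base_batch_size
    return base_lr * k


# Hypothetical example: going from batch size 32 to 128 gives k = 4.
scaled_lr = scale_learning_rate(base_lr=0.01, base_batch_size=32, new_batch_size=128)
print(scaled_lr)
```

The rule is a heuristic starting point; the scaled value can still be refined by the hyper-parameter search.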

In [59]:
%tensorboard --logdir ./final_results/eegnet_v1/best_3/
#|                       |  Trial 14 : acc~0.9614  |  Trial 67: acc~ 0.9614  | Trial 11: acc~0.9485 |
#| second_kernel_size_1: |            6            |            10           |           8          |
#| second_kernel_size_2: |            4            |            4            |           4          |
#|  third_kernel_size_1: |            8            |            10           |          10          |
#|  third_kernel_size_2: |            4            |            6            |           2          |

Reusing TensorBoard on port 6011 (pid 28783), started 1:12:38 ago. (Use '!kill 28783' to kill it.)

# EEGNet-F1,D 

* `kernel_size_1` = (1, `kernel_length`)
* `kernel_size_2` = (num_channels, 1)
* `kernel_size_3` = (1, 16)


## Model Hyper-parameters

* `F1` = trial.suggest_int("F1", low=6, high=10, step=2)
* `D` = trial.suggest_int("D", low=2, high=4, step=2)
* `F2` = trial.suggest_int("F2", low=12, high=16, step=2)
* `kernel_length` = trial.suggest_int("kernel_length", low=20, high=80, step=10)
* `pool_mode` = trial.suggest_categorical("pool_mode", ["mean", "max"])
* `dropout_prob` = trial.suggest_float("dropout_rate", 0, 1)
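
The list above can be collected into one sampling function, sketched here. The `suggest_*` calls are the Optuna trial methods already used above; the function name `suggest_eegnet_params` and the class `LowestValueTrial` are hypothetical. The class is only a stand-in so the sketch runs without Optuna installed, and it always returns the lowest or first option.

```python
def suggest_eegnet_params(trial):
    """Sample one EEGNet-F1,D configuration from the ranges listed above."""
    kernel_length = trial.suggest_int("kernel_length", low=20, high=80, step=10)
    return {
        "F1": trial.suggest_int("F1", low=6, high=10, step=2),
        "D": trial.suggest_int("D", low=2, high=4, step=2),
        "F2": trial.suggest_int("F2", low=12, high=16, step=2),
        "kernel_size_1": (1, kernel_length),
        "pool_mode": trial.suggest_categorical("pool_mode", ["mean", "max"]),
        "dropout_prob": trial.suggest_float("dropout_rate", 0, 1),
    }


class LowestValueTrial:
    """Hypothetical stand-in for an Optuna trial; not part of Optuna."""

    def suggest_int(self, name, low, high, step=1):
        return low

    def suggest_float(self, name, low, high):
        return float(low)

    def suggest_categorical(self, name, choices):
        return choices[0]


params = suggest_eegnet_params(LowestValueTrial())
print(params)
```

With Optuna available, the same function would be called with the `trial` object that an objective function receives from `study.optimize`.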

## Kernel length & Dropout (EEGNet-F1,D)

|                     	|   Trial 17 : acc~0.966   	|   Trial 52: acc~ 0.978  	|   Trial 32: acc~0.9799   	|
|:-------------------:	|:------------------------:	|:-----------------------:	|:------------------------:	|
|          D          	|             6            	|            6            	|             6            	|
|          F1         	|            18            	|            18           	|            18            	|
|          F2         	|            14            	|            14           	|            14            	|
|    kernel_length    	|          **80**          	|          **50**         	|          **70**          	|
|      pool_mode      	|           mean           	|           mean          	|           mean           	|
|      batch_size     	|            32            	|            32           	|            32            	|
|     dropout_rate    	|  **0.29354177565584844** 	| **0.16133096921841825** 	|  **0.09796662742110902** 	|
|      input_norm     	|            std           	|           std           	|            std           	|
|      norm_axes      	|             0            	|            0            	|             0            	|
|      optimizer      	|            SGD           	|           SGD           	|            SGD           	|
|          lr         	| **0.009563534604912723** 	| **0.06789413869635393** 	| **0.008486730457873154** 	|

In [60]:
%tensorboard --logdir ./final_results/eegnet_v4/kernel_pool/
#|                     	|   Trial 17 : acc~0.966   	|   Trial 52: acc~ 0.978  	|   Trial 32: acc~0.9799   	|
#|    kernel_length    	|          **80**          	|          **50**         	|          **70**          	|

Reusing TensorBoard on port 6012 (pid 28821), started 1:12:38 ago. (Use '!kill 28821' to kill it.)

## Top-3 Models (EEGNet-F1,D)
|                     	|  Trial 32 : acc~0.9799  	| Trial 52: acc~ 0.9787 	| Trial 83: acc~0.9749 	|
|:-------------------:	|:-----------------------:	|:---------------------:	|:--------------------:	|
|          D          	|            6            	|           6           	|           4          	|
|          F1         	|            18           	|           18          	|          20          	|
|          F2         	|            14           	|           14          	|          12          	|
|    kernel_length    	|          **70**         	|           50          	|          60          	|
|      pool_mode      	|           mean          	|          mean         	|         mean         	|
|      batch_size     	|            32           	|           32          	|          32          	|
|     dropout_rate    	|   0.19166787391840304   	|  0.16133096921841825  	|  0.21337922563837466 	|
|      input_norm     	|           std           	|          std          	|          std         	|
|      norm_axes      	|            0            	|           0           	|           0          	|
|      optimizer      	|           SGD           	|          SGD          	|          SGD         	|
|          lr         	| **0.04776693185260883** 	|  0.06789413869635393  	|  0.06447106384282938 	|

In [61]:
%tensorboard --logdir ./final_results/eegnet_v4/best_3/
#|                     	|  Trial 32 : acc~0.9799  	| Trial 52: acc~ 0.9787 	| Trial 83: acc~0.9749 	|
#|          D          	|            6            	|           6           	|           4          	|
#|          F1         	|            18           	|           18          	|          20          	|
#|          F2         	|            14           	|           14          	|          12          	|
#|    kernel_length    	|          **70**         	|           50          	|          60          	|

Reusing TensorBoard on port 6013 (pid 28869), started 1:12:37 ago. (Use '!kill 28869' to kill it.)