# <center>COMP 432 Final Project</center>  
#### <center>Andrew Foote 40199068</center>
## Abstract
The goal of this project is to develop new models for classifying EEG data. To train and test the implemented models, I used the widely adopted BNCI2014001 motor imagery dataset. Using the SpeechBrain MOABB benchmark repository as my baseline, I attempted to improve on the EEGNet model. My modifications included adding an LSTM layer after the initial CNN to capture temporal dependencies in the data, and adding dropout to reduce overfitting. The modified EEGNet model performed worse than the original EEGNet, but I propose some ideas for improving it in the future.

## Introduction
The human brain is a marvel of complexity, and electroencephalography (EEG) offers a glimpse into its electrical activity. However, deciphering the vast amount of raw EEG data remains a significant challenge. Traditional methods often require extensive human expertise and struggle to capture the subtle nuances of brain activity. This is a hurdle for researchers and clinicians seeking a deeper understanding of brain function. Overcoming it is crucial, as it could transform fields like neuroscience, medicine, and human-computer interaction. Developing accurate and efficient methods for decoding EEG data presents several key challenges: the data are inherently noisy, high-dimensional, and variable across individuals, which demands robust machine learning models capable of extracting meaningful patterns. In this project I attempt to build a novel model based on the well-known EEGNet model.

## Approach
My approach centers on the BNCI2014001 dataset. I started by familiarizing myself with the SpeechBrain repository and the key functions it provides for data preparation, training, and testing models. This step took a significant amount of time, as the tools are very detailed and were challenging to get a grasp of as a beginner in machine learning.

Since the goal was to implement a novel model, I dove into the literature to gather ideas. I quickly found that EEGNet is the de facto standard and that most models are slight variations of it. I landed on a paper that created a ConvLSTM model for EEG data [ConvLSTM]. This model intrigued me because it used an initial convolutional layer followed by three depthwise-separable convolutional layers, a bidirectional LSTM layer, an attention layer, and then a fully connected layer [CLSTM]. The paper indicated that this model outperformed EEGNet, so I decided to test it. However, I had underestimated my computational limitations: I don't have my own GPU and had limited compute credits in Colab Pro, and this model was significantly more computationally intensive than EEGNet [convlstm].

Instead, I combined the ideas from the paper with the base EEGNet model already available in MOABB. My first approach was to add another depthwise-separable layer to the model, which can potentially increase the model's capacity to learn complex features while maintaining efficiency [Insert Source]. My second approach was to add an LSTM layer to EEGNet. While the initial convolutional layers of EEGNet extract local patterns and features from the data, the LSTM layer would hopefully capture long-range dependencies and relationships across the entire sequence [Insert source]. The LSTM variant gave the best results of my modifications, at roughly 64% accuracy, but it did not beat the standard EEGNet.

## Methodology
The first steps in running the experiment are cloning the MOABB repository and installing the necessary dependencies. These steps are taken from the MOABB documentation [Source].

```python
!git clone https://github.com/speechbrain/benchmarks.git
%cd benchmarks
!git submodule update --init --recursive
%cd speechbrain
!pip install -r requirements.txt
!pip install -e .
%cd /content/benchmarks/benchmarks/MOABB
!pip install -r ../../requirements.txt    # Install base dependencies
!pip install -r extra-requirements.txt    # Install additional dependencies
%cd /content/benchmarks/benchmarks/MOABB
%env PYTHON_PATH=/content/benchmarks/
```

The next step is to import my own custom files for the experiment from Google Drive and put them in the correct directories so the scripts can run them:

```python
# Importing files from Google Drive
```

The train.py file needed a modification before it could run any sort of RNN layer, which is why it is replaced with my own custom version. Line 38 in the original train.py file needs to be changed to:  
  
`if mod.bias is not None and (not isinstance(mod.bias, bool)):`  

Otherwise I'd get an error when trying to run the model.
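
The reason for this guard can be reproduced in isolation. The sketch below is not the benchmark's `train.py`: `zero_init_biases` is a hypothetical helper that only mirrors the condition quoted above, and the two-layer model is made up for illustration. It relies on the fact that `torch.nn.LSTM` stores `bias` as a plain boolean flag, whereas a layer such as `torch.nn.Linear` stores a tensor, so an initializer that expects a tensor cannot be applied to the LSTM's `bias` attribute.

```python
import torch


def zero_init_biases(model: torch.nn.Module) -> int:
    """Zero every bias tensor and return how many were initialized.

    Hypothetical helper: it mirrors the guarded condition, not the
    benchmark's actual initialization code.
    """
    initialized = 0
    for mod in model.modules():
        if hasattr(mod, "bias"):
            # nn.LSTM exposes `bias` as a bool flag, so it has to be skipped.
            if mod.bias is not None and (not isinstance(mod.bias, bool)):
                torch.nn.init.constant_(mod.bias, 0.0)
                initialized += 1
    return initialized


# Illustrative model: one layer with a tensor bias, one with a bool flag.
model = torch.nn.ModuleList([torch.nn.Linear(4, 2), torch.nn.LSTM(4, 3)])
count = zero_init_biases(model)
print(type(model[1].bias).__name__, count)
```

Without the `isinstance` check, the loop would try to initialize the boolean flag itself as if it were a tensor.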


The next step is to run the experiment. The following code block runs my final model with the run_experiments.sh script provided by the SpeechBrain MOABB benchmark. The experiment is run once so the marker can reproduce it. The results are saved to the directory passed as `--output_folder`.

```python
!./run_experiments.sh --hparams hparams/MotorImagery/BNCI2014001/EEGNetLSTM.yaml \
                    --data_folder eeg_data \
                    --output_folder /content/drive/MyDrive/EEG_Results_Hyp \
                    --nsbj 9 \
                    --nsess 2 \
                    --nruns 1 \
                    --train_mode leave-one-session-out \
                    --device=cuda
```
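
To make the flags above more concrete, here is a minimal sketch of what a leave-one-session-out protocol implies, assuming it means that for each subject one session is held out for testing while the remaining session is used for training. The function name and the session labels are placeholders of mine, not names taken from the benchmark scripts.

```python
from itertools import product


def leave_one_session_out(n_subjects, sessions, n_runs):
    """List every (run, subject, train_sessions, test_session) combination."""
    folds = []
    for run, subject in product(range(1, n_runs + 1), range(1, n_subjects + 1)):
        for held_out in sessions:
            train = [s for s in sessions if s != held_out]
            folds.append((run, subject, train, held_out))
    return folds


# Placeholder session labels; the dataset's real session names may differ.
sessions = ["session_A", "session_B"]
folds = leave_one_session_out(n_subjects=9, sessions=sessions, n_runs=1)

print(len(folds))  # 9 subjects x 2 held-out sessions x 1 run = 18
print(folds[0])
```

Under this assumption a single run already trains 18 separate models, and `--nruns 10` would raise that to 180.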

While the model trains, I will describe how the process works.  

### Data Pre-processing  
Data pre-processing wasn't an area of focus for me, as the MOABB repository already had preprocessed data hooked up. Rather than researching pre-processing techniques, I used the provided datasets as downloaded by the training script. Additionally, my goal was to compare my models to the original EEGNet[] model, so I wanted to keep the data as close as possible to what it was trained on.

### Data Augmentation
Data augmentation is configured in the YAML file that is passed to the training script. Again, this wasn't an area of focus for me. I did attempt to find new data augmentation hyperparameters through hyperparameter tuning. However, due to my limited computational resources I was unable to run the full tuning with 50 total experiments and all 9 subjects, which led to inferior results. I therefore kept the original data augmentation hyperparameters found in EEGNet's YAML file[].  

### Model Architecture
This section is where I spent the most time and research. I originally thought of creating my own model from scratch, but that didn't seem to be the dominant approach in the literature: most models were slight variations of EEGNet. I therefore researched the benefits that CNNs and LSTMs offer for EEG and time-series data, with the idea of changing the EEGNet architecture to use multiple depthwise-separable layers instead of its single depthwise and separable layer. I first tested three depthwise-separable layers in the EEGNet model template provided in MOABB[Src]:
```python
        self.conv_module.add_module(
            "conv_1",
            sb.nnet.CNN.Conv2d(
                in_channels=8,
                out_channels=16,
                kernel_size=(1, C),
                groups=cnn_temporal_kernels,
                padding="valid",
                bias=False,  # boolean; the string "False" is truthy and would enable the bias
                swap=True,
            )
        )
        self.conv_module.add_module(
            "conv_1_point",
            sb.nnet.CNN.Conv2d(
                in_channels=16,
                out_channels=16,
                kernel_size=(1, 1),
                groups=cnn_temporal_kernels,
                padding="valid",
                bias=False,
                swap=True,
            )
        )
        self.conv_module.add_module(
            "bnorm_1",
            sb.nnet.normalization.BatchNorm2d(
                input_size=16, momentum=0.01, affine=True,
            ),
        )
        self.conv_module.add_module("act_1", activation)
        self.conv_module.add_module(
            "pool_1",
            sb.nnet.pooling.Pooling2d(
                pool_type="max",
                kernel_size=(3, 1),
                stride=(4, 1),
                pool_axis=[1, 2],
            ),
        )
        self.conv_module.add_module(
            "conv_2",
            sb.nnet.CNN.Conv2d(
                in_channels=16,
                out_channels=16,
                kernel_size=(16, 1),
                groups=16,
                padding="valid",
                bias=False,
                swap=True,
            )
        )
        self.conv_module.add_module(
            "conv_2_point",
            sb.nnet.CNN.Conv2d(
                in_channels=16,
                out_channels=16,
                kernel_size=(1, 1),
                groups=cnn_temporal_kernels,
                padding="valid",
                bias=False,
                swap=True,
            )
        )
        self.conv_module.add_module(
            "bnorm_2",
            sb.nnet.normalization.BatchNorm2d(
                input_size=16, momentum=0.01, affine=True,
            ),
        )
        self.conv_module.add_module("act_2", activation)
        self.conv_module.add_module(
            "pool_2",
            sb.nnet.pooling.Pooling2d(
                pool_type="max",
                kernel_size=(4, 1),
                stride=(4, 1),
                pool_axis=[1, 2],
            ),
        )
        self.conv_module.add_module(
            "conv_3",
            sb.nnet.CNN.Conv2d(
                in_channels=16,
                out_channels=16,
                kernel_size=(1, 1),
                groups=16,
                padding="valid",
                bias=False,
                swap=True,
            )
        )
        self.conv_module.add_module(
            "conv_3_point",
            sb.nnet.CNN.Conv2d(
                in_channels=16,
                out_channels=16,
                kernel_size=(1, 1),
                groups=cnn_temporal_kernels,
                padding="valid",
                bias=False,
                swap=True,
            )
        )
        self.conv_module.add_module(
            "bnorm_3",
            sb.nnet.normalization.BatchNorm2d(
                input_size=16, momentum=0.01, affine=True,
            ),
        )
        self.conv_module.add_module("act_3", activation)
        self.conv_module.add_module(
            "pool_3",
            sb.nnet.pooling.Pooling2d(
                pool_type="max",
                kernel_size=(4, 1),
                stride=(4, 1),
                pool_axis=[1, 2],
            ),
        )
  ```
These layers were placed between the initial convolutional layer and the dense layer of the original EEGNet architecture. This model reached an accuracy of about 55.6%, which was much worse than EEGNet. The problem was severe overfitting caused by the additional layers. I tried the model with and without dropout layers, but dropout was not enough to prevent the overfitting.  
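
For context on why depthwise-separable layers are usually described as efficient, the following back-of-the-envelope sketch counts the weights of a bias-free convolution. It uses the 16 channels and the `(16, 1)` kernel from `conv_2` in the listing above, but it assumes an ungrouped pointwise step (`groups=1`), which differs from the grouped pointwise layers in my code. `conv_weights` is a helper written for this illustration only.

```python
def conv_weights(in_ch, out_ch, kernel_len, groups=1):
    """Weight count of a bias-free convolution with a (kernel_len x 1) kernel."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    return out_ch * (in_ch // groups) * kernel_len


channels, kernel_len = 16, 16

# One standard convolution mixing all channels over the full kernel.
standard = conv_weights(channels, channels, kernel_len)

# Depthwise step (one filter per channel) followed by a 1x1 pointwise step.
depthwise = conv_weights(channels, channels, kernel_len, groups=channels)
pointwise = conv_weights(channels, channels, 1)  # assumes groups=1

print(standard)               # 4096
print(depthwise + pointwise)  # 512
```

Even at an eighth of the weights per layer, stacking three such blocks still adds capacity, which is consistent with the overfitting I observed.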
 
I then decided to try a different approach and add an LSTM layer to the model. The LSTM layer was added after the initial convolutional layer and before the dense layer. The LSTM layer was implemented as follows:
```python
        # (1, 6, 1, 16)
        self.lstm_module = torch.nn.Sequential()
        self.lstm_module.add_module(
            "lstm",
            sb.nnet.RNN.LSTM(
                input_shape=out.shape,
                hidden_size=lstm_hidden_size,
                dropout=lstm_dropout,
                num_layers=lstm_num_layers,
                bidirectional=lstm_bidirectional,
            ),
        )
        # Run a dummy tensor with the same shape as `out` through the LSTM.
        # The call returns the output and the hidden state; only the output is kept.
        dum = torch.ones_like(out)
        out_lstm, _ = self.lstm_module(dum)
```
This stage took a lot of debugging, mainly because the LSTM returns both its output and its hidden state, so I had to discard the unneeded hidden state here and in the forward method. This model performed much better than the deeper CNN model, with an accuracy of 64.4%. I therefore decided to keep the LSTM layer and try to improve the model further with hyperparameter tuning.
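
The tuple return value is easy to see with a plain `torch.nn.LSTM`, used here as a stand-in for the SpeechBrain wrapper. The sizes below (16 input features, 8 hidden units, 6 time steps, 2 layers) are illustrative assumptions and not the values used in my model.

```python
import torch

# Plain PyTorch LSTM used as a stand-in for the SpeechBrain wrapper.
lstm = torch.nn.LSTM(
    input_size=16,
    hidden_size=8,
    num_layers=2,
    batch_first=True,
    bidirectional=False,
)

# Dummy input shaped (batch, time, features).
dummy = torch.ones(1, 6, 16)

# The call returns a tuple: the per-step output and the final (h, c) states.
output, (h_n, c_n) = lstm(dummy)

print(tuple(output.shape))  # (1, 6, 8)
print(tuple(h_n.shape))     # (2, 1, 8)

# Only `output` is flattened and passed on to a dense layer.
flat = output.reshape(output.shape[0], -1)
print(tuple(flat.shape))    # (1, 48)
```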
### Training
Training was handled by the MOABB repository. The only change I made was the one-line fix to train.py described above, which resolves the bool bug that occurs with RNN-based layers. I ran the recommended script to train my model with 10 runs:
```python
!./run_experiments.sh --hparams hparams/MotorImagery/BNCI2014001/EEGNetLSTM.yaml \
                    --data_folder eeg_data \
                    --output_folder /content/drive/MyDrive/EEG_Results \
                    --nsbj 9 \
                    --nsess 2 \
                    --nruns 10 \
                    --train_mode leave-one-session-out \
                    --device=cuda
```
### Hyperparameter Tuning
This was the final step of my experiment. The repository provides a script that searches over the hyperparameters tagged for Orion. It was extremely computationally expensive, so I had to reduce the settings proposed for the script. I ran the hyperparameter tuning with only 15 experiments and 3 subjects. This still took several hours and had to be resumed multiple times as I ran out of computational resources. Here is the hyperparameter tuning command I ran:
```python
%cd /kaggle/working/benchmarks/benchmarks/MOABB
!./run_hparam_optimization.sh --exp_name 'EEGNetLSTM_BNCI2014001_hopt' \
                              --output_folder results/MotorImagery/BNCI2014001/EEGNetLSTM/hopt \
                              --data_folder eeg_data/ \
                              --hparams hparams/MotorImagery/BNCI2014001/EEGNetLSTM.yaml \
                              --nsbj 9 --nsess 2 \
                              --nsbj_hpsearch 3 --nsess_hpsearch 2 \
                              --nruns 1 \
                              --nruns_eval 1 \
                              --eval_metric acc \
                              --train_mode leave-one-session-out \
                              --exp_max_trials 15
```
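
A rough budget estimate shows why the reduced setting was necessary. The sketch assumes that every trial trains one model per subject, per held-out session, and per run, which is my reading of the flags above rather than something verified against the script. `training_runs` is a helper written for this illustration, and the "full" setting uses the 50 experiments and 9 subjects mentioned earlier in this report with a single run per trial.

```python
def training_runs(n_trials, n_subjects, n_sessions, n_runs):
    """Trainings needed if every trial trains once per subject, session, and run."""
    return n_trials * n_subjects * n_sessions * n_runs


# Reduced search that I actually ran: 15 trials on 3 subjects.
reduced = training_runs(n_trials=15, n_subjects=3, n_sessions=2, n_runs=1)

# Full search: 50 trials on all 9 subjects (one run per trial assumed).
full = training_runs(n_trials=50, n_subjects=9, n_sessions=2, n_runs=1)

print(reduced)         # 90
print(full)            # 900
print(full / reduced)  # 10.0
```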
I also removed the orion tags from parameters that had already been tuned for EEGNet and focused only on parameters specific to the LSTM layer. These were the number of epochs plus:
```yaml
lstm_hidden_size: 16
lstm_dropout: 0.15 # @orion_step1: --lstm_dropout~"uniform(0.0, 0.15)"
lstm_num_layers: 4 # @orion_step1: --lstm_num_layers~"uniform(2, 4,discrete=True)"
lstm_bidirectional: False # @orion_step1: --lstm_bidirectional~"choice([True, False])"
```
I also left the data augmentation orion tags in place to see whether better values for them would help the model. The tuned model did not match the initial LSTM model: the best configuration reached roughly 60% accuracy, which was worse than the untuned result. This was likely because the search was too small to find good parameters for the LSTM layer. I also noticed that subject 2 had low results compared to the other subjects, and since that subject was included in the hyperparameter tuning, this could have affected the outcome.
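
To give a feel for how sparsely 15 trials cover the LSTM search space, the sketch below draws candidates from the three tagged priors using plain random sampling. This is not Orion's search algorithm, only an illustration of the ranges. `sample_lstm_config` is a hypothetical helper, and treating the discrete range as inclusive of both endpoints is an assumption.

```python
import random


def sample_lstm_config(rng):
    """Draw one candidate configuration from the three tagged priors."""
    return {
        "lstm_dropout": rng.uniform(0.0, 0.15),
        # Assumption: the discrete range includes both endpoints.
        "lstm_num_layers": rng.randint(2, 4),
        "lstm_bidirectional": rng.choice([True, False]),
    }


rng = random.Random(0)  # fixed seed so the draws are repeatable
trials = [sample_lstm_config(rng) for _ in range(15)]

for trial in trials[:3]:
    print(trial)
```

With three layer counts, two directionality options, a continuous dropout range, the number of epochs, and the augmentation parameters all varying at once, 15 draws leave most combinations untested.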

## Results
The results of my experiments are as follows:  

|  Metric  |  Deeper EEGNet  |  EEGNetLSTM  | EEGNetLSTM (with tuned hyperparameters) |
| :------: | :-------------: | :----------: |:---------------------------------------:|
| Accuracy | 0.556134        | 0.644001     |                 0.6012                  |

The results show that the LSTM model performed better than the deeper EEGNet model, but worse than the original EEGNet model. The hyperparameter tuning did not improve the model, most likely because the search was too small. The results suggest that the LSTM layer may capture some temporal dependencies in the data, but not enough to outperform the original EEGNet model.

The model proposed in the paper [] that I based my model on had an accuracy of 73.5%, which was significantly better than my model. This shows that there is still room for improvement and that more research is needed to find the best architecture for EEG data. The main components missing from my model are the attention layer and the fully connected layer that the paper used. I believe that adding these layers would improve the model, and I would like to try this in the future; I simply didn't have the computational resources to do so. I would also have liked to run a more extensive hyperparameter search for the LSTM layer. I believe this would have improved the model as well, but with 146k parameters it is very hard to find the best configuration without a lot of computational resources.
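
The gaps discussed above can be worked out directly from the reported numbers. The snippet only restates the accuracies from the results table and the 73.5% quoted for the paper. The comparison is indicative at best, since the paper's evaluation protocol may differ from mine.

```python
# Accuracies copied from the results table and the 73.5% quoted for the paper.
accuracies = {
    "Deeper EEGNet": 0.556134,
    "EEGNetLSTM": 0.644001,
    "EEGNetLSTM (tuned)": 0.6012,
}
paper_accuracy = 0.735

for name, acc in accuracies.items():
    gap = (paper_accuracy - acc) * 100  # gap in percentage points
    print(f"{name}: {acc * 100:.1f}% ({gap:.1f} points below the paper)")

best = max(accuracies, key=accuracies.get)
print(best)
```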

## Conclusion
In conclusion, the results of my experiments show that adding an LSTM layer on its own might not be the right approach to decoding EEG data, as it was not enough to outperform the original EEGNet model. The hyperparameter tuning did not improve the model, most likely because the search was too small. The model proposed in the paper [] that I based my model on had an accuracy of 73.5%, which was significantly better than my model, so there is still room for improvement and more research is needed to find the best architecture for EEG data. The main missing components are the attention layer and the fully connected layer that the paper used, as well as combining the three depthwise-separable layers with the LSTM. I believe that adding these would improve the model, and I would like to try this in the future, together with a more extensive hyperparameter search for the LSTM layer. I simply didn't have the computational resources to do so.

This was my first real machine learning project, and I learned a lot about the process and the tools used. I started it about 6 weeks ago and put far more time into it than into any other project I've done at university. I would have liked to implement something that came closer to the EEGNet benchmark, but I'm happy with the knowledge I have gained, including an appreciation of how intensive real machine learning projects are and how much room there is for improvement and experimentation. I learned many new tools, including PyTorch, Kaggle, Paperspace Gradient, Colab Pro, and Orion. The biggest thing I learned from this experience is the patience needed to learn and apply machine learning. I'm excited to continue learning and improving my skills in the future.