# Deep Learning for EEG data analysis
We have already seen from a more theoretical point of view what Deep Learning models are and what they are meant for. <br>
Now it is time to understand how to use these models with Python!! <br>

In the following, we will see a complete pipeline for the task of Emotion Recognition on EEG data using the SEED dataset (Zheng et al., 2015, https://bcmi.sjtu.edu.cn/home/seed). <br>
The workflow is composed of the following steps:

<span style="font-size:24px">1. Data Loading & Preprocessing</span><br>
<span style="font-size:24px">2. Model Definition</span><br>
<span style="font-size:24px">3. Model Training</span><br>
<span style="font-size:24px">4. Model Evaluation</span><br>

We will go over every step, one at a time, discussing the problem and trying to find a solution. 

## 1. Data Loading & Preprocessing


Here is a brief description of how the SEED dataset was collected (from https://bcmi.sjtu.edu.cn/home/seed/seed.html):

*"Fifteen Chinese film clips (positive, neutral and negative emotions) were chosen from the pool of materials as stimuli used in the experiments. [...] The duration of each film clip is approximately 4 minutes. Each film clip is well edited to create coherent emotion eliciting and maximize emotional meanings. [...] There is a total of 15 trials (ed. movies) for each experiment. There was a 5 s hint before each clip, 45 s for self-assessment and 15 s to rest after each clip in one session. The order of presentation is arranged in such a way that two film clips that target the same emotion are not shown consecutively. For feedback, the participants were told to report their emotional reactions to each film clip by completing the questionnaire immediately after watching each clip. [...] [EEG signals] were collected with the 62-channel ESI NeuroScan System."*

<div style="text-align:center"><img src="./images/Data_collection_setup.png" alt="exp_setup"></div>

The resulting dataset can be summarized as follows:

<div style="text-align:center"><img src="./images/dataset_schema.png" alt="dataset_schema" width="1000" height="500"></div>

Finally, it is worth mentioning that the provided data were **downsampled** to **200 Hz** and that a **bandpass frequency filter** from **0 - 75 Hz** was applied. 
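To get a feeling for the amount of data involved, here is a quick back-of-the-envelope calculation based on the numbers quoted above (62 channels, 200 Hz, clips of roughly 4 minutes). The clip duration is only approximate, so the results are estimates, and the variable names are just illustrative:

```python
# Rough size of one EEG trial, using the figures from the dataset description.
SAMPLING_RATE_HZ = 200       # sampling rate after downsampling
N_CHANNELS = 62              # electrodes in the recording system
CLIP_DURATION_S = 4 * 60     # "approximately 4 minutes" (assumed exactly 240 s here)

# Number of time points recorded on each channel during one clip
samples_per_channel = SAMPLING_RATE_HZ * CLIP_DURATION_S

# Total number of values in one trial (channels x time points)
values_per_trial = N_CHANNELS * samples_per_channel

# Highest frequency that a 200 Hz sampling rate can represent
nyquist_hz = SAMPLING_RATE_HZ / 2

print(f"Samples per channel: {samples_per_channel}")
print(f"Values per trial:    {values_per_trial}")
print(f"Nyquist frequency:   {nyquist_hz} Hz")
```

Note that the 75 Hz upper cutoff of the filter sits below the 100 Hz Nyquist frequency, so the retained band can be represented at this sampling rate.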

<span style="color:red;font-size:24px"><em>A bit of Math</em></span>

As we discussed before, Deep Learning is all about Linear Algebra and tensors. Let's write down a *Legend* to fix symbols for the dimensions of our dataset. <br>
<div style="text-align:center">
    <p style="font-size:24px;">N: <small><em>number of EEG recordings</em></small>, C: <small><em>number of EEG channels</em></small>, W: <small><em>EEG signal length</em></small>, B: <small><em>Batch size</em></small></p>
</div>

Now, if we call the dataset containing the EEG signals $X$ and the associated labels $y$, we have that $X$ and $y$ are tensors of size:
    $$X \in [N \times C \times W],$$    $$y \in [N]$$
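To make these shapes concrete, here is a minimal PyTorch sketch with random data standing in for the real recordings. The values of `N`, `W` and `B` are toy numbers chosen for illustration, and encoding the three emotions as the integers 0, 1, 2 is an assumption (the actual loading code may use a different convention):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy sizes: only C = 62 comes from the dataset description,
# N, W and B are hypothetical values picked for this example.
N, C, W, B = 32, 62, 800, 8
NUM_CLASSES = 3  # positive, neutral, negative (integer encoding is assumed)

# Random stand-ins for the EEG signals and their labels
X = torch.randn(N, C, W)                  # shape [N, C, W]
y = torch.randint(0, NUM_CLASSES, (N,))   # shape [N]

# A DataLoader groups the recordings into batches of size B
loader = DataLoader(TensorDataset(X, y), batch_size=B, shuffle=True)
X_batch, y_batch = next(iter(loader))

print("Full dataset:", tuple(X.shape), tuple(y.shape))
print("One batch:   ", tuple(X_batch.shape), tuple(y_batch.shape))
```

The batch dimension simply replaces $N$ with $B$: a model never sees the whole dataset at once, but tensors of size $[B \times C \times W]$ together with $B$ labels.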


### 1.1. Load, Extract and Transform
The first step is to **load** data from file, **extract** the meaningful information, and **transform** them into a proper format. 

<span style="color:orange;font-size:18px">This process requires some proficiency with file and data handling, plus some experience with PyTorch tensor manipulation, and hence it goes a bit beyond the scope of this tutorial. <br>
Anyway, for those who are interested, the code used to manipulate the input data is available at: https://github.com/federico-carrara/DL_for_Neuro_workshop/blob/main/dataset.py</span>

### 1.2. Data preprocessing
<span style="color:orange;font-size:20px"><em>Deep Learning models are powerful feature extractors that can deal with unstructured data. <br></em></span>

Well, in principle, that's true. However, in practice, the signal-to-noise ratio of most data sources is very low. <br>
So, if we feed our model noisy data, most of the time we end up in a *garbage-in-garbage-out* situation:
<div style="text-align:center"><img src="./images/gigo.png" alt="gigo"></div>

Therefore, **data preprocessing** is an essential step in all data analysis pipelines.