# Deep Learning for EEG data analysis
We have already seen from a more theoretical point of view what Deep Learning models are and what they are meant for. <br>
Now it is time to understand how to use these models with Python!! <br>

In the following, we will see a complete pipeline for the task of <span style="color:orange">Emotion Recognition from EEG data</span> using the SEED dataset (Zheng et al., 2015, https://bcmi.sjtu.edu.cn/home/seed). <br>

The workflow is composed of the following steps:

<span style="font-size:24px">1. Data Loading & Preprocessing</span><br>
<span style="font-size:24px">2. Model Definition</span><br>
<span style="font-size:24px">3. Model Training</span><br>
<span style="font-size:24px">4. Model Evaluation</span><br>

We will go over every step, one at a time, discussing the problem and trying to find a solution. 

## 1. Data Loading & Preprocessing


Below is a brief description of how the SEED dataset was collected (from https://bcmi.sjtu.edu.cn/home/seed/seed.html):

*"Fifteen Chinese film clips (positive, neutral and negative emotions) were chosen from the pool of materials as stimuli used in the experiments. [...] The duration of each film clip is approximately 4 minutes. Each film clip is well edited to create coherent emotion eliciting and maximize emotional meanings. [...] There is a total of 15 trials (ed. movies) for each experiment. There was a 5 s hint before each clip, 45 s for self-assessment and 15 s to rest after each clip in one session. The order of presentation is arranged in such a way that two film clips that target the same emotion are not shown consecutively. For feedback, the participants were told to report their emotional reactions to each film clip by completing the questionnaire immediately after watching each clip. [...] [EEG signals] were collected with the 62-channel ESI NeuroScan System."*

<div style="text-align:center"><img src="./images/Data_collection_setup.png" alt="exp_setup"></div>

The resulting dataset can be summarized as follows:

<div style="text-align:center"><img src="./images/dataset_schema.png" alt="dataset_schema" width="1000" height="500"></div>

Finally, it is worth mentioning that the provided data were **downsampled** to **200 Hz** and a **bandpass frequency filter** from **0-75 Hz** was applied. 

<span style="color:red;font-size:24px"><em>A bit of Math</em></span>

As we discussed before, Deep Learning is all about Linear Algebra and tensors. Let's write down a *Legend* to fix the symbols for the dimensions of our dataset. <br>
<div style="text-align:left">
    <p style="font-size:24px;">- N: <small><em>number of EEG recordings</em></small><br> - C: <small><em>number of EEG channels</em></small><br> - L: <small><em>EEG signal length</em></small><br></p>
</div>

Now, if we call $X$ the dataset containing the EEG signals and $y$ the associated labels, then $X$ and $y$ are tensors of size:
$$X \in [N \times C \times L],$$    $$y \in [N]$$
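To make these shapes concrete, here is a minimal sketch that allocates placeholder arrays with the dimensions above. It uses NumPy rather than PyTorch to stay dependency-light; `N = 3` is a hypothetical number of recordings, `C = 62` and the 200 Hz sampling rate come from the dataset description, and `L` assumes every clip lasts exactly 4 minutes (real clips last only *approximately* 4 minutes, so their lengths differ).

```python
import numpy as np

FS = 200               # sampling rate of the provided data (Hz)
CLIP_SECONDS = 4 * 60  # assumption: every clip lasts exactly 4 minutes

N = 3                  # hypothetical number of EEG recordings
C = 62                 # number of EEG channels (62-channel ESI NeuroScan)
L = CLIP_SECONDS * FS  # signal length in time points: 240 s * 200 Hz = 48,000

# Placeholder data: X holds the raw signals, y one emotion label per recording
X = np.zeros((N, C, L), dtype=np.float32)
y = np.zeros(N, dtype=np.int64)

print(X.shape, y.shape)  # (3, 62, 48000) (3,)
```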


### 1.1. Load, Extract and Transform
The first step is to **load** data from file, **extract** the meaningful information, and **transform** them into a proper format. 

<span style="color:orange;font-size:18px">This process requires some proficiency with file and data handling, plus some experience with PyTorch tensor manipulation, and hence it goes a bit beyond the scope of this tutorial. <br>
Anyway, for those who are interested, the code used to manipulate the input data is available at: https://github.com/federico-carrara/DL_for_Neuro_workshop/blob/main/dataset.py</span>

### 1.2. Data preprocessing
<span style="color:orange;font-size:20px"><em>Deep Learning models are powerful feature extractors that can deal with unstructured data. <br></em></span>

Well, in principle, that's true. However, in practice, the signal-to-noise ratio of many real-world data sources, EEG included, is very low. <br>
So, if we feed our model noisy data, most of the time we end up in a *garbage-in-garbage-out* situation:
<div style="text-align:center"><img src="./images/gigo.png" alt="gigo"></div>

Therefore, **data preprocessing** is an essential step in all data analysis pipelines. <br>
Specifically, in this case, we do the following:

<span style="font-size:24px">I) Division of the EEG signals into overlapping windows</span><br>
Each of the input signals covers about 4 minutes of EEG recording downsampled to 200 Hz. Hence each signal is made of ~48k time points (4 min × 60 s × 200 Hz = 48,000)!<br>
Splitting each signal into windows spanning 1, 5, or 10 seconds gives us smaller, more manageable samples that are still informative. 
<div style="text-align:left"><img src="./images/window_overlap.png" alt="window_overlap" height="300" width="400"></div>

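Why overlapping windows? Because consecutive windows share part of their samples, we can cut more windows from the same recording than with non-overlapping splitting. It is an example of *Data Augmentation*!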

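<span style="color:red">A bit of Math:</span> after this step $X \in [\hat{N} \times C \times W]$ and $y \in [\hat{N}]$, where $W$ = window length, $S$ = stride (the step between the starts of consecutive windows, with $S < W$ for overlapping windows), and $\hat{N} = N \cdot \left(\left\lfloor \frac{L - W}{S} \right\rfloor + 1\right) \approx N \cdot \frac{L}{S}$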
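The windowing step can be sketched with NumPy's `sliding_window_view` (available from NumPy 1.20). This is an illustration, not the code from the workshop repository: the function name `split_into_windows`, the stride `S` (step between window starts, so consecutive windows overlap by $W - S$ samples) and the 5-s / 50% overlap setting in the usage example are all assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_into_windows(X, y, window, stride):
    """Split every recording into (possibly overlapping) windows.

    X: array of shape (N, C, L); y: array of shape (N,).
    window: window length W in samples.
    stride: step S between the starts of consecutive windows (S < W -> overlap).
    Returns X_win of shape (N_hat, C, W) and y_win of shape (N_hat,).
    """
    # Read-only view of shape (N, C, L - W + 1, W): one window per start sample
    views = sliding_window_view(X, window_shape=window, axis=-1)
    # Keep only the windows that start every `stride` samples
    views = views[:, :, ::stride, :]  # (N, C, n_win, W)
    n_win = views.shape[2]
    # Put the window index next to the recording index, then merge the two
    X_win = views.transpose(0, 2, 1, 3).reshape(-1, X.shape[1], window)
    # Every window inherits the label of the recording it was cut from
    y_win = np.repeat(y, n_win)
    return X_win, y_win


# Usage on two hypothetical 60-s recordings with 5-s windows and 50% overlap
fs = 200
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 62, 60 * fs))
y = np.array([0, 1])
X_win, y_win = split_into_windows(X, y, window=5 * fs, stride=5 * fs // 2)
print(X_win.shape, y_win.shape)  # (46, 62, 1000) (46,)
```

Since `X` is a single array here, all recordings are assumed to have the same length; with clips of different lengths you would call the function on one recording at a time and concatenate the results.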

<span style="font-size:24px">II) Extraction of EEG sub-bands</span><br>
We apply a *Butterworth* band-pass filter to extract 4 sub-signals with frequencies in the *Theta*, *Alpha*, *Beta* and *Gamma* ranges.

<span style="color:red">A bit of Math:</span> after this step $X \in [\hat{N} \times F \times C \times W]$ and $y \in [\hat{N}]$, where $F = 4$ is the number of sub-bands.
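A possible implementation of the sub-band extraction with SciPy is shown below. The band edges (theta 4-8 Hz, alpha 8-14 Hz, beta 14-31 Hz, gamma 31-50 Hz), the filter order and the choice of zero-phase filtering with `sosfiltfilt` are assumptions for illustration; the tutorial does not specify them.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 200  # sampling rate of the provided data (Hz)

# Assumed band edges in Hz; exact limits vary across studies
BANDS = {
    "theta": (4, 8),
    "alpha": (8, 14),
    "beta": (14, 31),
    "gamma": (31, 50),
}


def extract_subbands(X, fs=FS, bands=BANDS, order=4):
    """Band-pass filter X of shape (N_hat, C, W) into each sub-band.

    Returns an array of shape (N_hat, F, C, W), with F = len(bands).
    """
    filtered = []
    for low, high in bands.values():
        # Butterworth band-pass filter, stored as second-order sections
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        # Zero-phase (forward-backward) filtering along the time axis
        filtered.append(sosfiltfilt(sos, X, axis=-1))
    return np.stack(filtered, axis=1)


# Usage: a 10 Hz sine should end up almost entirely in the alpha band
t = np.arange(5 * FS) / FS
sine = np.sin(2 * np.pi * 10 * t)[np.newaxis, np.newaxis, :]  # (1, 1, 1000)
X_bands = extract_subbands(sine)
energy = (X_bands[0, :, 0, 200:-200] ** 2).sum(axis=-1)  # skip edge samples
print(X_bands.shape)  # (1, 4, 1, 1000)
print(list(BANDS)[int(np.argmax(energy))])  # alpha
```

Second-order sections (`output="sos"`) are numerically more robust than the `(b, a)` polynomial form for band-pass designs, and forward-backward filtering avoids shifting the signal in time.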

<span style="font-size:24px">III) Computation of <em>channel-wise</em> Differential Entropy </span><br>
Intuitively, the **Differential Entropy** is a measure of the *dispersion* of the distribution of a continuous variable. <br>
Assuming that the EEG signal is the set of *realizations* of a *Gaussian* random variable, we get that the **Differential Entropy** for a given signal $S$ (with standard deviation $\sigma$) can be computed as:
$$DE(S) = \frac{1}{2} \log_2{\left(2\pi e \sigma^2\right)}$$
Therefore, given a signal as input, we compute a single number as output. <br>
We compute the $DE$ separately on every channel of each window (and for each sub-band): as a result, the 62 channels of a window collapse into a single vector of 62 values per sub-band.

<span style="color:red">A bit of Math:</span> after this step $X \in [\hat{N} \times F \times C]$ and $y \in [\hat{N}]$.
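Under the Gaussian assumption, the channel-wise DE of a window reduces to a one-liner on the variance, using $DE = \frac{1}{2}\log_2(2\pi e \sigma^2)$. The sketch below assumes the input is the output of the sub-band step, with shape $[\hat{N} \times F \times C \times W]$; the usage example checks the estimate against the analytic value for Gaussian noise with $\sigma = 2$.

```python
import numpy as np


def differential_entropy(X):
    """Channel-wise DE of X with shape (N_hat, F, C, W), in bits.

    Under the Gaussian assumption, DE = 0.5 * log2(2 * pi * e * sigma^2),
    where sigma^2 is estimated as the variance over the W time points.
    Returns an array of shape (N_hat, F, C).
    """
    var = np.var(X, axis=-1)  # one variance per window, band and channel
    return 0.5 * np.log2(2 * np.pi * np.e * var)


# Usage: Gaussian noise with sigma = 2, shaped like 10 windows of 1 s at 200 Hz
rng = np.random.default_rng(0)
X_bands = rng.normal(scale=2.0, size=(10, 4, 62, 200))
de = differential_entropy(X_bands)
expected = 0.5 * np.log2(2 * np.pi * np.e * 2.0**2)
print(de.shape)  # (10, 4, 62)
print(round(float(expected), 3))  # 3.047
```

A channel with zero variance (e.g. a flat electrode) would give `-inf`, so such channels need to be handled before this step.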

<span style="font-size:24px">IV) Standardization of data <em>by-trial</em></span><br>
A challenging problem when dealing with EEG data is that the signals are extremely *subject* and *trial-dependent*. <br>
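Standardizing the data *by-trial*, i.e. rescaling the features of each trial to zero mean and unit variance using that trial's own statistics, enables us to lower the *within-trial* variability of the data, so that the model is able to focus more on the *between-trials* variability.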

<span style="color:red">A bit of Math:</span> standardization does not alter the size of the data tensors.
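One way to implement *by-trial* standardization is to z-score each feature using the mean and standard deviation computed over all the windows of the same trial. Both this interpretation and the hypothetical `trial_ids` array (which records the trial each window was cut from) are assumptions of this sketch.

```python
import numpy as np


def standardize_by_trial(X, trial_ids, eps=1e-8):
    """Z-score the features of each trial with that trial's own statistics.

    X: array of shape (N_hat, F, C) (e.g. DE features).
    trial_ids: array of shape (N_hat,) telling which trial each window
    was cut from (hypothetical bookkeeping kept from the windowing step).
    """
    X_std = np.empty_like(X, dtype=float)
    for trial in np.unique(trial_ids):
        mask = trial_ids == trial
        mu = X[mask].mean(axis=0, keepdims=True)     # (1, F, C) mean over the trial's windows
        sigma = X[mask].std(axis=0, keepdims=True)   # (1, F, C) std over the trial's windows
        X_std[mask] = (X[mask] - mu) / (sigma + eps)  # eps avoids division by zero
    return X_std


# Usage: 3 hypothetical trials with 10 windows each
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(30, 4, 62))
trial_ids = np.repeat([0, 1, 2], 10)
Z = standardize_by_trial(X, trial_ids)
print(Z.shape)  # (30, 4, 62)
```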

<span style="font-size:24px"><span style="color:magenta">V) Option 1:</span> Concatenation of sub-band vectors</span><br>
To make the transformed data suitable for training a *Neural Network*, we **concatenate** the $F$ sub-band vectors into one single vector.

<span style="color:red">A bit of Math:</span> after this step $X \in [\hat{N} \times (F \cdot C)]$, i.e. $4 \cdot 62 = 248$ values per window, and $y \in [\hat{N}]$.
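With features stored as an array of shape $[\hat{N} \times F \times C]$, the concatenation is a single `reshape`. In this sketch the number of windows is hypothetical; with $F = 4$ and $C = 62$ each window becomes a vector of $4 \cdot 62 = 248$ values, ordered band by band.

```python
import numpy as np

N_hat, F, C = 10, 4, 62  # hypothetical number of windows; 4 sub-bands; 62 channels
X = np.random.default_rng(0).normal(size=(N_hat, F, C))

# Merge the band and channel axes: each row is [band 0 (C values), band 1, ..., band F-1]
X_flat = X.reshape(N_hat, F * C)
print(X_flat.shape)  # (10, 248)
```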

<span style="font-size:24px"><span style="color:cyan">V) Option 2:</span> Mapping to Electrode Grid <em>(optional)</em></span><br>
Instead of concatenating vectors, we can map the value computed for each electrode onto a 2D grid whose cells correspond to the positions of the electrodes on the scalp.


Information about the positions of the electrodes from which the data were collected can help the model process the signals from neighboring electrodes jointly. 
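The grid mapping can be sketched as a scatter of channel values into a zero-initialized array. The function `map_to_grid` and the 3-channel layout in the usage example are hypothetical: for SEED, `positions` would have to encode where each of the 62 electrodes falls on a grid of your choice (for instance 9 × 9), which is not specified here.

```python
import numpy as np


def map_to_grid(X, positions, grid_shape):
    """Place each channel's value in its cell of a 2D electrode grid.

    X: array of shape (N_hat, F, C).
    positions: C (row, col) pairs, one per channel, all distinct
               (a hypothetical layout; the real one comes from the cap montage).
    grid_shape: (n_rows, n_cols); cells without an electrode stay 0.
    Returns an array of shape (N_hat, F, n_rows, n_cols).
    """
    n_rows, n_cols = grid_shape
    grid = np.zeros(X.shape[:2] + (n_rows, n_cols), dtype=X.dtype)
    rows, cols = np.asarray(positions).T
    # Advanced indexing: channel c goes to cell (rows[c], cols[c]) for every window and band
    grid[:, :, rows, cols] = X
    return grid


# Usage: a toy 3-channel montage on a 3x3 grid
positions = [(0, 1), (1, 0), (1, 2)]
X = np.arange(6, dtype=float).reshape(2, 1, 3)  # 2 windows, 1 band, 3 channels
grid = map_to_grid(X, positions, (3, 3))
print(grid[1, 0])
```

The output keeps one 2D map per window and sub-band, which is the kind of input a 2D convolutional network expects.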