![Chapeau](images/chapeau_challenge_collectif.png)

<center><h1>Epilepsy seizure challenge</h1></center>

<center><h3>A data challenge on near real-time classification of epileptic seizures.</h3></center>
<br/>
<center><i>Thomas Bersani--Veroni (ENS, IPP), Kellian Cottart (ECE, IPP), Luc Gensbittel (EMSST, IPP), Titouan Lermite (ECE, IPP), Xavier Loison (EMSST, IPP), Mathys Noir (ECE, IPP)</i></center>

# Table of contents <a class="anchor" id="chapter_00"></a>

* [1. Introduction to the Epilepsy problem](#chapter_1)
* [2. Challenge presentation](#chapter_2)
* [3. Starting with the challenge](#chapter_3)
* [4. Making a submission](#chapter_4)
* [References and acknowledgments](#chapter_5)

$\rightarrow$ Go to the [GitHub repo](https://github.com/kellianoy/epilepsy_seizure.git) of the challenge

# 1. Introduction to the Epilepsy problem <a class="anchor" id="chapter_1"></a>

[TABLE OF CONTENTS](#chapter_00)

This challenge consists in categorizing EEG signals recorded on children suffering from epilepsy. It aims at **easing the detection** of epileptic seizure recurrence and, **through early detection, fostering the prevention** of the effects and complications induced by an epileptic seizure.

Credits
---

This challenge is inspired by an article[[1]](#ref_1) published in 2018 that proposes an Integer Convolutional Neural Network to detect epileptic seizures. The article uses three data sets: the Freiburg Hospital intracranial EEG data set[[2]](#ref_2), the Children’s Hospital of Boston-MIT scalp EEG data set[[3]](#ref_3) and the UPenn & Mayo Clinic seizure detection data set[[4]](#ref_4). This challenge only uses the **data set from the Children’s Hospital of Boston-MIT** (more details in the *Processing the data* section).

![Epilepsy](images/Galaxy_brain_female_stage_2.png)

Epilepsy
---

Epilepsy is a chronic disease of the brain. The World Health Organization[[5]](#ref_5) estimates that it affects around 50 million people worldwide. It is characterized by recurrent seizures, which are brief episodes of involuntary movement that may involve a part of the body (partial) or the entire body (generalized). The estimated proportion of the general population with active epilepsy (i.e. continuing seizures or with the need for treatment) is between 4 and 10 per 1000 people.

* Seizure episodes are a result of **excessive electrical discharges** in a group of brain cells. 
* **Different parts of the brain** can be the site of such discharges.
* Seizures can **vary in duration**, from the briefest lapses of attention or muscle jerks to severe and prolonged convulsions.
* Seizures can also **vary in frequency**, from less than one per year to several per day.

>$\rightarrow$ *The electrical discharges can be captured with electrodes.*<br>
>$\rightarrow$ *Discharges will not always be recorded by the same electrodes.*<br>
>$\rightarrow$ *Duration & frequency are not key criteria to categorize an epileptic seizure.*

Moreover, characteristics of seizures depend on where in the brain the disturbance first starts, and how far it spreads.

Causes & Diagnosis
---

Although many underlying disease mechanisms can lead to epilepsy, the cause of the disease is still unknown in about 50% of cases globally.

**Epilepsy is not always properly diagnosed**, especially in low- and middle-income countries, where almost 80% of people with epilepsy live. An abnormal **electroencephalography (EEG) pattern is one of the two most consistent predictors of seizure recurrence** (as one seizure does not signify epilepsy).

>$\rightarrow$ *EEG is one solution to diagnose epilepsy and predict seizure recurrence.*

Life expectancy effects & Treatment
---

People with epilepsy tend to have more physical problems (such as fractures), as well as higher risks of psychological disorders, including anxiety and depression. The risk of premature death in people with epilepsy is up to three times higher than in the general population. **Persistent epileptic seizures are in themselves considered an emergency** and can cause particularly serious sequelae. In addition, **a large proportion of the causes of death related to epilepsy are potentially preventable**, such as falls, drowning, burns and prolonged seizures.

>$\rightarrow$ *Medical intervention at the onset of an epileptic seizure reduces the risk of complications.*

Finally, seizures can be controlled: most people living with epilepsy could become seizure-free with appropriate use of antiseizure medicines. Discontinuing antiseizure medicine can even be considered after 2 years without seizures. Surgery might also be beneficial to patients who respond poorly to drug treatments.

# 2. Challenge presentation <a class="anchor" id="chapter_2"></a>

[TABLE OF CONTENTS](#chapter_00)

Classification challenge rationale  
---

When diagnosed, about two-thirds of people suffering from epilepsy can be treated with medication, and 7-8% can be cured with surgery.
>$\rightarrow$ *Categorizing epileptic seizures through EEG could be of interest for seizures that occur at a very low frequency or with a low intensity (such that they do not alarm the patient).*

The patients who cannot be treated with medication or surgery suffer from refractory epilepsy, meaning that they are not able to control their seizures. 
>$\rightarrow$ *For more intense seizures, categorizing a seizure at an early stage could help warn the patient, so that they can adopt a safe position, and speed up contact with emergency services or relatives, who can check whether the patient is safe and decide whether to intervene.*

State of the art
---

According to the aforementioned article[[1]](#ref_1), most EEG-based methods rely on thresholding or basic machine learning models, and provide unsatisfactory results. Time-frequency analysis gives better results, but is costly in terms of computation and memory. Features can be extracted from EEG with the FFT (Fast Fourier Transform), but since this is often combined with an SVM (Support Vector Machine), the overall approach is computationally expensive. Deep learning with CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks) is a promising alternative.

Classification challenge
---

When capturing an epileptic seizure, the electroencephalogram can be divided into four phases:
- **inter-ictal**, when no sign of an epileptic seizure can be detected,
- **pre-ictal**, when early signs of a seizure can be detected,
- **ictal**, during the epileptic seizure,
- **post-ictal**, after the epileptic seizure.

Pre-ictal and post-ictal phases have been removed from the EEG signals:
- there is no interest in categorizing the EEG signal once the epileptic seizure has occurred,
- the sampling strategy described later in the notebook makes categorizing the pre-ictal phase unreliable, as its symptoms only appear sporadically.

<b>Challenge:</b> This challenge consists in classifying each EEG sample as belonging to the interictal phase or to the ictal phase.

**Note about the dissemination of the data:** Any communication resulting from this challenge must comply with the requirements of the owners of the database (see ref. 3).

Presenting the data
---

The data set has been retrieved from the *Children’s Hospital of Boston-MIT scalp EEG* data set. This resource has been made available with the following publications:
> * Ali Shoeb. Application of Machine Learning to Epileptic Seizure Onset Detection and Treatment. PhD Thesis, Massachusetts Institute of Technology, September 2009.[[6]](#ref_6)
> * Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

In this data set, the EEGs of 22 patients (5 males, aged 3 to 22, and 17 females, aged 1.5 to 19) have been recorded:
* over periods of about 24 hours (stored as a series of files of around one hour each),
* in `edf` files (European Data Format, a European standard for medical recordings),
* through around 23 channels, corresponding to the number of electrodes placed on the children's scalp.

This data set is made of 664 files, with 198 seizures.
The `RECORDS` file lists all the files, while the `RECORDS-WITH-SEIZURES` file lists those containing an epileptic seizure.

![Epilepsy](images/chbmit.png)

Processing the data
---

Among the 22 patients, we selected 16, for the same practical reasons as those given in the aforementioned article.

Among all the available channels, we selected 17: some channels are not suited for analysis, while others change from one record to another. The selected channels are the ones that are meaningful for the challenge, as highlighted in the article.

The `SEIZURE_SUMMARY.CSV` file lists the start and end times of all seizure periods.

Following the same process as the one described in the article, we removed from the records the four hours preceding and the four hours following each seizure period (pre-ictal and post-ictal phases).
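
The removal rule above can be sketched as follows. This is only an illustration, not the code used to build the data set: the helper `phase_at`, the representation of seizures as `(start, end)` pairs in seconds, and the example times are all assumptions.

```python
# Hypothetical helper: label one time point of a recording, in seconds.
FOUR_HOURS = 4 * 3600  # margin removed before and after each seizure


def phase_at(t, seizures, margin=FOUR_HOURS):
    """Return 'ictal', 'excluded' or 'interictal' for time t (seconds).

    seizures is a list of (start, end) pairs in seconds, as one might
    read them from a seizure summary file.
    """
    # A point inside any seizure is ictal, whatever margins surround it.
    if any(start <= t < end for start, end in seizures):
        return "ictal"
    # Otherwise, a point within the margin of any seizure is dropped.
    if any(start - margin <= t < end + margin for start, end in seizures):
        return "excluded"
    return "interictal"


# Illustrative values only: one 60-second seizure starting 10 hours in.
seizures = [(36000, 36060)]
print(phase_at(36030, seizures))  # inside the seizure
print(phase_at(30000, seizures))  # less than four hours before onset
print(phase_at(10000, seizures))  # far from any seizure
```

Checking the ictal case first means that a point inside one seizure stays ictal even if it also falls within the margin of another, nearby seizure.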

During interictal phases (outside of any epileptic seizure), the records have been divided into one-second samples of around 400 measurements each (with as many features as electrodes). During ictal phases, samples of the same length have been created, but they overlap: this oversamples the ictal class to counterbalance the class imbalance.
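
To make the effect of overlapping concrete, here is a minimal sketch of the windowing idea. The function `window_starts`, the 10-second segment and the stride of 100 points for ictal samples are assumptions chosen for illustration; the overlap actually used to build the data set is not stated here.

```python
def window_starts(n_points, window, stride):
    """Start indices of every full window of `window` points."""
    return list(range(0, n_points - window + 1, stride))


FS = 400            # points per second, as stated in this notebook
WINDOW = FS         # one-second samples
n_points = 10 * FS  # a hypothetical 10-second segment

# Interictal: consecutive, non-overlapping windows (stride == window).
interictal = window_starts(n_points, WINDOW, stride=WINDOW)
# Ictal: overlapping windows; the stride of 100 points is an assumption.
ictal = window_starts(n_points, WINDOW, stride=100)

print(len(interictal), len(ictal))  # 10 37
```

With these illustrative values, the same 10 seconds of signal yield 10 samples without overlap and 37 with it, which is how overlapping increases the number of ictal samples.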


# 3. Starting with the challenge <a class="anchor" id="chapter_3"></a>

[TABLE OF CONTENTS](#chapter_00)

Setup
---

### Prerequisites

The following cell will install the required package dependencies, if necessary. You can examine the file, `requirements.txt`, included in the repo to view the list of dependencies.

In [None]:
import sys
!{sys.executable} -m pip install -r requirements.txt

> **NOTE:** Due to the structure of the challenge, libraries not included in `requirements.txt` will need to be added via a pull request to the [GitHub repo](https://github.com/kellianoy/epilepsy_seizure.git).

Install the `ramp-workflow` package from PyPI using the following command in your dedicated Python environment:

```pip install ramp-workflow```

Starting with the data
---

### Presentation of download_data.py

The public data are stored in a public repository. You need to run the following script, which creates the `data/` directory locally and downloads the data, sampled according to the aforementioned strategy.

```python download_data.py```

In [None]:
!python download_data.py

> This script makes THIS AND THAT.

### Loading the data

First, we load the data using the utility function designed for the challenge in `problem.py`. 

In [None]:
from problem import get_train_data  # utility function defined in problem.py
data_trains, labels_trains = get_train_data()

`labels_trains` simply consists of the set of labels for the training set, stored as a `numpy.ndarray`.

In [None]:
# print labels_trains type & shape
print(type(labels_trains), labels_trains.shape)

`data_trains` consists of the set of sampled EEG recordings for the training set, stored as a list of `numpy.ndarray`.

In [None]:
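# print data_trains type, number of samples & shape of the first sample
print(type(data_trains), len(data_trains), data_trains[0].shape)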

### Data visualization

One sample of an EEG recording is a numpy array representing the recording of 23 electrodes during one second, with a sampling frequency of 400 Hz (i.e. 400 recorded points per second).

In [None]:
# Display of an array

# Plot the EEG of an ictal phase and of an interictal phase

### Data statistics

In [None]:
# Barplot of class distribution per patient?
# Energy distribution ?
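
Since ictal samples are much rarer than interictal ones, a first useful statistic is the class balance. The sketch below is one possible way to compute it; the helper `class_balance`, the label encoding (0 for interictal, 1 for ictal) and the made-up labels are assumptions, not part of the challenge code.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts and the majority/minority ratio."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio


# Made-up labels for illustration: 0 = interictal, 1 = ictal.
labels = [0] * 90 + [1] * 10
counts, ratio = class_balance(labels)
print(dict(counts), ratio)
```

With the real data, something like `class_balance(labels_trains.tolist())` should give the same kind of summary, provided the labels are stored as a one-dimensional array.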

# 4. Making a submission <a class="anchor" id="chapter_4"></a>

[TABLE OF CONTENTS](#chapter_00)

Evaluation
---

Submissions are ranked according to the AUC of the classifier.
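
As a reminder of what this metric measures, and assuming that AUC refers here to the area under the ROC curve, the sketch below computes it as the probability that a randomly chosen ictal sample receives a higher score than a randomly chosen interictal one. The function `roc_auc` and the toy values are illustrative; the platform relies on its own implementation.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count for half
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

This pairwise version is quadratic in the number of samples, so it is only meant to clarify the definition. A score of 0.5 corresponds to random guessing and 1.0 to a perfect ranking.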

Locally, the RAMP platform uses a 3-fold cross-validation scheme implemented in the `get_cv` method.
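
For readers unfamiliar with k-fold cross-validation, here is a minimal sketch of how sample indices can be split into three folds. It assumes plain contiguous folds; the actual `get_cv` in `problem.py` may shuffle, stratify or group samples differently, and the helper `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, n_folds=3):
    """Yield (train, valid) index lists for contiguous k-fold splits."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one more sample than the others.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        valid = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, valid
        start = stop


folds = list(k_fold_indices(10, n_folds=3))
for train, valid in folds:
    print(len(train), valid)
```

Each sample is used exactly once for validation and in the training set of the other two folds.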

The classifier's performance is evaluated on a separate test set which can be loaded just as the training data.

In [None]:
from problem import get_test_data  # utility function defined in problem.py
data_test, labels_test = get_test_data()

Mandatory structure of a submission
---

A submission (usually stored in `./submissions/<submission_foldername>/`) must contain one file named `eegclassifier.py`.

This Python script must itself implement:
 * A `blablabla` method that...

The two arguments must be understood as follows:
 * `blablabla` is ....


We illustrate this below with a simple example.

Illustration with a dummy random classifier
---

This classifier does not use the data and just predicts random labels. Still, it is a valid submission as far as the RAMP workflow is concerned.
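
The exact interface expected in `eegclassifier.py` is not specified in this notebook. As a minimal sketch, assuming a scikit-learn style interface, such a classifier could look like the following. The class name `RandomClassifier` and the methods `fit` and `predict_proba` are assumptions, and plain Python lists are used to keep the example self-contained, whereas a real submission would more likely return a numpy array.

```python
import random


class RandomClassifier:
    """Dummy classifier: ignores the data, predicts random probabilities."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def fit(self, X, y):
        # Nothing to learn: the training data is deliberately unused.
        return self

    def predict_proba(self, X):
        # One [P(interictal), P(ictal)] row per sample.
        rows = []
        for _ in X:
            p = self.rng.random()
            rows.append([1.0 - p, p])
        return rows


X_demo = [[0.0], [1.0], [2.0]]  # three placeholder samples
clf = RandomClassifier(seed=42).fit(X_demo, [0, 1, 0])
proba = clf.predict_proba(X_demo)
print(len(proba), [round(sum(row), 6) for row in proba])
```

Because its scores are independent of the data, such a classifier is expected to reach an AUC of about 0.5, which makes it a useful baseline to beat.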

Submitting to RAMP
---

To make a submission, you need to upload it on [ramp.studio](https://ramp.studio/).

Go to your sandbox and copy-paste your code.

You can test your code locally with this command:
    
```bash
ramp_test_submission --submission starting_kit # --quick-test
```

# References and acknowledgments <a class="anchor" id="chapter_5"></a>

[TABLE OF CONTENTS](#chapter_00)

References
---

[1]: <a class="anchor" id="ref_1"></a> Truong et al., Convolutional neural networks for seizure prediction using intracranial and scalp electroencephalogram, Neural Networks, 2018, https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8370634&tag=1 <br/>
[2]: <a class="anchor" id="ref_2"></a> https://epilepsy.uni-freiburg.de/freiburg-seizure-prediction-project/eeg-database <br/>
[3]: <a class="anchor" id="ref_3"></a> https://physionet.org/content/chbmit/1.0.0/chb23/ <br/>
[4]: <a class="anchor" id="ref_4"></a> https://www.kaggle.com/c/seizure-detection <br/>
[5]: <a class="anchor" id="ref_5"></a> https://www.who.int/news-room/fact-sheets/detail/epilepsy <br/>
[6]: <a class="anchor" id="ref_6"></a> https://dspace.mit.edu/handle/1721.1/54669 <br/>

Acknowledgments
---

The database used in this challenge was made available through PhysioNet by a team of investigators from Children’s Hospital Boston (CHB) and the Massachusetts Institute of Technology (MIT). See references 3 and 6.