<h1 class="text-center">EEG data analysis with MNE (+ Python and Pandas introduction)</h1>
<h2 class="text-center">February, 2022</h2>

<br>

The purpose of this tutorial is to go over the data analysis steps you previously performed using the EEGLab program, but this time using the MNE Python library. This tutorial will also serve as an introduction to basic Python concepts and to the Pandas library, which will be used in future tutorials.
</b></div>

- In Section I, we will go over basic Python structures such as lists, dictionaries, functions and classes
- In Section II, we will introduce MNE and explain the main objects used to analyze EEG data (Raw, Epochs, etc.)
- In Section III, we will describe the process to analyze the 'Rest vs Count down experiment' data
- In Section IV, the 'Oddball task' data will be analyzed

For the code to work, it must be completed after each **Question**, wherever "HERE" appears as a comment. Parameters that do not change the course of the story are marked with an "EDIT ME!" comment: you can change them right away or at the end of the section to see their effect.

# I - Python basics to get started
First of all, we will go over the basics of Python and Jupyter notebooks so you can follow and complete the rest of the course. To execute the code you will see in the notebooks, click on a cell (like the one below) and press `Ctrl + Enter`. If you are on macOS, the keys map as follows:

- `Ctrl`: Command key `⌘`
- `Shift`: Shift `⇧`
- `Alt`: Option key `⌥`

In [6]:
print("Cell executed correctly")

Cell executed correctly


You can also use `Shift + Enter` to execute the cell and move to the next one (if there is no cell below, it will insert a new one), and `Alt + Enter` to execute the current cell and insert a new one just below (even if there are cells after the one you run).

In [7]:
print("Run me with Shift + Enter to go to the cell below")

Run me with Shift + Enter to go to the cell below


In [8]:
print("Run me with Alt + Enter to insert a new cell below, and print something there!")

Run me with Alt + Enter to insert a new cell below, and print something there!


A code cell will execute all the code inside it, and all the variables will be stored for the rest of the notebook. Be careful! Variables will stay in memory even if you delete the cell.

Let us see an example of code using multiple cells:

In [14]:
# Let's find out the number of seconds in a day
n_hours = 24
n_min = 60  # Minutes per hour

min_day = n_hours * n_min

In [15]:
n_seconds = 60  # Seconds in a minute
sec_day = min_day * n_seconds

print(f'The number of seconds in a day is {sec_day}')

The number of seconds in a day is 86400


We can assign values to variables using the `=` operator. A variable is just a name we give to a particular value. You can imagine it as a box you put a certain value into, and on which you write a name with a black marker. The following code block contains two operations. First, we assign the value 2 to the name `x`. After that, `x` will hold the value 2. You might say Python stored the value 2 in `x`. Finally, we print the value using the `print()` function.

In [18]:
x = 2 
print(x)

2


Now that we have stored the value 2 in `x`, we can use it in operations like these:

In [20]:
print(x * x)
print(x == x)
print(x > 6)

4
True
False


Let's explain these a bit:
- In the first case, we are taking our value and multiplying it by itself
- In the second case, we are using the double equal `==` operator. Unlike `=`, which assigns the value on the right to the variable on the left, `==` checks whether the two values are equal and always returns `True` or `False`. There is also the `!=` operator, which returns `True` when the compared elements are not equal
- Finally, the third case uses the `>` operator, which returns `True` when the value on the left is greater than the value on the right. Other related operators are `<`, `>=`, and `<=`
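
As a quick illustration of the remaining comparison operators, here is a minimal sketch. The variable `y` is not part of the original example and is introduced only for this illustration.

```python
x = 2
y = 6  # hypothetical second value, used only for this illustration

not_equal = x != y         # True: 2 and 6 are different values
less_than = x < y          # True: 2 is smaller than 6
greater_or_equal = x >= y  # False: 2 is neither greater than nor equal to 6
less_or_equal = x <= 2     # True: 2 is equal to 2

print(not_equal, less_than, greater_or_equal, less_or_equal)
```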

Of course, variables are not limited to integers; they can also hold strings (text), like the following:

In [21]:
book = "Radiant Words"
print(book)

Radiant Words


Although we can give our variables any name we want, it is better to use descriptive names that are easy for other people to understand (Note: your future self looking at the code six months from now also counts as other people). The example below is extreme, but poorly chosen variable names can have devastating effects on the readability of your code.

In [24]:
a = 2
aa = 5
aaa = 10
aaaa = 1

print(aa > aaaa)
print(a <= aaa)

True
True


You can also update variables:

In [25]:
n_books = 1000
n_books = n_books + 1
print(n_books)

1001


This is equivalent to:

In [26]:
n_books = 1000
n_books += 1
print(n_books)

1001
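
The same shorthand exists for the other arithmetic operators. The following is a small sketch with values chosen only for illustration:

```python
n_books = 1000
n_books -= 100  # same as n_books = n_books - 100
n_books *= 2    # same as n_books = n_books * 2
print(n_books)
```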


## Lists and Dictionaries

We will now go over two of the most important data structures in Python: lists and dictionaries. These are used to store several values together, using the structure that best serves our purpose:

- Lists keep the variables in the order we add them to the list, and are indexed by position
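- Dictionaries are indexed by a "key" instead of a position, similar to how real dictionaries are organized by words that point to definitions (since Python 3.7 they also remember insertion order, but their elements are still not retrieved by position)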

We will start with lists. A list is created with square brackets `[ ]` and can contain any type of variable:

In [29]:
my_list = [1, 2, "three", 5.7, True]
print(my_list)

[1, 2, 'three', 5.7, True]


To recover elements of the list, we use the index or position. For example, if we want to print the first element of the list:

In [31]:
print(my_list[0])

1


In Python, indexing always starts at 0, so the first element has index 0. Keep the length of the list in mind, as you will get an error if you try to index beyond it.

In [32]:
print(my_list[20])

IndexError: list index out of range

A common way to iterate over a list is a `for` loop over a range of indices spanning the length of the list, like this:

In [34]:
for idx in range(len(my_list)):
    print(my_list[idx])

1
2
three
5.7
True


The `len()` function returns the number of elements in a list, while the `range()` function takes a number and generates the sequence of integers from 0 to that number minus 1. This is a good time to tell you that you can check how any function works by running a code cell with the function name followed by `?`:

In [37]:
range?

Init signature: range(self, /, *args, **kwargs)
Docstring:
range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
Type:           type
Subclasses:


In [38]:
len?

Signature: len(obj, /)
Docstring: Return the number of items in a container.
Type:      builtin_function_or_method


Whenever you encounter a function and you are unsure of how it works, check the documentation! This will be very valuable going forward. According to the documentation on `range()`, we can choose the starting and stopping point of the sequence, and even the step!

In [42]:
for i in range(10, 15):  # (start, stop)
    print(i)

10
11
12
13
14


In [43]:
for i in range(10, 31, 2):  # (start, stop, step)
    print(i)

10
12
14
16
18
20
22
24
26
28
30
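
Index-based loops such as `for idx in range(len(my_list))` are not the only way to walk through a list. The sketch below shows two common alternatives: looping over the elements directly, and using the built-in `enumerate()` function when the position is needed as well. The variable name `element` is chosen only for this illustration.

```python
my_list = [1, 2, "three", 5.7, True]

# Loop over the elements directly, without using an index
for element in my_list:
    print(element)

# enumerate() yields (position, element) pairs when the index is also needed
for idx, element in enumerate(my_list):
    print(idx, element)
```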


You can also index using negative numbers to count from the end of the list:

In [45]:
print(my_list[-2])

5.7


Could you iterate the list in reverse? Try it out! Also, we can index several elements at once by using a range of indexes separated by `:`

In [49]:
print(my_list)
print(my_list[0:2])

[1, 2, 'three', 5.7, True]
[1, 2]


As with the `range` function, index ranges include the beginning but not the end. Lastly, you can index from the beginning up to the nth element, or from the nth element until the end, like this:

In [50]:
print(my_list[:4])  # Up to the 4th element, identical to my_list[0:4]
print(my_list[3:])  # From the 4th element (index 3) onwards, identical to my_list[3:len(my_list)]

[1, 2, 'three', 5.7]
[5.7, True]
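
Negative numbers also work inside index ranges, which is easy to get wrong. The following sketch (the variable names are chosen only for this illustration) compares an open-ended range with one that stops at `-1`:

```python
my_list = [1, 2, "three", 5.7, True]

tail = my_list[3:]         # from index 3 to the end of the list
without_last = my_list[3:-1]  # a negative stop counts from the end, so the last element is excluded
last_two = my_list[-2:]    # the last two elements

print(tail)
print(without_last)
print(last_two)
```

Note that `my_list[3:]` and `my_list[3:-1]` are not the same: the second one leaves out the last element.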


Now we will finish by explaining dictionaries and their uses. Dictionaries are created using the curly brackets `{}` and always follow a `{key: value}` structure:

In [1]:
my_dict = {'books': ['The Way of Kings', 'The Final Empire', 'Warbreaker'],
           'ratings': [10, 9, 8.5]}

Values can be of any type, and keys can be of any hashable type (strings, numbers, and tuples, for example). Here we use strings for the keys because it is more intuitive, but we could also have a dictionary where the numerical labels of our data observations are the keys and the values are the corresponding text labels. Imagine the classic 'dogs vs. cats' neural network; we could have our labels as:

In [2]:
label_id = {0: 'dog', 
            1: 'cat'}

That way, we can work with numerical labels and have an easy way of looking up their real meaning at any time. But how do we access dictionaries? Unlike lists, which are accessed by position, dictionaries are accessed by key:

In [3]:
print(label_id[0])
print(my_dict['books'])

dog
['The Way of Kings', 'The Final Empire', 'Warbreaker']


If we try to access a key that does not exist, we will get an error:

In [4]:
my_dict['n_pages']

KeyError: 'n_pages'
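
If you are not sure whether a key exists, you can check first instead of running into the error. The sketch below uses the `in` operator and the dictionary method `.get()`, which returns a default value (`None` unless you provide one) when the key is missing. The variable names `n_pages` and `ratings` are chosen only for this illustration.

```python
my_dict = {'books': ['The Way of Kings', 'The Final Empire', 'Warbreaker'],
           'ratings': [10, 9, 8.5]}

# Check for the key before accessing it
if 'n_pages' in my_dict:
    print(my_dict['n_pages'])
else:
    print("There is no 'n_pages' entry yet")

# .get() returns a default instead of raising a KeyError
n_pages = my_dict.get('n_pages')      # key is missing, so this is None
ratings = my_dict.get('ratings', [])  # key exists, so the default is ignored
print(n_pages, ratings)
```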

However, we can assign a value to a key that does not exist yet in order to create it:

In [6]:
my_dict['n_pages'] = [1007, 541, 592]
print(my_dict['n_pages'])

[1007, 541, 592]


As for iterating, looping over a dictionary by itself yields its keys. You can also iterate over the values directly with `dict.values()`, or get both keys and values as tuples with `dict.items()`:

In [12]:
# Iterate keys
for key in my_dict:
    print(f'{key} is something we are keeping track of in this dict')
    print(f'  The values for the entry {key} are {my_dict[key]}')
    print()

# Iterate values
for val in my_dict.values():
    print(f"Something in our dict has the value {val}, but we don't really know what")
    
print()

# Iterate both keys and values
for key, val in my_dict.items():
    print(f'We know we have the key {key}, and its value is {val}. This looks similar to what we did before...')
    print()

books is something we are keeping track of in this dict
  The values for the entry books are ['The Way of Kings', 'The Final Empire', 'Warbreaker']

ratings is something we are keeping track of in this dict
  The values for the entry ratings are [10, 9, 8.5]

n_pages is something we are keeping track of in this dict
  The values for the entry n_pages are [1007, 541, 592]

Something in our dict has the value ['The Way of Kings', 'The Final Empire', 'Warbreaker'], but we don't really know what
Something in our dict has the value [10, 9, 8.5], but we don't really know what
Something in our dict has the value [1007, 541, 592], but we don't really know what

We know we have the key books, and its value is ['The Way of Kings', 'The Final Empire', 'Warbreaker']. This looks similar to what we did before...

We know we have the key ratings, and its value is [10, 9, 8.5]. This looks similar to what we did before...

We know we have the key n_pages, and its value is [1007, 541, 592]. This looks similar to what we did before...

Most of the time, iterating by keys makes the most sense, as you can access the values with `my_dict[key]` if you need them. Iterating over key and value pairs has its uses as well, but iterating over values only is much less common and defeats the purpose of using a dictionary in the first place.
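
In a dictionary like ours, where every value is a list with one entry per book, it is often handy to walk through those lists in parallel. The sketch below uses the built-in `zip()` function, which has not been introduced in this tutorial and is shown only as an illustration; the loop variable names are likewise hypothetical.

```python
my_dict = {'books': ['The Way of Kings', 'The Final Empire', 'Warbreaker'],
           'ratings': [10, 9, 8.5],
           'n_pages': [1007, 541, 592]}

# zip() pairs up the elements that share the same position in each list
for title, rating, pages in zip(my_dict['books'], my_dict['ratings'], my_dict['n_pages']):
    print(f'{title}: rated {rating}, {pages} pages')
```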

Most of the EEG data we will process during this course will be stored in high-level objects, but the vast majority of the underlying information is kept in lists or dictionaries. With this knowledge, you should be able to manipulate all the information needed to complete this and the following practical sessions.

# II - The basics of MNE

With the basics of Python covered, we will now move on to a short summary of the actions that we will use the most for the remainder of the course, and then move on to the actual analyses of the data you already performed using EEGLab. We will cover:

- How to load data
- The raw object and relevant methods / attributes
- Basic pre-processing steps
- Event extraction and manipulation
- Epoching data and how to work with the Epochs object

Our basic summary is inspired by the MNE documentation. You can find tutorials and examples for a wide range of different analysis and signal processing techniques [here](https://mne.tools/stable/auto_tutorials/intro/10_overview.html#sphx-glr-auto-tutorials-intro-10-overview-py). We will start by importing the libraries we are going to need.

In [15]:
import os

import mne
import numpy as np

## Loading Data
First of all, we will learn how to load our data. MNE supports many common file formats for EEG and MEG data (you can read more about the details in the [MNE I/O documentation](https://mne.tools/0.16/manual/io.html)). Since you have been working with EEGLab data, we will load an example .set file using the `read_raw_eeglab()` function.

In [27]:
notebook_path = os.getcwd()
data_path = os.path.join(notebook_path, 'data/mne_tuto/example_data.set')

raw = mne.io.read_raw_eeglab(data_path, preload=True)

Reading /home/dcas/j.torre-tresols/gitrepos/isae_ICM_cours/data/mne_tuto/example_data.fdt
Reading 0 ... 143272  =      0.000 ...   286.544 secs...




And just like that, we get our data in a `Raw` object, one of MNE's core data structures. If you are curious about the `preload=True` argument, remember to check the function's signature!

In [29]:
mne.io.read_raw_eeglab?

Signature:
mne.io.read_raw_eeglab(
    input_fname,
    eog=(),
    preload=False,
    uint16_codec=None,
    verbose=None,
)
Docstring:
Read an EEGLAB .set file.

Parameters
----------
input_fname : str
    Path to the .set file. If the data is stored in a separate .fdt file,
    it is expected to be in the same folder as the .set file.
eog : list | tuple | 'auto'
    Names or indices of channels that should be designated EOG channels.
    If 'auto', the channel names containing ``EOG`` or ``EYE`` are used.
    Defaults to empty tuple.

preload : bool

## The Raw object

We can get a summary of our `Raw` object by printing it! We will also take this chance to have a look at its `info` attribute. This is a dictionary-like object that is preserved across data structures (e.g. when going from `Raw` to `Epochs`). It contains all the relevant metadata of our file, such as the channel names and the sampling frequency. Check [this page](https://mne.tools/stable/auto_tutorials/intro/30_info.html#tut-info-class) for a complete overview of the `info` data structure.

In [31]:
print(raw)
print()
print(raw.info)

<RawEEGLAB | example_data.fdt, 32 x 143273 (286.5 s), ~35.0 MB, data loaded>

<Info | 8 non-empty values
 bads: []
 ch_names: Fp1, Fz, F3, F7, FT9, FC5, FC1, C3, T7, TP9, CP5, CP1, Pz, P3, ...
 chs: 32 EEG
 custom_ref_applied: False
 dig: 32 items (32 EEG)
 highpass: 0.0 Hz
 lowpass: 250.0 Hz
 meas_date: unspecified
 nchan: 32
 projs: []
 sfreq: 500.0 Hz
>
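
The numbers in these printouts are related to each other through the sampling frequency. The following sketch redoes the arithmetic by hand in plain Python, with the values copied from the output above; the variable names are chosen only for this illustration, and this is not how MNE itself computes them.

```python
sfreq = 500.0       # sampling frequency in Hz, copied from raw.info
n_samples = 143273  # samples per channel, copied from print(raw)

# The first sample is at t = 0, so the last one is at (n_samples - 1) / sfreq
last_sample_time = (n_samples - 1) / sfreq
duration = n_samples / sfreq

# Half the sampling frequency; it coincides with the lowpass value shown in info
nyquist = sfreq / 2

print(f'Last sample at {last_sample_time:.3f} s')
print(f'Duration: {duration:.1f} s')
print(f'Half the sampling frequency: {nyquist} Hz')
```

The time of the last sample matches the `286.544 secs` reported when the file was read, and the rounded duration matches the `286.5 s` in the `Raw` summary.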


The `Raw` object has many useful built-in methods; we will show some of them next. You can find all of them on the [Raw object's documentation page](https://mne.tools/stable/generated/mne.io.Raw.html#mne.io.Raw).