## OpenMIIR Experimentation Log

The goal of exploring the OpenMIIR GitHub repository was to get a sense of how EEG data is typically pipelined for data cleaning. While exploring the repository and testing a Python notebook that runs the pipeline on sample data, I found that the repository has a few dependencies that need to be addressed.

**Dependencies**

The GitHub repository is at https://github.com/sstober/openmiir, and the notebook I tested is in the directory "eeg/preprocessing/notebooks". Specifically, I tested the notebook "Subject P01.ipynb".

Once the GitHub repository is cloned or downloaded and unzipped, make sure the root folder is named "OpenMIIR", and rename it if it is not. Certain packages depend on the root folder having this name.

First, download the raw .fif data. The eeg folder contains a torrent file. Use it to download the data file that the notebook processes. For example, the notebook I tested processes "P01-raw.fif". Then move the .fif file to the mne folder inside the eeg folder ("eeg/mne/" from the root). The pipeline object looks for the raw data file in that directory.
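
Before running the notebook, it can help to confirm that the data file is where the pipeline expects it. The sketch below uses two hypothetical helpers that are not part of OpenMIIR or deepthought. It assumes that the raw file is named "<subject>-raw.fif", as with "P01-raw.fif", and that the pipeline looks in "eeg/mne/" under the OpenMIIR root. It is written for Python 3.

```python
from pathlib import Path


def expected_raw_path(openmiir_root, subject):
    """Assumed location of the raw file: <root>/eeg/mne/<subject>-raw.fif."""
    return Path(openmiir_root) / "eeg" / "mne" / f"{subject}-raw.fif"


def check_raw_file(openmiir_root, subject):
    """Warn if the root folder is misnamed; return (expected path, exists?)."""
    root = Path(openmiir_root)
    if root.name != "OpenMIIR":
        # Some packages rely on the root folder having exactly this name.
        print(f"warning: root folder is {root.name!r}; rename it to 'OpenMIIR'")
    path = expected_raw_path(root, subject)
    return path, path.is_file()
```

For example, `check_raw_file("/Users/mrincredible/Documents/pearson_lab/OpenMIIR", "P01")` returns the expected path and whether a file exists there.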

The preprocessing notebooks require two Python modules: deepthought and pylearn2. These can be found at https://github.com/sstober/deepthought and https://github.com/lisa-lab/pylearn2. Once they are cloned or downloaded, place their root folders in a directory and add that directory's path to the PYTHONPATH variable in your shell profile (".bash_profile" in the home directory on a Mac).
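
If editing the shell profile is inconvenient, you can get a similar effect for a single notebook session. Prepend the same directories to `sys.path` at the top of the notebook, before deepthought is imported. This is a minimal sketch: `add_to_sys_path` is a hypothetical helper, and you should pass whatever directories you would otherwise put on PYTHONPATH. Unlike PYTHONPATH, the change is lost when the kernel restarts.

```python
import os
import sys


def add_to_sys_path(*directories):
    """Prepend directories to sys.path for the current Python session only.

    Paths are expanded (~) and made absolute. Directories already on sys.path
    are skipped. Returns the list of directories that were actually added.
    """
    added = []
    for directory in directories:
        directory = os.path.abspath(os.path.expanduser(directory))
        if directory not in sys.path:
            sys.path.insert(0, directory)
            added.append(directory)
    return added
```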

The notebooks also require at least two custom environment variables: DEEPTHOUGHT_DATA_PATH and DEEPTHOUGHT_OUTPUT_PATH. To set them, add the following lines near the top of the notebook, replacing the parenthesized placeholder with the directory that contains the OpenMIIR root folder:

```python
import os
os.environ['DEEPTHOUGHT_DATA_PATH'] = "(directory that contains the OpenMIIR root folder)"
os.environ['DEEPTHOUGHT_OUTPUT_PATH'] = "(directory that contains the OpenMIIR root folder)"
```

In addition, when building the settings dictionary that is passed to the Pipeline object, set data_root to the path of the OpenMIIR root folder relative to the notebook, which is "../../../". Modify the line that defines "settings" so that it includes data_root="../../../":

```python
settings = dict(debug=False, mne_log_level='Info', sfreq=64, data_root="../../../") # optional pipeline settings
```
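
The value "../../../" works because the notebook sits three levels below the OpenMIIR root. The worked example below resolves it using the example data path from this log. It assumes that Jupyter's working directory is the notebook's own folder, which is the default. Adjust the path to your own machine.

```python
import os

# Example path from this log (the DEEPTHOUGHT_DATA_PATH value); adjust as needed.
data_path = "/Users/mrincredible/Documents/pearson_lab"
notebook_dir = os.path.join(data_path, "OpenMIIR", "eeg", "preprocessing", "notebooks")

# Each "../" climbs one level: notebooks -> preprocessing -> eeg -> OpenMIIR
data_root = os.path.normpath(os.path.join(notebook_dir, "../../../"))
print(data_root)  # /Users/mrincredible/Documents/pearson_lab/OpenMIIR
```

This also shows how the two settings relate: data_root should point to the OpenMIIR folder inside the directory given by DEEPTHOUGHT_DATA_PATH.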

The first notebook block should then look something like the code below:

```python
"""
General workflow for importing a new session
"""
subject = 'P01' # TODO: change this for each subject
verbose = True  # change this for debugging

import matplotlib
%matplotlib inline
# (IPython magic: only valid inside a Jupyter notebook)

import os
os.environ['DEEPTHOUGHT_DATA_PATH'] = "/Users/mrincredible/Documents/pearson_lab"
os.environ['DEEPTHOUGHT_OUTPUT_PATH'] = "/Users/mrincredible/Documents/pearson_lab"
print(os.environ)  # sanity check: both variables should be listed

from deepthought.datasets.openmiir.preprocessing.pipeline import Pipeline
settings = dict(debug=False, mne_log_level='Info', sfreq=64, data_root="../../../") # optional pipeline settings
pipeline = Pipeline(subject, settings)
```

The preprocessing notebooks and the deepthought/pylearn2 packages also require several other packages, all of which can be installed with pip:
- Theano
- watchdog
- librosa
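
To see which of these packages are still missing without running the whole first block, you can look up their import names directly. This is a minimal sketch for Python 3. The import names (for example `theano` for Theano) are my assumption, and `mne` is included because the pipeline is built on MNE.

```python
import importlib.util


def find_missing(module_names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages mentioned in this log.
REQUIRED = ["deepthought", "pylearn2", "theano", "watchdog", "librosa", "mne"]

missing = find_missing(REQUIRED)
print("missing:", missing if missing else "none")
```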

After these steps, the notebook should run. If it does not, rerun the first notebook block, read the error Python raises, and install whichever package it reports as missing.

**Observations**

In OpenMIIR, much of the actual processing is handled by MNE (MNE-Python) objects rather than by OpenMIIR's own code. The data is cleaned in two main steps: bandpass filtering and ICA. First, the time-series EEG data is inspected manually for bad channels. The pipeline then interpolates each bad channel from its surrounding channels.
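
MNE can do this interpolation itself. Bad channels are listed in `raw.info['bads']`, and `raw.interpolate_bads()` fills them in using spherical splines over the sensor positions. I have not checked exactly which call the OpenMIIR pipeline makes. To illustrate the idea only, the sketch below swaps in a much simpler technique: it replaces each bad channel with the average of a hypothetical list of neighboring channels.

```python
import numpy as np


def interpolate_bad_channels(data, ch_names, bads, neighbors):
    """Replace each bad channel with the mean of its good neighbors.

    data:      float array of shape (n_channels, n_times)
    ch_names:  channel names, one per row of data
    bads:      names of channels marked bad
    neighbors: hypothetical map {channel name: [nearby channel names]}
    """
    cleaned = data.copy()
    index = {name: i for i, name in enumerate(ch_names)}
    for bad in bads:
        # Never interpolate from another bad channel.
        good = [n for n in neighbors[bad] if n not in bads]
        if not good:
            raise ValueError(f"no good neighbors for {bad}")
        cleaned[index[bad]] = data[[index[n] for n in good]].mean(axis=0)
    return cleaned
```

Averaging ignores how far each neighbor is from the bad electrode. That simplification is why a position-aware method such as spherical splines is preferred on real data.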

Next, the power spectral density (PSD) is plotted to check for unusual spikes at particular frequencies, and a single channel is plotted to check for drift. A bandpass filter is then applied to remove frequencies outside the range of interest (for this data, only the 0.5-30 Hz range was kept).
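
The sketch below illustrates both checks on a synthetic signal with NumPy. A periodogram exposes a 50 Hz spike, and an idealized FFT bandpass keeps only 0.5-30 Hz. The sampling rate and signals are made up. The brick-wall filter is a deliberate simplification that causes ringing on real recordings, so a proper filter such as MNE's `raw.filter(l_freq=0.5, h_freq=30.0)` is the better choice in practice.

```python
import numpy as np


def periodogram(x, sfreq):
    """Frequencies and a simple unwindowed power estimate for each rfft bin."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sfreq)
    return freqs, np.abs(spectrum) ** 2 / (sfreq * x.size)


def fft_bandpass(x, sfreq, l_freq=0.5, h_freq=30.0):
    """Idealized bandpass: zero every FFT bin outside [l_freq, h_freq]."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sfreq)
    spectrum[(freqs < l_freq) | (freqs > h_freq)] = 0
    return np.fft.irfft(spectrum, n=x.size)


# Synthetic single-channel example (all values hypothetical).
sfreq = 256.0
t = np.arange(1024) / sfreq                    # 4 s of data, 0.25 Hz bin spacing
alpha = np.sin(2 * np.pi * 10 * t)             # 10 Hz rhythm, inside the band
line_noise = 0.5 * np.sin(2 * np.pi * 50 * t)  # 50 Hz spike, outside the band
eeg = alpha + line_noise + 2.0                 # constant offset (0 Hz), outside the band

freqs, power = periodogram(eeg, sfreq)
for f in (10.0, 50.0):
    print(f"power at {f:.0f} Hz: {power[freqs == f][0]:.3f}")

filtered = fft_bandpass(eeg, sfreq)
print("max deviation from the 10 Hz component:", np.abs(filtered - alpha).max())
```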

This dataset also includes EOG channels, which record eye movements and blinks. The pipeline uses them to check for artifacts by epoching the data around each blink. After cleaning, the spikes in these blink-locked epochs should be reduced.
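
MNE provides helpers for this step, such as `mne.preprocessing.create_eog_epochs`, which filter the EOG channel and detect blink peaks. The sketch below is a rough, hypothetical stand-in. It finds blink onsets with a plain amplitude threshold and cuts fixed windows of EEG around them. Averaging those epochs gives the blink-locked response that should shrink after cleaning. The threshold, window length, and synthetic signals are all assumptions.

```python
import numpy as np


def find_blink_onsets(eog, threshold):
    """Return sample indices where the EOG trace first rises above threshold."""
    above = eog > threshold
    return np.flatnonzero(above[1:] & ~above[:-1]) + 1


def epoch_around(data, events, start, stop):
    """Cut (n_channels, n_times) data into windows [event + start, event + stop).

    start is usually negative (samples before the blink). Events whose window
    would run past either edge of the recording are skipped.
    Returns an array of shape (n_epochs, n_channels, stop - start).
    """
    n_times = data.shape[1]
    windows = [data[:, e + start:e + stop] for e in events
               if e + start >= 0 and e + stop <= n_times]
    return np.array(windows)


# Synthetic example: 10 s of 4-channel EEG at 64 Hz with three "blinks".
rng = np.random.default_rng(0)
n_times = 640
eog = rng.normal(scale=5.0, size=n_times)
for onset in (100, 300, 500):
    eog[onset:onset + 10] += 150.0  # hypothetical blink amplitude (arbitrary units)
eeg = rng.normal(size=(4, n_times))

events = find_blink_onsets(eog, threshold=75.0)
epochs = epoch_around(eeg, events, start=-16, stop=32)  # 0.25 s before, 0.5 s after
blink_locked = epochs.mean(axis=0)  # compare this average before and after cleaning
print(events, epochs.shape)
```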

ICA is performed to find components that correspond to artifacts. Before ICA, the data is downsampled. I tried to run the downsampling code, but it never finished. Skipping downsampling is not a good workaround either, because ICA on the full-rate data also takes an extremely long time. Downsampling may therefore be a performance bottleneck in the pipeline if it is not optimized.
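
Downsampling matters because each ICA iteration passes over every sample. Reducing the sample count by a factor of k therefore reduces the per-iteration work by roughly the same factor. The sketch below decimates by keeping every k-th sample. That is only safe because the data has already been low-pass filtered at 30 Hz, below the new Nyquist frequency. The 512 Hz starting rate is hypothetical. The 64 Hz target matches the `sfreq=64` setting above, which I assume is the pipeline's target rate. In MNE the usual call is `raw.resample(sfreq=64)`, which, as I understand it, handles anti-aliasing itself.

```python
import numpy as np


def decimate_after_lowpass(data, sfreq, new_sfreq, lowpass_hz):
    """Downsample (n_channels, n_times) data by keeping every k-th sample.

    Assumes the data was already low-pass filtered at lowpass_hz. Plain
    decimation is only safe when lowpass_hz is below the new Nyquist
    frequency (new_sfreq / 2); otherwise higher frequencies alias.
    """
    factor = sfreq / new_sfreq
    if factor != int(factor):
        raise ValueError("sfreq must be an integer multiple of new_sfreq")
    if lowpass_hz >= new_sfreq / 2:
        raise ValueError("low-pass cutoff must be below new_sfreq / 2")
    return data[:, ::int(factor)]


# Hypothetical numbers: 64 channels, 10 s at 512 Hz, already low-passed at 30 Hz.
sfreq, new_sfreq = 512.0, 64.0
data = np.zeros((64, int(10 * sfreq)))
small = decimate_after_lowpass(data, sfreq, new_sfreq, lowpass_hz=30.0)
print(data.shape, "->", small.shape)  # (64, 5120) -> (64, 640)
```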

After ICA is computed, the pipeline first finds components that correlate with the EOG data. It then looks for further artifact components using statistics of each component, such as skewness, kurtosis, and variance.
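
MNE's ICA object offers `ica.find_bads_eog(raw)` for the EOG step. The sketch below shows the underlying idea in plain NumPy rather than MNE's implementation. It flags two kinds of components: those whose time course correlates strongly with the EOG signal, and those whose variance, skewness, or kurtosis is an outlier (by z-score) compared with the other components. Both thresholds are hypothetical. Note that with ten or fewer components a single value's |z| can never exceed 3, so the statistic threshold needs tuning to the number of components.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation between two 1-D signals."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def zscore(values):
    """Z-score each value against the others (population standard deviation)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def flag_components(sources, eog, r_threshold=0.5, z_threshold=3.0):
    """Return {component index: [reasons]} for suspicious ICA components.

    sources: (n_components, n_times) component time courses
    eog:     (n_times,) EOG channel
    """
    flags = {}
    # 1) Components that track the EOG channel (blinks, eye movements).
    for i, component in enumerate(sources):
        if abs(pearson_r(component, eog)) > r_threshold:
            flags.setdefault(i, []).append("eog")
    # 2) Components whose summary statistics are outliers among all components.
    centered = sources - sources.mean(axis=1, keepdims=True)
    std = centered.std(axis=1)
    stats = {
        "variance": std ** 2,
        "skewness": (centered ** 3).mean(axis=1) / std ** 3,
        "kurtosis": (centered ** 4).mean(axis=1) / std ** 4 - 3.0,  # excess kurtosis
    }
    for name, values in stats.items():
        for i in np.flatnonzero(np.abs(zscore(values)) > z_threshold):
            flags.setdefault(int(i), []).append(name)
    return flags
```

For example, suppose there are 20 synthetic Gaussian components, one of which copies the EOG trace while another carries a few large spikes. Only those two components should be flagged: the first for "eog", the second for its statistics.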

The ICA solution, with the unwanted components excluded, can then be saved through MNE, loaded again later, and used to reconstruct the cleaned time-series data.
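
In MNE, this step corresponds to the following calls:

- set `ica.exclude`;
- save with `ica.save(...)`;
- load with `mne.preprocessing.read_ica(...)`;
- call `ica.apply(raw)`, which modifies the data in place.

The linear algebra behind the reconstruction is small enough to sketch. If the data is approximately A @ S (mixing matrix times component time courses), dropping the excluded components before multiplying back removes their contribution. This sketch ignores the PCA and whitening steps that MNE performs around ICA, and `reconstruct_without` is a hypothetical name.

```python
import numpy as np


def reconstruct_without(mixing, sources, exclude):
    """Back-project ICA components to channel space, leaving out `exclude`.

    mixing:  (n_channels, n_components) matrix A, with data ~= A @ sources
    sources: (n_components, n_times) component time courses
    exclude: indices of artifact components to drop
    """
    excluded = set(exclude)
    keep = [k for k in range(sources.shape[0]) if k not in excluded]
    return mixing[:, keep] @ sources[keep]


# Toy example: 4 channels mixed from 3 components; component 0 is the artifact.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(4, 3))
sources = rng.normal(size=(3, 100))
data = mixing @ sources
cleaned = reconstruct_without(mixing, sources, exclude=[0])
# The removed part is exactly the excluded component's contribution.
print(np.allclose(data - cleaned, np.outer(mixing[:, 0], sources[0])))  # True
```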