# Preprocessing Neurophysiology Data [ReTune Dyskinesia Project]


This notebook gives a step-by-step overview of the preprocessing workflow for ECoG and LFP data within work package B04 of the ReTune project. The step-wise structure makes it possible to understand, visualize, and adjust each step individually. Besides this notebook, a separate script executes all preprocessing steps at once.


<b> Data must first be converted to the BIDS standard. </b>


### 0. Loading packages and functions, defining paths



In [120]:
# Importing Python and external packages
import os
import sys
import importlib
import json
from abc import ABCMeta, abstractmethod
from dataclasses import dataclass, field, fields
from collections import namedtuple
from typing import Any
from itertools import compress
from pathlib import Path
import pandas as pd
import numpy as np
import sklearn as sk
import scipy
import matplotlib.pyplot as plt
from scipy import signal
import csv

#mne
import mne_bids
import mne

In [121]:
# check some package versions for documentation and reproducibility
print('Python sys', sys.version)
print('pandas', pd.__version__)
print('numpy', np.__version__)
print('mne_bids', mne_bids.__version__)
print('mne', mne.__version__)
print('sci-py', scipy.__version__)
print('sci-kit learn', sk.__version__)

Python sys 3.9.7 (default, Sep 16 2021, 08:50:36) 
[Clang 10.0.0 ]
pandas 1.3.4
numpy 1.20.3
mne_bids 0.9
mne 0.24.1
sci-py 1.7.1
sci-kit learn 1.0.1


In [122]:
# define local storage directories
projectpath = '/Users/jeroenhabets/Research/CHARITE/projects/dyskinesia_neurophys'
codepath = os.path.join(projectpath, 'code')
pynmd_path = os.path.join(codepath, 'py_neuromodulation')
rawdatapath = '/Users/jeroenhabets/OneDrive - Charité - Universitätsmedizin Berlin/BIDS_Berlin_ECOG_LFP/rawdata'

# define external storage directories
ext_projectpath = '/Volumes/JH/Research/CHARITE/projects/dyskinesia_neurophys'
ext_datapath = os.path.join(ext_projectpath, 'data/BIDS_Berlin_ECOG_LFP/rawdata')

# change working directory to project-code folder
os.chdir(codepath)
os.getcwd()

'/Users/jeroenhabets/Research/CHARITE/projects/dyskinesia_neurophys/code'

In [123]:
import lfpecog_preproc.preproc_data_management as dataMng
import lfpecog_preproc.preproc_reref as reref
import lfpecog_preproc.preproc_artefacts as artefacts
import lfpecog_preproc.preproc_filters as fltrs
import lfpecog_preproc.preproc_resample as resample

In [4]:
# # import from py_neuromodulation after setting directory
# # PM: the directory of py_neuromodulation has to be added to sys.path
# os.chdir(pynmd_path)
# print(os.getcwd())
# # run from dyskinesia branch-folder in py_nmd
# import dyskinesia.preprocessing as preproc
# import dyskinesia.preproc_reref as reref
# import dyskinesia.preproc_artefacts as artefacts
# import dyskinesia.preproc_filters as fltrs


/Users/jeroenhabets/Research/CHARITE/projects/dyskinesia_neurophys/code/py_neuromodulation


### 1. Data selection, defining Settings



Relevant info on BIDS-structure and the handling data-classes


- Note that the resulting data-class objects below do not contain actual data yet (!)
- To create RawBrainVision data objects: load data with rawRun.ecog.load_data() (incl. internal mne functionality)
- To create np.arrays: load data with rawRun.ecog.get_data(); use return_times=True to return a tuple (data, times) (used in preprocessing.py functions)

BIDS-RAW Data Structure Info:
- A grouped MNE BIDS Raw object contains all channels within the group,
e.g. lfp_left, lfp_right, ecog, acc. Indexing a channel (rawRun.ecog[0])
returns a tuple whose first element is an ndarray of shape (1, N_samples).
- Calling rawRun.ecog[0][0] gives the ndarray containing only data points.
- Calling rawRun.ecog[0][1] gives the ndarray containing the time stamps.


#### 1A. Define Preprocess Settings


Create data structures (named tuples) that hold the settings for the preprocessing. These settings contain the following parameters:
- win_len (float): length in seconds of the windows into which the data is binned (default: 1 sec)
- artfct_sd_tresh (float): number of standard deviations used as artefact-removal threshold
- bandpass_f (int, int): lower and upper cutoff frequencies (Hz) of the bandpass filter
- transBW (int): transition bandwidth of the notch filter (full width: 50% above and 50% below the frequencies to filter)
- notchW (int): Notch width of notch filter
- Fs_orig (int): original sampling frequency (Hz)
- Fs_resample (int): sampling frequency (Hz) to which data is resampled
- settings_version (str): Abbreviation/codename for this specific version of settings (do not use spaces but rather underscores), e.g. 'v0.0_Jan22'
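
As a minimal sketch of how such named tuples could look: the two classes below are hypothetical stand-ins for `dataMng.PreprocSettings` and `dataMng.Settings`, whose actual definitions are not shown in this notebook. The field names and their order are taken from the parameter list above; the group names `lfp_left`, `lfp_right` and `ecog` are assumed from the way the settings are used further down.

```python
from collections import namedtuple

# Hypothetical stand-ins for dataMng.PreprocSettings / dataMng.Settings
ExamplePreprocSettings = namedtuple('ExamplePreprocSettings', [
    'win_len', 'artfct_sd_tresh', 'bandpass_f', 'transBW',
    'notchW', 'Fs_orig', 'Fs_resample', 'settings_version',
])
ExampleSettings = namedtuple('ExampleSettings', ['lfp_left', 'lfp_right', 'ecog'])

# one list of values per group, in the same order as the fields above
example_values = [1, 4, (1, 120), 20, 5, 4000, 800, 'v0.0_Jan22']

example_settings = ExampleSettings(
    lfp_left=ExamplePreprocSettings(*example_values),
    lfp_right=ExamplePreprocSettings(*example_values),
    ecog=ExamplePreprocSettings(*example_values),
)

# settings are read per group by attribute name
print(example_settings.ecog.bandpass_f)
print(list(example_settings._fields))
```

Named tuples keep the settings immutable and let every preprocessing step read its parameters by name, e.g. `getattr(example_settings, group).win_len`.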

In [124]:
'''
Notes to setting definition:
- std dev: a threshold of 2.5 - 3 removes high-amplitude parts of the
    signal which do not look like artefacts;
- notch transition width 10 and notch width 2 leave in 50-100Hz peaks
    during ftextraction visualization
'''

default_lfp_settings = [1, 4, (1, 120), 20, 5, 4000, 800, 'debug_v0.3_Jan22']
default_ecog_settings = [1, 4, (1, 120), 20, 5, 4000, 800, 'debug_v0.3_Jan22']

In [125]:
# importlib.reload() is used everywhere to make sure that the latest
# saved version of a module/function is imported during coding/debugging
importlib.reload(dataMng)

# '*' before a list unpacks the list values as separate args
settings = dataMng.Settings(
    dataMng.PreprocSettings(*default_lfp_settings),  # lfp_left
    dataMng.PreprocSettings(*default_lfp_settings),  # lfp_right
    dataMng.PreprocSettings(*default_ecog_settings),  # ecog
)
groups = list(settings._fields)

#### 1B. Define Patient and Recording Settings

- First DataClass (RunInfo) gets Patient-Run specific input variables to define which run/data-file should be used
    - sub (str): patient number
    - ses (str): session code (new version e.g. 'LfpEcogMedOn01', old version e.g. 'EphysMedOn01')
    - task (str): performed task, e.g. 'Rest'
    - acq (str): acquisition, aka state of recording, usually indicates Stimulation status, but also contains time after Dopamine-intake in case of Dyskinesia-Protocol, e.g. 'StimOn01', or 'StimOn02Dopa30'
    - run (str): run number, e.g. '01'
    - raw_path (str): directory where the raw BIDS data is stored (Poly5 files etc.); must point to '/.../BIDS_Berlin_ECOG_LFP/rawdata'
    - project_path (str): directory where created files and figures are saved; should be main-project-directory, containing sub-folders 'data', 'code', 'figures'
    - preproc_sett (str): code of preprocessing settings, is extracted from PreprocSettings DataClass

- Second DataClass (RunRawData) creates the MNE-objects which are used in the following function to load the data
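
To illustrate how these run-specific variables identify one recording, here is a small sketch. `ExampleRunInfo` and its `bids_basename` property are hypothetical and not part of `dataMng`; the only assumption is that the entities are combined in the standard BIDS order (sub, ses, task, acq, run).

```python
from dataclasses import dataclass


@dataclass
class ExampleRunInfo:
    """Hypothetical stand-in for dataMng.RunInfo (illustration only)."""
    sub: str
    ses: str
    task: str
    acq: str
    run: str
    raw_path: str
    project_path: str
    preproc_sett: str

    @property
    def bids_basename(self) -> str:
        # BIDS entities in their standard order: sub, ses, task, acq, run
        return (f'sub-{self.sub}_ses-{self.ses}_task-{self.task}'
                f'_acq-{self.acq}_run-{self.run}')


example_run = ExampleRunInfo(
    sub='008', ses='EphysMedOn01', task='Rest', acq='StimOff', run='01',
    raw_path='/.../BIDS_Berlin_ECOG_LFP/rawdata',
    project_path='/.../dyskinesia_neurophys',
    preproc_sett='v0.0_Jan22',
)
print(example_run.bids_basename)
```

Keeping all identifiers in one object means that later steps (artefact removal, rereferencing, saving) only need to receive this single object to know which run they are working on and where to write results.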

In [126]:
# DEFINE PATIENT-RUN SETTINGS
sub = '008'
ses = 'EphysMedOn01'  # 'EphysMedOn02'
task = 'Rest'
acq = 'StimOff'  # 'StimOffLD00'
run = '01'
rawpath = rawdatapath  # ext_datapath

In [177]:
# create specific patient-run BIDS-Object for further pre-processing
importlib.reload(dataMng)
runInfo0 = dataMng.RunInfo(
    sub=sub,
    ses=ses,
    task=task,
    acq=acq,
    run=run,
    raw_path=rawpath,  # used to import the source-bids-data
    preproc_sett=settings.lfp_left.settings_version,
    project_path=projectpath,  # used to write the created figures and processed data
)
rawRun = dataMng.RunRawData(bidspath=runInfo0.bidspath)



------------ BIDS DATA INFO ------------
The raw-bids-object contains 49 channels with 1208604 datapoints and sample freq  4000.0 Hz
Bad channels are: ['LFP_R_16_STN_BS', 'LFP_L_10_STN_BS', 'LFP_L_13_STN_BS', 'LFP_L_14_STN_BS'] 

BIDS contains:
6 ECOG channels,
28 DBS channels: (13 left, 15 right), 
2 EMG channels, 
1 ECG channel(s), 
6 Accelerometry (misc) channels.





The search_str was "/Users/jeroenhabets/OneDrive - Charité - Universitätsmedizin Berlin/BIDS_Berlin_ECOG_LFP/rawdata/sub-008/**/ieeg/sub-008_ses-EphysMedOn01*events.tsv"
  print('\n\n------------ BIDS DATA INFO ------------\n'

['EEG_Cz_TM', 'EEG_Fz_TM'].

Consider using inst.set_channel_types if these are not EEG channels, or use the on_missing parameter if the channel positions are allowed to be unknown in your analyses.


#### Optional viewer for un-processed data with MNE's interactive viewer

NOT USED

In [68]:
# to load grouped BIDS-Objects:
# rawRun.ecog.load_data()

# to visualize PSDs of non-preprocessed data
# for interactive plotter: activate matplotlib qt line
# %matplotlib qt
# %matplotlib inline

# rawRun.lfp_left.plot()
# rawRun.lfp_left.plot_psd(n_fft=1024)
# rawRun.ecog.plot_psd(n_fft=1024)

### 2. Automated Artefact Removal (incl. Visualization)
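
The idea behind the windowed, standard-deviation based artefact removal can be sketched as follows. This is not the implementation of `artefacts.artefact_selection`, whose internals are not shown here; `mark_artefact_windows` is a hypothetical helper that assumes a per-channel threshold computed over the whole recording and sets every window containing an outlier sample to NaN.

```python
import numpy as np


def mark_artefact_windows(data, sfreq, win_len=1.0, n_stds_cut=4.0):
    """Bin data (n_channels, n_samples) into windows and set every
    window that contains an outlier sample to NaN (hypothetical helper)."""
    n_win = int(win_len * sfreq)          # samples per window
    n_windows = data.shape[1] // n_win    # incomplete last window is dropped
    binned = data[:, :n_windows * n_win].astype(float)
    # reshape to (n_windows, n_channels, n_win)
    binned = binned.reshape(data.shape[0], n_windows, n_win)
    binned = binned.transpose(1, 0, 2).copy()
    # per-channel mean and std-dev over the full recording
    mean = data.mean(axis=1)[None, :, None]
    std = data.std(axis=1)[None, :, None]
    outlier = np.abs(binned - mean) > n_stds_cut * std
    bad = outlier.any(axis=2)             # shape (n_windows, n_channels)
    binned[bad] = np.nan
    return binned


# synthetic example: two 10 Hz channels, one with a large spike at 1.5 s
t = np.arange(4000) / 1000.0
example = np.vstack([np.sin(2 * np.pi * 10 * t), np.cos(2 * np.pi * 10 * t)])
example[0, 1500] = 50.0

windows = mark_artefact_windows(example, sfreq=1000, win_len=1.0, n_stds_cut=4.0)
print(windows.shape)
print(np.isnan(windows).any(axis=2))  # True where a window was rejected
```

Only the window of the first channel that contains the spike is rejected; all other windows are kept unchanged.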


In [128]:
# Actual Loading of the Data from BIDS-files

# data_raw is filled with loaded mne-bids data per group
data_raw = {}
for fld in rawRun.__dataclass_fields__:
    # 'fld' instead of 'field' to avoid shadowing dataclasses.field
    print(fld)
    # loop over the variables within the data class and
    # load only the lfp_* and ecog groups
    if fld.startswith(('lfp_', 'ecog')):
        data_raw[fld] = getattr(rawRun, fld).load_data()

ch_names = {}
for group in groups:
    ch_names[group] = data_raw[group].info['ch_names']

bidspath
bids
lfp
lfp_left
Reading 0 ... 1208603  =      0.000 ...   302.151 secs...
lfp_right
Reading 0 ... 1208603  =      0.000 ...   302.151 secs...
ecog
Reading 0 ... 1208603  =      0.000 ...   302.151 secs...
acc
emg
ecg


In [129]:
# Artefact Removal

importlib.reload(artefacts)
data_clean = {}
ch_nms_clean = {}
save_dir = runInfo0.fig_path
saveNot = None
for group in groups:
    data_clean[group], ch_nms_clean[group] = artefacts.artefact_selection(
        data_bids=data_raw[group],  # raw BIDS group to process
        group=group,
        win_len=getattr(settings, group).win_len,
        n_stds_cut=getattr(settings, group).artfct_sd_tresh,  # number of std-dev from mean that is used as cut-off
        # to save: give directory, to show inline: give 'show', w/o fig: None
        save=saveNot,  # if None: no figure saved
        RunInfo=runInfo0,
    )

START ARTEFACT REMOVAL: lfp_left
Ch LFP_L_1_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_2_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_3_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_4_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_5_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_6_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_7_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_8_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_9_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_11_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_12_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_15_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_L_16_STN_BS: 0.0% is NaN (artefact or zero)
START ARTEFACT REMOVAL: lfp_right
Ch LFP_R_1_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_R_2_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_R_3_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_R_4_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_R_5_STN_BS: 0.0% is NaN (artefact or zero)
Ch LFP_R_6_STN_BS: 0.0% is Na

In [131]:
# Quality check: delete groups without valid channels
to_del = []
for group in data_clean.keys():
    if data_clean[group].shape[1] <= 1:
        to_del.append(group)
for group in to_del:
    del data_clean[group]
    del ch_nms_clean[group]
    groups.remove(group)
print(f'Group(s) removed: {to_del}')

Group(s) removed: []


### 3. Bandpass Filtering
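
For orientation, a zero-phase IIR bandpass can be sketched with SciPy as below. This is an assumption about what such a filter may look like, not the code of `fltrs.bp_filter`; the Butterworth design, the filter order and the helper names `bandpass_sketch` and `amplitude_at` are chosen for illustration only.

```python
import numpy as np
from scipy import signal


def bandpass_sketch(data, sfreq, l_freq, h_freq, order=4):
    """Zero-phase Butterworth bandpass along the last axis (illustration)."""
    sos = signal.butter(order, [l_freq, h_freq], btype='bandpass',
                        fs=sfreq, output='sos')
    return signal.sosfiltfilt(sos, data, axis=-1)


def amplitude_at(x, freq, sfreq):
    """Amplitude of the FFT bin closest to freq."""
    spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2)
    freqs = np.fft.rfftfreq(len(x), d=1 / sfreq)
    return spectrum[np.argmin(np.abs(freqs - freq))]


# synthetic signal: 20 Hz (inside the passband) plus 500 Hz (outside)
sfreq = 4000
t = np.arange(5 * sfreq) / sfreq
raw_example = np.sin(2 * np.pi * 20 * t) + np.sin(2 * np.pi * 500 * t)
bp_example = bandpass_sketch(raw_example, sfreq, l_freq=1, h_freq=120)

print('20 Hz kept: ', amplitude_at(bp_example, 20, sfreq))
print('500 Hz kept:', amplitude_at(bp_example, 500, sfreq))
```

Second-order sections are used because they stay numerically stable for a low cutoff such as 1 Hz at a 4000 Hz sampling rate.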

In [132]:
importlib.reload(fltrs)

data_bp = {}
for group in groups:
    data_bp[group] = fltrs.bp_filter(
        data=data_clean[group],
        sfreq=getattr(settings, group).Fs_orig,
        l_freq=getattr(settings, group).bandpass_f[0],
        h_freq=getattr(settings, group).bandpass_f[1],
        method='iir',  # faster than fir
    )

### 4. Notch-filtering for Powerline Noise
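
The principle of removing powerline noise at 50 Hz and its harmonics can be sketched with SciPy. Note that this swaps in a simple IIR notch (`scipy.signal.iirnotch`) for illustration, whereas the cell below calls `fltrs.notch_filter` with `method='fir'`; `notch_sketch`, `amplitude_at` and the 120 Hz upper limit are assumptions.

```python
import numpy as np
from scipy import signal


def notch_sketch(data, sfreq, line_freq=50.0, notch_width=5.0, max_freq=120.0):
    """Zero-phase IIR notch at line_freq and its harmonics up to max_freq."""
    filtered = np.asarray(data, dtype=float)
    freq = line_freq
    while freq <= max_freq:
        # quality factor Q = centre frequency / notch bandwidth
        b, a = signal.iirnotch(w0=freq, Q=freq / notch_width, fs=sfreq)
        filtered = signal.filtfilt(b, a, filtered, axis=-1)
        freq += line_freq
    return filtered


def amplitude_at(x, freq, sfreq):
    """Amplitude of the FFT bin closest to freq."""
    spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2)
    freqs = np.fft.rfftfreq(len(x), d=1 / sfreq)
    return spectrum[np.argmin(np.abs(freqs - freq))]


# synthetic signal: 20 Hz activity contaminated with 50 Hz line noise
sfreq = 4000
t = np.arange(10 * sfreq) / sfreq
noisy_example = np.sin(2 * np.pi * 20 * t) + np.sin(2 * np.pi * 50 * t)
notched_example = notch_sketch(noisy_example, sfreq)

print('20 Hz kept:', amplitude_at(notched_example, 20, sfreq))
print('50 Hz kept:', amplitude_at(notched_example, 50, sfreq))
```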

In [134]:
# notch filtering in BLOCKS

importlib.reload(fltrs)
save_dir = runInfo0.fig_path
saveNOT = None
data_nf = {}
for group in data_bp.keys():
    print(f'Start Notch-Filter GROUP: {group}')
    data_nf[group] = fltrs.notch_filter(
        data=data_bp[group],
        ch_names=ch_nms_clean[group],
        group=group,
        transBW=getattr(settings, group).transBW,
        notchW=getattr(settings, group).notchW,
        method='fir',  # IIR (8th-order Butterworth) takes too long
        save=saveNOT,  # if None: no figures made and saved
        verbose=False,
        RunInfo=runInfo0,
    )

Start Notch-Filter GROUP: lfp_left
Start Notch-Filter GROUP: lfp_right
Start Notch-Filter GROUP: ecog


### 5. Resampling


Since the frequencies of interest reach up to roughly 100 - 120 Hz, the Nyquist theorem requires a sampling frequency of at least twice that (~250 Hz), so the data can be downsampled considerably from the original 4000 Hz.

To check later: differences between resampling to 400 or 800 Hz, or working with wider windows.
- Swann '16: 800 Hz
- Heger/ Herff: 600 Hz (https://www.csl.uni-bremen.de/cms/images/documents/publications/IS2015_brain2text.pdf)
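
The Nyquist reasoning above and a possible downsampling step can be sketched as follows. `resample_sketch` is a hypothetical helper based on `scipy.signal.resample_poly`; it is not the implementation of `resample.resample`, which is not shown in this notebook.

```python
from fractions import Fraction

import numpy as np
from scipy import signal


def resample_sketch(data, fs_orig, fs_new):
    """Polyphase resampling along the last axis (illustration only)."""
    ratio = Fraction(int(fs_new), int(fs_orig))  # e.g. 800 / 4000 = 1 / 5
    return signal.resample_poly(
        data, up=ratio.numerator, down=ratio.denominator, axis=-1)


fs_orig, fs_new, f_max = 4000, 800, 120

# Nyquist check: the new sampling rate must exceed twice the highest
# frequency of interest
print('Nyquist satisfied:', fs_new > 2 * f_max)

# synthetic example: 5 seconds of two 10 Hz channels
t = np.arange(5 * fs_orig) / fs_orig
example = np.vstack([np.sin(2 * np.pi * 10 * t), np.cos(2 * np.pi * 10 * t)])
resampled = resample_sketch(example, fs_orig, fs_new)
print(example.shape, '->', resampled.shape)
```

`resample_poly` applies an anti-aliasing low-pass filter before decimating, which is why it is generally preferable to simply taking every fifth sample.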


In [135]:
importlib.reload(resample)

# resampling one run at a time
data_rs = {}  # dict to store resampled data
for group in groups:
    data_rs[group] = resample.resample(
        data=data_nf[group],
        Fs_orig=getattr(settings, group).Fs_orig,
        Fs_new=getattr(settings, group).Fs_resample,
    )

### 6. Rereferencing



Common practice for LFP re-referencing: difference between two neighbouring contacts
- For segmented leads: average every level
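
The common practice described above can be sketched in a few lines. The helper names, the 1-3-3-1 contact layout and the sign convention (lower level minus the level above) are assumptions for illustration; this sketch is not the method used by `reref.rereferencing` in the cell below.

```python
import numpy as np


def average_levels(data, levels):
    """Average the contacts (rows of data) that belong to the same level."""
    return np.vstack([data[rows].mean(axis=0) for rows in levels])


def bipolar_neighbours(level_data):
    """Difference between each level and the neighbouring level above it."""
    return level_data[:-1] - level_data[1:]


# assumed segmented lead with a 1-3-3-1 layout: row indices per level
example_levels = [[0], [1, 2, 3], [4, 5, 6], [7]]

# synthetic data: contact i carries the constant value i (8 contacts, 5 samples)
example_data = np.arange(8, dtype=float)[:, None] * np.ones((1, 5))

level_means = average_levels(example_data, example_levels)
bipolar = bipolar_neighbours(level_means)
print(level_means[:, 0])
print(bipolar[:, 0])
```

Four levels result in three bipolar channels, one for each pair of neighbouring levels.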


Relevant ECOG-rereferencing literature used:
- Common Average Rereferencing (Liu ea, J Neural Eng 2015: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5485665/)
- ECOG is a local signal with spread +/- 3mm (Dubey, J Neurosc 2019): https://www.jneurosci.org/content/39/22/4299
- READ ON - DATA ANALYSIS: Relevance of data-driven spatial filtering for invasive EEG. For gamma: CAR is probably sufficient. For alpha-beta: ... High inter-subject variability in ECOG. (Schaworonkow & Voytek, PLOS Comp Biol 2021: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009298)
- Submillimeter (micro)ECOG: http://iebl.ucsd.edu/sites/iebl.ucsd.edu/files/2018-06/Sub-millimeter%20ECoG%20pitch%20in%20human%20enables%20higher%20%EF%AC%81delity%20cognitiveneural%20state%20estimation.pdf
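
Common average rereferencing (CAR), as named in the first reference, subtracts the mean over all channels from every channel at each time point. A minimal sketch, assuming an array of shape (n_channels, n_samples) without NaNs; the function name is hypothetical.

```python
import numpy as np


def common_average_reference(data):
    """Subtract the across-channel mean from every channel, per sample."""
    return data - data.mean(axis=0, keepdims=True)


# synthetic example: 6 channels sharing one common noise component
rng = np.random.default_rng(0)
common_noise = rng.standard_normal(1000)
local_signals = rng.standard_normal((6, 1000))
ecog_example = local_signals + common_noise  # noise is added to all channels

ecog_car = common_average_reference(ecog_example)

# after CAR the across-channel mean is zero at every sample
print(np.abs(ecog_car.mean(axis=0)).max())
```

Because the common component is identical on all channels, it cancels out entirely; what remains is each channel's local signal minus the average of the local signals.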


Check rereferencing methods:
- de Cheveigne/Arzounian NeuroImage 2018
- pre-prints Merk 2021 and Petersen 2021 (AG Kühn / AG Neumann)
- pre-print epilepsy ecog movement (MUMC)


P.M. Look further into Spatial Filtering:
- Spatial filter estimation via spatio-spectral decomposition: ............ TO READ   (Nikulin & Curio, NeuroImage 2011, https://www.sciencedirect.com/science/article/pii/S1053811911000930?via%3Dihub)
- Spatio-Spectral Decomposition: proposed dimensionality-reduction instead of PCA (Haufe, ..., Nikulin, https://www.sciencedirect.com/science/article/pii/S1053811914005503?via%3Dihub)
- Also check: SPoC (Castano et al NeuroImage Clin 2020)


In [176]:
importlib.reload(reref)
lfp_reref='segments'
data_rrf = {}
names = {}

# emptying a possibly existing report-file (opening in 'w' mode truncates it)
if 'reref_report.txt' in os.listdir(
        runInfo0.data_path):
    with open(os.path.join(runInfo0.data_path,
            'reref_report.txt'), 'w'):
        pass

for group in groups:
    data_rrf[group], names[group] = reref.rereferencing(
        data=data_rs[group],
        group=group,
        runInfo=runInfo0,
        lfp_reref=lfp_reref,
        chs_clean=ch_nms_clean[group],
    )

LFP_L_1_STN_BS	LFP_L_2_STN_BS	LFP_L_3_STN_BS	LFP_L_4_STN_BS	LFP_L_5_STN_BS	LFP_L_6_STN_BS	LFP_L_7_STN_BS	LFP_L_8_STN_BS	LFP_L_9_STN_BS	LFP_L_11_STN_BS	LFP_L_12_STN_BS	LFP_L_15_STN_BS	LFP_L_16_STN_BS
{0: ['LFP_L_1_', 'LFP_L_2_', 'LFP_L_3_'], 1: ['LFP_L_4_', 'LFP_L_5_', 'LFP_L_6_'], 2: ['LFP_L_7_', 'LFP_L_8_', 'LFP_L_9_'], 3: ['LFP_L_11_', 'LFP_L_12_'], 4: ['LFP_L_15_'], 5: ['LFP_L_16_']}
{0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9], 3: [10, 11], 4: [12], 5: [13]}

 Rereferencing BS Vercise Cartesia X (L) against other contacts of same level
Row REFS [2, 3], SHAPE (302, 14, 800)
Row REFS [1, 3], SHAPE (302, 14, 800)
Row REFS [1, 2], SHAPE (302, 14, 800)
Row REFS [5, 6], SHAPE (302, 14, 800)
Row REFS [4, 6], SHAPE (302, 14, 800)
Row REFS [4, 5], SHAPE (302, 14, 800)
Row REFS [8, 9], SHAPE (302, 14, 800)
Row REFS [7, 9], SHAPE (302, 14, 800)
Row REFS [7, 8], SHAPE (302, 14, 800)
Row REFS [11], SHAPE (302, 14, 800)
Row REFS [10], SHAPE (302, 14, 800)
TAKE LEVEL HIGHER
ref rows [13]
(302, 14, 8

### 7. Saving Preprocessed Signals

In [186]:
importlib.reload(dataMng)
for group in groups:
    dataMng.save_arrays(
        data=data_rrf[group],
        names=names[group],
        group=group,
        runInfo=runInfo0,
        lfp_reref=lfp_reref,
    )