# EEG preprocessing 

In this notebook: 
- Necessary imports
- Data loaders for event markers, EEG data and metadata
- Filtering algorithm
- Converting raw EEG to epochs
- Saving the metadata, together with the EEG and epoch file paths, to `metadata.csv`

Preprocessing steps:
+ Prepare the EEG: (1) subtract the reference (mastoids), (2) detrend, (3) filter, (4) remove bad channels
+ Segment the EEG into standard and deviant epochs (ERPs): (1) subtract the baseline, (2) reject artefacts, (3) average (separately for each marker, subject and channel)
+ Calculate the mismatch response (deviant minus standard, for a single subject) and check differences between channels and subjects
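The last step can be sketched on toy numbers. This is a minimal illustration, not the eegyolk implementation: epochs are plain Python lists for one subject and one channel, the ERP is the point-by-point average over epochs, and the mismatch response is the deviant ERP minus the standard ERP. With MNE objects the same difference is commonly computed with `mne.combine_evoked([evoked_deviant, evoked_standard], weights=[1, -1])`, where `evoked_deviant` and `evoked_standard` are hypothetical names.

```python
from statistics import fmean


def average_epochs(epochs):
    """Average a list of epochs (each a list of samples) point by point into an ERP."""
    n_samples = len(epochs[0])
    return [fmean(epoch[t] for epoch in epochs) for t in range(n_samples)]


def mismatch_response(standard_epochs, deviant_epochs):
    """Mismatch response for one subject and one channel: deviant ERP minus standard ERP."""
    erp_standard = average_epochs(standard_epochs)
    erp_deviant = average_epochs(deviant_epochs)
    return [d - s for d, s in zip(erp_deviant, erp_standard)]


# toy data: 3 samples per epoch, values made up for illustration
standard = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]
deviant = [[0.0, -1.0, 4.0], [0.0, -3.0, 2.0]]
print(mismatch_response(standard, deviant))  # [0.0, -4.0, 1.0]
```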

## Imports

The data are processed with the MNE library (`mne`). The `eegyolk` package provides helper modules for loading the metadata, the EEG data and the event markers; these are imported as well.

In [1]:
import mne      # toolbox for analyzing and visualizing EEG data
import os       # using operating system dependent functionality (folders)
import pandas as pd # data analysis and manipulation
import ipywidgets as widgets
from IPython.display import display
import matplotlib.pyplot as plt
import json

import eegyolk
import eegyolk.helper_functions as hf # library useful for eeg and erp data cleaning
import eegyolk.initialization_functions #library to import data
import eegyolk.epod_helper


## Load metadata and eeg files

First, the paths to the different datasets are defined. There are three main paths: EEG, metadata and events (plus a path for the preprocessed epochs); they are read from `config_workspace.json`. The files are loaded with the `initialization_functions` module. All event markers need to be stored in a separate folder; if they have not been saved yet, they are saved with the `initialization_functions` module. The data must be stored in a separate folder called "epod_data_not_pushed" in the ePodium repository.
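A sketch of a loader that checks the configuration before anything else runs. The key names `root`, `dataset`, `metadata`, `events` and `preprocessed` are the ones read in the next cell; the helper `load_workspace_config` and the example paths are hypothetical.

```python
import json
import os
import tempfile

# keys the notebook reads from config_workspace.json
REQUIRED_KEYS = ("root", "dataset", "metadata", "events", "preprocessed")


def load_workspace_config(config_path):
    """Read the workspace config and fail early if a required path is missing."""
    with open(config_path) as json_file:
        config = json.load(json_file)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    return config


# usage example with a temporary config file (hypothetical paths)
example = {
    "root": "data",
    "dataset": "data/dataset",
    "metadata": "data/metadata",
    "events": "data/events",
    "preprocessed": "data/processed_epochs",
}
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config_workspace.json")
    with open(config_path, "w") as f:
        json.dump(example, f)
    config = load_workspace_config(config_path)

print(config["events"])  # data/events
```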

In [27]:
with open("config_workspace.json") as jsonFile:
    jsonObject = json.load(jsonFile)  # the with-block closes the file automatically

data_path = jsonObject['root']
path_eeg = jsonObject['dataset']
path_metadata = jsonObject['metadata']
path_eventmarkers = jsonObject['events']
path_epochs = jsonObject['preprocessed']

print(path_eeg)
print(path_metadata)
print(path_eventmarkers)
print(path_epochs)

../../volume-ceph/ePodium_projectfolder/dataset
../../volume-ceph/ePodium_projectfolder/metadata
../../volume-ceph/ePodium_projectfolder/events
../../volume-ceph/nadine_storage/processed_epochs


In [9]:
# load metadata
files_metadata = ["children.txt", "cdi.txt", "parents.txt", "CODES_overview.txt"]  
children, cdi, parents, codes = eegyolk.initialization_functions.i_load_metadata(path_metadata, files_metadata)

In [12]:
# load eeg
eeg, eeg_filename =  eegyolk.initialization_functions.load_dataset(path_eeg, preload=False) # preload must be set to True once on the cloud

TypeError: load_dataset() got an unexpected keyword argument 'verbose'

In [11]:
# load events 
events_files = os.listdir(path_eventmarkers) if path_eventmarkers else []  # check the path before listing the folder
if len(events_files) == 0: # event markers not yet saved in a separate folder
    eegyolk.initialization_functions.save_event_markers(path_eventmarkers, eeg, eeg_filename) # save event markers

event_markers = eegyolk.initialization_functions.load_events(path_eventmarkers, eeg_filename) # load event markers
event_markers_simplified = eegyolk.epod_helper.group_events_12(event_markers) # simplify events

248 Event Marker files loaded


## Filtering raw EEG 

### Set filter parameters

Below you can set the cutoff frequencies (in Hz) of the bandpass filter: `lowpass` is the lower cutoff and `highpass` the upper cutoff. Both are bounded between 0 and 100 Hz. A common choice is a band of 0.1 to 30 Hz; the default here is 0.1 to 40 Hz.
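A small check on the chosen band, as a sketch. It follows the notebook's naming (`lowpass` = lower cutoff, `highpass` = upper cutoff) and uses a hypothetical sampling rate `sfreq`; the upper cutoff has to stay below the Nyquist frequency `sfreq / 2`. In MNE the two edges correspond to `l_freq` and `h_freq` of `Raw.filter`; that `hf.filter_eeg_raw` passes them on this way is an assumption.

```python
def check_passband(lowpass, highpass, sfreq):
    """Validate the band edges: `lowpass` is the LOWER and `highpass` the UPPER cutoff (Hz)."""
    nyquist = sfreq / 2.0
    if not 0 <= lowpass < highpass:
        raise ValueError("lower cutoff must be >= 0 and below the upper cutoff")
    if highpass >= nyquist:
        raise ValueError(f"upper cutoff must stay below the Nyquist frequency ({nyquist} Hz)")
    return lowpass, highpass


# sfreq = 512 Hz is an assumed sampling rate, for illustration only
print(check_passband(0.1, 40.0, sfreq=512))  # (0.1, 40.0)
```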

In [13]:
lowpass = widgets.BoundedFloatText(
    value=0.1,
    min=0,
    max=100,
    step=0.1,
    description='lowpass:',
    disabled=False
)

highpass = widgets.BoundedFloatText(
    value=40,
    min=0,
    max=100,
    step=0.1,
    description='highpass:',
    disabled=False
)

widgets.VBox([lowpass,highpass])



In [14]:
# read the widget values as floats
lowpass = float(lowpass.value)
highpass = float(highpass.value)

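The notch filter removes power line noise at a set of frequencies. The number of frequencies can be adjusted by changing `n`; by default the widgets hold the first `n` multiples of 60 Hz, so this analysis uses `[60, 120, 180, 240]`.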
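The default values are simply the first `n` harmonics of a 60 Hz line frequency. The hypothetical helper below generates them and drops any harmonic at or above the Nyquist frequency, since such components are not present in the sampled signal; the 512 Hz sampling rate is an assumption. The resulting list has the form MNE's `Raw.notch_filter(freqs)` takes.

```python
def line_noise_harmonics(base_freq, n, sfreq):
    """First n multiples of the line frequency, keeping only those below Nyquist.

    base_freq should match the mains frequency of the recording site.
    """
    nyquist = sfreq / 2.0
    return [k * base_freq for k in range(1, n + 1) if k * base_freq < nyquist]


print(line_noise_harmonics(60, 4, sfreq=512))  # [60, 120, 180, 240]
```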

In [15]:
n = 4
freq = list(widgets.BoundedIntText(
    description='freq[{}]'.format(i),
    min=0,
    max=300,
    step=1,
    value=(i+1)*60)
    for i in range(n))

widgets.VBox(children=freq)


In [16]:
freqs = [f.value for f in freq]  # notch filter frequencies

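Epochs are created by cutting the EEG data into segments around each event. `mne.Epochs` applies baseline correction by default (`baseline=(None, 0)`); artefact rejection only happens when rejection thresholds are passed through the `reject` argument.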
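The two operations mentioned above can be sketched for a single channel. `baseline_correct` subtracts the mean of the pre-stimulus window, mirroring MNE's default `baseline=(None, 0)`; `exceeds_peak_to_peak` is the kind of peak-to-peak test MNE performs when `reject` is given (e.g. `reject=dict(eeg=100e-6)`, in volts). Both functions and the toy values are illustrative, not the eegyolk code.

```python
from statistics import fmean


def baseline_correct(epoch, times, tmin_base=None, tmax_base=0.0):
    """Subtract the mean of the baseline window (default: epoch start up to t = 0)."""
    start = times[0] if tmin_base is None else tmin_base
    baseline = [x for x, t in zip(epoch, times) if start <= t <= tmax_base]
    offset = fmean(baseline)
    return [x - offset for x in epoch]


def exceeds_peak_to_peak(epoch, threshold):
    """True if max - min of the epoch exceeds the threshold, i.e. the epoch should be rejected."""
    return max(epoch) - min(epoch) > threshold


times = [-0.2, -0.1, 0.0, 0.1, 0.2]        # seconds relative to the event
epoch = [2.0, 4.0, 3.0, 10.0, 6.0]          # toy values for one channel
corrected = baseline_correct(epoch, times)  # baseline mean is 3.0
print(corrected)                             # [-1.0, 1.0, 0.0, 7.0, 3.0]
print(exceeds_peak_to_peak(corrected, threshold=100.0))  # False
```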

In [17]:
event_dictionary = eegyolk.epod_helper.event_dictionary
event_dictionary

{'GiepMT_FS': 1,
 'GiepMT_S': 2,
 'GiepMT_D': 3,
 'GiepST_FS': 4,
 'GiepST_S': 5,
 'GiepST_D': 6,
 'GopMT_FS': 7,
 'GopMT_S': 8,
 'GopMT_D': 9,
 'GopST_FS': 10,
 'GopST_S': 11,
 'GopST_D': 12}
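For the mismatch response the standard and deviant codes have to be separated. A minimal sketch, assuming the suffix `_S` marks standards and `_D` deviants (the meaning of `_FS` is not stated here, so it is left out). The event rows follow MNE's events-array layout (sample, previous value, event ID), but the rows themselves are made up.

```python
# copy of the dictionary printed above
event_dictionary = {
    'GiepMT_FS': 1, 'GiepMT_S': 2, 'GiepMT_D': 3,
    'GiepST_FS': 4, 'GiepST_S': 5, 'GiepST_D': 6,
    'GopMT_FS': 7, 'GopMT_S': 8, 'GopMT_D': 9,
    'GopST_FS': 10, 'GopST_S': 11, 'GopST_D': 12,
}


def ids_with_suffix(event_dict, suffix):
    """Event IDs whose label ends with the given suffix, e.g. '_S' or '_D'."""
    return sorted(code for label, code in event_dict.items() if label.endswith(suffix))


standard_ids = ids_with_suffix(event_dictionary, '_S')  # assumption: _S = standard
deviant_ids = ids_with_suffix(event_dictionary, '_D')   # assumption: _D = deviant
print(standard_ids, deviant_ids)  # [2, 5, 8, 11] [3, 6, 9, 12]

# toy events in MNE layout: (sample, previous value, event id)
events = [[100, 0, 2], [600, 0, 3], [1100, 0, 2], [1600, 0, 1]]
n_standard = sum(1 for row in events if row[2] in standard_ids)
n_deviant = sum(1 for row in events if row[2] in deviant_ids)
print(n_standard, n_deviant)  # 2 1
```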

To create the epochs, the time window around each event must be defined: `tmin` and `tmax` are the start and end times (in seconds) relative to each event. The defaults are -0.2 and 0.8.
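In samples, the window runs roughly from `round(tmin * sfreq)` to `round(tmax * sfreq)` around the event sample. The helper below is a hypothetical illustration with an assumed sampling rate of 512 Hz, not how `hf.create_epoch` is implemented.

```python
def epoch_sample_window(event_sample, tmin, tmax, sfreq):
    """First and last sample index (inclusive) of the epoch around one event."""
    start = event_sample + round(tmin * sfreq)
    stop = event_sample + round(tmax * sfreq)
    return start, stop


# assumed sampling rate of 512 Hz, event at sample 1000
start, stop = epoch_sample_window(1000, tmin=-0.2, tmax=0.8, sfreq=512)
print(start, stop, stop - start + 1)  # 898 1410 513
```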

In [18]:
tmin = widgets.BoundedFloatText(
    value=-0.2,
    min=-1,
    max=1,
    step=0.1,
    description='tmin:',
    disabled=False
)

tmax = widgets.BoundedFloatText(
    value=0.8,
    min=-1,
    max=1,
    step=0.1,
    description='tmax:',
    disabled=False
)

widgets.VBox([tmin,tmax])


In [19]:
tmin = float(tmin.value)
tmax = float(tmax.value)

### Filter generator

The filter generator applies a bandpass filter with the cutoffs `lowpass` and `highpass`, and a notch filter that removes power line noise at the frequencies in `freqs`. The reference is subtracted from the raw EEG using `mastoid_channels`, and the channels listed in `drop_ch` are removed. Recordings whose epoch file already exists are skipped.

For each selected event, an epoch is then cut from a time before to a time after the event. The epoch function applies baseline correction automatically.
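The re-referencing and channel dropping described above can be illustrated on a toy recording: at each time point the mean of the mastoid channels is subtracted from every channel, after which the channels in `drop_ch` are removed. This is a sketch with hypothetical channel names and values; in MNE the corresponding calls would be `raw.set_eeg_reference(ref_channels=mastoid_channels)` and `raw.drop_channels(drop_ch)`, but whether `hf.filter_eeg_raw` uses them is not shown here.

```python
from statistics import fmean


def rereference_to_mastoids(data, mastoid_channels, drop_ch):
    """Subtract the mastoid average from every channel, then drop the listed channels.

    data: dict mapping channel name -> list of samples (toy stand-in for a Raw object).
    """
    n_samples = len(next(iter(data.values())))
    reference = [fmean(data[ch][t] for ch in mastoid_channels) for t in range(n_samples)]
    return {
        name: [x - r for x, r in zip(samples, reference)]
        for name, samples in data.items()
        if name not in drop_ch
    }


data = {
    'Fz': [5.0, 6.0, 7.0],    # hypothetical scalp channels
    'Cz': [4.0, 4.0, 4.0],
    'EXG1': [1.0, 2.0, 3.0],  # mastoids
    'EXG2': [3.0, 2.0, 1.0],
}
print(rereference_to_mastoids(data, ['EXG1', 'EXG2'], ['EXG1', 'EXG2']))
# {'Fz': [3.0, 4.0, 5.0], 'Cz': [2.0, 2.0, 2.0]}
```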

In [15]:
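mastoid_channels = ['EXG1', 'EXG2']  # mastoid reference channels
drop_ch = ['EXG1', 'EXG2', 'EXG3', 'EXG4', 'EXG5', 'EXG6', 'EXG7', 'EXG8', 'Status']  # external EXG electrodes and the Status trigger channel

def filter_raweeg_gen(eeg, lowpass, highpass, freqs, mastoid_channels, drop_ch):
    """Yield a filtered Raw object per recording, or None if its epoch file already exists.

    Note: `epoch_folder` and `eeg_filename` are read from the notebook's global scope.
    """
    for i in range(len(eeg)):
        processed_file = os.path.join(epoch_folder, eeg_filename[i] + "_epo.fif")
        if not os.path.exists(processed_file):
            # load the raw data into memory only when it is actually processed
            yield hf.filter_eeg_raw(eeg[i].load_data(), lowpass, highpass, freqs, mastoid_channels, drop_ch)
        else:
            print(f"File {processed_file} already processed")
            yield None  # keeps the index aligned with eeg_filename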
            

In [16]:
# define the output folder before the generator is consumed, since it reads epoch_folder
epoch_folder = os.path.join(data_path, "epochs")
os.makedirs(epoch_folder, exist_ok=True)  # create the folder if it does not exist yet

filtered_eegs = filter_raweeg_gen(eeg, lowpass, highpass, freqs, mastoid_channels, drop_ch)

for idx, single_eeg in enumerate(filtered_eegs):
    if single_eeg is None:  # epoch file already exists, nothing to do
        continue
    epoch = hf.create_epoch(single_eeg, event_markers_simplified[idx], tmin, tmax)
    epoch_file = os.path.join(epoch_folder, eeg_filename[idx] + "_epo.fif")
    epoch.save(epoch_file, overwrite=True)
    print(idx + 1, "saved.")


File F:/Stage/ePODIUM/Data/ePodium_projectfolder\epochs\101a_epo.fif already processed 
File F:/Stage/ePODIUM/Data/ePodium_projectfolder\epochs\101b_epo.fif already processed 
File F:/Stage/ePODIUM/Data/ePodium_projectfolder\epochs\102a_epo.fif already processed 
File F:/Stage/ePODIUM/Data/ePodium_projectfolder\epochs\102b_epo.fif already processed 
File F:/Stage/ePODIUM/Data/ePodium_projectfolder\epochs\103a_epo.fif already processed 
File F:/Stage/ePODIUM/Data/ePodium_projectfolder\epochs\103b_epo.fif already processed 
File F:/Stage/ePODIUM/Data/ePodium_projectfolder\epochs\104a_epo.fif already processed 
File F:/Stage/ePODIUM/Data/ePodium_projectfolder\epochs\104b_epo.fif already processed 
File F:/Stage/ePODIUM/Data/ePodium_projectfolder\epochs\105a_epo.fif already processed 
File F:/Stage/ePODIUM/Data/ePodium_projectfolder\epochs\105b_epo.fif already processed 
File F:/Stage/ePODIUM/Data/ePodium_projectfolder\epochs\106a_epo.fif already processed 
File F:/Stage/ePODIUM/Data/ePodi

## Create DataFrame with metadata and eeg/epoch paths

In [40]:
parents.rename(columns = {'child':'ParticipantID'}, inplace=True)
cdi.rename(columns = {'participant':'ParticipantID'}, inplace=True)

In [41]:
metadata = pd.merge(cdi, children, on="ParticipantID")
metadata = pd.merge(metadata, parents, on="ParticipantID")

In [42]:
metadata.columns

Index(['ParticipantID', 'test', 'sex', 'age_months', 'age_months_days',
       'dyslexic_parent', 'b01 - Geluidseffecten en dierengeluiden',
       'b02 - Dierennamen', 'b03 - Voertuigen', 'b04 - Speelgoed',
       'b05 - Eten en drinken', 'b06 - Kleding', 'b07 - Delen van het lichaam',
       'b08 - Kleine huishoudelijke voorwerpen', 'b09 - Meubels en kamers',
       'b10 - Voorwerpen buitenshuis', 'b11 - Plaatsen buitenshuis',
       'b12 - Mensen', 'b13 - Spelletjes en routines',
       'b14 - Omschrijvende woorden', 'b15 - Werkwoorden',
       'b16 - Woorden over tijd', 'b17 - Voornaamwoorden',
       'b18 - Vragende woordjes', 'b19 - Voorzetsels en plaatsbepalingen',
       'b20 - Hoeveelheden en lidwoorden', 'b21 - Hulpwerkwoorden',
       'b22 - Voegwoorden', 'p01 - Geluidseffecten en dierengeluiden',
       'p02 - Dierennamen', 'p03 - Voertuigen', 'p04 - Speelgoed',
       'p05 - Eten en drinken', 'p06 - Kleding', 'p07 - Delen van het lichaam',
       'p08 - Kleine huishoudelij

In [55]:
metadata = metadata.drop(['age_months_days','b01 - Geluidseffecten en dierengeluiden',
       'b02 - Dierennamen', 'b03 - Voertuigen', 'b04 - Speelgoed',
       'b05 - Eten en drinken', 'b06 - Kleding', 'b07 - Delen van het lichaam',
       'b08 - Kleine huishoudelijke voorwerpen', 'b09 - Meubels en kamers',
       'b10 - Voorwerpen buitenshuis', 'b11 - Plaatsen buitenshuis',
       'b12 - Mensen', 'b13 - Spelletjes en routines',
       'b14 - Omschrijvende woorden', 'b15 - Werkwoorden',
       'b16 - Woorden over tijd', 'b17 - Voornaamwoorden',
       'b18 - Vragende woordjes', 'b19 - Voorzetsels en plaatsbepalingen',
       'b20 - Hoeveelheden en lidwoorden', 'b21 - Hulpwerkwoorden',
       'b22 - Voegwoorden', 'p01 - Geluidseffecten en dierengeluiden',
       'p02 - Dierennamen', 'p03 - Voertuigen', 'p04 - Speelgoed',
       'p05 - Eten en drinken', 'p06 - Kleding', 'p07 - Delen van het lichaam',
       'p08 - Kleine huishoudelijke voorwerpen', 'p09 - Meubels en kamers',
       'p10 - Voorwerpen buitenshuis', 'p11 - Plaatsen buitenshuis',
       'p12 - Mensen', 'p13 - Spelletjes en routines',
       'p14 - Omschrijvende woorden', 'p15 - Werkwoorden',
       'p16 - Woorden over tijd', 'p17 - Voornaamwoorden',
       'p18 - Vragende woordjes', 'p19 - Voorzetsels en plaatsbepalingen',
       'p20 - Hoeveelheden en lidwoorden', 'p21 - Hulpwerkwoorden',
       'p22 - Voegwoorden', 'wB - Woorduiteinden', 'wC - Woordvormen',
       'zE - Zinnen',  'Woordvormen - Ruwe score', 'Woordvormen - Percentiel',
       'Woordvormen - Taalleeftijd', 'Zinnen - Ruwe score',
       'Zinnen - Percentiel', 'Zinnen - Taalleeftijd', 'Woordenschatbegrip - Ruwe score',
       'Woordenschatbegrip - Percentiel', 'Woordenschatbegrip - Taalleeftijd',
       'Woordenschatproductie - Ruwe score',
       'Woordenschatproductie - Percentiel',
       'Woordenschatproductie - Taalleeftijd', 'CDIpresent_a', 'CDIpresent_b',  'Sex', 'Age_original_a', 'Age_days_a', 'Age_months_a',
       'Age_original_b', 'Age_days_b', 'Age_months_b', 'emt_mother','klepel_mother', 'vc_mother', 'dyslexia_mother_accToMother',
       'emt_father', 'klepel_father', 'vc_father',
       'dyslexia_father_accToFather'], axis=1, errors='ignore')  # skip columns that are already gone, so re-running the cell does not raise a KeyError

In [56]:
metadata['eeg_file'] = metadata['ParticipantID'].astype(str) + metadata['test']  # e.g. 105 + 'a' -> '105a'

In [25]:
epoch_folder = os.path.join(data_path, "epochs_clean").replace("\\","/") 

In [57]:
epoch_filename = []

# Iterate directory
for path in os.listdir(path_epochs):
    # check if current path is a file
    if os.path.isfile(os.path.join(path_epochs, path)):
        epoch_filename.append(path)

In [58]:
df_eegfilenames = pd.DataFrame(eeg_filename, columns=['eeg_file'])
df_epochfilenames = pd.DataFrame(epoch_filename, columns=['epoch_file'])

In [59]:
df_epochfilenames['eeg_file'] = df_epochfilenames.epoch_file.str[:4]
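The `str[:4]` slice assumes every epoch file name starts with a four-character recording ID such as `105a`. The sketch below compares it with stripping the known `_epo.fif` suffix; for a longer name like `107b (deel 1+2)` (listed in `drop_files`), the slice yields `107b`, which could match the wrong recording in the merge. Both helper names are hypothetical, and whether such an epoch file exists in `path_epochs` is not shown.

```python
def recording_id_prefix(epoch_file):
    """What the cell above does: keep the first 4 characters."""
    return epoch_file[:4]


def recording_id_suffix(epoch_file, suffix="_epo.fif"):
    """Alternative: strip the known suffix so longer recording IDs survive intact."""
    return epoch_file[:-len(suffix)] if epoch_file.endswith(suffix) else epoch_file


for name in ["105a_epo.fif", "107b (deel 1+2)_epo.fif"]:
    print(repr(recording_id_prefix(name)), repr(recording_id_suffix(name)))
# '105a' '105a'
# '107b' '107b (deel 1+2)'
```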

In [60]:
metadata['path_eeg'] = path_eeg
metadata['path_epoch'] = path_epochs  # alternative: epoch_folder
metadata['path_eventmarkers'] = path_eventmarkers

In [61]:
metadata['path_epoch']

0      ../../volume-ceph/nadine_storage/processed_epochs
1      ../../volume-ceph/nadine_storage/processed_epochs
2      ../../volume-ceph/nadine_storage/processed_epochs
3      ../../volume-ceph/nadine_storage/processed_epochs
4      ../../volume-ceph/nadine_storage/processed_epochs
                             ...                        
217    ../../volume-ceph/nadine_storage/processed_epochs
218    ../../volume-ceph/nadine_storage/processed_epochs
219    ../../volume-ceph/nadine_storage/processed_epochs
220    ../../volume-ceph/nadine_storage/processed_epochs
221    ../../volume-ceph/nadine_storage/processed_epochs
Name: path_epoch, Length: 222, dtype: object

In [62]:
df = pd.merge(df_eegfilenames, metadata, on='eeg_file')
df = pd.merge(df, df_epochfilenames, on='eeg_file')

In [63]:
# recordings excluded from the analysis
drop_files = ["102a", "113a", "107b (deel 1+2)", "132a", "121b(2)", "113b", "107b (deel 3+4)", "147a",
              "121a", "134a", "143b", "121b(1)", "136a", "145b", "150a", "152a", "184a", "165a", "151a", "163a", "179a", "179b", "182b", "186a", "193b"]

df = df[~df['eeg_file'].isin(drop_files)]
df = df[df['test'] != "b"].reset_index(drop=True)  # keep only test 'a' recordings

In [64]:
df.to_csv('metadata.csv', index=False)

In [65]:
df

| | eeg_file | ParticipantID | test | sex | age_months | dyslexic_parent | Group_AccToParents | path_eeg | path_epoch | path_eventmarkers | epoch_file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 105a | 105 | a | f | 17 | f | At risk | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 105a_epo.fif |
| 1 | 107a | 107 | a | f | 16 | m | At risk | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 107a_epo.fif |
| 2 | 106a | 106 | a | m | 19 | f | At risk | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 106a_epo.fif |
| 3 | 109a | 109 | a | m | 21 | m | At risk | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 109a_epo.fif |
| 4 | 110a | 110 | a | m | 17 | m | At risk | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 110a_epo.fif |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 96 | 209a | 209 | a | m | 17 | f | At risk | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 209a_epo.fif |
| 97 | 210a | 210 | a | m | 19 | m | At risk | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 210a_epo.fif |
| 98 | 202a | 202 | a | f | 16 | Nee | Control | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 202a_epo.fif |
| 99 | 201a | 201 | a | m | 18 | m | At risk | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 201a_epo.fif |
