# Model input ERP format preparation 

In this notebook: 
- Necessary inputs
- Read all epochs
- Function to create dataframe with average mismatch response for all participants
- Formatting dataframe as suitable model input

## Imports

In [1]:
import mne      # toolbox for analyzing and visualizing EEG data
import os       # using operating system dependent functionality (folders)
import pandas as pd # data analysis and manipulation
import numpy as np    # numerical computing (manipulating and performing operations on arrays of data)
import ipywidgets as widgets
from IPython.display import display
from numpy import trapz
from varname import nameof 

import sys
sys.path.insert(0, '../eegyolk') # path to helper functions
#import eegyolk
import helper_functions as hf # library useful for eeg and erp data cleaning
import initialization_functions #library to import data
import epod_helper

In [2]:
metadata = pd.read_csv('metadata.csv', sep = ',')

In [3]:
metadata.head()

| Unnamed: 0 | eeg_file | ParticipantID | test | sex | age_months | dyslexic_parent | Group_AccToParents | path_eeg | path_epoch | path_eventmarkers | epoch_file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 105a | 105 | a | f | 17 | f | 1 | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 105a_epo.fif |
| 1 | 107a | 107 | a | f | 16 | m | 1 | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 107a_epo.fif |
| 2 | 106a | 106 | a | m | 19 | f | 0 | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 106a_epo.fif |
| 3 | 109a | 109 | a | m | 21 | m | 0 | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 109a_epo.fif |
| 4 | 110a | 110 | a | m | 17 | m | 1 | ../../volume-ceph/ePodium_projectfolder/dataset | ../../volume-ceph/nadine_storage/processed_epochs | ../../volume-ceph/ePodium_projectfolder/events | 110a_epo.fif |


## Create pandas dataframe with the average difference between standard and deviant responses

The function below takes `metadata`, a list of channel names, and the standard and deviant events as input; it loads the epochs itself through `initialization_functions.read_filtered_data`. Define the standard and deviant events as lists of event names. `input_mmr_prep` assumes that each deviant follows its standard event, so the deviant belonging to a standard has the standard's event number + 1. Make sure your events are numbered this way; otherwise the function will not calculate the mismatch response as intended.
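The numbering assumption described above can be checked before running the function. The sketch below is a minimal illustration: the `event_dictionary` values and the helper `check_event_numbering` are hypothetical and not part of the notebook or of `eegyolk`; substitute the event IDs from your own event markers.

```python
# Hypothetical event IDs for illustration only; use the IDs from your own data.
event_dictionary = {
    'GiepM_S': 2, 'GiepM_D': 3,
    'GiepS_S': 5, 'GiepS_D': 6,
    'GopM_S': 8, 'GopM_D': 9,
    'GopS_S': 11, 'GopS_D': 12,
}

standard_events = ['GiepM_S', 'GiepS_S', 'GopM_S', 'GopS_S']
deviant_events = ['GiepM_D', 'GiepS_D', 'GopM_D', 'GopS_D']


def check_event_numbering(event_dictionary, standard_events, deviant_events):
    """Return the (standard, deviant) pairs that violate deviant == standard + 1."""
    mismatched = []
    for std_name, dev_name in zip(standard_events, deviant_events):
        if event_dictionary[dev_name] != event_dictionary[std_name] + 1:
            mismatched.append((std_name, dev_name))
    return mismatched


mismatched_pairs = check_event_numbering(event_dictionary, standard_events, deviant_events)
print(mismatched_pairs)  # an empty list means every pair follows the convention
```

The check pairs the two lists by position, so the standard and deviant lists must be given in the same order.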

In [5]:
def input_mmr_prep(metadata, chnames_list, standard_events, deviant_events):
    """Return one row per participant and channel with summary statistics of the mismatch response."""
    columns = ['eeg_file', 'channel', 'std', 'sur', 'min', 'max']
    rows = []  # collect rows in a list; DataFrame.append was removed in pandas 2.0
    epochs = initialization_functions.read_filtered_data(metadata)

    # loop over all participants
    for i in range(len(metadata['eeg_file'])):
        std_evoked = epochs[i][standard_events].average()
        dev_evoked = epochs[i][deviant_events].average()
        # mismatch response: standard minus deviant evoked (weights=[1, -1])
        mmr_evoked = mne.combine_evoked([std_evoked, dev_evoked], weights=[1, -1])

        for channel in chnames_list:
            # 1-D time series for this channel, flattened without hard-coding the number of samples
            evoked_diff = mmr_evoked.get_data(picks=channel).ravel()

            mmr_std = evoked_diff.std()
            mmr_sur = trapz(evoked_diff)  # area under the curve with unit sample spacing
            mmr_min = evoked_diff.min()
            mmr_max = evoked_diff.max()

            #zerocross = 0
            #for j in range(1, len(evoked_diff)):
            #    if evoked_diff[j-1] > 0 and evoked_diff[j] < 0:
            #        zerocross += 1
            #    if evoked_diff[j-1] < 0 and evoked_diff[j] > 0:
            #        zerocross += 1
            #mmr_zero = zerocross

            # add 'paradigm': paradigm if we want to separate the paradigms
            rows.append({'eeg_file': metadata['eeg_file'][i], 'channel': channel, 'std': mmr_std, 'sur': mmr_sur, 'min': mmr_min, 'max': mmr_max})
    return pd.DataFrame(rows, columns=columns)
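To make the four per-channel features concrete, here is a minimal sketch on synthetic signals that does not require MNE or the ePodium data. The helper name `mmr_features` and the input arrays are illustrative assumptions; the area is computed with an explicit trapezoidal rule at unit sample spacing, which is what `trapz` returns when no spacing is given.

```python
import numpy as np


def mmr_features(std_signal, dev_signal):
    """Summarise the difference wave (standard minus deviant) of one channel."""
    diff = np.asarray(std_signal, dtype=float) - np.asarray(dev_signal, dtype=float)
    # Trapezoidal rule with unit spacing: mean of neighbouring samples, summed.
    area = ((diff[1:] + diff[:-1]) / 2.0).sum()
    return {'std': diff.std(), 'sur': area, 'min': diff.min(), 'max': diff.max()}


# Synthetic signals, for illustration only.
std_signal = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
dev_signal = np.zeros(5)

features = mmr_features(std_signal, dev_signal)
print(features)
```

Note that `std` here is the population standard deviation (NumPy's default, `ddof=0`), matching the `evoked_diff.std()` call in the function above.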

In [22]:
# define the events for standard and deviant
standard_events = ['GiepM_S', 'GiepS_S', 'GopM_S', 'GopS_S']
deviant_events = ['GiepM_D', 'GiepS_D', 'GopM_D', 'GopS_D']
ch_complete = ['Fp1', 'AF3', 'F7', 'F3', 'FC1', 'FC5', 'T7', 'C3', 'CP1', 'CP5', 'P7', 'P3', 'Pz', 'PO3', 'O1', 'Oz', 'O2', 'PO4', 'P4', 'P8', 'CP6', 'CP2', 'C4', 'T8', 'FC6', 'FC2', 'F4', 'F8', 'AF4', 'Fp2', 'Fz', 'Cz']
ch_ttest = [ 'FC5', 'Pz', 'O1', 'PO4', 'AF4']
ch_literature = ['Fp1', 'F7', 'F3', 'FC1', 'FC5', 'O1', 'Oz', 'O2', 'Fz']
ch_connectivity = ['Fz', 'Oz']

# specify which one you want 
ch_name = nameof(ch_complete)
ch_type = ch_complete

df = input_mmr_prep(metadata, ch_type, standard_events, deviant_events)

Checking out file: 105a_epo.fif
Checking out file: 107a_epo.fif
Checking out file: 106a_epo.fif
Checking out file: 109a_epo.fif
Checking out file: 110a_epo.fif
Checking out file: 112a_epo.fif
Checking out file: 111a_epo.fif
Checking out file: 114a_epo.fif
Checking out file: 115a_epo.fif
Checking out file: 117a_epo.fif
Checking out file: 116a_epo.fif
Checking out file: 118a_epo.fif
Checking out file: 119a_epo.fif
Checking out file: 123a_epo.fif
Checking out file: 122a_epo.fif
Checking out file: 124a_epo.fif
Checking out file: 127a_epo.fif
Checking out file: 125a_epo.fif
Checking out file: 126a_epo.fif
Checking out file: 130a_epo.fif
Checking out file: 128a_epo.fif
Checking out file: 129a_epo.fif
Checking out file: 131a_epo.fif
Checking out file: 135a_epo.fif
Checking out file: 133a_epo.fif
Checking out file: 137a_epo.fif
Checking out file: 138a_epo.fif
Checking out file: 139a_epo.fif
Checking out file: 141a_epo.fif
Checking out file: 144a_epo.fif
Checking out file: 143a_epo.fif
Checking


## Pivot dataframe to one row per participant and merge metadata

We now want a single row per participant, with one column for every combination of statistic and channel (for example `std_Fp1`). The code below pivots the dataframe accordingly. It then merges in the metadata, so that each row also carries the sex and the label (`Group_AccToParents`) of the participant; the remaining metadata columns, including `age_months`, are dropped.

In [23]:
df

| | eeg_file | channel | std | sur | min | max |
|---|---|---|---|---|---|---|
| 0 | 105a | Fp1 | 2.775038e-06 | 0.002521 | -3.144580e-06 | 8.302819e-06 |
| 1 | 105a | AF3 | 1.957511e-06 | 0.003108 | -1.822005e-06 | 6.940042e-06 |
| 2 | 105a | F7 | 4.514743e-06 | 0.009751 | -1.726272e-06 | 1.665358e-05 |
| 3 | 105a | F3 | 2.575544e-06 | 0.005209 | -1.083537e-06 | 9.371592e-06 |
| 4 | 105a | FC1 | 6.107233e-07 | -0.000770 | -2.035549e-06 | 8.182521e-07 |
| ... | ... | ... | ... | ... | ... | ... |
| 3227 | 221a | F8 | 1.078274e-06 | -0.000175 | -2.980455e-06 | 2.999743e-06 |
| 3228 | 221a | AF4 | 1.042879e-06 | -0.000597 | -2.944089e-06 | 2.021073e-06 |
| 3229 | 221a | Fp2 | 8.770587e-07 | -0.000533 | -3.053459e-06 | 1.699875e-06 |
| 3230 | 221a | Fz | 9.737539e-07 | -0.001715 | -2.970754e-06 | 1.104267e-06 |


In [24]:
# drop duplicates 
df = df.drop_duplicates(subset=['eeg_file','channel']) 

# transformation of the dataframe
df = df.pivot(index='eeg_file', columns=['channel'])
df.columns = ['_'.join(str(s).strip() for s in col if s) for col in df.columns]
df.reset_index(inplace=True)

# merge data with dependent feature 
df = pd.merge(df, metadata, on='eeg_file')
pd.set_option('display.max_columns', None)

# some cleaning
df = df.drop(['eeg_file',
       'dyslexic_parent', 'path_eeg','path_epoch',
       'epoch_file', 'path_eventmarkers', 'age_months','test','ParticipantID'], axis =1)
df['sex'] = np.where(
    (df['sex']=='m'), 1,0)
first = df.pop('Group_AccToParents')
df.insert(0, 'Group_AccToParents', first)
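As a self-contained illustration of the pivot and column-flattening step, the sketch below applies the same calls to a small toy dataframe. The participant names `p1` and `p2` and all values are made up for the example.

```python
import pandas as pd

# Toy long-format data: one row per (participant, channel), values are made up.
long_df = pd.DataFrame({
    'eeg_file': ['p1', 'p1', 'p2', 'p2'],
    'channel': ['Fz', 'Oz', 'Fz', 'Oz'],
    'std': [0.1, 0.2, 0.3, 0.4],
    'max': [1.0, 2.0, 3.0, 4.0],
})

# Pivot to one row per participant; the columns become (statistic, channel) pairs.
wide_df = long_df.pivot(index='eeg_file', columns='channel')

# Flatten the two-level column index into names such as 'std_Fz'.
wide_df.columns = ['_'.join(col) for col in wide_df.columns]
wide_df = wide_df.reset_index()

print(wide_df.columns.tolist())
print(wide_df)
```

`pivot` raises an error when an (`eeg_file`, `channel`) pair occurs more than once, which is why the notebook drops duplicates first.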

## Save dataframe

In [25]:
df.to_csv('df_mmr_' + ch_name + '.csv', index=False) # save dataframe

In [None]:
df
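One possible way to use the resulting dataframe as model input is to separate the label column from the features. This is a sketch under assumptions: the toy `model_df` below only mimics the layout produced above (label first, then feature columns), and the names `X` and `y` are conventions, not something the notebook defines.

```python
import pandas as pd

# Toy dataframe mimicking the final layout; values are made up.
model_df = pd.DataFrame({
    'Group_AccToParents': [1, 0, 1],
    'std_Fz': [0.1, 0.2, 0.3],
    'max_Fz': [1.0, 2.0, 3.0],
    'sex': [0, 1, 1],
})

# Label (dependent variable) and feature matrix.
y = model_df['Group_AccToParents']
X = model_df.drop(columns=['Group_AccToParents'])

print(X.shape, y.shape)
```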