# Model input ERP format preparation 

In this notebook: 
- Necessary inputs
- Read epochs from `metadata.csv`
- Select approach
- Function to create dataframe with average mismatch response for all participants
- Reformatting dataframe as suitable model input
- Save output to `df_mmr_ch_approach.csv` 

## Imports

In [1]:
import mne      # toolbox for analyzing and visualizing EEG data
import os       # using operating system dependent functionality (folders)
import pandas as pd # data analysis and manipulation
import numpy as np    # numerical computing (manipulating and performing operations on arrays of data)
import ipywidgets as widgets
from IPython.display import display
from numpy import trapz
from varname import nameof 
import scipy.stats as stats

import eegyolk
from eegyolk import initialization_functions #library to import data

In [2]:
metadata = pd.read_csv('metadata.csv', sep = ',')

## Create pandas dataframe with the average difference between standard and deviant responses

The function below takes `metadata`, the channels of interest, and the standard and deviant event names as input. Define the standard and deviant events as lists of event names. Our research compares four approaches, each with its own set of channels.

In [3]:
def input_mmr_prep(metadata, chnames_list, standard_events, deviant_events):
    # read the epochs of every participant listed in the metadata
    epochs = initialization_functions.read_filtered_data(metadata)
    rows = []  # one dict per participant/channel combination

    # loop over all participants
    for i, eeg_file in enumerate(metadata['eeg_file']):
        std_evoked = epochs[i][standard_events].average()
        dev_evoked = epochs[i][deviant_events].average()

        # mismatch response (standard minus deviant), computed once per participant
        evoked_diff_all = mne.combine_evoked([std_evoked, dev_evoked], weights=[1, -1])

        for channel in chnames_list:
            # flatten the (1, n_times) array to 1-D instead of hard-coding 2049 samples
            evoked_diff = evoked_diff_all.get_data(picks=channel).ravel()

            rows.append({'eeg_file': eeg_file, 'channel': channel,
                         'std': evoked_diff.std(),    # standard deviation of the amplitudes
                         'sur': trapz(evoked_diff),   # area under the curve (trapezoidal rule)
                         'min': evoked_diff.min(),
                         'max': evoked_diff.max()})   # add 'paradigm': paradigm if we want to separate the paradigms

    # build the dataframe once; DataFrame.append was removed in pandas 2.0
    return pd.DataFrame(rows, columns=['eeg_file', 'channel', 'std', 'sur', 'min', 'max'])
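
To make the four summary features concrete, the sketch below computes them for a small synthetic wave. It is an illustration only: `summarize_wave` and `toy_wave` are hypothetical names that do not appear in the notebook, and the values are made up rather than taken from EEG data. Note that the trapezoidal rule is applied with unit sample spacing, as in the function above, so the area is expressed per sample rather than per second.

```python
import numpy as np

# np.trapezoid is the NumPy 2.x name; older releases only provide np.trapz
trapezoid = getattr(np, "trapezoid", None) or np.trapz


def summarize_wave(wave):
    """Return the four summary features for a 1-D difference wave."""
    wave = np.asarray(wave, dtype=float).ravel()
    return {
        "std": wave.std(),       # spread of the amplitudes
        "sur": trapezoid(wave),  # signed area under the curve (unit sample spacing)
        "min": wave.min(),       # most negative deflection
        "max": wave.max(),       # most positive deflection
    }


# synthetic stand-in for one channel of a difference wave (not real EEG data)
toy_wave = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -1.0, 0.0])
features = summarize_wave(toy_wave)
print(features)
```

Because the area is signed, positive and negative deflections cancel each other out, which is why `sur` can be close to zero even for a wave with large peaks.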

The four approaches are saved in the variables `ch_complete`, `ch_ttest`, `ch_literature` and `ch_connectivity`. `ch_complete` contains all channels and therefore serves as a baseline. `ch_literature` is based on related literature. `ch_ttest` and `ch_connectivity` are obtained from the `data_analysis.ipynb` notebook. Make sure to set the variables `ch_name` and `ch_type` to the approach you want to use. This changes both the input channels and the name of the output file.
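
One possible way to keep the approach name and its channel list in sync is a dictionary lookup, so that only a single value has to be changed. This is a minimal sketch and not what the notebook does: `approaches`, `selected` and `output_file` are hypothetical names, and only two of the four channel sets are repeated here for brevity.

```python
# channel lists as defined in the notebook; only two approaches shown
approaches = {
    "ch_ttest": ['F3', 'FC5', 'T8', 'F8', 'AF4'],
    "ch_connectivity": ['Fz', 'Fp1', 'Fp2'],
}

selected = "ch_connectivity"    # the only value to change when switching approach

ch_name = selected              # used in the output file name
ch_type = approaches[selected]  # channels passed to input_mmr_prep
output_file = 'df_mmr_' + ch_name + '.csv'
print(output_file, ch_type)
```

A misspelled approach name then raises a `KeyError` immediately instead of silently pairing the wrong name with the wrong channels.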

In [4]:
# define the events for standard and deviant
standard_events = ['GopS_S'] #'GiepM_S','GiepS_S','GopM_S','GopS_S'
deviant_events = ['GopS_D'] #'GiepM_D','GiepS_D','GopM_D','GopS_D'

# define 4 different approaches
ch_complete = ['Fp1', 'AF3', 'F7', 'F3', 'FC1', 'FC5', 'T7', 'C3', 'CP1', 'CP5', 'P7', 'P3', 'Pz', 'PO3', 'O1', 'Oz', 'O2', 'PO4', 'P4', 'P8', 'CP6', 'CP2', 'C4', 'T8', 'FC6', 'FC2', 'F4', 'F8', 'AF4', 'Fp2', 'Fz', 'Cz']
ch_ttest = [ 'F3', 'FC5', 'T8', 'F8', 'AF4']
ch_literature = ['Fp1', 'F3', 'Fz', 'F4', 'F8', 'T7', 'C3', 'Cz', 'C4', 'AF3', 'P7']
ch_connectivity = ['Fz', 'Fp1', 'Fp2']

# specify which one you want 
ch_name = nameof(ch_connectivity)
ch_type = ch_connectivity

df = input_mmr_prep(metadata, ch_type, standard_events, deviant_events)

Checking out file: 105a_epo.fif
Checking out file: 107a_epo.fif
Checking out file: 106a_epo.fif
Checking out file: 109a_epo.fif
Checking out file: 110a_epo.fif
Checking out file: 112a_epo.fif
Checking out file: 111a_epo.fif
Checking out file: 114a_epo.fif
Checking out file: 115a_epo.fif
Checking out file: 117a_epo.fif
Checking out file: 116a_epo.fif
Checking out file: 118a_epo.fif
Checking out file: 119a_epo.fif
Checking out file: 123a_epo.fif
Checking out file: 122a_epo.fif
Checking out file: 124a_epo.fif
Checking out file: 127a_epo.fif
Checking out file: 125a_epo.fif
Checking out file: 126a_epo.fif
Checking out file: 130a_epo.fif
Checking out file: 128a_epo.fif
Checking out file: 129a_epo.fif
Checking out file: 131a_epo.fif
Checking out file: 135a_epo.fif
Checking out file: 133a_epo.fif
Checking out file: 137a_epo.fif
Checking out file: 138a_epo.fif
Checking out file: 139a_epo.fif
Checking out file: 141a_epo.fif
Checking out file: 144a_epo.fif
Checking out file: 143a_epo.fif

## Pivot dataframe to one row per participant and merge with metadata

We now want a single row for every participant, with one column for each combination of feature and channel. The code below pivots the dataframe into that shape and then merges it with `metadata` to add the participant's label, `Group_AccToParents`. The cleaning step drops the remaining metadata columns, including age and sex, and moves the label to the first column.

In [5]:
# drop duplicates 
df = df.drop_duplicates(subset=['eeg_file','channel']) 

# transformation of the dataframe
df = df.pivot(index='eeg_file', columns=['channel'])
df.columns = ['_'.join(str(s).strip() for s in col if s) for col in df.columns]
df.reset_index(inplace=True)

# merge data with dependent feature 
df = pd.merge(df, metadata, on='eeg_file')
pd.set_option('display.max_columns', None)

# some cleaning
df = df.drop(['eeg_file',
       'dyslexic_parent', 'path_eeg','path_epoch',
       'epoch_file', 'path_eventmarkers', 'age_months','test','ParticipantID', 'sex'], axis =1)
first = df.pop('Group_AccToParents')
df.insert(0, 'Group_AccToParents', first)
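
The pivot and the column flattening can be hard to picture, so here is a small sketch on a toy dataframe. The file names and values are invented for illustration, and `long_df` and `wide_df` are hypothetical names; only the `pivot` and join steps mirror the cell above.

```python
import pandas as pd

# long format: one row per participant/channel combination (made-up values)
long_df = pd.DataFrame({
    "eeg_file": ["a", "a", "b", "b"],
    "channel": ["Fz", "Fp1", "Fz", "Fp1"],
    "std": [1.0, 2.0, 3.0, 4.0],
    "min": [-1.0, -2.0, -3.0, -4.0],
})

# wide format: one row per participant, columns become (feature, channel) pairs
wide_df = long_df.pivot(index="eeg_file", columns="channel")

# flatten the two-level column index into names such as 'std_Fz'
wide_df.columns = ["_".join(col) for col in wide_df.columns]
wide_df = wide_df.reset_index()
print(wide_df)
```

Each participant ends up on a single row, and every feature appears once per channel, which matches the column names in the final dataframe (for example `std_Fp1` or `min_Fz`).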

## Save dataframe

Here the dataframe is saved with the selected channel approach in the file name. Make sure that the csv is saved in the same folder as this notebook, so it can be used in the next notebook `model_prediction_ml.ipynb`.

In [6]:
df.to_csv('df_mmr_' + ch_name + '.csv', index=False) # save dataframe
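
As a quick sanity check of the saved format, the sketch below writes a toy dataframe with `index=False` and reads it back. It uses an in-memory buffer instead of a file, and `toy`, `buffer` and `restored` are hypothetical names with made-up values.

```python
import io

import pandas as pd

# toy stand-in for the model input dataframe (made-up values)
toy = pd.DataFrame({"Group_AccToParents": [1, 0], "std_Fz": [0.5, 0.25]})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)  # index=False keeps the row index out of the csv
buffer.seek(0)

restored = pd.read_csv(buffer)
print(list(restored.columns))
```

Without `index=False`, the row index would be written as an extra unnamed column, which `pd.read_csv` reads back as `Unnamed: 0`.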

In [7]:
df

| | Group_AccToParents | std_Fp1 | std_Fp2 | std_Fz | sur_Fp1 | sur_Fp2 | sur_Fz | min_Fp1 | min_Fp2 | min_Fz | max_Fp1 | max_Fp2 | max_Fz |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.000002 | 0.000002 | 0.000002 | 0.001055 | -0.000709 | -0.003491 | -0.000004 | -0.000005 | -0.000008 | 0.000006 | 0.000004 | 0.000004 |
| 1 | 1 | 0.000003 | 0.000003 | 0.000008 | -0.004084 | -0.007121 | 0.003053 | -0.000010 | -0.000009 | -0.000032 | 0.000006 | 0.000007 | 0.000019 |
| 2 | 1 | 0.000003 | 0.000003 | 0.000002 | -0.003703 | -0.003768 | -0.004645 | -0.000008 | -0.000009 | -0.000006 | 0.000004 | 0.000005 | 0.000003 |
| 3 | 1 | 0.000004 | 0.000004 | 0.000002 | 0.007855 | 0.009944 | 0.002387 | -0.000005 | -0.000004 | -0.000003 | 0.000010 | 0.000011 | 0.000005 |
| 4 | 0 | 0.000003 | 0.000004 | 0.000005 | 0.001865 | 0.007250 | 0.011927 | -0.000006 | -0.000005 | -0.000005 | 0.000007 | 0.000012 | 0.000016 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 104 | 0 | 0.000007 | 0.000007 | 0.000006 | -0.009766 | -0.007542 | 0.009053 | -0.000023 | -0.000023 | -0.000010 | 0.000009 | 0.000011 | 0.000020 |
| 105 | 1 | 0.000003 | 0.000003 | 0.000002 | 0.003111 | 0.002956 | -0.000141 | -0.000004 | -0.000004 | -0.000004 | 0.000010 | 0.000010 | 0.000005 |
| 106 | 1 | 0.000005 | 0.000005 | 0.000003 | -0.015419 | -0.015060 | -0.005327 | -0.000017 | -0.000015 | -0.000009 | 0.000002 | 0.000003 | 0.000003 |
| 107 | 0 | 0.000007 | 0.000006 | 0.000005 | -0.024567 | -0.020106 | -0.013568 | -0.000024 | -0.000020 | -0.000016 | 0.000004 | 0.000006 | 0.000003 |
