Before analysing the data we have to do some more data wrangling. We have to take our two dataframes of interest and merge them into one time-domain dataframe. We then import this combined dataframe into MNE and make it understandable for this toolbox by creating so-called event arrays.

We assume that we have a `stim_df` (in the stimulus domain, one row per stimulus) and a `data_df` (in the time domain, one row per sample); in the code below they are called `df_beh` and `df`. The `stim_df` should contain all the regressors we want to use, whereas the `data_df` should contain all the data we want to predict (the 'y' timecourse).
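To make the expected layout concrete, here is a minimal sketch that builds tiny toy versions of both dataframes. The column names follow the defaults used by the functions below (`stimulus`, `timing_meg`, `timing_offset_meg`, `block`, `segment`, `run`, `TIMESTAMP`, `BLOCK`); the helper `make_toy_data`, the channel name `MEG001` and all numbers (sampling rate, durations, surprisal values) are hypothetical.

```python
import numpy as np
import pandas as pd


def make_toy_data(n_stim=4, sfreq=500.0, stim_dur=0.008, isi=0.010, t0=1.24):
    """hypothetical helper: build a small stim_df (one row per stimulus) and
    data_df (one row per sample), both on the same (MEG) clock"""
    onsets = t0 + isi * np.arange(n_stim)
    stim_df = pd.DataFrame({
        'stimulus': np.arange(1, n_stim + 1),           # 1, 2, 3, ... (0 is reserved for 'no stimulus')
        'timing_meg': onsets,                            # onset time (s)
        'timing_offset_meg': onsets + stim_dur,          # offset time (s)
        'block': 1,
        'segment': np.repeat([1, 2], [n_stim // 2, n_stim - n_stim // 2]),
        'run': 1,
        'surprisal_a': np.linspace(0.5, 2.0, n_stim),   # made-up regressor values
    })

    # one sample every 1 / sfreq seconds, covering all stimuli plus a small margin
    timestamps = np.arange(t0 - isi, onsets[-1] + stim_dur + isi, 1 / sfreq)
    data_df = pd.DataFrame({
        'TIMESTAMP': timestamps,
        'BLOCK': 1,
        'MEG001': np.random.default_rng(0).standard_normal(len(timestamps)),  # fake 'y' timecourse
    })
    return stim_df, data_df
```

Calling `stim_df, data_df = make_toy_data()` gives two dataframes that can be passed through the functions below in place of real data.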

### Step 1. Merging dataframes into time-domain data
As a first step we map the `stim_df` into the time domain, going from a structure with one row per stimulus to a structure with many time points per stimulus.

Here we assume that both the `stim_df` and the `data_df` have a column with clock information, and that these clocks match (see the note on timing at the end of this notebook).
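Because everything below relies on both clocks matching, a quick sanity check before mapping can save a lot of debugging. This is a minimal sketch, assuming the default column names used by the functions below; the helper name `check_clock_overlap` is hypothetical.

```python
import numpy as np
import pandas as pd


def check_clock_overlap(df_beh, df, onset_nm='timing_meg', offset_nm='timing_offset_meg', ts_nm='TIMESTAMP'):
    """hypothetical helper: return the fraction of stimuli whose [onset, offset]
    lies completely inside the time range of the time-domain dataframe"""
    t_min, t_max = df[ts_nm].min(), df[ts_nm].max()
    inside = (df_beh[onset_nm] >= t_min) & (df_beh[offset_nm] <= t_max)
    return float(inside.mean())
```

A value well below 1 usually means that the stimulus timings are still on the stimulus PC clock and have not been synced yet.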

In [None]:
## FUNCTIONS TO MOVE DATA FROM THE STIMULUS DOMAIN INTO THE TIME DOMAIN
## 'stimulus' HOLDS THE INDEX OF THE CURRENT STIMULUS (0 = NONE), FOR EASY MAPPING
import numpy as np

def stim_save_segments(df_beh, groupby_nm=('block', 'segment')):
    """add a 'segment_all' column to the behavioural dataframe: a continuous
    numerical index (0, 1, 2, ...) of the segment each stimulus belongs to"""

    # number every unique combination of the groupby columns (e.g. block x segment);
    # ngroup() counts in sorted group order, the same order as groupby(...).first()
    df_beh['segment_all'] = df_beh.groupby(list(groupby_nm)).ngroup()

    # return the dataframe
    return df_beh


def map_stim_to_time(df, df_beh, cn_stim='stimulus', cn_ts='TIMESTAMP',
                     beh_cn_onset='timing_meg', beh_cn_offset='timing_offset_meg'):
    """transform / map stimulus dataframe into the time domain
    input: df: Pandas dataframe - time domain
           df_beh: Pandas dataframe - stim domain, sorted by onset time
           cn_stim: (optional) column name for stimulus indicator (in df_beh, and created in df)
           cn_ts: (optional) column name for timestamp indicator in time df
           beh_cn_onset: (optional) column name for onset time in df_beh
           beh_cn_offset: (optional) column name for offset time in df_beh"""

    # apply _assign_stimulus to every timestamp to create the stimulus column in df
    df[cn_stim] = df[cn_ts].apply(_assign_stimulus,
                                  df_beh=df_beh,
                                  cn_onset=beh_cn_onset,
                                  cn_offset=beh_cn_offset,
                                  cn_stimulus=cn_stim)
    # return the dataframe
    return df


def map_columns_to_time(df, df_beh, col_to_trans, indicator_nm='stimulus'):
    """map columns of interest to transfer to timedomain
    df: dataframe in timedomain
    df_beh: dataframe in stim domain
    col_to_trans: all columns to transfer
    indicator_nm: (optional) indicator name - what to use for the mapping"""
    
    # loop over columns to transfer
    for colnm in col_to_trans:

        # create a dictionary to map 'stimulus' to all conditions I want to transfer to the other df
        stimulus_to_col = df_beh.set_index(indicator_nm)[colnm].to_dict()

        # map function to go from one to another
        df[colnm] = df[indicator_nm].map(stimulus_to_col).fillna(0)
        
    # returns dataframe
    return(df)


def map_block_to_run(df, df_beh, run_nm='run', block_nm_beh='block', block_nm='BLOCK'):
    """map from blocknumber to run number in timedomain"""

    # create a dictionary to map 'block' to run
    map_block_run = df_beh.set_index(block_nm_beh)[run_nm].to_dict()

    # map the df
    df[run_nm] = df[block_nm].map(map_block_run).fillna(0).astype(int)

    # return dataframe
    return(df)


def time_save_segments(df, df_beh,
                       groupby_nm=('block', 'segment'),
                       onset_nm='timing_meg',
                       offset_nm='timing_offset_meg',
                       timing_nm='TIMESTAMP'
                      ):
    """save segments into the time-domain dataframe
    df: time-domain dataframe
    df_beh: stimulus-domain dataframe
    groupby_nm: (optional) column names to group by (e.g. block and segment)
    onset_nm: (optional) what to use as onset timings - same clock as timing_nm
    offset_nm: (optional) what to use as offset timings - same clock as timing_nm
    timing_nm: (optional) time indicator in the time-domain dataframe"""

    groupby_nm = list(groupby_nm)

    # onset of the first and offset of the last stimulus of every group
    onset_df = df_beh.groupby(groupby_nm).first()[onset_nm].reset_index()
    offset_df = df_beh.groupby(groupby_nm).last()[offset_nm].reset_index()

    # predefine all new columns in our time-domain dataframe
    for col in groupby_nm + ['segment_all']:
        df[col] = np.nan

    # loop over all groups (idx follows the same sorted group order as stim_save_segments)
    for idx, row in onset_df.iterrows():

        # select all samples between the start and end time of this group
        mask = (df[timing_nm] >= row[onset_nm]) & (df[timing_nm] <= offset_df[offset_nm].iloc[idx])

        # copy the group labels (e.g. block and segment) and save the per segment indicator
        for col in groupby_nm:
            df.loc[mask, col] = row[col]
        df.loc[mask, 'segment_all'] = idx

    return df


def time_save_onoff(df, onoff_nm='onoff', indicator='stimulus'):
    """save an on/off column (0/1) that is 1 wherever the indicator column is nonzero"""

    # 1 wherever there is any stimulus, 0 elsewhere
    df[onoff_nm] = (df[indicator] > 0).astype(int)

    return df


# create a function to assign stimuli based on timing
def _assign_stimulus(timing,
                     df_beh,
                     cn_onset='timing_meg',
                     cn_offset='timing_offset_meg',
                     cn_stimulus='stimulus'):
    """pandas apply function to get stimuli into the time domain
    input: timing (one timestamp), df_beh (sorted by onset), cn_onset (optional column name of onset time),
    cn_offset (optional column name of offset time), cn_stimulus (optional column name of stimulus)
    returns the stimulus with onset <= timing < offset, or 0 if no stimulus is on"""
    # position after the last onset <= timing; side='right' makes a sample exactly at onset count
    idx = np.searchsorted(df_beh[cn_onset].to_numpy(), timing, side='right')
    if idx == 0 or timing >= df_beh[cn_offset].iloc[idx - 1]:
        return 0
    return df_beh[cn_stimulus].iloc[idx - 1]


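We use these functions in two separate steps: first, `map_stim_to_time()` maps each stimulus from the stimulus dataframe onto the samples of the time-based dataframe (a sample gets the index of the stimulus that is on at that time, and 0 otherwise); second, `map_columns_to_time()` maps the columns of interest into the new dataframe via that stimulus index.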
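The same two steps can also be done in a single join with `pandas.merge_asof`, which for every sample picks the last stimulus whose onset is at or before that sample. This is an alternative technique, not the approach used by the functions above; it is a sketch assuming non-overlapping stimuli and the default column names, and the helper name `map_with_merge_asof` is hypothetical.

```python
import numpy as np
import pandas as pd


def map_with_merge_asof(df, df_beh, col_to_trans, cn_stim='stimulus', cn_ts='TIMESTAMP',
                        beh_cn_onset='timing_meg', beh_cn_offset='timing_offset_meg'):
    """hypothetical helper: map the stimulus index and the columns in col_to_trans into
    the time domain in one join (onset <= time < offset, otherwise 0)"""
    cols = [cn_stim] + list(col_to_trans)

    # keep the original row order; merge_asof needs both sides sorted on their key
    left = pd.DataFrame({cn_ts: df[cn_ts].to_numpy(dtype=float), '_row': np.arange(len(df))})
    right = df_beh[[beh_cn_onset, beh_cn_offset] + cols].astype({beh_cn_onset: float})

    # for every sample: the last stimulus whose onset is <= the sample time
    merged = pd.merge_asof(left.sort_values(cn_ts), right.sort_values(beh_cn_onset),
                           left_on=cn_ts, right_on=beh_cn_onset, direction='backward')
    merged = merged.sort_values('_row')

    # only keep the match while the stimulus is still on (no match gives NaN, which compares False)
    inside = (merged[cn_ts] < merged[beh_cn_offset]).to_numpy()
    for col in cols:
        df[col] = np.where(inside, merged[col].to_numpy(), 0)
    return df
```

Because the join carries every requested column along, there is no separate dictionary mapping per column; the trade-off is that the helper assumes stimuli never overlap.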

`map_stim_to_time()` Example:

"stim_df"
| 'stim_nr' | 'timing' | 'timing_offset' |
| --- | --- | --- |
| 1 | 1.24 | 1.248 |
| 2 | 1.25 | 1.258 |
| 3 | 1.26 | 1.268 |
| 4 | 1.27 | 1.278 |

"data_df" returned
| 'old_columns' | 'stim_nr' | 'timing' |
| --- | --- | --- | 
| ... | 0 | 1.238 |
| ... | 1 | 1.240 |
| ... | 1 | 1.242 |
| ... | 1 | 1.244 |
| ... | 1 | 1.246 |
| ... | 0 | 1.248 |
| ... | 2 | 1.250 |
| ... | 2 | 1.252 |
| ... | 2 | 1.254 |
| ... | 2 | 1.256 |
| ... | 0 | 1.258 |
| ... | 3 | 1.260 |
| ... | ... | ... |
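Calling `_assign_stimulus` once per sample through `.apply` can be slow for long recordings. A vectorized sketch of the rule onset <= time < offset does the lookup for all samples at once with `np.searchsorted`. It assumes the stimuli are sorted by onset and do not overlap; the name `assign_stimulus_vectorized` is hypothetical.

```python
import numpy as np


def assign_stimulus_vectorized(timestamps, onsets, offsets, stimuli):
    """hypothetical helper: for every timestamp return the stimulus with
    onset <= timestamp < offset, or 0 if no stimulus is on
    (assumes onsets are sorted and stimuli do not overlap)"""
    timestamps = np.asarray(timestamps, dtype=float)
    onsets = np.asarray(onsets, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    stimuli = np.asarray(stimuli)

    # index of the last stimulus whose onset is <= timestamp (-1 if there is none)
    idx = np.searchsorted(onsets, timestamps, side='right') - 1
    safe_idx = np.clip(idx, 0, None)  # avoid indexing with -1; those samples are masked out below

    # a sample belongs to that stimulus only if it lies before the stimulus offset
    inside = (idx >= 0) & (timestamps < offsets[safe_idx])
    return np.where(inside, stimuli[safe_idx], 0)
```

It could replace the `.apply` call, e.g. `df['stimulus'] = assign_stimulus_vectorized(df['TIMESTAMP'], df_beh['timing_meg'], df_beh['timing_offset_meg'], df_beh['stimulus'])`.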

In [None]:
# map stimulus indexing to timedomain (123 > 00011100222000333)
df = map_stim_to_time(df, df_beh, cn_stim='stimulus', cn_ts='TIMESTAMP', 
                     beh_cn_onset='timing_meg', beh_cn_offset='timing_offset_meg')

# get columns of interest to transfer and apply stim specific mapping
col_to_trans = ['surprisal_a', 'surprisal_b']
df = map_columns_to_time(df, df_beh, col_to_trans)

We can do the same at other levels of the design, for example per segment, block or miniblock.
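For these coarser levels the idea stays the same, only the intervals are longer: each group (for example each block x segment combination) spans from its earliest stimulus onset to its latest stimulus offset. A minimal sketch with `pandas.IntervalIndex`, assuming the groups do not overlap in time and using the default column names of `time_save_segments()`; `map_groups_to_time` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def map_groups_to_time(timestamps, df_beh, groupby_nm=('block', 'segment'),
                       onset_nm='timing_meg', offset_nm='timing_offset_meg'):
    """hypothetical helper: return, for every timestamp, the running index of the group
    (0, 1, 2, ... in sorted group order) whose [earliest onset, latest offset] contains it, or -1"""
    grouped = df_beh.groupby(list(groupby_nm))
    starts = grouped[onset_nm].min().to_numpy(dtype=float)
    ends = grouped[offset_nm].max().to_numpy(dtype=float)

    # closed='both' includes both the first onset and the last offset of each group
    intervals = pd.IntervalIndex.from_arrays(starts, ends, closed='both')

    # get_indexer returns the position of the containing interval, or -1 outside all intervals
    return intervals.get_indexer(np.asarray(timestamps, dtype=float))
```

The group numbering follows the sorted groupby order, which is the same order used for `segment_all` in `stim_save_segments()` and `time_save_segments()`.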

In [None]:
# use the blocknumber to runnumber pairing in stimulus domain to map block to run in time domain
df = map_block_to_run(df, df_beh)

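# add a running segment index ('segment_all') in both the stimulus and the time domain
df_beh = stim_save_segments(df_beh)
df = time_save_segments(df, df_beh)

# map segment specific data onto the current segment
col_to_trans = ['center_freq_a', 'center_freq_b', 'center_freq_a_oct', 'center_freq_b_oct', 'probability_a', 'probability_b']
df = map_columns_to_time(df, df_beh, col_to_trans, indicator_nm='segment_all')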

### Step 1b. Impulse mapping
So far we have mapped each stimulus over its full duration, from onset to offset (123 > 00011100222000333). Sometimes, however, we only want the onset of a stimulus, in other words the impulse. We can map impulses using `time_transform_FIR()`, which also adds a second set of columns with an impulse at each stimulus offset.
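The idea behind the impulses can be shown on the boxcar string above: an onset is a sample where the stimulus code becomes nonzero or changes, and an offset is the last sample before it changes again. A small NumPy sketch of that rule; the helper `onset_offset_samples` is hypothetical and not part of the notebook.

```python
import numpy as np


def onset_offset_samples(codes):
    """hypothetical helper: return the sample indices of stimulus onsets and offsets
    in a boxcar code vector (0 = no stimulus, nonzero = stimulus code)"""
    codes = np.asarray(codes)
    prev = np.concatenate(([0], codes[:-1]))   # code on the previous sample (0 before the start)
    nxt = np.concatenate((codes[1:], [0]))     # code on the next sample (0 after the end)
    onsets = np.flatnonzero((codes != 0) & (codes != prev))
    offsets = np.flatnonzero((codes != 0) & (codes != nxt))
    return onsets, offsets


# the boxcar from the text: 123 > 00011100222000333
codes = [int(c) for c in '00011100222000333']
onsets, offsets = onset_offset_samples(codes)
# onsets -> [3, 8, 14], offsets -> [5, 10, 16]
```

An onset impulse vector is then zero everywhere except at `onsets`, where it takes the value of the boxcar.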

In [2]:
def time_transform_FIR(df, indicator, fir_columns,
                       prefix_fir='FIR_',
                       prefix_fir_offset='OFFSET_FIR_'):
    """within a dataframe, turn boxcar columns (value held from stimulus onset to offset)
    into impulses at the onset and at the offset of every stimulus
    input: df: time-domain dataframe
           indicator: stimulus indicator column (0 = no stimulus, nonzero = stimulus index)
           fir_columns: the boxcar columns to loop over and turn into impulses
           prefix_fir: (optional) prefix for the new onset-impulse columns
           prefix_fir_offset: (optional) prefix for the new offset-impulse columns
    return: adjusted dataframe
    """

    # onset: first sample where the indicator becomes nonzero or changes value;
    # offset: last sample before it changes again
    ind = df[indicator]
    onset_mask = (ind > 0) & (ind != ind.shift(1, fill_value=0))
    offset_mask = (ind > 0) & (ind != ind.shift(-1, fill_value=0))

    # loop over columns of interest to re-insert into df as FIR
    for col in fir_columns:

        # keep the boxcar value only at the first (onset) or last (offset) sample, 0.0 elsewhere
        df[f'{prefix_fir}{col}'] = df[col].where(onset_mask, 0.0)
        df[f'{prefix_fir_offset}{col}'] = df[col].where(offset_mask, 0.0)

    return df

We can use it as follows:

In [None]:
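# add an on/off column (1 while any stimulus is on, 0 otherwise)
df = time_save_onoff(df)

# get the columns for which we want the impulse at the first sample of each stimulus
# ('onoff' becomes 'FIR_onoff', which is used for the events below)
fir_columns = ['surprisal_a', 'surprisal_b', 'onoff']

# add FIR impulses to dataframe
df = time_transform_FIR(df, 'stimulus', fir_columns)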

### Step 2. Generate event arrays
MNE uses event arrays (https://mne.tools/dev/auto_tutorials/raw/20_event_arrays.html), yet another way of coding onsets and offsets. Events have a shape of [nr_events, 3]: the first column (column 0) holds the sample number of each event in the time domain, and the third column (column 2) holds the event value (magnitude). The middle column (column 1) holds the value of the event channel on the immediately preceding sample. In practice that value is almost always 0, but it can be used to detect the end of an event that lasts longer than one sample. MNE sample numbers include `raw.first_samp`, so they only equal the row index of our dataframe when the recording starts at sample 0.
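As a small illustration of that layout, the sketch below turns one dataframe column into an event array with NumPy only, filling the middle column with the value on the preceding sample and shifting the sample numbers by a `first_samp` offset (in MNE this would be `raw.first_samp`). The helper name `column_to_events` is hypothetical.

```python
import numpy as np


def column_to_events(values, first_samp=0):
    """hypothetical helper: build an MNE-style event array [sample, previous value, value]
    from every nonzero sample of a 1D array of event values"""
    values = np.asarray(values)
    idx = np.flatnonzero(values > 0)                               # samples that carry an event
    prev = np.where(idx > 0, values[np.maximum(idx - 1, 0)], 0)   # value on the preceding sample
    # event arrays hold integer codes, so the values are cast to int here
    return np.column_stack([idx + first_samp, prev, values[idx]]).astype(int)
```

For a boxcar column such as `onoff`, every nonzero sample becomes an event and the middle column shows the previous (nonzero) value, which is one reason to prefer the impulse (FIR) columns as events.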

In [3]:
import numpy as np
import mne

def add_event_from_df(raw, df, event_nm):
    """within the raw MNE object, use the nonzero samples of a dataframe column to add mne events
    input: raw: preloaded mne Raw object with one sample per row of df (its info['sfreq'] is used)
           df: dataframe with our stimulus (impulse/boxcar) information
           event_nm: the name of the column in the dataframe that holds the events
    output: returns the raw mne object with a stim channel event_nm holding the events"""

    # the dataframe rows have to line up with the samples of raw
    if len(df) != raw.n_times:
        raise ValueError(f'df has {len(df)} rows, but raw has {raw.n_times} samples')

    # calculate index position of events and the value of those events
    idxs = np.flatnonzero(df[event_nm].to_numpy() > 0)
    value = df[event_nm].to_numpy()[idxs]

    # event array in the shape mne expects: [sample, previous value, value]
    # (mne sample numbers include raw.first_samp)
    mne_arr = np.zeros((len(idxs), 3))
    mne_arr[:, 0] = idxs + raw.first_samp
    mne_arr[:, 2] = value

    # create a new (empty) stim channel if it does not exist yet
    if event_nm not in raw.ch_names:
        temp_info = mne.create_info([event_nm], raw.info['sfreq'], ['stim'])
        stim_raw = mne.io.RawArray(np.zeros((1, raw.n_times)), temp_info, first_samp=raw.first_samp)
        raw.add_channels([stim_raw], force_update_info=True)

    # add the actual events (replace=True first zeroes the channel)
    raw.add_events(mne_arr, stim_channel=event_nm, replace=True)
    return raw

We can simply add them by selecting the columns of interest. This approach allows us to retrieve the events with the built-in MNE function `mne.find_events(raw, stim_channel='FIR_onoff')`.

In [None]:
event_columns = ['FIR_surprisal_a',
                'FIR_surprisal_b',
                'FIR_onoff',
                'onoff']

# add events for all 
for evnt in event_columns:
    raw = add_event_from_df(raw, df, evnt)

### Step 3. Testing within MNE
Testing in MNE is fairly straightforward if we want to run simple linear regressions.
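Conceptually, `mne.stats.linear_regression` fits an ordinary least squares model independently at every channel and time point of the epochs, with one design-matrix row per epoch. A NumPy-only sketch of that idea (not MNE's actual implementation; `mass_univariate_ols` is a hypothetical name):

```python
import numpy as np


def mass_univariate_ols(data, design):
    """hypothetical helper: data has shape (n_epochs, n_channels, n_times),
    design has shape (n_epochs, n_regressors); returns betas of shape
    (n_regressors, n_channels, n_times), one OLS fit per channel x time point"""
    n_epochs, n_channels, n_times = data.shape
    y = data.reshape(n_epochs, n_channels * n_times)       # every column is one channel x time point
    betas, *_ = np.linalg.lstsq(design, y, rcond=None)    # solves all columns at once
    return betas.reshape(design.shape[1], n_channels, n_times)
```

Solving all channel x time columns in one `lstsq` call works because they share the same design matrix; only the data column changes.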

In [None]:
import numpy as np
import mne
from mne.stats import linear_regression

# find events (one event per stimulus onset in the FIR_onoff channel)
events = mne.find_events(raw, stim_channel='FIR_onoff')

# create epochs ('time-locked data' per 'event'), using only the data channels
# so that the added stim channels are not part of the regression
epochs = mne.Epochs(raw, events, tmin=-0.1, tmax=0.5, picks='data', preload=True)

# design matrix: one row per kept epoch, holding the FIR_onoff value of its event
# (taken from epochs.events, so rows stay aligned even if epochs were dropped)
fir_onoff = epochs.events[:, 2].astype(float)[:, np.newaxis]

# perform linear regression - in this case for on-off
# (the regressor is 1 for every epoch, so the beta equals the average evoked response)
res = linear_regression(epochs, fir_onoff, names=['FIR_onoff'])

# access the beta values and related information (each is an Evoked object)
beta_values = res['FIR_onoff'].beta
t_values = res['FIR_onoff'].t_val
p_values = res['FIR_onoff'].p_val

# plot the beta values directly
res['FIR_onoff'].beta.plot()
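A constant on/off regressor only gives the average response. To test a parametric regressor such as surprisal, the design matrix can hold an intercept plus the regressor value at each kept epoch. Since MNE event arrays hold integer codes, the continuous values are more safely read from `df` at the event samples than from the stim channel. A sketch, assuming `df` has one row per sample of `raw`; `build_design` is a hypothetical helper and z-scoring is an optional choice.

```python
import numpy as np


def build_design(event_samples, first_samp, regressor, zscore=True):
    """hypothetical helper: design matrix with an intercept and one parametric regressor
    event_samples: sample numbers of the kept epochs (e.g. epochs.events[:, 0])
    first_samp: raw.first_samp, to convert MNE sample numbers into df row positions
    regressor: 1D array with one value per row of the time-domain dataframe"""
    rows = np.asarray(event_samples) - first_samp
    values = np.asarray(regressor, dtype=float)[rows]
    if zscore:  # optional: put the regressor on a standard scale (needs some variance)
        values = (values - values.mean()) / values.std()
    return np.column_stack([np.ones(len(rows)), values])


# usage with MNE (names as in the cell above):
# design = build_design(epochs.events[:, 0], raw.first_samp, df['surprisal_a'].to_numpy())
# res = linear_regression(epochs, design, names=['intercept', 'surprisal_a'])
# res['surprisal_a'].beta.plot()
```

With the intercept in the model, the `surprisal_a` beta describes how the response changes with surprisal on top of the average response.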

Note that you can also use this approach in MNE when you have some other kind of timecourse data instead of EEG or MEG data (for example pupil responses). We just create some dummy info and plug in our timecourse data. You may want to adjust this dummy info function to suit your needs.

In [None]:
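import mne

def create_dummy_info(sfreq=60.0):
    """create dummy mne info (CTF275 channel names) to hold non-M/EEG timecourse data"""
    # Step 1: read the layout (only used for its channel names)
    layout = mne.channels.read_layout('CTF275.lay')

    # Step 2: create channel information
    # assuming all channels are MEG magnetometer channels - adjust if you have different types of channels
    ch_names = layout.names
    ch_types = ['mag'] * len(ch_names)

    # create_info already sets highpass to 0 and lowpass to sfreq / 2 (Nyquist),
    # so there is no need to unlock info or set a lowpass above Nyquist by hand
    info = mne.create_info(ch_names=ch_names, sfreq=sfreq, ch_types=ch_types)
    return info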

# create dummy information
info = create_dummy_info()

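# create evoked mne data from whatever data we want
# (data: array of shape (n_times, n_channels), hence the transpose to (n_channels, n_times))
evoked = mne.EvokedArray(data.T,
                         info,                         # dummy info
                         tmin=MNE_evokeds[0][0].tmin,  # time (s) of the first sample
                         baseline=(-0.02, 0.02))       # demean over a window around t=0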
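The cell above passes `data.T` to `mne.EvokedArray`, which expects (n_channels, n_times), so `data` is laid out as (n_times, n_channels). If the timecourse data comes as a long-format dataframe, a pivot gives that layout. A sketch assuming hypothetical columns `channel`, `time` and `value`; the helper name `long_to_time_by_channel` is hypothetical too.

```python
import numpy as np
import pandas as pd


def long_to_time_by_channel(df_long, time_nm='time', channel_nm='channel', value_nm='value'):
    """hypothetical helper: average a long-format dataframe (one row per observation) into
    an array of shape (n_times, n_channels), plus the sorted times and channel names"""
    # pivot_table averages duplicate (time, channel) pairs, e.g. over trials;
    # its rows (times) and columns (channels) come out sorted
    wide = df_long.pivot_table(index=time_nm, columns=channel_nm, values=value_nm, aggfunc='mean')
    return wide.to_numpy(), wide.index.to_numpy(), list(wide.columns)
```

The first returned time could serve as `tmin`, and the channel names have to match those in `info`, so for pupil data the info would be built from your own channel names rather than the CTF275 layout.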

-----

Note on timing: since the MEG system and the stimulus PC each clock the stimuli separately, the two clocks have to be synced. We can do so as long as we have a known time point in both modalities, which gives us a `sync_val` (the difference between the two clocks).

We can then simply add this offset to get the timing in the other modality.

In [1]:
def sync_timing(df, sync_val, timingname='timing', new_timingname='timing_meg',
                              timingname_offset='timing_offset', new_timingname_offset='timing_offset_meg'):
    """use syncing value to get timings from stimpc domain into the MEG clock domain
    input df and sync value, returns adjusted dataframe"""

    # create new column in old dataframe
    df[new_timingname] = df[timingname] + sync_val
    df[new_timingname_offset] = df[timingname_offset] + sync_val
    # and return
    return(df)
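The `sync_val` itself comes from events that are time-stamped on both clocks. With several such events (for example triggers logged on the stimulus PC and recorded by the MEG system), the median difference gives a robust estimate, and the spread of the differences hints at clock drift or mismatched events. A sketch with hypothetical inputs; matching the events one-to-one is assumed to be done already, and `estimate_sync_val` is a hypothetical name.

```python
import numpy as np


def estimate_sync_val(stim_times, meg_times):
    """hypothetical helper: estimate the offset between the stimulus PC clock and the MEG clock
    from matched events (same event, same order, in both arrays);
    returns (sync_val, max_abs_residual)"""
    stim_times = np.asarray(stim_times, dtype=float)
    meg_times = np.asarray(meg_times, dtype=float)
    diffs = meg_times - stim_times               # MEG time minus stimulus PC time, per event
    sync_val = float(np.median(diffs))
    # a large residual suggests clock drift or mismatched events
    max_abs_residual = float(np.max(np.abs(diffs - sync_val)))
    return sync_val, max_abs_residual
```

Because `sync_val` is defined as MEG time minus stimulus PC time, it has the sign that `sync_timing()` expects: `df_beh = sync_timing(df_beh, sync_val)`.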