Exploring the events file for the emotional regulation task.

From Jessica's thesis:

> For the emotion regulation
> task, the model included events for viewing the attend instructions, viewing
> the suppress instructions, viewing the three different image types (attend
> neutral, attend negative, and suppress negative), viewing the rating screen,
> and a de-meaned parametric regressor weighted by the participant's response.
> All events began at stimulus onset and lasted either the duration of the
> stimulus (1 second for instructions, 5 seconds for images) or the
> participant's RT (viewing the rating screen and the parametric regressor of
> the participant's rating).  Missed rating trials were included in a nuisance
> regressor with a duration of 3 seconds. For the image regressors, only
> images rated as 1-3 (all neutral images) or 5-7 (all negative images) were
> included. Three participants had less than two instances of at least one of
> the events and were therefore excluded from the analysis.

See also page 70 of Jessica's thesis, where she describes the ER task.  The details of the procedure start on page 72.  She says that the participants rated "how negative they felt after either suppressing or attending to the image on a 4-point likert scale from 1 (very slightly or not at all negative) to 4 (extremely negative)".

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_table('ds000009_R2.0.3/sub-03/func/sub-03_task-emotionalregulation_run-02_events.tsv')
df

Unnamed: 0,onset,duration,trial_type,image_type,image_num,response,reaction_time,trial_num,onset_orig,duration_orig,trial_type_orig,rating_par_orig
0,1.006116,5,attend,negative,66.0,,,1,1.0061,5,att_neg,
1,8.009531,3,rate,,,121.0,0.444869896000455,1,8.0095,3,rating_par,0.32
2,13.004863,5,attend,neutral,99.0,,,2,13.0049,5,att_neut,
3,19.012506,3,rate,,,98.0,0.832420872999137,2,19.0125,3,rating_par,-0.68
4,29.003229,5,attend,negative,72.0,,,3,29.0032,5,att_neg,
5,37.002353,3,rate,,,103.0,0.647769164001147,3,37.0024,3,rating_par,1.32
6,43.009871,5,suppress,negative,7.0,,,4,43.0099,5,suppr_neg,
7,49.001151,3,rate,,,121.0,0.633277414999611,4,49.0012,3,rating_par,0.32
8,53.017403,5,suppress,negative,17.0,,,5,53.0174,5,suppr_neg,
9,60.004128,3,rate,,,121.0,0.328617319999466,5,60.0041,3,rating_par,0.32


In the old onsets (from a previous version of the ds009 dataset), `cond001` is attend negative:

In [3]:
cat old_onsets/sub003/model/model001/onsets/task003_run002/cond001.txt

1.0061	5	1
29.0032	5	1
99.0042	5	1
111.0030	5	1
151.0155	5	1
183.0122	5	1
248.0014	5	1
309.0075	5	1
353.0030	5	1
367.0097	5	1


`cond002` is attend neutral:

In [4]:
cat old_onsets/sub003/model/model001/onsets/task003_run002/cond002.txt

13.0049	5	1
89.0136	5	1
141.0081	5	1
170.0177	5	1
194.0151	5	1
205.0016	5	1
233.0153	5	1
272.0154	5	1
285.0100	5	1
380.0043	5	1


`cond003` is the rate event (without parametric scaling).  The durations are all 3 seconds; they do not use the reaction time.  This holds for most but not all subjects: subjects 13 through 17 do use the RT.

In [5]:
cat old_onsets/sub003/model/model001/onsets/task003_run002/cond003.txt

8.0095	3	1
19.0125	3	1
37.0024	3	1
49.0012	3	1
60.0041	3	1
71.0070	3	1
84.0017	3	1
95.0047	3	1
105.0119	3	1
119.0022	3	1
134.0047	3	1
147.0158	3	1
159.0147	3	1
177.0045	3	1
190.0156	3	1
223.0079	3	1
241.0145	3	1
254.0090	3	1
266.0078	3	1
280.0147	3	1
316.0109	3	1
326.0016	3	1
347.0118	3	1
362.0144	3	1
386.0120	3	1
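
The observation that every duration in this file is 3 can be checked in code rather than by eye. A minimal sketch, assuming the tab-separated three-column layout (onset, duration, amplitude) of the listing above; the helper name `read_cond` is hypothetical, and only the first five rows of the listing are pasted in to keep the example self-contained.

```python
import io

import pandas as pd


def read_cond(file_like):
    """Read a three-column onset file into a dataframe.

    Assumes tab-separated columns onset, duration, amplitude with no
    header row, as in the listings in this notebook.
    """
    return pd.read_csv(file_like, sep='\t', header=None,
                       names=['onset', 'duration', 'amplitude'])


# First five rows of cond003.txt for sub003, run 2, copied from above.
cond003_text = (
    "8.0095\t3\t1\n"
    "19.0125\t3\t1\n"
    "37.0024\t3\t1\n"
    "49.0012\t3\t1\n"
    "60.0041\t3\t1\n")
cond003 = read_cond(io.StringIO(cond003_text))
# True if every rating event uses the fixed 3 second duration.
all_three = bool((cond003['duration'] == 3).all())
print(all_three)
```

Passing a file path instead of the `StringIO` object, and looping over subjects, would be one way to find which subjects use the RT instead.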


`cond004` is the rate regressor (with parametric scaling of amplitude).

In [6]:
cat old_onsets/sub003/model/model001/onsets/task003_run002/cond004.txt

8.0095	3	0.3200
19.0125	3	-0.6800
37.0024	3	1.3200
49.0012	3	0.3200
60.0041	3	0.3200
71.0070	3	1.3200
84.0017	3	-0.6800
95.0047	3	-0.6800
105.0119	3	0.3200
119.0022	3	0.3200
134.0047	3	0.3200
147.0158	3	-0.6800
159.0147	3	0.3200
177.0045	3	-0.6800
190.0156	3	-0.6800
223.0079	3	0.3200
241.0145	3	-0.6800
254.0090	3	1.3200
266.0078	3	0.3200
280.0147	3	-0.6800
316.0109	3	0.3200
326.0016	3	-0.6800
347.0118	3	-0.6800
362.0144	3	0.3200
386.0120	3	-0.6800
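
The three distinct amplitudes in this file are what de-meaning ratings of 1, 2 and 3 would give. A quick check, assuming (an inference from the values, not something stated in the file) that -0.68, 0.32 and 1.32 correspond to ratings 1, 2 and 3; the counts are read off the listing above.

```python
import numpy as np

# Counts of each amplitude in the cond004.txt listing above:
# 11 values of -0.68, 11 of 0.32 and 3 of 1.32.
# Assumption: these correspond to ratings 1, 2 and 3 respectively.
ratings = np.array([1] * 11 + [2] * 11 + [3] * 3)
mean_rating = ratings.mean()
# Subtract the run mean to get the de-meaned parametric amplitudes.
demeaned = ratings - mean_rating
print(mean_rating)
print(np.unique(np.round(demeaned, 2)))
```

Under that assumption the run mean is 42 / 25 = 1.68, and the de-meaned values round to the three amplitudes in the file.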


`cond005` is suppress negative.

In [7]:
cat old_onsets/sub003/model/model001/onsets/task003_run002/cond005.txt

43.0099	5	1
53.0174	5	1
65.0160	5	1
77.0148	5	1
127.0013	5	1
217.0170	5	1
259.0043	5	1
297.0088	5	1
320.0105	5	1
339.0127	5	1


`cond006` may be the missed ratings, with `trial_type_orig` field value of `junk_rating`, and rating 0.

In [8]:
cat old_onsets/sub003/model/model001/onsets/task003_run002/cond006.txt

201.0020	3	1
212.0051	3	1
292.0134	3	1
305.0079	3	1
375.0090	3	1


Differences between the .tsv file, the old onsets and Jessica's description:

* there are no "instructions" events in the .tsv file or in the old onsets;
* rating events have a standard 3 second duration, not the reaction time (in the .tsv file, and in the old onsets for most subjects).

In the next cell, I confirm that there are no other event types in the current `.tsv` file.  I'm using `trial_type_orig` because `trial_type` has fewer distinct values.

In [9]:
# All trial types noted in "trial_type_orig"
set(df['trial_type_orig'])

{'att_neg', 'att_neut', 'junk_rating', 'rating_par', 'suppr_neg'}

Let's collect the trial types and responses from the dataframes of all the .tsv files.  We also look at the association of the `response` variable value with the `rating_par_orig`.

In [10]:
from glob import glob
from os.path import join as pjoin

all_trial_types = set()
all_responses = set()
all_response_pairs = set()
par_vals = []

for tsv_path in glob(pjoin('ds000009_R2.0.3',
                           'sub-*',
                           'func',
                           'sub*emotionalregulation*.tsv')):
    this_df = pd.read_table(tsv_path)
    # Add any unknown trial types from this tsv
    all_trial_types = all_trial_types.union(this_df['trial_type_orig'])
    # Now analyze the rating trials
    is_rate = this_df['trial_type'] == 'rate'
    rate_trials = this_df[is_rate]
    responses = rate_trials['response']
    # Get the old parametric regressor, shifted so that its minimum is 1
    # (this assumes the lowest rating given in the run was 1).
    old_par = pd.to_numeric(this_df['rating_par_orig'], errors='coerce')
    old_par = old_par - np.nanmin(old_par) + 1
    all_responses = all_responses.union(responses)
    response_pairs = zip(responses, old_par[is_rate])
    all_response_pairs = all_response_pairs.union(response_pairs)
    # Look at response for different trial types
    last_valence = this_df['image_type'].copy()
    last_valence[this_df['trial_type'] == 'suppress'] = 'suppress'
    last_valence[1:] = last_valence[:-1]
    # Store for later analysis (see below)
    par_vals.append(pd.concat([old_par, last_valence], axis=1).dropna())
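
The last few lines of the cell above label each rating row with the condition of the image shown just before it. A minimal sketch of the same idea on a toy frame, using `Series.shift`; the four rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy events (hypothetical values): each image is followed by its rating.
toy = pd.DataFrame({
    'trial_type': ['attend', 'rate', 'suppress', 'rate'],
    'image_type': ['neutral', None, 'negative', None],
    'rating': [None, 1.0, None, 2.0]})

# Condition label per image row; suppress trials get their own label.
label = toy['image_type'].copy()
label[toy['trial_type'] == 'suppress'] = 'suppress'
# Move each label down one row, so it sits next to the following rating.
prev_label = label.shift(1)
# Keep only rows that have both a rating and a preceding label.
paired = pd.concat([toy['rating'], prev_label], axis=1).dropna()
print(paired)
```

Each remaining row pairs a rating with the condition of the image that preceded it, which is the form needed for a per-condition mean.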

There are no trial types for the instructions for any subject:

In [11]:
all_trial_types

{'att_neg', 'att_neut', 'junk_rating', 'rating_par', 'suppr_neg'}

Four distinct button codes occur across all the .tsv files, plus `'0'` (apparently a missed response):

In [12]:
all_responses

{'0', '103', '114', '121', '98'}

The pairings of the recoded `rating_par_orig` and response numbers are:

In [13]:
all_response_pairs

{('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('0', nan),
 ('103', 3.0),
 ('114', 4.0),
 ('121', 2.0),
 ('98', 1.0)}

Notice that the button codes are not in rating order: 98 maps to 1, but 103 maps to 3 and 121 maps to 2.
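
One possible explanation for the ordering, offered as a guess rather than something the source confirms: the codes may be character codes for keys on the response device. Decoding them with Python's built-in `chr` gives single letters.

```python
# Button codes from the .tsv files, keyed by rating (1-4),
# following the pairings found above.
code_for_rating = {1: 98, 2: 121, 3: 103, 4: 114}
# Interpret each code as a character code (an assumption).
letters = {rating: chr(code) for rating, code in code_for_rating.items()}
print(letters)
```

The letters for ratings 1 to 4 come out as b, y, g and r, which would be consistent with colour-coded buttons (blue, yellow, green, red), but that reading is an assumption.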

It looks as if 4 is the most negative, 1 is the least negative:

In [14]:
all_par_vals = pd.concat(par_vals, axis=0, ignore_index=True)
all_par_vals.groupby('image_type').mean()

image_type,rating_par_orig
negative,2.469345
neutral,1.042373
suppress,1.687234


Now let's try recreating the old condition files:

In [15]:
ER_RESPONSE_MAP = {'114': 4, '103': 3, '121': 2, '98': 1, '0': 0,
                   'n/a': np.nan}

def er_preprocessor(df):
    """ Process dataframe for ER trial types """
    onset, duration, trial_type, image_type, response, rt = [
        df[name].copy() for name in
        ['onset', 'duration', 'trial_type', 'image_type',
         'response', 'reaction_time']]
    # Recode the response values using the map above.
    response = response.map(ER_RESPONSE_MAP)
    tt = trial_type.copy()  # A pandas series
    tt[(trial_type == 'attend') & (image_type == 'negative')] = 'attendneg'
    tt[(trial_type == 'attend') & (image_type == 'neutral')] = 'attendneu'
    tt[(trial_type == 'suppress') & (image_type == 'negative')] = 'suppressneg'
    assert not any((trial_type == 'suppress') & (image_type == 'neutral'))
    tt[(trial_type == "rate") & (response == 0)] = 'ratemiss'
    tt[(trial_type == "rate") & (response != 0)] = 'rate'
    # Use RTs as durations for rate
    good_rates = tt == 'rate'
    duration[good_rates] = pd.to_numeric(rt[good_rates])
    # Make main set of events (excluding parametric regressor)
    amplitude = pd.Series(np.ones(len(df)), name='amplitude')
    main_trials = pd.concat([tt, onset, duration, amplitude], axis=1)
    # Add parametric trial type
    good_onsets = onset[good_rates]
    good_durations = duration[good_rates]
    good_responses = response[good_rates]
    amp_extra = good_responses - np.mean(good_responses)
    amp_extra.name = 'amplitude'
    tt_extra = tt[good_rates]
    tt_extra[:] = 'ratepar'
    # Put the new trials at the end
    extra = pd.concat([tt_extra, good_onsets, good_durations, amp_extra], axis=1)
    return pd.concat([main_trials, extra], axis=0, ignore_index=True)

In [16]:
events = er_preprocessor(df)
events

Unnamed: 0,trial_type,onset,duration,amplitude
0,attendneg,1.006116,5.000000,1.00
1,rate,8.009531,0.444870,1.00
2,attendneu,13.004863,5.000000,1.00
3,rate,19.012506,0.832421,1.00
4,attendneg,29.003229,5.000000,1.00
5,rate,37.002353,0.647769,1.00
6,suppressneg,43.009871,5.000000,1.00
7,rate,49.001151,0.633277,1.00
8,suppressneg,53.017403,5.000000,1.00
9,rate,60.004128,0.328617,1.00
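
To compare against the old files directly, the events can be split by `trial_type` and formatted as three-column text. A minimal sketch, assuming the old files are tab-separated onset, duration, amplitude; the function name `events_to_cond_text` and the number formatting are illustrative choices, and the toy frame copies the first four rows of the table above.

```python
import pandas as pd


def events_to_cond_text(events):
    """Return a dict mapping each trial type to three-column text.

    Hypothetical helper: columns are onset, duration, amplitude,
    tab-separated, one event per line.
    """
    out = {}
    for name, group in events.groupby('trial_type'):
        lines = ['{:.4f}\t{:g}\t{:g}'.format(o, d, a)
                 for o, d, a in zip(group['onset'],
                                    group['duration'],
                                    group['amplitude'])]
        out[name] = '\n'.join(lines) + '\n'
    return out


# First four rows of the events table above.
toy_events = pd.DataFrame({
    'trial_type': ['attendneg', 'rate', 'attendneu', 'rate'],
    'onset': [1.006116, 8.009531, 13.004863, 19.012506],
    'duration': [5.0, 0.444870, 5.0, 0.832421],
    'amplitude': [1.0, 1.0, 1.0, 1.0]})
cond_text = events_to_cond_text(toy_events)
print(cond_text['attendneg'])
```

Note that the regenerated `rate` events use RTs as durations, so they would match the old `cond003.txt` only for subjects whose old files also used the RT.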


Compare the `rating_par_orig` to the recalculated rating.

In [17]:
good_rates = (df['trial_type'] == "rate") & (df['response'] != '0')
orig = df['rating_par_orig'][good_rates].astype(float)
orig

1     0.32
3    -0.68
5     1.32
7     0.32
9     0.32
11    1.32
13   -0.68
15   -0.68
17    0.32
19    0.32
21    0.32
23   -0.68
25    0.32
27   -0.68
29   -0.68
35    0.32
37   -0.68
39    1.32
41    0.32
43   -0.68
49    0.32
51   -0.68
53   -0.68
55    0.32
59   -0.68
Name: rating_par_orig, dtype: float64

In [18]:
new_par = events[events['trial_type'] == 'ratepar']['amplitude']
new_par

60    0.32
61   -0.68
62    1.32
63    0.32
64    0.32
65    1.32
66   -0.68
67   -0.68
68    0.32
69    0.32
70    0.32
71   -0.68
72    0.32
73   -0.68
74   -0.68
75    0.32
76   -0.68
77    1.32
78    0.32
79   -0.68
80    0.32
81   -0.68
82   -0.68
83    0.32
84   -0.68
Name: amplitude, dtype: float64

The new values match the old to within floating point tolerance:

In [19]:
np.allclose(orig, new_par)

True