This is an annotated version of the original script as found on https://github.com/mckenziephagen/ABCD_Stop_Signal
Explicit comments will be made in new cells.

This takes in the raw E-Prime files, which have been concatenated together into one large file (raw_concat.csv), and does minimal processing.

In [1]:
import pandas as pd 
import numpy as np

import matplotlib.pyplot as plt

In [2]:
#this makes it so that the WHOLE dataframe prints, just fyi, this can bog down your code

pd.set_option('display.max_columns', 1000)  # or 1000
pd.set_option('display.max_rows', None)  # or 1000
pd.set_option('display.max_colwidth', 199)  # or 199

############################################################################################################################


The script begins assuming raw_concat has been loaded into memory. Replicating how the original authors constructed raw_concat from the fast track data was **non-trivial**, and involved digging into deleted files in the authors' GitHub commit history. For downloading from fast track, determining which release they used, and determining which subjects they tried to download, see: https://github.com/mckenziephagen/ABCD_Stop_Signal/blob/eaf3c8d500971ba8c458c59f743fd3040eeb1133/scripts/General/s3_keys.ipynb (note the slight accidental public sharing in the last dataframe there).

For creating a merged concat file: 
https://github.com/mckenziephagen/ABCD_Stop_Signal/blob/eaf3c8d500971ba8c458c59f743fd3040eeb1133/scripts/SST_manuscript/concat.ipynb (Note that the concat is only performed on a subset of 100 subjects. We therefore have to assume either that they corrected the mistakes they made in loading these 100 subjects, e.g., by including the event files coded as csv that were missed, handling errors related to differently formatted files, and dealing with two exact copies of the event file being saved with each subject, or that they missed some or all of these issues. The latter is likely the case: when we assume they corrected them, we end up with ~700 more subjects.)

Of relevance is this section from their paper:

"Of these, 8,776 files were successfully downloaded, but a subset did not include stop-signal data, leaving 7,906 subjects. Of these, only 7,347 included summary scores from the Stop Signal Task in the ABCD Data Release 2.0. Finally, 26 subjects were removed who did not have two complete runs with 180 trials each, leaving us with a total of 7,321 complete datasets."

Which provided some aid in our efforts to replicate this process.

#### Our full effort in replicating these steps can be found in 'Match Data.ipynb'

In [3]:
# We make the assumption in this 'raw_concat' that they corrected all merging issues
# which while erroneous should not lead to many issues
raw_concat = pd.read_csv('merged_data/all_concat.csv', low_memory=False)

############################################################################################################################

############################################################################################################################
The section of code below was fully commented out at the time of the original bioRxiv post: https://github.com/mckenziephagen/ABCD_Stop_Signal/blob/e10d76332136b64cb6ed19a3da48f2bfd6121002/scripts/SST_manuscript/clean_SST.ipynb

In the current version most of it is uncommented, but they still comment out the computation of the stop trial mask and, importantly, the 'StopTrial' column (which they use heavily in SST_problems and compute nowhere else but here). In light of its later usage, we first leave the original code for reference, but run a version of the code with the internal comments taken out. Based on further evidence from the order in which the cells were run, the author did not re-run this step after un-commenting (likely because it takes a long time to run).

As a general recommendation, analysis code like this should strive to be 'easily refreshable', i.e., at certain points in the analysis one should be able to re-run everything in order to make sure mistakes were not introduced (incredibly common in Jupyter notebooks, where cells can be run out of order). We understand why the author introduces non-linearities through intermediary save files, but suggest that, 1st: potential optimizations be looked into (i.e., making it so the intermediate saved representations are not necessary). In this case the most obvious optimization is that a merged dataframe with every subject's run information should not be used. Almost all of these calculations are embarrassingly parallel, which means they can easily be chunked and computed on a per-subject basis. This would also likely reduce the complexity of the code for simple operations.

2nd: even if intermediary save files are used, it should be clear how to create each saved file. E.g., if the logic leading up to the creation of incomplete_subjects.csv is put into functions, then only the code which calls those functions, ideally nested within another function, needs to be run. That way there can be one cell which either calls the function to compute and save incomplete_subjects.csv, or has that one line commented out and replaced with code to load incomplete_subjects.csv.
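One way the "compute or load" pattern could look is sketched below. This is only an illustration, not the original author's code: `load_or_compute` and `expensive_step` are hypothetical names, and the toy dataframe stands in for the real per-subject loop.

```python
import os
import tempfile

import pandas as pd


def load_or_compute(path, compute, force=False):
    """Load a cached csv if it exists; otherwise compute it, save it, and return it."""
    if os.path.exists(path) and not force:
        return pd.read_csv(path, index_col=0)
    df = compute()
    df.to_csv(path)
    return df


# Toy usage: `expensive_step` is a hypothetical stand-in for the slow loop.
calls = []


def expensive_step():
    calls.append(1)  # record how often the expensive code actually runs
    return pd.DataFrame({'NARGUID': ['a', 'b'], 'TrialNum': [1, 2]})


cache_path = os.path.join(tempfile.mkdtemp(), 'incomplete_subjects.csv')
first = load_or_compute(cache_path, expensive_step)   # computes and saves
second = load_or_compute(cache_path, expensive_step)  # loads from disk
```

With this layout, a single cell documents both where the csv comes from and how to regenerate it (pass `force=True`).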

**That said, with analysis like these, they are often done iteratively and in an exploratory fashion - which makes any lack of optimizations highly understandable!**

In [None]:
## Original ##

incomplete_runs_subj = [] 
SST_concat = pd.DataFrame()
incomplete_runs_df = pd.DataFrame()
raw_concat = raw_concat[np.logical_and(raw_concat.TrialCode != 'BeginFix', \
                                       raw_concat.TrialCode != 'EndFix')]

raw_concat = raw_concat[~raw_concat['TrialCode'].isnull()]

for i in raw_concat['NARGUID'].unique(): 
    sub_df = raw_concat.loc[raw_concat['NARGUID'] == i]
    #add TrialNum col from 1-360
    sub_df['TrialNum'] = np.arange(1, len(sub_df)+1)
#     stop_trial_mask = (sub_df['TrialCode'] == 'IncorrectStop') | \
#                       (sub_df['TrialCode'] == 'CorrectStop')
#     stop_trial_idx = stop_trial_mask[stop_trial_mask == True].index
#     sub_df['StopTrial'] = ""
#     sub_df['StopTrial'][stop_trial_idx] = np.arange(1, len(sub_df.loc[stop_trial_idx])+ 1)
    if len(~sub_df['TrialNum'].isnull()) == 360: 
        if 'StopTooEarly' not in sub_df['TrialCode'].unique(): 
            SST_concat = SST_concat.append(sub_df)
    else: 
        incomplete_runs_df = incomplete_runs_df.append(sub_df)
        incomplete_runs_subj.append(i)
    

incomplete_runs_df.to_csv('incomplete_subjects.csv')

In [None]:
## Original ##

#this saves as a csv so you can avoid running that loops multiple times 
#SST_concat.to_csv('reran_partially_cleaned.csv')

In [4]:
## Modified

incomplete_runs_subj = [] 
SST_concat = pd.DataFrame()
incomplete_runs_df = pd.DataFrame()
raw_concat = raw_concat[np.logical_and(raw_concat.TrialCode != 'BeginFix', \
                                       raw_concat.TrialCode != 'EndFix')]

raw_concat = raw_concat[~raw_concat['TrialCode'].isnull()]

for i in raw_concat['NARGUID'].unique(): 
    sub_df = raw_concat.loc[raw_concat['NARGUID'] == i]
    #add TrialNum col from 1-360
    sub_df['TrialNum'] = np.arange(1, len(sub_df)+1)
    stop_trial_mask = (sub_df['TrialCode'] == 'IncorrectStop') | \
                      (sub_df['TrialCode'] == 'CorrectStop')
    stop_trial_idx = stop_trial_mask[stop_trial_mask == True].index
    sub_df['StopTrial'] = ""
    sub_df['StopTrial'][stop_trial_idx] = np.arange(1, len(sub_df.loc[stop_trial_idx])+ 1)
    
    if len(~sub_df['TrialNum'].isnull()) == 360: 
        if 'StopTooEarly' not in sub_df['TrialCode'].unique(): 
            SST_concat = SST_concat.append(sub_df)
    else: 
        incomplete_runs_df = incomplete_runs_df.append(sub_df)
        incomplete_runs_subj.append(i)

incomplete_runs_df.to_csv('incomplete_subjects.csv')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  

In [5]:
## Modified

#this saves as a csv so you can avoid running that loops multiple times 
SST_concat.to_csv('merged_data/reran_partially_cleaned.csv')

While obnoxious, the SettingWithCopyWarning is a general indicator of bad practice. In this case it is inconsequential, but it arises from the line sub_df = raw_concat.loc[raw_concat['NARGUID'] == i].
The author selects the rows of the original raw_concat dataframe for just one unique subject, and then modifies that selection. All the warning is saying is that these changes are likely not being made on the original raw_concat, which is not technically a problem, as the author simply builds a new dataframe (albeit in a very inefficient manner: it is considered poor practice to keep appending to a dataframe as one would to a Python list, as the pandas append operation creates a new copy every time). See:

"Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once." - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

Likewise, an even more efficient solution would keep every subject's data stored separately, as mentioned before.
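A minimal sketch of the list-then-concat pattern follows, assuming the same column names as the loop above (`NARGUID`, `TrialCode`). `split_subjects` and its `n_trials` parameter are hypothetical names introduced here for illustration. The explicit `.copy()` sidesteps the SettingWithCopyWarning, and a single `pd.concat` per output replaces the repeated `append` calls (note that `DataFrame.append` was removed in pandas 2.0).

```python
import numpy as np
import pandas as pd


def split_subjects(raw, n_trials=360):
    """Return (complete, incomplete) dataframes, each built with one concat."""
    complete, incomplete = [], []
    for _, sub_df in raw.groupby('NARGUID', sort=False):
        sub_df = sub_df.copy()  # explicit copy: later assignments are unambiguous
        sub_df['TrialNum'] = np.arange(1, len(sub_df) + 1)
        is_stop = sub_df['TrialCode'].isin(['IncorrectStop', 'CorrectStop'])
        # object dtype so the column can hold "" and integers, as in the original
        sub_df['StopTrial'] = pd.Series("", index=sub_df.index, dtype=object)
        sub_df.loc[is_stop, 'StopTrial'] = np.arange(1, is_stop.sum() + 1)
        if len(sub_df) != n_trials:
            incomplete.append(sub_df)
        elif 'StopTooEarly' not in sub_df['TrialCode'].tolist():
            complete.append(sub_df)
    return (pd.concat(complete) if complete else pd.DataFrame(),
            pd.concat(incomplete) if incomplete else pd.DataFrame())


# Toy data (made up): subject 'a' has 3 trials, subject 'b' only 2.
raw = pd.DataFrame({
    'NARGUID': ['a', 'a', 'a', 'b', 'b'],
    'TrialCode': ['CorrectGo', 'CorrectStop', 'IncorrectStop',
                  'CorrectGo', 'CorrectStop'],
})
complete, incomplete = split_subjects(raw, n_trials=3)
```

As in the original loop, subjects with the full number of trials but a 'StopTooEarly' code end up in neither output.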

While the concerns mentioned in this cell do not directly affect the integrity of the results, they are related to the ease of replicating the code. I.e., if it takes someone a long time to run a chunk of code, and they have to back-track and decipher where saved csv's were made and where raw_concat comes from, it increases the burden of replication and unintentionally obfuscates the entire analysis.
############################################################################################################################

In [8]:
#this reads in the file above, to avoid running that loop 
SST_concat = pd.read_csv('merged_data/reran_partially_cleaned.csv')

### Check "TrialCode" accuracy

In [9]:
#some subjects have their subtrial under Procedure[Trial]
SST_concat['Procedure[SubTrial]'].loc[SST_concat['Procedure[SubTrial]'].isnull()] = SST_concat['Procedure[Trial]'] 
#the trial type column has inconsistent notation for stop trials 
SST_concat['trial_type'] = SST_concat['Procedure[SubTrial]'].replace('VariableStopTrial.*', 'StopTrial', regex=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_with_indexer(indexer, value)


In [10]:
#find all go trials 
go_trial_mask = SST_concat.loc[SST_concat['Procedure[SubTrial]'] == 'GoTrial']
go_trial_idx = go_trial_mask[go_trial_mask==True].index

In [11]:
#the response recordings are inconsistent in type (str, int, float) this fixes that 
cresp_replace = {'2.0': 2.0,
                 '1.0': 1.0,
                 '3.0': 3.0,
                 '4.0': 4.0,
                 '1,{LEFTARROW}': 1.0,
                 '2,{RIGHTARROW}': 2.0}

resp_replace = {'2.0': 2.0,
                '1.0': 1.0,
                '3.0': 3.0,
                '4.0': 4.0,
                '{LEFTARROW}': 1.0,
                '{RIGHTARROW}': 2.0}

SST_concat['Go.RESP'].replace(to_replace = resp_replace, inplace=True)
SST_concat['Go.CRESP'].replace(to_replace = cresp_replace, inplace=True)

In [12]:
# New / Added, cast to float - one reason I might be having to do these additional casts
# is that I loaded the initial dataframe with low_memory=False, where perhaps the original
# author did not
print(SST_concat['Go.RESP'].unique(), SST_concat['Go.CRESP'].unique())

SST_concat['Go.RESP'] = SST_concat['Go.RESP'].astype(float)
SST_concat['Go.CRESP'] = SST_concat['Go.CRESP'].astype(float)

print(SST_concat['Go.RESP'].unique(), SST_concat['Go.CRESP'].unique())

[4.0 3.0 nan 1.0 2.0 '1' '2'] [ 4.  3. nan  1.  2.]
[ 4.  3. nan  1.  2.] [ 4.  3. nan  1.  2.]
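The mix of strings and numbers seen in the output above could also be normalized in one pass. The sketch below is an alternative under stated assumptions, not the original approach: `to_response_code` and `KEY_CODES` are hypothetical names, the key mapping is copied from the `resp_replace` / `cresp_replace` dictionaries, and the toy series is made up.

```python
import numpy as np
import pandas as pd

# Mapping taken from the resp_replace / cresp_replace dictionaries above.
KEY_CODES = {'{LEFTARROW}': 1.0, '{RIGHTARROW}': 2.0,
             '1,{LEFTARROW}': 1.0, '2,{RIGHTARROW}': 2.0}


def to_response_code(col):
    """Map key names to codes, then parse the remaining strings as floats."""
    mapped = col.map(lambda v: KEY_CODES.get(v, v))
    return pd.to_numeric(mapped).astype(float)


# Toy column with the kinds of values printed above.
mixed = pd.Series([4.0, '3.0', np.nan, '1', '{LEFTARROW}', '2,{RIGHTARROW}'],
                  dtype=object)
clean = to_response_code(mixed)
```

Because `pd.to_numeric` raises on anything it cannot parse, an unexpected response string would surface as an error here rather than slip through silently.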


In [13]:
# New / added check stop responses for same error
print(SST_concat['Fix.RESP'].unique(), SST_concat['StopSignal.RESP'].unique(), SST_concat['SSD.RESP'].unique())

# And fix
SST_concat['Fix.RESP'].replace(to_replace=resp_replace, inplace=True)
SST_concat['Fix.RESP'] = SST_concat['Fix.RESP'].astype(float)

SST_concat['StopSignal.RESP'].replace(to_replace=resp_replace, inplace=True)
SST_concat['StopSignal.RESP'] = SST_concat['StopSignal.RESP'].astype(float)

SST_concat['SSD.RESP'].replace(to_replace=resp_replace, inplace=True)
SST_concat['SSD.RESP'] = SST_concat['SSD.RESP'].astype(float)

print(SST_concat['Fix.RESP'].unique(), SST_concat['StopSignal.RESP'].unique(), SST_concat['SSD.RESP'].unique())

[nan 4.0 3.0 2.0 1.0 '4.0' '3.0' '1.0' '2.0' '{RIGHTARROW}' '{LEFTARROW}'
 '2' '1'] [nan 4.0 3.0 1.0 2.0 '4.0' '3.0' '1.0' '2.0' '{RIGHTARROW}' '{LEFTARROW}'] [nan 3.0 1.0 2.0 4.0 '3.0' '4.0' '1.0' '2.0' '{LEFTARROW}' '{RIGHTARROW}']
[nan  4.  3.  2.  1.] [nan  4.  3.  1.  2.] [nan  3.  1.  2.  4.]


In [14]:
#create my own correct response column 
SST_concat['correct_go_response'] = np.NaN

SST_concat['correct_go_response'].loc[(~SST_concat['Go.RESP'].isnull()) & 
                                      (SST_concat['Go.CRESP'] == SST_concat['Go.RESP'])] = float(1)

SST_concat['correct_go_response'].loc[(SST_concat['Go.RESP'].isnull()) & 
                                      (SST_concat['Go.CRESP'] == SST_concat['Fix.RESP'])] = float(1)

SST_concat['correct_go_response'].loc[(~SST_concat['Go.RESP'].isnull()) & 
                                      (SST_concat['Go.CRESP'] != SST_concat['Go.RESP']) &
                                      (SST_concat['trial_type'] == 'GoTrial')] = float(0)


SST_concat['correct_go_response'].loc[(SST_concat['Go.RESP'].isnull()) & 
                                      (SST_concat['Go.CRESP'] != SST_concat['Fix.RESP']) &
                                      (SST_concat['trial_type'] == 'GoTrial')] = float(0)


SST_concat['correct_go_response'].loc[(SST_concat['Go.RESP'].isnull()) & (SST_concat['Fix.RESP'].isnull()) & 
                                      (SST_concat['trial_type'] == 'GoTrial')] = 'omission'

In [15]:
#check that I only have ones, zeros, and omissions
SST_concat.loc[SST_concat['trial_type'] == 'GoTrial']['correct_go_response'].unique() 

array([1.0, 0.0, 'omission'], dtype=object)

In [16]:
#correct stop column 
SST_concat['correct_stop'] = np.NaN

SST_concat['correct_stop'].loc[(SST_concat['StopSignal.RESP'].isnull()) & (SST_concat['Fix.RESP'].isnull()) & \
                               (SST_concat['trial_type'] == 'StopTrial') & (SST_concat['SSD.RESP'].isnull())] = float(1)

SST_concat['correct_stop'].loc[(~(SST_concat['StopSignal.RESP'].isnull()) | ~(SST_concat['Fix.RESP'].isnull()) \
                                | ~(SST_concat['SSD.RESP'].isnull())) & (SST_concat['trial_type'] == 'StopTrial')] = float(0)

In [17]:
SST_concat['correct_stimulus_mapping_1'] = np.NaN
SST_concat['correct_stimulus_mapping_2'] = np.NaN

In [18]:
SST_concat.loc[SST_concat['Stimulus'] == 'images/Right_Arrow.bmp', 'correct_stimulus_mapping_1'] = SST_concat.loc[SST_concat['Stimulus'] == 'images/Right_Arrow.bmp']['Go.CRESP'].dropna().unique()[0]
SST_concat.loc[SST_concat['Stimulus'] == 'images/Right_Arrow.bmp', 'correct_stimulus_mapping_2'] = SST_concat.loc[SST_concat['Stimulus'] == 'images/Right_Arrow.bmp']['Go.CRESP'].dropna().unique()[1]

SST_concat.loc[SST_concat['Stimulus'] == 'images/Left_Arrow.bmp', 'correct_stimulus_mapping_1'] = SST_concat.loc[SST_concat['Stimulus'] == 'images/Left_Arrow.bmp']['Go.CRESP'].dropna().unique()[0]
SST_concat.loc[SST_concat['Stimulus'] == 'images/Left_Arrow.bmp', 'correct_stimulus_mapping_2'] = SST_concat.loc[SST_concat['Stimulus'] == 'images/Left_Arrow.bmp']['Go.CRESP'].dropna().unique()[1]

In [19]:
# New / added
print(SST_concat.loc[SST_concat['Stimulus'] == 'images/Right_Arrow.bmp']['Go.CRESP'].dropna().unique()[0])
print(SST_concat.loc[SST_concat['Stimulus'] == 'images/Right_Arrow.bmp']['Go.CRESP'].dropna().unique()[1])
print(SST_concat.loc[SST_concat['Stimulus'] == 'images/Left_Arrow.bmp']['Go.CRESP'].dropna().unique()[0])
print(SST_concat.loc[SST_concat['Stimulus'] == 'images/Left_Arrow.bmp']['Go.CRESP'].dropna().unique()[1])

3.0
2.0
4.0
1.0


In [20]:
#correct stop choice response 
SST_concat['correct_stop_mapping'] = np.NaN 
SST_concat['correct_stop_mapping'].loc[(SST_concat['correct_stop'] == 0) & (~SST_concat['SSD.RESP'].isnull()) &\
                                       ((SST_concat['SSD.RESP'] == SST_concat['correct_stimulus_mapping_1']) | (SST_concat['SSD.RESP'] == SST_concat['correct_stimulus_mapping_2']))] = float(1)

SST_concat['correct_stop_mapping'].loc[(SST_concat['correct_stop'] == 0) & (~SST_concat['SSD.RESP'].isnull()) &\
                                      (SST_concat['SSD.RESP'] != SST_concat['correct_stimulus_mapping_1']) & (SST_concat['SSD.RESP'] != SST_concat['correct_stimulus_mapping_2'])] = float(0)

In [21]:
SST_concat['correct_stop_mapping'].loc[(SST_concat['correct_stop'] == 0) & (SST_concat['SSD.RESP'].isnull()) & (~SST_concat['StopSignal.RESP'].isnull()) &\
                                       ((SST_concat['StopSignal.RESP'] == SST_concat['correct_stimulus_mapping_1']) | (SST_concat['StopSignal.RESP'] == SST_concat['correct_stimulus_mapping_2']))] = float(1)

SST_concat['correct_stop_mapping'].loc[(SST_concat['correct_stop'] == 0) & (SST_concat['SSD.RESP'].isnull()) & (~SST_concat['StopSignal.RESP'].isnull()) &\
                                       ((SST_concat['StopSignal.RESP'] != SST_concat['correct_stimulus_mapping_1']) & (SST_concat['StopSignal.RESP'] != SST_concat['correct_stimulus_mapping_2']))] = float(0)

In [22]:
SST_concat['correct_stop_mapping'].loc[(SST_concat['correct_stop'] == 0) & (SST_concat['SSD.RESP'].isnull()) & (SST_concat['StopSignal.RESP'].isnull()) & (~SST_concat['Fix.RESP'].isnull()) &\
                                       ((SST_concat['Fix.RESP'] == SST_concat['correct_stimulus_mapping_1']) | (SST_concat['Fix.RESP'] == SST_concat['correct_stimulus_mapping_2']))] = float(1)

SST_concat['correct_stop_mapping'].loc[(SST_concat['correct_stop'] == 0) & (SST_concat['SSD.RESP'].isnull()) & (SST_concat['StopSignal.RESP'].isnull()) & (~SST_concat['Fix.RESP'].isnull()) &\
                                       ((SST_concat['Fix.RESP'] != SST_concat['correct_stimulus_mapping_1']) & (SST_concat['Fix.RESP'] != SST_concat['correct_stimulus_mapping_2']))] = float(0)

### fix the go.rt calculation

Since the trial unfolds over several different sub-trials, it's necessary to add some of these together to get the correct RT.
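A toy illustration of that adjustment is sketched here. All values are made up, and it assumes (as the cells below do) that a late go response is logged under Fix.RESP / Fix.RT with Go.RESP empty, so the adjusted RT is Go.Duration + Fix.RT.

```python
import numpy as np
import pandas as pd

# Made-up trials: an in-window go response, a late go response, and a stop trial.
toy = pd.DataFrame({
    'trial_type':  ['GoTrial', 'GoTrial', 'StopTrial'],
    'Go.RESP':     [1.0, np.nan, np.nan],
    'Fix.RESP':    [np.nan, 2.0, np.nan],
    'Go.RT':       [450.0, 0.0, 0.0],
    'Go.Duration': [1000.0, 1000.0, 1000.0],
    'Fix.RT':      [0.0, 120.0, 0.0],
})

# Go trials answered only during the fixation period that follows the go window.
late = (toy['Fix.RESP'].notnull()
        & toy['Go.RESP'].isnull()
        & (toy['trial_type'] == 'GoTrial'))

toy['go_rt_adjusted'] = toy['Go.RT']
# A single .loc assignment avoids the chained-indexing warning.
toy.loc[late, 'go_rt_adjusted'] = (toy.loc[late, 'Go.Duration']
                                   + toy.loc[late, 'Fix.RT'])
```

Only the second row changes: its adjusted RT becomes 1000 + 120 = 1120 ms.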

In [23]:
#find the index of where there's a fixation response on go trials 
go_fix_resp = (~SST_concat['Fix.RESP'].isnull()) & \
              (SST_concat['Go.RESP'].isnull()) & \
              (SST_concat['trial_type'] == 'GoTrial')
go_fix_idx = go_fix_resp[go_fix_resp == True].index

In [24]:
#make column from Go.RT values
SST_concat['go_rt_adjusted'] = SST_concat['Go.RT'].copy()
#for long response trials, add the fix.rt to the go.duration
SST_concat['go_rt_adjusted'][go_fix_idx] = SST_concat.loc[go_fix_idx]['Go.Duration'] +  \
                                           SST_concat.loc[go_fix_idx]['Fix.RT']

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """


#### fix the stop sig rt calculation

In [25]:
#find all stop trials
stop_trial_mask = (SST_concat['trial_type'] == 'StopTrial')
#make an index of all stop trials 
stop_trial_idx = stop_trial_mask[stop_trial_mask == True].index

############################################################################################################################

A few problems with the below calculations

In [26]:
## Original

SST_concat['stop_rt_adjusted'] = SST_concat['StopSignal.RT']
#find all stop failure trials with resp during fix.resp, and no resp during stopsignal.resp
stop_fix_resp = (~SST_concat['Fix.RESP'].isnull()) & \
                (SST_concat['StopSignal.RESP'].isnull()) & \
                ((SST_concat['trial_type'] == 'StopTrial') & SST_concat['correct_stop'] == 0)

stop_fix_idx = stop_fix_resp[stop_fix_resp == True].index

stop_SSD_resp = (~SST_concat['SSD.RESP'].isnull()) & \
                (~(SST_concat['StopSignal.RESP'].isnull()) | ~(SST_concat['Fix.RESP'].isnull()) \
                    | ~(SST_concat['SSD.RESP'].isnull()))

In [27]:
## New / added

print('original trues', np.sum(stop_fix_resp))

last_line = (SST_concat['trial_type'] == 'StopTrial') & SST_concat['correct_stop'] == 0
print('last line trues', np.sum(last_line))

what_it_should_be = ((SST_concat['trial_type'] == 'StopTrial') & (SST_concat['correct_stop'] == 0))
print('fixed last line trues', np.sum(what_it_should_be))

true_stop_fix_resp = (~SST_concat['Fix.RESP'].isnull()) & \
                (SST_concat['StopSignal.RESP'].isnull()) & \
                ((SST_concat['trial_type'] == 'StopTrial') & (SST_concat['correct_stop'] == 0))
print('stop fix resp fixed trues', np.sum(true_stop_fix_resp))

original trues 246516
last line trues 2783795
fixed last line trues 248435
stop fix resp fixed trues 52569


The calculation of stop_fix_resp has a programming error, a misuse of brackets, so that it does not actually apply the condition that the trial type be a stop trial and that correct_stop be == 0. Because & binds more tightly than == in Python, the last line is evaluated as ((trial_type == 'StopTrial') & correct_stop) == 0. Potentially by coincidence, problems seem to have been avoided: in the next calculation this mask is used to add StopSignal.Duration and Fix.RT, and all of the extra Trues in the mask also have a StopSignal.Duration of NaN, so the result is NaN (NaN + a number == NaN in pandas).
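The bracket problem comes down to operator precedence: in Python, `&` binds more tightly than `==`. A small sketch with made-up NumPy arrays (not the real data) shows how the two spellings differ.

```python
import numpy as np

# Toy stand-ins for (trial_type == 'StopTrial') and the correct_stop column.
is_stop = np.array([True, True, False, False])
correct_stop = np.array([0, 1, 0, 1])

# Parsed as (is_stop & correct_stop) == 0, because & binds tighter than ==.
buggy = is_stop & correct_stop == 0

# The intended condition: stop trial AND correct_stop equal to 0.
fixed = is_stop & (correct_stop == 0)

n_buggy, n_fixed = int(buggy.sum()), int(fixed.sum())
```

Here the buggy mask is True for three of the four rows, including both non-stop trials, while the fixed mask selects only the first row. That is the same kind of inflation seen in the counts printed above.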

In [28]:
## New / added

stop_SSD_resp = (~SST_concat['SSD.RESP'].isnull()) & \
                (~(SST_concat['StopSignal.RESP'].isnull()) | ~(SST_concat['Fix.RESP'].isnull()) \
                    | ~(SST_concat['SSD.RESP'].isnull()))

(stop_SSD_resp == ~SST_concat['SSD.RESP'].isnull()).all()

True

stop_SSD_resp can be logically reduced to "'SSD.RESP' is not null", i.e.,
A and (B or C or A) == A, by the absorption law.
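For anyone who prefers brute force over formal logic, the identity can be checked over all eight truth assignments. This is a standalone sketch, independent of the data.

```python
from itertools import product

# Check A and (B or C or A) == A for every combination of truth values.
results = [(a and (b or c or a)) == a
           for a, b, c in product([False, True], repeat=3)]
identity_holds = all(results)
```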

Not sure what their actual intention was, so I am not sure how to fix it.
############################################################################################################################

In [29]:
#use that index to add stop signal duration to get correct stop signal RT on these trials
SST_concat['stop_rt_adjusted'][stop_fix_resp] = SST_concat.loc[stop_fix_resp]['StopSignal.Duration'] +\
                                                SST_concat.loc[stop_fix_resp]['Fix.RT']

SST_concat['stop_rt_adjusted'][stop_SSD_resp] = SST_concat.loc[stop_SSD_resp]['SSD.RT']

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """


In [30]:
## New / added

# Confirm bad calc of stop_fix mask doesnt change anything

SST_concat['alt_stop_rt'] = SST_concat['stop_rt_adjusted']

SST_concat['alt_stop_rt'][true_stop_fix_resp] = SST_concat.loc[true_stop_fix_resp]['StopSignal.Duration'] +\
                                                SST_concat.loc[true_stop_fix_resp]['Fix.RT']

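# Note: `!= np.nan` is True for every element (NaN included), so the left-hand side
# simply counts the rows selected by true_stop_fix_resp
np.sum(SST_concat['alt_stop_rt'][true_stop_fix_resp] != np.nan) == np.sum(~SST_concat['stop_rt_adjusted'][stop_fix_resp].isnull())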

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


True

############################################################################################################################

A large conceptual error is made below. The stop_resp_mask is created with SSD.RESP included, and then SSDDur is added.
Clearly the SSDDur should not be added to SSD responses; this would be the equivalent of adding the full Go duration to all Go response times...

A brief note on code style too: because the pandas column stop_rt_adjusted is defined in a different cell, if this cell were run multiple times, 'stop_rt_adjusted' would continue to increment higher and higher... not that I did this or anything (I did)
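One way to make such a step safe to re-run is to always derive the adjusted column from the untouched source columns, rather than updating it in place. The sketch below uses made-up numbers, and `add_ssd` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Made-up values standing in for the real columns.
df = pd.DataFrame({'StopSignal.RT': [100.0, 200.0],
                   'SSDDur': [50.0, 250.0]})


def add_ssd(frame):
    """Recompute stop_rt_adjusted from the raw columns on every call."""
    out = frame.copy()
    out['stop_rt_adjusted'] = out['StopSignal.RT'] + out['SSDDur']
    return out


once = add_ssd(df)
twice = add_ssd(add_ssd(df))  # running the step again changes nothing
```

Because the function never reads its own output column, calling it once or several times gives the same result.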

In [31]:
## Original

#find the stop failure trials and add the stop signal duration for correct stop fail RT
stop_resp_mask = ((~(SST_concat['StopSignal.RESP'].isnull()) | ~(SST_concat['Fix.RESP'].isnull()) \
                                | ~(SST_concat['SSD.RESP'].isnull())) & \
                (~SST_concat['stop_rt_adjusted'].isnull()))
        
stop_resp_idx = stop_resp_mask[stop_resp_mask == True].index

SST_concat['stop_rt_adjusted'][stop_resp_idx] = \
        SST_concat['stop_rt_adjusted'][stop_resp_idx] + SST_concat['SSDDur'][stop_resp_idx]

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # This is added back by InteractiveShellApp.init_path()


In [32]:
true_stop_resp_mask = ((~(SST_concat['StopSignal.RESP'].isnull()) | ~(SST_concat['Fix.RESP'].isnull())) & \
                      (~SST_concat['stop_rt_adjusted'].isnull()))
true_stop_resp_idx = true_stop_resp_mask[true_stop_resp_mask == True].index

In [33]:
np.sum(~(SST_concat['SSD.RESP'].isnull()))

26106

20k+ is a lot of trials to have artificially inflated. We will note the impact on the original authors' claims later on.

We will re-create a fixed stop_rt_adjusted below, to compare with later on

In [34]:
SST_concat['true_stop_rt_adjusted'] = SST_concat['StopSignal.RT']
SST_concat['true_stop_rt_adjusted'][true_stop_fix_resp] =\
    SST_concat.loc[true_stop_fix_resp]['StopSignal.Duration'] + SST_concat.loc[true_stop_fix_resp]['Fix.RT']
SST_concat['true_stop_rt_adjusted'][stop_SSD_resp] = SST_concat.loc[stop_SSD_resp]['SSD.RT']

SST_concat['true_stop_rt_adjusted'][true_stop_resp_idx] =\
    SST_concat['true_stop_rt_adjusted'][true_stop_resp_idx] + SST_concat['SSDDur'][true_stop_resp_idx]

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  import sys


In [35]:
np.nanmean(SST_concat['true_stop_rt_adjusted']), np.nanmean(SST_concat['stop_rt_adjusted'])

(220.1271249950696, 235.56405553583403)

In [36]:
np.mean(SST_concat[SST_concat['correct_stop'] == 0]['true_stop_rt_adjusted']), np.mean(SST_concat[SST_concat['correct_stop'] == 0]['stop_rt_adjusted'])

(449.24474409805384, 480.7517781311007)

In [37]:
np.sum(~SST_concat['true_stop_rt_adjusted'].isnull()) == np.sum(~SST_concat['stop_rt_adjusted'].isnull())

True

############################################################################################################################

### create go stimuli duration

In [38]:
#create column for Go.Stim.Duration - when there's no go response, this should be 1000ms
SST_concat['go_stim_duration'] = SST_concat['Go.RT'].copy()
SST_concat['go_stim_duration'].loc[SST_concat['Go.RT'] == 0] = SST_concat['Go.Duration'].loc[SST_concat['Go.RT'] == 0]

SST_concat['go_stim_duration'].loc[~(SST_concat['SSD.RESP'].isnull())] = SST_concat.loc[~SST_concat['SSD.RESP'].isnull()]['SSD.RT']
SST_concat['go_stim_duration'].loc[(SST_concat['SSD.RT']  == 0)] = SST_concat.loc[SST_concat['SSD.RT'] == 0]['SSDDur']

In [39]:
#this saves it to csvs that are small enough to be viewed by excel
#SST_concat.iloc[:867600].to_csv('SST_cleaned_7231_1.csv')
#SST_concat.iloc[867600:1735560, :].to_csv('SST_cleaned_7231_2.csv')
#SST_concat.iloc[1735560:].to_csv('SST_cleaned_7231_3.csv')

In [40]:
SST_concat.to_csv('merged_data/SST_cleaned_7231_all_rows_all_columns.csv')