# Speed-Accuracy Tradeoff in a Random Dot-Motion Experiment (using the EZ diffusion model)

## Import modules

In [1]:
import pandas as pd
import numpy as np

## Dataset context

Citation: Desender, K., Vermeylen, L., & Verguts, T. (2022). Dynamic influences on static measures of metacognition. Nature Communications. 
* csv: https://osf.io/4npq9
* experiment context: https://osf.io/vup59
* github repository: https://github.com/kdesende/dynamic_influences_on_static_measures/

Setup
* **data**: from experiment 1, collected online due to COVID-19
* **stimulus**: random dot motion at a fixed coherence level (.2), leftwards vs rightwards motion
* **manipulations**: there were two conditions (coded in the column `Condition`): participants were instructed to focus on either i) speed (`speed`) or ii) accuracy (`accuracy`) when making their primary decision
* **confidence scale**: six-point scale ("certainly wrong","probably wrong","guess wrong","guess correct","probably correct","certainly correct")
* **block size**: participants completed (at least) 3 practice blocks of 24 trials each, followed by 10 blocks of 60 trials
* **feedback**: no feedback was given during the actual task
* **training**: the task started with three practice stages of 24 trials each; each stage was repeated until the participant reached its accuracy criterion:
  * binary choices with feedback, coherence of .5, repeated until >85% correct
  * binary choices with feedback, coherence of .2, repeated until >60% correct
  * binary choices without feedback but with a confidence rating, coherence of .2, repeated until >60% correct

## Explore the data

In [2]:
df = pd.read_csv('data_Desender_2022_Exp1.csv')
df.head()

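| | Subj_idx | Stimulus | Response | Confidence | RT_dec | RT_conf | Coherence | Condition | Training |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 143 | 1 | 0 | NaN | NaN | NaN | 0.5 | speed | 1 |
| 1 | 143 | 1 | 1 | NaN | 2.611815 | NaN | 0.5 | speed | 1 |
| 2 | 143 | 1 | 0 | NaN | 3.106495 | NaN | 0.5 | speed | 1 |
| 3 | 143 | 0 | 0 | NaN | 1.05079 | NaN | 0.5 | speed | 1 |
| 4 | 143 | 0 | 0 | NaN | 1.07762 | NaN | 0.5 | speed | 1 |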


In [3]:
len(df)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30744 entries, 0 to 30743
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Subj_idx    30744 non-null  int64  
 1   Stimulus    30744 non-null  int64  
 2   Response    30744 non-null  int64  
 3   Confidence  26976 non-null  float64
 4   RT_dec      30645 non-null  float64
 5   RT_conf     26976 non-null  float64
 6   Coherence   30744 non-null  float64
 7   Condition   30744 non-null  object 
 8   Training    30744 non-null  int64  
dtypes: float64(4), int64(4), object(1)
memory usage: 2.1+ MB


In [4]:
print(df['Subj_idx'].unique())
print(df['Subj_idx'].nunique())

[143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126
 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108
 107 106 105 104 103 102 101]
43


## Data cleaning

In [5]:
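# count fully duplicated rows (only displayed if this is the last line of the cell)
df.duplicated().sum()
# note: keep=False drops every copy of a duplicated row, including the first occurrence
df = df.drop_duplicates(keep=False)
# confirm that no duplicates remain (prints an empty DataFrame)
print(df[df.duplicated()])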

Empty DataFrame
Columns: [Subj_idx, Stimulus, Response, Confidence, RT_dec, RT_conf, Coherence, Condition, Training]
Index: []
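
The `keep` argument decides which copies survive de-duplication. The sketch below uses a made-up toy frame (the values are illustrative and not taken from the dataset) to show that `keep=False` removes every copy of a repeated row, whereas the default `keep='first'` retains one copy:

```python
import pandas as pd

# toy data (hypothetical values): rows 0 and 1 are exact duplicates
toy = pd.DataFrame({
    'Subj_idx': [1, 1, 2],
    'Response': [0, 0, 1],
    'RT_dec': [0.5, 0.5, 0.7],
})

n_duplicates = int(toy.duplicated().sum())   # rows flagged as repeats of an earlier row
kept_first = toy.drop_duplicates()           # default keep='first': one copy survives
kept_none = toy.drop_duplicates(keep=False)  # every copy of a duplicated row is removed

print(n_duplicates, len(kept_first), len(kept_none))
```

Which behavior is appropriate depends on whether identical rows are treated as recording errors or as genuine repeated trials.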


In [6]:
# drop every row that has a missing value in any column
df = df.dropna()
# confirm that no missing values remain
df.isnull().sum()

Subj_idx      0
Stimulus      0
Response      0
Confidence    0
RT_dec        0
RT_conf       0
Coherence     0
Condition     0
Training      0
dtype: int64

In [7]:
df = df[df['Training'] != 1]  # drop training trials
df = df.drop(columns=['RT_conf', 'Confidence', 'Training'])  # drop columns that are no longer needed
df = df[df['Coherence'] == .2]  # keep only trials at coherence .2 (some training trials are .5)
df

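| | Subj_idx | Stimulus | Response | RT_dec | Coherence | Condition |
|---|---|---|---|---|---|---|
| 72 | 143 | 1 | 1 | 0.628250 | 0.2 | speed |
| 73 | 143 | 0 | 0 | 0.772160 | 0.2 | speed |
| 74 | 143 | 1 | 1 | 0.847125 | 0.2 | speed |
| 75 | 143 | 1 | 0 | 0.719910 | 0.2 | speed |
| 76 | 143 | 1 | 1 | 0.759140 | 0.2 | speed |
| ... | ... | ... | ... | ... | ... | ... |
| 30739 | 101 | 0 | 1 | 0.229660 | 0.2 | accuracy |
| 30740 | 101 | 0 | 1 | 0.255830 | 0.2 | accuracy |
| 30741 | 101 | 0 | 0 | 0.215480 | 0.2 | accuracy |
| 30742 | 101 | 1 | 0 | 0.010320 | 0.2 | accuracy |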


## Summary statistics for the hierarchical EZ Diffusion Model (multiple decision makers or conditions)

For each subject, the trial-level data are condensed into 1 row:
* the total number of trials
* the number of correct trials (hits and correct rejections combined)
* meanRT, computed over correct trials only
* varRT, computed over correct trials only
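
To show why these particular quantities are collected, here is a minimal sketch of the closed-form EZ-diffusion equations from Wagenmakers et al. (2007), which map proportion correct, RT variance, and mean RT onto drift rate, boundary separation, and non-decision time. The function name `ez_diffusion`, its argument names, and the conventional scaling parameter `s = 0.1` are assumptions for illustration; this notebook only prepares the inputs and does not fit the model itself, and a hierarchical version would estimate the parameters differently.

```python
import math

def ez_diffusion(prop_correct, var_rt, mean_rt, s=0.1):
    """Closed-form EZ-diffusion estimates (illustrative sketch).

    Returns (drift rate v, boundary separation a, non-decision time Ter).
    Requires 0 < prop_correct < 1 and prop_correct != 0.5.
    """
    if prop_correct <= 0 or prop_correct >= 1 or prop_correct == 0.5:
        raise ValueError("prop_correct must lie strictly between 0 and 1 and differ from 0.5")
    logit = math.log(prop_correct / (1 - prop_correct))
    x = logit * (logit * prop_correct**2 - logit * prop_correct + prop_correct - 0.5) / var_rt
    v = math.copysign(1, prop_correct - 0.5) * s * x**0.25   # drift rate
    a = s**2 * logit / v                                     # boundary separation
    y = -v * a / s**2
    mean_decision_time = (a / (2 * v)) * (1 - math.exp(y)) / (1 + math.exp(y))
    ter = mean_rt - mean_decision_time                       # non-decision time
    return v, a, ter

# example inputs: 80.2% correct, RT variance 0.112 s^2, mean RT 0.723 s
v, a, ter = ez_diffusion(prop_correct=0.802, var_rt=0.112, mean_rt=0.723)
print(round(v, 4), round(a, 4), round(ter, 4))
```

With these example inputs the estimates come out at roughly v ≈ 0.10, a ≈ 0.14, and Ter ≈ 0.30 s. The equations are undefined when proportion correct is exactly 0, 0.5, or 1, so subjects at those values would need an edge correction before fitting.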

### Split conditions into two separate data frames

In [8]:
# split conditions into 2 dataframes
df_speed = df[df['Condition']=='speed']
df_accuracy = df[df['Condition']=='accuracy']

print(len(df_speed), len(df_accuracy))
print((len(df_speed) + len(df_accuracy)) == len(df)) #make sure it matches

12576 13046
True


### Condense subjects so that 1 row = 1 subject

In [9]:
df_speed = df_speed.groupby(['Subj_idx'])
df_accuracy = df_accuracy.groupby(['Subj_idx'])

### Calculate measures for both groups

For each subject (which counts as 1 row):
* column 1: total number of trials
* column 2: correct number of trials
* column 3: mean RT for correct trials only
* column 4: variance RT for correct trials only

In [10]:
# a trial is correct when the response matches the stimulus
# (this covers both hits and correct rejections);
# .var() returns the sample variance (ddof=1)

# calculate measures for speed condition
ntrials_speed = df_speed.size()
correct_speed = df_speed.apply(lambda x: (x['Stimulus'] == x['Response']).sum())
meanRT_speed = df_speed.apply(lambda x: x[x['Stimulus'] == x['Response']]['RT_dec'].mean())
varRT_speed = df_speed.apply(lambda x: x[x['Stimulus'] == x['Response']]['RT_dec'].var())

# print(df_speed.apply(lambda x: x['RT_dec'].mean()))

# calculate measures for accuracy condition
ntrials_accuracy = df_accuracy.size()
correct_accuracy = df_accuracy.apply(lambda x: (x['Stimulus'] == x['Response']).sum())
meanRT_accuracy = df_accuracy.apply(lambda x: x[x['Stimulus'] == x['Response']]['RT_dec'].mean())
varRT_accuracy = df_accuracy.apply(lambda x: x[x['Stimulus'] == x['Response']]['RT_dec'].var())
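
As an alternative to four separate `apply` calls, the same four measures can be computed in one pass with named aggregation. This is only a sketch on made-up toy data (the values are hypothetical; the column names mirror the notebook's), not the code used to produce the saved files:

```python
import pandas as pd

# toy trial-level data (hypothetical values, same column names as the notebook)
toy = pd.DataFrame({
    'Subj_idx': [1, 1, 1, 2, 2, 2],
    'Stimulus': [1, 0, 1, 0, 1, 1],
    'Response': [1, 0, 0, 0, 1, 1],
    'RT_dec':   [0.5, 0.7, 0.9, 0.4, 0.6, 0.8],
})

# flag correct trials, and blank out RT on error trials so that
# mean and var skip them (both ignore NaN by default)
toy['correct'] = toy['Stimulus'] == toy['Response']
toy['RT_correct'] = toy['RT_dec'].where(toy['correct'])

summary = toy.groupby('Subj_idx').agg(
    trials=('correct', 'size'),      # total number of trials
    correct=('correct', 'sum'),      # number of correct trials
    meanRT=('RT_correct', 'mean'),   # mean RT, correct trials only
    varRT=('RT_correct', 'var'),     # sample variance of RT, correct trials only
).reset_index()

print(summary)
```

Because the result is already one row per subject with the columns `trials`, `correct`, `meanRT`, and `varRT`, it has the same layout as the data frames assembled later in the notebook.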

In [11]:
# compare the two conditions at the group level (instead of for individual subjects):
# overall proportion correct = correct trials / total trials, pooled over subjects
# overall mean RT = mean over all trials in the condition (correct and error trials)
# still to do: plot the distribution of RTs in the speed and accuracy conditions
overallcorrect_speed = correct_speed.sum() / ntrials_speed.sum()
print("proportion correct in speed group:", overallcorrect_speed)
speed_rts = df[df['Condition']=='speed']["RT_dec"].mean()
print("mean RT for all speed:", speed_rts)

overallcorrect_accuracy = correct_accuracy.sum() / ntrials_accuracy.sum()
print("proportion correct in accuracy group:", overallcorrect_accuracy)
accuracy_rts = df[df['Condition']=='accuracy']["RT_dec"].mean()
print("mean RT for all accuracy:", accuracy_rts)


proportion correct in speed group: 0.7154103053435115
mean RT for all speed: 0.912929205629773
proportion correct in accuracy group: 0.7274260309673463
mean RT for all accuracy: 0.8252389517857569
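
Note that the pooled mean RTs above include error trials, while the per-subject `meanRT` values use correct trials only. A small sketch on made-up toy data (hypothetical values, same column names as the notebook) shows one way to put both versions side by side per condition:

```python
import pandas as pd

# toy trial-level data (hypothetical values)
toy = pd.DataFrame({
    'Condition': ['speed', 'speed', 'speed', 'accuracy', 'accuracy', 'accuracy'],
    'Stimulus':  [1, 0, 1, 0, 1, 0],
    'Response':  [1, 1, 1, 0, 1, 0],
    'RT_dec':    [0.4, 0.3, 0.6, 0.8, 0.9, 1.0],
})
toy['correct'] = toy['Stimulus'] == toy['Response']

# proportion correct and mean RT over all trials, per condition
by_condition = toy.groupby('Condition').agg(
    prop_correct=('correct', 'mean'),
    meanRT_all=('RT_dec', 'mean'),
)
# mean RT restricted to correct trials, aligned on the Condition index
by_condition['meanRT_correct'] = toy[toy['correct']].groupby('Condition')['RT_dec'].mean()

print(by_condition)
```

In the toy data the single error trial in the speed condition is fast, so the all-trials mean is lower than the correct-only mean.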


### Make hierarchical EZ diffusion dataframe

In [12]:
# create EZ diffusion model df for speed condition
df_speed = pd.DataFrame({
    'trials' : ntrials_speed,
    'correct' : correct_speed,
    'meanRT' : meanRT_speed,
    'varRT' : varRT_speed
}).reset_index()

# create EZ diffusion model df for accuracy condition
df_accuracy = pd.DataFrame({
    'trials' : ntrials_accuracy,
    'correct' : correct_accuracy,
    'meanRT' : meanRT_accuracy,
    'varRT' : varRT_accuracy
}).reset_index()

## Save files

In [13]:
# write data out to 2 separate csv files
filename = 'COGS107_Desender_speed.csv'
df_speed.to_csv(filename, index=False) 
print('Data saved successfully to', filename)

filename = 'COGS107_Desender_accuracy.csv'
df_accuracy.to_csv(filename, index=False) 
print('Data saved successfully to', filename)

Data saved successfully to COGS107_Desender_speed.csv
Data saved successfully to COGS107_Desender_accuracy.csv
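
A quick way to confirm that a saved file has the expected layout is to read it back and compare it with the data frame in memory. The sketch below uses a temporary directory and made-up summary values (both are assumptions for illustration) so that it does not touch the files written above:

```python
import os
import tempfile
import pandas as pd

# hypothetical per-subject summary in the same layout as the saved files
summary = pd.DataFrame({
    'Subj_idx': [101, 102],
    'trials': [600, 598],
    'correct': [430, 451],
    'meanRT': [0.81, 0.77],
    'varRT': [0.09, 0.06],
})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'summary_example.csv')
    summary.to_csv(path, index=False)   # index=False avoids an extra unnamed index column
    reloaded = pd.read_csv(path)

same_columns = list(reloaded.columns) == list(summary.columns)
same_shape = reloaded.shape == summary.shape
print(same_columns, same_shape)
```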
