In [None]:
# default_exp visualization

# Review

> API details.

In [None]:
#hide
from nbdev.showdoc import *

In [1]:
%load_ext autoreload
%autoreload 2

In [14]:
#export
from literature_review.article import make_articles
import pandas as pd

In [34]:
articles = make_articles('../data/interim/article_dicts/')

## Pubmed search
(("approach-avoidance task") OR ("approach-avoidance tendency") OR ("approach-avoidance bias") OR ("approach-avoidance conflict") OR ("approach tendency") OR ("approach bias"))
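To make the search easy to reproduce or extend, here is a minimal sketch that assembles the query above from its list of phrases. `build_or_query` is a hypothetical helper and is not part of the project code. The string it produces could be pasted into the PubMed search box.

```python
# Hypothetical helper: rebuild the PubMed query above from its phrase list.
terms = [
    "approach-avoidance task",
    "approach-avoidance tendency",
    "approach-avoidance bias",
    "approach-avoidance conflict",
    "approach tendency",
    "approach bias",
]

def build_or_query(phrases):
    """Quote each phrase, wrap it in parentheses, and join the results with OR."""
    return "(" + " OR ".join(f'("{p}")' for p in phrases) + ")"

query = build_or_query(terms)
print(query)
```

Keeping the phrases in a list makes it easy to add or drop a synonym and rerun the search with a consistent syntax.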

> Yielded 541 results.

## Selecting AAT studies (based on abstract)
> Of these, 239 actually used an AAT (note that different versions, e.g., joystick AAT, mouse AAT, SRC, and fMRI AAT, are included).

In [35]:
df = pd.DataFrame([dict(a) for a in articles])
df = pd.concat([df, pd.json_normalize(df['annotations'])], axis = 1)


In [36]:
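# Count manikin (SRC) tasks as non-AAT. `== True` / `!= True` also map
# missing annotations (NaN) to False / True, respectively.
df['AAT'] = (df.AAT == True) & (df.manikin != True)
df['AAT_uncertain'] = (df.AAT_uncertain == True) & (df.manikin != True)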
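The explicit `== True` / `!= True` comparisons in the cell above matter because `json_normalize` leaves NaN wherever an article lacks an annotation. A toy illustration with hypothetical annotation values (not taken from the review data):

```python
import numpy as np
import pandas as pd

# Toy annotations (hypothetical): NaN marks a field that was never annotated.
toy = pd.DataFrame({
    "AAT": [True, True, False, np.nan],
    "manikin": [False, True, np.nan, np.nan],
})

# `== True` turns NaN into False, and `!= True` turns NaN into True,
# so a missing manikin annotation does not remove a study from the AAT set,
# while a missing AAT annotation never counts as an AAT.
toy["AAT"] = (toy.AAT == True) & (toy.manikin != True)
print(toy["AAT"].tolist())
```

Only the first row stays `True`: the second is excluded as a manikin task, and the last two lack a positive AAT annotation.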

More than half of the studies did not use an AAT.

In [37]:
df[df.reviewed==True].AAT.value_counts()

False    336
True     205
Name: AAT, dtype: int64

About a third of those were animal studies.

In [38]:
df.animal.value_counts()

False    446
True      95
Name: animal, dtype: int64

The articles that are in both uncertain categories should be checked.

About 10% were manikin SRC tasks.

In [39]:
df.manikin.value_counts()

True     44
False    25
Name: manikin, dtype: int64
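The rough shares quoted in the text ("more than half", "about a third", "about 10%") can be checked against the printed counts. A minimal sketch with the counts copied from the outputs above. It assumes, as "of those" implies, that all 95 animal studies are among the 336 non-AAT articles.

```python
import pandas as pd

# Counts copied from the value_counts() outputs above.
counts = pd.Series({"non_AAT": 336, "AAT": 205, "animal": 95, "manikin": 44})
total = counts["non_AAT"] + counts["AAT"]

# Share of all screened articles, rounded to three decimals.
share_of_all = (counts / total).round(3)
# Assumption: every animal study is in the non-AAT group.
animal_share_of_non_aat = round(counts["animal"] / counts["non_AAT"], 3)

print(total)  # 541
print(share_of_all.to_dict())
print(animal_share_of_non_aat)
```

On the live notebook, `df.AAT.value_counts(normalize=True)` would give the same proportions directly.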

## Selecting studies that report reliability (based on fulltext)

### Still uncertain

In [40]:
df[['AAT_uncertain','reliability_uncertain']].value_counts()

AAT_uncertain  reliability_uncertain
False          False                    479
True           False                     60
False          True                       2
dtype: int64

In [41]:
df[(df.AAT == True)]['reliability_uncertain'].value_counts()

False    204
True       1
Name: reliability_uncertain, dtype: int64

36 of these were manikin AATs.

### Split-half
Note that these are articles that mention split-half reliability in their text. Some of them did not actually report the split-half reliability of AAT approach-avoidance tendencies.

In [42]:
df[(df.AAT == True)]['split-half'].value_counts()

False    179
True      26
Name: split-half, dtype: int64

### Retest
Note that Reinecke did not show up in the initial PubMed search (it is not listed in PubMed). Brown (2014) did not show up either (not even when searching for "approach-avoidance task").

In [43]:
df[(df.AAT == True)].retest.value_counts()

False    200
True       5
Name: retest, dtype: int64

In [44]:
df[(df.AAT == True) & (df.retest == True)][['path','note']]

| | path | note |
|---|---|---|
| 11 | ../data/interim/article_dicts/Kahveci_2020.json | rsb = .58 with 1000 bootstrapped splits |
| 266 | ../data/interim/article_dicts/Reddy_2016.json | not sure if they did split half |
| 291 | ../data/interim/article_dicts/Rinck_2018.json | The internal consistency of the scores was low... |
| 352 | ../data/interim/article_dicts/Piercy_2021.json | |
| 414 | ../data/interim/article_dicts/Peeters_2012.json | seems like this is a simple push minus pull di... |


### Both

In [45]:
df[(df.AAT == True)][['split-half','retest']].value_counts()

split-half  retest
False       False     177
True        False      23
            True        3
False       True        2
dtype: int64
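The joint counts above can also be laid out as a 2×2 table with totals. The sketch below rebuilds a stand-in for `df[df.AAT == True]` from the printed counts only (individual articles are not reproduced) and uses `pd.crosstab`. The margins should match the separate `value_counts()` outputs for each column.

```python
import pandas as pd

# Stand-in for df[df.AAT == True], rebuilt from the joint counts printed above.
joint = {(False, False): 177, (True, False): 23, (True, True): 3, (False, True): 2}
rows = [pair for pair, n in joint.items() for _ in range(n)]
aat = pd.DataFrame(rows, columns=["split-half", "retest"])

# Rows: split-half (False, True, All); columns: retest (False, True, All).
table = pd.crosstab(aat["split-half"], aat["retest"], margins=True)
print(table)
```

The row total for split-half (26) and the column total for retest (5) agree with the single-column counts reported earlier.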

## Retest reliabilities
> Summary: Found two additional retest studies, both reporting a retest reliability close to 0.

### Kahveci 2020
> Reliability .23

Lastly, we estimated the reliability of the current AAT. For this purpose, we used the bootstrapped split-half reliability functionality available in the AATtools package (Kahveci, 2019) for R (R Core Team, 2019). First, we performed 1000 random splits on the data. In each split, we excluded outlying trials and errors as we described in the methods section. Unlike in the Methods section, however, we did not exclude entire participants’ sessions if they had excessive error/outlier rates; if we did, the halved size of the samples would lead to the exclusion of a high number of participants. Next, we computed an approach bias score for each half of each split by subtracting the mean reaching time difference of approach-object and avoid-object trials from the mean reaching time difference between approach-food and avoid-food trials. We excluded participants in each half that had an approach bias score deviating more than 3 SD from the sample mean. We then computed the correlation between the two resulting approach bias scores for each participant. After obtaining 1000 such correlations, we computed the mean correlation coefficient and applied a Spearman-Brown correction to account for the halved test length, thereby obtaining a split-half reliability value that is not biased due to the arbitrariness of a single split, or due to outliers that disproportionately affect the correlation.
<b>The split-half reliability of the full dataset (2 sessions per participant) was acceptable, (r<sub>SB</sub> = .58). When sessions were analyzed separately, reliability was lower (food-deprived r<sub>SB</sub> = .48, satiated r<sub>SB</sub> = .49). The test-retest reliability between food-deprived and satiated sessions was low (r<sub>retest</sub> = .23).</b>
Using the same methodology as described above, we additionally computed the bootstrapped split-half reliability for raw pull and push reaction times for foods and objects, as well as reliabilities for push-pull difference scores for foods and objects separately. The reliabilities for single conditions were close to 1 (pull food: r<sub>SB</sub> = .98; push food: r<sub>SB</sub> = .99; pull objects: r<sub>SB</sub> = .97; push objects: r<sub>SB</sub> = .99) and the reliabilities for push-pull difference scores were also very high (push food – pull food: r<sub>SB</sub> = .95; push object – pull object: r<sub>SB</sub> = .89).
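The core of the bootstrapped split-half procedure quoted above can be sketched in Python. This is not the AATtools (R) implementation. It is a minimal sketch that assumes RTs are stored as (participants × trials) NumPy arrays for a single stimulus category, and it omits the error/outlier exclusions and the food-minus-object double difference described in the quote. `split_half_reliability` and `spearman_brown` are hypothetical helper names, and the data are simulated.

```python
import numpy as np

def spearman_brown(r):
    """Step a half-test correlation up to full test length."""
    return 2 * r / (1 + r)

def split_half_reliability(pull, push, n_splits=1000, seed=0):
    """Spearman-Brown-corrected mean split-half r of push-minus-pull scores.

    pull, push: arrays of shape (participants, trials) holding RTs.
    For brevity, each random split of trial indices is shared by all participants.
    """
    rng = np.random.default_rng(seed)
    n_trials = pull.shape[1]
    half = n_trials // 2
    rs = []
    for _ in range(n_splits):
        idx = rng.permutation(n_trials)
        a, b = idx[:half], idx[half:2 * half]
        # Difference score per participant in each half.
        score_a = push[:, a].mean(axis=1) - pull[:, a].mean(axis=1)
        score_b = push[:, b].mean(axis=1) - pull[:, b].mean(axis=1)
        rs.append(np.corrcoef(score_a, score_b)[0, 1])
    # As in the quote: average the split correlations, then correct once.
    return spearman_brown(np.mean(rs))

# Simulated data (hypothetical): 60 participants, 40 trials per condition,
# a stable per-participant bias on push trials plus trial-level noise.
rng = np.random.default_rng(1)
bias = rng.normal(0, 30, size=(60, 1))
pull = rng.normal(600, 100, size=(60, 40))
push = rng.normal(600, 100, size=(60, 40)) + bias
rel = split_half_reliability(pull, push, n_splits=200)
print(round(rel, 2))
```

Averaging over many random splits removes the arbitrariness of any single split. The Spearman-Brown step compensates for each half containing only half the trials.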

### Reddy 2016
> No control stimuli in analysis

### Peeters 2012
> No control stimuli in analysis

### Rinck 2018
> Reliability close to 0 at 1-year follow-up

The internal consistency of these scores was determined by computing each participant’s AAT score for each picture and by computing Cronbach’s alpha for these values. The internal consistency of the scores was low but in the upper range of what is usually reported for RT tasks (Cronbach’s alpha = .58 for the pretest and .55 for the posttest). The retest reliability (computed for 143 patients in the no-training group only) was nonexistent, however (r = .01, p = .93). In addition, a change score (posttest minus pretest) was computed from these two scores. Negative values of this AAT change score indicate change in the intended direction, that is, a reduction of the alcohol ApB.
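Rinck et al. compute Cronbach's alpha over per-picture AAT scores. Below is a minimal sketch of the standard alpha formula, assuming a (participants × pictures) matrix of such scores. How the per-picture scores are derived is not shown, and `cronbach_alpha` is a hypothetical helper name.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (participants x items) score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Tiny hypothetical example: 3 participants, 2 pictures.
example = [[1, 1], [2, 3], [3, 2]]
print(round(cronbach_alpha(example), 3))  # 0.667
```

With many pictures per participant, the same function applies unchanged. Alpha rises with the number of items and with how consistently participants' scores covary across pictures.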

### Piercy 2021
> Reliability close to 0

Internal consistency of the IR-AAT was calculated using the method reported by Kersbergen et al. (2015). The internal consistency was low (Cronbach’s α = 0.35 for alcohol-related items and 0.34 for non-alcohol-related items), and the test–retest reliability (calculated only for the 117 participants in the sham training control condition who completed both baseline and post-test AAT assessments in the larger RCT) was poor (r = 0.027, p = 0.774; see Manning et al., 2021).
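Both Rinck (2018) and Piercy (2021) report retest reliability as a Pearson correlation between bias scores at two time points. Each is computed only in the control group and only for participants with both assessments. Below is a minimal NumPy sketch, assuming one bias score per participant per session, with NaN marking a missing session. `retest_reliability` and the scores are hypothetical. A p-value, as reported in both studies, could be obtained with `scipy.stats.pearsonr`.

```python
import numpy as np

def retest_reliability(baseline, followup):
    """Pearson r between bias scores at two time points, plus the n used.

    Participants missing either score (NaN) are dropped, so only
    completers contribute.
    """
    baseline = np.asarray(baseline, dtype=float)
    followup = np.asarray(followup, dtype=float)
    keep = ~(np.isnan(baseline) | np.isnan(followup))
    r = np.corrcoef(baseline[keep], followup[keep])[0, 1]
    return float(r), int(keep.sum())

# Hypothetical bias scores for six control-group participants.
baseline = [12.0, -5.0, 30.0, 8.0, np.nan, 20.0]
followup = [10.0, -2.0, 25.0, np.nan, 4.0, 18.0]
r, n = retest_reliability(baseline, followup)
print(n, round(r, 3))
```

Reporting `n` alongside `r` mirrors the studies above, which state how many completers entered the retest correlation.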

## ToDo:
Mention that the search was broad and yielded few relevant studies, but now focus on more comparable tasks; mention his pipeline.

If Maria's method does not change the outline and is easy to implement, I can add it; otherwise, not.

Add table for split-half

Perhaps take the animal route in the introduction (this can be short):
- then humans
- then the exponential increase (use review paper)

Then broaden the discussion by introducing other tasks.

## Notes

In [108]:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
    display(df[df.note!=""][['path','note']])

| | path | note |
|---|---|---|
| 0 | ../data/interim/article_dicts/Bratti-van_der_W... | Only a study protocol. |
| 1 | ../data/interim/article_dicts/Paige_2013.json | review |
| 7 | ../data/interim/article_dicts/Schumacher_2018.... | animal study |
| 12 | ../data/interim/article_dicts/Wiesenfeller_202... | borderline; faces |
| 13 | ../data/interim/article_dicts/Taylor_2012.json | anxiety, mentiones that rinkck and becker are ... |
| 14 | ../data/interim/article_dicts/Wittekind_2015.json | reports retest reliability of other tasks but ... |
| 15 | ../data/interim/article_dicts/CHAMPION_1961.json | 1961 study, probably no AAT |
| 16 | ../data/interim/article_dicts/Weidacker_2018.json | pedophilia |
| 18 | ../data/interim/article_dicts/Effting_2016.json | mentions low reliability in discussion; intere... |
| 19 | ../data/interim/article_dicts/WALTERS_1963.json | old study |
