---
output:
  bookdown::pdf_document2:
    template: templates/brief_template.tex
  bookdown::word_document2: default
  bookdown::html_document2: default
documentclass: book
---
```{r setup 02-02, echo = F, include = F}
source('scripts_and_filters/general_setup.R')
library(BayesFactor)
library(ez)
library(magrittr) # for %$% exploding pipe operator
library(patchwork)
library(lmerTest)
```
# Behavioural responses to advice contexts {#chapter-context-benevolence}
\adjustmtc
I suggest that the long-standing view that egocentric discounting reflects sub-optimal information processing is a consequence of taking a narrow view of the problem being solved.
While it is indeed demonstrable that people in Judge-Advisor System experiments would perform better and earn more reward if they took more advice [@sollStrategiesRevisingJudgment2009], the people who participate in those experiments also have to function in the real world where being too trusting can produce very negative outcomes.
Furthermore, advice-taking and decision-making are, like the behaviour of other organisms, ultimately mechanisms for the more efficient propagation of genes.
While participants in a Judge-Advisor System are working out how to best weigh their own experience with another opinion, the challenge faced by the genes in their cells is to somehow navigate a complex, often-unreliable, and frequently-changing world in order to produce more copies of themselves.
I suggest that egocentric discounting may result from genetic and cultural evolution favouring the default assumption that an individual's own information is a more reliable basis for decision-making for that individual than another individual's information.
The evolutionary models offered a proof-of-concept illustration that hyper-priors may evolve under a variety of contextual features that are almost always at least partially true of advice-taking situations.
Hyper-priors are expectations that are not changed as a function of experience within the scope of a given scenario, and help form the context within which a scenario occurs, similar to a frame of reference in physics.
In advice-taking contexts, a judge's view of their advisor's benevolence may fluctuate over the course of a few back-to-back exchanges, whereas their view of the general benevolence of people as a whole is unlikely to change in a meaningful way in that time.
The latter, therefore, is a hyper-prior because it contributes to the advice-taking behaviour without being altered by the situation.
Hyper-priors are so termed to distinguish them from priors: expectations that are updated following evidence.
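The distinction can be made concrete with a toy numerical sketch (illustrative only; the trust values, delta-rule update, and learning rate are arbitrary assumptions, not quantities from the evolutionary models):

```r
# Toy sketch of the prior/hyper-prior distinction (all values arbitrary).
# Trust in a specific advisor (a prior) is updated after each piece of
# evidence; trust in people in general (a hyper-prior) stays fixed
# within the scope of the scenario.
general_trust <- 0.6             # hyper-prior: untouched by what follows
advisor_trust <- general_trust   # prior: initialised from the hyper-prior

update_trust <- function(trust, advice_was_good, rate = 0.2) {
  # simple delta-rule update towards 1 (good advice) or 0 (bad advice)
  trust + rate * (as.numeric(advice_was_good) - trust)
}

for (outcome in c(TRUE, TRUE, FALSE)) {
  advisor_trust <- update_trust(advisor_trust, outcome)
}
# advisor_trust has moved with experience; general_trust is unchanged
```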
In a chaotic world dominated by complex phenomena emerging from the interaction of agents with sophisticated mental processes, sometimes the best genetic strategy is to hedge your bets by building a phenotype that can respond flexibly to different contexts.
It may well be, therefore, that people not only have hyper-priors concerning the likely value of advice, but also priors that can respond to different contexts and to changes in context.
In this chapter, we explore the flexibility of egocentric discounting in the contexts presented in the previous chapter§\@ref(chapter-advice-taking-context).
We present three experiments that examine how advice-taking changes according to the benevolence and the identifiability of advisors.
We also include a brief discussion of the literature on advisor expertise.
## Benevolence of advisors
The evolutionary models discussed in the previous chapter§\@ref(chapter-advice-taking-context) demonstrated that optimal advice-taking strategies depend in part upon the advice one receives being a genuine effort to help.
Benevolence, the extent to which the interests of the advisor and the judge overlap, is one of the three pillars of the @mayerIntegrativeModelOrganizational1995 model of trust.
Despite this, relatively little investigation has been made into the role of benevolence in trust, as discussed previously§\@ref(context-familiarity).
Advice-taking can be contingent on the properties of the advice, or on the properties of the advisor.
In order to maximise the value of advice while minimising the potential exposure to exploitation, advice-taking should be contingent on a combination of these factors.
Where advice is plausible it should be weighted relatively equally, whether it comes from an advisor who is sometimes misleading or not, but where advice is more implausible it should only be trusted when it comes from an advisor who is highly unlikely to be misleading.
To explore whether people's behaviour matches this pattern, participants were recruited for a series of behavioural experiments in which they were given advice on a date estimation task from advisors who were described as either always helpful or occasionally misleading.
We expected that advisor influence would be higher for advice that participants rated as 'honest' versus advice rated as 'deceptive'.
Likewise, we expected that influence would be higher for advisors who were described as 'always honest', even for advice rated as 'honest'.
In other words, we expect that participants' advice-taking depends upon both the plausibility of the advice and the benevolence of the advisor.
Early versions of the experiments we conducted used a minimal groups paradigm [@rabbieArousalIngroupoutgroupBias1969; @pinterComparisonMinimalGroup2011] in an attempt to induce an in-group/out-group distinction in participants.
We were not able to get this manipulation to produce different perceptions of the advisors, as measured by participants' questionnaire responses, and so we resorted to directly cuing participants about the benevolence of advisors.
The experiments presented below, Experiments 5§\@ref(be-dba) and 6§\@ref(be-dbc), build on previous experiments exploring how to present the manipulation in a way that participants would attend to and remember.
### Experiment 5: benevolence of advisors {#be-dba}
```{r be-dba-load, include = F}
select_experiment(
project = 'datequiz',
function(x) filter(x, study == 'directBenevolence', version == 'v3-0-1')
)
## Version of which the analysed version is a replication
.prevStudy <- new.env()
select_experiment(
'datequiz',
function(x) filter(x, study == 'directBenevolence', version == 'v3-0-0'),
envir = .prevStudy
)
for (E in c(.GlobalEnv, .prevStudy)) {
E$AdvisedTrial <- annotate_responses(E$AdvisedTrial)
E$Trial <- annotate_responses(E$Trial)
}
```
In this experiment, participants were cued as to the benevolence of their advisors.
The advice on each trial came from one of two advisors the participants became familiar with over the course of the experiment.
Participants were asked to rate the advice prior to submitting their final decisions.
We expected that participants would rate advice from an advisor who was more benevolent as more honest, and that they would weigh that advisor's advice more heavily, even where the advice itself was rated the same.
##### Open scholarship practices
This experiment was preregistered at `r unique(datequiz$preregistration)`.
This is a replication of a study of identical design that produced the same results.
The data for both this and the original study can be obtained from the `esmData` R package [@jaquieryOxacclabEsmDataThesis2021].
A snapshot of the state of the code for running the experiment at the time the experiment was run can be obtained from https://github.com/oxacclab/ExploringSocialMetacognition/blob/b4289fea196f71ccf0ba0b2ae8fde12139a16301/ACv2/db.html.
There were two deviations from the preregistered analysis in this experiment.
Several participants used translation software to translate the experiment website.
We could not guarantee that the questions were accurately translated and so these participants were excluded.
Some participants never rated the Sometimes deceptive advisor's advice as 'Honest', and so they were excluded from the t-test comparing advisors.
#### Method {#be-dba-m}
```{r be-dba-m-structure}
tmp <- bind_rows(
mutate(practiceTrial, type = "practice", block = 0),
mutate(practiceAdvisedTrial, type = "advice_practice", block = 1),
mutate(Trial, type = "atn_check") %>% filter(!is.na(block)),
mutate(AdvisedTrial, type = "core") %>% filter(!is.na(block))
) %>%
group_by(pid, type, block) %>%
summarise(n = n(), .groups = "drop") %>%
group_by(type, block) %>%
summarise(across(-pid, mean), .groups = "drop") %>%
arrange(block)
```
##### Procedure {#be-dba-m-procedure}
`r length(unique(AdvisedTrial$pid))` participants each completed `r round(sum(ceiling(tmp$n)))` trials over `r max(tmp$block)` blocks of the continuous version of the Dates task§\@ref(m-p-dates-c).
Participants' markers covered `r AdvisedTrial %>% filter(!is.na(responseMarkerWidth)) %>% pull(responseMarkerWidth) %>% unique()` years, meaning that a marker covered an entire decade inclusive of both endpoint years, e.g. 1965-1975.
When participants received advice, but before they submitted their final decision, they rated the honesty of the advice on a three-point scale (Figure \@ref(fig:be-dba-m-performance)).
```{r be-dba-m-performance, fig.align='center', fig.cap="Advice honesty rating. Participants rated advice on a three-point scale according to whether they thought the advice was deceptive or honest.", out.width="80%", echo=F}
knitr::include_graphics("figures/advice-honesty-rating.png")
```
Participants started with `r sum(tmp$type == "practice")` block of `r tmp %>% filter(type == "practice") %>% pull(n) %>% ceiling()` trials that contained no advice to allow them to familiarise themselves with the task.
All trials in this section included feedback for all participants indicating whether or not the participant's response was correct.
Participants then did `r tmp %>% filter(type == "advice_practice") %>% pull(n) %>% round()` trials with a practice advisor to get used to receiving advice.
They also received feedback on these trials.
They were informed that they would "get advice on the answers you give" and that the feedback they received would "tell you about how well the advisor does, as well as how well you do".
Before starting the main experiment they were told that they would receive advice from multiple advisors and that "advisors might behave in different ways, and it's up to you to decide how useful you think each advisor is, and to use their advice accordingly".
Participants then performed `r sum(tmp$type == "core")` blocks of trials that constituted the main experiment.
In each of these blocks participants had a single advisor for `r tmp %>% filter(block == 2, type == "core") %>% pull(n) %>% round()` trials, plus `r tmp %>% filter(block == 2, type == "atn_check") %>% pull(n) %>% round()` attention check.
No feedback was given on answers in the main experiment blocks.
The two advisors were identical in how they generated advice, but they were labelled differently.
The advisors had different coloured backgrounds (e.g., purple and green), and participants were told that the advisor whose background colour matched the participant's colour "will give you the **best advice that they can** [original emphasis]", while the advisor who did not match the participant's colour "might sometimes try to direct you **away from the correct answer** [original emphasis]".
The advisor whose background matched the participant's colour was labelled as being in 'group one', while the other advisor was labelled as being in 'group two'.
This visual presentation arose from earlier experiments that implemented (unsuccessfully) a minimal groups paradigm.
Colours and the order in which the advisors were encountered were counterbalanced.
##### Advice profiles
Despite differences in labelling, the advisors were identical in terms of how they actually produced advice.
The advisors offered advice by placing an `r AdvisedTrial %>% filter(!is.na(advisor0confidence)) %>% pull(advisor0confidence) %>% unique()`-year wide marker on the timeline.
The marker was placed with its centre on a point sampled from a normal distribution around the correct answer with a standard deviation of `r AdvisedTrial %>% filter(!is.na(advisor0confidenceVariation)) %>% pull(advisor0confidenceVariation) %>% unique()` years in the manner described earlier§\@ref(m-p-dates-c).
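The generation procedure can be sketched as follows (an illustrative reconstruction, not the experiment code, which was JavaScript; the function name is invented, and the parameter values in the call are placeholders rather than the actual marker width and standard deviation reported above):

```r
# Sketch of advice generation: the centre of the advice marker is drawn
# from a normal distribution around the correct answer, and the marker
# extends half its width to either side. Placeholder values, not the
# experiment's actual parameters.
make_advice <- function(correct_year, marker_width, advice_sd) {
  centre <- rnorm(1, mean = correct_year, sd = advice_sd)
  c(left = centre - marker_width / 2, right = centre + marker_width / 2)
}

set.seed(1)
make_advice(correct_year = 1969, marker_width = 11, advice_sd = 5)
```

Because both advisors draw from the same distribution, any difference between them is sampling noise, which is what the advisor-performance checks below verify.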
#### Results
###### Exclusions
```{r be-dba-r-exclusions}
nMaxOutliers <- 1
maxTrialRT <- 60000
qqLabelWhitelist <- c( ## Advice questionnaire responses must be one of:
'Deceptive',
'Possibly Deceptive',
'Honest',
NA
)
minChangeRate <- .1 ## some advice taken on 10%+ of trials
for (E in c(.GlobalEnv, .prevStudy)) {
E$exclusions <- E$AdvisedTrial %>%
nest(d = -pid) %>%
mutate(
atn_check = map_lgl(
pid,
~ !all(E$Trial$responseCorrect[E$Trial$pid == .]) |
any(E$Trial$responseMarkerWidth[E$Trial$pid == .] != 11)
),
`Attention check` = if_else(is.na(atn_check), T, atn_check),
`Multiple attempts` = map_lgl(
pid,
~ {
h <- unique(E$AdvisedTrial$pidHash[E$AdvisedTrial$pid == .])
p <- unique(E$AdvisedTrial$pid[E$AdvisedTrial$pidHash == h])
length(p) > 1
}
),
`Missing advice rating` = map_lgl(
d,
~ any(is.na(.$advisor0questionnaireHonestyLabel))
),
`Odd advice rating labels` = map_lgl(
d,
~ !all(.$advisor0questionnaireHonestyLabel %in% qqLabelWhitelist)
),
`Not enough changes` = map_lgl(
d,
~ {
diff <- mutate(
.,
diff = responseEstimateLeft != responseEstimateLeftFinal,
diff = if_else(is.na(diff), F, diff)
) %>%
pull(diff) %>%
mean()
diff < minChangeRate
}
)
) %>%
select(-atn_check, -d)
do_exclusions(E$exclusions, envir = E)
E$nMaxTrials <- E$AdvisedTrial %>%
group_by(pid) %>%
summarise(n = n(), .groups = 'drop') %>%
pull(n) %>%
max()
E$AdvisedTrial <- E$AdvisedTrial %>% filter(timeEnd < maxTrialRT)
E$exclusions <- E$exclusions %>%
mutate(
`Too many outlying trials` =
map_lgl(pid, ~ E$AdvisedTrial %>%
filter(pid == .x) %>%
summarise(n = n(), .groups = 'drop') %>%
pull(n) < (E$nMaxTrials - nMaxOutliers)),
# if participants were already dropped don't mark them here
`Too many outlying trials` = if_else(!(pid %in% E$AdvisedTrial$pid),
F, `Too many outlying trials`)
)
do_exclusions(E$exclusions, envir = E, backup = F)
}
```
Participants (total *n* = `r nrow(exclusions)`) could be excluded for a number of reasons: failing attention checks, having fewer than `r nMaxTrials - nMaxOutliers` trials which took less than `r maxTrialRT / 1000`s to complete, providing final decisions which were the same as the initial estimate on more than `r round((1 - minChangeRate) * 100)`% of trials, or using non-English labels for the honesty questionnaire.
The latter exclusion was added after data were collected because it was not anticipated that participants would use translation software in the task.
The numbers of participants who failed the various checks are detailed in Table \@ref(tab:be-dba-r-exclusion).
The final participant list consists of `r length(unique(AdvisedTrial$pid))` participants who completed an average of `r num2str(mean(aggregate(advisor0 ~ pid, AdvisedTrial, length)$advisor0))` trials each.
```{r be-dba-r-exclusion}
exclusions$`Total excluded` <- exclusions %>% select(-pid) %>% apply(1, any)
n <- ncol(exclusions)
exclusions %>%
summarise(across(where(is.logical), sum)) %>%
mutate(`Total remaining` = length(unique(AdvisedTrial$pid))) %>%
pivot_longer(everything(), names_to = "Reason", values_to = "Participants excluded") %>%
kable(caption = "Participant exclusions for Experiment 5") %>%
row_spec((n - 1):n, bold = T)
```
###### Task performance
```{r be-dba-r-split}
for (E in c(.GlobalEnv, .prevStudy)) {
fb <- E$AdvisedTrial %>%
group_by(pid) %>%
summarise(Feedback = !all(is.na(timeFeedbackOn))) %>%
mutate(Feedback = if_else(Feedback == 1, "Feedback", "No feedback"))
tmp <- E$AdvisedTrial %>%
left_join(fb, by = "pid") %>%
filter(!is.na(advisor0idDescription)) %>%
mutate(
Advisor = advisor_description_name(advisor0idDescription),
Advisor = factor(Advisor),
responseCentre = responseEstimateLeft + (responseMarkerWidth / 2),
Rating = factor(
advisor0questionnaireHonestyLabel
),
Rating = fct_relevel(
Rating,
"Deceptive",
"Possibly Deceptive",
"Honest"
)
) %>%
order_factors()
E$Familiarisation <- tmp
}
```
```{r be-dba-r-performance, fig.cap="Task performance for Experiment 5. Response error. Faint lines show individual participant mean error (the absolute difference between the participant's response and the correct answer), for which the violin and box plots show the distributions. The dashed line indicates chance performance. Dotted violin outlines show the distribution of participant means in the original study of which this is a replication. Lower values represent better performance; the theoretical limit for error is around 100."}
gg <- list()
## Task performance figure
dw <- .2
## Accuracy
for (E in c(.GlobalEnv, .prevStudy))
E$tmp <- E$Familiarisation %>%
group_by(Advisor, pid) %>%
summarise(
`Initial estimate` = mean(responseError),
`Final decision` = mean(responseErrorFinal),
.groups = 'drop'
) %>%
pivot_longer(cols = c(`Initial estimate`, `Final decision`),
names_to = 'Response', values_to = 'Error') %>%
mutate(advisor = Advisor,
Advisor = factor(paste0(Advisor, '\n'))) %>%
mutate(Response = factor(Response)) %>%
order_factors()
bf <- tmp %>%
nest(d = -Advisor) %>%
mutate(d = map(d, as.data.frame),
bf = map(d, ~ ttestBF(x = .$Error[.$Response == 'Initial estimate'],
y = .$Error[.$Response == 'Final decision'],
data = ., paired = T)),
bf = map_chr(bf, ~ .@bayesFactor %>% .$bf %>% exp() %>% bf2str())) %>%
select(-d)
gg$acc <- tmp %>%
ggplot(aes(x = Response, y = Error, colour = advisor)) +
scale_y_continuous(limits = c(0, 50), expand = c(0, 0)) +
scale_x_discrete() +
scale_fill_advisor(name = 'Advisor', aesthetics = c('fill', 'colour')) +
coord_cartesian(clip = F) +
geom_line(aes(group = pid), alpha = .25) +
geom_split_violin(aes(x = nudge(Response, dw),
group = Response, fill = advisor),
width = .9, colour = NA) +
geom_split_violin(aes(x = nudge(Response, dw), group = Response),
fill = NA, width = .9, colour = 'black', linetype = 'dashed',
data = .prevStudy$tmp) +
geom_boxplot(aes(x = nudge(Response, dw), group = Response),
outlier.shape = NA, size = 1, width = dw/2, colour = 'black') +
geom_segment(x = 1 - dw, xend = 2 + dw, y = 50, yend = 50,
colour = 'black') +
geom_label(aes(label = paste0('BF = ', bf)),
x = 1.5, y = 50, colour = 'black', data = bf) +
facet_wrap(~Advisor) +
labs(y = 'Mean error') +
broken_axis_top
## Patchwork plots together
gg$acc
```
```{r be-dba-r-aov}
## Accuracy
tmp <- Familiarisation %>%
mutate(Initial = responseError, Final = responseErrorFinal) %>%
pivot_longer(c(Initial, Final), names_to = "Decision", values_to = "Error") %>%
select(pid, Error, Decision, Advisor) %>%
mutate(across(-Error, factor)) %>%
group_by(pid, Decision, Advisor) %>%
summarise(Error = mean(Error), .groups = "drop")
aov_acc <- tmp %>%
ezANOVA(
dv = Error,
wid = pid,
within = c(Decision, Advisor)
)
mm_acc <- tmp %>%
mutate(Decision = fct_rev(Decision)) %>%
marginalMeans(Error, pid, "Reduction")
# message(glue("Interaction calculation: {mm_acc$.interactionExpression}"))
s_acc <- summariseANOVA(aov_acc$ANOVA, mm_acc)
```
Participants performed as expected, decreasing the error between the midpoint of their answer and the true answer from initial estimate to final decision (`r s_acc$s[1]`), suggesting that they incorporated the advice, which was indicative of the correct answer (Figure \@ref(fig:be-dba-r-performance)).
The participants had less error on decisions made with the Always honest advisor than the Sometimes deceptive advisor (`r s_acc$s[2]`), although surprisingly there was no statistically significant interaction to indicate that this was due to greater reduction in error scores over time for that advisor (`r s_acc$s[3]`).
Participants only had one marker they could place, and separate confidence judgements were not asked for, so we cannot directly assess confidence in these data.
###### Advisor performance
```{r be-dba-r-advisor-performance}
tmp <- Familiarisation %>%
group_by(pid, Advisor) %>%
summarise(
Error = mean(advisor0Error),
Distance = mean(abs(advisor0adviceCentre - responseCentre)),
.groups = "drop"
)
tt <- tmp %>%
pivot_longer(c(Error, Distance), names_to = "Dimension") %>%
pivot_wider(names_from = Advisor) %>%
nest(d = -Dimension) %>%
mutate(
s = map_chr(
d,
~ md.ttestBF(
.$`Always honest`,
.$`Sometimes deceptive`,
labels = c("M~AlwaysHonest~", "M~SometimesDeceptive~"),
paired = T
)
)
)
```
The advice given by the two advisors was generated stochastically from the same distribution, so any differences between them reflect sampling variation alone.
This was demonstrably the case for advice error (the absolute distance between the centre of the advice marker and the correct year; `r tt$s[1]`).
The Bayes factor for agreement -- the absolute distance between the centre of the advice marker and the centre of the participant's initial estimate marker -- was essentially at the threshold (`r tt$s[2]`).
###### Advice ratings
```{r be-dba-r-advice-ratings, fig.cap="Advice rating in Experiment 5. The polar plots show the number of times each participant gave the given rating to the advice of each advisor. The colour density illustrates the number of participants who gave at least that many ratings to the advice of an advisor."}
axes <- tidyr::crossing(
Advisor = c("Always honest", "Sometimes deceptive"),
x = c(1.5, 2.5, 3.5),
y = c(3, 6)
)
suppressWarnings(
Familiarisation %>%
mutate(Rating = fct_rev(Rating)) %>%
ggplot(aes(x = Rating, fill = Advisor)) +
# data
geom_histogram(
aes(group = pid, colour = Advisor),
stat = "count",
position = "identity",
alpha = .1
) +
# axis
geom_vline(aes(xintercept = x), data = axes) +
geom_label(
aes(x = x, y = y, label = y),
label.size = 0,
label.r = unit(.5, "lines"),
fill = "white",
data = axes
) +
coord_polar() +
facet_wrap(~Advisor) +
scale_fill_advisor(aesthetics = c('fill', 'colour')) +
theme(
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.text.y.left = element_blank(),
axis.title = element_blank()
)
)
## Stats
tmp <- table(Familiarisation$Advisor, Familiarisation$Rating)
chi <- chisq.test(tmp)
bf <- contingencyTableBF(
tmp,
sampleType = "indepMulti",
fixedMargin = "rows"
)
r <- apply(tmp, 2, \(x) x[1]/x[2])
s_r <- sapply(1:length(r), \(i) paste(names(r)[i], num2str(r[i]))) %>%
paste(collapse = ", ")
s <- glue("({chi$parameter}) = {num2str(chi$statistic)}, _p_{p2str(chi$p.value)}, BF~H1:H0~ = {bf2str(exp(bf@bayesFactor$bf))}; AlwaysHonest:SometimesDeceptive ratio: {s_r}")
```
The advisors' advice was rated differently by the participants, as shown in Figure \@ref(fig:be-dba-r-advice-ratings).
The difference was statistically significant, indicating that the patterns of ratings differed depending on the advisor giving the advice ($\chi^2$`r s`).
This indicates that participants understood the task and that the manipulation worked as intended, with more suspicion applied to the advice from the Sometimes deceptive advisor.
###### \OpenScience{prereg} Effect of advice
```{r be-dba-r-advice, results='asis'}
## Actual test
x <- Familiarisation %>%
select(pid, advisor0WOA, Rating) %>%
# filter(Rating %in% c("Deceptive", "Honest")) %>%
group_by(pid, Rating) %>%
summarise(advisor0WOA = mean(advisor0WOA), .groups = "drop") %>%
pivot_wider(names_from = Rating, values_from = advisor0WOA) %>%
drop_na(-pid) %>%
pivot_longer(-pid, names_to = "Rating", values_to = "WOA") %>%
mutate(across(-WOA, factor)) %>%
mutate(
Rating = fct_relevel(
Rating,
"Deceptive",
"Possibly Deceptive",
"Honest"
)
)
aov_advice <- ezANOVA(x, WOA, pid, Rating)
mm_advice <- marginalMeans(x, WOA, pid)
s_advice <- summariseANOVA(aov_advice, mm_advice)
```
Participants chose their own ratings for the advice, and it was common for participants not to use all ratings for all advisors (e.g. many participants never rated advice from the Always honest advisor as Deceptive).
This meant that the statistical tests preregistered for this hypothesis were broken down into separate contrasts of advice and advisor.
A more complete test, such as a 2x3 ANOVA, would have suffered greatly from missing values.
To explore the effect of advice, a one-way (1x3) repeated-measures ANOVA was run on Weight on Advice across the three ratings.
In all, `r length(unique(x$pid))`/`r length(unique(Familiarisation$pid))` (`r num2str(length(unique(x$pid)) / length(unique(Familiarisation$pid)) * 100)`%) participants had at least one trial rated with each of the three ratings.
The Weight on Advice differed according to the rating assigned to the advice (`r s_advice$s[1]`; Mauchly's test for sphericity W = `r prop2str(aov_advice[["Mauchly's Test for Sphericity"]][2][[1]])`, _p_`r p2str(aov_advice[["Mauchly's Test for Sphericity"]][3][[1]])`).
As expected, participants were more influenced by advice they rated as Honest compared to advice they rated as Deceptive.
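Weight on Advice follows the conventional definition in the advice-taking literature: the proportion of the distance between the initial estimate and the advice that the final decision covered. A minimal sketch of that definition (the function and argument names are illustrative, not the analysis code):

```r
# Weight on Advice (WOA): 0 = advice ignored, 1 = advice fully adopted,
# intermediate values = partial compromise. Illustrative names only.
woa <- function(initial, advice, final) {
  (final - initial) / (advice - initial)
}

woa(initial = 1960, advice = 1970, final = 1965)  # 0.5: moved halfway
```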
###### \OpenScience{prereg} Effect of advisor {#be-dba-r-h-advisor}
```{r be-dba-r-advisor, fig.cap="Weight on Advice in Experiment 5. Only trials where the participant rated the advice as Honest are included. The shaded areas and boxplots indicate the distribution of individual participants' mean Weight on Advice; individual participant means are shown with lines in the centre of the graph. The dashed outline shows the distribution of participant means in the original study of which this is a replication."}
for (E in c(.GlobalEnv, .prevStudy)) {
E$tmp <- E$Familiarisation %>%
filter(Rating == "Honest") %>%
group_by(pid, Advisor) %>%
summarise(WOA = mean(advisor0WOA), .groups = "drop") %>%
pivot_wider(names_from = Advisor, values_from = WOA)
# Stats
E$.T <- E$tmp %>%
drop_na(-pid) %$%
md.ttest(
`Always honest`,
`Sometimes deceptive`,
paired = T,
labels = c("M~AlwaysHonest~", "M~SometimesDeceptive~")
)
# Graph stats
E$bf <- E$tmp %>%
drop_na(-pid) %$%
ttestBF(
`Always honest`,
`Sometimes deceptive`,
paired = T
)
E$lost <- nrow(E$tmp) - nrow(drop_na(E$tmp, -pid))
E$tmp <- E$tmp %>%
pivot_longer(-pid, names_to = "Advisor", values_to = "WOA") %>%
mutate(Feedback = "No feedback")
}
bf <- bf2str(exp(bf@bayesFactor$bf))
## Graph
dw <- .1
tmp %>%
drop_na(everything()) %>%
ggplot(aes(x = Advisor, y = WOA, fill = Feedback, colour = Feedback)) +
geom_line(aes(group = pid), alpha = .25) +
geom_split_violin(aes(x = nudge(Advisor, dw),
group = Advisor), width = .9,
colour = NA) +
geom_split_violin(aes(x = nudge(Advisor, dw),
group = Advisor,
fill = Feedback),
colour = 'black', alpha = 0, width = .9,
linetype = 'dashed', size = 1,
data = .prevStudy$tmp) +
geom_boxplot(outlier.shape = NA, size = 1, width = dw/2,
aes(x = nudge(Advisor, dw),
group = Advisor),
colour = 'black') +
geom_segment(x = 1, xend = 2, y = 1, yend = 1, colour = 'black') +
geom_label(y = 1, x = 1.5, colour = 'black', fill = 'white',
label = paste0('BF = ', bf)) +
scale_y_continuous(limits = c(0, 1),
breaks = seq(0, 1, length.out = 6),
expand = expansion(add = c(0, 0.1))) +
scale_fill_feedback(aesthetics = c('fill', 'colour')) +
labs(x = 'Advisor advice profile', y = 'Weight of "Honest" advice')
```
To distinguish the effects of the advisor from the effects of the advice, we compared Weight on Advice for only those trials where the participant rated the advice as Honest.
This approach has the limitation that `r lost` (`r num2str(lost / length(unique(Familiarisation$pid)) * 100)`%) participants had to be dropped due to never rating advice from the Sometimes deceptive advisor as Honest.
Comparing the two (Figure \@ref(fig:be-dba-r-advisor)) showed that participants placed more weight on the Honest advice from the Always honest advisor (`r .T`).
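The exploratory analyses in the next section use 'influence', the signed distance the final decision moved towards the advice, rather than the proportional Weight on Advice. A minimal sketch of that measure (function and argument names are illustrative, not the analysis code):

```r
# Influence of advice: the distance the final decision moved in the
# direction of the advice. When influence equals the initial
# estimate-to-advice distance, the advice was adopted wholesale;
# negative values mean movement away from the advice.
# Illustrative names only.
influence <- function(initial, advice, final) {
  (final - initial) * sign(advice - initial)
}

influence(initial = 1960, advice = 1970, final = 1970)  # 10: fully adopted
```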
##### Exploratory analyses {#be-dba-r-h-explore}
```{r be-dba-r-overall, fig.cap="Advice response in Experiment 5. Each point is a single trial, coloured according to the rating given to the advice by the participant. Coloured lines give the best fit linear regression for dots of that colour. The horizontal axis gives the distance between the centre of the participant's initial estimate marker and the centre of the advisor's advice marker, and the vertical axis gives the distance the participant's final decision had moved in the direction of the advisor's advice. Points that lie on the dashed line indicate that the participant moved their marker all the way to the advisor's marker, thus wholly adopting the advisory estimate as their final decision."}
tmp <- Familiarisation %>%
mutate(
Distance = abs(responseCentre - advisor0adviceCentre),
pid = factor(pid)
)
tmp %>%
ggplot(aes(x = Distance, y = advisor0Influence, colour = Rating)) +
geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
geom_smooth(method = "lm", formula = y ~ x, se = F) +
geom_point(alpha = 1/3) +
coord_fixed() +
scale_x_continuous(limits = c(min(tmp$advisor0Influence), NA)) +
scale_y_continuous(limits = c(min(tmp$advisor0Influence), NA)) +
facet_wrap(~Advisor) +
labs(y = "Influence of advice") +
broken_axis_top +
broken_axis_right
m <- tmp %>%
mutate(Rating = fct_rev(Rating)) %$%
lmer(advisor0Influence ~ Advisor * Distance * Rating + (1|pid), data = .)
x <- m %>% summary()
coefs <- cbind(
x$coefficients,
suppressMessages(confint(m))[-(1:2),]
) %>%
as_tibble(rownames = "Effect") %>%
rowid_to_column("beta") %>%
mutate(
beta = beta - 1,
across(c(Estimate:last_col(), -`Pr(>|t|)`), num2str),
`Pr(>|t|)` = p2str(`Pr(>|t|)`),
s = glue(
"{Estimate} [{`2.5 %`}, {`97.5 %`}], df = {df}, _p_{`Pr(>|t|)`}"
)
)
BF <- tmp %>%
mutate(
rpd = Rating == "Possibly Deceptive",
rd = Rating == "Deceptive",
A = Advisor == "Sometimes deceptive",
D = Distance,
pid = factor(pid)
) %>%
as.data.frame %$%
generalTestBF(
advisor0Influence ~ A + D + rpd + rd + A:D + A:rpd + A:rd + D:rpd + D:rd + A:D:rpd + A:D:rd + pid,
data = .,
whichRandom = "pid",
whichModels = "top",
progress = F
)
ef <- BF@bayesFactor %>%
as_tibble(rownames = "Effect") %>%
mutate(
Effect = map_chr(
Effect,
~ {
ef <- BF@denominator@shortName %>% str_split(" ?\\+ ") %>% .[[1]]
E <- str_split(.x, " ?\\+ ")
out <- ef[!(ef %in% E[[1]])]
if (nchar(out)) out else ""
}
),
bf = 1/exp(bf)
) %>%
mutate(
Effect = str_replace(Effect, "A", "AdvisorSometimes deceptive"),
Effect = str_replace(Effect, "D", "Distance"),
Effect = str_replace(Effect, "rd", "RatingDeceptive"),
Effect = str_replace(Effect, "rpd", "RatingPossibly Deceptive")
)
coefs <- left_join(
coefs,
select(ef, Effect, bf, error),
by = "Effect"
) %>%
mutate(
bf = if_else(is.na(bf), 1, bf),
s = glue("{s}, BF~H1:H0~ = {bf2str(bf)}")
)
```
Figure \@ref(fig:be-dba-r-overall) shows the overall pattern of advice-taking in the experiment.
The difference in the number of dots of each colour reflects the difference in participants' ratings of advice between advisors.
The lines are best-fits to the dots, and illustrate the relationship between the distance of the advice from the participant's initial estimate and the amount the participant moved the marker towards the advisor's advice for their final decision.
To quantify the relationships visible in the figure, a linear mixed model predicted trial-level influence from the advisor, the distance of the advice, and the rating (dummy-coded).
Random intercepts were included for participant.
An analogous Bayesian model was fitted to produce Bayes factors.
Overall, the distance between the initial estimate and the advice was hugely influential, with an increase in distance leading to an approximately equal increase in influence ($\beta$ = `r coefs$s[3]`).
This indicates that the advice rated Honest from the Always honest advisor was generally followed, as is shown clearly in Figure \@ref(fig:be-dba-r-overall).
This pattern indicates that the participants see the difference between their own estimates and the advisor's as indicating that they themselves are mistaken.
Conversely, the relationship between distance and influence was decreased when the advice came from the Sometimes deceptive advisor, even for advice rated as Honest ($\beta$ = `r coefs$s[6]`).
Advice rated as Deceptive from the Sometimes deceptive advisor was substantially less influential ($\beta$ = `r coefs$s[8]`), and this was complemented by the relationship between distance and influence being substantially reduced in this case ($\beta$ = `r coefs$s[12]`).
There are other patterns in Figure \@ref(fig:be-dba-r-overall) that do not come out in the statistics, possibly due to the very low trial and participant numbers, but are suggestive of a pattern we might expect.
The influence of advice rated Honest follows a similar pattern for the two advisors: as the advice gets further away it retains its influence, and participants are willing to move further to match it.
This is not tested directly in the statistics, but there is no significant main effect of advisor alone ($\beta$ = `r coefs$s[2]`), the presence of which would have indicated that Honest-rated advice differed between advisors.
The Bayesian analogue of the test indicated that there was not enough evidence to decide whether Advisor alone was an important factor.
The preregistered test of this relationship above§\@ref(be-dba-r-h-advisor), however, did indicate that there was a difference when tested directly.
Similarly, there may be a difference in the way the Possibly Deceptive advice was received from the Sometimes deceptive advisor as compared to the Always honest advisor (`r coefs$s[7]`), or in how the influence of this advice related to the distance between the advice and initial estimate markers (`r coefs$s[11]`).
The statistics did not support this, and the Bayesian statistics suggested that there was sufficient evidence against either effect being present.
#### Discussion {#be-dba-d}
This experiment provided a first test of the sensitivity of participants' egocentric discounting behaviour to the context of advice, in particular the likely benevolence (honesty) of the advisor and of individual pieces of advice.
Both the source and the plausibility of the advice matter, and in this task they interacted such that discounting primarily occurred where advice was offered that differed substantially from the participant's initial estimate and came from the advisor the participants were told might mislead them.
Participants adapted to the context of the advice, both in how ready they were to categorise the advice as dishonest and in how much they were influenced by it.
As advice got more distant from the initial estimate (i.e., became less plausible on our measure), participants had to decide whether the discrepancy was due to their own error or their advisor's.
Where they believed the advisor was trustworthy, they were more likely to ascribe the error to themselves, rating the advice as honest and adopting it for their final decision.
Where they believed the advisor was not trustworthy, they were more likely to label the advice as misleading and to retain their initial estimate as their final decision.
Going beyond the obvious, we also saw that participants were more sceptical of advice from the less trustworthy advisor _even when they believed the advice was well-meaning_.
These observations complement the evolutionary simulations and the benevolence component of the three-factor model of trust described by @mayerIntegrativeModelOrganizational1995.
Nevertheless, we have demonstrated sensitivity to context, but not universal discounting due to hyper-priors.
It remains a matter of speculation that people exercise epistemic hygiene by ensuring that information comes from trusted sources before integrating it, and that no source is as trusted as one's own mind.
This result is in keeping with evidence from experiments where initial estimates are labelled as advice (and vice-versa).
@sollJudgmentalAggregationStrategies2011 collected initial estimates from participants and then presented those estimates back to participants along with advice so participants could provide final decisions in a classic Judge-Advisor System.
Unbeknownst to the participants, for some of the questions, the participant's initial estimate was labelled as the advice, and the advice was labelled as the participant's initial estimate.
For those questions that were switched in this manner, the participants appeared to treat the advice as if it were their own initial decision -- placing more weight on the advice than their actual initial decision.
If egocentric discounting of advice were due to judges having better access to the reasons for their own estimates, for example, their own initial estimates ought to appear most reasonable and therefore be more influential, regardless of whether they were labelled as "initial estimate" or "advice".
On the other hand, if people use a heuristic that their own opinion is more trustworthy because it is _their own_ opinion, they will rely most on whichever figure is presented as their own initial estimate, as indeed they did.
##### Limitations
Even compared to the other experiments in this thesis, these experiments had a low participant count.
As with other Dates task studies, data collection stopped when the Bayes Factor for the main experimental hypothesis reached one of the two thresholds (BF > 3 or BF < 1/3).
This, combined with the low trial count for each participant, meant that there were interesting follow-up questions that the data were unable to address.
In keeping with the open science approach, we suggest that future investigations exploring those questions take them as preregistered hypotheses.
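The stopping rule described above can be sketched as follows (a display-only illustration using simulated scores and a generic one-sample Bayesian t-test from the BayesFactor package, not the actual recruitment code):

```r
# Sequential Bayes-factor stopping rule: collect participants in batches and
# stop once the evidence is conclusive in either direction.
library(BayesFactor)

set.seed(1)
bf_threshold <- 3   # stop once BF > 3 or BF < 1/3
batch_size <- 10
scores <- numeric(0)

repeat {
  # collect a new batch of (simulated) participants
  scores <- c(scores, rnorm(batch_size, mean = 0.3))
  bf <- extractBF(ttestBF(x = scores))$bf  # evidence for a nonzero mean
  if (bf > bf_threshold || bf < 1 / bf_threshold) break
}
```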
The Dates task is one that participants find challenging, leading to generally high levels of advice-taking in the absence of other effects [@ginoEffectsTaskDifficulty2007; @yonahTheyDonKnow2021].
This means that the Honest advisor seemed to be entirely trusted.
As discussed in the introduction to this section§\@ref(chapter-context), however, even greatly trusted advisors' advice is usually subject to some egocentric discounting.
### Experiment 6: benevolence of the advisor population {#be-dbc}
```{r be-dbc-load, include = F}
source('scripts_and_filters/general_setup.R')
select_experiment(
project = 'datequiz',
function(x) filter(x, study == 'directBenevolenceContexts', version == 'v0-0-1')
)
AdvisedTrial <- AdvisedTrial %>%
filter(studyId == "directBenevolenceContexts") %>%
annotate_responses()
Trial <- annotate_responses(Trial)
```
Experiment 5§\@ref(be-dba) showed that participants responded appropriately to benevolent versus less benevolent advisors.
In this experiment, participants' advisors are no longer the same individuals throughout the experiment but are members of two different groups, a benevolent group and a less benevolent group.
From the participant's perspective, one group's members are all benevolent, while some of the other group's members are less benevolent.
Encountering members of a group of advisors, as opposed to learning about a single advisor, is a closer representation of the situation faced by the agents in scenario 1§\@ref(models-scenario-1) of the evolutionary simulations.
Once again, participants rate the advice before entering their final decision.
We expect that advice from the less benevolent group will be less likely to be rated as honest, and that it will be weighted less heavily in final decision-making.
##### Open scholarship practices
This experiment was preregistered at `r unique(datequiz$preregistration)`.
The experiment data are available in the `esmData` package for R [@jaquieryOxacclabEsmDataThesis2021].
A snapshot of the state of the code for running the experiment at the time the experiment was run can be obtained from https://github.com/oxacclab/ExploringSocialMetacognition/blob/1ba333b91366c63a8ab1aed889dd87ea9295a01d/ACv2/dbc.html.
In addition to the exclusion criteria listed in the preregistration, we excluded participants who used translation software when visiting the experiment webpage.
#### Method {#be-dbc-m}
```{r be-dbc-m-structure}
tmp <- bind_rows(
mutate(practiceTrial, type = "practice", block = 0),
mutate(practiceAdvisedTrial, type = "advice_practice", block = 1),
mutate(Trial, type = "atn_check") %>% filter(!is.na(block)),
mutate(AdvisedTrial, type = "core") %>% filter(!is.na(block))
) %>%
group_by(pid, type, block) %>%
summarise(n = n(), .groups = "drop") %>%
group_by(type, block) %>%
summarise(across(-pid, mean), .groups = "drop") %>%
arrange(block)
```
##### Procedure
The procedure was the same as the procedure for Experiment 5§\@ref(be-dba-m-procedure).
Once again, the two advisor groups were identical in how they generated advice, but they were labelled differently.
Advisors had different names and avatar images on every trial.
The advisors' background colours indicated which group they were in, with the benevolent advisors' background matching the participant's own.
Participants were told at the start of each block which context they were in.
Before the block with benevolent advisors they were told that the advisors "will all try their best to help you".
Before the block with less benevolent advisors they were told that "some of the advisors may sometimes try to mislead you".
Colours and the order in which the advisor groups were encountered were counterbalanced.
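The counterbalancing can be thought of as rotating participants through the four colour-by-order combinations, as in this display-only sketch (the colour names and helper are hypothetical; only the structure is illustrated):

```r
# Hypothetical sketch of counterbalanced assignment: each participant receives
# one of the four combinations of background colour and group order in turn.
conditions <- expand.grid(
  benevolent_colour = c("blue", "orange"),
  first_group = c("benevolent", "less benevolent")
)
assign_condition <- function(participant_number) {
  conditions[((participant_number - 1) %% nrow(conditions)) + 1, ]
}
assign_condition(1)
```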
##### Advice profiles
Despite differences in labelling, the advisors were identical in terms of how they actually produced advice.
The advisors offered advice by placing an `r AdvisedTrial %>% filter(!is.na(advisor0confidence)) %>% pull(advisor0confidence) %>% unique()`-year wide marker on the timeline.
The marker was placed with its centre on a point sampled from a roughly normal distribution around the correct answer with a standard deviation of `r AdvisedTrial %>% filter(!is.na(advisor0confidenceVariation)) %>% pull(advisor0confidenceVariation) %>% unique()` years.
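The generative process can be sketched as follows (a display-only illustration; the marker width and standard deviation here are hypothetical placeholders, with the actual values reported inline above, and the truncation of advice to the timeline bounds is omitted):

```r
# Sketch of the advisors' advice-generation process: sample a marker centre
# from a normal distribution around the correct answer, then return the
# marker's left and right edges.
make_advice <- function(correct_answer, marker_width = 11, sd_years = 8) {
  centre <- round(rnorm(1, mean = correct_answer, sd = sd_years))
  c(left = centre - floor(marker_width / 2),
    right = centre + floor(marker_width / 2))
}
set.seed(20)
make_advice(correct_answer = 1969)
```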
#### Results
###### Exclusions
```{r be-dbc-r-exclusions}
nMaxOutliers <- 1
maxTrialRT <- 60000
qqLabelWhitelist <- c( ## Advice questionnaire responses must be one of:
'Deceptive',
'Possibly Deceptive',
'Honest',
NA
)
minChangeRate <- .1 ## some advice taken on 10%+ of trials
exclusions <- AdvisedTrial %>%
nest(d = -pid) %>%
mutate(
atn_check = map_lgl(
pid,
~ !all(Trial$responseCorrect[Trial$pid == .]) |
any(Trial$responseMarkerWidth[Trial$pid == .] != 11)
),
`Attention check` = if_else(is.na(atn_check), T, atn_check),
`Multiple attempts` = map_lgl(
pid,
~ {
h <- unique(AdvisedTrial$pidHash[AdvisedTrial$pid == .])
p <- unique(AdvisedTrial$pid[AdvisedTrial$pidHash == h])
length(p) > 1
}
),
`Missing advice rating` = map_lgl(
d,
~ any(is.na(.$advisor0questionnaireHonestyLabel))
),
`Odd advice rating labels` = map_lgl(
d,
~ !all(.$advisor0questionnaireHonestyLabel %in% qqLabelWhitelist)
),
`Not enough changes` = map_lgl(
d,
~ {
diff <- mutate(
.,
diff = responseEstimateLeft != responseEstimateLeftFinal,
diff = if_else(is.na(diff), F, diff)
) %>%
pull(diff) %>%
mean()
diff < minChangeRate
}
)
) %>%
select(-atn_check, -d)
do_exclusions(exclusions)
nMaxTrials <- AdvisedTrial %>%
group_by(pid) %>%
summarise(n = n(), .groups = 'drop') %>%
pull(n) %>%
max()
AdvisedTrial <- AdvisedTrial %>% filter(timeEnd < maxTrialRT)
exclusions <- exclusions %>%
mutate(
`Too many outlying trials` =
map_lgl(pid, ~ AdvisedTrial %>%
filter(pid == .x) %>%
summarise(n = n(), .groups = 'drop') %>%
pull(n) < (nMaxTrials - nMaxOutliers)),
# if participants were already dropped don't mark them here
`Too many outlying trials` = if_else(!(pid %in% AdvisedTrial$pid),
F, `Too many outlying trials`)
)
do_exclusions(exclusions, backup = F)
```
Participants (total *n* = `r nrow(exclusions)`) could be excluded for a number of reasons: failing attention checks, having fewer than `r nMaxTrials - nMaxOutliers` trials which took less than `r maxTrialRT / 1000`s to complete, providing final decisions which were the same as the initial estimate on more than `r round((1 - minChangeRate) * 100)`% of trials, or using non-English labels for the honesty questionnaire.
The latter exclusion was added after data were collected because it was not anticipated that participants would use translation software in the task.
The numbers of participants who failed the various checks are detailed in Table \@ref(tab:be-dbc-r-exclusion).
The number of participants excluded was quite high.
In part, this was due to an unexpectedly high number of participants completing the experiment using translation software.
It may also have been because the study was run on a weekend, whereas most of the other studies were run during the working week; participants using the recruitment platform at the weekend may be less well practised at taking experiments than those using it during the week.
The final participant list consists of `r length(unique(AdvisedTrial$pid))` participants who completed an average of `r num2str(mean(aggregate(advisor0 ~ pid, AdvisedTrial, length)$advisor0))` trials each.
```{r be-dbc-r-exclusion}
exclusions$`Total excluded` <- exclusions %>% select(-pid) %>% apply(1, any)
n <- ncol(exclusions)
exclusions %>%
summarise(across(where(is.logical), sum)) %>%
mutate(`Total remaining` = length(unique(AdvisedTrial$pid))) %>%
pivot_longer(everything(), names_to = "Reason", values_to = "Participants excluded") %>%
kable(caption = "Participant exclusions for Experiment 6") %>%
row_spec((n - 1):n, bold = T)
```
###### Task performance
```{r be-dbc-r-split}
fb <- AdvisedTrial %>%
group_by(pid) %>%
summarise(Feedback = !all(is.na(timeFeedbackOn))) %>%
  mutate(Feedback = if_else(Feedback, "Feedback", "No feedback"))
tmp <- AdvisedTrial %>%
left_join(fb, by = "pid") %>%
filter(!is.na(advisor0idDescription)) %>%
mutate(
Advisor = advisor_description_name(advisor0idDescription),
Advisor = factor(Advisor),
responseCentre = responseEstimateLeft + (responseMarkerWidth / 2),
Rating = factor(
advisor0questionnaireHonestyLabel
),
Rating = fct_relevel(
Rating,
"Deceptive",
"Possibly Deceptive",
"Honest"
)
) %>%
order_factors()
Familiarisation <- tmp
```
```{r be-dbc-r-performance, fig.cap="Task performance for Experiment 6. A: Response error. Faint lines show individual participant mean error (the absolute difference between the participant's response and the correct answer), for which the violin and box plots show the distributions. The dashed line indicates chance performance. Dotted violin outlines show the distribution of participant means in the original study of which this is a replication. The dependent variable here is error, the distance between the correct answer and the participant's answer, and consequently lower values represent better performance. The theoretical limit for error is around 100."}
gg <- list()
## Task performance figure
dw <- .2
## Accuracy
tmp <- Familiarisation %>%
group_by(Advisor, pid) %>%
summarise(
`Initial estimate` = mean(responseError),
`Final decision` = mean(responseErrorFinal),
.groups = 'drop'
) %>%
pivot_longer(cols = c(`Initial estimate`, `Final decision`),
names_to = 'Response', values_to = 'Error') %>%
mutate(advisor = Advisor,
Advisor = factor(paste0(Advisor, '\n'))) %>%
mutate(Response = factor(Response)) %>%
order_factors()
bf <- tmp %>%
nest(d = -Advisor) %>%
mutate(d = map(d, as.data.frame),
bf = map(d, ~ ttestBF(x = .$Error[.$Response == 'Initial estimate'],
y = .$Error[.$Response == 'Final decision'],
data = ., paired = T)),
bf = map_chr(bf, ~ .@bayesFactor %>% .$bf %>% exp() %>% bf2str())) %>%
select(-d)
gg$acc <- tmp %>%
ggplot(aes(x = Response, y = Error, colour = advisor)) +
scale_y_continuous(limits = c(0, 50), expand = c(0, 0)) +
scale_x_discrete() +
  coord_cartesian(clip = 'off') +
geom_line(aes(group = pid), alpha = .25) +
geom_split_violin(aes(x = nudge(Response, dw),
group = Response, fill = advisor),
width = .9, colour = NA) +
geom_boxplot(aes(x = nudge(Response, dw), group = Response),
outlier.shape = NA, size = 1, width = dw/2, colour = 'black') +
geom_segment(x = 1 - dw, xend = 2 + dw, y = 50, yend = 50,
colour = 'black') +
geom_label(aes(label = paste0('BF = ', bf)),
x = 1.5, y = 50, colour = 'black', data = bf) +