\documentclass[nobib]{tufte-handout}
% \documentclass[fleqn,reqno,12pt]{article}
%========================================
% Packages
%========================================
\usepackage[nographicx, nohyperref, nosubcaption, nogb4e]{mfpackages}
\usepackage{mfenvironments}
\usepackage{mfcommands}
% for info boxes
\usepackage{newfloat, caption}
\DeclareCaptionType{InfoBox}
%========================================
% Bibliography
%========================================
\bibliography{references.bib}
%========================================
% General Layout Tweaks
%========================================
% \usepackage[margin=2cm]{geometry}
% Itemize
\renewcommand{\labelitemi}{\large{$\mathbf{\cdot}$}} % itemize symbols
\renewcommand{\labelitemii}{\large{$\mathbf{\cdot}$}}
\renewcommand{\labelitemiii}{\large{$\mathbf{\cdot}$}}
\renewcommand{\labelitemiv}{\large{$\mathbf{\cdot}$}}
% Description
\renewcommand{\descriptionlabel}[1]{\hspace\labelsep\textsc{#1}}
% Figure Captions
\usepackage{caption} % use corresponding myfiguresize!
\setlength{\captionmargin}{20pt}
\renewcommand{\captionfont}{\small}
\setlength{\belowcaptionskip}{7pt} % standard is 0pt
%========================================
% Define colors and comment functions
%========================================
\usepackage{xcolor}
\definecolor{firebrick}{RGB}{178,34,34}
\definecolor{DarkGreen}{RGB}{34,178,34}
\definecolor{DarkOrange}{RGB}{255,100,50}
\renewcommand{\mf}[1]{\textcolor{firebrick}{[mf: #1]}}
\newcommand{\tr}[1]{\textcolor{DarkOrange}{[tr: #1]}}
%========================================
% Configuring the R code presentation
%========================================
\usepackage{courier}
\usepackage{listings}
\usepackage{color}
% the following defines the layout for the R code
\lstset{ %
language=R, % the language of the code
basicstyle=\footnotesize\ttfamily, % size and type of the fonts that are used for the code
numbers=left, % where to put the line-numbers
numberstyle=\tiny\color{gray}, % the style that is used for the line-numbers
stepnumber=0, % the step between two line-numbers. If it's 1, each line
% will be numbered
numbersep=5pt, % how far the line-numbers are from the code
backgroundcolor=\color{white}, % choose the background color. You must add \usepackage{color}
showspaces=false, % show spaces adding particular underscores
showstringspaces=false, % underline spaces within strings
showtabs=false, % show tabs within strings adding particular underscores
frame=single, % adds a frame around the code
rulecolor=\color{white}, % if not set, the frame color may be changed at line breaks within non-black text (e.g., comments (green here))
tabsize=2, % sets default tabsize to 2 spaces
captionpos=b, % sets the caption-position to bottom
breaklines=true, % sets automatic line breaking
breakatwhitespace=false, % sets if automatic breaks should only happen at whitespace
title=\lstname, % show the filename of files included with \lstinputlisting;
% also try caption instead of title
keywordstyle=\color{black}, % keyword style
alsoletter={_}, % this treats _ as a letter
commentstyle=\color{DarkGreen}, % comment style
stringstyle=\color{DarkOrange}, % string literal style
escapeinside={\%*}{*)}, % if you want to add a comment within your code
morekeywords={*, ...}, % if you want to add more keywords to the set
deletekeywords={_} % remove keywords from list
}
% this is for showing the R output
\lstnewenvironment{rc}[1][]{\lstset{language=R, stepnumber=1}}{}
% this is for inline R code
\newcommand{\ri}[1]{\lstinline{#1}} %% Short for 'R inline'
%========================================
% Article Header
%========================================
\title{Bayesian regression modeling (for factorial designs): A tutorial}
\author{Michael Franke \& Timo Roettger}
\date{}
%========================================
% Article Body
%========================================
\begin{document}
\maketitle
\begin{abstract}
\noindent Generalized linear mixed models are handy tools for statistical inference, and Bayesian approaches to applying them are becoming increasingly popular. This tutorial provides an accessible, non-technical introduction to the use and feel of Bayesian mixed effects regression models. The focus is on data from a factorial-design experiment. \\
\medskip
\noindent \textbf{This tutorial should take you about 1.5 hours.}
\end{abstract} \marginnote{We would like to thank Oliver Bott, Joseph Cassilas, Artur Czeszumski, Fabian Dablander, Judith Degen, Elisa Kreiss, and Bodo Winter for their invaluable comments and suggestions on an earlier draft.}
\section{Motivation \& intended audience}
This tutorial provides a very basic introduction to Bayesian regression modeling using R \citep{Manual}. We wrote this tutorial with a particular reader in mind. If you have used R before and if you have a basic understanding of linear regression, and now you want to find out what a Bayesian approach has to offer, this tutorial is for you. In comparison to other introductions \citep[e.g.][]{SorensenHohensteinb2016:Bayesian-linear}, this tutorial remains very conceptual. We don’t want to ``sell Bayes'' to you, and we do not want to scare you away with mathematical details. We just want to give you an impression of how a Bayesian regression analysis looks and feels. The tutorial covers the essential concepts and explains how to run and interpret the output of a Bayesian regression analysis using the wonderful R package \texttt{brms} written by Paul \citet{buerkner2016brms}.
\marginnote{This tutorial contains gray text boxes which contain additional background information.
The information is sometimes a bit technical but never absolutely necessary for understanding the main ideas.
So feel free to read or skip any of the text boxes to suit your needs.}
If you don’t have any experience with regression modeling, you will probably still be able to follow, but you might also want to consider doing a crash course. To bring you up to speed, we recommend the excellent two-part tutorial by Bodo \citet{Winter2013:Linear-models-a} on mixed effects regression in a non-Bayesian ---a.k.a.~frequentist--- paradigm. In a sense, this tutorial could be considered part three of the series started by Winter. For example, we will use the same data set.
To actively follow this tutorial, you should have R installed on your computer (\url{https://www.r-project.org}).
Unless you already have a favorite editor for tinkering with R scripts, we recommend trying out RStudio (\url{https://www.rstudio.com}).
You will also need some packages,
%\marginnote[-1cm]{Remember that you can install a package called \texttt{XYZ} %with the command \texttt{install.packages("XYZ")}.}
which you can import with the following code:
\marginnote[-1cm]{All code and data for this tutorial is also
available for download here:
\url{https://tinyurl.com/bmr-tutorial}}
\begin{minipage}[]{\textwidth}
\begin{lstlisting}[language=R]
# package for convenience functions (e.g. plotting)
library(tidyverse)
# package for Bayesian regression modeling
library(brms)
# option for Bayesian regression models:
# use all available cores for parallel computing
options(mc.cores = parallel::detectCores())
# package for credible interval computation
library(HDInterval)
# set the random seed in order to make sure
# you can reproduce the same results
set.seed(1702)
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
\section{Data, research questions \& hypotheses}
\label{sec:data}
This tutorial looks at a data set relevant for investigating whether voice pitch differs across female and male speakers, and whether it differs across social contexts (say: informal and polite contexts).
To load the data into your R environment, run the following code:
%
\marginnote{The data is originally from research
presented by \citet{WinterGrawunder2012:The-Phonetic-Pr}. It is used in the tutorials by \citet{Winter2013:Linear-models-a} as well, but here we `massaged' the data a bit, i.e., we renamed variables and removed a line with missing data.}
%
\begin{minipage}[]{\textwidth}
\begin{lstlisting}[language=R]
# load the data into variable "politedata"
politedata = read_csv("https://tinyurl.com/polite-data")
\end{lstlisting}
\end{minipage}
\vspace*{-0.5cm}
\noindent Type \ri{head(politedata)} and you should see the first lines of the
data:
\begin{minipage}[]{\textwidth}
\begin{rc}
> head(politedata)
subject gender sentence context pitch
<chr> <chr> <chr> <chr> <dbl>
1 F1 F S1 pol 213.
2 F1 F S1 inf 204.
3 F1 F S2 pol 285.
4 F1 F S2 inf 260.
5 F1 F S3 pol 204.
6 F1 F S3 inf 287.
\end{rc}
\end{minipage}
\vspace{-0.5cm}
\noindent This data set contains information about different subjects, with an anonymous identifier stored in the variable \texttt{subject}.
Because voice pitch is highly dependent on gender (i.e., there are anatomical differences between women and men that affect voice pitch), the variable \texttt{gender} stores whether a subject is F(emale) or M(ale).
Subjects produced different sentences (stored in the variable \texttt{sentence}), and the experiment manipulated whether the sentence was produced in a polite or an informal context, indicated by the variable \texttt{context}. Crucially, each row contains a measurement of pitch in Hz stored in the variable \texttt{pitch}.
Often, we are interested in an \textbf{outcome variable} (also called response or dependent variable; here \texttt{pitch}). We want to know how this outcome variable behaves
across different conditions or groups, which we call \textbf{predictors} (also called independent variables; here \texttt{gender} and \texttt{context}). Before our data collection,
we might have formulated concrete predictions about the relationship between the outcome and the predictors. Let's assume previous research suggests that pitch is an indicator of politeness: polite speech is accompanied by lower pitch, and informal speech by higher pitch, in both men and women. Because the physiological difference between men and women is very large, we also predict that even in informal contexts, men still have lower pitch than women in polite contexts. We formulate the following hypotheses:
% \marginnote{Notice that our hypotheses are formulated explicitly as comparisons of
% means / averages. The statistical model we will use indeed compares means, and this is common
% practice, albeit a particular assumption worth highlighting. Commonly, this assumption is implicit: For example we could simply formulate H1 as something like
% ``Female speakers' pitch is lower in polite than informal contexts.'' }
\begin{enumerate}[{H}1:]
\item Female speakers have a lower average pitch in polite than in informal contexts.
\item Male speakers have a lower average pitch in polite than in informal contexts.
\item Male speakers have a lower average pitch in informal than female speakers have in polite contexts.
\end{enumerate}
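To get a first descriptive impression of these three comparisons, we can compute the observed mean pitch in each gender-context combination (a quick sketch using the \texttt{tidyverse} functions loaded above; the column names are those of \texttt{politedata}):
\begin{minipage}[]{\textwidth}
\begin{lstlisting}[language=R]
# observed mean pitch for each gender-context cell
politedata %>%
  group_by(gender, context) %>%
  summarize(mean_pitch = mean(pitch))
\end{lstlisting}
\end{minipage}
These raw cell means are only descriptive, of course; the next sections show how to visualize the data and how to quantify the evidence for each hypothesis.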
\section{Exploring the data visually}
Figure~\ref{fig:BasicPlotData_data} displays the mean pitch values for each
sentence (semi-transparent points) across gender and context.
%
\marginnote{Extensive plotting is always a recommended first step of any data analysis. You need to
know your data inside out. Pictures often reveal relationships much better than
numbers can.}
%
The solid points indicate the average pitch values across all sentences and speakers.
Looking at the plot, we can see that pitch values from female speakers are generally higher
than those from male speakers (points in the left column are higher than in the right column).
We also see that pitch values in the informal context are slightly higher than those in the polite context (orange points are
slightly higher than purple points).
% \marginnote{The code needed to generated the picture in
% Figure~\ref{fig:BasicPlotData_data} is not reproduced here, but included in the script in the
% resources for this tutorial: \url{https://github.com/michael-franke/bayes_mixed_regression_tutorial}}
\begin{figure}[t]
\centering
\includegraphics[width = \textwidth]{pics/basic_data_plot.pdf}
\caption{Basic plot of the data displaying overall averages (thick points) and averages for individual sentences (smaller semitransparent points).}
\label{fig:BasicPlotData_data}
\end{figure}
Looking at the plot, we might want to shout: ``The data confirm all of our hypotheses!'' But,
of course, we need to be more careful. As Bayesians, we would like to translate the data into
an expression of \textbf{evidence}: does the data provide evidence for our research hypotheses?
Also, notice that there is quite a lot of variability between different sentences (the
semi-transparent points in Figure 1). For example, some values from the informal condition for
female speakers (orange points in left column) are lower than their corresponding polite
counterparts. Similarly, there could be differences between individual speakers. What we want
are precise estimates of potential differences between conditions. We also want a measure of
how confident we can be in these estimates.
\begin{figure}[h]
\centering
\includegraphics[width = \textwidth]{pics/table_mean_hypotheses_cropped.png}
\caption{Coefficients of a dummy-coded regression model for the factorial $2 \times 2$ design, together with research hypotheses as statements about ordinal relations between cell means.}
\label{fig:BasicPlotData_table}
\end{figure}
\section{A regression model for our data}
Another way of looking at the data in connection with our research hypotheses is displayed in
Figure~\ref{fig:BasicPlotData_table}. Each cell in this \textbf{design matrix} represents one
unique combination of the gender and the context factor.
%, and the table shows the mean of
%observed pitch values for each cell.
%
% \marginnote{In technical terms, this table is the \textbf{design matrix} of our experiment. We
% have two factors of interest \texttt{context} and \texttt{gender}, each with two levels. The
% table shows each combination of levels of all relevant factors. The cells in this table are
% therefore also called \textbf{design cells} \tr{is this marginnote essential or can we maybe
% get rid of it?}.}
%
Our hypotheses can be related to the comparison between the means of these cells.
H1 makes a statement about the comparison between C(ells) 1 and 2 (the context effect for female speakers); H2 makes a statement about C3 and C4 (the context effect for male speakers); and H3 makes a statement about C2 and C3 (the difference between informal male speakers and polite female speakers).
Before going into data analysis, let's look at the \textbf{regression model} we want to use.
%
\marginnote{The setup of this (non-hierarchical) regression model is not specific to a Bayesian approach. You would use the exact same for a frequentist analysis.}
%
Our regression model assumes that pitch values observed in each cell are
sampled from a population that is normally distributed around a mean, where each cell $c_i$ has its own mean $\mu_i$. We are interested in the probability of one cell mean being larger than another cell mean, i.e., the probability that $\mu_i > \mu_j$. Put differently, we are interested in the probability that the difference between $\mu_i$ and $\mu_j$ is larger than zero: $\mu_i - \mu_j > 0$.
%and since, let's say, it is easier to test whether a value is bigger than zero, we can encode the cell means like in Figure~\ref{fig:coefficients_table}. \tr{I cannot see this in the figure. The figure merely displays how we calculate the individual cell mean in a regression}
Figure~\ref{fig:BasicPlotData_table} illustrates the encoding scheme of our cell means in terms of a regression analysis. It assumes that
there is a \textbf{reference level} for each factor. Here it is the level \texttt{female} for
the factor \texttt{gender} and the level \texttt{informal} for the factor
\texttt{context}.\marginnote{This is so-called \textbf{dummy coding} (also referred to as \textbf{treatment coding}) of the regression
coefficients. Other coding schemes exist, but are not discussed here.}
All cell means can then be expressed in terms of the \textbf{intercept}
$\beta_0$, which is the mean of the reference level, deviations from this reference level for each individual factor
($\beta_{\text{male}}$ and $\beta_{\text{pol}}$), and a so-called \textbf{interaction term}
$\beta_{\text{pol\&male}}$. In
other words, our regression estimates the mean of the reference level and estimates how much we
need to adjust this mean when we change either the context level (C2), the gender level (C3),
or both (C4). The $\beta$ terms are also called \textbf{coefficients}. They are \textbf{free parameters} of the model.
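Spelled out, the encoding in Figure~\ref{fig:BasicPlotData_table} means that each cell mean is a sum of coefficients:
\begin{align*}
\mu_{C1} &= \beta_0\\
\mu_{C2} &= \beta_0 + \beta_{\text{pol}}\\
\mu_{C3} &= \beta_0 + \beta_{\text{male}}\\
\mu_{C4} &= \beta_0 + \beta_{\text{pol}} + \beta_{\text{male}} + \beta_{\text{pol\&male}}
\end{align*}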
%\begin{figure}[]
% \centering
% \includegraphics[width = 0.8\textwidth]{pics/table_coefficients.pdf}
% \caption{Coefficients of a dummy-coded regression model for the factorial $2 \times 2$ design.}
% \label{fig:coefficients_table}
%\end{figure}
%\begin{InfoBox}[]
%\centering
%\colorbox{mygray}{\centering
% \begin{minipage}{1.0\textwidth}
%
% \emph{Bayesian inference: priors, likelihoods and posteriors}
% \medskip
%
% Jones is a rational scientist and has recently become interested in speech patterns. She has read that women have higher pitch than men, but she is not quite convinced. She is determined to find out. How? Well, naturally, by rationally updating her
% \emph{prior beliefs} about the pitch differences between men and women to obtain a new \emph{posterior belief} based on
% empirical observation. Central to this updating is Jones'
% \emph{likelihood function}, which encodes how likely any pitch difference may have generated the observed data.
%
% \paragraph{Prior beliefs.} Jones is a roll model scientist and starts from the prior assumption that there is no relationship between voice pitch and gender. She beliefs that men and women have the same voice pitch or in other words that the difference between male and female voice pitch is, on average at least, zero. Obviously she knows that nature is a messy beast, so she also departs from the assumption that this average difference varies a little bit on a case by case basis. She assumes that the average difference has a standard deviation of say 50.
%
% \paragraph{Likelihood.} Let's assume that Jones takes a microphone and records five men and five women in front of her house. She measures their pitch values. Based on these data she can estimate the likelihood function (or simply likelihood). The likelihood is basically a function that maximizes the probability of getting an observed distribution.
%
% \paragraph{Posterior beliefs.} Jones' measurements show a mean difference in pitch of 100 Hz and a standard deviation of 70 Hz. Her empirical observations suggest women have higher pitch values than men. Based on these observations, should she
%still believe that there is no difference between male and female pitch? By \emph{Bayes rule} her posterior beliefs are proportional to the product of her prior and the likelihood of her empirical observations.
%
%%\begin{figure}[]
%% \centering
%% \includegraphics[width = 0.8\textwidth]{pics/prior_post.png}
%% \caption{Possible relationship between prior, likelihood, and posterior. After observing the data, Jones updates her prior and shifts her posterior beliefs towards the right.}
%% \label{fig:prior_post}
%%\end{figure}
%
%\tr{possible to integrate the plot here? Well now that I look at it, this entire box might be redundant as you kinda talk about it in the next section. What do you think? Maybe cut?}
% Jones' posterior belief in the voice pitch difference between men and women has shifted. After making her observation, rational Jones has updated her initial belief and is now more certain that there actually is a difference between men and women.
%
% \end{minipage} \par
% } \par
% \begin{center}
% Info Box 1: Priors, likelihood and posteriors in Bayesian inference.
% \end{center}
% % \caption{\label{InfoBox:asymptotic_CIs} Here is my caption}
%\end{InfoBox}
\section{A Bayesian analysis of a (fixed effects) regression model}
Having spelled out a model like the above, one way of testing our hypotheses in a Bayesian setting uses so-called \textbf{parameter inference}.
%
\marginnote[-2cm]{Another approach to testing hypotheses in a Bayesian setting is to use model comparison. This tutorial focuses on parameter estimation only for parallelism with the standard frequentist practice. We will briefly touch on model comparison at the end.}
%
Bayesian parameter inference asks: What should we believe about the values of the coefficients $\beta_0$, $\beta_{\text{pol}}$, $\beta_{\text{male}}$ and $\beta_{\text{pol\&male}}$ given the data, the model and whatever we believed before having seen the data?
%Info Box~1 provides some background on Bayesian reasoning.
Formally, if $\theta$ is a vector of parameter values of the model, we are interested in the \textbf{posterior distribution} $P(\theta \mid D)$, which assigns a non-negative number to each tuple of parameters in proportion to how likely that tuple is.
From a Bayesian point of view, the model consists of a \textbf{likelihood function} $P(D \mid \theta)$, which specifies how likely an observation of data $D$ is for each value of parameters $\theta$, and a \textbf{prior distribution} $P(\theta)$, which specifies how likely we (the rational reasoner) believe each tuple of parameter values is in the first place.
%
\marginnote{Supplying a prior, really just \textit{any} prior, is important to get a
Bayesian analysis off the ground; a circumstance that is sometimes discussed controversially. For many
practical purposes, however, the precise choice of priors is not decisive, and tools like the
\texttt{brms} package (which we will use here) choose generically reasonable default priors for the model (more on this below).}
%
With these ingredients, we can compute the posterior distribution by Bayes rule, as follows:
\begin{eqnarray*}
P(\theta \mid D) = \frac{P(\theta) \ P(D \mid \theta)}{ \int P(\theta') \ P(D \mid
\theta') \textrm{d}\theta'}
\end{eqnarray*}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Info Box 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\begin{InfoBox}[]
%\centering
%\colorbox{mygray}{\centering
% \begin{minipage}{1.0\textwidth}
%
% \emph{Bayesian inference: priors, likelihoods and posteriors}
% \medskip
%
% Smith claims that the mean pitch frequency of female Korean speakers is 130\,Hz. Miller claims that it is 160\,Hz. Jones is a rational scientist, and she wants to use data to inform her beliefs about whose theory is more likely correct. She does this by updating her prior beliefs via a likelihood function to obtain posterior beliefs.
%
% \paragraph{Prior beliefs.} Jones' prior beliefs encode what she believes at the outset. She
% considers two hypotheses, namely: H1, that the mean pitch of female Korean speakers is $\mu
% = \mu_1 = 130$ and, H2, that it is $\mu = \mu_2 = 160$. Jones initially believes that H1 is nine times more likely than H2 (for whatever reason), so that $P(\mu = \mu_1) = 0.9$ and, consequently, $P(\mu = \mu_2) = 0.1$.
%
% \paragraph{Data.} Jones has recorded pitch frequencies of five speakers: \\ $D = \langle 140, 145, 150, 155, 160 \rangle$.
%
% \paragraph{Likelihood.} The likelihood function determines how likely any potential
% observation is. It can depend on parameters, like the unknown $\mu$. In the case at hand,
% let's assume that the data is generated by a normal distribution, with mean $\mu$ and a
% fixed/known standard deviation $\sigma = 10$. The likelihood of the data is then $P(D \mid
% \mu) = \prod_{i} \mathcal{N}(d_i, \mu, \sigma)$. This gives us $P(D \mid \mu_1) \approx
% 1.31e-12$ and $P(D \mid \mu_2) \approx 2.38-09$.
%
% \paragraph{Posterior beliefs.} The data is more likely under the second hypothesis, but Jones initially considered the first more likely. Bayesian inference combines prior beliefs and likelihood. By \emph{Bayes rule} we compute the posterior probability of $\mu_1$ to be:
% \begin{align*}
% P(\mu_1 \mid D) = \frac{P(\mu_1) \ P(D \mid \mu_1)}{ \sum_{i =1}^2 P(\mu_i) \ P(D \mid \mu_i)} = \frac{0.9 \cdot 1.31e-12}{ 0.9 \cdot 1.31e-12 + 0.1 \cdot 2.38-09} \approx 0.005
% \end{align*}
% The data has very substantially changed Jones' beliefs: after rational belief update, she now considers H2 almost 200 times more likely than H1.
%
% \end{minipage} \par
% } \par
% \begin{center}
% Info Box 1: Priors, likelihood and posteriors in Bayesian inference.
% \end{center}
% % \caption{\label{InfoBox:asymptotic_CIs} Here is my caption}
%\end{InfoBox}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The R package \texttt{brms} \citep{buerkner2016brms} makes it easy to run Bayesian regression models. It uses a formula syntax very similar to that of related packages for regression analysis (such as \texttt{lme4}). In our case, we want to regress the dependent variable \texttt{pitch} against the independent variables \texttt{gender} and \texttt{context} and their two-way interaction. This model is expressed by the formula:
\begin{minipage}[]{\textwidth}
\begin{lstlisting}[language=R]
# formula for (fixed effects) regression model
formula_FE = pitch ~ gender * context
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
The Bayesian model can then be fitted with the function \texttt{brm} from the \texttt{brms} package. We only need to specify the formula and supply the data. Here, we also set a seed to make sure we all produce exactly the same results.
\begin{minipage}[]{\textwidth}
\begin{lstlisting}[language=R]
# run regression model in brms
model_FE = brm(
formula = formula_FE,
data = politedata,
seed = 1702
)
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
\noindent The \texttt{brms} package uses the probabilistic programming language \texttt{Stan}
in the background. \texttt{brms} basically translates our formula into Stan code that
implements a regression model and then executes it. The Stan code is then translated to C++
(hence the message about ``compiling C++'' when you run this code). Both the compilation in C++
and the sampling in Stan can sometimes take a while. It might also eat your battery in no time,
so always make sure your laptop is plugged in. Conceptually, Stan obtains samples from the
posterior distribution, based on an algorithm called \emph{Hamiltonian Monte Carlo}. This is an
instance of a more general class of algorithms, called \emph{Markov Chain Monte Carlo (MCMC)}
methods. The purpose of these methods is to return representative samples from the posterior
distribution. If you are interested in finding out more about this `sampling stuff', check out
Info Box~1.
\begin{InfoBox}[]
\centering
\colorbox{mygray}{\centering
\begin{minipage}{1\textwidth}
\emph{Markov Chain Monte Carlo (MCMC) sampling}
\medskip
Bayesian ideas are old, but they have recently seen a revival, mainly due to
advances in computer science (notably: clever algorithms, not just faster computers). To
understand this, consider Bayes rule for data analysis. We have a prior $P(\theta)$ over
parameter $\theta$ and a likelihood function $P(D\mid\theta)$. We want to compute
the posterior distribution:
\begin{eqnarray*}
P(\theta \mid D) = \frac{P(\theta) \ P(D \mid \theta)}{ \int P(\theta') \ P(D \mid
\theta') \textrm{d}\theta'}
\end{eqnarray*}
If $\theta$ is a high-dimensional vector of parameters (e.g., in a hierarchical regression model), it
might be quite impossible to compute the integral in the denominator. Fortunately, clever
algorithms like MCMC allow us to draw \textbf{representative samples} from the
posterior distribution without having to calculate the integral-of-doom.
\medskip
For common applications, it is not necessary to understand MCMC algorithms in full detail.
It suffices to know that samples are collected by starting at (usually random) initial
parameter values and then ``jumping around the parameter space'' in such a way that, if we
were to jump infinitely often, we would have visited any particular tuple of parameter
values with a relative frequency that corresponds exactly to its posterior probability. So
these algorithms are guaranteed to give us representative samples, given unlimited time to
jump around.
\medskip
To ensure the trustworthiness of a finite set of samples, we routinely run several
\textbf{chains}, i.e., we start the ``jumping around'' procedure at different random
initial starting points and check whether the different chains reached a similar outcome.
We do this by plotting the chains (in so-called \emph{trace plots}) and by inspecting the
\textbf{$\hat{R}$-value}, which is included in the \texttt{brm} model summary. The
$\hat{R}$-statistic compares the variance of the samples within each chain to the variance
of all samples across chains. If the $\hat{R}$-value is below 1.1, we commonly assume that
the chains have converged sufficiently.
\medskip
In practice, the \texttt{brm} model summary and the messages during the fitting process will inform you about potential problems, and will normally offer good advice on how to solve the issue.
For example, sometimes the different chains do not converge on sufficiently similar results.
One quick and simple solution to these ``convergence issues'' is increasing the number of samples (specifying the option \texttt{iter} in the \texttt{brm} function call).
\end{minipage} \par
} \par
\begin{center}
Info Box 2: Background on sampling methods \& diagnostics.
\end{center}
\end{InfoBox}
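The idea behind $\hat{R}$ can be sketched in a few lines of R. The following is a simplified, hand-rolled version for illustration only (the function name \texttt{rhat\_simple} is our own invention; \texttt{brm} computes a more refined split-chain variant for you):

```r
# Simplified (non-split) R-hat: compare within-chain to pooled variance.
rhat_simple <- function(chains) {  # chains: matrix, one column per chain
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))       # mean within-chain variance
  B <- n * var(colMeans(chains))         # between-chain variance
  var_pooled <- (n - 1) / n * W + B / n  # pooled variance estimate
  sqrt(var_pooled / W)
}

set.seed(123)
good <- cbind(rnorm(1000), rnorm(1000))            # chains that agree
bad  <- cbind(rnorm(1000), rnorm(1000, mean = 5))  # chains that disagree
rhat_simple(good)  # close to 1: looks converged
rhat_simple(bad)   # far above 1.1: not converged
```

Two chains sampling the same distribution yield $\hat{R} \approx 1$; two chains stuck in different regions inflate the pooled variance relative to the within-chain variance, pushing $\hat{R}$ well above the conventional 1.1 threshold.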
You can enter \texttt{model\_FE} in order to get a summary of the model fit. It should look like the following output.
\begin{minipage}[]{1.2\textwidth}
\begin{rc}
> model_FE
Family: gaussian
Links: mu = identity; sigma = identity
Formula: pitch ~ gender * context
Data: politedata (Number of observations: 83)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept 260.64 7.76 245.18 275.77 2148 1.00
genderM -116.17 11.16 -137.71 -94.36 2115 1.00
contextpol -27.43 11.11 -48.75 -6.36 1959 1.00
genderM:contextpol 15.83 15.87 -15.16 46.89 1975 1.00
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 36.14 2.81 31.09 42.08 4416 1.00
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
\end{rc}
\end{minipage}
\vspace{-0.5cm}
Lines 2--5 give us information about the model and the data used. Lines 6 and 7 tell us about
the sampling procedure (see Info Boxes~2 \& 3 for more information). Lines 9--14 contain information about our
parameters of interest. We will discuss them in detail below. Lines 16--18 contain information
that looks similar to that in lines 9--14. This is the estimate of the standard deviation
\texttt{sigma} of the assumed normal distribution (which describes the
distribution of measurements in each design cell). Finally, lines 20--22 contain general
information about the model fit and the information presented in this summary.
\marginnote{If the model failed to converge or other problems occurred, you would see an informative message in the last part of this summary. See Info Box~2 for more on ``convergence''.}
Let us look at lines 9--14 in detail now. What these lines give us is a table with four rows, each of which corresponds to a parameter in the model, namely the coefficients shown in Figure~\ref{fig:BasicPlotData_table}. The variable \texttt{Intercept} refers to the coefficient $\beta_0$, which represents the mean of the reference level in cell 1 (female speakers in informal contexts). The variable \texttt{genderM} corresponds to the coefficient $\beta_{\text{male}}$, \texttt{contextpol} corresponds to the coefficient $\beta_{\text{pol}}$, and \texttt{genderM:contextpol} is the interaction coefficient $\beta_{\text{pol\&male}}$. For each of these parameters, the table contains very useful summary statistics based on the samples returned from the model fit.
%
\marginnote[-2.5cm]{Intuitively, the 95\% credible interval is the range of parameter values that, given model and data, we consider plausible enough to take seriously.}
%
More about the information given in the columns can be found in Info Box~3.
\begin{InfoBox}[t]
\centering
\colorbox{mygray}{\centering
\begin{minipage}{1.0\textwidth}
\emph{Information displayed in the summary of a \texttt{brm} model fit}
\medskip
Lines 6--7 of the summary of \texttt{model\_FE} tell us that we collected a total of 4000 samples from the posterior distribution. These came from four chains, each running for 2000 iterations, but discarding the first 1000 samples as \textbf{warm up}. We discard these first 1000 samples because the initial starting point might be quite ``unrepresentative'' (see also Info Box~2).
From the table in lines 9--14, the column \emph{Estimate} gives the mean of the obtained
samples, thereby approximating the mean of the (marginal) posterior distribution for each
parameter. For example, the parameter \texttt{Intercept} is estimated to have a mean of
about $261$, which (here) coincides with the mean of the data points in cell 1, as shown in
Figure~\ref{fig:BasicPlotData_table}. \emph{Est.Error} is the estimation error (the standard
deviation of the samples), an indication of how certain we should be about the estimate. The columns
\texttt{l-95\% CI} and \texttt{u-95\% CI} give the lower and upper bound of the 95\%
credible interval for each parameter, as discussed in the main text. The column
\texttt{Eff.Sample}, for effective samples, gives us a rough measure of how many of all the
samples we took (4000 in our case) are contributing non-redundant information to our
estimation. The higher this number, the better. Finally, we get the \texttt{Rhat} column
with the $\hat{R}$-values for each parameter (see Info Box~2).
\end{minipage} \par
} \par
\begin{center}
Info Box 3: Information in summaries of \texttt{brm} model fits.
\end{center}
\end{InfoBox}
For our current purposes, the information in columns \texttt{l-95\% CI} and \texttt{u-95\% CI} is most important.
These numbers give the lower and upper bounds of a \textbf{95\% credible interval}.
Take the parameter \texttt{contextpol}, corresponding to the coefficient $\beta_{\text{pol}}$.
This parameter corresponds to the estimated change of the mean of the reference level when we change the context level to polite.
The 95\% CI is roughly $[-49; -6]$.
Values outside of this interval are unlikely enough that, for most practical purposes, we can ignore them.
Consequently, this analysis suggests that zero is a very unlikely value for the coefficient
$\beta_{\text{pol}}$.
Given the data and the model, we should believe that $\beta_{\text{pol}}$ is most likely negative.
This directly addresses the first research hypothesis. Based on the regression model, the data suggest that H1 is likely true.
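The credible interval reported in the summary table is itself just a summary of posterior samples, and can be recomputed from samples directly. As a sketch, here we use simulated stand-in samples (drawn from a normal distribution matching the reported estimate and error, not the real posterior):

```r
set.seed(42)
# stand-in for the 4000 posterior samples of b_contextpol
# (mean and sd taken from the summary table: -27.43, 11.11)
b_contextpol <- rnorm(4000, mean = -27.43, sd = 11.11)

# 95% credible interval: the central 95% of the samples
ci <- quantile(b_contextpol, probs = c(0.025, 0.975))
ci  # roughly [-49; -6], matching the summary table
```

With the samples from a real model fit (introduced next), the same `quantile()` call reproduces the \texttt{l-95\% CI} and \texttt{u-95\% CI} columns.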
But how likely? How likely is it that $\beta_{\text{pol}}$ is smaller than zero? Instead of simply making a binary thumbs-up / thumbs-down decision, it would be even more elegant if we could put a number to it. As Bayesians we fortunately can. To see how this works, let us have a more intimate look at the samples that the \texttt{brm} function returns. We can access the samples of a model fitted with \texttt{brm} with the function \texttt{posterior\_samples}:
\begin{minipage}[]{\textwidth}
\begin{lstlisting}[language=R]
# extract posterior samples
post_samples_FE = posterior_samples(model_FE)
head(post_samples_FE %>% round(1))
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
The output will look something like this (exact values vary from run to run, since sampling is stochastic):
\begin{minipage}[]{1.2\textwidth}
\begin{rc}
b_Intercept b_genderM b_contextpol b_genderM:contextpol sigma lp__
1 262.0 -117.6 -25.1 13.5 38.8 -420.1
2 257.8 -118.9 -32.0 12.6 37.6 -421.5
3 255.4 -114.4 -28.0 11.4 37.2 -420.5
4 249.4 -101.1 -8.9 -5.6 33.8 -421.0
5 283.4 -137.5 -46.1 43.4 42.2 -425.4
6 284.8 -127.8 -45.8 39.7 40.6 -427.6
\end{rc}
\end{minipage}
\vspace{-0.5cm}
What you see here is the top 6 rows of a data frame with columns for each parameter and 4000 rows, corresponding to each sample of that parameter (so the sampling method has generated 4000 values for each parameter, where each row is a sample from the posterior distribution $P(\theta \mid D)$).
%
\marginnote{The column \texttt{lp\_\_} contains the (unnormalized) log posterior density at the parameter values in each row. This is useful for model comparison and model criticism but not important for our current adventures.}
%
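To make the link between coefficients and cell means concrete, here is how the four cell means are reconstructed from the dummy-coded coefficients, using the values from the first sample row shown above; with the full data frame, one would apply the same arithmetic to every row:

```r
# coefficient values from the first posterior sample shown above
b_intercept  <- 262.0   # cell 1: female, informal (reference level)
b_genderM    <- -117.6
b_contextpol <- -25.1
b_interact   <- 13.5    # genderM:contextpol

# cell means follow from adding the applicable coefficients
cell_female_informal <- b_intercept
cell_female_polite   <- b_intercept + b_contextpol
cell_male_informal   <- b_intercept + b_genderM
cell_male_polite     <- b_intercept + b_genderM + b_contextpol + b_interact
cell_male_polite  # 132.8
```

Doing this row-wise over all 4000 samples yields a posterior sample for each cell mean, which is exactly what the \texttt{faintr} package (introduced below) automates.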
We can use these samples to produce density plots reflecting our posterior distribution. The plot in Figure~\ref{fig:Posteriors_FE}
shows, for each of the four main model parameters, an estimate of the posterior density. Each
curve shows how much credence we should put on particular parameter values. For example, we see
that our beliefs concerning plausible values for the mean of cell 1 (female speakers in informal
contexts, the reference level) should hover around 260, roughly spreading from about 240 to 280.
We also see that all values that receive substantial probability density for \texttt{contextpol} (the coefficient $\beta_{\text{pol}}$) are negative (as captured in the 95\% CI discussed above). In other words, the voice pitch of women in polite contexts is likely to be lower than in informal contexts. Zero is estimated to be a comparatively unlikely value for this parameter.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{pics/posterior_density_FE.pdf}
\caption[Posterior over coefficients in fixed-effects model]{Posterior density of parameter
values in the fixed-effects regression model. The thick red lines indicate the 95\%
credible intervals, i.e., the range of parameter values that it is reasonable to believe
in.}
\label{fig:Posteriors_FE}
\end{figure}
Now, here comes a nice gadget. Based on the samples obtained for \texttt{contextpol} ($\beta_{\text{pol}}$), it is very easy to estimate our belief that $\beta_{\text{pol}}$ is indeed negative. We simply have to calculate the proportion of samples that are negative. That's all. The code below does exactly this, and reveals that the posterior probability of the proposition that $\beta_{\text{pol}} < 0$ is about 0.99375. This is very close to 1!
\begin{minipage}[]{\textwidth}
\begin{lstlisting}[language=R]
# proportion of negative samples for parameter b_contextpol
# this number approximates P(b_contextpol < 0 | model, data)
mean(post_samples_FE$b_contextpol < 0)
>0.99375
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
As an interim summary, we have seen how to run a Bayesian regression analysis with the
\texttt{brms} package and how to deal with its output. We have also seen that the output can be interpreted in very intuitive ways (e.g., ``The probability of H1, given our model, priors, and data, is more than .99'').
Unfortunately, what we have not seen yet is what the model and data say about hypotheses 2 or 3. This is because there is no single parameter in the (dummy-coded) regression model that corresponds to the differences between cells 3 and 4 (for hypothesis 2) and cells 2 and 3 (for hypothesis 3). Notice that this problem is not specific to Bayesian analyses, but inherent in the way the regression coefficients were set up.
%
\marginnote[-1cm]{A potential way of testing different hypotheses of the kind we have set out
here, is to run different regression analyses, each with a different reference cell. This is
a rather unwieldy workflow. It wouldn't help with hypothesis 3 either, which compares the
cell means in the design matrix ``diagonally'': there is no way of changing the reference level of either factor such that dummy coding gives us a single coefficient as the difference between cells 2 and 3.}
%
Fortunately, we can recover information about any derived measure from the obtained samples. Here's how:
Take hypothesis 3 which requires us to compare cells 2 and 3. The hypothesis states that
$\beta_0 + \beta_{\text{pol}} > \beta_0 + \beta_{\text{male}}$, which reduces to
$\beta_{\text{pol}} > \beta_{\text{male}}$. We can approximate the posterior probability that
this is true based on the samples that we obtained for the model in the same general way as
before, namely:
\begin{minipage}[]{1.1\textwidth}
\begin{lstlisting}[language=R]
# proportion of samples for which the mean of cell 2 was larger
# than that of cell 3
# this number approximates the quantity:
# P(b_contextpol > b_genderM | model, data)
mean(post_samples_FE$b_contextpol > post_samples_FE$b_genderM)
>1
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
Based on the posterior samples, the estimated probability is 1. That's a strong result. If the model is true, our certainty (given the data) that hypothesis 3 is true should be practically at ceiling.
To sum up, the Bayesian approach to regression modeling allows us to retrieve all direct comparisons between cells in a factorial design. It also allows us to retrieve quantitative information about our hypotheses in a way which is easy to communicate and understand. We can calculate the (estimated) posterior probability that a particular hypothesis holds.
\section{The \texttt{faintr} package}
To make comparisons between pairs of cells even easier, and applicable to even bigger factorial designs, this tutorial comes with a little R package: the \texttt{faintr} package.
%
% \marginnote[-1.5cm]{The name \texttt{faintr} is indicative of the possibility that the package might break down unexpectedly (we might consider renaming after more extensive testing), but also alludes to ``\emph{fa}ctorial design'' and ``\emph{int}erpretation'' somehow.}
%
You can install the package from GitHub with the \texttt{devtools} package, as follows:
\begin{minipage}[]{1.3\textwidth}
\begin{lstlisting}[language=R]
# load 'devtools' package to allow installation from GitHub
library(devtools)
# install 'faintr' package from GitHub
install_github(
repo = "michael-franke/bayes_mixed_regression_tutorial",
subdir = "faintr")
# load the 'faintr' package
library(faintr)
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
The \texttt{faintr} package provides two main helper functions.
%
\marginnote{More on the use of the \texttt{faintr} package can be found here: \url{https://tinyurl.com/faintr-vignette}.}
%
The function \texttt{post\_cells()}
takes as input the fitted regression model (technically: the \texttt{brmsfit} object returned
by function \texttt{brm}) and outputs samples for all design cell means, a comparison of all
design cells against each other, and a summary of each cell's inferred mean value.
For example, although the model estimated $\beta$-coefficients, we can reconstruct posterior samples of each cell's mean by typing:
\begin{minipage}[]{1.3\textwidth}
\begin{lstlisting}[language=R]
post_cells(model_FE)$predictor_values
\end{lstlisting}
\end{minipage}
\vspace*{-0.5cm}
\noindent Using these reconstructed samples, we can plot approximate posterior estimates for each cell, as in Figure~\ref{fig:Posteriors_cell_means}.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{pics/posterior_density_cell_means.pdf}
\caption[Posteriors over cell means in fixed-effects model]{Posterior density of cell means
in the fixed-effects regression model. The thick red lines indicate the 95\% credible
intervals, i.e., the range of parameter values that are reasonable to believe in (given the data and the model).}
\label{fig:Posteriors_cell_means}
\end{figure}
Another helpful function is \texttt{compare\_groups()}, which takes the fitted regression model as input together with a specification of which two (subsets of) cells to compare
against each other. For example, we can compare (diagonally) the cells for female speakers in
polite contexts with male speakers in informal contexts with this call:
\begin{minipage}[]{1.3\textwidth}
\begin{lstlisting}[language=R]
compare_groups(
model = model_FE,
lower = list(gender = "M", context = "inf"),
higher = list(gender = "F", context = "pol")
)
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
\noindent The output looks like this:
\begin{minipage}[]{\textwidth}
\begin{rc}
Outcome of comparing groups:
* higher: gender:F context:pol
* lower: gender:M context:inf
Mean 'higher - lower': 88.74
95% CI: [ 68.09 ; 111 ]
P('higher - lower' > 0): 1
\end{rc}
\end{minipage}
\vspace{-0.5cm}
For the purposes of this tutorial, we gather the output of \texttt{compare\_groups()} for
all three hypotheses of current interest in a convenience function (which the interested
reader will find in the source code, but which is of no further theoretical interest here):
\begin{minipage}[]{\textwidth}
\begin{rc}
> get_posterior_beliefs_about_hypotheses(model_FE)
# A tibble: 3 x 2
hypothesis probability
<chr> <dbl>
1 Female-polite < Female-informal 0.994
2 Male-polite < Male-informal 0.849
3 Male-informal < Female-polite 1
\end{rc}
\end{minipage}
\vspace{-0.5cm}
Based on the currently assumed model and data, we would conclude that hypotheses 1 and 3 are very likely true, but we can also see that there is quite a bit of uncertainty associated with hypothesis 2. The probability of H2 (given the data and the model) is only 0.849. So we should remain suspicious about H2 until more evidence is available.
%
\marginnote{It is generically reasonable to consider posteriors above 0.95 or 0.975 as large enough to warrant speaking of ``evidence in favor of a hypothesis''.}
\section{Priors}
One important difference between frequentist and Bayesian inference is the use of priors. Priors encode what we consider plausible parameter values before actually looking at the data. Specifying priors has several advantages, both technical and conceptual. First, we can use \textbf{regularizing priors} to implement soft constraints on what counts as a plausible parameter setting for the model, thereby reducing the computational resources needed to estimate the model parameters.
%
\marginnote[-1.5cm]{Defining regularizing priors is essential for more complex models which have to estimate many parameters. Regularizing priors can help the model to converge more quickly.}
%
As a conceptual advantage, priors can also express relevant subjective prior beliefs about the situation or problem at hand. For example, pitch values (and many other things we measure in nature) cannot be smaller than zero. Human pitch values are also limited to a certain range defined by physiological and bio-mechanical constraints on our laryngeal system. In adults, values larger than, let's say, 1000 Hz are very unlikely.
But wait a minute. Subjective beliefs? This is science. We are supposed to be objective, right?
We should have a heart of stone and be skeptical about possible relationships in the first
place. Practically, this means the following for us: We should not hesitate to make use of the
possibility to bring all the potentially relevant background knowledge to bear when analyzing
our data. The formalization of background knowledge should, of course, be made transparent and
be explicitly justified. So, any pieces of reasonably uncontroversial background knowledge
might well be included in the model. But, of course, when it comes to the specific hypotheses
we would like to test, we should \emph{not} engineer in the conclusions we would like to
eventually draw, no matter what our subjective beliefs (possibly inspired by hope) are! If our
research hypothesis is that female speakers lower their voices in polite contexts, we obviously
don't want to specify a prior that assumes the hypothesized relationship before having seen the
data. Instead, we can feed the model with a skeptical prior about the truth of our
research hypothesis. Since there is probably little doubt about an effect of gender on voice
pitch, the following explores what happens when we entertain a skeptical prior about the effect
of context (on female speakers).
% Concretely, we could believe \emph{a priori} that it is not
% likely that could be the assumption that there is likely \textit{no} relationship between speaker gender and voice pitch (i.e. the average difference is 0). Of course, there might be some variability around this average value (nature is messy after all), so we might want to assume that pitch differences between male and female speakers are normally distributed with an average of 0 and a standard deviation of, say, 50 Hz. Importantly, such a skeptical prior is conservative with regard to our research hypothesis. Our point of departure (before we observed new data) is to assume that there is no effect of gender on pitch. The data need to convince us otherwise.
How does that look in practice? --- First of all, you do not have to specify priors by hand; a call to \texttt{brm} will use generically defensible default priors for the model. For example, to inspect the priors that were automatically assumed for the model fit obtained above and stored in \texttt{model\_FE}, we can type:
\begin{minipage}[]{\textwidth}
\begin{rc}
> prior_summary(model_FE)
prior class coef
1 student_t(3, 204, 83) Intercept
2 b
3 b contextpol
4 b genderM
5 b genderM:contextpol
6 student_t(3, 0, 83) sigma
\end{rc}
\end{minipage}
\vspace{-0.5cm}
This table tells us about the default priors \texttt{brm} used for the model. The prior for the intercept is a Student's $t$ distribution with location 204, scale 83, and 3 degrees of freedom. These parameters have been determined for you behind the scenes by looking at the distribution of observed pitch values in the data set.
\marginnote{Think of a Student's $t$ distribution as a normal distribution with thicker tails, where the thickness is determined by the degrees of freedom.}
%
Similarly, the prior for the standard deviation $\sigma$ is also a Student's $t$ distribution with suitable parameters determined from the data.
But the column with information about the priors over coefficients (``class $b$'' in the table above) is empty!
That does not mean that these priors are a secret.
It means that they have not been set.
When we do not set a prior in a Bayesian model, software like \emph{Stan}, which we use here in the background, assumes that any logically possible parameter value is equally likely.
This is variously called an \textit{unbiased prior}, a \textit{flat prior}, or a \textit{maximum-entropy prior} because it encodes no bias in either direction at all.
Using unbiased priors for coefficients is a reasonable generic choice for \texttt{brm} because
it cannot know which hypotheses you want to test. But, since you (should) know which hypotheses
you care about, you can be more explicit, and use the priors for conservative inference.
\marginnote[-1.5cm]{You can specify priors for each class of parameters or every single
parameter of the model individually. To see all parameters, classes and default priors \emph{before} having to run a model, you can use the \texttt{get\_prior()} function of \texttt{brms}, which takes a model formula as input. (For comparison, the function \texttt{prior\_summary()} takes the fitted model as \texttt{brmsfit} object as input.)}
%
So,
let's define some priors for the model with fixed effects. Our goal is not to provide the
``best'' choice of prior here (which is highly debatable anyways), but to show one example of realizing
a skeptical stance towards a given hypothesis. Here, we therefore specify the prior for the
coefficient $\beta_{\text{pol}}$ as a normal distribution centered around zero with a rather
small standard deviation of 10. This corresponds to the prior belief that there is likely no difference
between (female speakers') voice pitch in informal and polite contexts.
%
\marginnote{Notice that this only affects the effect of context on female speakers' voice
pitch, as long as we allow the other coefficients to ``roam around freely'', since
effects of context on male speakers' pitch can be ``accommodated'' by the interaction term.}
%
\begin{minipage}[]{1\textwidth}
\begin{lstlisting}[language=R]
priorFE <- c(
# define skeptical prior for context effect on female speakers
prior(normal(0, 10), coef = contextpol)
)
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
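To get a feel for how skeptical a \texttt{normal(0, 10)} prior is, we can check how much probability mass it assigns to effects of various sizes. This is a quick back-of-the-envelope check, independent of any model fit:

```r
# about 95% of the prior mass for the context effect lies within +/- 20 Hz
pnorm(20, mean = 0, sd = 10) - pnorm(-20, mean = 0, sd = 10)  # roughly 0.95

# large negative context effects, say below -30 Hz, get very little prior mass
pnorm(-30, mean = 0, sd = 10)  # well below 0.01
```

In other words, this prior treats sizeable context effects as possible but quite implausible a priori; the data will have to work against this skepticism.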
We add this prior to the model as an argument, like so:
%
\marginnote{The argument `control' here allows us to tweak the sampling procedure in order to ensure convergence. \texttt{brms} will encourage you to do so whenever you run into such convergence issues.}
\begin{minipage}[]{1\textwidth}
\begin{lstlisting}[language=R]
model_FE_prior = brm(formula = pitch ~ gender * context,
prior = priorFE, # add prior
data = politedata,
control = list(adapt_delta = 0.99),
seed = 1702
)
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
We then calculate the probability for our hypotheses again.
\begin{minipage}[]{\textwidth}
\begin{rc}
> get_posterior_beliefs_about_hypotheses(model_FE_prior)
# A tibble: 3 x 2
hypothesis probability
<chr> <dbl>
1 Female-polite < Female-informal 0.951
2 Male-polite < Male-informal 0.834
3 Male-informal < Female-polite 1
\end{rc}
\end{minipage}
\vspace{-0.5cm}
\noindent We see that a very skeptical prior about H1 has decreased the posterior probability
of the hypothesis being true. But nonetheless, the data still suggest that H1's probability is
rather high. In this sense we can conclude that even if we were skeptical at the outset, the
data should nonetheless convince us that H1 is true.
\section{Model criticism}
So we now know how to extract relevant comparisons to quantify the evidence surrounding our hypotheses. We have also learned how to specify prior information. The next important step is checking whether the model really reflects the observed data. A common and easy way to do this is to use so-called \textbf{posterior predictive checks}. These checks generate new hypothetical data, sampled from the so-called \textbf{posterior predictive distribution}. This distribution quantifies which outcomes we would expect to see if the experiment were repeated in the same way (given our posterior beliefs about the parameters). We can compare samples from the posterior predictive distribution to the observed data. \texttt{brms} offers a neat function called \texttt{pp\_check()} for this. If the simulated data diverge systematically from the observed data, we should be concerned. It would suggest that the model does not capture some important properties of the data.
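Conceptually, a posterior predictive sample is generated by taking one posterior sample of the parameters and simulating a new data set from the likelihood at those values. A minimal hand-rolled sketch for a single design cell, using made-up posterior samples (\texttt{pp\_check()} does all of this properly for the full model):

```r
set.seed(1)
n_obs <- 20   # observations per simulated data set

# made-up posterior samples of a cell mean and the residual sd
post_mu    <- rnorm(100, mean = 261, sd = 8)
post_sigma <- abs(rnorm(100, mean = 36, sd = 3))

# one simulated ("replicated") data set per posterior sample
y_rep <- sapply(seq_along(post_mu), function(i)
  rnorm(n_obs, mean = post_mu[i], sd = post_sigma[i]))

dim(y_rep)  # n_obs rows (observations) x 100 columns (replications)
```

Overlaying the density of each column of \texttt{y\_rep} on the density of the observed data is exactly the kind of plot \texttt{pp\_check()} produces.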
To illustrate this point, let's run the model from above without the gender predictor.
\begin{minipage}[]{1\textwidth}
\begin{lstlisting}[language=R]
model_FE_noGender = brm(formula = pitch ~ context,
prior = priorFE,
data = politedata,
control = list(adapt_delta = 0.99),
seed = 1702
)
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
We know that a big chunk of variation is accounted for by gender, so this model should be a
poor fit. Figure~\ref{fig:PPC} shows the output of the following function call:
\begin{minipage}[]{1\textwidth}
\begin{lstlisting}[language=R]
pp_check(model_FE_noGender, nsample = 100)
\end{lstlisting}
\end{minipage}
\vspace{-0.5cm}
If we run \texttt{pp\_check()} on the model with and without gender (we also specify how many samples we want
to compare the observations to), we can see that a model without \texttt{gender} (left panel)
fails to capture an important property of the data: having pitch values from men and women
leads to a bimodal distribution of pitch values (i.e. two bumps).
\begin{figure}
\centering
\includegraphics[width = 1\textwidth]{pics/pp_checks_plot.pdf}
\caption{Output of the posterior predictive check for a model \textbf{without} the predictor \texttt{gender} in the left panel and a model \textbf{including} the predictor \texttt{gender} in the right panel. Red lines represent distribution of data, grey lines represent 100 posterior samples.}
\label{fig:PPC}
\end{figure}
The model without gender overestimates the probability of very low and high pitch values, underestimates the probability of values that surround the two bumps and heavily overestimates the probability of values around 200 Hz.
The earlier model, which takes \texttt{gender} into account, looks quite a bit better (right panel). While still showing much uncertainty surrounding the two bumps in the observed distribution, it clearly captures the data better.
\marginnote{The \texttt{pp\_check()} function can take additional arguments that allow you to display posterior predictive distributions for individual parameters.}
%\begin{figure}[]
% \centering
% \includegraphics[width = 0.8\textwidth]{pics/pp_check_FE.pdf}
% \caption{Output of the posterior predictive check for a model with the predictor \texttt{gender}. Orange line represents distribution of data, grey lines represent 100 posterior samples.}
% \label{fig:PPC_with_gender}
%\end{figure}
\section{Adding random effects}
In our experiment, we measured pitch multiple times for each subject (since they produced
multiple sentences). We also have multiple measures for each sentence (as each sentence was
produced by multiple speakers). The observant reader might have already noticed this, but it is
probably worth reiterating that linear regression models make the crucial assumption that all
samples (data points) are independent of each other. But if two data points are produced by the
same participant, then observations will not necessarily be independent anymore. We need to
inform the model about these dependencies between observations. The way we're going to handle
this is to add random effects to the model, just as we do in the frequentist framework. Random
effects are additional parameters that the Bayesian model estimates and that account for
dependencies between data points. The choice of the random effect structure is controversial.
This tutorial follows the approach advocated by \citet{barr2013random} to include the maximal
random effect structure justified by the design \citep[for a complementary view on random
effect specifications, see][]{matuschek2017balancing}. For our case here, we estimate how
individual sentences (\texttt{(1 | sentence)}) and individual subjects (\texttt{(1 | subject)})
differ in their overall pitch values (random intercepts). We also estimate how much pitch
values across sentences vary according to the context they appear in and according to the
gender of the speaker (as well as their interaction, i.e. \texttt{(0 + gender * context |
sentence)}). This so-called by-sentence random slope allows the interaction term to vary
between sentences. In other words, the impact of the predictors gender and context (and their