/
ex_08.Rmd
591 lines (438 loc) · 18.2 KB
/
ex_08.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
---
title: "Application to regression IV - Leveling up to hierarchical models"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Application to regression IV - Leveling up to hierarchical models}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
echo = TRUE,
eval = FALSE,
message = FALSE
)
```
```{r setup}
# Packages
library(learnB4SS)
library(modelr)
library(brms)
library(tidyverse)
library(tidybayes)
library(bayestestR)
# Data
data(polite)
```
# Walkthrough
## Introduction
Until now, we have only dealt with very simple models, most of which are just not particularly relevant for what we actually do in our research, right?
So in this section, we will level up.
We will add additional parameters to our regression in the form of fixed effects and random effect.
Sounds much more like the stuff you want to do, right?
We are going to estimate these parameters and specify appropriate priors for them.
After this session you will be much closer to being an operational Bayesian for your actual research.
## A simple model
Okay let's think about this. Here is the model that we have so far looked at alongside its priors:
```{r simpleModel_priors}
# specify priors
priors_simple <- c(
# prior for the Intercept (= the reference level)
prior(normal(0, 15), class = Intercept),
# prior for the fixed effect coefficient for polite attitude
prior(normal(0, 10), class = b, coef = attitudepol),
# prior for the residual variance
prior(cauchy(0, 1), class = sigma)
)
```
```{r simpleModel}
# specify model
simple_model <- brm(
articulation_rate ~ attitude,
data = polite,
prior = priors_simple,
family = gaussian
)
```
Brms uses a gaussian link function by default (we made that explicit above), i.e. we assume that the model residuals are normally distributed.
But is that really the case?
We have a measurement that is bound by 0, because there can't be negative values for articulation rate, right?
It is likely that articulate rate data are skewed to the right, because values can vary more toward higher values than toward lower values.
Brms allows us to critically evaluate our model fit by comparing the model using so-called posterior predictive checks.
Posterior predictive checks are, in simple words, "simulating replicated data under the fitted model and then comparing these to the observed data" ([Gelman & Hill 2007: 158](http://www.stat.columbia.edu/~gelman/arm/)).
So, you use posterior predictive checks to investigate whether there are systematic discrepancies between the observed and simulated data.
```{r pp_check}
# posterior predictive check
pp_check(simple_model, nsamples = 100)
```
Looking at the plot, you see a thick dark blue line which is the data, and bunch of light blue lines which are simulated data based on your model.
The model assumes a normal distribution, so the posterior draws are much more symmetric than the actual data, thus we notice a discrepancy between model and data.
These situations are common for measurements that are bound by zero (e.g. duration or response latency). To account for this, we can use a different link function (specified with the `family` argument).
A more appropriate model would use a log normal distribution for articulation rate.
When we change the family, the model will fit the measurement in log space, so we will have to change the priors. Let's throw in some numbers and run a prior_predictive check, i.e. running the model with only the prior information (`sample_prior = "only"`). We also need to specify the `lognormal` family when running the model.
```{r simpleModel_priors_log}
# specify priors in log space
priors_simple_log <- c(
# prior for the Intercept (= the reference level)
prior(normal(0, 2), class = Intercept),
# prior for the fixed effect coefficient for polite attitude
prior(normal(0, .25), class = b, coef = attitudepol),
# prior for the residual variance
prior(cauchy(0, 0.1), class = sigma)
)
```
```{r simpleModel_priors_log_prediction}
# specify model
simple_model2_prior <- brm(
articulation_rate ~
attitude,
data = polite,
prior = priors_simple_log,
# specify that the model should only estimate the posterior based on the information in the prior
sample_prior = "only",
# specify lognormal family function
family = lognormal
)
```
```{r simpleModel_priors_log_plot}
# quick and dirty plot on the original scale
conditional_effects(simple_model2_prior)
```
Well, our priors are very weakly informative and cover more than what we consider a possible range for articulation rate. However it constraints the parameter space substantially already. Good enough.
Let's refit the model and specify the `lognormal` family.
```{r simpleModel_lognormal}
# specify model
simple_model2 <- brm(
articulation_rate ~
attitude,
data = polite,
prior = priors_simple_log,
family = lognormal
)
```
And check the model fit again:
```{r pp_check2}
# posterior predictive check
pp_check(simple_model2, nsamples = 100)
```
Much better, right?
So by critically evaluating our model assumptions and the actual data we discovered that
a `lognormal` distribution is a much better assumption about the underlying generative model (i.e. how the data came to be).
Fantastic!
Let's have a look at the results (remember that the results are now log transformed)
```{r simpleModel_post}
# Extract posterior coefficient and plot
simple_model2 %>%
spread_draws(b_attitudepol) %>%
# plot
ggplot(aes(x = b_attitudepol)) +
stat_halfeye() +
geom_vline(xintercept = 0, lty = "dashed") +
labs(x = "log(articulation rate)")
```
If you want to plot something more traditional, i.e. the posterior values for polite and informal speech productions, we can use some neat functions from other packages.
This way we can plot the data in log space and compare the two conditions immediately.
```{r traditional_plotting}
# Extract posterior and plot predicted values for both levels
polite %>%
data_grid(attitude) %>%
add_predicted_draws(simple_model2) %>%
# plot
ggplot(aes(x = .prediction, y = attitude, fill = attitude)) +
stat_halfeye() +
scale_fill_manual(values = c("#8970FF", "#FFA70B")) +
labs(x = "articulation rate")
```
## Adding predictors
This model estimates the effect of `attitude` on articulation rate (`articulation_rate`).
That.is.it.
But what if we also want to estimate the differences between attitude across different experimental tasks?
Let's add `task` as a predictor and add a prior for it as well.
Let's stick to weakly informative priors again, centered on zero (we expect no difference of articulation rate between different tasks and acknowledge the possibility of some variance around that value).
```{r multiple_model_get_prior}
# get prior
get_prior(
formula = articulation_rate ~ attitude + task,
data = polite, family = lognormal
)
```
```{r multiple_model_priors}
# specify priors
priors_multiple <- c(
prior(normal(0, 2), class = Intercept),
prior(normal(0, .25), class = b, coef = attitudepol),
prior(normal(0, .25), class = b, coef = tasknot),
prior(cauchy(0, .1), class = sigma)
)
```
```{r multiple_model_run}
# tun model
multiple_model <- brm(
articulation_rate ~ attitude + task,
data = polite,
prior = priors_multiple,
family = lognormal)
summary(multiple_model)
```
Looking at the summary of the model, we get a table with two coefficients for the population parameters `attitudepol` and `tasknot`.
Let's remind ourselves that these reflect the change in articulation rate from the reference level (attitude = informal (`inf`), task = dialog completion task (`dct`)) to attitude = polite (`attitudepol`) and to task = note (`tasknot`), respectively.
We can inspect posteriors for both predictors just like we did before:
```{r multiple_model_post}
# Extract posterior coefficient attitudepol
post_multiple_attitude <- multiple_model %>%
spread_draws(b_attitudepol, b_tasknot)
ggplot(post_multiple_attitude) +
stat_halfeye(aes(x = b_attitudepol)) +
labs(title = "posterior for effect of attitude",
x = "log(articulation rate)") +
geom_vline(xintercept = 0, lty = "dashed")
# Extract posterior coefficient tasknot
ggplot(post_multiple_attitude) +
stat_halfeye(aes(x = b_tasknot)) +
labs(title = "posterior for effect of task",
x = "log(articulation rate)") +
geom_vline(xintercept = 0, lty = "dashed")
```
Both posterior distributions are very clearly located to the left of zero, suggesting that there is both attitude and task affect articulation-rate to some extend (given the model, the data, and the prior assumptions)
## Adding random effects
So running Bayesian regression models is really not much different from running frequentist regression models.
Practically, the only difference so far is that we specify prior knowledge.
Conceptually, we obviously interpret the output differently in terms of the inference we draw, but this should be business as usual for you.
Now, we probably all suspect that this model is not quite appropriate for the data, right? It does not take into account the fact that data points are not independent from each other.
There were multiple speakers that produced multiple data points, so observations from these speakers are not independent and we need to take that into account for robust inference.
So let's also add an appropriate random effect structure, including a by-subject random intercept as well as a by-subject random slope for attitude.
We can write down the code to run this model easily, because we should be familiar with the `lme4` syntax.
`brms` syntax handles random effect structure exactly like `lme4` like so:
```{r complexModel_formula}
# model specification
complex_model_formula =
articulation_rate ~ attitude + task +
# add random intercept
(1 | subject) +
# add random slope
(0 + attitude | subject)
```
Looks familiar?
But what about priors?
Do we need to specify priors for these new elements as well?
And if so how?
Generally, we encourage you to specify priors for all elements of your model, so let us try this as well.
```{r complexModel_get_priors}
# get prior
get_prior(
articulation_rate ~ attitude + task +
(1 | subject) +
(0 + attitude | subject),
data = polite,
family = lognormal
)
```
We need to specify four parameters for the Group-Level Effects.
Three of them are variance components that we will specify with half-cauchy distributions, just like we did for the residual variance.
The final parameter is a correlation coefficient of the group level effects and will be specified using the Lewandowski-Kurowicka-Joe prior (LKJ).
```{r complexModel_priors}
# specify priors
priors_complex <- c(
prior(normal(0, 2), class = Intercept),
prior(normal(0, 0.25), class = b, coef = attitudepol),
prior(normal(0, 0.25), class = b, coef = tasknot),
# specify weakly informative prior for the random effects (slopes)
prior(cauchy(0, 0.1), class = sd, coef = Intercept, group = subject),
prior(cauchy(0, 0.1), class = sd, coef = attitudeinf, group = subject),
prior(cauchy(0, 0.1), class = sd, coef = attitudepol, group = subject),
# specify weakly informative prior for the correlation between random intercept and slope
prior(lkj(2), class = cor, group = subject),
prior(cauchy(0, 0.1), class = sigma)
)
```
That is it. Now we can run the model. And we will add one new argument to the `brm()` function: The `seed` argument allows us to set a random seed. The sampling procedure has a random initial state, so if you run this model on your machine, you will get slightly different results from someone else. With `seed` we can fix this initial state and obtain the exact same results. This is great, because we can make our results fully reproducible that way.
```{r complexModel}
# Run model
complex_model <- brm(
articulation_rate ~ attitude + task +
# add random intercept
(1 | subject) +
# add random slope
(0 + attitude | subject),
data = polite,
prior = priors_complex,
family = lognormal,
# set seed
seed = 999
)
summary(complex_model)
```
Let us walk through the summary here.
We now have one part of the table called "Group-Level Effects".
This part of the table gives us the models estimates for four parameters:
How much subjects' baseline varies (`sd(Intercept)`), how much they vary in the informal condition (`sd(attitudeinf)`), how much subjects vary in the polite condition (`sd(attitudepol)`), and what the correlation between `sd(attitudeinf)` and `sd(attitudepol)` is.
As before, for all of these parameters, we receive an `Estimate` which is the mean of the posterior distribution and a range of plausible values within the 95% Credible Interval (`l-95% CI` - `u-95% CI`). Except for the correlation, the estimates are bound by 0, i.e. the variance can only be positive.
The correlation coefficient can vary between -1 and 1.
The second part of the summary table, i.e. the Population-Level Effects, did not change and summarizes our regression coefficients for the fixed effects.
Just like before.
There we go. We have run a linear mixed effects model within the Bayesian framework.
We specified priors for both random and fixed effects.
What is left is interpreting the output.
```{r complexModel_post}
# Extract posterior coefficient politeness
post_complex <- complex_model %>%
spread_draws(b_attitudepol, b_tasknot)
# plot
ggplot(post_complex) +
stat_halfeye(aes(x = b_attitudepol))
labs(x = "log(articulation rate)") +
geom_vline(xintercept = 0, lty = "dashed")
```
## Making inference
Now let's use these posteriors to draw probability inferences.
What is the 95% credible interval for the `attitudepol` coefficient and what is its probability of direction?
```{r p_direction}
# extract 95% HDI
hdi(post_complex$b_attitudepol)
cat("\n")
# extract probability of direction
p_direction(post_complex$b_attitudepol)
```
Fantastic! So our best estimate of the effect of attitude has a 95% CrI between [-0.12, -0.01] and a 0.98 probability of being negative.
Although the majority of plausible values are negative, positive values are still plausible (given the data, the model, and the priors).
# Exercise
Now it's your turn: Our goal is it to run the following model: `f0mn ~ attitude + (1 | subject) + (0 + attitude | subject)`
(a) First, see which priors we need to specify using the `get_prior()` function.
(If you are unsure, see our code examples above)
```{r get_prior_ex}
# get prior
get_prior(
formula = ...,
data = ...
)
```
(b) Now specify weakly informative priors for all parameters.
For the intercept, chose a normal prior with mean = 0 and sd = 500.
For simplicity sake, let us just go for normally distributed priors for all other parameters (mean = 0, sd = 100)
```{r get_prior_ex_1}
# specify priors
priors_ex <- c(
...
)
```
(c) Now run the model with your specified priors.
Use `seed = 1111` in order for all of you to get the same results.
```{r get_prior_ex_2}
# Run model
ex_model <- brm(
formular = ...,
data = ...,
prior = ...,
seed = 1111
)
```
(d) Extract the posteriors for the coefficient attitudepol and plot it.
```{r ex_model_post}
# Extract posterior coefficient politeness
ex_post <- ex_model %>%
spread_draws(...)
# plot
ggplot(...) +
...
```
(e) Now let's use the posteriors to draw probability inferences.
What is the 95% credible interval for the `attitudepol` coefficient and what is its probability of direction?
```{r ex_inference}
# extract 95% HDI
hdi(...)
# extract probability of direction
p_direction(...)
```
(f) But hold on. Is the model fit okay? Critically check the model fit with `pp_check()`. Do you think the model and the priors capture the generative process appropriately? What might be the issue? Can you fix it?
<br>
<!-- Solutions -->
<!-- (a)
get_prior(
formula =
f0mn ~
attitude +
(1 | subject) +
(0 + attitude | subject),
data = polite
)
-->
<!-- (b)
priors_ex <- c(
prior(normal(0, 500), class = Intercept),
prior(normal(0, 100), class = b, coef = attitudepol),
# specify weakly informative prior for the random effects (slopes)
prior(normal(0, 100), class = sd, coef = Intercept, group = subject),
prior(normal(0, 100), class = sd, coef = attitudeinf, group = subject),
prior(normal(0, 100), class = sd, coef = attitudepol, group = subject),
# specify weakly informative prior for the correlation between random intercept and slope
prior(lkj(2), class = cor, group = subject),
prior(normal(0, 100), class = sigma)
)
-->
<!-- (c)
ex_model <- brm(
f0mn ~
attitude +
(1 | subject) +
(0 + attitude | subject),
data = polite,
prior = priors_ex,
seed = 1111,
)
-->
<!-- (d)
ex_post <- ex_model %>%
spread_draws(b_attitudepol)
ggplot(ex_post) +
stat_halfeye(aes(x = b_attitudepol)) +
labs(x = "fundamental frequency in Hz")
-->
<!-- (e)
hdi(ex_post$b_attitudepol)
p_direction(ex_post$b_attitudepol)
-->
<!-- (f)
pp_check(ex_model, nsamples = 100)
Doesn't capture bimodality well and over estimates the probability of value between 0 and 50.
One reason for this is that f0 is also bound to zero and very low values are virtually impossible.
Again, a lognormal link function is probably more appropriate.
priors_ex_log <- c(
prior(normal(0, 6), class = Intercept),
prior(normal(0, 3), class = b, coef = attitudepol),
# specify weakly informative prior for the random effects (slopes)
prior(cauchy(0, .1), class = sd, coef = Intercept, group = subject),
prior(cauchy(0, .1), class = sd, coef = attitudeinf, group = subject),
prior(cauchy(0, .1), class = sd, coef = attitudepol, group = subject),
# specify weakly informative prior for the correlation between random intercept and slope
prior(lkj(2), class = cor, group = subject),
prior(cauchy(0, .1), class = sigma)
)
Run model for priors only
ex_model_log_priors <- brm(
f0mn ~
attitude +
(1 | subject) +
(0 + attitude | subject),
data = polite,
sample_prior = "only",
family = lognormal,
prior = priors_ex_log,
seed = 1111
)
conditional_effects(ex_model_log_priors)
Run model
ex_model_log <- brm(
f0mn ~
attitude +
(1 | subject) +
(0 + attitude | subject),
data = polite,
family = lognormal,
prior = priors_ex_log,
seed = 1111
)
pp_check(ex_model_log, nsamples = 100)
Much better!
-->