Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
179 lines (124 sloc) 3.53 KB
---
title: "Bayesian Modeling Exercises"
---
## Bayesian Workflow
1. Develop a Data Generating Process (DGP)
1. Generate _fake_ data from DGP
1. Fit a model to the DGP
1. Perform posterior checks
1. Sensitivity analysis if required
1. Fit your real data!
1. Make inferences on your real data
## Specifying a Few Distributions
### The Binomial Trial
```{r binom-setup}
n <- 1000
gender <-rep(x = 0:1, length.out = n)
income <- rnorm(n, 0, 1)
mu <- gender * 1.5 + income * 2
```
The we make it random!
```{r binom-dat}
y <- rbinom(n,1, prob = plogis(mu))
dat <- data.frame(gender =gender,
income = income,
y = y)
```
Then we can create a model using `brms`
```{r create-model}
library(brms)
(model <- bf(y ~ gender + income))
```
We can treat `model` just like any other R object. Additionally, (and this is the handy part of doing it this way), we can see what priors are available with the `get_prior` function
```{r get-priors}
get_prior(model, dat, bernoulli())
```
We can then set our priors
```{r set-prior}
my_priors <- c(
prior(normal(0, 0.5), class = "b", coef = "gender"),
prior(normal(0, 0.5), class = "b", coef = "income"))
```
And fit our new model!
```{r binom-fit}
fit_binom <- brm(model, my_priors,
data = dat,
family = bernoulli(),
inits = 1000, cores = 2, # These are important!
chains = 2, seed = 1234, refresh = 0) # Setting a seet is important for reprocibility
```
Other features `adapt_delta` and `max_treedepth` may need to be altered depending on convergence.
### Diagnostics
We check our trace plots
```{r trace-plot}
library(tidybayes)
library(bayesplot)
mcmc_trace(as.matrix(fit_binom))
```
And the centering of our parameters
```{r pairs-plot}
pairs(fit_binom)
```
And our course our values
```{r summary-fit}
summary(fit_binom)
```
### Time Series in `brms`
Let's make a simple time series data set
```{r ts-dat}
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- vector(length = n)
y[1] <- 5*x1[1] + 2*x2[1]
for(i in 2:n){
y[i] <-5*x1[i] + y[i-1] - 2*x2[i]
}
ts_dat <- data.frame(x1, x2, y)
```
Now we have to specify an additional term.
```{r}
library(brms)
fit_ar1 <- brm(y ~ x1 + x2,
data = ts_dat,
autocor = cor_ar(p = 1), # Tell me more?
inits = 1000, cores = 2, # These are important!
chains = 2, seed = 1234, refresh = 0) # Setting a seet is important for reprocibility
```
A PPC
```{r}
library(bayesplot)
color_scheme_set("darkgray")
pp_check(fit_ar1)
```
```{r}
summary(fit_ar1)
```
## Clinical Trial
Suppose we have some people in the experiment and we give them some kind of treatment. Then we measure the impact over time. See [here](https://michaeldewittjr.com/dewitt_blog/posts/2018-09-24-the-power-of-fake-data-simulations/) for more details.
```{r}
J <- 50 # number of people in the experiment
N_per_person <- 10 # number of measurements per person
person_id <- rep(1:J, rep(N_per_person, J))
index <- rep(1:N_per_person, J)
time <- index - 1 # time of measurements, from 0 to 9
N <- length(person_id)
a <- rnorm(J, 0, 1)
b <- rnorm(J, 1, 1)
theta <- 1
sigma_y <- 1
z <- sample(rep(c(0,1), J/2), J)
y_pred <- a[person_id] + b[person_id]*time + theta*z[person_id]*time
y <- rnorm(N, y_pred, sigma_y)
z_full <- z[person_id]
exposure <- z_full*time
data_1 <- data.frame(time, person_id, exposure, y)
```
And then we can fit the data
```{r}
fit_1 <- brm(y ~ (1 + time | person_id) + time + exposure, data=data_1)
```
And see if we can recover our effect.
```{r}
summary(fit_1)
```
You can’t perform that action at this time.