exercises_bayes.Rmd

---
title: "Bayesian Modeling Exercises"
---

## Bayesian Workflow

1. Develop a Data Generating Process (DGP)  
1. Generate _fake_ data from DGP 
1. Fit a model to the DGP 
1. Perform posterior checks
1. Sensitivity analysis if required 
1. Fit your real data! 
1. Make inferences on your real data 

## Specifying a Few Distributions

### The Binomial Trial

```{r binom-setup}
n <- 1000

gender <-rep(x = 0:1, length.out = n)

income <- rnorm(n, 0, 1)

mu <- gender * 1.5 + income * 2

```

The we make it random!

```{r binom-dat}
y <- rbinom(n,1, prob = plogis(mu))

dat <- data.frame(gender =gender, 
                  income = income, 
                  y = y)
```

Then we can create a model using `brms`

```{r create-model}
library(brms)
(model <- bf(y ~ gender + income))
```

We can treat `model` just like any other R object. Additionally, (and this is the handy part of doing it this way), we can see what priors are available with the `get_prior` function

```{r get-priors}
get_prior(model, dat, bernoulli())
```

We can then set our priors

```{r set-prior}
my_priors <- c(
  prior(normal(0, 0.5), class = "b", coef = "gender"),
  prior(normal(0, 0.5), class = "b", coef = "income"))
```

And fit our new model!

```{r binom-fit}
fit_binom <- brm(model, my_priors,
           data =  dat,
           family = bernoulli(), 
           inits = 1000, cores = 2, # These are important! 
           chains = 2, seed = 1234, refresh = 0) # Setting a seet is important for reprocibility
```

Other features `adapt_delta` and `max_treedepth` may need to be altered depending on convergence.

### Diagnostics

We check our trace plots

```{r trace-plot}
library(tidybayes)
library(bayesplot)
mcmc_trace(as.matrix(fit_binom))
```


And the centering of our parameters

```{r pairs-plot}
pairs(fit_binom)
```

And our course our values

```{r summary-fit}
summary(fit_binom)
```


### Time Series in `brms`

Let's make a simple time series data set

```{r ts-dat}
n <- 100

x1 <- rnorm(n)
x2 <- rnorm(n)

y <- vector(length = n)
y[1] <- 5*x1[1] + 2*x2[1]
for(i in 2:n){
  y[i] <-5*x1[i] + y[i-1] - 2*x2[i]
}

ts_dat <- data.frame(x1, x2, y)

```

Now we have to specify an additional term.

```{r}
library(brms)
fit_ar1 <- brm(y ~ x1 + x2,
           data =  ts_dat,
           autocor = cor_ar(p = 1), # Tell me more?
           inits = 1000, cores = 2, # These are important! 
           chains = 2, seed = 1234, refresh = 0) # Setting a seet is important for reprocibility
```

A PPC

```{r}
library(bayesplot)

color_scheme_set("darkgray")
pp_check(fit_ar1)

```


```{r}
summary(fit_ar1)
```

## Clinical Trial

Suppose we have some people in the experiment and we give them some kind of treatment. Then we measure the impact over time. See [here](https://michaeldewittjr.com/dewitt_blog/posts/2018-09-24-the-power-of-fake-data-simulations/) for more details.

```{r}
J <- 50  # number of people in the experiment
N_per_person <- 10 # number of measurements per person
person_id <- rep(1:J, rep(N_per_person, J))
index <- rep(1:N_per_person, J) 
time <- index - 1  # time of measurements, from 0 to 9
N <- length(person_id)
a <- rnorm(J, 0, 1)
b <- rnorm(J, 1, 1)
theta <- 1
sigma_y <- 1
z <- sample(rep(c(0,1), J/2), J)
y_pred <- a[person_id] + b[person_id]*time + theta*z[person_id]*time
y <- rnorm(N, y_pred, sigma_y)
z_full <- z[person_id]
exposure <- z_full*time
data_1 <- data.frame(time, person_id, exposure, y)
```


And then we can fit the data

```{r}
fit_1 <- brm(y ~ (1 + time | person_id) + time + exposure, data=data_1)
```

And see if we can recover our effect.

```{r}
summary(fit_1)
```