
Sampling new offsets from a model with random effects #191

Closed
tomwallis opened this issue Mar 2, 2017 · 36 comments

@tomwallis commented Mar 2, 2017

Hi Paul,

In Bayesian power analysis, one wishes to

  1. generate sample datasets from a hypothetical or real model fit (for example, a fit to pilot data),
  2. fit a new model to each sampled dataset, then
  3. assess the proportion of samples in which a goal is achieved

(See the second edition of Kruschke's book for more detail on the approach.) This lets us assess, for example, how often our goal is achieved given a sampling plan of N observations.

The mechanical way to do this, outlined by Kruschke in his book, is to generate new datasets from some number of MCMC samples (which represent jointly-credible model parameters). Each MCMC sample provides one simulated dataset.

In the context of a model with one or more random effects, I believe this would mean sampling new random-effects offsets from the model for a given number of samples from the random variable. It would be great if there were a way to do this easily from within brms.

As a more concrete example, consider a linear mixed effects model like:

y ~ x + (x | subj)

where x might be a covariate with some slope and subj is (for example) the index of each subject in an experiment. This is a "varying intercept, varying slopes" model with a random effect of subject on both fixed effects. We also allow the offsets to be correlated.

Imagine I fit this model in brms and now want to make predictions for new data. I can use the predict method to generate new y values for a new data frame, allowing new levels of subj:

prediction_data <- expand.grid(x = seq(-1, 1), subj = paste0("Sim", 1:10))

preds <- predict(fit, newdata = prediction_data, allow_new_levels = TRUE)

However, in this case the new subjects all receive the same prediction (you can verify this by looking at the fitted regression line). Differences in the predictions between subjects are due only to the measurement error of the model. That is, the predict method with new data gives a prediction for a "new, average subject".

If I understand correctly, then doing a Bayesian power analysis appropriately in this setting would require sampling new subjects from the model. That is, in the example above, each of the 10 subjects would be given offsets to the fixed effects sampled from a zero-mean two-dimensional Gaussian, taking the estimated marginal variance and correlation structure given by fit into account.

I can imagine doing this by hand, but it would be lovely to have a general function to do this in brms (possibly as an argument to the predict method, in which random effects are "sampled" rather than "zeroed").
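
To illustrate the by-hand approach for the model above (a rough sketch only; posterior_samples, MASS::mvrnorm, and the parameter names are my assumptions about one way to do it, and the names depend on the brms version):

library(MASS)  # for mvrnorm

# one MCMC sample = one jointly-credible set of group-level SDs and correlations
draws <- posterior_samples(fit, pars = c("^sd_subj", "^cor_subj"))
i <- sample(nrow(draws), 1)

sds <- c(draws[i, "sd_subj__Intercept"], draws[i, "sd_subj__x"])
rho <- draws[i, "cor_subj__Intercept__x"]
Sigma <- diag(sds) %*% matrix(c(1, rho, rho, 1), 2, 2) %*% diag(sds)

# zero-mean offsets (intercept, slope) for 10 new subjects
offsets <- mvrnorm(10, mu = c(0, 0), Sigma = Sigma)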

Or have I misunderstood what the same regression line for all random effects buys me?

@paul-buerkner (Owner) commented Mar 2, 2017

There are two ways of zeroing out random effects. The first is to plainly ignore them via re_formula = NA. The second is to define new levels and set allow_new_levels = TRUE. In this case, we include the variance of the random effects in the predictions, but the mean stays at the population mean. This is intended, as otherwise the same call to predict, fitted, or marginal_effects would look completely different depending on which values we drew from the posterior distribution of the random effects. Here is an example:

library(brms)
data("sleepstudy", package = "lme4")
fit <- brm(Reaction ~ Days + (Days | Subject), sleepstudy)
# exclude all random effects
marginal_effects(fit, re_formula = NA)
# include all random effects
marginal_effects(fit, re_formula = NULL)

So, by including the random effects in the predictions, we don't get predictions for a "new, average subject", but we include the uncertainty over subjects.

Now to your feature request: We might add an argument saying that new subjects should be sampled from the distribution of subjects (instead of including the uncertainty), which should allow doing the power analysis you have in mind. Any suggestions for the name of such an argument?

@tomwallis (Author) commented Mar 2, 2017

Thanks for clarifying the interpretation of random effects variance, and considering adding the feature.

As to an argument, what about

re_sample = FALSE

re_sample: A flag indicating whether to include random effects variance on predictions or to sample new levels. If FALSE (default), include random effects variance on predictions specified in re_formula. If TRUE, sample new random effects offsets from the posterior distribution. In this case, newdata must be provided, and it must contain at least one new level for the random effect. One sample from the random effect distribution is drawn for each new level of the random effect specified in newdata. This feature may be useful for conducting a Bayesian power analysis.
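
For instance, with the prediction_data from my first post, a call might look like this (re_sample is of course only the proposed, hypothetical argument name):

preds <- predict(fit, newdata = prediction_data, allow_new_levels = TRUE, re_sample = TRUE)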

@paul-buerkner (Owner) commented Mar 2, 2017

Sounds great!

@jgabry (Contributor) commented Mar 2, 2017

This is something we've been discussing doing for rstanarm too, so maybe it would make sense to make this a new function (instead of just a new argument)? That way we can put a generic in rstantools, brms and rstanarm can provide their own methods, and then from a user's perspective everything works the same regardless of which package they are using. @paul-buerkner What do you think?

@paul-buerkner (Owner) commented Mar 2, 2017

Hmm... the thing is that it fits nicely into the existing functions from both a conceptual and a coding perspective (I am nearly finished implementing this feature). What we could consider is a new function that internally calls predict with sample_new_levels (the new argument) fixed to TRUE, just as I do with posterior_predict, which is basically a restricted version of predict.

@paul-buerkner (Owner) commented Mar 2, 2017

In any case, I am happy to adopt the names of new methods you add in rstantools.

@paul-buerkner (Owner) commented Mar 2, 2017

Here we go! :-) In the new dev version on GitHub, predict etc. now support the sample_new_levels argument. You can see it in action by repeatedly calling (with the above model):

me <- marginal_effects(fit, re_formula = NULL, sample_new_levels = TRUE)
plot(me, plot = FALSE)[[1]] + ylim(100, 500)

paul-buerkner added a commit that referenced this issue Mar 2, 2017

@tomwallis (Author) commented Mar 3, 2017

Awesome as always, Paul. This works now for me.

One question: having a quick look through your commit code in extract_draws.R, you seem to be randomly sampling one of the existing levels of the random effect? Is that right, or am I reading that section (lines 405--427) incorrectly?

The behaviour I expected was to sample new levels from the (possibly multivariate) zero-mean Gaussian defining the random effects structure. That way, new levels could be observed in the simulated data, but in aggregate the data should have the same covariance structure as the estimated model.

@paul-buerkner (Owner) commented Mar 3, 2017

Yes, you are correct and this is something I wanted to discuss with you (and others who want to participate).

I think we have two options, and neither is ideal.

  1. We can sample from a multivariate Gaussian, as you suggested. The problem is that the multivariate Gaussian is merely the prior, in the sense that the shape of the random-effects posterior distribution might not end up being normal at all. Of course, we usually get something not too unreasonable given that we use the posterior standard deviations and correlations, but in my opinion we are not really sampling from the posterior.

  2. We can sample from the existing random effects. With enough random-effects levels, we can accurately capture the posterior distribution of the random effects even if it is not normal. However, for a small number of levels our sampling is quite limited, but so is sampling from the MV Gaussian, since with few levels we cannot get good estimates of the SDs and correlations.

I am grateful for any suggestions / comments on this and I am happy to adopt any solution that turns out to be most reasonable.

@paul-buerkner (Owner) commented Mar 3, 2017

Also, there might be another problem with the multivariate Gaussian approach. When we draw just one posterior sample for a new person, everything is fine, but how do we draw multiple samples for the same new person? We cannot just sample from the MV Gaussian again, as we would end up sampling another person. Maybe I am overlooking something, though.

@tomwallis (Author) commented Mar 3, 2017

Sampling from the MV Gaussian seems to be a better fit for the purposes I envision, but I can see the value of the other approach in some settings.

My use case is something like this: I've collected a small amount of pilot data, checking that the experimental apparatus and data pipeline are running acceptably. This pilot data might have between two and five levels of a subject random effect, which is of course not enough to get a good estimate of the covariance structure. However, the primary outcome I'm interested in almost always has to do with the fixed-effects coefficients. Therefore, I would like to add the assumption of an MV Gaussian to all the other assumptions that go into such a generative model for the purpose of power analysis, drawing new samples from the model to ask about goals (e.g. precision) for the fixed-effects estimates. You're right that we're then not really sampling from the true posterior, but from an approximation of it under the prior assumption that it is MV Gaussian in the limit. You can also see here the limit of the second approach for my use case: if I only have between two and five levels, resampling from only those cases is going to be very noisy. Both approaches will of course underestimate the true marginal variances, but this could be mitigated by a wide prior in the MV Gaussian case (but not in the resampling-existing-levels case).

Regarding your second point: taking the approach of Kruschke, each MCMC sample is used to generate one simulated dataset. That is, there is one value for each fixed effect coefficient, one value for the sd of the random effect, one covariance, etc. Then it's just a matter of drawing MV Gauss samples from a distribution with those properties, giving the offsets for the particular dataset in question (centred around the single fixed-effects estimate for that MCMC sample). The next dataset is simulated from the next MCMC sample, etc. Thus, the simulated datasets in aggregate estimate the posterior variability, but the noise in each simulated dataset comes only from (i) sampling random effects and (ii) measurement error.

So, for each unique level of subj in newdata, each simulated dataset is based on a single set of offsets for that subject (analogous to the BLUPs in maximum-likelihood mixed-effects models), who may then provide multiple observations, getting at within-person measurement error. In the next simulated dataset, the same subject level will get a different set of offsets.
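
To make that concrete, here is a rough sketch of the simulation loop I have in mind, using the sample_new_levels argument you just added (the goal check and the use of update() to refit on the simulated data are just illustrative assumptions):

n_sims <- 100
sim_data <- expand.grid(x = seq(-1, 1, length.out = 5),
                        subj = paste0("Sim", 1:10))
goal_met <- logical(n_sims)

for (s in seq_len(n_sims)) {
  # one MCMC sample -> one simulated dataset of new subjects
  y_sim <- predict(fit, newdata = sim_data, allow_new_levels = TRUE,
                   sample_new_levels = TRUE, nsamples = 1, summary = FALSE)
  sim_data$y <- as.vector(y_sim)
  # refit and check the goal, e.g. that the lower 2.5% quantile of the slope is above zero
  refit <- update(fit, newdata = sim_data)
  b_x <- posterior_samples(refit, pars = "^b_x$")[, 1]
  goal_met[s] <- quantile(b_x, 0.025) > 0
}
mean(goal_met)  # proportion of simulated datasets in which the goal is achieved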

@paul-buerkner (Owner) commented Mar 3, 2017

Ok, so for your use case we only need to draw one posterior sample per (exactly the same) subject anyway, right? Indeed, in this case the MV Gaussian approach seems to be the better option. This approach will, however, not work with multiple samples, since in that case it would result in a similar inclusion of uncertainty as sample_new_levels = FALSE, instead of sampling consistently for one new subject.

@jgabry Do you have any thoughts on this?

@tomwallis (Author) commented Mar 3, 2017

By

only need to draw one posterior sample per (exactly the same) subject anyway

I take it you mean "per simulated dataset", and that there could be multiple observations / replicates coming from the same subject within a dataset, right?

Imagine the model is

y_ij = (alpha + a_j) + (beta + b_j)x_i + epsilon

for observation i of subject j. a and b are the random-effects offsets for the two fixed effects: intercept alpha and slope beta. For one simulated dataset (corresponding to one MCMC sample), we sample j = 1 : N subject offsets (a and b) from the multivariate Gaussian (whose covariance matrix or variances are given by the generating model), then generate i responses for each subject using those offsets. For a new simulated dataset, take a new MCMC sample and repeat (now with a different variance/covariance structure and fixed effects alpha and beta, determined by that posterior sample).

This is what makes sense to me for the use case of a Bayesian power analysis.

@paul-buerkner (Owner) commented Mar 3, 2017

Yeah, that is what I meant. To simulate the response for a single new data set we would go for, say,

predict(fit, newdata = data.frame(subject = c(1, 1, 1, 2, 2, 2), <more data>), nsamples = 1, 
        sample_new_levels = TRUE, summary = FALSE)

What about changing the sample_new_levels argument to accept three options (names subject to debate, of course)? Example calls are sketched after the list.

  • "uncertainty" (existing default)
  • "gaussian" (sample from the MV gaussian)
  • "old_levels" (sample from the old levels as discussed above)
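
For illustration, calls using the three options might look like this (a sketch; nd stands for a newdata frame containing the predictors and new group levels):

# include new-level uncertainty as before (proposed default)
predict(fit, newdata = nd, allow_new_levels = TRUE, sample_new_levels = "uncertainty")
# draw new-level offsets from the multivariate normal
predict(fit, newdata = nd, allow_new_levels = TRUE, sample_new_levels = "gaussian")
# reuse offsets of randomly chosen existing levels
predict(fit, newdata = nd, allow_new_levels = TRUE, sample_new_levels = "old_levels")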

@tomwallis (Author) commented Mar 3, 2017

Yes, something like those three argument options sounds like a workable solution.

Am I right that in a case like

sample_new_levels = "gaussian", nsamples = 10

I would effectively get 10 independent simulated datasets? That is, there would be ten MCMC chains sampled, each one leading to a different MV Gauss sample?

@paul-buerkner (Owner) commented Mar 3, 2017

You are right that we would get 10 independent simulated datasets. I wouldn't use the term "MCMC chains" here, though, since we are not using MCMC methods to obtain these samples; we just (independently) sample random effects from a multivariate normal distribution.

I will work on getting the "gaussian" option implemented. It is actually less trivial than it sounds, given the complexity of the models supported by brms, but I believe I have some code already lying around somewhere.

@tomwallis (Author) commented Mar 3, 2017

Sorry, yes, that should have been "MCMC samples" rather than "chains".

I can imagine it's not trivial. Thanks for your work!

@tomwallis (Author) commented Mar 3, 2017

Clarification of this last point, since I think there's a slight confusion:

My understanding is that if I were to write

predict(fit, newdata = data.frame(subject = c(1, 1, 1, 2, 2, 2)), nsamples = 1, 
        sample_new_levels = TRUE, summary = FALSE)

I would get two new subjects, whose offsets are sampled from the same MCMC sample. That is, the MCMC sample defines a jointly-credible set of fixed effects, random effect SDs and covariances. For nsamples = 1, the two subject offsets are sampled from the same MVG, and I then observe three trials / replicates for each subject.

If I were to write:

predict(fit, newdata = data.frame(subject = c(1, 1, 1, 2, 2, 2)), nsamples = 10, 
        sample_new_levels = TRUE, summary = FALSE)

I should get ten independent datasets, each consisting of two subjects with three observations each, where each dataset reflects one MCMC sample.

Is that what you meant?

@paul-buerkner (Owner) commented Mar 3, 2017

That is exactly what I meant.

@jgabry (Contributor) commented Mar 3, 2017

I actually just recently had a conversation about this stuff. We (Andrew, Ben, and I) were asked about this by some people from the real world (i.e., not just academics but people who make consequential decisions!) who are interested in how to formalize something along the lines of Bayesian "power analysis". I think we're going to write up something formal (or at least some case studies) about this soon. Will definitely keep you posted.

Anyway, here's a brief summary of my thinking on this issue (which is admittedly still rather fuzzy):

  • I think we are (or at least should be) talking about something like "research design decision analysis" rather than "power analysis". At least, I don't think we should be trying to adapt the classical "power analysis" methods to a Bayesian context (or plug estimates/simulations from Bayesian models into a standard power-analysis workflow).

  • We should be able to do much better by shifting our attention to something along the lines of Gelman and Carlin's type S, type M framework, in particular type S (sign) error.

  • Ultimately the purpose of this whole process will typically (or really always) be to make a decision (or various decisions) about research design. So the kinds of simulations already discussed should just be an intermediate step in a decision analysis. The decision analysis part is a bit more complicated in that it requires that the user have some sort of loss function, but if the user can't formulate a loss function (or can't provide enough info for an approximate loss function to be suggested/provided/etc.), then I would argue that they will not be able to come to the right conclusions themselves just using the results of the simulations anyway.

  • As you already know, this is simpler to think about when starting from scratch instead of starting from an already existing pilot study. In the former case, there are several options. Coefficients can either be specified by the user or be drawn from the prior. Those will have different interpretations, but both can be useful and both are pretty straightforward. In the case where there's a posterior distribution from a model fit to data from a pilot study, you run into the issue you've already been discussing. Personally I don't love either of the options that have been suggested, for the reasons you've already mentioned (although, as you say, in some cases -- e.g. many observations of many levels of the grouping variable -- it could be fine). I haven't thought this through much yet (so this might not make much sense), but I wonder if a better version of the MVN approach would be to approximate the full posterior (not just the RE stuff) by an MVN (on the unconstrained space), draw from that MVN, and then just use the margins you need. In some cases this certainly wouldn't give any real benefit compared to just doing it for the group-level stuff, but in other cases maybe it really would. Ben and I are actually already thinking about doing some research on approximating the full posterior with an MVN on the unconstrained space in order to be able to do Bayesian updating (i.e., to use an existing posterior as a prior when fitting a model after collecting additional data), which is currently not easy to do with Stan. For many Stan models it's certainly not clear that an MVN approximation would be good, but for rstanarm models and (most) brms models this MVN approximation to the posterior should often be OK, or at least not horrible.

Anyway, I'm not sure that helped clarify anything at all ;). We're going to work on developing a workflow for this sort of research design decision analysis and hopefully provide some useful tools for carrying it out. Whatever those tools end up being (it might be an R package), they will definitely be general enough to integrate with models fit using brms.

@jgabry (Contributor) commented Mar 3, 2017

@tomwallis As I alluded to in my most recent response, I would say that this list

In Bayesian power analysis, one wishes to

  1. generate sample datasets from a hypothetical or real model fit (for example, a fit to pilot data),
  2. fit a new model to each sampled dataset, then
  3. assess the proportion of samples in which a goal is achieved

omits the most important part: the decision! I think to do this well, it is important to always be working backwards from the particular type of decision that ultimately needs to be made. It's not always the case that step 3 gives you the best answer; it depends on the decision task and your utility/loss function. Although maybe you're including all of that in "assess"? Either way, I'm not trying to be critical; I just like to emphasize the role of decision analysis because it's super important and, I think, often overlooked.

@tomwallis (Author) commented Mar 4, 2017

Hi @jgabry, thanks for all your interesting points. A few quick responses:

  • In the "power analysis" framework I briefly outlined (taken from Kruschke's book, 2nd edition), the "goal" can be anything (not just the equivalent of a "p-value" or false-positive rate). In my case for example, I want to answer the question "Assuming the pilot data and my priors are reasonable approximations to the world, and the fixed-effect I'm interested in is indeed positive, how many subjects will I need to collect for the effect to be reliably positive (according to some criteria) in 95% of the simulations?". In my understanding this is the same as the type-S error rate that Gelman and Carlin discuss. I could put a full decision-theoretic loss function on top of that to get the best answer given my weights for "true sign" vs "false sign" but this seems like overkill to me (at least for my time frame for planning this experiment). I'm really just interested in a ballpark figure for a "high powered" study. Is it 5 subjects? 10? 50?

  • I'm very excited about the idea that Bayesian updating might get much easier soon. Fits neatly into ideas like sequential testing, which I want to look into more in the future.

  • A full package to aid Bayesian decision analysis would be really amazing. It's an exciting time to be an experimentalist who can piggy-back on all the good work you guys (Paul, and the Stan team) are doing.

paul-buerkner added a commit that referenced this issue Mar 5, 2017

@paul-buerkner (Owner) commented Mar 5, 2017

@tomwallis Setting sample_new_levels = "gaussian" should now work as discussed.
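
For a quick visual check with the sleepstudy model from above, something like the following should work (mirroring the earlier marginal_effects call, now with the new option):

me <- marginal_effects(fit, re_formula = NULL, sample_new_levels = "gaussian")
plot(me, plot = FALSE)[[1]] + ylim(100, 500)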

@jgabry (Contributor) commented Mar 5, 2017

@tomwallis thanks for the comments. What you describe seems totally reasonable and a useful thing to do.

Here's a simple example of what I was thinking about (as a result of some recent conversations on the topic), but it could be generalized pretty broadly. Suppose you have to decide between two possible designs that have different costs (monetary, political, whatever goes into the utility/loss function). For instance, maybe the decision is whether to randomize at the hospital/school/etc. level or at the patient/student/etc. level. We can set up a decision that assumes the designs have the same "power" or type S error (which probably means factoring in that they require different numbers of participants at the different levels to achieve that same "power"/error) so that we can just focus on the expected difference in the utility/loss of the decisions. In the end this gives us a principled way to make the decision and lets us talk about the type S error probability for the decision (i.e., for the estimated difference between the designs).

@tomwallis (Author) commented Mar 6, 2017

@paul-buerkner thanks so much. I will test it this afternoon.

@jgabry interesting problem. I'm looking forward to reading more from the Stan / Gelman team on these issues.

@tomwallis (Author) commented Mar 6, 2017

I just started playing with the sample_new_levels = "gaussian" argument and I get the following error:

Error in `*tmp*`[[i]] : subscript out of bounds

Here's the traceback:

15.
extract_draws_re(new_ranef, args, sdata = draws$data, nlpar = nlpar, 
    sample_new_levels = sample_new_levels) 
14.
extract_draws.btl(fit = structure(list(formula = structure(list(
    formula = correct ~ model * var5_norm + (model * var5_norm | 
        subj), family = structure(list(family = "bernoulli", 
        link = "logit"), .Names = c("family", "link"), class = c("brmsfamily",  ... 
13.
(function (x, ...) 
{
    UseMethod("extract_draws")
})(fit = structure(list(formula = structure(list(formula = correct ~  ... 
12.
do.call(extract_draws, c(args, more_args)) 
11.
extract_draws.brmsfit(x = structure(list(formula = structure(list(
    formula = correct ~ model * var5_norm + (model * var5_norm | 
        subj), family = structure(list(family = "bernoulli", 
        link = "logit"), .Names = c("family", "link"), class = c("brmsfamily",  ... 
10.
(function (x, ...) 
{
    UseMethod("extract_draws")
})(x = structure(list(formula = structure(list(formula = correct ~  ... 
9.
do.call(extract_draws, draws_args) 
8.
predict.brmsfit(fit, newdata = X, allow_new_levels = TRUE, sample_new_levels = "gaussian", 
    summary = FALSE, nsamples = n_sims) 

I can work on a minimal example if needed.

@paul-buerkner (Owner) commented Mar 6, 2017

A reproducible example would be great, as I don't see this error with my own models.

@paul-buerkner (Owner) commented Mar 6, 2017

I messed up index names and hopefully it is fixed now. Can you try it again?

@tomwallis (Author) commented Mar 7, 2017

Thanks Paul, that seems to be working now. Hope the data I sent helped and that you didn't have to dig through that traceback.

@paul-buerkner (Owner) commented Mar 7, 2017

It helped, thanks for the data :-) I actually didn't look at the traceback, as the error message was not very helpful anyway; it just told me that there was a problem with the assignment of a subset somewhere in the function. Debug mode is much more efficient for finding problems.

paul-buerkner added this to the v1.5.1++ milestone Mar 7, 2017

@paul-buerkner (Owner) commented Mar 26, 2017

@tomwallis Anything else we should consider before closing this issue?

@tomwallis (Author) commented Mar 27, 2017

Nothing from my side. It seems to be working (but I haven't tested extensively). Perhaps it's worth implementing a simple test that the sampled offsets have roughly the correct SD? Is this easy to do on the brms tests side?
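
Something along these lines is what I have in mind, using the sleepstudy model from earlier in this thread (a rough sketch with assumed parameter naming, not actual brms test code):

# the per-draw spread of new-level fitted values (at Days = 0) should track
# the posterior draws of the group-level intercept SD
new_subj <- data.frame(Days = 0, Subject = paste0("new", 1:200))
eta <- fitted(fit, newdata = new_subj, allow_new_levels = TRUE,
              sample_new_levels = "gaussian", summary = FALSE, scale = "linear")
summary(apply(eta, 1, sd))
summary(posterior_samples(fit, pars = "^sd_Subject__Intercept$")[, 1])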

@paul-buerkner (Owner) commented Mar 27, 2017

Sounds reasonable. I will add such a test.
