
question: simulate 'outcome' data via prior predictive #281

Closed · mikejacktzen opened this issue Oct 26, 2017 · 11 comments

@mikejacktzen

I was wondering whether it would be feasible to implement prior predictive sampling in brms?

The end-goal use case is for users who want to simulate data from a complex, fully specified model with brms + Stan, as opposed to using observed data (e.g., an observed left-hand-side outcome) to simulate from the posterior of a model specified with brms + Stan.

For more motivating context: it would be used similarly to mgcv::gamSim() in the example from the brms vignette

https://cran.r-project.org/web/packages/brms/vignettes/brms_distreg.html

dat_smooth <- mgcv::gamSim(eg = 6, n = 200, scale = 2, verbose = FALSE)

I'd imagine you could do something like

dat_real = cbind(y, x)
dat_no_y = dat_real[, -1]
# pass outcome-free data in; 'prior_pred' is a hypothetical argument
brms::brm(bf(y ~ s(x)), data = dat_no_y, prior_pred = TRUE)

I think the rstanarm guys brainstormed something similar

https://groups.google.com/forum/#!msg/stan-users/5v7fuGmuqy8/bWHeO_n9BgAJ

https://www.rdocumentation.org/packages/rstanarm/versions/2.14.1/topics/stan_betareg

Here are slides for general reference:

http://personal.strath.ac.uk/gary.koop/handout_geweke.pdf

@mvuorre (Contributor) commented Oct 26, 2017

Try brm(..., sample_prior = "only").
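
For example, a minimal sketch with made-up data (note that brm() still needs an outcome column in 'data', even though its values won't inform the draws, and that proper priors are needed on the coefficients to sample from them):

library(brms)

# toy data: one predictor plus a placeholder outcome column
dat = data.frame(x = rnorm(100), y = rnorm(100))

# every draw comes from the priors; the 'likelihood' of y is ignored
fit_prior = brm(y ~ x, data = dat,
                prior = set_prior("normal(0, 5)", class = "b"),
                sample_prior = "only")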

@mikejacktzen (Author) commented Oct 26, 2017

That seems to work if the data you pass into brm() contain the observed outcome y.

But if the dataset passed to brm() does not have the outcome, it errors out, so it seems to be a chicken-and-egg problem.

The use case is that the dataset you pass into brm() should not have the outcome; using brm() to simulate the outcome is the whole point.

Edit:

This is definitely low priority. I guess a workaround is to just make some temporary outcome and attach it to the dataset that you pass into brm() to trick it.

But you have to make sure that sample_prior = "only" then ignores the 'likelihood' of the temporary outcome you attached:

dat_full = iris

out_yes_y = brms::brm(formula = Sepal.Length ~ -1 + .,
                      data = dat_full,
                      prior = set_prior("normal(0,5)"),
                      sample_prior = 'only',
                      chains = 1, iter = 100)
# runs and returns a fit

dat_no_y = dat_full[, -1]  # drop "Sepal.Length"

out_no_y = brms::brm(formula = Sepal.Length ~ -1 + .,
                     data = dat_no_y,
                     prior = set_prior("normal(0,5)"),
                     sample_prior = 'only',
                     chains = 1, iter = 100)
# Error: The following variables are missing in 'data':
# errors because the outcome 'Sepal.Length' must be present explicitly

@paul-buerkner (Owner) commented Oct 26, 2017 via email

@mikejacktzen (Author) commented Oct 26, 2017

Yeah, I snuck in an edit before your response, where I suspected that attaching a fake outcome to the passed-in dataset would be a workaround.

brm(bf(y_fake ~ .), data = cbind(y_fake, x_real), sample_prior = "only")

Is it true that sample_prior = "only" will then ignore the likelihood contribution of y_fake?
If so, I think this would achieve the goal of 'forward simulation': getting y_sim from the prior predictive.
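
One way to check this empirically (a sketch, assuming a fit like fit_prior from the example above; posterior_samples() is the brms accessor for draws by parameter name):

# under sample_prior = "only", the draws of a coefficient should match
# its prior, regardless of the values in y_fake
b_draws = posterior_samples(fit_prior, pars = "b_x")
hist(b_draws[, 1])  # should resemble the normal(0, 5) prior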

@paul-buerkner (Owner) commented Oct 26, 2017 via email

@mikejacktzen changed the title from "feature request: simulate 'outcome' data via prior predictive" to "question: simulate 'outcome' data via prior predictive" on Oct 27, 2017
@mikejacktzen (Author)
I think the last question related to all of this is: how do you get outcome predictions?
So far, all the steps return draws of the right-hand-side parameters via forward simulation from the prior.

My assumption (and question: is it correct?) is that we can just use ?predict.brmsfit to get the left-hand-side outcome (which, under the hood, combines the right-hand-side terms).

The documentation and name say predict.brmsfit() is for posterior predictive draws. But I think if we used

predict(brm(bf(y_fake ~ .), data = cbind(y_fake, x_real), sample_prior = "only"))

this should reduce to (and return) the 'prior predictive' of the left-hand-side outcome, since the sample_prior = "only" option masked out the likelihood during the fitting stage.

If this is not the case, I think there may need to be a symmetric sample_prior = "only" argument in the predict step:

predict(..., sample_prior = "only")

But I have a feeling this is unnecessary.
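
For example, a sketch reusing fit_prior from the example above; if the first reading is right, this returns prior predictive draws of the outcome:

# with sample_prior = "only", predict() combines the covariates with
# parameter draws taken from the priors, i.e. the prior predictive
y_prior_pred = predict(fit_prior, summary = FALSE)
dim(y_prior_pred)  # draws x observations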

@paul-buerkner (Owner) commented Oct 27, 2017 via email

@mikejacktzen (Author)
That's great!

And in hindsight, the requirement that the outcome (fake or not) be part of the data passed into brm(data_with_y, sample_prior = "only") has positive consequences at the predict() stage: the program needs to know the 'structural form' of the outcome in order to assemble right-hand-side simulations into left-hand-side simulated outcomes.

@torkar commented May 28, 2018

Paul,

I'd like a clarification on this. Say I fit a model fit (with sample_prior = "only"). I would expect that when I then run ppc_dens_overlay(y = orig_y, yrep = posterior_predict(fit, draws = 25)), I'd get a plot of my original y together with 25 curves drawn from the model that disregards the likelihood. However, what I instead get is Error in validate_yrep(yrep, y) : NAs not allowed in 'yrep'.

Or did I completely misunderstand the discussion above, and am I only able to plot comparisons for each parameter individually, since no likelihood is being used?

What I really want to do is a sanity check (see the sketch below):

  1. Draw parameter values from priors
  2. Generate data based on those parameter values
  3. Fit model to generated data
  4. Check fit is reasonable

as per p. 12 of http://mc-stan.org/workshops/stancon2018_intro/Bayesian%20workflow.pdf
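
A minimal sketch of steps 1–4 in brms (the Gaussian model, priors, and variable names here are illustrative assumptions, not from this thread):

library(brms)

# step 0: predictors plus a placeholder outcome, so that brm() accepts the data
dat = data.frame(x = rnorm(100), y = rnorm(100))

# steps 1 + 2: draw parameters from the priors, then generate outcomes from them
fit_prior = brm(y ~ x, data = dat,
                prior = set_prior("normal(0, 5)", class = "b"),
                sample_prior = "only")
dat$y = posterior_predict(fit_prior)[1, ]  # one simulated dataset

# step 3: fit the same model to the generated data
fit = brm(y ~ x, data = dat,
          prior = set_prior("normal(0, 5)", class = "b"))

# step 4: check the fit against the data that generated it
summary(fit)
bayesplot::ppc_dens_overlay(y = dat$y,
                            yrep = posterior_predict(fit)[1:25, ])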

I hope this makes sense...

@paul-buerkner (Owner)

Your general workflow seems reasonable. We have to find out why your posterior_predict call contains NAs. Please open a new thread on http://discourse.mc-stan.org/ and provide more details about the particular model you are fitting.

@torkar commented May 28, 2018
