
question: simulate 'outcome' data via prior predictive #281

Closed · mikejacktzen opened this issue Oct 26, 2017 · 11 comments

@mikejacktzen

I was wondering whether it would be feasible to implement prior predictive sampling in brms?

The end-goal use case is for users who want to simulate data from a complex, fully specified model with brms + Stan, as opposed to using observed data (e.g., an observed left-hand-side outcome) to simulate from the posterior of a model specified with brms + Stan.

For more motivating context: it would be used similarly to mgcv::gamSim() in the example from the brms vignette

https://cran.r-project.org/web/packages/brms/vignettes/brms_distreg.html

dat_smooth <- mgcv::gamSim(eg = 6, n = 200, scale = 2, verbose = FALSE)

I'd imagine you could do something like

dat_real = cbind(y, x)
dat_no_y = dat_real[, -1]
# pass outcome-free data in; 'prior_pred' is a hypothetical argument
brms::brm(bf(y ~ s(x)), data = dat_no_y, prior_pred = TRUE)

I think the rstanarm guys brainstormed something similar

https://groups.google.com/forum/#!msg/stan-users/5v7fuGmuqy8/bWHeO_n9BgAJ

https://www.rdocumentation.org/packages/rstanarm/versions/2.14.1/topics/stan_betareg

Here are slides for general reference:

http://personal.strath.ac.uk/gary.koop/handout_geweke.pdf

@mvuorre (Contributor) commented Oct 26, 2017

Try brm(..., sample_prior = "only").
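
For example, a minimal sketch with made-up data (note that brm() still needs an outcome column in 'data', even though its values won't inform the draws, and that proper priors are needed on the coefficients to sample from them):

library(brms)

# toy data: one predictor plus a placeholder outcome column
dat = data.frame(x = rnorm(100), y = rnorm(100))

# every draw comes from the priors; the 'likelihood' of y is ignored
fit_prior = brm(y ~ x, data = dat,
                prior = set_prior("normal(0, 5)", class = "b"),
                sample_prior = "only")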

@mikejacktzen (Author) commented Oct 26, 2017

That seems to work if the data you pass into brm() contain the observed outcome y.

But if the dataset passed to brm() does not have the outcome, it errors out, so it seems to be a chicken-and-egg problem.

The use case is that the dataset you pass into brm() should not have the outcome; using brm() to simulate the outcome is the whole point.

Edit:

This is definitely low priority. I guess a workaround is to just make some temporary outcome and attach it to the dataset that you pass into brm() to trick it.

But you have to make sure that sample_prior = "only" then ignores the 'likelihood' of the temporary outcome you attached:

dat_full = iris

out_yes_y = brms::brm(formula = Sepal.Length ~ -1 + .,
                      data = dat_full,
                      prior = set_prior("normal(0,5)"),
                      sample_prior = 'only',
                      chains = 1, iter = 100)
# runs and returns a fit

dat_no_y = dat_full[, -1]  # drop "Sepal.Length"

out_no_y = brms::brm(formula = Sepal.Length ~ -1 + .,
                     data = dat_no_y,
                     prior = set_prior("normal(0,5)"),
                     sample_prior = 'only',
                     chains = 1, iter = 100)
# Error: The following variables are missing in 'data':
# errors because the outcome 'Sepal.Length' must be present explicitly

@paul-buerkner (Owner) commented Oct 26, 2017 via email

@mikejacktzen (Author) commented Oct 26, 2017

Yeah, I snuck in an edit before your response, where I suspected that attaching a fake outcome to the passed-in dataset would be a workaround.

brm(bf(y_fake ~ .), data = cbind(y_fake, x_real), sample_prior = "only")

Is it true that sample_prior = "only" will then ignore the likelihood contribution of y_fake?
If so, I think this would achieve the goal of 'forward simulation': getting y_sim from the prior predictive.
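
One way to check this empirically (a sketch, assuming a fit like fit_prior from the example above; posterior_samples() is the brms accessor for draws by parameter name):

# under sample_prior = "only", the draws of a coefficient should match
# its prior, regardless of the values in y_fake
b_draws = posterior_samples(fit_prior, pars = "b_x")
hist(b_draws[, 1])  # should resemble the normal(0, 5) prior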

@paul-buerkner (Owner) commented Oct 26, 2017 via email

@mikejacktzen changed the title from "feature request: simulate 'outcome' data via prior predictive" to "question: simulate 'outcome' data via prior predictive" on Oct 27, 2017
@mikejacktzen (Author)
I think the last question related to all of this is: how do you get outcome predictions?
So far, all the steps return draws of the right-hand-side parameters via forward simulation from the prior.

My assumption (and question: is it correct?) is that we can just use ?predict.brmsfit to get the left-hand-side outcome (which, under the hood, combines the right-hand-side terms).

The documentation and name say predict.brmsfit() is for posterior predictive draws. But I think if we used

predict(brm(bf(y_fake ~ .), data = cbind(y_fake, x_real), sample_prior = "only"))

this should reduce to (and return) the 'prior predictive' of the left-hand-side outcome, since the sample_prior = "only" option masked out the likelihood during the fitting stage.

If this is not the case, I think there may need to be a symmetric sample_prior = "only" argument in the predict step:

predict(..., sample_prior = "only")

But I have a feeling this is unnecessary.
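
For example, a sketch reusing fit_prior from the example above; if the first reading is right, this returns prior predictive draws of the outcome:

# with sample_prior = "only", predict() combines the covariates with
# parameter draws taken from the priors, i.e. the prior predictive
y_prior_pred = predict(fit_prior, summary = FALSE)
dim(y_prior_pred)  # draws x observations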

@paul-buerkner (Owner) commented Oct 27, 2017 via email

@mikejacktzen (Author)
That's great!

And in hindsight, the requirement that the outcome (fake or not) be part of the data passed into brm(data_with_y, sample_prior = "only") has positive consequences at the predict() stage: the program needs to know the 'structural form' of the outcome in order to assemble right-hand-side simulations into left-hand-side simulated outcomes.

@torkar commented May 28, 2018

Paul,

I'd like a clarification on this. Say I fit a model fit (with sample_prior = "only"). I would expect that when I then run ppc_dens_overlay(y = orig_y, yrep = posterior_predict(fit, draws = 25)), I'd get a plot of my original y together with 25 curves drawn from the model that disregards the likelihood. However, what I instead get is Error in validate_yrep(yrep, y) : NAs not allowed in 'yrep'.

Or did I completely misunderstand the discussion above, and am I only able to plot comparisons for each parameter individually, since no likelihood is being used?

What I really want to do is a sanity check (see the sketch below):

  1. Draw parameter values from priors
  2. Generate data based on those parameter values
  3. Fit model to generated data
  4. Check fit is reasonable

as per p. 12 of http://mc-stan.org/workshops/stancon2018_intro/Bayesian%20workflow.pdf
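
A minimal sketch of steps 1–4 in brms (the Gaussian model, priors, and variable names here are illustrative assumptions, not from this thread):

library(brms)

# step 0: predictors plus a placeholder outcome, so that brm() accepts the data
dat = data.frame(x = rnorm(100), y = rnorm(100))

# steps 1 + 2: draw parameters from the priors, then generate outcomes from them
fit_prior = brm(y ~ x, data = dat,
                prior = set_prior("normal(0, 5)", class = "b"),
                sample_prior = "only")
dat$y = posterior_predict(fit_prior)[1, ]  # one simulated dataset

# step 3: fit the same model to the generated data
fit = brm(y ~ x, data = dat,
          prior = set_prior("normal(0, 5)", class = "b"))

# step 4: check the fit against the data that generated it
summary(fit)
bayesplot::ppc_dens_overlay(y = dat$y,
                            yrep = posterior_predict(fit)[1:25, ])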

I hope this makes sense...

@paul-buerkner (Owner)

Your general workflow seems reasonable. We have to find out why your posterior_predict call contains NAs. Please open a new thread on http://discourse.mc-stan.org/ and provide more details about the particular model you are fitting.

@torkar commented May 28, 2018
