Simulate data #3

Closed
wlandau opened this issue Apr 3, 2023 · 11 comments

@wlandau
Collaborator

wlandau commented Apr 3, 2023

With the basic package wrapper in place, I think this is a good place to start with the meaningful content.

@kkmann, @yonicd, and @chstock, I know we talked about using brms itself to simulate data. However, just for this first go-round, I think I would prefer to use ordinary R. Once we have that, we will have the beginnings of an interface. Then after we can fit a brms model, it will be easier to go back and rewrite the internals of the simulation function to use brms. Sound reasonable?

@wlandau
Collaborator Author

wlandau commented Apr 11, 2023

I created a base R simulation function, and it helped create brm_model().

@kkmann
Collaborator

kkmann commented Apr 13, 2023

I wonder, would it not be better to simulate from fixed parameters in R, e.g. to check consistency?

We should be able to simulate from the prior predictive using brms for an informative prior (tbd), right?

@wlandau
Collaborator Author

wlandau commented Apr 14, 2023

Yeah, I think it makes sense to simulate from fixed parameters as well, e.g. to help with #11.

And on reflection, would we really want to use brms to simulate from the prior predictive distribution? The main situation I can think of where we want to do that is for SBC, and using the same implementation for both modeling and simulation seems a bit circular if the goal is validation.
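As a rough illustration of simulating from fixed parameters in ordinary R, here is a minimal sketch. The function name `simulate_fixed()`, its arguments, and the compound-symmetric covariance are assumptions made for brevity. None of this is taken from brms.mmrm itself.

```r
# Hypothetical helper (not part of brms.mmrm): simulate longitudinal
# responses from fixed, known parameters using only base R.
simulate_fixed <- function(n_patient = 50, n_time = 4,
                           beta = c(1, 0.5, -0.25),
                           sigma = 1, rho = 0.5, seed = 1) {
  set.seed(seed)
  # One row per patient and time point; time varies fastest.
  data <- expand.grid(time = seq_len(n_time), patient = seq_len(n_patient))
  # Alternate patients between two arms (0 and 1).
  data$treatment <- data$patient %% 2
  # Fixed effects: intercept, treatment, and linear time.
  x <- cbind(1, data$treatment, data$time)
  mean_response <- as.numeric(x %*% beta)
  # Compound-symmetric covariance within each patient (an assumption).
  covariance <- sigma^2 * (rho + (1 - rho) * diag(n_time))
  # Upper-triangular factor R with t(R) %*% R equal to the covariance.
  cholesky <- chol(covariance)
  # Each row of z %*% R is one patient's correlated residual vector.
  z <- matrix(rnorm(n_patient * n_time), nrow = n_patient)
  residuals <- z %*% cholesky
  # Flatten patient by patient to match the row order of data.
  data$response <- mean_response + as.numeric(t(residuals))
  data
}
```

Because the true `beta` is known, a fitted model can be checked against it directly, which is the kind of consistency check suggested above.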

@kkmann
Collaborator

kkmann commented Apr 15, 2023

I would, for the sake of consistency, and then provide an extensive test suite.

I would like to be able to simulate from the prior predictive and posterior predictive similar to what I stitched together in https://boehringer-ingelheim.github.io/oncomsm/articles/oncomsm.html and I think brms already offers that functionality - we just need to wrap it.

@kkmann kkmann self-assigned this Apr 15, 2023
@wlandau wlandau added this to the Version 1.0.0 milestone May 17, 2023
@wlandau
Collaborator Author

wlandau commented May 17, 2023

I think we're converging on at least 3 types of simulations:

  1. Prior predictive simulations from R.
  2. Realistic simulations using one or more real datasets.
  3. Posterior predictive simulations from the fitted brms model.

Is that right? Would we still also want prior predictive simulations from the brms object? Can we do that with an already fitted model object?

@kkmann
Collaborator

kkmann commented Jun 6, 2023

Yes to all. Prior predictive checking will be very handy for elicitation, and irrespective of brms or direct Stan, it will always be a special case of posterior predictive, so 3) -> 1).

I think prior predictive should also be possible with a fitted object. The question is whether we need that, and I don't really see the use case for it. This raises an interesting question though: in a Bayesian framework, it makes a lot of sense to differentiate between non-fitted (prior) and fitted model objects, which classical R / stats does not do (there are only fitted objects). How do we want to handle that? We could have methods predictive() and sample() that work for both non-fitted and fitted objects, or prior_predictive(), posterior_predictive(), prior(), and posterior() for a single object. Any thoughts?
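To make the first option concrete, here is a minimal S3 sketch on a toy conjugate normal model. Every name in it (`toy_model()`, `toy_fit()`, `predictive()`, and the classes) is hypothetical and only illustrates one generic dispatching on non-fitted and fitted objects. It is not a proposal for the actual brms.mmrm interface.

```r
# Toy conjugate model: y ~ normal(mu, 1), mu ~ normal(prior_mean, prior_sd).
# All names here are hypothetical illustrations, not brms.mmrm functions.
toy_model <- function(prior_mean = 0, prior_sd = 1) {
  structure(list(mean = prior_mean, sd = prior_sd), class = "toy_unfitted")
}

# Conjugate update of the distribution of mu, with known unit residual variance.
toy_fit <- function(model, y) {
  precision <- 1 / model$sd^2 + length(y)
  mean <- (model$mean / model$sd^2 + sum(y)) / precision
  structure(list(mean = mean, sd = sqrt(1 / precision)), class = "toy_fitted")
}

# One generic for both kinds of object.
predictive <- function(object, n, ...) UseMethod("predictive")

# Shared sampler: draw mu from its current distribution, then a response.
draw_predictive <- function(object, n) {
  mu <- rnorm(n, mean = object$mean, sd = object$sd)
  rnorm(n, mean = mu, sd = 1)
}

# Prior predictive for the non-fitted object.
predictive.toy_unfitted <- function(object, n, ...) draw_predictive(object, n)

# Posterior predictive for the fitted object.
predictive.toy_fitted <- function(object, n, ...) draw_predictive(object, n)
```

In this toy setting, fitting to an empty dataset returns the prior unchanged, which mirrors the point that the prior predictive is a special case of the posterior predictive.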

@wlandau
Collaborator Author

wlandau commented Jun 6, 2023

Seems like the first step is to learn how to get a non-fitted brms object (could be obvious to you, but I have not looked into this). Then we could provide a function that is exactly like the current brm_model() in terms of its signature, except the model it returns is not fitted. At that point, maybe we could think about getting prior samples of parameters, marginals, etc. in a similar way to how the package already works with the posterior. This might make prior-vs-posterior comparisons easier using the current visualization functions.

@wlandau
Collaborator Author

wlandau commented Aug 29, 2023

brms has functions posterior_predict(), posterior_epred(), pp_check() and others. I have only tried these for models simpler than MMRM, but they look thorough and usable for brms objects. For brms.mmrm, there may only be a need to document this existing functionality.

For the prior predictive distribution, I see that brm(sample_prior = "only") theoretically should be able to do the job. I think we might consider moving brms.mmrm::brm_simulate() to this implementation because it could easily capture different potential parameterizations. Then for extra assurance in SBC, we could borrow the skeleton of a dataset from brms but re-simulate a set of parameters and responses using custom R code.
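A minimal sketch of the `sample_prior = "only"` route, assuming brms and a working Stan toolchain are installed. The wrapper name `simulate_prior_predictive()` and the toy `skeleton` dataset are hypothetical. Only `brms::brm()`, `brms::posterior_predict()`, and `brms::set_prior()` are real brms functions.

```r
# Hypothetical wrapper, not a brms.mmrm function. It needs brms and a
# working Stan toolchain at the time it is called.
simulate_prior_predictive <- function(data, formula, prior, ndraws = 10) {
  # sample_prior = "only" makes brms ignore the likelihood, so the draws
  # come from the prior alone.
  model <- brms::brm(
    formula = formula,
    data = data,
    prior = prior,
    sample_prior = "only",
    refresh = 0
  )
  # Matrix of simulated responses: one row per draw, one column per data row.
  brms::posterior_predict(model, ndraws = ndraws)
}

# Toy skeleton dataset with a placeholder response column.
skeleton <- data.frame(
  response = 0,
  treatment = rep(c(0, 1), each = 4),
  time = rep(seq_len(4), times = 2)
)

# Example call (not run here because it compiles and samples a Stan model):
# draws <- simulate_prior_predictive(
#   data = skeleton,
#   formula = response ~ treatment + time,
#   prior = brms::set_prior("normal(0, 1)", class = "b")
# )
```

The explicit prior on class "b" is there because brms refuses to sample from the prior when some parameters have no proper prior, and its default for regression coefficients is flat.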

@wlandau
Collaborator Author

wlandau commented Aug 30, 2023

Hmm.... to simulate from the prior predictive distribution, brms really needs an existing dataset, a formula, and a prior. In addition, the prior depends on the formula and dataset, and the formula also depends on a dataset. Complicating all this is the need to customize the formula and prior to specific needs. Altogether, this is a lengthy and complicated process for the user. On the other hand, there is still value in the existing simple brm_simulate() because it gives us a painless starting point to run other functionality.

So I propose the following roadmap:

  1. Rename brm_simulate() to something like brm_simulate_basic().
  2. Implement a new brm_simulate_prior() function to simulate from the prior predictive distribution given a dataset (e.g. from brms.mmrm::brm_simulate_basic()), a formula from brms.mmrm::brm_formula(), and a brms prior.
  3. Implement a custom function that accepts the same inputs as brms.mmrm::brm_simulate_prior() and returns a dataset simulated from the prior predictive distribution, but using custom R code instead of brms. This function will not be part of the package.
  4. Use simulations to compare (2) and (3) to make sure they agree.
  5. Run an SBC study (Simulation-based calibration #56) using (3).
  6. Document prior predictive simulations and posterior predictive simulations in a new vignette.
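For step (3), a custom simulator could look roughly like the sketch below. It assumes independent normal priors on the fixed effects and a known residual standard deviation, which is much simpler than a full MMRM. The function name `simulate_prior_custom()` and its arguments are hypothetical.

```r
# Hypothetical sketch of step (3): prior predictive draws without brms.
# Assumes independent normal(0, prior_sd) priors on the fixed effects and
# a known residual standard deviation sigma.
simulate_prior_custom <- function(data, formula, prior_sd = 1, sigma = 1) {
  # Build the fixed-effect design matrix from the right-hand side only,
  # so the data does not need a response column yet.
  x <- model.matrix(delete.response(terms(formula)), data = data)
  # Draw one set of coefficients from the prior.
  beta <- rnorm(ncol(x), mean = 0, sd = prior_sd)
  # Draw responses given those coefficients.
  data$response <- as.numeric(x %*% beta) + rnorm(nrow(x), sd = sigma)
  list(data = data, beta = setNames(beta, colnames(x)))
}
```

Returning the drawn `beta` alongside the data keeps the true parameter values available, which is what an SBC study needs for step (5). For step (4), summaries of many such draws could be compared against draws from the brms-based simulator under the same prior.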

@wlandau
Collaborator Author

wlandau commented Sep 1, 2023

Updates:

  • For (1), I am smoothly deprecating brm_simulate() in favor of the more descriptive brm_simulate_simple() which respects a special case of our parameterization in the methods vignette (Deprecate brm_simulate() in favor of brm_simulate_simple() #57).
  • For (2), I am first implementing a brm_simulate_outline() function (and I am open to suggestions about a better name). This function will declare a structure for treatments and time points, and it will simulate random covariates and a missingness pattern for the response. However, it will not include the actual response variable. The response will take additional work to simulate, and it will require a prior distribution and a fixed effect parameterization (from brm_formula()).
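To show the kind of output an outline function might produce, here is a base R sketch. It is only an illustration under assumed argument names, and the real brm_simulate_outline() may differ in signature and output.

```r
# Hypothetical sketch; the real brm_simulate_outline() may differ.
simulate_outline_sketch <- function(n_group = 2, n_patient = 10, n_time = 4,
                                    rate_missing = 0.1, seed = 1) {
  set.seed(seed)
  n_total <- n_group * n_patient
  # One row per patient and time point; time varies fastest.
  data <- expand.grid(time = seq_len(n_time), patient = seq_len(n_total))
  # Consecutive blocks of n_patient patients share a treatment group.
  data$group <- rep(seq_len(n_group), each = n_patient * n_time)
  # One baseline covariate per patient, repeated over that patient's rows.
  covariate <- rnorm(n_total)
  data$covariate <- covariate[data$patient]
  # Missingness pattern for the response, decided before any response exists.
  data$missing <- runif(nrow(data)) < rate_missing
  # The response is deliberately left unsimulated.
  data$response <- NA_real_
  data
}
```

A later step can fill in `response` from a prior and a fixed effect parameterization, then blank out the rows where `missing` is `TRUE`.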

@wlandau
Collaborator Author

wlandau commented Sep 13, 2023

@wlandau wlandau closed this as completed Sep 13, 2023