Generators using Stan models #35

Open
martinmodrak opened this issue Sep 6, 2021 · 10 comments
Labels
good first issue Good for newcomers

Comments

@martinmodrak
Collaborator

There are probably multiple flavors we could consider:

  • A model with no parameters and only generated quantities + information on which parameters in the posterior are observable
  • Generate parameter values separately (e.g. by a function) and then use the fixed_param algorithm to run the generated quantities block which generates the observable values
  • Use MCMC for simulation (similarly to how SBC_generator_brms works)
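
The second flavor splits generation into two stages. A minimal base-R sketch of that structure follows; the prior, the function names `draw_parameters` and `simulate_observed`, and the normal likelihood are all assumptions made for illustration. Stage 2 is a plain R stand-in for the Stan generated quantities block, not a call into Stan.

```r
# Stage 1: draw "true" parameter values from a (hypothetical) prior.
draw_parameters <- function(n_sims) {
  data.frame(
    mu = rnorm(n_sims, mean = 0, sd = 1),
    sigma = abs(rnorm(n_sims, mean = 0, sd = 1)) # half-normal scale
  )
}

# Stage 2: stand-in for the generated quantities block. With Stan, this step
# would instead run the model with the fixed_param algorithm, passing the
# stage-1 values in as data.
simulate_observed <- function(params, n_obs) {
  lapply(seq_len(nrow(params)), function(i) {
    list(
      N = n_obs,
      y = rnorm(n_obs, mean = params$mu[i], sd = params$sigma[i])
    )
  })
}

set.seed(42)
params <- draw_parameters(n_sims = 4)
datasets <- simulate_observed(params, n_obs = 10)
```

Each row of `params` pairs with one element of `datasets`, which is the pairing SBC needs between true values and simulated data.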
@martinmodrak martinmodrak added the good first issue Good for newcomers label Sep 6, 2021
@hyunjimoon
Owner

Note that this is a specific generator use case.
With a latent Gaussian mixture model added for the calibration target, the following could be generalized with "family" and "family args". The added "family" signature is passed to the self-calib function as transform_type, which here maps "poisson" and "negative-binomial" to log and "binom" to logistic. For likelihoods with more than one parameter, let's follow the brms list of link functions for each parameter here.

generator_gmm <- function(mixture_means, mixture_sds, fixed_values){
  # values fixed across simulated datasets
  ## meta
  nobs <- fixed_values$nobs
  ndraws <- fixed_values$ndraws # number of simulated datasets
  ## distribution-specific
  nsize <- fixed_values$nsize # Do not confuse n of Binom(n, p) with `nobs`

  # predictor
  X <- fixed_values$X
  # parameter with fixed distribution across the `ndraws` datasets
  b <- fixed_values$b
  # target variable updated at each iteration
  a <- rvar_rng(rnorm, n = 1, mean = sample(mixture_means$a, 1, replace = TRUE),
                sd = mixture_sds$a, ndraws = ndraws)

  # generate
  mu <- draws_of(a + X %**% b) # linear predictor as a plain array of draws
  mu <- plogis(mu)             # inverse logit (base R)
  Y <- rvar_rng(rbinom, n = nobs, size = nsize, prob = mu, ndraws = ndraws)
  gen_rvars <- draws_rvars(nsims = ndraws, nobs = nobs,
                           mixture_means = mixture_means$a, mixture_sds = mixture_sds$a,
                           Y = Y)
  SBC_datasets(
    parameters = as_draws_matrix(list(a = a)),
    generated = draws_rvars_to_standata(gen_rvars)
  )
}

@alevaracca

Hi,

Not sure if this is the right section, but I was wondering about the possibility of generating data using the same Stan file that one would develop, for example, to run prior predictive checks (i.e. data and generated quantities blocks only). I am asking because I have (and I expect many other users will have too) several such files already available, and it would be nice to use them in SBC. Also, some are very complex, so converting them to a new generator would be quite a lot of work.

Thanks!

@hyunjimoon
Owner

hyunjimoon commented Feb 22, 2022

That is a great question. We have looked into automatically creating a generator from a Stan file, but there were difficulties, which @Dashadower could share further.

One development option that comes to mind is to use a modular Stan program, which is the template for a Stan file. If this modular program could be used to generate both the SBC generator and the Stan file (the latter @Dashadower is working on with support from this repo), it would prevent duplicated effort.

@alevaracca could you please share your thoughts on whether the above suggestion would meet your needs? The modular program is explained in more detail in the first section here.

@alevaracca

Thanks for the quick reply @hyunjimoon, I'll have a look into this and give it a try! I'll get back to you eventually.

@martinmodrak
Collaborator Author

AFAIK, there are multiple ways people create their generators in Stan. To prioritise right: could you share what a typical (potentially simplified) Stan file you are using looks like? Are you using rstan or cmdstanr? (I'd be happy to get feedback on the implementation, so if you are willing to do some testing on the first version, I'll first implement a version that matches your needs). Note however, that a hard limitation of current Stan core is that there is no way to get to the results of the transformed data block.

Also note that you can always create a dataset explicitly via SBC_datasets(). This requires you to do the necessary juggling between data formats, but will work immediately. I hope that https://hyunjimoon.github.io/SBC/reference/SBC_datasets.html, and potentially inspecting the result of generate_datasets(SBC_example_generator("normal"), n_sims = 50), makes the expected format clear.

Probably the most difficult conversion necessary is already covered by (currently undocumented) draws_rvars_to_standata() - this takes an object of type draws_rvars and converts each draw into a list that can be passed as data to Stan. I.e. the result of draws_rvars_to_standata() can be directly passed as the generated = argument of SBC_datasets()
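
To make the target format concrete, here is a base-R sketch of the per-draw splitting described above: it turns simulated quantities into one Stan-style data list per simulated dataset. `split_sims_to_standata` is a hypothetical helper written for illustration. It is not part of SBC and does not reproduce how draws_rvars_to_standata() is implemented; the model (normal observations with one mean per dataset) is likewise an assumption.

```r
# Hypothetical helper (not part of SBC).
# `sims`: named list whose elements are either a matrix with one row per
#         simulated dataset, or a vector with one element per dataset.
# `constants`: named list of values shared by every dataset.
split_sims_to_standata <- function(sims, constants = list()) {
  n_sims <- unique(vapply(sims, function(x) NROW(x), integer(1)))
  stopifnot(length(n_sims) == 1) # all quantities must agree on the count
  lapply(seq_len(n_sims), function(i) {
    per_sim <- lapply(sims, function(x) {
      if (is.matrix(x)) x[i, ] else x[[i]]
    })
    c(constants, per_sim)
  })
}

set.seed(1)
n_sims <- 3
n_obs <- 5
mu <- rnorm(n_sims) # one "true" parameter value per dataset
# mean = mu is recycled down the columns, so row i is simulated with mu[i]
y <- matrix(rnorm(n_sims * n_obs, mean = mu), nrow = n_sims)

generated <- split_sims_to_standata(list(y = y), constants = list(N = n_obs))
str(generated[[1]])
```

`generated` is then a list with one element per simulated dataset, each of which could be passed as data to a Stan model declaring `N` and `y`. The matching true values (`mu` here) would go into the other argument of SBC_datasets() as draws, one draw per dataset.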

@Dashadower
Collaborator

Just to chime in on using generated quantities for SBC: originally this was the way the library was implemented. But it turned out to be easier to extract the draws from a stanfit into an R object like rvar than to write SBC for every model (which is what Martin is describing above).

@alevaracca

Thank you both for the reply. Martin, I'll give your approach a try in the next few days and get back with some feedback (plus some code of the model that I am SBC-ing for).

@alevaracca

Sorry it took this long to get to this point, but I have followed Martin's instructions and everything worked smoothly. It turns out it was not too complicated to tinker with the different formats. The conversion using draws_rvars_to_standata() did the job as well. Cheers.

@maugavilla

I think my issue relates to this overall thread. I am working to implement SBC with the blavaan package. There we have precompiled Stan models, usually large ones with a lot of parameters. Within blavaan I can generate datasets from the priors, so I could, for example, skip the generator function. But I can't include my list of datasets, as it is not an SBC_datasets object.

From this, two questions and possible additions:

  • How can I add a list of datasets that was generated by another function? Or how can I turn this list into an SBC_datasets object?
  • How can I ask to save only a few parameters? I am not interested in all the parameters of this large model.

I'd appreciate any guidance.

@martinmodrak
Collaborator Author

@maugavilla as this will likely require discussing a bunch of stuff that's specific to blavaan, I've moved the discussion to a new issue: #69

5 participants