Add Gibbs sampling #736
Comments
Related #799
Is Gibbs still on our to-do list? It seems like more trouble than it's worth -- it's neither general enough on the one hand nor efficient enough on the other, relative to other samplers.
With Gibbs I meant the strict definition of non-blocked sampling, not exploiting conjugacy to sample from the conditional. We do have this for Categoricals, which is where it matters most. I actually implemented this here for Metropolis (-> MetropolisWithinGibbs): ddd9651 We haven't really received many requests for this, so we can probably just drop it.
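For reference, non-blocked sampling in this strict sense just means proposing and accepting each coordinate separately rather than the whole vector at once. A minimal sketch of such a Metropolis-within-Gibbs sweep (a toy standalone example with a Gaussian target, not the PyMC3 step-method API; all names here are hypothetical):

```python
import numpy as np

def metropolis_within_gibbs(logp, x0, n_iter=1000, scale=0.5, rng=None):
    """Non-blocked Metropolis: propose/accept one coordinate at a time."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iter, x.size))
    cur_lp = logp(x)
    for i in range(n_iter):
        for d in range(x.size):          # sweep over coordinates
            prop = x.copy()
            prop[d] += scale * rng.standard_normal()
            prop_lp = logp(prop)
            # standard Metropolis accept/reject on the single-coordinate move
            if np.log(rng.uniform()) < prop_lp - cur_lp:
                x, cur_lp = prop, prop_lp
        samples[i] = x
    return samples

# toy target: standard bivariate normal (log-density up to a constant)
draws = metropolis_within_gibbs(lambda x: -0.5 * x @ x, np.zeros(2), n_iter=2000)
```

A blocked sampler would instead perturb the full vector in one proposal; the coordinate-wise version is what "strict" Gibbs-style updating refers to above.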
I'd just like to add a request for Gibbs sampling (including the case where we can exploit conjugacy). My use case is fitting hierarchical, multinomial logit models to data from the California Household Travel Survey. The most general formulation of the model is given on pages 13-15 here. So far, the state of the art in fitting such models is to use Gibbs sampling as described on pg 133-136 here and in section 12.6 here. Here, conjugacy is exploited for the parameters at the top of the hierarchy. From what I can tell, Gibbs sampling is used because the hierarchical formulation results in needing to fit thousands of parameters, which can be a challenge with MCMC methods that propose moves for the entire vector of parameters all at once. As an example, I have 4,004 individuals and 8 travel mode alternatives overall (though not all alternatives are available to all individuals). If fitting a model where only the constant in each utility is allowed to be individual specific, I end up with more than 20,000 parameters. The NUTS sampler for this example was very slow (on the order of about 1.5-2 seconds per iteration). Being able to make use of Gibbs sampling, which may be faster, would be useful.
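To illustrate the kind of conjugate update being requested at the top of the hierarchy: in the simplest normal-normal case, the full conditional for a group-level mean given the individual-specific intercepts is itself normal, so it can be sampled exactly with no tuning. A hedged toy sketch (the sizes and distributions here are assumptions for illustration, not the actual survey model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: individual intercepts b_i ~ Normal(mu, tau2),
# flat prior on mu, tau2 treated as known for simplicity.
b = rng.normal(2.0, 1.0, size=4004)   # stand-in for ~4,000 individual intercepts
tau2 = 1.0

# Conjugate Gibbs step: the full conditional of mu is Normal(mean(b), tau2 / n)
n = b.size
mu_draw = rng.normal(b.mean(), np.sqrt(tau2 / n))
```

The efficiency argument raised below still applies, though: exact conditional draws do not remove the autocorrelation that comes from updating correlated blocks one at a time.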
Gibbs isn't likely to be of help in situations like this. It is particularly bad with models having a large number of parameters, where many of those parameters are correlated (see the example at the end of the original NUTS paper, for instance). I would take another look at NUTS, with an eye to optimizing the effective number of samples per second, rather than samples per second, as the criterion. Gibbs might be "fast" under the latter criterion, but the autocorrelation will make the effective number of samples very small. Your model is likely slow because your sample size is so large. You might consider variational methods with mini-batch, if you can. Gibbs falls into a suboptimal valley in the tradeoff between ease of implementation (where random-walk Metropolis excels) and sampling efficiency (where Hamiltonian methods excel). Even if you like it, it's very difficult to implement in a general way, because every model requires full conditional posteriors to be specified.
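To make the effective-samples-per-second point concrete: effective sample size discounts the raw draw count by the chain's autocorrelation, so a "fast" but sticky sampler can yield far less information than a slower one. A rough sketch of the standard initial-positive-sequence estimate (a simplification of what packages actually use; the AR(1) chain is a stand-in for an autocorrelated Gibbs trace):

```python
import numpy as np

def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of positive-lag autocorrelations)."""
    chain = np.asarray(chain, dtype=float)
    n = chain.size
    x = chain - chain.mean()
    acov = np.correlate(x, x, mode="full")[n - 1:] / n  # autocovariance, lags 0..n-1
    rho = acov / acov[0]
    tau = 1.0
    for k in range(1, n):
        if rho[k] <= 0:        # truncate at the first non-positive autocorrelation
            break
        tau += 2.0 * rho[k]
    return n / tau

rng = np.random.default_rng(0)
iid = rng.standard_normal(5000)       # ideal: independent draws
ar = np.empty(5000)                   # sticky: AR(1) chain, phi = 0.95
ar[0] = 0.0
for t in range(1, 5000):
    ar[t] = 0.95 * ar[t - 1] + rng.standard_normal()
```

For the AR(1) chain the integrated autocorrelation time is roughly (1 + 0.95) / (1 - 0.95) = 39, so 5,000 raw draws are worth only a few hundred effective ones.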
Concur with @fonnesbeck. Gibbs is worse than HMC in almost every respect, especially for high-dimensional hierarchical models.
@twiecki @fonnesbeck, thanks for the pointers. I can see why Gibbs sampling might perform badly when one has high posterior correlations. In terms of using mini-batch ADVI, am I incorrect in thinking this will fail for me? Since I have individual-specific intercept terms, when I take a subset of the data, there will be no way to get information on the intercept terms that belong to the observations not taken in the mini-batch. In terms of issues in the code, my
@twiecki Hey, and thank you for the great comments. I have an external model, which I call from Python. I have two questions:
@madanh If you can't do autodiff, NUTS will probably not be feasible. I would try SMC, which is included in PyMC3.
Currently we only allow non-blocked (i.e. Gibbs) sampling across different RVs. However, an RV that's expressed as a vector will always be non-blocked sampled, which is a limitation. I think that's needed for models like #443.
Not sure how this can be implemented but @jsalvatier mentioned that he had some ideas.
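For contrast with the coordinate-wise case, a blocked update proposes a move for the whole vector in a single accept/reject step. A minimal sketch (again a toy standalone example with an assumed Gaussian target, not the PyMC3 step-method API):

```python
import numpy as np

def blocked_metropolis(logp, x0, n_iter=1000, scale=0.5, rng=None):
    """Blocked Metropolis: propose a move for the entire vector at once."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iter, x.size))
    cur_lp = logp(x)
    for i in range(n_iter):
        prop = x + scale * rng.standard_normal(x.size)  # joint proposal
        prop_lp = logp(prop)
        if np.log(rng.uniform()) < prop_lp - cur_lp:
            x, cur_lp = prop, prop_lp
        samples[i] = x
    return samples

# toy target: standard 3-d normal (log-density up to a constant)
draws = blocked_metropolis(lambda x: -0.5 * x @ x, np.zeros(3), n_iter=2000)
```

Supporting blocked sampling within a vector-valued RV would mean letting the user choose between this joint-proposal behavior and the coordinate-wise behavior described above.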