
Add Gibbs sampling #736

Closed · twiecki opened this issue Jun 5, 2015 · 9 comments

twiecki (Member) commented Jun 5, 2015

Currently we only allow non-blocked (i.e. Gibbs) sampling across different RVs. However, an RV that's expressed as a vector will always be sampled as a single block, which is a limitation. I think elementwise sampling is needed for models like #443.

I'm not sure how this can be implemented, but @jsalvatier mentioned that he had some ideas.
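
To make the current behavior concrete, here's a minimal sketch (variable names `a` and `b` are made up for illustration): PyMC3 will happily alternate separate step methods across different RVs, but each proposal for a vector-valued RV moves the whole vector at once.

```python
import pymc3 as pm

with pm.Model():
    a = pm.Normal('a', mu=0, sd=1)            # scalar RV
    b = pm.Normal('b', mu=0, sd=1, shape=5)   # vector RV
    # Compound step: `a` and `b` are updated in alternation (non-blocked
    # across RVs), but each proposal for `b` moves all 5 entries jointly.
    steps = [pm.Metropolis(vars=[a]), pm.Metropolis(vars=[b])]
    trace = pm.sample(1000, step=steps)
```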

twiecki (Member, Author) commented Aug 17, 2015

Related #799

fonnesbeck (Member) commented

Is Gibbs still on our to-do list? It seems like more trouble than it's worth -- it's neither general enough on the one hand nor efficient enough on the other, relative to other samplers.

twiecki (Member, Author) commented Jul 27, 2016

By Gibbs I meant the strict definition of non-blocked sampling, not exploiting conjugacy to sample from the conditional. We do have this for Categoricals, which is where it matters most. I actually implemented it for Metropolis (-> MetropolisWithinGibbs) here: ddd9651

We haven't really received many requests for this, so we can probably just drop it.
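
For reference, a rough sketch of the Categorical case mentioned above, assuming a PyMC3 version that ships `pm.CategoricalGibbsMetropolis` (the shape and probabilities here are placeholders):

```python
import numpy as np
import pymc3 as pm

with pm.Model():
    # 10 categorical variables; the Gibbs-style step updates one entry
    # at a time instead of proposing a new value for all 10 at once.
    z = pm.Categorical('z', p=np.ones(3) / 3, shape=10)
    step = pm.CategoricalGibbsMetropolis(vars=[z])
    trace = pm.sample(1000, step=step)
```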

twiecki closed this as completed Jul 27, 2016
timothyb0912 commented Mar 15, 2017

I'd just like to add a request for Gibbs sampling (including the case where we can exploit conjugacy).

My use case is fitting hierarchical, multinomial logit models to data from the California Household Travel Survey. The most general formulation of the model is given on pages 13-15 here. So far, the state of the art in fitting such models is to use Gibbs sampling, as described on pp. 133-136 here and in section 12.6 here. In those treatments, conjugacy is exploited for the parameters at the top of the hierarchy.

From what I can tell, Gibbs sampling is used because the hierarchical formulation requires fitting thousands of parameters, which can be a challenge for MCMC methods that propose moves for the entire parameter vector at once. As an example, I have 4,004 individuals and 8 travel mode alternatives overall (though not all alternatives are available to all individuals). Fitting a model where only the constant in each utility is allowed to be individual-specific already gives me more than 20,000 parameters. The NUTS sampler for this example was very slow (on the order of 1.5-2 seconds per iteration). Being able to use Gibbs sampling, which may be faster, would be useful.
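
For concreteness, a heavily simplified sketch of that kind of structure (shapes and priors are placeholders, not the model from the linked papers): one intercept vector per individual, partially pooled, which is where the parameter count explodes.

```python
import numpy as np
import theano.tensor as tt
import pymc3 as pm

n_ind, n_alt = 4004, 8
choices = np.random.randint(n_alt, size=n_ind)   # placeholder data

with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=5, shape=n_alt)
    sigma = pm.HalfCauchy('sigma', beta=2.5)
    # Individual-specific intercepts: 4004 * 8 = 32,032 parameters.
    alpha = pm.Normal('alpha', mu=mu, sd=sigma, shape=(n_ind, n_alt))
    p = tt.nnet.softmax(alpha)                   # row-wise choice probabilities
    pm.Categorical('y', p=p, observed=choices)
```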

fonnesbeck (Member) commented Mar 15, 2017

Gibbs isn't likely to be of help in situations like this. It is particularly bad for models with a large number of parameters, many of which are correlated (see the example at the end of the original NUTS paper, for instance). I would take another look at NUTS, with an eye to optimizing the effective number of samples per second, rather than raw samples per second, as the criterion. Gibbs might be "fast" under the latter criterion, but the autocorrelation will make the effective number of samples very small. Your model is likely slow because your sample size is so large. You might consider variational methods with mini-batches, if you can.
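
To make the effective-samples-per-second comparison concrete, a rough sketch using `pm.effective_n`, PyMC3's ESS utility at the time (later versions renamed it `pm.ess` / `arviz.ess`; `model` below stands in for your own model):

```python
import time
import pymc3 as pm

with model:
    t0 = time.time()
    trace = pm.sample(1000)   # recent PyMC3 runs multiple chains by
    wall = time.time() - t0   # default, which effective_n requires

ess = pm.effective_n(trace)   # dict: variable name -> effective sample size
print({name: n_eff / wall for name, n_eff in ess.items()})  # ESS per second
```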

Gibbs also falls into a suboptimal valley in the tradeoff between ease of implementation (where random-walk Metropolis sits) and sampling efficiency (where Hamiltonian methods sit). Even if you like it, it's very difficult to implement in a general way, because every model requires its full conditional posteriors to be derived and specified.
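
To illustrate that last point, a toy hand-rolled Gibbs sampler for a conjugate normal model (assuming a flat prior on the mean and a Jeffreys-style prior on the precision): both full conditionals below had to be derived by hand for this one specific model, which is exactly what doesn't generalize.

```python
import numpy as np

rng = np.random.RandomState(0)
y = rng.normal(2.0, 1.5, size=100)     # toy data
n, ybar = len(y), y.mean()

mu, tau = 0.0, 1.0                     # initial values
samples = []
for _ in range(5000):
    # Full conditional: mu | tau, y ~ Normal(ybar, 1 / (n * tau))
    mu = rng.normal(ybar, 1.0 / np.sqrt(n * tau))
    # Full conditional: tau | mu, y ~ Gamma(n / 2, rate = sum((y - mu)^2) / 2)
    tau = rng.gamma(n / 2.0, scale=2.0 / np.sum((y - mu) ** 2))
    samples.append((mu, tau))
```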

twiecki (Member, Author) commented Mar 15, 2017

Concur with @fonnesbeck. Gibbs is worse than HMC in almost every respect, especially for high-dimensional hierarchical models.

timothyb0912 commented

@twiecki @fonnesbeck, thanks for the pointers. I can see why Gibbs sampling might perform badly when one has high posterior correlations.

In terms of using mini-batch ADVI, am I incorrect in thinking this will fail for me? Since I have individual-specific intercept terms, when I take a subset of the data there will be no way to get information on the intercept terms that belong to the observations not included in the mini-batch. In terms of issues in the code, my minibatch_tensors would change on every mini-batch sample, and that's not currently supported, right? Feel free to let me know if I'm thinking about this incorrectly!
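
(For later readers: PyMC3 grew a `pm.Minibatch` helper in 3.1, shortly after this exchange. A hedged sketch of the pooled, globals-only case it handles cleanly; the per-individual intercepts raised above do need extra machinery such as AEVB-style local variables. Data and shapes are placeholders.)

```python
import numpy as np
import pymc3 as pm

X = np.random.randn(4004, 3)            # placeholder design matrix
y = np.random.randint(2, size=4004)     # placeholder binary responses

# Separate Minibatch views of X and y, following the PyMC3 docs pattern
X_mb = pm.Minibatch(X, batch_size=128)
y_mb = pm.Minibatch(y, batch_size=128)

with pm.Model():
    beta = pm.Normal('beta', mu=0, sd=10, shape=3)
    p = pm.math.sigmoid(pm.math.dot(X_mb, beta))
    # total_size rescales the minibatch likelihood to the full dataset
    pm.Bernoulli('obs', p=p, observed=y_mb, total_size=4004)
    approx = pm.fit(10000, method='advi')

trace = approx.sample(1000)
```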

madarshahian commented Feb 24, 2018

@twiecki Hey, and thank you for the great comments. I have an external model which I call from Python. I have two questions:
1. Without the derivative of my model being available, I can't use the NUTS sampler. What is the most reliable sampler I can use instead? Metropolis?
2. Imagine I can somehow obtain the derivative of my external model with respect to its parameters. I already know how to run an external model with PyMC3 (and how to sample its parameters' posterior using Metropolis). But is there any way to let the NUTS sampler know that I also have the model's derivative w.r.t. its parameters as an output of the external model? I'd really like to use NUTS instead of Metropolis. Any hints?
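
(Not from this thread, but the standard workaround for question 2 at the time: wrap the external model in a Theano Op and implement its `grad()` with the externally supplied derivative, so NUTS sees a gradient. `external_loglike` and `external_grad` below are hypothetical stand-ins for the external model's two outputs.)

```python
import numpy as np
import theano.tensor as tt
import pymc3 as pm

class ExternalGrad(tt.Op):
    """Wraps the externally supplied d(loglike)/d(theta)."""
    itypes = [tt.dvector]
    otypes = [tt.dvector]

    def __init__(self, grad_fn):
        self.grad_fn = grad_fn          # hypothetical external gradient callable

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.asarray(self.grad_fn(theta))

class ExternalLogLike(tt.Op):
    """Wraps the external log-likelihood and exposes a gradient to Theano."""
    itypes = [tt.dvector]
    otypes = [tt.dscalar]

    def __init__(self, loglike_fn, grad_fn):
        self.loglike_fn = loglike_fn    # hypothetical external model callable
        self.grad_op = ExternalGrad(grad_fn)

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.array(self.loglike_fn(theta))

    def grad(self, inputs, output_grads):
        (theta,) = inputs
        # Chain rule: upstream scalar gradient times our vector gradient.
        return [output_grads[0] * self.grad_op(theta)]

loglike = ExternalLogLike(external_loglike, external_grad)  # your callables
with pm.Model():
    theta = pm.Normal('theta', mu=0, sd=10, shape=3)
    pm.Potential('loglike', loglike(theta))
    trace = pm.sample(1000)             # NUTS can now use the gradient
```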

twiecki (Member, Author) commented Feb 26, 2018

@madarshahian If you can't do autodiff, NUTS will probably not be feasible. I would try SMC, which is included in PyMC3.
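
(The SMC entry point moved around between PyMC3 releases; a hedged sketch assuming a version that exposes `pm.sample_smc`:)

```python
import pymc3 as pm

with pm.Model():
    theta = pm.Normal('theta', mu=0, sd=10)
    # ... gradient-free external likelihood, e.g. via pm.Potential(...) ...
    trace = pm.sample_smc(2000)   # no gradients required
```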
