Efficient sampling from mixture models #547
Conversation
…d vector inputs without calculating unneccessary and possibly invalid expressions
… proposal distributions. Allows to create efficient custom samplers.
… from using Metropolis Hastings and a custom Proposal Distribution
…r containing 1's.
…. Fixed some problems that occurred.
…s with other examples
def bound(logp, *conditions):
    impossible = constant(-inf, dtype=float64)  # We re-use that constant

def bound_scalar(logp, *conditions):
Could we remove bound_scalar and replace it with bound, or is there something bound below can't do?
I tried to write a generic version of bound that can deal with all the kinds of inputs bound is actually fed. Notable variants of arguments to bound include:
logp: can be a Theano scalar or a Theano vector (in practice, not by contract).
conditions: if logp is a vector, these can be (mixed) vectors or scalars. If logp is a scalar, these are, too.
For the case where logp is a vector, I tried to write a general version using scan and theano.ifelse to handle every case correctly. It's hard to exactly mimic theano.switch behaviour using scan and ifelse, though. So, in order not to introduce any new bugs, I fell back to the old solution using switch, except for the case where I know logp to be a scalar, which is the only specific case where I knew using switch introduced a bug.
theano.switch evaluates both alternatives it switches between, regardless of the condition. Unlike ifelse, which evaluates only the expression it actually returns.
Consequently, the case that bound using switch doesn't handle correctly is when I feed it a logp which raises an exception if one of the conditions is violated. This can happen in Categorical.logp, since the logp is created by looking up an index in a weights vector: if the index is out of bounds, an exception gets raised. The conditions to bound (which check whether the index is out of bounds) should prevent this, but since switch evaluates the (invalid) logp anyway, the exception gets thrown regardless.
Apart from that, since switch evaluates dead expression branches, it's inefficient as hell.
I'm fine with this, since most users will be using Bounded and not bound.
This looks really promising. I wonder if this could replace the previous Metropolis sampler. Also, could it help here: #443 and #452? And how difficult do you think it would be to write some basic tests? That would definitely go a long way.
Hi Thomas,

Yes, tests are essential. I don't think they are difficult to write, just …

I think it can replace the previous Metropolis sampler in time, once we …

BTW - I'm not an expert in HMC, but as far as I remember, it involves a …

best, Kai
That's an interesting idea. I think HMC and NUTS could definitely be …

Thomas Wiecki
"""
class MvGaussianMixture(Continuous):

    def __init__(self, shape, cat_var, mus, taus, model=None, *args, **kwargs):
I recommend using singular mu and tau rather than plurals. It should be obvious that they are vectors, since it's explicitly a multivariate variable.
Thanks for the addition!
… Added references to papers describing how to use these for both.
Status update? This hasn't been updated in over a year.
@kadeng Is there any chance you could break this into more distinct parts? Since the MH sampler is a dependency, could you create another branch and copy that in? Is the MH sampler ready to go?
I am going to go ahead and close this. Please open a new request if there is ongoing interest.
The changes in this pull request are the following:

Metropolis-Hastings Step Method
A new step method, called MetropolisHastings, which allows arbitrary (non-symmetric) proposal distributions. It includes wrappers for the existing proposals as well as explicit support for Categorical proposal distributions. This can easily be extended to cover more proposals (how about HamiltonianMC proposals, or Swendsen-Wang?).
The implementation could be made more efficient (some potentially complex terms get added to the acceptance logp, and then subtracted again later on), but it's hard to do that in a general manner. Maybe one could teach Theano to optimize this out.
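To illustrate the idea behind the step method (this is a plain-NumPy sketch, not the PR's Theano-based implementation; all names and signatures below are illustrative), a generic Metropolis-Hastings loop with the correction term for asymmetric proposals looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_hastings(logp, propose, proposal_logp, x0, n):
    """Sketch of MH with the Hastings correction for asymmetric proposals.

    logp(x)             -- target log density (up to a constant)
    propose(x)          -- draw x' ~ q(x' | x)
    proposal_logp(a, b) -- log q(a | b), up to a constant
    (illustrative API, not the PR's MetropolisHastings class)
    """
    x = x0
    samples = []
    for _ in range(n):
        xp = propose(x)
        # The q terms cancel only for symmetric proposals; for arbitrary
        # proposals the ratio q(x|x') / q(x'|x) must be included.
        log_alpha = (logp(xp) - logp(x)
                     + proposal_logp(x, xp) - proposal_logp(xp, x))
        if np.log(rng.random()) < log_alpha:
            x = xp
        samples.append(x)
    return np.array(samples)
```

For example, sampling a standard normal with a deliberately shifted (asymmetric) random-walk proposal `x' ~ N(x + 0.5, 1)` still targets the right distribution, precisely because of the correction term.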
Efficient sampling from Mixture Models
Using that Metropolis-Hastings step method, it's easy to create an efficient sampler for mixture models. So I wrote a first implementation for a multivariate normal mixture distribution, which seems to work well. This could easily be extended to allow flexible mixture kernels.
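The two-block structure such a sampler exploits (a categorical component indicator plus per-component Gaussians) mirrors the generative model of the mixture. A minimal NumPy sketch of that generative model, with made-up parameters and no relation to the PR's MvGaussianMixture code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 2-component, 2-D Gaussian mixture (parameters are made up).
weights = np.array([0.3, 0.7])
mus = [np.zeros(2), np.array([5.0, 5.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]

def sample_mixture(n):
    # Ancestral sampling: first draw the component indicator from the
    # categorical weights, then draw from that component's Gaussian.
    # An MH sampler for the posterior can update the same two blocks,
    # using a categorical proposal for the indicator.
    ks = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(mus[k], covs[k]) for k in ks])
```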
Small Examples
For both, the new Step Method and the Mixture Model, I provided examples.
Small fixes
I noticed that the "bound" function from distributions/dist_math.py fails to work correctly in some cases. In particular, in the logp of the Categorical distribution, it still evaluated the expression at disallowed values, which could then lead to index-out-of-bounds exceptions. I fixed that by introducing a new "bound_scalar" function which uses theano.ifelse.ifelse(..), which has the advantage over theano.switch(..) of not evaluating inactive parts of the expression graph.
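The eager-versus-lazy distinction can be shown with a plain-NumPy analogue (this is only an illustration of the failure mode, not the Theano code in the PR): np.where, like theano.switch, evaluates both branches before the condition is consulted, whereas a Python if, like ifelse, never touches the rejected branch.

```python
import numpy as np

w = np.array([0.2, 0.5, 0.3])  # category weights (illustrative)

def logp_switch(idx):
    # Eager branching, analogous to theano.switch: both branch
    # expressions are evaluated, so an out-of-bounds index raises
    # even though the condition would have rejected it.
    return np.where(0 <= idx < len(w), np.log(w[idx]), -np.inf)

def logp_ifelse(idx):
    # Lazy branching, analogous to theano.ifelse.ifelse: the invalid
    # lookup is never evaluated, so the guard actually protects it.
    if 0 <= idx < len(w):
        return np.log(w[idx])
    return -np.inf
```

Calling logp_switch with an out-of-bounds index raises IndexError; logp_ifelse returns -inf, which is the behaviour bound's conditions were meant to guarantee.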
Another small fix had to do with TransformedDistributions. In one example I used a transformed Dirichlet distribution where one of the hyperparameters was 1. This led to an invalid mode for the distribution. That's no problem unless you transform that Dirichlet, in which case the mode becomes the starting point for the sampler; since it's invalid (it contains NaN values), that leads to problems.
Potential uses
The GMMs can be used as flexible prior densities, since they can approximate a wide class of densities.
Another use case for efficient GMMs is to use them to represent posterior densities. Since it's possible to efficiently sample from products of mixtures of Gaussians, these can be used to enable distributed, scalable MCMC, both in the form of embarrassingly parallel MCMC and in distributed Nonparametric Belief Propagation (NBP) inference algorithms. I plan to write more on that later. See also my PyPGMc project on GitHub.