[Feature Request] Analogue to TraceELBO class, but with MMD instead of KL #1780
Comments
@varenick sure, PRs are welcome! You should be able to use some of the existing kernels in pyro.contrib.gp.kernels.
Also check out this line of literature https://arxiv.org/abs/1608.04471 for Stein estimators.
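For example, here is a minimal sketch (not an existing Pyro function) of a biased plug-in estimate of squared MMD built on one of those kernels; the mmd_squared helper is illustrative:

```python
# Illustrative sketch: V-statistic (biased) plug-in estimate of squared MMD
# between two sample sets, reusing an existing kernel from pyro.contrib.gp.
import torch
from pyro.contrib.gp.kernels import RBF

def mmd_squared(kernel, x, y):
    k_xx = kernel(x, x)  # (N, N) Gram matrix among samples of p
    k_yy = kernel(y, y)  # (M, M) Gram matrix among samples of q
    k_xy = kernel(x, y)  # (N, M) cross Gram matrix
    return k_xx.mean() - 2.0 * k_xy.mean() + k_yy.mean()

kernel = RBF(input_dim=2, lengthscale=torch.tensor(1.0))
x = torch.randn(100, 2)        # e.g. latent samples from the guide
y = torch.randn(100, 2) + 1.0  # e.g. latent samples from the prior
print(mmd_squared(kernel, x, y))  # positive when the sample sets differ
```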
@eb8680 Thanks for the tip about the existing kernels; I didn't know about them. I was thinking of using (generally) different kernels for different latent sites.
@eb8680 I've recently made a working prototype and am planning to make a PR soon. I have a small problem: I don't know how to name the corresponding class. Candidates: […]
Could you please suggest a name? I've only seen such an objective in the context of VAEs: see the Ermon Group blog post, where it is called MMD-VAE, and the InfoVAE paper, where it is called InfoVAE.
How about Trace_MMD?
@fritzo @eb8680 Hmm, […]
You'll need to fork Pyro on GitHub and push your branch to that fork.
On Wed, Apr 10, 2019, Eugene Golikov wrote:
@fritzo @eb8680 I've tried to push my branch to the remote repo, but git returns error 403: permission denied.
Hey @varenick, would you be willing to share an example of VAE code with your Trace_MMD class?
@wthrif this is to be expected, since the […]. @varenick also wrote a nice example notebook in #1818 with a version of Trace_MMD.
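For anyone landing here later, usage looks roughly like this: a sketch assuming the Trace_MMD interface from #1818, where argument names are from memory and may differ across Pyro versions, and model, guide, z_dim, num_epochs, and data_loader are placeholders.

```python
# Hedged sketch: training a VAE-style model with Trace_MMD instead of Trace_ELBO.
# `model`/`guide` are an ordinary Pyro model/guide pair with a latent site "z";
# `z_dim`, `num_epochs`, and `data_loader` are placeholders.
import torch
from pyro.contrib.gp.kernels import RBF
from pyro.infer import SVI, Trace_MMD
from pyro.optim import Adam

# One kernel per latent site: `kernel` maps site names to kernel instances.
kernel = {"z": RBF(input_dim=z_dim, lengthscale=torch.tensor(float(z_dim)))}
loss_fn = Trace_MMD(kernel=kernel, mmd_scale=1.0)
svi = SVI(model, guide, Adam({"lr": 1e-3}), loss=loss_fn)

for epoch in range(num_epochs):
    for x in data_loader:
        svi.step(x)
```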
Thanks for the link @eb8680, I'll work on implementing it.
Feature:
A new MMDTraceELBO class that will implement a Maximum Mean Discrepancy between samples from the guide and from the model, instead of the KL divergence as in the TraceELBO class.
Motivation:
The ELBO is a sum of an expected log-likelihood and a negative KL divergence between the posterior distribution and the prior. In order to compute the KL term, we either have to be able to compute the log-probabilities of both the prior and the posterior distributions at posterior samples, or train a classifier to distinguish between prior and posterior samples. The second alternative has not been implemented in Pyro yet; moreover, using a classifier to compute density ratios leads to a minimax-game objective and seems quite unreliable.
In the Wasserstein Auto-Encoder paper (https://arxiv.org/abs/1711.01558) the authors propose two alternatives for measuring the discrepancy between the prior and posterior distributions: the first is training a classifier, as discussed above, and the second is using a Maximum Mean Discrepancy (MMD) instead of the KL.
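For reference, the squared MMD between distributions p and q under a kernel k is the standard kernel discrepancy:

```latex
\mathrm{MMD}_k^2(p, q)
  = \mathbb{E}_{x, x' \sim p}\big[k(x, x')\big]
  - 2\,\mathbb{E}_{x \sim p,\; y \sim q}\big[k(x, y)\big]
  + \mathbb{E}_{y, y' \sim q}\big[k(y, y')\big]
```

Replacing each expectation with a sample average gives a plug-in estimator that needs only samples from p and q, which is exactly the first advantage listed below.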
Advantages of MMD:
1. Requires only samples from the prior and posterior distributions; does not require explicit log-probabilities.
2. Does not produce a minimax-game objective.
The main disadvantage of using MMD instead of KL is that the former does not provide a valid variational lower bound on the evidence. However, it leads to an approximation of an optimal transport cost between the training dataset and the model distribution.
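To make the proposed objective concrete, here is a hedged sketch of a WAE/InfoVAE-style loss that pairs a reconstruction term with a scaled MMD penalty in place of the KL term. It reuses the illustrative mmd_squared helper from the sketch further up the thread; encoder, decoder, prior, kernel, and lam are placeholders, not Pyro API.

```python
# Hedged sketch: the MMD-regularized objective described above.
# `encoder(x)` returns latent samples, `decoder(z)` returns a torch
# distribution over data; both are placeholders for a concrete VAE.
def mmd_regularized_loss(x, encoder, decoder, prior, kernel, lam=1.0):
    z_q = encoder(x)                          # samples from the approximate posterior
    z_p = prior.sample((z_q.shape[0],))       # samples from the prior
    recon = -decoder(z_q).log_prob(x).mean()  # negative expected log-likelihood
    penalty = mmd_squared(kernel, z_q, z_p)   # replaces the KL-divergence term
    return recon + lam * penalty              # minimize reconstruction + lam * MMD^2
```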
If this looks acceptable, I would like to try to implement this.