Code vs paper #19
So the log 2 in the code is an artifact of another model (BGAN), where I just wanted all the f-divergences to align in a way that the estimate of p/q ended up being e^T. This doesn't change anything from a learning perspective, but it does shift the JSD. q_samples are just the negative samples, so in the case of MI estimation these would come from the product of marginals. If you write out the E_Q part as log(1 + exp(-q_samples)) + log(exp(q_samples)) and then refactor, you get something of the form of Eq. (4). For the prior matching, this is done just as in GAN, where the second term is all that's used for training the generator in the minimax case. Or, if you do non-saturating, you replace the first term of the f-divergence with q_samples (from the generator). Does that make sense?
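For concreteness, here is a minimal PyTorch sketch of the shifted estimator described above. This is a paraphrase of the comment, not the repo's code; the function name and the `critic` scorer are assumptions:

```python
import math
import torch
import torch.nn.functional as F

def shifted_jsd_mi(scores_joint, scores_marginal):
    """Shifted JSD estimator, a sketch of the discussion above (not verbatim
    repo code). scores_joint: critic outputs T on paired (joint) samples;
    scores_marginal: T on negative pairs from the product of marginals."""
    log_2 = math.log(2.)
    e_p = (log_2 - F.softplus(-scores_joint)).mean()
    # E_Q written out: log(1 + exp(-T)) + log(exp(T)) = softplus(-T) + T
    e_q = (F.softplus(-scores_marginal) + scores_marginal - log_2).mean()
    return e_p - e_q  # maximize; the log-2 shift moves the JSD value but not the gradients

# Negatives from the product of marginals: shuffle one side of the batch, e.g.
# scores_marginal = critic(x, y[torch.randperm(y.size(0))])
```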
Okay, that makes sense! Thank you for the clarification :) One question - why would you use the …
I think it's an artifact of the olden days of Theano; I'm guessing that in PyTorch these are equivalent.
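The inline snippet in the question above was lost in this capture; assuming it contrasted `F.softplus` with the explicit `log(1 + exp(...))` form written out earlier, a quick numerical check of the equivalence:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1000)
explicit = torch.log(1. + torch.exp(-x))  # the written-out form (can overflow for very negative x)
builtin = F.softplus(-x)                  # PyTorch's numerically stable implementation
print(torch.allclose(explicit, builtin))  # True, up to floating-point tolerance
```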
Hi Devon,

Thanks for your interesting work! I have some trouble reading your code about prior matching. As you said:

> For the prior matching, this is done just as in GAN, where the second term is all that's used for training the generator in the minimax case.

But besides training the generator, we also need to train the discriminator, that is, to maximize Equation (7) in your paper, and I didn't find any code for this part. Am I missing anything? Or have I misunderstood something?
Oh sorry, I think you have done this maximization procedure in the `Discriminator` class. Sorry to disturb.
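For anyone else following along, a minimal sketch of both sides of the prior-matching objective (Eq. 7). This is an illustration of the standard GAN-style setup, not the repo's actual code; the `encoder` and `disc` modules and the uniform prior are assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the repo's actual modules
encoder = nn.Sequential(nn.Linear(784, 64), nn.Sigmoid())  # E_psi: data -> code in (0, 1)
disc = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())       # D_phi: code -> prob("from prior")
x = torch.randn(32, 784)                                   # dummy data batch
eps = 1e-8                                                 # numerical stability inside the logs

y_prior = torch.rand(32, 64)   # samples from the prior V (here uniform, matching the sigmoid codes)
y_enc = encoder(x).detach()    # pushforward U_{psi,P}; detached for the discriminator step

# Discriminator step: maximize E_V[log D(y)] + E_P[log(1 - D(E(x)))], i.e. both terms of Eq. (7)
d_loss = -(torch.log(disc(y_prior) + eps).mean() +
           torch.log(1. - disc(y_enc) + eps).mean())

# Encoder ("generator") step, non-saturating variant: maximize log D(E(x))
g_loss = -torch.log(disc(encoder(x)) + eps).mean()
```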
Hello,
thank you for publishing your code - outstanding work :)
However, I have a question regarding the JSD/GAN based estimators and differences between the implementation and the formulation in your paper:

Eq. 4:
![image](https://user-images.githubusercontent.com/18394313/60875863-0d372a00-a23b-11e9-9751-39754d3b0a69.png)

Eq. 7:
![image](https://user-images.githubusercontent.com/18394313/60875894-1f18cd00-a23b-11e9-878f-921f2d31747f.png)
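For readers who cannot load the screenshots, the two equations as I read them in the paper (a transcription, so the notation may differ slightly from the images):

$$\hat{\mathcal{I}}^{(\mathrm{JSD})}_{\omega,\psi}(X; E_\psi(X)) := \mathbb{E}_{\mathbb{P}}\big[-\mathrm{sp}\big(-T_{\psi,\omega}(x, E_\psi(x))\big)\big] - \mathbb{E}_{\mathbb{P}\times\tilde{\mathbb{P}}}\big[\mathrm{sp}\big(T_{\psi,\omega}(x', E_\psi(x))\big)\big], \quad \mathrm{sp}(z) = \log(1 + e^z)$$

$$(\hat{\omega}, \hat{\psi})_{\mathrm{prior}} = \arg\min_\psi \arg\max_\phi \, \hat{\mathcal{D}}_\phi\big(\mathbb{V} \,\|\, \mathbb{U}_{\psi,\mathbb{P}}\big) = \mathbb{E}_{\mathbb{V}}[\log \mathcal{D}_\phi(y)] + \mathbb{E}_{\mathbb{P}}\big[\log\big(1 - \mathcal{D}_\phi(E_\psi(x))\big)\big]$$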
At the same time, in your code (for the JSD estimator):
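(The quoted snippet was lost in this page capture; based on the discussion, it presumably resembled the following two expectation terms, with `p_samples`/`q_samples` the positive/negative critic scores - a reconstruction, not the verbatim source:)

```python
import math
import torch.nn.functional as F

log_2 = math.log(2.)
Ep = log_2 - F.softplus(-p_samples)              # E_P term, shifted by log 2
Eq = F.softplus(-q_samples) + q_samples - log_2  # E_Q term: note the added q_samples
```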
While I do know where the log_2 comes from [Nowozin et al., 2016], the addition of q_samples in the E_Q term is a bit more mysterious :D

And then for the prior matching:
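(This quoted snippet was likewise lost in the capture; judging from the sentence below, it presumably kept only the generator-side term of Eq. (7), along the lines of the following - again a reconstruction, not the verbatim source:)

```python
# Only one term of Eq. (7) appears, e.g. the non-saturating encoder/"generator" loss:
prior_loss = -torch.log(disc(encoder(x))).mean()
```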
It seems like you are using just half of Equation (7) to obtain the loss value.
Could you clarify those differences (maybe I am missing something in the code)? I have been trying to merge DIM with my existing code (a somewhat different setup, yet one that should work together properly) and cannot get it to work well.
Thanks in advance!