Mismatch between loss in paper and code #9
This seems to be a duplicate of issue #5. They backprop with -1.
Hi, I have a related question.
Referring to sguada's original comment: This is true, thanks for spotting it! We inadvertently used different signs in the code and the paper. Let me explain why this is the same. The reason for the equivalence is that f is 1-Lipschitz if and only if -f is 1-Lipschitz, so which choice of signs we use for equation 2 (inside the maximum) doesn't matter. Therefore, as long as we stick to one sign choice throughout the code, it's fine (in the code we always use the second; in the paper we used the first one). While this is annoying, it has no practical implications (if you want to be nitpicky about it, as I would be, you can prove that the algorithm does the same thing from the fact that the initialization of the weights is symmetric). This explains aosokin's comment as well. If you have any more questions, let us know.
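In symbols (a restatement of the argument above, using the thread's own notation): substituting g = -f, which is 1-Lipschitz exactly when f is, swaps the two terms inside the maximum without changing its value:

```latex
\max_{\|f\|_L \le 1} \Big( \mathbb{E}_{x \sim \mathbb{P}_r}[f(x)] - \mathbb{E}_{x \sim \mathbb{P}_g}[f(x)] \Big)
\;=\;
\max_{\|g\|_L \le 1} \Big( \mathbb{E}_{x \sim \mathbb{P}_g}[g(x)] - \mathbb{E}_{x \sim \mathbb{P}_r}[g(x)] \Big)
```

Both maxima equal the same Wasserstein estimate, so either sign choice inside the maximum is valid as long as it is applied consistently.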
Yeah, it's a really subtle difference that wasn't obvious to me, which makes following the code alongside the paper harder. At the very least I would include a comment in the code about it. The main remaining issue is: what loss is the generator minimizing, and which sign did you use to compute and plot it to get the graphs in the paper?
To get the plots in the paper, just plot -errD, regardless of the choice of signs. Because of the equality I posted above, -errD will be your estimate whether you maximize E_real[f(x)] - E_fake[f(x)] or E_fake[f(x)] - E_real[f(x)]. I will try to add a comment in the paper, or we might just change the code; it's the same. Thanks for spotting this :), it's much appreciated!
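For concreteness, a minimal sketch of one critic update in the spirit of the repo's training loop; the names (netD, netG, optD, errD_real, errD_fake) follow the thread's conventions, but treat this as illustrative rather than a verbatim excerpt from the code:

```python
import torch

# Sketch of one critic update (illustrative, not verbatim repo code).
# Under the code's sign choice, the critic ascends E_fake[f] - E_real[f],
# which the optimizer achieves by *descending* errD = errD_real - errD_fake.
def critic_step(netD, netG, optD, real, noise):
    optD.zero_grad()
    errD_real = netD(real).mean()      # estimate of E_real[f(x)]
    fake = netG(noise).detach()        # don't backprop into G during the D step
    errD_fake = netD(fake).mean()      # estimate of E_fake[f(x)]
    errD = errD_real - errD_fake
    errD.backward()                    # descending errD maximizes E_fake[f] - E_real[f]
    optD.step()
    # weight clipping to keep f (approximately) 1-Lipschitz
    for p in netD.parameters():
        p.data.clamp_(-0.01, 0.01)
    return -errD.item()                # the quantity to plot as the W estimate
```

Whichever sign convention sits inside the maximum, -errD here equals errD_fake - errD_real, the running Wasserstein estimate the paper's curves show.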
Hello, thanks for the explanation. I have a question: since you call backward through the network twice, why is
@ypxie they are different networks with different inputs |
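One way to read that reply, as a toy sketch (my own example, not repo code): each backward() call traverses its own computation graph built from a different input, so the two passes don't conflict, and the gradients simply accumulate in netD's parameters:

```python
import torch

# Toy illustration: two forward/backward passes through the same critic
# on different input batches build two separate graphs, and backward()
# accumulates both gradient contributions into netD's parameters.
netD = torch.nn.Linear(4, 1)
real = torch.randn(8, 4)
fake = torch.randn(8, 4)

netD.zero_grad()
netD(real).mean().backward()   # graph for the real batch
netD(fake).mean().backward()   # separate graph for the fake batch
print(netD.weight.grad)        # sum of both contributions
```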
I know this is long closed, but it might be worthwhile to explain this in the README.md file? It's definitely an initial point of confusion for those who read the paper and then go to use the code.
I am not clear as to why the Wasserstein estimate should be |
How can I verify the equation? Thank you.
I have a few questions:

1. The paper says the critic should be maximizing `errD`, but the code seems to be minimizing it.
2. The paper says the generator should be minimizing `-errG`, but the code seems to be minimizing `errG` instead.

Maybe I'm missing something about how the losses are computed and optimized.
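For what it's worth, the two points above reconcile through the sign discussion earlier in the thread. Because the code's critic uses the opposite sign from the paper's (f_code = -f_paper), the code minimizes errD where the paper maximizes it, and minimizing errG in the code performs the same update as minimizing -errG under the paper's sign. A hedged sketch of a generator step (illustrative names, not a verbatim excerpt):

```python
import torch

# Sketch of one generator update (illustrative, not verbatim repo code).
# With the code's sign choice the critic ascends E_fake[f] - E_real[f],
# so the generator descends errG = E_fake[f(G(z))]. Since f_paper = -f_code,
# this is the same update as descending -E_fake[f_paper], i.e. minimizing
# "-errG" in the paper's convention.
def generator_step(netD, netG, optG, noise):
    optG.zero_grad()
    fake = netG(noise)             # keep the graph: we backprop into G here
    errG = netD(fake).mean()       # estimate of E_fake[f_code(x)]
    errG.backward()
    optG.step()
    return errG.item()
```

Descending errG also lowers -errD = E_fake[f] - E_real[f] (only the E_fake term depends on G), so the generator is indeed shrinking the Wasserstein estimate.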