
Mismatch between loss in paper and code #9

Closed

sguada opened this issue Feb 7, 2017 · 10 comments

Comments

@sguada

sguada commented Feb 7, 2017

I have a few questions:

  • According to eq. (2) and pseudo-code line 6, one should maximize errD, but the code seems to be minimizing it.
  • Similarly, per pseudo-code line 10, one should minimize -errG, but the code seems to be minimizing errG instead.

Maybe I'm missing something about how the losses are computed and optimized.

@HaraldKorneliussen

This seems to be a duplicate of issue #5. They backprop with -1.
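For concreteness, here is a minimal sketch (an editor's illustration in modern PyTorch, not the repo's code) of what "backprop with -1" does: passing a gradient of -1 to backward() accumulates the gradient of the negated loss, so an optimizer that only ever minimizes ends up maximizing the original quantity.

```python
import torch

# Passing a gradient of -1 to backward() accumulates d(-loss)/dw, so a
# minimizing optimizer effectively maximizes `loss`.
w = torch.randn(3, requires_grad=True)

loss = (2.0 * w).sum()
loss.backward(torch.tensor(-1.0))  # accumulates d(-loss)/dw
print(w.grad)                      # tensor([-2., -2., -2.])

w.grad.zero_()
loss = (2.0 * w).sum()             # rebuild the graph
(-loss).backward()                 # same gradients, written explicitly
print(w.grad)                      # tensor([-2., -2., -2.])
```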

@aosokin

aosokin commented Feb 8, 2017

Hi, I have a related question.
It looks like Line 5 of Algorithm 1 in the paper perfectly matches https://github.com/martinarjovsky/WassersteinGAN/blob/master/main.py#L196,
and the quantity being maximized is computed in https://github.com/martinarjovsky/WassersteinGAN/blob/master/main.py#L197 as errD.
Later, in https://github.com/martinarjovsky/WassersteinGAN/blob/master/main.py#L218, this errD is printed as Loss_D.
However, the README.md says that one should plot -Loss_D. Does Loss_D mean the same thing in the two places, or is there a mismatch?

@martinarjovsky
Owner

Referring to sguada's original comment:

This is true, thanks for spotting it! We inadvertently used different signs in the code and the paper. Let me explain why this amounts to the same thing.

max_{f 1-Lip} ( E_real[f(x)] - E_fake[f(x)] ) = max_{f 1-Lip} ( E_fake[f(x)] - E_real[f(x)] )

The reason for this equivalence is that f is 1-Lip if and only if -f is 1-Lip, so indeed which choice of signs we use for equation 2 (INSIDE the maximum) doesn't matter. Therefore, as long as we stick to one sign choice throughout the code it's fine (in the code we always use the second; in the paper we used the first). While this is annoying, it has no practical implications (if you want to be nitpicky about this, as I would be, you can prove that the algorithm does the same thing from the fact that the initialization of the weights is symmetric).

This explains aosokin's comment as well. If you have any more questions, let us know.
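A quick numerical illustration of this symmetry (an editor's sketch, not from the repo; the linear function family is made up for the demo): over any class of functions closed under negation, the two maxima coincide, since substituting g = -f swaps the two expectations.

```python
import numpy as np

# Toy check of the sign symmetry: over a family of 1-Lipschitz functions
# f(x) = a*x + b with |a| <= 1 (closed under negation, since (a, b) ->
# (-a, -b) stays in the family), the two maxima below coincide.
rng = np.random.default_rng(0)
x_real = rng.normal(0.0, 1.0, size=100_000)
x_fake = rng.normal(2.0, 1.0, size=100_000)

def gap(a, b):
    """E_real[f(x)] - E_fake[f(x)] for f(x) = a*x + b."""
    return (a * x_real + b).mean() - (a * x_fake + b).mean()

gaps = [gap(a, b) for a in np.linspace(-1, 1, 21)
                  for b in np.linspace(-1, 1, 21)]

print(max(gaps))              # max_f E_real[f] - E_fake[f]
print(max(-g for g in gaps))  # max_f E_fake[f] - E_real[f]: same value
```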

@sguada
Author

sguada commented Feb 10, 2017

Yeah, a really subtle difference that was not obvious to me, and which makes following the code alongside the paper harder. At the very least I would include a comment about it in the code.

The main remaining question is: what loss is the generator minimizing, and which sign did you use to compute and plot it to get the graphs in the paper?

@martinarjovsky
Owner

To get the plots in the paper, just plot -errD, regardless of the choice of signs. Because of the equality I posted above, -errD will be your estimate whether you maximize E_real[f(x)] - E_fake[f(x)] or E_fake[f(x)] - E_real[f(x)].

I will try to add a comment in the paper, or we might just change the code; it's the same. Thanks for spotting it :), it's much appreciated!
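Putting the thread's conventions together, here is a hedged sketch of a single critic update (the tiny model, toy data, and learning rate are made up; weight clipping and the generator step are omitted). The optimizer takes a descent step on errD = errD_real - errD_fake, so it maximizes E_fake[f(x)] - E_real[f(x)], and -errD is the Wasserstein estimate to plot.

```python
import torch
import torch.nn as nn

# Editor's sketch of one critic update under the code's sign convention;
# the `one` / `mone` and errD names follow the thread.
netD = nn.Linear(10, 1)
optimizerD = torch.optim.RMSprop(netD.parameters(), lr=5e-5)
one = torch.tensor(1.0)
mone = -one

real = torch.randn(64, 10)   # batch of real samples (toy data)
fake = torch.randn(64, 10)   # stand-in for detached generator output

netD.zero_grad()
errD_real = netD(real).mean()
errD_real.backward(one)      # accumulates +grad of errD_real
errD_fake = netD(fake).mean()
errD_fake.backward(mone)     # accumulates -grad of errD_fake
errD = errD_real - errD_fake
optimizerD.step()            # gradient step *down* on errD

print(-errD.item())          # the Wasserstein estimate to plot
```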

@ypxie

ypxie commented Jun 7, 2017

Hello, thanks for the explanation. I have a question: since you call backward through the network twice, why is retain_variable=True not used in the code?

@fungtion

@ypxie they are different networks with different inputs
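A minimal sketch of that point (an editor's illustration with made-up tiny models, not the repo's code): each forward pass builds its own autograd graph, and each backward() call consumes only the graph it was called on, so no retain flag is needed.

```python
import torch
import torch.nn as nn

# Each forward pass builds a fresh autograd graph, and backward() frees
# only the graph it walks, so two backward calls on two separate forward
# passes need no retain_graph (retain_variables in old PyTorch).
netD = nn.Linear(10, 1)
netG = nn.Linear(5, 10)

real = torch.randn(4, 10)
noise = torch.randn(4, 5)

errD_real = netD(real).mean()
errD_real.backward()          # frees the graph rooted at `real`

fake = netG(noise).detach()   # detach keeps netG out of this graph
errD_fake = netD(fake).mean()
errD_fake.backward()          # a different graph: no retain needed
```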

@meder411

I know this is long closed, but it might be worthwhile to explain this in the README.md file. It's definitely an initial point of confusion for those who read the paper and then go to use the code.

@jainshobhit

I am not clear on why the Wasserstein estimate should be -errD.
If the discriminator is, say, trying to maximize errD = E_real[f(x)] - E_fake[f(x)], shouldn't errD itself directly correspond to the Wasserstein estimate (equation 2 in the paper)?
@martinarjovsky @sguada @aosokin

@feixiangdekaka

(quoting martinarjovsky's explanation above)

How can one verify that equation? Thank you.
