
Mismatch between loss in paper and code #9

Closed

sguada opened this issue Feb 7, 2017 · 10 comments

Comments

@sguada

sguada commented Feb 7, 2017

I have a few questions:

  • According to eq. (2) and pseudo-code line 6, one should maximize errD, but the code seems to be minimizing it.
  • Similarly, per pseudo-code line 10, one should minimize -errG, but the code seems to be minimizing errG instead.

Maybe I'm missing something about how the losses are computed and optimized.

@HaraldKorneliussen

This seems to be a duplicate of issue #5. They backprop with -1.
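For concreteness, here is a minimal sketch (an editor's illustration in modern PyTorch, not the repo's code) of what "backprop with -1" does: passing a gradient of -1 to backward() accumulates the gradient of the negated loss, so an optimizer that only ever minimizes ends up maximizing the original quantity.

```python
import torch

# Passing a gradient of -1 to backward() accumulates d(-loss)/dw, so a
# minimizing optimizer effectively maximizes `loss`.
w = torch.randn(3, requires_grad=True)

loss = (2.0 * w).sum()
loss.backward(torch.tensor(-1.0))  # accumulates d(-loss)/dw
print(w.grad)                      # tensor([-2., -2., -2.])

w.grad.zero_()
loss = (2.0 * w).sum()             # rebuild the graph
(-loss).backward()                 # same gradients, written explicitly
print(w.grad)                      # tensor([-2., -2., -2.])
```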

@aosokin

aosokin commented Feb 8, 2017

Hi, I have a related question.
It looks like Line 5 of Algorithm 1 in the paper perfectly matches https://github.com/martinarjovsky/WassersteinGAN/blob/master/main.py#L196,
and the quantity being maximized is computed in https://github.com/martinarjovsky/WassersteinGAN/blob/master/main.py#L197 as errD.
Later, in https://github.com/martinarjovsky/WassersteinGAN/blob/master/main.py#L218, this errD is printed as Loss_D.
However, the README.md says that one should plot -Loss_D. Does Loss_D mean the same thing in the two places, or is there a mismatch?

@martinarjovsky
Owner

Referring to sguada's original comment:

This is true, thanks for spotting it! We inadvertently used different signs in the code and the paper. Let me explain why this amounts to the same thing.

max_{f 1-Lip} ( E_real[f(x)] - E_fake[f(x)] ) = max_{f 1-Lip} ( E_fake[f(x)] - E_real[f(x)] )

The reason for this equivalence is that f is 1-Lip if and only if -f is 1-Lip, so indeed which choice of signs we use for equation 2 (INSIDE the maximum) doesn't matter. Therefore, as long as we stick to one sign choice throughout the code it's fine (in the code we always use the second; in the paper we used the first). While this is annoying, it has no practical implications (if you want to be nitpicky about this, as I would be, you can prove that the algorithm does the same thing from the fact that the initialization of the weights is symmetric).

This explains aosokin's comment as well. If you have any more questions, let us know.
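A quick numerical illustration of this symmetry (an editor's sketch, not from the repo; the linear function family is made up for the demo): over any class of functions closed under negation, the two maxima coincide, since substituting g = -f swaps the two expectations.

```python
import numpy as np

# Toy check of the sign symmetry: over a family of 1-Lipschitz functions
# f(x) = a*x + b with |a| <= 1 (closed under negation, since (a, b) ->
# (-a, -b) stays in the family), the two maxima below coincide.
rng = np.random.default_rng(0)
x_real = rng.normal(0.0, 1.0, size=100_000)
x_fake = rng.normal(2.0, 1.0, size=100_000)

def gap(a, b):
    """E_real[f(x)] - E_fake[f(x)] for f(x) = a*x + b."""
    return (a * x_real + b).mean() - (a * x_fake + b).mean()

gaps = [gap(a, b) for a in np.linspace(-1, 1, 21)
                  for b in np.linspace(-1, 1, 21)]

print(max(gaps))              # max_f E_real[f] - E_fake[f]
print(max(-g for g in gaps))  # max_f E_fake[f] - E_real[f]: same value
```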

@sguada
Author

sguada commented Feb 10, 2017

Yeah, a really subtle difference that was not obvious to me, and which makes following the code alongside the paper harder. At the very least I would include a comment about it in the code.

The main remaining question is: what loss is the generator minimizing, and which sign did you use to compute and plot it to get the graphs in the paper?

@martinarjovsky
Owner

To get the plots in the paper, just plot -errD, regardless of the choice of signs. Because of the equality I posted above, -errD will be your estimate whether you maximize E_real[f(x)] - E_fake[f(x)] or E_fake[f(x)] - E_real[f(x)].

I will try to add a comment in the paper, or we might just change the code; it's the same. Thanks for spotting it :), it's much appreciated!
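Putting the thread's conventions together, here is a hedged sketch of a single critic update (the tiny model, toy data, and learning rate are made up; weight clipping and the generator step are omitted). The optimizer takes a descent step on errD = errD_real - errD_fake, so it maximizes E_fake[f(x)] - E_real[f(x)], and -errD is the Wasserstein estimate to plot.

```python
import torch
import torch.nn as nn

# Editor's sketch of one critic update under the code's sign convention;
# the `one` / `mone` and errD names follow the thread.
netD = nn.Linear(10, 1)
optimizerD = torch.optim.RMSprop(netD.parameters(), lr=5e-5)
one = torch.tensor(1.0)
mone = -one

real = torch.randn(64, 10)   # batch of real samples (toy data)
fake = torch.randn(64, 10)   # stand-in for detached generator output

netD.zero_grad()
errD_real = netD(real).mean()
errD_real.backward(one)      # accumulates +grad of errD_real
errD_fake = netD(fake).mean()
errD_fake.backward(mone)     # accumulates -grad of errD_fake
errD = errD_real - errD_fake
optimizerD.step()            # gradient step *down* on errD

print(-errD.item())          # the Wasserstein estimate to plot
```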

@ypxie

ypxie commented Jun 7, 2017

Hello, thanks for the explanation. I have a question: since you call backward through the network twice, why is retain_variable=True not used in the code?

@fungtion

@ypxie they are different networks with different inputs
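A minimal sketch of that point (an editor's illustration with made-up tiny models, not the repo's code): each forward pass builds its own autograd graph, and each backward() call consumes only the graph it was called on, so no retain flag is needed.

```python
import torch
import torch.nn as nn

# Each forward pass builds a fresh autograd graph, and backward() frees
# only the graph it walks, so two backward calls on two separate forward
# passes need no retain_graph (retain_variables in old PyTorch).
netD = nn.Linear(10, 1)
netG = nn.Linear(5, 10)

real = torch.randn(4, 10)
noise = torch.randn(4, 5)

errD_real = netD(real).mean()
errD_real.backward()          # frees the graph rooted at `real`

fake = netG(noise).detach()   # detach keeps netG out of this graph
errD_fake = netD(fake).mean()
errD_fake.backward()          # a different graph: no retain needed
```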

@meder411

I know this is long closed, but it might be worthwhile to explain this in the README.md file. It's definitely an initial point of confusion for those who read the paper and then go to use the code.

@jainshobhit

I am not clear on why the Wasserstein estimate should be -errD.
If the discriminator is, say, trying to maximize errD = E_real[f(x)] - E_fake[f(x)], shouldn't errD itself directly correspond to the Wasserstein estimate (equation 2 in the paper)?
@martinarjovsky @sguada @aosokin

@feixiangdekaka

(quoting martinarjovsky's explanation above)

How can one verify that equation? Thank you.
