Is the objective correct? #106
Comments
I think using MSE is based on the Least Squares GAN paper: https://arxiv.org/pdf/1611.04076.pdf
Thanks for sharing, I was not aware of that article. Still, my question persists: even in that article the least-squares objective is described as minimizing avg((y_real_original - y_real_prediction)² + (y_fake_original - y_fake_prediction)²) [for the discriminator], which (in my mind) is different from just setting the discriminator loss to "mse" in Keras. I see now that the author has a PyTorch library where things are implemented more like they are described in the papers. So I'm assuming this is a simplification because of the way loss functions are implemented in Keras. He even states in the Keras library help that "These models are in some cases simplified versions of the ones ultimately described in the papers". I just wish to confirm whether this makes them significantly different from the papers, whether any instability in training may be due to this, and whether I'd be better off writing the loss functions in TensorFlow or PyTorch. Thanks again.
In the LSGANs paper, the discriminator's objective function is described as equation (2) (you can check it in the original LSGANs paper). In this code it is:

```python
fake_A = self.generator.predict(imgs_B)

# Train the discriminators (original images = real / generated = fake)
d_loss_real = self.discriminator.train_on_batch([imgs_A, imgs_B], valid)
d_loss_fake = self.discriminator.train_on_batch([fake_A, imgs_B], fake)
d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
```

This code makes some modifications to the original pix2pix paper, and these modifications are based on LSGANs. So, using mse won't make a significant difference, in my opinion. Thanks
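A small NumPy sketch (all names and values are illustrative, not the repo's code) showing that averaging the two per-batch mean-squared errors produces the same value as the single LSGAN discriminator objective from equation (2):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discriminator outputs on a batch of real and fake images
d_real = rng.uniform(size=8)   # D(x) for real samples
d_fake = rng.uniform(size=8)   # D(G(z)) for fake samples

valid = np.ones(8)             # target label for real images
fake = np.zeros(8)             # target label for fake images

# Keras-style: two separate mse losses, then averaged
mse_real = np.mean((d_real - valid) ** 2)
mse_fake = np.mean((d_fake - fake) ** 2)
d_loss = 0.5 * (mse_real + mse_fake)

# LSGAN eq. (2): 0.5*E[(D(x) - 1)^2] + 0.5*E[D(G(z))^2]
lsgan_obj = 0.5 * np.mean((d_real - 1) ** 2) + 0.5 * np.mean(d_fake ** 2)

assert np.isclose(d_loss, lsgan_obj)
```

Note that this only shows the reported loss *values* agree; the updates are still applied in two separate steps, which is the point raised below.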
Oh, I see what you mean. The loss printed out during training is certainly correct; the problem is that this is not the loss being used in the training. From Keras-GAN/lsgan.py lines 29-31, the discriminator is trained with the regular 'mse' loss implemented in Keras. This is not the same as the PyTorch implementation, where the combined d_loss is what is actually being back-propagated.
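For intuition, here is a toy sketch (a one-parameter linear "discriminator", purely illustrative) of why the two procedures differ: two sequential `train_on_batch` updates evaluate the second gradient at already-updated weights, whereas back-propagating the combined loss takes a single step on the summed gradient:

```python
import numpy as np

lr = 0.1
w0 = 0.5                      # toy discriminator parameter
x_real, x_fake = 1.0, -1.0    # one "real" and one "fake" input

def grad(w, x, target):
    # d/dw of (w*x - target)^2 for a linear "discriminator" D(x) = w*x
    return 2 * (w * x - target) * x

# Keras-style: two separate updates (real batch, then fake batch)
w = w0
w -= lr * grad(w, x_real, 1.0)   # step on mse over the real batch
w -= lr * grad(w, x_fake, 0.0)   # step on mse over the fake batch
w_two_steps = w

# PyTorch-style: one update on the combined 0.5*(loss_real + loss_fake)
w = w0
w -= lr * 0.5 * (grad(w, x_real, 1.0) + grad(w, x_fake, 0.0))
w_combined = w

# The two procedures land on different weights
assert not np.isclose(w_two_steps, w_combined)
```

In practice the gap per step is small, but the two schemes are not strictly identical optimization procedures.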
Yes, I missed this. That's a problem; I'm looking for the answer too.
@gustavoeb Why do d_loss_real and d_loss_fake give loss values like [0.002, 0.222], a two-element list, instead of just one single value?
One is the loss and the other is the metric. In gan.py, for example, the loss is 'binary_crossentropy' and the metric is 'accuracy'.
@gustavoeb Okay, I got it. Thank you!
@gustavoeb In the conditional GAN pix2pix.py, g_loss = self.combined.train_on_batch([imgs_A, imgs_B], [valid, imgs_A]) returns a list like [32.91153, 0.8362595, 0.3207527]. So the first value is the total loss and the last two values are the accuracy for valid and the generated image?
@jerevon 32.91153 = 0.8362595 + 100*0.3207527 |
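That is, the last two entries are the individual (unweighted) loss terms, and the total is their weighted sum. A sketch of that combination (the weights [1, 100] match the 100× factor in the comment above; the snippet is illustrative, not the repo's code):

```python
# Illustrative: how Keras combines multiple outputs' losses via loss_weights.
# Here the first term plays the role of the adversarial loss and the second
# the L1 reconstruction term, weighted by 100 as in the comment above.
loss_weights = [1, 100]
partial_losses = [0.8362595, 0.3207527]   # values from the comment above

total = sum(w * l for w, l in zip(loss_weights, partial_losses))
assert abs(total - 32.91153) < 1e-4
```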
@gustavoeb The negative log-likelihood mentioned in the paper and the cross-entropy (binary, in this case) loss used in the implementation are equivalent. You can find more information e.g. here: https://stats.stackexchange.com/questions/198038/cross-entropy-or-log-likelihood-in-output-layer.
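The equivalence is easy to check numerically; a minimal NumPy sketch (values are illustrative):

```python
import numpy as np

# Predicted probabilities and binary labels (illustrative values)
p = np.array([0.9, 0.2, 0.7])
y = np.array([1.0, 0.0, 1.0])

# Binary cross-entropy, as implemented in most frameworks
bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Negative log-likelihood of the same Bernoulli model:
# the likelihood of each label is p if y=1, else 1-p
nll = -np.mean(np.log(np.where(y == 1, p, 1 - p)))

assert np.isclose(bce, nll)
```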
If using MSE for the GAN loss in the generator (as in Erik's implementation), should we still use real labels when training the generator? This 'label flipping' for the generator is claimed by some online posts to help when using binary cross-entropy, and I'm just wondering if it similarly benefits MSE. Any intuition for using real labels on the generator? Thanks in advance.
Sorry for the lay question, but is the objective of these GANs in accord with the original paper?
In the original paper, they seem to be maximizing log(prob_real) + log(1 - prob_fakes) for the discriminator; but in most Keras implementations I find on the internet, people train the discriminator with binary cross-entropy. Does this end up being the same, mathematically?
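(It does: minimizing binary cross-entropy with labels 1 for real and 0 for fake is exactly maximizing the original paper's objective. A minimal NumPy check, with illustrative discriminator outputs:)

```python
import numpy as np

# Discriminator outputs on real and fake batches (illustrative)
d_real = np.array([0.8, 0.6, 0.9])   # D(x)
d_fake = np.array([0.3, 0.1, 0.4])   # D(G(z))

# Original-paper objective the discriminator maximizes:
# E[log D(x)] + E[log(1 - D(G(z)))]
paper_obj = np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))

# Binary cross-entropy with labels 1 for the real batch, 0 for the fake batch
bce = (-np.mean(np.log(d_real))        # BCE on the real batch
       - np.mean(np.log(1 - d_fake)))  # BCE on the fake batch

# Minimizing the BCE is maximizing the paper's objective
assert np.isclose(bce, -paper_obj)
```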