
Label smoothing should be one-sided? #10

Closed
MustafaMustafa opened this issue Jan 10, 2017 · 15 comments

Comments

@MustafaMustafa

Regarding trick 6: label smoothing should be one-sided, applied to real images only (Salimans et al., 2016). Their rationale makes sense. Did you find evidence to the contrary?
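For concreteness, here is a minimal sketch of the one-sided version in PyTorch. The discriminator and data here are toy stand-ins just so it runs; 0.9 is the commonly cited smoothing value, not something this repo prescribes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins so the sketch runs: a linear "discriminator" on
# 64-dim inputs (hypothetical; any D that outputs logits works).
netD = nn.Linear(64, 1)
real_batch = torch.randn(16, 64)
fake_batch = torch.randn(16, 64)

d_real_logits = netD(real_batch)
d_fake_logits = netD(fake_batch)

# One-sided label smoothing (Salimans et al. 2016):
# only the REAL targets are softened (1.0 -> 0.9);
# the fake targets stay at exactly 0.0.
real_targets = torch.full_like(d_real_logits, 0.9)
fake_targets = torch.zeros_like(d_fake_logits)

d_loss = (F.binary_cross_entropy_with_logits(d_real_logits, real_targets)
          + F.binary_cross_entropy_with_logits(d_fake_logits, fake_targets))
```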

@MustafaMustafa
Author

MustafaMustafa commented Jan 10, 2017

I think your advice is slightly different: you replace the labels with a random number instead of a fixed smoothed label. This would ameliorate, and probably avoid, the problem that a fixed fake label causes in the numerator of the optimal discriminator. Is this your reasoning?

Thanks,
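For comparison with the sketch above, here is my reading of the randomized variant in trick 6, again with toy stand-ins so it runs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins (hypothetical discriminator and data).
netD = nn.Linear(64, 1)
real_batch = torch.randn(16, 64)
fake_batch = torch.randn(16, 64)

d_real_logits = netD(real_batch)
d_fake_logits = netD(fake_batch)

# Trick 6 as written: replace the hard labels with random values,
# real in [0.7, 1.2] and fake in [0.0, 0.3], so BOTH sides of the
# discriminator's targets are smoothed (and noisy).
real_targets = torch.empty_like(d_real_logits).uniform_(0.7, 1.2)
fake_targets = torch.empty_like(d_fake_logits).uniform_(0.0, 0.3)

# Note: targets above 1.0 make plain BCE unbounded below, so some
# implementations clamp the real targets to at most 1.0.
d_loss = (F.binary_cross_entropy_with_logits(d_real_logits, real_targets)
          + F.binary_cross_entropy_with_logits(d_fake_logits, fake_targets))
```

Unlike the non-random smoothing in Salimans et al., both target ranges here are randomized, which is exactly the difference I am asking about.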

@zhangqianhui

https://arxiv.org/abs/1701.00160 Have you read this paper? Hope it helps.

@MustafaMustafa
Author

@zhangqianhui, thank you. As you can read there, the suggestion is one-sided label smoothing. I was wondering why Soumith chose a two-sided version with randomization, and whether there is empirical evidence that it works better.

@black-puppydog

I'd actually also be interested to know this. Did you find any good theoretical or empirical hints on this?

@rafaelvalle

From what I understand, Soumith is describing one-sided label smoothing in the sense that only the discriminator's labels receive smoothing, not the generator's. Double-sided would mean smoothing the labels for both the discriminator and the generator.

@black-puppydog

From what I understood, this trick (no. 6)

if it is real, then replace the label with a random number between 0.7 and 1.2, and if it is a fake sample, replace it with 0.0 and 0.3 (for example).

would smooth both real and fake labels for the discriminator.
I just went through Salimans et al. (2016) again, and yes, they do mention non-random, two-sided label smoothing. I totally forgot they had that in there, so shame on me for not re-checking this before.

But: in the NIPS 2016 tutorial, Soumith explicitly writes:

It is important to not smooth the labels for the fake samples.

So I'm still wondering if he has new insight on that, and if so, where to read up on it. :)

@DanielTakeshi

Also wondering about this. Ian Goodfellow specifically said not to smooth the fake labels.

@dantp-ai

dantp-ai commented Aug 9, 2017

@MustafaMustafa This issue did not get any concrete answer; it merely raised more questions. Maybe leave it open?

@dantp-ai

dantp-ai commented Aug 9, 2017

What is meant by

occasionally flip the labels when training the discriminator

E.g., should we occasionally assign the fake label to a datapoint with a real label, and vice versa? Why?
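My current reading of it, as a sketch only (the flip probability is hypothetical; the repo does not specify one):

```python
import torch

batch_size = 16
p_flip = 0.05  # hypothetical flip probability; the repo gives no value

real_targets = torch.ones(batch_size, 1)
fake_targets = torch.zeros(batch_size, 1)

# Occasionally flip the labels the discriminator trains on:
# a few real samples are presented as fake, and vice versa.
flip_mask = torch.rand(batch_size, 1) < p_flip
real_targets[flip_mask] = 0.0
fake_targets[flip_mask] = 1.0
```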

@MustafaMustafa
Author

@plopd, I think the advice is to smooth the real and fake labels of the discriminator only. It can be rephrased to make that clearer.

@woozzu

woozzu commented Aug 16, 2017

See the implementation of Salimans et al. (2016). According to it, only the real labels of the discriminator should be smoothed.

@robot010

I tried smoothing both real and fake samples, and things got messed up. It seems that smoothing only the real samples gives better results.

@makeyourownalgorithmicart

I don't think this issue should be closed.

There are two opinions:

  • smooth both of D's labels (the 0s and the 1s)
  • smooth only the 1s, i.e. the labels for real data from the training set, not the 0s for samples from G

Which one is right is not clear yet.

@arthurarj

arthurarj commented Dec 2, 2019

I was just trying to figure that out and I wanted to give my two cents. As others here have already mentioned, I believe the "one-sided" refers to real samples only. After all, it makes sense to talk about sides when we are referring to the interval [0, 1], whereas not so much when talking about the generator/discriminator pair. So I don't see ambiguity in that sense. Moreover, the whole motivation behind this technique seems to be making the discriminator's job harder in order to avoid a series of stability problems [1].

TLDR: smooth only real samples, and only for the discriminator.

@CielAl

CielAl commented Sep 12, 2020

I don't think this issue should be closed.

There are two opinions:

  • smooth both of D's labels (the 0s and the 1s)
  • smooth only the 1s, i.e. the labels for real data from the training set, not the 0s for samples from G

Which one is right is not clear yet.

I think "NIPS 2016 Tutorial: Generative Adversarial Networks" (Goodfellow), Section 4.2, points out that the label smoothing proposed for GANs is specifically for D and for real data only (there is sample TensorFlow code in Section 4.2 as well).

And if I understand it correctly, given equation 15 and this quote:

"When β is zero, then smoothing by α does nothing but scale down the optimal value of the discriminator"

the most important reason to do one-sided smoothing of the real labels is to cap the optimal D, so that there is more adversarial signal for G. That mitigates the problem of an overly powerful discriminator.
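For reference, here is my reconstruction of equation 15 from Section 4.2 of the tutorial (the notation is mine; α and β are the amounts by which the real and fake targets are smoothed):

```latex
% Optimal discriminator under two-sided label smoothing,
% reconstructed from Section 4.2 of the NIPS 2016 tutorial:
% real targets smoothed to 1 - \alpha, fake targets to \beta.
D^{*}(x) = \frac{(1-\alpha)\, p_{\text{data}}(x) + \beta\, p_{\text{model}}(x)}
                {p_{\text{data}}(x) + p_{\text{model}}(x)}
```

When β = 0 the numerator contains only p_data, so smoothing merely scales down D's optimal value. When β > 0, p_model enters the numerator, so wherever p_model is large but p_data is near zero, the optimal D points toward the generator's current output and reinforces its mistakes. That is the tutorial's argument for keeping the smoothing one-sided.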
