
A question about adversarial loss #38

Open
BakerBunker opened this issue Mar 17, 2023 · 3 comments
Labels: question (Further information is requested)

Comments

@BakerBunker

BakerBunker commented Mar 17, 2023

❓ Questions

In Section 3.4 of the paper, "Discriminative Loss", the adversarial loss is constructed as $l_g(\hat{x})=\mathbb{E}[\max(0,1-D_k(\hat{x}))]$, but in the original hinge loss paper the adversarial loss is constructed as $-\mathbb{E}[D(\hat{x})]$.

So I want to know: why is the adversarial loss in this paper different from the original hinge loss?
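For concreteness, here is a minimal PyTorch sketch (my own illustration, not code from this repository) of the two generator losses being compared, applied to dummy discriminator logits on generated audio:

```python
import torch

def adv_loss_encodec(d_fake: torch.Tensor) -> torch.Tensor:
    # Paper, Sec. 3.4: E[max(0, 1 - D_k(x_hat))], shown here for a single discriminator
    # (the paper averages this over its K discriminators)
    return torch.relu(1.0 - d_fake).mean()

def adv_loss_hinge(d_fake: torch.Tensor) -> torch.Tensor:
    # Original hinge / geometric GAN generator loss: -E[D(x_hat)]
    return -d_fake.mean()

d_fake = torch.randn(8)  # dummy discriminator logits on generated audio
print(adv_loss_encodec(d_fake), adv_loss_hinge(d_fake))
```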

@jhauret

jhauret commented Mar 28, 2023

This point also caught my attention. The change occurred between SEANet, which introduced $\mathbb{E}[\max(0,1-D(\hat{x}))]$, and MelGAN, which used $-\mathbb{E}[D(\hat{x})]$. Since then, many similar papers have used it, including Encodec's direct parent, SoundStream.

In my case with EBEN, I simply found that this criterion worked well (better than LSGAN, WGAN, the classic GAN, or the original geometric GAN formulation). A possible explanation is the symmetric role of the discriminator, which should then only output values in the range [-1, 1], helping to stabilize training by avoiding overconfidence.

@turian

turian commented Apr 28, 2023

@jhauret it's worth noting that BigVGAN, which is also SOTA, uses an LSGAN loss.

I am not aware that the discriminators only output values in the range [-1, 1]. Why do you say that? It appears to me that many discriminators do not apply a squashing function at the last layer, precisely to avoid vanishing gradients flowing back to the generator.

With that said, to answer @BakerBunker's question about why $-\mathbb{E}[D(\hat{x})]$ isn't used: one good reason is the loss balancer that encodec uses, so that the many reconstruction, generator, and feature-matching losses can be combined elegantly. It's not clear how loss balancing should work if any of those loss values are negative, which could be the case with the hinge loss or the LSGAN loss.
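To illustrate that point, here is a toy sketch (my own, not encodec's actual balancer) of why weighting losses by their relative magnitudes is only well defined when every term is non-negative:

```python
import torch

def share_of_total(losses: dict) -> dict:
    # Each term's share of the combined objective; only well defined when
    # every term is non-negative (shares then lie in [0, 1] and sum to 1).
    total = sum(losses.values())
    return {name: float(value / total) for name, value in losses.items()}

nonneg = {"recon": torch.tensor(0.8), "adv": torch.relu(1.0 - torch.tensor(0.3))}
print(share_of_total(nonneg))  # roughly {'recon': 0.53, 'adv': 0.47}

signed = {"recon": torch.tensor(0.8), "adv": -torch.tensor(0.75)}  # -E[D(x_hat)] can be < 0
print(share_of_total(signed))  # shares blow up as the terms nearly cancel
```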

@jhauret

jhauret commented Apr 28, 2023

Thanks for pointing out the use of LSGAN loss in BigVGAN.

Sorry if I was unclear. In fact, the values of the discriminators can be outside [-1, 1], but if you minimize $\mathbb{E}[\max(0, 1-D(\hat{x}))]$ there is no further optimization once $D(\hat{x})>1$, and likewise for $\mathbb{E}[\max(0, 1+D(\hat{x}))]$ once $D(\hat{x})<-1$. So the values of the discriminators tend to stay in [-1, 1] once they are trained.
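Here is a small PyTorch check of that saturation behaviour (my own sketch, not code from this repository):

```python
import torch

# Both hinge terms acting on the fake logits D(x_hat): the generator's
# E[max(0, 1 - D(x_hat))] stops pushing once D(x_hat) > 1, while the
# discriminator's E[max(0, 1 + D(x_hat))] stops pushing once D(x_hat) < -1,
# so the fake logits tend to settle inside [-1, 1].
d_fake = torch.tensor([1.5, 0.0, -1.5], requires_grad=True)

gen_term = torch.relu(1.0 - d_fake).mean()   # generator side
disc_term = torch.relu(1.0 + d_fake).mean()  # discriminator side, fake examples

g_grad, = torch.autograd.grad(gen_term, d_fake, retain_graph=True)
d_grad, = torch.autograd.grad(disc_term, d_fake)
print(g_grad)  # tensor([ 0.0000, -0.3333, -0.3333]): zero once D(x_hat) > 1
print(d_grad)  # tensor([0.3333, 0.3333, 0.0000]): zero once D(x_hat) < -1
```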

You are also right about your last point, but this change of loss had already appeared in other papers before such a loss balancer was introduced.
