
A question about adversarial loss #38

Open
BakerBunker opened this issue Mar 17, 2023 · 3 comments
Labels: question (Further information is requested)

Comments

@BakerBunker

BakerBunker commented Mar 17, 2023

❓ Questions

In Section 3.4 of the paper, "Discriminative Loss", the adversarial loss is constructed as $l_g(\hat{x})=\mathbb{E}[\max(0,1-D_k(\hat{x}))]$, but in the original hinge loss paper the adversarial loss is constructed as $-\mathbb{E}[D(\hat{x})]$.

So I want to know: why is the adversarial loss in this paper different from the original hinge loss?
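For concreteness, here is a minimal PyTorch sketch (my own illustration, not code from this repository) of the two generator losses being compared, applied to dummy discriminator logits on generated audio:

```python
import torch

def adv_loss_encodec(d_fake: torch.Tensor) -> torch.Tensor:
    # Paper, Sec. 3.4: E[max(0, 1 - D_k(x_hat))], shown here for a single discriminator
    # (the paper averages this over its K discriminators)
    return torch.relu(1.0 - d_fake).mean()

def adv_loss_hinge(d_fake: torch.Tensor) -> torch.Tensor:
    # Original hinge / geometric GAN generator loss: -E[D(x_hat)]
    return -d_fake.mean()

d_fake = torch.randn(8)  # dummy discriminator logits on generated audio
print(adv_loss_encodec(d_fake), adv_loss_hinge(d_fake))
```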

@jhauret

jhauret commented Mar 28, 2023

This point also caught my attention. The change occurred between SEANet, which introduced $\mathbb{E}[\max(0,1-D(\hat{x}))]$, and MelGAN, which used $-\mathbb{E}[D(\hat{x})]$. Since then, many similar papers have used it, including Encodec's direct parent, SoundStream.

In my case with EBEN, I simply found that this criterion worked well (better than LSGAN, WGAN, the classic GAN, or the original geometric GAN formulation). A possible explanation is the symmetric role of the discriminator, which should then only output values in the range [-1, 1], helping to stabilize training by avoiding overconfidence.

@turian

turian commented Apr 28, 2023

@jhauret it's worth noting that BigVGAN, which is also SOTA, uses an LSGAN loss.

I am not aware that the discriminators only output values in the range [-1, 1]. Why do you say that? It appears to me that many discriminators do not apply a squashing function at the last layer, precisely to avoid vanishing gradients flowing back to the generator.

With that said, to answer @BakerBunker's question about why $-\mathbb{E}[D(\hat{x})]$ isn't used: one good reason is the loss balancer that encodec uses, so that the many reconstruction, generator, and feature-matching losses can be combined elegantly. It's not clear how loss balancing should work if any of those loss values are negative, which could be the case with the hinge loss or the LSGAN loss.
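To illustrate that point, here is a toy sketch (my own, not encodec's actual balancer) of why weighting losses by their relative magnitudes is only well defined when every term is non-negative:

```python
import torch

def share_of_total(losses: dict) -> dict:
    # Each term's share of the combined objective; only well defined when
    # every term is non-negative (shares then lie in [0, 1] and sum to 1).
    total = sum(losses.values())
    return {name: float(value / total) for name, value in losses.items()}

nonneg = {"recon": torch.tensor(0.8), "adv": torch.relu(1.0 - torch.tensor(0.3))}
print(share_of_total(nonneg))  # roughly {'recon': 0.53, 'adv': 0.47}

signed = {"recon": torch.tensor(0.8), "adv": -torch.tensor(0.75)}  # -E[D(x_hat)] can be < 0
print(share_of_total(signed))  # shares blow up as the terms nearly cancel
```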

@jhauret

jhauret commented Apr 28, 2023

Thanks for pointing out the use of LSGAN loss in BigVGAN.

Sorry if I was unclear. In fact, the values of the discriminators can be outside [-1, 1], but if you minimize $\mathbb{E}[\max(0, 1-D(\hat{x}))]$ there is no further optimization once $D(\hat{x})>1$, and likewise for $\mathbb{E}[\max(0, 1+D(\hat{x}))]$ once $D(\hat{x})<-1$. So the values of the discriminators tend to stay in [-1, 1] once they are trained.
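Here is a small PyTorch check of that saturation behaviour (my own sketch, not code from this repository):

```python
import torch

# Both hinge terms acting on the fake logits D(x_hat): the generator's
# E[max(0, 1 - D(x_hat))] stops pushing once D(x_hat) > 1, while the
# discriminator's E[max(0, 1 + D(x_hat))] stops pushing once D(x_hat) < -1,
# so the fake logits tend to settle inside [-1, 1].
d_fake = torch.tensor([1.5, 0.0, -1.5], requires_grad=True)

gen_term = torch.relu(1.0 - d_fake).mean()   # generator side
disc_term = torch.relu(1.0 + d_fake).mean()  # discriminator side, fake examples

g_grad, = torch.autograd.grad(gen_term, d_fake, retain_graph=True)
d_grad, = torch.autograd.grad(disc_term, d_fake)
print(g_grad)  # tensor([ 0.0000, -0.3333, -0.3333]): zero once D(x_hat) > 1
print(d_grad)  # tensor([0.3333, 0.3333, 0.0000]): zero once D(x_hat) < -1
```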

You are also right about your last point, but this change of loss had already appeared in other papers before such a loss balancer was introduced.
