Notes on Theoretical and Heuristic Findings in GANs papers #24

howardyclo commented Jul 28, 2018

Preface

In this note, I'll record findings that I think are important or useful, focusing on the theoretical and heuristic parts of several GANs papers. This thread will be actively updated whenever I read a GANs paper! 😊

Notations:

  • p_{data}: Probability density/mass function of real data.
  • p_{g} / p_{d}: Probability density/mass function of the generator / discriminator.
  • G/D: Generator/Discriminator.
  • z: Noise input vector to the generator.

Generative Adversarial Nets (NIPS 2014)

  • For G fixed, the optimal D is: D*_{G}(x) = p_{data}(x) / (p_{data}(x) + p_{g}(x)).
  • Global optimality: the GAN objective has a global optimum at p_{g} = p_{data} (i.e., the generator perfectly replicates the real data distribution).
  • Essentially, when the discriminator is optimal, the GAN loss quantifies the similarity between p_{g} and p_{data} via the JS divergence (which is symmetric); see the numerical sketch after this list.
  • Convergence: if G and D have enough capacity, and at each training step the discriminator is allowed to reach its optimum given G, and p_{g} is updated so as to improve the criterion, then p_{g} converges to p_{data}.
  • G must not be trained too much without updating D, in order to avoid mode collapse in G.
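As a quick numerical check of the first three bullets (a minimal sketch with made-up toy distributions, not code from the paper), the snippet below evaluates the value function V(D, G) at the closed-form optimal discriminator and verifies that it equals 2·JSD(p_{data} || p_{g}) − log 4:

```python
import numpy as np

# Toy discrete distributions over three points (made-up numbers for illustration).
p_data = np.array([0.5, 0.3, 0.2])  # real data distribution
p_g = np.array([0.2, 0.2, 0.6])     # generator distribution

# Optimal discriminator for fixed G: D*_{G}(x) = p_data(x) / (p_data(x) + p_g(x))
d_star = p_data / (p_data + p_g)

# Value function at the optimum: V(D*, G) = E_{p_data}[log D*(x)] + E_{p_g}[log(1 - D*(x))]
value = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))

def kl(p, q):
    """KL divergence between two discrete distributions with full support."""
    return np.sum(p * np.log(p / q))

# Jensen-Shannon divergence between p_data and p_g
m = 0.5 * (p_data + p_g)
jsd = 0.5 * kl(p_data, m) + 0.5 * kl(p_g, m)

# Identity from the paper: V(D*, G) = 2 * JSD(p_data || p_g) - log 4
print(value, 2.0 * jsd - np.log(4.0))  # the two numbers match
```

When p_{g} = p_{data}, d_star is 0.5 everywhere and the value bottoms out at −log 4, which is the global optimum mentioned above.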

NIPS 2016 Tutorial: Generative Adversarial Networks (Video version)

  • Note: the discussion below is within the scope of vanilla GANs.
  • Training GANs requires finding the Nash equilibrium of a game, which is a more difficult problem than optimizing an objective function.
  • Simply flipping the sign of the discriminator's objective function for the generator (i.e., maximizing the discriminator's cross-entropy loss) can make the generator's gradient vanish when the discriminator rejects generator samples with high confidence (see the gradient sketch after this list).
  • MLE (maximum likelihood estimation) is equivalent to minimizing KL divergence KL(p_{data} || p_{g}).
  • VAE (variational autoencoder) vs. GAN: a VAE is trained by maximum likelihood, whereas a GAN aims to generate realistic samples rather than maximize likelihood.
  • GANs minimize the JS divergence, which behaves similarly to minimizing the reverse KL divergence, i.e., KL(p_{g} || p_{data}) (KL divergence is not symmetric); see the toy comparison after this list.
  • GANs do not use MLE, but they can be made to do so by modifying the generator's objective function, under the assumption that the discriminator is optimal. GANs still generate realistic samples even when trained with MLE. (See the paper "On Distinguishability Criteria for Estimating Generative Models" by Goodfellow, ICLR 2015; also see the video at 55:00.) Thus, the choice of divergence (KL vs. reverse KL) cannot explain why GANs generate realistic samples.
  • Perhaps it is the approximation strategy of using supervised learning to estimate the density ratio that makes the generated samples so realistic. (See the video at 59:15.)
  • GANs often choose to generate from very few modes; fewer than the limitation imposed by the model capacity. The reverse KL prefers to generate from as many modes of the data distribution as the model is able to; it does not prefer fewer modes in general. This suggests that the mode collapse is driven by a factor other than the choice of divergence.
  • Comparison to MLE and NCE: See On Distinguishability Criteria for Estimating Generative Models #25.
  • Training tricks:
  • Mode collapse is believed not to be caused by minimizing the reverse KL, since mode collapse still occurs when minimizing the forward KL. A deficiency in the design of the minimax game could be a cause of mode collapse. See the paper "Unrolled Generative Adversarial Networks", which successfully generates different modes of the data.
  • Model architectures that cannot capture global structure will produce generated images with incorrect global structure.
  • See "A note on the evaluation of generative models" for a good overview of evaluating GANs.

Generative Adversarial Networks (GANs): What it can generate and What it cannot? (arXiv 2018)

This paper summarizes many GANs papers that address different challenges. Nice summary!
