Notes on Theoretical and Heuristic Findings in GANs papers #24

howardyclo commented Jul 28, 2018

Preface

In this note, I'll record findings that I think are important or useful, focusing on the theoretical and heuristic parts of several GANs papers. This thread will be actively updated whenever I read a GANs paper! 😊

Notations:

  • p_{data}: Probability density/mass function of real data.
  • p_{g} / p_{d}: Probability density/mass function of the generator / discriminator.
  • G/D: Generator/Discriminator.
  • z: Noise input vector to the generator.

Generative Adversarial Nets (NIPS 2014)

  • For G fixed, the optimal D is: D*_{G}(x) = p_{data}(x) / (p_{data}(x) + p_{g}(x)).
  • Global optimality: the GAN objective has a global optimum at p_{g} = p_{data} (i.e., the generator perfectly replicates the real data distribution).
  • Essentially, when the discriminator is optimal, the GAN loss quantifies the similarity between p_{g} and p_{data} via the JS divergence (which is symmetric); see the numerical sketch after this list.
  • Convergence: if G and D have enough capacity, and at each training step the discriminator is allowed to reach its optimum given G, and p_{g} is updated so as to improve the criterion, then p_{g} converges to p_{data}.
  • G must not be trained too much without updating D, in order to avoid mode collapse in G.
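As a quick numerical check of the first three bullets (a minimal sketch with made-up toy distributions, not code from the paper), the snippet below evaluates the value function V(D, G) at the closed-form optimal discriminator and verifies that it equals 2·JSD(p_{data} || p_{g}) − log 4:

```python
import numpy as np

# Toy discrete distributions over three points (made-up numbers for illustration).
p_data = np.array([0.5, 0.3, 0.2])  # real data distribution
p_g = np.array([0.2, 0.2, 0.6])     # generator distribution

# Optimal discriminator for fixed G: D*_{G}(x) = p_data(x) / (p_data(x) + p_g(x))
d_star = p_data / (p_data + p_g)

# Value function at the optimum: V(D*, G) = E_{p_data}[log D*(x)] + E_{p_g}[log(1 - D*(x))]
value = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))

def kl(p, q):
    """KL divergence between two discrete distributions with full support."""
    return np.sum(p * np.log(p / q))

# Jensen-Shannon divergence between p_data and p_g
m = 0.5 * (p_data + p_g)
jsd = 0.5 * kl(p_data, m) + 0.5 * kl(p_g, m)

# Identity from the paper: V(D*, G) = 2 * JSD(p_data || p_g) - log 4
print(value, 2.0 * jsd - np.log(4.0))  # the two numbers match
```

When p_{g} = p_{data}, d_star is 0.5 everywhere and the value bottoms out at −log 4, which is the global optimum mentioned above.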

NIPS 2016 Tutorial: Generative Adversarial Networks (Video version)

  • Note: the discussion below is within the scope of vanilla GANs.
  • Training GANs requires finding the Nash equilibrium of a game, which is a more difficult problem than optimizing an objective function.
  • Simply flipping the sign of the discriminator's objective function for the generator (i.e., maximizing the discriminator's cross-entropy loss) can make the generator's gradient vanish when the discriminator rejects generator samples with high confidence (see the gradient sketch after this list).
  • MLE (maximum likelihood estimation) is equivalent to minimizing KL divergence KL(p_{data} || p_{g}).
  • VAE (variational autoencoder) vs. GAN: a VAE is trained by maximum likelihood, whereas a GAN aims to generate realistic samples rather than maximize likelihood.
  • GANs minimize the JS divergence, which behaves similarly to minimizing the reverse KL divergence, i.e., KL(p_{g} || p_{data}) (KL divergence is not symmetric); see the toy comparison after this list.
  • GANs do not use MLE, but they can be made to do so by modifying the generator's objective function, under the assumption that the discriminator is optimal. GANs still generate realistic samples even when trained with MLE. (See the paper "On Distinguishability Criteria for Estimating Generative Models" by Goodfellow, ICLR 2015; also see the video at 55:00.) Thus, the choice of divergence (KL vs. reverse KL) cannot explain why GANs generate realistic samples.
  • Perhaps it is the approximation strategy of using supervised learning to estimate the density ratio that makes the generated samples so realistic. (See the video at 59:15.)
  • GANs often choose to generate from very few modes; fewer than the limitation imposed by the model capacity. The reverse KL prefers to generate from as many modes of the data distribution as the model is able to; it does not prefer fewer modes in general. This suggests that the mode collapse is driven by a factor other than the choice of divergence.
  • Comparison to MLE and NCE: See On Distinguishability Criteria for Estimating Generative Models #25.
  • Training tricks:
  • Mode collapse is believed not to be caused by minimizing the reverse KL, since mode collapse still occurs when minimizing the forward KL. A deficiency in the design of the minimax game could be a cause of mode collapse. See the paper "Unrolled Generative Adversarial Networks", which successfully generates different modes of the data.
  • Model architectures that cannot capture global structure will produce generated images with incorrect global structure.
  • See "A note on the evaluation of generative models" for a good overview of evaluating GANs.

Generative Adversarial Networks (GANs): What it can generate and What it cannot? (arXiv 2018)

This paper summarizes many GANs papers that address different challenges. Nice summary!
