# Generative Adversarial Networks: creating realistic fake examples

**Aside**

The [GAN](https://arxiv.org/pdf/1406.2661.pdf) was invented by Ian Goodfellow in one night, following a party at a [bar](https://www.technologyreview.com/2018/02/21/145289/the-ganfather-the-man-whos-given-machines-the-gift-of-imagination/)!

Our goal is to generate new *synthetic* examples.

Let
- $\x$ denote a *real* example
    - vector of length $n$
- $\pdata$ be the distribution of real examples
   - $\x \in \pdata$
   

We will create a Neural Network called the *Generator*

Generator $G_{\Theta_G}$ (parameterized by $\Theta_G$) will
- take a vector $\z$ of random numbers from distribution $p_\z$ as input
- and output $\hat{\x}$, a *synthetic/fake* example
    - vector of length $n$

Let
- $\pmodel$ be the distribution of fake examples
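
To make the input/output contract concrete, here is a minimal sketch of a Generator in plain Python. It is only an illustration: the affine form $\hat{\x} = W \z + b$, the standard-normal choice for $p_\z$, and the names `make_generator` and `sample_z` are assumptions made for this example (a real Generator is a multi-layer Neural Network).

```python
import random

def make_generator(latent_dim, n, seed=0):
    """Return (theta_G, G) for a toy affine generator: x_hat = W z + b."""
    rng = random.Random(seed)
    # theta_G holds the parameters: W is n x latent_dim, b has length n
    theta_G = {
        "W": [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)] for _ in range(n)],
        "b": [0.0] * n,
    }

    def G(z):
        # One output coordinate per row of W
        return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) + b_i
                for row, b_i in zip(theta_G["W"], theta_G["b"])]

    return theta_G, G

def sample_z(latent_dim, rng):
    """Draw z from p_z (assumed here to be a standard normal)."""
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

rng = random.Random(42)
theta_G, G = make_generator(latent_dim=3, n=5)
x_hat = G(sample_z(3, rng))
print(len(x_hat))  # the fake example has the same length n as a real example
```

The distribution of the values returned by `G` over repeated draws of `z` plays the role of $\pmodel$.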

<table>
    <tr>
        <th><center>GAN Generator</center></th>
    </tr>
    <tr>
        <td><img src="images/GAN_generator.png"></td>
    </tr>
</table>

The Generator will be paired with another Neural Network called the *Discriminator*.

The Discriminator $D_{\Theta_D}$ (parameterized by $\Theta_D$) is a binary Classifier that
- takes a vector $\tilde{\x} \in \pdata \cup \pmodel$
- and labels it as Real or Fake

**Goal of Discriminator**
$$
\begin{array}{lcll}
D( \tilde{\x} ) & = & \text{Real} & \text{ for } \tilde{\x} \in p_\text{data} \\
D (\tilde{\x} ) & = &\text{Fake}  &\text{ for } \tilde{\x} \in p_\text{model}
\end{array}
$$

That is
- the Discriminator tries to distinguish between Real and Fake examples
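
As a rough sketch of what such a binary classifier looks like, the snippet below uses a logistic-regression Discriminator that outputs a probability of Real and thresholds it at 0.5. The logistic form, the threshold, the parameter values, and the function names are all assumptions chosen for illustration, not the architecture used in practice.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def discriminator_prob(x_tilde, w, c):
    """Probability that x_tilde is Real, for a toy logistic Discriminator."""
    score = sum(w_i * x_i for w_i, x_i in zip(w, x_tilde)) + c
    return sigmoid(score)

def discriminator_label(x_tilde, w, c, threshold=0.5):
    """Map the probability to one of the two class labels."""
    return "Real" if discriminator_prob(x_tilde, w, c) >= threshold else "Fake"

w, c = [1.0, -1.0], 0.0  # hypothetical parameters Theta_D
print(discriminator_label([2.0, 0.5], w, c))  # score  1.5 -> Real
print(discriminator_label([0.0, 3.0], w, c))  # score -3.0 -> Fake
```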

<table>
    <tr>
        <th><center>GAN Discriminator</center></th>
    </tr>
    <tr>
        <td><img src="images/GAN_discriminator.png"></td>
    </tr>
</table>

In contrast, the goal of the Generator is to have its fake examples classified as Real.

**Goal of Generator**
$$
\begin{array}{lcll}
D (\hat{\x} ) & = & \text{Real} & \text{ for } \hat{\x} = G_{\Theta_G}(\z)  \in p_\text{model}
\end{array}
$$

That is
- the Generator tries to create fake examples that can fool the Discriminator into classifying as Real

How is this possible?

We describe a training process (that updates $\Theta_G$ and $\Theta_D$)
- That follows an *iterative* game
- Train the Discriminator to distinguish between 
    - Real examples
    - and the Fake examples produced by the Generator on the prior iteration
- Train the Generator to produce examples better able to fool the updated Discriminator

Sounds reasonable, but how do we get the Generator to improve its fakes?

We will define loss functions 
- $\loss_G$ for the Generator
- $\loss_D$ for the Discriminator

Then we can improve the Generator (parameterized by $\Theta_G$) by Gradient Descent
- updating $\Theta_G$ in the direction of the negative gradient $- \frac{\partial\loss_G}{\partial {\Theta_G}}$

That is
- The Discriminator will indirectly give "hints" to the Generator as to why a fake example failed to fool
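
The "hints" are gradients that flow back through the Discriminator. As a sketch, assuming $D$ is differentiable with respect to its input and writing $\hat{\x} = G_{\Theta_G}(\z)$, the chain rule gives

$$
\frac{\partial \loss_G}{\partial \Theta_G} =
\frac{\partial \loss_G}{\partial D(\hat{\x})} \cdot
\frac{\partial D(\hat{\x})}{\partial \hat{\x}} \cdot
\frac{\partial \hat{\x}}{\partial \Theta_G}
$$

The middle factor is supplied by the Discriminator: it indicates how each coordinate of the fake example would have to change to look more Real. With a learning rate $\alpha$ (a symbol introduced here for illustration), one Gradient Descent step is

$$
\Theta_G \leftarrow \Theta_G - \alpha \, \frac{\partial \loss_G}{\partial \Theta_G}
$$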

<table>
    <tr>
        <th><center>GAN Generator training</center></th>
    </tr>
    <tr>
        <td><img src="images/GAN_generator_train.png"></td>
    </tr>
</table>

<table>
    <tr>
        <th><center>GAN Discriminator training</center></th>
    </tr>
    <tr>
        <td><img src="images/GAN_discriminator_train.png"></td>
    </tr>
</table>

After enough rounds of the "game" we hope that the Generator and Discriminator battle to a stand-off
- the Generator produces realistic fakes
- the Discriminator has only a $50 \%$ chance of correctly labeling a fake as Fake

**Notation**

Notation | Meaning
:----|:---
<img width=100 /> | <img width=300 />
$p_\text{data}$ | Distribution of real data
$\x \in p_\text{data}$ | Real sample
$p_\text{model}$ | Distribution of fake data
$\hat{\x}$ | Fake sample
 | $\hat{\x} \not\in p_\text{data}$
 | $\text{shape}(\hat{\x}) = \text{shape}(\x)$
$\tilde{\x}$ | Sample (real or fake)
 | $\text{shape}(\tilde{\x}) = \text{shape}(\x)$
$D_{\Theta_D}$ | Discriminator NN, parameterized by $\Theta_D$
 | Binary classifier: $\tilde{\x} \mapsto \{ \text{Real}, \text{Fake} \}$
 | $D_{\Theta_D}(\tilde{\x}) \in \{ \text{Real}, \text{Fake} \} \text{ for } \text{shape}(\tilde{\x}) = \text{shape}(\x)$
$\z$ | Vector of random numbers with distribution $p_\z$
$G_{\Theta_G}$ | Generator NN, parameterized by $\Theta_G$
 | $\z \mapsto \hat{\x}$
 | $\text{shape}( G(\z) ) = \text{shape}(\x)$
 | $G(\z) \in p_\text{model}$



# Loss functions

The goal of the generator can be stated as
- Creating $\pmodel$ such that
- $\pmodel \approx \pdata$



 
There are a number of ways to measure the dissimilarity of two distributions
- KL divergence
    - minimizing it is equivalent to Maximum Likelihood estimation
- Jensen-Shannon Divergence (JSD)
- Earth Mover Distance (used by the Wasserstein GAN)

The original paper's minimax objective, evaluated at the optimal Discriminator, corresponds to minimizing the Jensen-Shannon Divergence, so we illustrate with that objective.

To be concrete, let the Discriminator output $D(\tilde{\x})$, the probability that $\tilde{\x}$ is Real, using labels
- $1$ for Real
- $0$ for Fake


The Discriminator tries to maximize

$$
- \loss_D = 
\begin{cases} 
\log D(\tilde{\x}) & \text{ when } \tilde{\x} \in \pdata \\
\log ( 1 - D(\tilde{\x}) ) & \text{ when } \tilde{\x} \in \pmodel \\
\end{cases}
$$

That is
- Classify real $\x$ as Real
- Classify fake $\hat{\x}$ as Fake
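
A small numeric check may help here. The sketch below evaluates the per-example Discriminator loss in its standard binary cross-entropy form, $-\log D(\tilde{\x})$ for a real example and $-\log(1 - D(\tilde{\x}))$ for a fake one. The function names are hypothetical, chosen for this example.

```python
import math

def discriminator_objective(d_out, is_real):
    """Per-example quantity the Discriminator maximizes (i.e., -loss_D).

    d_out is D(x_tilde): the predicted probability that x_tilde is Real.
    """
    return math.log(d_out) if is_real else math.log(1.0 - d_out)

def discriminator_loss(d_out, is_real):
    """Binary cross-entropy with label 1 for Real and 0 for Fake."""
    return -discriminator_objective(d_out, is_real)

print(round(discriminator_loss(0.9, is_real=True), 4))   # 0.1054: confident and correct
print(round(discriminator_loss(0.9, is_real=False), 4))  # 2.3026: confident and wrong
```

The loss is small when the predicted probability agrees with the true label and grows quickly when the Discriminator is confidently wrong.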

The per-example Loss for the Generator is 
$$\loss_G = \log ( 1 - D(G(\z)) )$$

which decreases without bound as
$$D(G(\z)) \rightarrow 1$$

That is
- the Discriminator mis-classifies the fake example as Real
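
To see how this loss behaves, the sketch below evaluates $\log(1 - D(G(\z)))$ for a few hypothetical values of the Discriminator output. The function name `generator_loss` is an assumption made for this example.

```python
import math

def generator_loss(d_of_fake):
    """Per-example Generator loss log(1 - D(G(z)))."""
    return math.log(1.0 - d_of_fake)

# The loss falls as the Discriminator's belief that the fake is Real rises.
for d in (0.1, 0.5, 0.9, 0.99):
    print(d, round(generator_loss(d), 4))
```

The loss is nearly flat when $D(G(\z))$ is close to $0$, which is exactly where an untrained Generator starts. For that reason the original paper suggests that, in practice, the Generator can instead be trained to maximize $\log D(G(\z))$.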

So the iterative game seeks to solve a minimax problem

$$
\min_G \max_D \left( \mathbb{E}_{\x \in p_\text{data}} \log D(\x) + \mathbb{E}_{\z \in p_\z} \log ( 1 - D(G(\z)) ) \right)
$$
- $D$ tries to 
    - make $D(\x)$ big: assign a high probability of Real to real $\x$
    - and $D(G(\z))$ small: assign a low probability of Real to fake $G(\z)$
- $G$ tries to
    - make $D(G(\z))$ high: fool $D$ into a high probability for a fake
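
The hoped-for stand-off can be read off this objective. The following is a sketch of the standard argument from the original paper, not a proof. It writes the objective as $\mathbb{E} \log D(\x) + \mathbb{E} \log ( 1 - D(G(\z)) )$ and assumes both distributions have densities, denoted $p_\text{data}(\x)$ and $p_\text{model}(\x)$. For a fixed $G$, the objective is

$$
\int \left( p_\text{data}(\x) \log D(\x) + p_\text{model}(\x) \log ( 1 - D(\x) ) \right) d\x
$$

For constants $a, b > 0$, the function $y \mapsto a \log y + b \log(1 - y)$ is maximized at $y = \frac{a}{a+b}$, so the best Discriminator is, pointwise,

$$
D^*(\x) = \frac{p_\text{data}(\x)}{p_\text{data}(\x) + p_\text{model}(\x)}
$$

If the Generator succeeds, so that $p_\text{model} = p_\text{data}$, then $D^*(\x) = \frac{1}{2}$ everywhere and the objective equals $\log \frac{1}{2} + \log \frac{1}{2} = -\log 4$: the best possible Discriminator can do no better than a coin flip.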

Note that the Generator improves 
- by updating $\Theta_G$
- so as to increase $D(G(\z))$
    - the mis-classification of the fake as Real

# Training

We will train Generator $G_{\Theta_G}$ and Discriminator $D_{\Theta_D}$ by turns
- creating a sequence of updated parameters
    - $\Theta_{G, (1)} \ldots \Theta_{G,(T)}$
    - $\Theta_{D, (1)} \ldots \Theta_{D,(T)}$
- Trained *competitively*

**Competitive training**

Iteration $\tt$

- Train $D_{\Theta_{D, (\tt-1)}}$ on samples
    - $\tilde{\x} \in p_\text{data} \cup p_{\text{model}, (\tt-1)}$
        - where $G_{\Theta_{G, (\tt-1)}} ( \z) \in p_{\text{model}, (\tt-1)}$
    - Update $\Theta_{D, (\tt-1)}$ to $\Theta_{D, \tp}$ via gradient $\frac{\partial \loss_D}{\partial \Theta_{D,(\tt-1)}}$
        - $D$ is a maximizer of $\mathbb{E}_{\x \in p_\text{data}} \log D(\x) + \mathbb{E}_{\z \in p_\z} \log ( \, 1 - D(G(\z)) \, )$
- Train $G_{\Theta_{G, (\tt-1)}}$ on random samples $\z$
    - Create samples $\hat{\x}_\tp = G_{\Theta_{G, (\tt-1)}}(\z)  \in p_{\text{model}, (\tt-1)}$
    - Have Discriminator $D_{\Theta_{D, \tp}}$ evaluate $D_{\Theta_{D,\tp}} ( \hat{\x}_\tp )$
    - Update $\Theta_{G, (\tt-1)}$ to $\Theta_{G, \tp}$ via gradient $\frac{\partial \loss_G}{\partial \Theta_{G,(\tt-1)}}$
        - $G$ is a minimizer of $\mathbb{E}_{\z \in p_\z} \log ( \, 1 - D(G(\z)) \, )$
            - i.e., want $D(G(\z))$ to be high
    - May update $G$ multiple times per update of $D$
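
The alternating updates above can be sketched on a one-dimensional toy problem using only the Python standard library. Everything specific here is an assumption made for illustration: real data drawn from a normal distribution with mean 3, an affine Generator $G(\z) = a \z + b$, a logistic Discriminator $D(\x) = \sigma(w \x + c)$, hand-derived gradients, one update of each network per iteration, and the hypothetical function name `train_toy_gan`. It is not the code from the linked Keras examples.

```python
import math
import random

def sigmoid(a):
    # Numerically stable logistic function
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def train_toy_gan(steps=300, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # Theta_G: G(z) = a*z + b
    w, c = 0.0, 0.0  # Theta_D: D(x) = sigmoid(w*x + c)
    history = []
    for t in range(1, steps + 1):
        # Discriminator step: real samples plus fakes from the current Generator
        real = [rng.gauss(3.0, 1.0) for _ in range(batch)]  # assumed p_data
        fake = [a * rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        d_real = [sigmoid(w * x + c) for x in real]
        d_fake = [sigmoid(w * x + c) for x in fake]
        # Gradient of mean[log D(x)] + mean[log(1 - D(x_hat))] w.r.t. (w, c)
        gw = (sum((1 - d) * x for d, x in zip(d_real, real))
              - sum(d * x for d, x in zip(d_fake, fake))) / batch
        gc = (sum(1 - d for d in d_real) - sum(d_fake)) / batch
        w, c = w + lr * gw, c + lr * gc  # gradient ascent: D maximizes

        # Generator step: fresh z, scored by the updated Discriminator
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        d_fake = [sigmoid(w * (a * z + b) + c) for z in zs]
        # Gradient of mean[log(1 - D(G(z)))] w.r.t. (a, b), via the chain rule
        ga = sum(-d * w * z for d, z in zip(d_fake, zs)) / batch
        gb = sum(-d * w for d in d_fake) / batch
        a, b = a - lr * ga, b - lr * gb  # gradient descent: G minimizes
        history.append((t, b, sum(d_fake) / batch))
    return (a, b), (w, c), history

theta_G, theta_D, history = train_toy_gan()
print(history[-1])  # (iteration, Generator offset b, mean D(G(z)))
```

In the first iteration the Discriminator learns that larger values look more Real, so the Generator's offset `b` moves up toward the real mean. Nothing in this sketch guarantees convergence: GAN training is a game rather than a single minimization, and the parameters may oscillate around the stand-off instead of settling.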

**Training code for a simple GAN**

[Here](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/generative/ipynb/dcgan_overriding_train_step.ipynb#scrollTo=AOO8AqLy86jb) is the code for the training step of a simple GAN.

# Code

- [GAN on Colab](https://keras.io/examples/generative/dcgan_overriding_train_step/)
- [Wasserstein GAN with Gradient Penalty](https://keras.io/examples/generative/wgan_gp/#create-the-wgangp-model)