# Generative Adversarial Network

Original Paper: https://arxiv.org/abs/1406.2661

Given a generator $G$ and a discriminator $D$, the value function $V(D, G)$ is defined as:

$$
V(D, G) 
= \mathop{\mathbb{E}}_{\textbf{x}} \big[\ \text{log}\ D(\textbf{x})\ \big]
+ \mathop{\mathbb{E}}_{\textbf{z}} \big[\ \text{log}\ (1 - D(G(\textbf{z})))\ \big]
$$

In this equation, $\textbf{x} \sim  p_{\text{data}}(\textbf{x})$ is a random sample from the training data,
while $\textbf{z} \sim \text{U}$ is a random noise sample drawn from a uniform distribution.
The first term in the sum rewards positive responses to instances from the training data, while
the second term rewards negative responses to samples from the generator.
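
As a rough illustration, both expectations can be estimated by averaging over samples. The sketch below is not from the paper: the affine `G`, the logistic `D`, and the Gaussian stand-in for the training data are all hypothetical choices made only so the example runs.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(z):
    # Hypothetical generator: an affine map of the noise (assumption).
    return 2.0 * z + 3.0

def D(x):
    # Hypothetical discriminator: a logistic function of x (assumption).
    return 1.0 / (1.0 + np.exp(-(x - 4.0)))

def estimate_value(D, G, x, z):
    """Monte Carlo estimate of V(D, G) from samples x and z."""
    real_term = np.mean(np.log(D(x)))            # E_x[log D(x)]
    fake_term = np.mean(np.log(1.0 - D(G(z))))   # E_z[log(1 - D(G(z)))]
    return real_term + fake_term

x = rng.normal(loc=5.0, scale=0.5, size=10_000)  # stand-in for training data
z = rng.uniform(0.0, 1.0, size=10_000)           # z ~ U(0, 1)
v = estimate_value(D, G, x, z)
print(v)
```

Because $D$ outputs probabilities, both logarithms are non-positive, so the estimate is never above zero; a discriminator that scores real samples high and generated samples low pushes it towards zero.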

During training the models are updated one at a time. The cost of the discriminator is calculated on a batch of $m$ pairs of $(\textbf{x}^{(i)}, \textbf{z}^{(i)})$ as follows:

$$
C_D
= -\frac{1}{m} 
\sum_{i=1}^m
{\big[\ 
    \text{log}\ D(\textbf{x}^{(i)})
    + \text{log}\ (1 - D(G(\textbf{z}^{(i)})))\ 
\big]\ 
}
$$
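
A minimal sketch of this batch cost, assuming the discriminator's outputs on the batch are already available as arrays. The function name, the argument names `d_real` and `d_fake`, and the `eps` clipping (added to avoid `log(0)`) are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def discriminator_cost(d_real, d_fake, eps=1e-12):
    """C_D for a batch of m pairs.

    d_real[i] = D(x_i) and d_fake[i] = D(G(z_i)), both in [0, 1].
    """
    # Clip so that neither logarithm receives exactly 0 (assumption).
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), 0.0, 1.0 - eps)
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

# A confident, correct discriminator yields a low cost.
good = discriminator_cost([0.9, 0.8], [0.1, 0.2])
# A discriminator that always answers 0.5 yields log(4).
unsure = discriminator_cost([0.5, 0.5], [0.5, 0.5])
print(good, unsure)
```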

The generator is trained on a batch of $m$ samples $\textbf{z}^{(i)}$ with the following cost:

$$
C_G
= -\frac{1}{m} 
\sum_{i=1}^m
{\big[\ 
    \text{log}\ (D(G(\textbf{z}^{(i)})))\ 
\big]\ 
}
$$

Note that the generator never sees the training data directly. Instead, it learns to map $\textbf{z}^{(i)}$ to samples whose distribution resembles $p_{\text{data}}(\textbf{x})$ by maximizing the probability that the discriminator responds positively to its output $G(\textbf{z}^{(i)})$. The term inside the log, $D(G(\textbf{z}^{(i)}))$, differs from the corresponding term in the discriminator's cost, $1 - D(G(\textbf{z}^{(i)}))$: rather than minimizing $\text{log}\ (1 - D(G(\textbf{z}^{(i)})))$, the generator minimizes $-\text{log}\ D(G(\textbf{z}^{(i)}))$. This adjustment increases the magnitude of the gradient when the generator's samples are poor (e.g. $D(G(\textbf{z}))$ is almost always close to 0).
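
The gradient claim can be checked numerically. The sketch below assumes, purely for illustration, that the discriminator ends in a sigmoid, so that $D(G(\textbf{z})) = \sigma(a)$ for some logit $a$; the function names are hypothetical. It compares the derivative with respect to $a$ of the two candidate generator objectives.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def generator_cost(d_fake, eps=1e-12):
    """C_G for a batch, where d_fake[i] = D(G(z_i))."""
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0)
    return -np.mean(np.log(d_fake))

def grad_saturating(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)

def grad_non_saturating(a):
    # d/da [-log sigmoid(a)] = sigmoid(a) - 1
    return sigmoid(a) - 1.0

# Negative logits mean the discriminator confidently rejects the sample.
for a in (-6.0, 0.0, 6.0):
    print(a, sigmoid(a), grad_saturating(a), grad_non_saturating(a))
```

Under this assumption, when the discriminator rejects a sample with high confidence ($\sigma(a)$ near 0), the derivative of $\text{log}\ (1 - D(G(\textbf{z})))$ is close to 0, while the derivative of $-\text{log}\ D(G(\textbf{z}))$ has magnitude close to 1.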

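Combined, the two updates approximate a minimax optimization. At its optimum, the distribution of the generator's samples matches the training data distribution and the discriminator can no longer tell the two apart: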

$$ \min_G\max_D V(D,G) $$

$$ G(\textbf{z}) \rightarrow p_{\text{data}}(\textbf{x}) $$

$$ D(G(\textbf{z})) \rightarrow 0.5 $$
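
For reference, the value 0.5 follows from a result in the original paper. The symbol $p_g$, the distribution of the generator's samples $G(\textbf{z})$, is borrowed from the paper and is not defined elsewhere in these notes. For a fixed $G$, the discriminator that maximizes $V(D, G)$ is:

$$
D^*(\textbf{x})
= \frac{p_{\text{data}}(\textbf{x})}{p_{\text{data}}(\textbf{x}) + p_g(\textbf{x})}
$$

When the generator matches the data distribution, $p_g = p_{\text{data}}$, this gives $D^*(\textbf{x}) = \frac{1}{2}$ everywhere, and the value function becomes:

$$
V(D^*, G)
= \text{log}\ \tfrac{1}{2} + \text{log}\ \tfrac{1}{2}
= -\text{log}\ 4
$$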