# GAN Loss Functions

## Introduction

Training a GAN relies on a small set of loss functions that define the objectives of the generator and the discriminator. In this notebook, we will discuss the following loss functions:

1. Discriminator Loss
2. Generator Loss
3. Binary Cross Entropy Loss
4. Wasserstein Loss
5. KL Divergence Loss
6. JSD Loss

## GAN Loss in Terms of Cross-Entropy Loss

### Discriminator Loss

The discriminator $D(r)$ outputs the probability that $r$ is real (i.e., sampled from $p_r$).

- For real samples $r \sim p_{r}$, the true label is $1$, and the discriminator wants to maximize $\log D(r)$.

- For fake samples $f \sim p_{f}$ (i.e., $f = G(z)$ for a noise sample $z$), the true label is $0$, and the discriminator wants to maximize $\log(1 - D(f))$.

Maximizing both terms is the same as minimizing their negated sum, so the discriminator's loss can be written as:
$$
L_D = - \mathbb{E}_{R} [\log D(R)] - \mathbb{E}_{Z} [\log(1 - D(G(Z)))]
$$

This is equivalent to the expected binary cross-entropy loss for the discriminator:
$$
L_D = \mathbb{E}_{R} [\text{BCE}(D(R), 1)] + \mathbb{E}_{Z} [\text{BCE}(D(G(Z)), 0)]
$$

where:

$$
\text{BCE}(p, y) = -[y \log p + (1-y) \log (1-p)]
$$

is the binary cross-entropy loss, and:
- $D(R)$: The discriminator's prediction for real data.
- $D(G(Z))$: The discriminator's prediction for generated data.
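
The relationship between the two forms of $L_D$ can be checked numerically. The following is a minimal sketch in plain Python, not a training implementation: the discriminator outputs are made-up numbers, `bce` and `discriminator_loss` are hypothetical helper names, and the `eps` clamp is an added assumption to avoid evaluating `log(0)`.

```python
import math

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0) (assumption, not in the text)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """Sample-average estimate of L_D from lists of discriminator outputs."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)  # label 1 for real data
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)  # label 0 for generated data
    return real_term + fake_term

# Hypothetical discriminator outputs, made up for illustration.
d_real = [0.9, 0.8, 0.95]  # D(R) on real samples
d_fake = [0.1, 0.3, 0.05]  # D(G(Z)) on generated samples

loss = discriminator_loss(d_real, d_fake)

# The same quantity written directly as -E[log D(R)] - E[log(1 - D(G(Z)))].
direct = (-sum(math.log(p) for p in d_real) / len(d_real)
          - sum(math.log(1.0 - p) for p in d_fake) / len(d_fake))

print(loss, direct)
```

Because $\text{BCE}(p, 1) = -\log p$ and $\text{BCE}(p, 0) = -\log(1 - p)$, `loss` and `direct` agree up to floating-point error.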

### Generator Loss

The generator $G$ aims to produce samples $G(Z)$ that the discriminator classifies as real. Its objective is to maximize $\log D(G(Z))$, which is equivalent to minimizing:
$$
L_G = - \mathbb{E}_{Z} [\log D(G(Z))]
$$

This is also an expected binary cross-entropy loss, where the target label for generated samples is $1$ (i.e., the generator wants the discriminator to classify fake samples as real):
$$
L_G = \mathbb{E}_{Z} [\text{BCE}(D(G(Z)), 1)]
$$
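
As a small illustration, the generator loss can be estimated from a batch of discriminator outputs on generated samples. This is a sketch under stated assumptions: the probabilities are made up, `generator_loss` is a hypothetical helper name, and the `eps` floor is added only to avoid `log(0)`.

```python
import math

def generator_loss(d_fake, eps=1e-12):
    """Sample-average estimate of L_G: mean of -log D(G(z)), i.e. BCE against label 1."""
    return -sum(math.log(max(p, eps)) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs on generated samples.
fooled = generator_loss([0.9, 0.8, 0.95])    # discriminator mostly believes the fakes
rejected = generator_loss([0.1, 0.3, 0.05])  # discriminator mostly rejects the fakes

print(fooled, rejected)
```

The loss is small when the discriminator assigns high probability to generated samples and large when it rejects them, which is the behavior the generator is trained to exploit.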


### Alternate Formulation for Generator Loss

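In the minimax formulation of the GAN objective, the generator minimizes $\log(1 - D(G(Z)))$ instead of maximizing $\log D(G(Z))$, leading to:
$$
L_G' = \mathbb{E}_{Z} [\log(1 - D(G(Z)))]
$$

Note that there is no leading minus sign here, since the quantity is minimized directly. This loss pushes the generator in the same direction (toward larger $D(G(Z))$), but it saturates: its gradient is weak when $D(G(Z))$ is close to $0$, which is typical early in training, when the discriminator easily rejects generated samples. Therefore, the $\log D(G(Z))$ version (the non-saturating loss) is more commonly used for stable training.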
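
The difference in gradient strength can be made concrete. Assuming the discriminator output is a sigmoid of a logit $a$, so that $D(G(z)) = \sigma(a)$ (an assumption for this sketch, not something stated above), the derivative of $\log(1 - \sigma(a))$ with respect to $a$ is $-\sigma(a)$, while the derivative of $-\log \sigma(a)$ is $-(1 - \sigma(a))$. The function names below are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    """d/da of log(1 - sigmoid(a)), which equals -sigmoid(a)."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """d/da of -log(sigmoid(a)), which equals -(1 - sigmoid(a))."""
    return -(1.0 - sigmoid(a))

# Sweep from a confidently rejected fake (very negative logit) to an accepted one.
for a in (-6.0, -2.0, 0.0, 2.0):
    d = sigmoid(a)
    print(f"D={d:.4f}  saturating={grad_saturating(a):+.4f}  "
          f"non-saturating={grad_non_saturating(a):+.4f}")
```

When $D(G(z))$ is near $0$, the gradient of the $\log(1 - D(G(z)))$ term is close to zero, whereas the gradient of the $-\log D(G(z))$ term has magnitude close to $1$, so the generator still receives a useful learning signal.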

## Connection to Cross-Entropy

Ultimately, both the discriminator and generator losses in GANs are defined using binary cross-entropy loss. The difference is in how the true labels are assigned:
- For the discriminator: $1$ for real samples, $0$ for fake samples.
- For the generator: $1$ for generated samples (as it tries to fool the discriminator).
