# Best Neural Networks for GANs

In a GAN, the discriminator $D$ and the generator $G$ are trained simultaneously. Training is a minimax game: $D$ tries to maximize the probability of assigning the correct label (real or generated) to both training samples and generated samples, while $G$ tries to minimize the probability that $D$ correctly labels its generated samples. Training alternates between the two until, ideally, the generator produces data that $D$ cannot distinguish from the training data.
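In the formulation most texts use, with $D$ outputting the probability that its input is real, this game can be written with a single value function $V$ that $D$ maximizes and $G$ minimizes. Here $p_{\text{data}}$ (the training-data distribution) and $p_z$ (the noise prior) are notation introduced only for this equation:

$$
\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$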

## Discriminator Loss Optimization

We want to estimate the best discriminator, i.e. the weight parameters $W_D$ of the $D$ neural network that minimize the discriminator loss while $W_G$ is held fixed. $D$ outputs the probability that its input is real, so the discriminator loss is the binary cross-entropy (BCE) with target 1 for training samples and target 0 for generated samples:

$$
\arg\min_{W_D} L_D(W_D, W_G), \qquad L_D(W_D, W_G) = \sum_{i} \Big( BCE\big(D(x_i; W_D), 1\big) + BCE\big(D(G(z_i; W_G); W_D), 0\big) \Big) = -\sum_{i} \Big( \log D(x_i; W_D) + \log\big(1 - D(G(z_i; W_G); W_D)\big) \Big)
$$

and the binary cross-entropy of a predicted probability $p$ against a target label $t \in \{0, 1\}$ is defined as:

$$
BCE(p, t) = -t \log p - (1 - t) \log(1 - p)
$$

where:
- $x_i$ is the $i$-th training sample
- $z_i$ is the $i$-th noise sample fed to the generator
- $W_D$ are the weight parameters of the $D$ neural network
- $W_G$ are the weight parameters of the $G$ neural network
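A minimal sketch of the discriminator loss in plain Python, assuming the discriminator's outputs on a batch are already available as lists of probabilities that each input is real (so training samples are scored against target 1 and generated samples against target 0). `bce` and `discriminator_loss` are hypothetical helpers, not a library API; a real implementation would typically use a framework's built-in BCE on tensors. The small `eps` clip only avoids evaluating `log(0)`.

```python
import math


def bce(p: float, t: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy BCE(p, t) = -t log p - (1 - t) log(1 - p).

    p is clipped to [eps, 1 - eps] so log(0) is never evaluated.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -t * math.log(p) - (1.0 - t) * math.log(1.0 - p)


def discriminator_loss(d_real: list[float], d_fake: list[float]) -> float:
    """Sum over i of BCE(D(x_i), 1) + BCE(D(G(z_i)), 0).

    d_real[i] stands for D(x_i; W_D) and d_fake[i] for D(G(z_i; W_G); W_D),
    both read as the discriminator's probability that the input is real.
    """
    return sum(bce(r, 1.0) + bce(f, 0.0) for r, f in zip(d_real, d_fake))


# A confident, correct discriminator gets a small loss...
good = discriminator_loss([0.9, 0.95], [0.05, 0.1])
# ...while one that swaps real and generated samples is heavily penalized.
bad = discriminator_loss([0.1, 0.05], [0.95, 0.9])
print(f"good D: {good:.3f}, bad D: {bad:.3f}")
```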

## Generator Loss Optimization

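We want to estimate the best generator, i.e. the weight parameters $W_G$ of the $G$ neural network that minimize the generator loss while $W_D$ is held fixed. The generator wants its samples to be classified as real, so its loss is the BCE of the discriminator's output on generated samples against target 1:

$$
\arg\min_{W_G} L_G(W_G, W_D), \qquad L_G(W_G, W_D) = \sum_{i} BCE\big(D(G(z_i; W_G); W_D), 1\big) = -\sum_{i} \log D(G(z_i; W_G); W_D)
$$

or alternatively, the original minimax form, which minimizes the log-probability that $D$ correctly rejects the generated samples:

$$
\arg\min_{W_G} \sum_{i} \log\big(1 - D(G(z_i; W_G); W_D)\big)
$$

Both objectives push $D(G(z_i))$ toward 1, but the first has been observed to train faster and more stably. Early in training $D$ confidently rejects generated samples ($D(G(z_i)) \approx 0$); there $\log(1 - D(G(z_i)))$ saturates and gives $G$ almost no gradient, while $-\log D(G(z_i))$ still gives a strong one.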
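A small numerical check of the saturation argument, assuming (as is common, though not stated in this text) that $D$ produces its probability through a sigmoid, $D(G(z)) = \sigma(a)$ with logit $a$. Differentiating with respect to $a$ gives $\frac{d}{da}\log(1 - \sigma(a)) = -\sigma(a)$ and $\frac{d}{da}\big(-\log \sigma(a)\big) = -(1 - \sigma(a))$. The helper names below are hypothetical.

```python
import math


def sigmoid(a: float) -> float:
    """Probability D(G(z)) produced from the discriminator's logit a."""
    return 1.0 / (1.0 + math.exp(-a))


def minimax_grad(a: float) -> float:
    """d/da of log(1 - sigmoid(a)), the minimax generator term (to be minimized)."""
    return -sigmoid(a)


def non_saturating_grad(a: float) -> float:
    """d/da of -log(sigmoid(a)) = BCE(D(G(z)), 1), the non-saturating loss."""
    return -(1.0 - sigmoid(a))


# a = -6: D confidently rejects the fake (early training); a = 6: D is fooled.
for a in (-6.0, 0.0, 6.0):
    print(
        f"D(G(z))={sigmoid(a):.4f}  "
        f"minimax grad={minimax_grad(a):+.4f}  "
        f"non-saturating grad={non_saturating_grad(a):+.4f}"
    )
```

At $a = -6$ ($D(G(z)) \approx 0.0025$) the minimax gradient is about $-0.0025$ while the non-saturating gradient is about $-0.9975$. Both have the same sign, so both push $D(G(z))$ upward, but only the second does so with a useful magnitude.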

$(W_G, W_D)$ denotes the pair of weight parameters of the $G$ and $D$ neural networks.

## Optimal Discriminator & Generator

For a given generator $G$ with fixed $W_G$, the optimal discriminator minimizes the discriminator loss $L_D$ over $W_D$:

$$
Opt_D(W_G) = \arg\min_{W_D} L_D(W_D, W_G)
$$

For a given discriminator $D$ with fixed $W_D$, the optimal generator minimizes the generator loss $L_G$ over $W_G$:

$$
Opt_G(W_D) = \arg\min_{W_G} L_G(W_G, W_D)
$$


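$$
Opt(W_G, W_D) = \big(Opt_G(W_D), Opt_D(W_G)\big)
$$

maps a pair of weights to the pair of best responses: each network's optimal weights given the other network's current weights.

Note the domain and range of $Opt_D$ and $Opt_G$. Each maps one network's weights to the other network's weights, so combining them in $Opt$ (in the order above) gives a map from the space of weight pairs $(W_G, W_D)$ to itself:

$$
Opt_D: \mathbb{R}^{\text{tensor\_shape}(W_G)} \rightarrow \mathbb{R}^{\text{tensor\_shape}(W_D)}
$$

$$
Opt_G: \mathbb{R}^{\text{tensor\_shape}(W_D)} \rightarrow \mathbb{R}^{\text{tensor\_shape}(W_G)}
$$

$$
Opt: \mathbb{R}^{\text{tensor\_shape}(W_G)} \times \mathbb{R}^{\text{tensor\_shape}(W_D)} \rightarrow \mathbb{R}^{\text{tensor\_shape}(W_G)} \times \mathbb{R}^{\text{tensor\_shape}(W_D)}
$$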
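The best-response maps can be made concrete on a deliberately tiny, hypothetical game in which each "network" has a single scalar weight in $[0, 1]$ and the losses are the toy quadratics `loss_d` and `loss_g` below (assumptions for illustration, not GAN losses). Each best response is computed as a numerical argmin with ternary search, which assumes the loss is convex in the weight being optimized.

```python
from typing import Callable


def argmin_1d(f: Callable[[float], float], lo: float = 0.0, hi: float = 1.0,
              iters: int = 100) -> float:
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # minimizer cannot lie in (m2, hi]
        else:
            lo = m1  # minimizer cannot lie in [lo, m1)
    return (lo + hi) / 2


# Hypothetical toy losses, NOT real GAN losses.
def loss_d(w_d: float, w_g: float) -> float:
    return (w_d - 0.5 * w_g) ** 2


def loss_g(w_g: float, w_d: float) -> float:
    return (w_g - (1.0 - w_d)) ** 2


def opt_d(w_g: float) -> float:
    """Opt_D: W_G -> W_D, the best discriminator weight for a fixed generator."""
    return argmin_1d(lambda w_d: loss_d(w_d, w_g))


def opt_g(w_d: float) -> float:
    """Opt_G: W_D -> W_G, the best generator weight for a fixed discriminator."""
    return argmin_1d(lambda w_g: loss_g(w_g, w_d))


def opt(w_g: float, w_d: float) -> tuple[float, float]:
    """Opt(W_G, W_D) = (Opt_G(W_D), Opt_D(W_G)): a pair in, a pair out."""
    return opt_g(w_d), opt_d(w_g)


print(tuple(round(w, 6) for w in opt(1.0, 0.0)))  # (1.0, 0.5)
```

Because `opt` returns its output in the same `(W_G, W_D)` order it receives, it can be applied to its own output, which is what a fixed-point argument needs.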

## Brouwer's Fixed Point Theorem

Let $X$ be a nonempty, compact, convex subset of a Euclidean space and let $f: X \to X$ be continuous. Then there exists at least one point $x^* \in X$ such that:

$$
f(x^*) = x^*
$$
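In one dimension the theorem is easy to see: with $X = [0, 1]$, the function $h(x) = f(x) - x$ satisfies $h(0) \ge 0$ and $h(1) \le 0$, so by the intermediate value theorem it has a root, which is a fixed point of $f$. The sketch below locates it by bisection; `fixed_point_1d` is a hypothetical helper, and $f = \cos$ (which maps $[0, 1]$ into $[\cos 1, 1] \subset [0, 1]$) is just an example. This is intuition for the 1-D case only.

```python
import math
from typing import Callable


def fixed_point_1d(f: Callable[[float], float], tol: float = 1e-12) -> float:
    """Find x in [0, 1] with f(x) = x, for a continuous f mapping [0, 1] into itself.

    Invariant: h(lo) >= 0 and h(hi) <= 0 for h(x) = f(x) - x, so a root
    (a fixed point of f) always lies in [lo, hi].
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) - mid >= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_star = fixed_point_1d(math.cos)
print(f"{x_star:.6f}")  # 0.739085
```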

### Application to GANs

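$Opt$ maps the space of weight pairs to itself. Brouwer's theorem applies to it if we assume that:
- the weights are restricted to a nonempty compact convex set $X$ (for example, each weight clipped to a bounded interval), and $Opt$ maps $X$ into $X$;
- each best response is unique, so $Opt$ is a well-defined function, and it depends continuously on the other network's weights.

Under these assumptions, there exists a fixed point $s$ of $Opt$:

$$
Opt(s) = s
$$

Let's call that $s = (W^{*}_G, W^{*}_D)$: each of the two weight vectors is the best response to the other.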
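Continuing with a hypothetical toy game (one scalar weight per network in $[0, 1]$, quadratic losses `loss_d` and `loss_g` that are assumptions, not GAN losses), the fixed point can be found and checked directly. Here the best responses are written in closed form, and repeatedly applying `opt` happens to converge because two applications of this particular map halve the distance to the fixed point. In general Brouwer's theorem guarantees only existence, and iterating $Opt$ need not converge.

```python
# Hypothetical toy game: one scalar weight per network, restricted to [0, 1].
def loss_d(w_d: float, w_g: float) -> float:
    return (w_d - 0.5 * w_g) ** 2


def loss_g(w_g: float, w_d: float) -> float:
    return (w_g - (1.0 - w_d)) ** 2


def clip01(w: float) -> float:
    return min(max(w, 0.0), 1.0)


def opt(w_g: float, w_d: float) -> tuple[float, float]:
    """Closed-form argmins of the toy losses, clipped to [0, 1]: (Opt_G(W_D), Opt_D(W_G))."""
    return clip01(1.0 - w_d), clip01(0.5 * w_g)


s = (0.0, 0.0)
for _ in range(200):
    s = opt(*s)  # iterate the best-response map
w_g_star, w_d_star = s
print(f"W_G* = {w_g_star:.6f}, W_D* = {w_d_star:.6f}")  # W_G* = 0.666667, W_D* = 0.333333

# Nash check: changing only one player's weight never lowers that player's own loss.
grid = [k / 1000 for k in range(1001)]
assert all(loss_d(w_d_star, w_g_star) <= loss_d(w, w_g_star) + 1e-12 for w in grid)
assert all(loss_g(w_g_star, w_d_star) <= loss_g(w, w_d_star) + 1e-12 for w in grid)
```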

## Conclusion: GANs Existence

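This is another way of saying that, under the assumptions above, the generator and discriminator have a Nash equilibrium: at $s$ each network's weights are a best response to the other's, so neither $G$ nor $D$ can lower its own loss by changing only its own weights. Existence alone does not guarantee that training actually converges to this point.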

