# Demystifying the GAN Loss Function

Now that we have seen in detail how GANs work, let us examine the GAN loss function. Before going ahead, let us recap the notation:

* The noise fed as input to the generator is represented by $z$

* The uniform or normal distribution from which the noise $z$ is sampled is represented by $p_z$

* An input image is represented by $x$

* The real data distribution, i.e., the distribution of our training set, is represented by $p_r$

* The fake data distribution, i.e., the distribution of the generator, is represented by $p_g$

When we write $x \sim p_{r}(x)$, it means that the image $x$ is sampled from the real distribution $p_r$.
Similarly, $x \sim p_{g}(x)$ denotes that the image $x$ is sampled from the generator
distribution $p_g$, and $z \sim p_{z}(z)$ means that the generator input $z$ is sampled from the
noise distribution $p_z$.

As we learned, both the generator and the discriminator are neural networks, and both
update their parameters through backpropagation. We need to find the
optimal generator parameters $\theta_g$ and discriminator parameters $\theta_d$.

## Discriminator Loss 

Now let us look at the loss function of the discriminator. We know that the goal of the
discriminator is to classify whether an image is real or fake. Let us denote the
discriminator by $D$.

The loss function of the discriminator is given as:

$$\max _{d} L(D, G)=\mathbb{E}_{x \sim p_{r}(x)}\left[\log D\left(x ; \theta_{d}\right)\right]+\mathbb{E}_{z \sim p_{z}(z)}\left[\log \left(1-D\left(G\left(z ; \theta_{g}\right) ; \theta_{d}\right)\right)\right]$$

What does this mean? Let us look at each term in turn.

### First term

Let us look at the first term,

$$ \mathbb{E}_{x \sim p_{r}} \log (D(x))$$

* $x \sim p_{r}(x)$ means we are sampling the input $x$ from the real data distribution $p_r$, so $x$ is a
real image.

* $D(x)$ means we are feeding the input image $x$ to the discriminator $D$, which
returns the probability that $x$ is a real image.

Since we know that $x$ is a real image, i.e., it comes from the real data distribution, the discriminator should assign it as high a probability $D(x)$ as possible:

$$\max D(x)$$

But instead of maximizing raw probabilities, we maximize log probabilities, as we learned in
chapter 7. So we can write:

$$ \max \log D(x)$$

So our final equation becomes:

$$\max \mathbb{E}_{x \sim p_{r}(x)}[\log D(x)]$$

__$\mathbb{E}_{x \sim p_{r}(x)}[\log D(x)]$ is the expected log likelihood that
input images sampled from the real data distribution are classified as real.__
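
In practice the expectation is not computed exactly; it is estimated by averaging over a mini-batch. The following is a minimal sketch of that estimate. The function name `expected_log_real` and the discriminator outputs are hypothetical values chosen for illustration, not part of any library.

```python
import math

def expected_log_real(d_real):
    """Sample-mean estimate of E_{x ~ p_r}[log D(x)].

    d_real holds hypothetical discriminator outputs D(x), each in (0, 1),
    for a mini-batch of real images.
    """
    return sum(math.log(p) for p in d_real) / len(d_real)

# Hypothetical outputs: a confident discriminator vs. an unsure one.
confident = expected_log_real([0.9, 0.95, 0.99])
unsure = expected_log_real([0.5, 0.6, 0.4])
print(confident, unsure)
```

Both values are negative, because each $D(x)$ is below 1, and the confident discriminator gets the larger value, which is what maximizing this term rewards.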

### Second term

Now, let us look at the second term

$$\mathbb{E}_{z \sim p_{z}(z)}[\log (1-D(G(z)))]$$


* $z \sim p_{z}(z)$ means we are sampling random noise $z$ from the noise distribution $p_z$.

* $G(z)$ means the generator $G$ takes the random noise $z$ as input and returns an
image based on its implicitly learned distribution $p_g$.

* $D(G(z))$ means we are feeding the image generated by the generator to the
discriminator $D$, which returns the probability that the input image is real.


If we subtract $D(G(z))$ from 1, we get the probability that the input image is
a fake image:

$$1-D(G(z))$$

Since we know $G(z)$ is not a real image, the discriminator should maximize this probability,
i.e., the discriminator maximizes the probability that $G(z)$ is classified as a fake image. So we write:

$$\max 1-D(G(z))$$

Instead of maximizing raw probabilities, we maximize the log probability, so we write:

$$ \max \log (1-D(G(z)))$$

__$\mathbb{E}_{z \sim p_{z}(z)}[\log (1-D(G(z)))]$ is the expected log
likelihood that input images generated by the generator are classified as fake.__
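
As a quick numerical illustration, assume the natural logarithm and two hypothetical discriminator outputs for a generated image, one low and one high:

$$D(G(z))=0.1 \;\Rightarrow\; \log (1-0.1)=\log 0.9 \approx -0.105, \qquad D(G(z))=0.9 \;\Rightarrow\; \log (1-0.9)=\log 0.1 \approx -2.303$$

The term is larger (closer to zero) when the discriminator assigns a low probability of being real to the generated image, so maximizing it pushes $D(G(z))$ toward 0.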

### Final term

So, combining these two terms, the loss function of the discriminator is given as:

$$ \max _{d} L(D, G)=\mathbb{E}_{x \sim p_{r}(x)}\left[\log D\left(x ; \theta_{d}\right)\right]+\mathbb{E}_{z \sim p_{z}(z)}\left[\log \left(1-D\left(G\left(z ; \theta_{g}\right) ; \theta_{d}\right)\right)\right]$$

where $\theta_d$ and $\theta_g$ are the parameters of the discriminator and generator networks,
respectively.
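
To make the combined objective concrete, here is a small sketch that evaluates it on a toy problem. Everything in it is an assumption made for illustration: "images" are single numbers, the discriminator is a logistic regression with parameters `theta_d = (w, b)`, the generator is an affine map with parameters `theta_g = (scale, shift)`, and the real data is taken to be Gaussian. Real GANs use neural networks for both.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def discriminator(x, theta_d):
    """Toy D(x; theta_d): logistic regression on a scalar 'image' x."""
    w, b = theta_d
    return sigmoid(w * x + b)

def generator(z, theta_g):
    """Toy G(z; theta_g): an affine map of the noise z."""
    scale, shift = theta_g
    return scale * z + shift

def discriminator_objective(real_batch, noise_batch, theta_d, theta_g):
    """Mini-batch estimate of E[log D(x)] + E[log(1 - D(G(z)))]."""
    real_term = sum(
        math.log(discriminator(x, theta_d)) for x in real_batch
    ) / len(real_batch)
    fake_term = sum(
        math.log(1.0 - discriminator(generator(z, theta_g), theta_d))
        for z in noise_batch
    ) / len(noise_batch)
    return real_term + fake_term

rng = random.Random(0)
real_batch = [rng.gauss(4.0, 0.5) for _ in range(256)]      # x ~ p_r (assumed Gaussian)
noise_batch = [rng.uniform(-1.0, 1.0) for _ in range(256)]  # z ~ p_z (uniform)
theta_g = (1.0, 0.0)  # this generator outputs values in [-1, 1]

# A discriminator that separates the two groups vs. one that always says 0.5.
good = discriminator_objective(real_batch, noise_batch, (2.0, -4.0), theta_g)
blind = discriminator_objective(real_batch, noise_batch, (0.0, 0.0), theta_g)
print(good, blind)
```

The discriminator that always outputs 0.5 scores $2 \log 0.5$, while the one that separates real from generated samples scores closer to zero, which is the direction the discriminator's maximization moves in.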

## Generator Loss

The loss function of the generator can be given as,

$$ \min _{g} L(D, G)=\mathbb{E}_{z \sim p_{z}(z)}\left[\log \left(1-D\left(G\left(z ; \theta_{g}\right) ; \theta_{d}\right)\right)\right]$$

We know that the goal of the generator is to fool the discriminator into classifying a fake image
as a real image.

In the previous section, we saw that $\mathbb{E}_{z \sim p_{z}(z)}[\log (1-D(G(z)))]$ is the expected log probability of classifying a generated image as
fake, and that the discriminator maximizes it in order to correctly classify
fake images as fake.

The generator wants the opposite. Since it wants to fool the
discriminator, it minimizes this expected log probability of its images being classified as fake. So the loss
function of the generator is given as:

$$\min _{g} L(D, G)=\mathbb{E}_{z \sim p_{z}(z)}\left[\log \left(1-D\left(G\left(z ; \theta_{g}\right) ; \theta_{d}\right)\right)\right]$$

## Total Loss


We just learned the loss functions of the generator and the discriminator. Combining these two
losses, our final loss function can be written as:

$$ \min _{G} \max _{D} L(D, G)=\mathbb{E}_{x \sim p_{r}(x)}[\log D(x)]+\mathbb{E}_{z \sim p_{z}(z)}[\log (1-D(G(z)))]$$


So our objective is a min-max objective function, i.e., maximization for the
discriminator and minimization for the generator. We find the optimal generator
parameters $\theta_g$ and discriminator parameters $\theta_d$ by backpropagating through the respective
networks.



So we perform gradient ascent, i.e., maximization, on the discriminator and update the discriminator parameters $\theta_d$ using the gradient:

$$ \nabla_{\theta_{d}} \frac{1}{m} \sum_{i=1}^{m}\left[\log D\left(\boldsymbol{x}^{(i)}\right)+\log \left(1-D\left(G\left(\boldsymbol{z}^{(i)}\right)\right)\right)\right]$$

And we perform gradient descent, i.e., minimization, on the generator and update the generator parameters $\theta_g$ using the gradient:

$$\nabla_{\theta_{g}} \frac{1}{m} \sum_{i=1}^{m} \log \left(1-D\left(G\left(\boldsymbol{z}^{(i)}\right)\right)\right)$$
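
Written out as update rules, and assuming plain gradient steps with step sizes $\eta_d$ and $\eta_g$ (symbols introduced here for illustration; in practice an optimizer of your choice would supply the step), the two updates would look like this:

$$\theta_{d} \leftarrow \theta_{d}+\eta_{d} \nabla_{\theta_{d}} \frac{1}{m} \sum_{i=1}^{m}\left[\log D\left(\boldsymbol{x}^{(i)}\right)+\log \left(1-D\left(G\left(\boldsymbol{z}^{(i)}\right)\right)\right)\right]$$

$$\theta_{g} \leftarrow \theta_{g}-\eta_{g} \nabla_{\theta_{g}} \frac{1}{m} \sum_{i=1}^{m} \log \left(1-D\left(G\left(\boldsymbol{z}^{(i)}\right)\right)\right)$$

The plus sign in the first rule is the ascent step and the minus sign in the second is the descent step.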

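However, optimizing the above generator objective does not work well in practice and causes
stability issues: early in training the discriminator can reject generated images with high confidence, in which case $\log (1-D(G(z)))$ saturates and the generator receives very little gradient. So we introduce a new form of loss called the heuristic loss.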

## Heuristic Loss

There is no change in the loss function of the discriminator. It is written as:

$$ \max _{d} L(D, G)=\mathbb{E}_{x \sim p_{r}(x)}\left[\log D\left(x ; \theta_{d}\right)\right]+\mathbb{E}_{z \sim p_{z}(z)}\left[\log \left(1-D\left(G\left(z ; \theta_{g}\right) ; \theta_{d}\right)\right)\right]$$

Now, let us look at the generator loss, 

$$ \min _{g} L(D, G)=\mathbb{E}_{z \sim p_{z}(z)}\left[\log \left(1-D\left(G\left(z ; \theta_{g}\right) ; \theta_{d}\right)\right)\right] $$

Can we change it into a maximization, just like the discriminator's? How can we do
that? We know that $1-D(G(z))$ returns the probability of the input image being fake, and
the generator is minimizing this probability.

Instead, we can write $D(G(z))$, which is the probability of the input image
being real, and have the generator maximize this probability. That is, the generator
maximizes the probability of the fake input image being classified as a real image. So the loss
function of our generator now becomes:

$$\max _{g} L(D, G)=\mathbb{E}_{z \sim p_{z}(z)}\left[\log \left(D\left(G\left(z ; \theta_{g}\right) ; \theta_{d}\right)\right)\right]$$
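
One way to see what this rewrite changes is to compare the gradients of the two generator objectives. The sketch below assumes the discriminator ends in a sigmoid applied to a raw score $a$ (a logit), so that $D(G(z))=\sigma(a)$; that architecture detail and the function names are assumptions for illustration. Under that assumption, $\frac{d}{da}\log(1-\sigma(a))=-\sigma(a)$ and $\frac{d}{da}\log \sigma(a)=1-\sigma(a)$.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_original(a):
    """d/da log(1 - sigmoid(a)) = -sigmoid(a)."""
    return -sigmoid(a)

def grad_heuristic(a):
    """d/da log(sigmoid(a)) = 1 - sigmoid(a)."""
    return 1.0 - sigmoid(a)

# Negative logits mean the discriminator is confident the image is fake.
for a in (-6.0, -2.0, 0.0, 2.0):
    print(
        f"D(G(z))={sigmoid(a):.4f}  "
        f"original={grad_original(a):+.4f}  "
        f"heuristic={grad_heuristic(a):+.4f}"
    )
```

When the discriminator confidently rejects a generated image ($D(G(z))$ near 0), the gradient of the original objective with respect to the logit is close to zero, while the gradient of the rewritten objective stays close to 1. The generator's parameters receive their gradient through this logit, so the rewritten objective gives a stronger learning signal exactly when the generator is doing badly.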

So, now we have the loss functions of both our discriminator and generator as maximization
objectives, i.e.,

$$\max _{d} L(D, G)=\mathbb{E}_{x \sim p_{r}(x)}\left[\log D\left(x ; \theta_{d}\right)\right]+\mathbb{E}_{z \sim p_{z}(z)}\left[\log \left(1-D\left(G\left(z ; \theta_{g}\right) ; \theta_{d}\right)\right)\right]$$

$$\max _{g} L(D, G)=\mathbb{E}_{z \sim p_{z}(z)}\left[\log \left(D\left(G\left(z ; \theta_{g}\right) ; \theta_{d}\right)\right)\right]$$



But if we minimize a loss instead of maximizing an objective, we can apply our favorite
gradient descent algorithms. How can we convert our maximization problem into a
minimization problem? It is simple: just add a negative sign.
So, our final loss function for the discriminator is given as:


$$ \boxed{L^{D}=-\mathbb{E}_{x \sim p_{r}(x)}[\log D(x)]-\mathbb{E}_{z \sim p_{z}(z)}[\log (1-D(G(z)))]}$$



and the generator loss is,

$$ \boxed{L^{G}=-\mathbb{E}_{z \sim p_{z}(z)}[\log (D(G(z)))]}$$
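
The two boxed losses can be sketched directly as functions of the discriminator's outputs on a mini-batch. The function names and the example probabilities below are hypothetical; in a real implementation `d_real` and `d_fake` would come from running the discriminator on real and generated images.

```python
import math

def discriminator_loss(d_real, d_fake):
    """L^D = -mean(log D(x)) - mean(log(1 - D(G(z))))."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -real_term - fake_term

def generator_loss(d_fake):
    """L^G = -mean(log D(G(z)))."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs for real and generated images.
d_real = [0.9, 0.8, 0.95]
d_fake = [0.1, 0.3, 0.2]
loss_d = discriminator_loss(d_real, d_fake)
loss_g = generator_loss(d_fake)
print(loss_d, loss_g)
```

Both losses are non-negative and are minimized with gradient descent. They have the same form as binary cross-entropy: the discriminator loss labels real images 1 and generated images 0, and the generator loss labels generated images 1.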

In the next section, we will learn how to use a GAN to generate images of handwritten digits.