<h1 style="font-size:30px;">Introduction to Generative Adversarial Networks</h1>

Generative Adversarial Networks (GANs) are a type of neural network architecture that can be used to generate synthetic data. Although GANs can, in principle, generate any type of data, they are best known for generating images. Depending on the architecture, they are trained as either fully unsupervised or semi-supervised models. As an example of what they can produce, take a look at the following images.

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-this-person-does-not-exist.png" width="900">

None of the people above actually exist! All these images were generated by a Generative Adversarial Network, which was trained on a large dataset of images of human faces. You can try this out yourself by visiting the website: <a href="https://thispersondoesnotexist.com/" target="_blank">This Person Does Not Exist</a>. Every time you refresh the page, a neural network running in the backend will generate a face that does not exist. We have chosen some of the more impressive examples as shown above. If you visit this website, you may notice that a small percentage of the generated images contain unrealistic artifacts or outright failures, but many of the images are stunning and hard to distinguish from photographs of real people.

Let's now take a step back and look at where this all started back in 2014.

The very first <a href="https://arxiv.org/pdf/1406.2661.pdf" target="_blank">paper on GANs</a> was published by Goodfellow et al. in 2014. At the time, it was a simple yet revolutionary idea, so much so that Yann LeCun (a pioneer of convolutional neural networks) said "***[GANs]... and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion***" (<a href="https://qr.ae/pvMkDi" target="_blank">source</a>). Since then, the number of research papers on GANs has exploded, and this remains a highly active field of research.

## Table of Contents

* [1 The General Architecture of GANs](#1-The-General-Architecture-of-GANs)
* [2 Applications and Types of GANs](#2-Applications-and-Types-of-GANs)
* [3 Pitfalls when Training GANs](#3-Pitfalls-when-Training-GANs)

## 1 The General Architecture of GANs

Since their inception in 2014, GANs have undergone various changes, and more sophisticated and powerful techniques have been developed over the years. But the underlying idea remains the same.

At its core, a GAN model is composed of two different neural networks. One is called the **Generator**, and the other is called the **Discriminator**. At a high level, the generator network is responsible for generating fake images that resemble the training set but are completely new (not part of the training set), and the discriminator network is responsible for deciding whether the images it is shown are real or fake. It is a bit like a game, where the generator tries to fool the discriminator by generating more and more realistic images, and the discriminator tries to detect that the images produced by the generator are fake. As such, both networks gradually train each other to be better at what they do.

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-architecture.png" width=900>

### 1.1 The General Intuition Behind GANs

The following description is from "Deep Learning with Python, 2nd Edition" (by Francois Chollet).

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

"An intuitive way to understand GANs is to imagine a forger trying to create a fake
Picasso painting. At first, the forger is pretty bad at the task. He mixes some of his
fakes with authentic Picassos and shows them all to an art dealer. The art dealer makes
an authenticity assessment for each painting and gives the forger feedback about what
makes a Picasso look like a Picasso. The forger goes back to his studio to prepare some
new fakes. As time goes on, the forger becomes increasingly competent at imitating
the style of Picasso, and the art dealer becomes increasingly expert at spotting fakes.
In the end, they have on their hands some excellent fake Picassos."

<hr style="border:none; height: 4px; background-color:#D3D3D3" />

A GAN is composed of two networks: a forger network (the Generator) and an expert network (the Discriminator), each being trained to best the other. In a GAN model, the generator and discriminator perform two different tasks:

* The generator tries to learn the distribution of the training dataset so that it can generate images similar to those.
* The discriminator works as a binary classifier. When we feed the original images from the dataset to the discriminator, it should be able to recognize them as real images. Similarly, when we feed it with the fake images generated by the generator, it should be able to recognize them as fake images as well.

As the training progresses, the generator tries to generate more realistic images which are similar to the training set so as to fool the discriminator. At the same time, the discriminator is also learning what the fake and real images look like so that it can correctly differentiate between the two. In such a manner, each network tries to outdo the other and simultaneously train each other to do a better job.

The generator always generates fake data from a latent noise vector. As training progresses it learns to generate more plausible images to fool the discriminator. When the generator is not able to generate real-enough images, the discriminator penalizes it for generating bad images. 
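One common way to formalize this game is the minimax objective from the original GAN paper (Goodfellow et al., 2014). The notation is introduced here for illustration only: $G$ is the generator, $D$ is the discriminator (whose output is the probability that its input is real), $p_{\text{data}}$ is the distribution of the real data, and $p_z$ is the distribution of the latent noise vector.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

The discriminator tries to maximize $V$ by assigning high probability to real samples and low probability to generated ones, while the generator tries to minimize $V$ by making $D(G(z))$ as large as possible. In practice, the generator is often trained to maximize $\log D(G(z))$ instead, which is what scoring fake images against "real" labels amounts to.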

### 1.2 Generative vs Discriminative Models

Before moving into the core details of GANs, it's worth covering the distinction between generative and discriminative models. Most machine learning and deep learning models can be categorized as either generative or discriminative.

As an example, discriminative classifiers model the direct mapping from inputs to outputs. These include CNNs used to classify data (such as ResNet, VGG, and MobileNet). Other types of discriminative models include Support Vector Machines (SVMs), Logistic Regression, and Decision Trees. Mathematically, discriminative models learn the conditional probability distribution $P(y|x)$ directly, without making any assumptions about the joint distribution $P(x,y)$. For example, given an image, $x$, a discriminative model predicts the class label, $y$, without any explicit knowledge of how $x$ and $y$ are jointly distributed.

Generative classifiers, on the other hand, learn a model of the joint probability distribution, $P(x,y)$, of the inputs, $x$, and class labels, $y$, from the training data and use the parameters of that distribution to make predictions. The figure below highlights the differences between discriminative and generative models. Notice that the decision boundary can be different. However, the main point here is that generative models attempt to model the joint distribution of the input data and the associated labels, and therefore **new data can be generated by sampling the distributions.**

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-discriminative-vs-generative-models.png" width=800>

In GANs, we don't model the data distribution in an explicit statistical sense by computing the parameters of the distribution (such as the mean and variance as shown above). Instead, the generator in a GAN learns the distribution implicitly. A GAN is therefore able to generate new random instances of data that are similar to the training data.
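To make the explicit approach concrete, here is a minimal sketch of a generative model that estimates a mean and standard deviation per class and then samples new points from the fitted distribution. The 1-D toy data, the function names, and the choice of a Gaussian are all assumptions made for illustration. The class prior $P(y)$ is omitted because the toy classes are balanced.

```python
import random
import statistics

def fit_gaussian_per_class(samples):
    """Estimate (mean, std) of x for each label y, i.e. a Gaussian model of P(x | y)."""
    params = {}
    for label in sorted({y for _, y in samples}):
        xs = [x for x, y in samples if y == label]
        params[label] = (statistics.mean(xs), statistics.stdev(xs))
    return params

def sample_new_points(params, label, n, rng):
    """Generate n new values of x for the given class by sampling the fitted Gaussian."""
    mean, std = params[label]
    return [rng.gauss(mean, std) for _ in range(n)]

rng = random.Random(0)

# Hypothetical toy training set: class 0 centred near -2, class 1 centred near +3.
train = [(rng.gauss(-2.0, 0.5), 0) for _ in range(500)]
train += [(rng.gauss(3.0, 1.0), 1) for _ in range(500)]

params = fit_gaussian_per_class(train)
new_points = sample_new_points(params, label=1, n=5, rng=rng)

print("fitted (mean, std) per class:", params)
print("new samples for class 1:", new_points)
```

A GAN skips the explicit parameter estimation step: the generator network plays the role of `sample_new_points`, but the distribution it samples from is encoded in its weights rather than in a mean and a standard deviation.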

### 1.3 The Generator

The purpose of the generator in a GAN model is to generate realistic-looking data that is indistinguishable from the training dataset. The input to the generator is a random noise vector (called a "latent" noise vector). Initially, the generator does not produce plausible images because it has not yet been trained. It is important to remember that the generator never sees the real training images directly (not even once); it learns only from the feedback provided by the discriminator (not shown). As the training progresses, the generator learns to create better quality images by utilizing this feedback, but its input is always a random noise vector. The noise vector is typically sampled from a Gaussian distribution and is typically much smaller than the output space (i.e., the generator learns to upsample random noise to produce realistic-looking images).

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-generator.png" width=450>
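As a rough sketch of the data flow only, the snippet below samples a latent vector from a standard Gaussian and pushes it through a single untrained dense layer to produce a larger output. The dimensions, the single-layer design, and the `tanh` output range are assumptions chosen for illustration; a real generator would be a deeper (often convolutional) network with trained weights.

```python
import math
import random

LATENT_DIM = 4    # size of the latent noise vector (assumed for illustration)
OUTPUT_DIM = 16   # e.g. a flattened 4x4 grayscale "image" (assumed)

rng = random.Random(42)

# Untrained weights of one dense layer with shape OUTPUT_DIM x LATENT_DIM.
weights = [[rng.gauss(0.0, 0.5) for _ in range(LATENT_DIM)] for _ in range(OUTPUT_DIM)]
biases = [0.0] * OUTPUT_DIM

def sample_latent(rng, dim=LATENT_DIM):
    """Draw a latent noise vector z from a standard Gaussian distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def generator(z):
    """Map a latent vector to the output space; tanh keeps each value in [-1, 1]."""
    return [
        math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
        for row, b in zip(weights, biases)
    ]

z = sample_latent(rng)
fake = generator(z)
print("latent size:", len(z), "output size:", len(fake))
```

Note that the only input is noise: no training image is passed to `generator` at any point.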

As mentioned above, the input to a GAN generator is a random input vector that represents a lower-dimensional latent space from which the generator learns the mapping to image space. Latent spaces are best introduced by first studying autoencoders and variational autoencoders, but those topics are beyond the scope of this module. If you are interested in learning more about these topics, we have provided the following links on <a href="https://www.youtube.com/watch?v=5WoItGTWV54&t=156s" target="_blank">generative models</a> and <a href="https://gaussian37.github.io/deep-learning-chollet-8-4/" target="_blank">variational autoencoders</a>. In many cases, GANs operate as unsupervised models, meaning we don't use class labels to train the model.


### 1.4 The Discriminator

The purpose of the discriminator is to classify real (training) and fake (generated) data correctly. Real data points have a label `1` and the fake data points (from the generator) have a label `0`. So the discriminator is simply a binary classifier. But the important feature of a GAN is that the output of the discriminator is used to train both the discriminator AND the generator.

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-discriminator.png" width=550>
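The labelling scheme above (`1` for real, `0` for fake) pairs naturally with a binary cross-entropy loss. The following is a minimal sketch with hand-picked, hypothetical discriminator outputs; the function name and the clamping constant `eps` are assumptions for illustration. It also shows how the same discriminator predictions on fake images can be scored against "real" labels to obtain a loss for the generator.

```python
import math

def binary_cross_entropy(predictions, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(predictions)

# Hypothetical discriminator outputs (probability that the input is real).
real_preds = [0.90, 0.80, 0.95]  # predictions on real training images
fake_preds = [0.10, 0.30, 0.05]  # predictions on generated images

# Discriminator loss: real images are labelled 1, fake images are labelled 0.
d_loss = binary_cross_entropy(real_preds + fake_preds, [1, 1, 1, 0, 0, 0])

# Generator loss: the same fake predictions, scored against "real" labels.
g_loss = binary_cross_entropy(fake_preds, [1, 1, 1])

print(f"discriminator loss: {d_loss:.3f}")
print(f"generator loss:     {g_loss:.3f}")
```

In this example the discriminator is confident and mostly correct, so its loss is small while the generator loss is large, which is the signal that pushes the generator to improve.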


### 1.5 Complete GAN Architecture

Let's now see how these two networks are configured in a GAN architecture. The following figure shows the basic configuration of most GANs. Training the GAN involves a two-step process where the discriminator and generator are trained in a sequential manner. The process is summarized below, but in the following notebook we will provide much more detail regarding the intuition and implementation details for the training process. The key thing to remember is that the discriminator is just a binary classifier which is trained in the usual manner. Training the generator is quite different because we only feed fake images to the discriminator and compute a loss for the generator that is based on "real" labels, even though we used fake images.

**Step 1: Train the Discriminator**

1. Use the generator to generate fake images.
2. Feed the discriminator **both** real and fake images.
3. Compute a **discriminator loss** based on the discriminator predictions and their associated ground truth labels.
4. Use back-propagation to compute the gradients of the loss w.r.t. the trainable parameters of the discriminator.
5. Use the optimizer to update the parameters of the discriminator model.

**Step 2: Train the Generator**

1. Generate fake images.
2. Feed the discriminator **only** fake images.
3. Compute a **generator loss** based on "real" labels for these (fake) images.
4. Use back-propagation to compute the gradients of the loss w.r.t. the trainable parameters of the generator (the discriminator's parameters are held fixed).
5. Use the optimizer to update the parameters of the generator model.
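The two steps above can be sketched on a deliberately tiny problem. In this toy example (all of it assumed for illustration, not the implementation used in the following notebook), the "images" are single numbers drawn from a Gaussian with mean 4, the generator is `a * z + b`, the discriminator is a logistic unit `sigmoid(w * x + c)`, and the gradients are derived by hand so that no deep learning framework is needed. Whether such a toy settles into a stable equilibrium depends on the hyperparameters, so no particular final value is promised.

```python
import math
import random

EPS = 1e-7  # small constant to avoid log(0)

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_step(gen, disc, rng, batch_size=64, lr=0.05, real_mean=4.0):
    """One GAN iteration on 1-D data. gen has keys 'a', 'b'; disc has keys 'w', 'c'."""
    n = batch_size

    # ---- Step 1: train the discriminator on both real and fake samples ----
    real = [rng.gauss(real_mean, 1.0) for _ in range(n)]
    fake = [gen["a"] * rng.gauss(0.0, 1.0) + gen["b"] for _ in range(n)]
    d_real = [sigmoid(disc["w"] * x + disc["c"]) for x in real]
    d_fake = [sigmoid(disc["w"] * x + disc["c"]) for x in fake]
    # Binary cross-entropy with label 1 for real and label 0 for fake.
    d_loss = -(sum(math.log(p + EPS) for p in d_real)
               + sum(math.log(1.0 - p + EPS) for p in d_fake)) / (2 * n)
    # For sigmoid + cross-entropy, dLoss/dlogit = prediction - label.
    grad_w = (sum((p - 1.0) * x for p, x in zip(d_real, real))
              + sum(p * x for p, x in zip(d_fake, fake))) / (2 * n)
    grad_c = (sum(p - 1.0 for p in d_real) + sum(d_fake)) / (2 * n)
    disc["w"] -= lr * grad_w
    disc["c"] -= lr * grad_c

    # ---- Step 2: train the generator on fake samples only ----
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    fake = [gen["a"] * zi + gen["b"] for zi in z]
    d_fake = [sigmoid(disc["w"] * x + disc["c"]) for x in fake]
    # Fake samples are scored against "real" labels (1).
    g_loss = -sum(math.log(p + EPS) for p in d_fake) / n
    # Chain rule through the (fixed) discriminator:
    # dLoss/dlogit = p - 1, dlogit/dx = w, dx/da = z, dx/db = 1.
    grad_a = sum((p - 1.0) * disc["w"] * zi for p, zi in zip(d_fake, z)) / n
    grad_b = sum((p - 1.0) * disc["w"] for p in d_fake) / n
    gen["a"] -= lr * grad_a
    gen["b"] -= lr * grad_b
    return d_loss, g_loss

rng = random.Random(0)
gen = {"a": 1.0, "b": 0.0}   # generator starts by producing samples centred on 0
disc = {"w": 0.1, "c": 0.0}

for step in range(200):
    d_loss, g_loss = train_step(gen, disc, rng)

print(f"d_loss={d_loss:.3f}  g_loss={g_loss:.3f}  generator offset b={gen['b']:.3f}")
```

Only the discriminator's parameters change in Step 1 and only the generator's parameters change in Step 2, even though the generator's gradient is computed by passing through the discriminator.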

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-architecture.png" width=850>

The architecture shown above represents a simple GAN model that illustrates the basic concept. The specific network architectures and training process tend to change depending on the task that we want to perform. This sometimes involves modifications to the loss functions as well. In the following notebook, we will model the above architecture in detail to demonstrate the implementation, including the individual architectures, loss function and other coding details required.

## 2 Applications and Types of GANs

### 2.1 Image Super-Resolution

This is very similar to image generation, but here we use GANs to upsample low-resolution images to high-resolution ones. This is especially helpful for older, low-resolution photographs, where conventional upsampling techniques produce blurry results. In such cases, super-resolution GANs can produce much sharper images with plausible fine detail. <a href="https://arxiv.org/pdf/1609.04802v5.pdf" target="_blank">SRGAN</a> is one of the best-known examples of a GAN that can create high-resolution images.

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-srgan-example.png">



### 2.2 Video Generation

GANs can also be used to generate videos, which could speed up the workflow of content creators. <a href="https://openaccess.thecvf.com/content/WACV2021/papers/Munoz_Temporal_Shift_GAN_for_Large_Scale_Video_Generation_WACV_2021_paper.pdf" target="_blank">Temporal Shift GAN</a> is one example of a GAN that can be used to generate videos.

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-video-generation.png" width=1000>

### 2.3 Text-to-Image Synthesis

Some GAN variants can synthesize images from text prompts. The following is an example of text-to-image synthesis using <a href="https://arxiv.org/pdf/1612.03242.pdf" target="_blank">StackGAN</a>, which can synthesize high-quality images from text descriptions.

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-stackgan-example.png">

 

### 2.4 Image-to-Image Translation

There are instances when we want to convert a sketch into an image, turn a daytime image into a nighttime one, or change the style of an image to that of a particular painter, say, Picasso. We can do that using GANs trained for Image-to-Image Translation. One such GAN is the Pix2Pix GAN introduced in the paper <a href="https://arxiv.org/pdf/1611.07004.pdf" target="_blank">Image-to-Image Translation with Conditional Adversarial Networks</a>. The Pix2Pix GAN model can generate RGB images from segmentation maps, generate images from sketches, colorize black-and-white images, and even create maps from aerial photographs.

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-pix2pix-example.png">


### 2.5 Image Inpainting

Sometimes part of an old photograph is damaged, or part of an image is simply missing. Image inpainting GANs can fill in those gaps or missing regions. One GAN-based approach is described in the paper <a href="https://www.researchgate.net/publication/323904616_Patch-Based_Image_Inpainting_with_Generative_Adversarial_Networks#:~:text=We%20present%20an%20image%20inpainting%20method%20that%20is,global%20GAN%20%28G-GAN%29%20architecture%20with%20a%20patchGAN%20approach." target="_blank">Patch-Based Image Inpainting with Generative Adversarial Networks</a>.

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-image-inpainting-example.png">



### 2.6 Face Aging

Face Aging is an application of GANs where we provide an image of a person and the GAN generates a younger or older version of the same person. One important application of this technology is to aid investigators who are looking for missing persons (often years after they went missing). Generally, a Conditional Generative Adversarial Network is used to achieve this. One such approach was introduced by Grigory Antipov et al. in the paper <a href="https://arxiv.org/pdf/1702.01983.pdf" target="_blank">Face Aging with Conditional Generative Adversarial Networks</a>.


<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-face-aging.png" width=600>

### 2.7 3D Object Generation

Another interesting application of GANs is the generation of 3D objects. This can be achieved with 3D convolutional layers that operate on volumetric data and learn a 3D latent space from the training data. This can have many real-life applications, such as creating 3D models of objects without relying on expensive and very sophisticated software. You can learn more about 3D object generation using GANs <a href="http://3dgan.csail.mit.edu/" target="_blank">here</a>.

<img src="https://opencv.org/wp-content/uploads/2022/09/gan-3d-object-generation.png" width=800>

## 3 Pitfalls when Training GANs

Training GANs can be really tricky. Even with carefully chosen parameters and hyperparameters, there is a chance that training may not go as intended, so it helps to be aware of the common issues that can arise while training GANs.

### 3.1 Mode Collapse

Mode collapse is one of the most common issues in GAN training. Mode collapse refers to a condition in which the generator is only able to generate a limited set of modes from the data generating distribution. In some cases, the generator learns to fool the discriminator easily by generating similar types of images. Also, if the generator network is very large compared to the discriminator, then the discriminator may not be able to classify the images correctly in the initial few epochs. This also leads the generator to fool the discriminator by generating similar types of images. Such a scenario, where the generator generates similar types of images without much diversity, is called **mode collapse**.

The following figure shows an example of mode collapse that occurred while training a conditional GAN to generate the faces of celebrities. These images were generated after 40 epochs of training. Normally each image would represent a unique instance from the data distribution, but as shown below, the images are all nearly identical, with very little diversity. This is a classic example of mode collapse. One remedy that may help you avoid mode collapse is to experiment with a slightly different learning rate or batch size.

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-mode-collapse.png" width=1000>
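Inspecting generated images by eye is the most direct check, but a crude numeric signal can help flag the problem early. The sketch below computes the mean pairwise Euclidean distance within a batch of generated samples; a value near zero suggests that the samples are nearly identical. This heuristic, the function name, and the synthetic "batches" are assumptions made for illustration, and a raw pixel-space distance is only a rough proxy for visual diversity.

```python
import itertools
import math
import random

def mean_pairwise_distance(samples):
    """Average Euclidean distance over all pairs of samples in a batch."""
    pairs = list(itertools.combinations(samples, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

rng = random.Random(0)
NUM_SAMPLES, NUM_PIXELS = 20, 16  # assumed sizes for this toy example

# Hypothetical "diverse" batch: every sample is drawn independently.
diverse = [[rng.random() for _ in range(NUM_PIXELS)] for _ in range(NUM_SAMPLES)]

# Hypothetical "collapsed" batch: tiny variations of a single sample.
base = [rng.random() for _ in range(NUM_PIXELS)]
collapsed = [[v + rng.gauss(0.0, 0.01) for v in base] for _ in range(NUM_SAMPLES)]

diverse_score = mean_pairwise_distance(diverse)
collapsed_score = mean_pairwise_distance(collapsed)
print(f"diverse batch:   {diverse_score:.3f}")
print(f"collapsed batch: {collapsed_score:.3f}")
```

Tracking such a score across epochs alongside the loss curves can make a sudden drop in diversity easier to spot.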


### 3.2 Training Loss

In the first figure below, we show the training loss curves for a GAN that has reached a fairly stable equilibrium, where the generator and discriminator losses both level off. However, when training GANs it is not uncommon for the loss curves to show a trend that looks divergent: for example, a discriminator loss trending toward lower and lower values while the generator loss continues to climb, as shown in the second figure. In this situation, it's tempting to conclude that something must be wrong and that it's time to stop training. Although such a trend indicates that the networks are not in equilibrium, it does not necessarily mean that the quality of the generated images is declining. It is also possible that, given enough additional training, the system will recover and reach a better equilibrium. For both reasons, we recommend monitoring a small sample of generated images during training (at every epoch or every nth epoch) and not stopping training prematurely unless it is evident that the image quality has stopped improving.

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-loss-ideal.png" width=850>

<img src="https://opencv.org/wp-content/uploads/2022/09/c4-gan-loss-not-ideal.png" width=850>
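One way to organize this kind of monitoring is sketched below. The training and generation functions are placeholders that return dummy values (they are hypothetical and stand in for a real model), and the constants are assumed. The point of the sketch is the bookkeeping: losses are recorded every epoch, and previews are generated every nth epoch from the *same* fixed latent vectors so that snapshots are directly comparable over time.

```python
import random

LATENT_DIM = 8      # assumed latent vector size
NUM_PREVIEWS = 4    # number of preview samples per snapshot (assumed)
SAMPLE_EVERY = 5    # take a snapshot every 5th epoch (assumed)
NUM_EPOCHS = 20

rng = random.Random(0)

# Fixed latent vectors, created once and reused for every snapshot.
fixed_noise = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
               for _ in range(NUM_PREVIEWS)]

def train_one_epoch(epoch):
    """Placeholder for a real training epoch; returns dummy (d_loss, g_loss)."""
    return 1.0 / (epoch + 1), 1.0 + 0.1 * epoch

def generate(noise_batch):
    """Placeholder for generator inference; a real model would return images."""
    return [sum(z) for z in noise_batch]

history = {"d_loss": [], "g_loss": []}
snapshots = {}

for epoch in range(1, NUM_EPOCHS + 1):
    d_loss, g_loss = train_one_epoch(epoch)
    history["d_loss"].append(d_loss)
    history["g_loss"].append(g_loss)
    if epoch % SAMPLE_EVERY == 0:
        # In a real run, these previews would be saved or plotted for inspection.
        snapshots[epoch] = generate(fixed_noise)

print("snapshot epochs:", sorted(snapshots))
```

With a real generator, comparing the snapshots side by side shows whether image quality is still improving even when the loss curves look discouraging.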