# Generative adversarial network (GAN) basics

The `13_gan_basics` notebook introduces the foundational concepts of Generative Adversarial Networks (GANs), a powerful class of models used for generating realistic data. GANs consist of two networks, a Generator and a Discriminator, that are trained simultaneously in a competitive setting. 

This notebook covers preparing the dataset, defining the Generator model responsible for creating synthetic data, and defining the Discriminator model, which distinguishes between real and generated data.

## Table of contents

1. [Understanding GANs](#understanding-gans)
2. [Setting up the environment](#setting-up-the-environment)
3. [Preparing the dataset](#preparing-the-dataset)
4. [Defining the Generator model](#defining-the-generator-model)
5. [Defining the Discriminator model](#defining-the-discriminator-model)
6. [Conclusion](#conclusion)

## Understanding GANs

Generative Adversarial Networks (GANs) are a class of machine learning models used to generate new data that resembles a given dataset. GANs consist of two neural networks that compete against each other: the **generator** and the **discriminator**. The generator attempts to create realistic data, while the discriminator tries to distinguish between real data (from the training set) and fake data (created by the generator).

### **How GANs work**

GANs are based on a game-theoretic scenario where two players—the generator and the discriminator—are set against each other in a zero-sum game. The generator’s goal is to produce data that looks as real as possible, while the discriminator’s goal is to detect whether the input data is real (from the dataset) or fake (from the generator). Through this adversarial process, the generator gradually improves its ability to create realistic data, and the discriminator becomes better at distinguishing between real and generated data.

#### **Generator**

The generator is a neural network responsible for creating new data instances from random noise or latent vectors. Its goal is to generate data that is indistinguishable from the real data provided in the training set. The generator doesn’t have access to the real data directly. Instead, it learns through feedback from the discriminator.

The generator takes a random input (often referred to as the **latent vector**) and maps it to a data space that resembles the real data. Initially, the generator produces poor-quality samples, but over time, it learns to generate more realistic data as it receives feedback from the discriminator.

#### **Discriminator**

The discriminator is a separate neural network tasked with determining whether a given data instance is real or generated (fake). It acts as a binary classifier, outputting a probability indicating whether the input data is from the real dataset or the generator.

The discriminator is trained on both real data (labeled as real) and generated data (labeled as fake). Its goal is to correctly classify real and generated data. As the generator improves, the discriminator must also improve to distinguish between real and high-quality generated data.

### **The adversarial process**

The training process of GANs involves alternating between two phases:

1. **Training the discriminator**: The discriminator is presented with both real data from the training set and fake data produced by the generator. It is trained to classify real data as real and fake data as fake. The discriminator’s goal is to maximize its ability to correctly classify the inputs.
2. **Training the generator**: The generator is trained to produce data that fools the discriminator. The generator’s objective is to generate data that the discriminator misclassifies as real. The generator receives feedback from the discriminator in the form of gradients, which help it adjust its weights to produce more convincing fake data.

This adversarial process is iterative, with both networks improving over time. As the generator becomes better at creating realistic data, the discriminator must also improve its ability to detect fakes. The two networks are in constant competition, pushing each other to become more accurate.

### **Role of the latent space**

The generator in a GAN takes random noise as input, sampled from a **latent space**. The latent space is typically much lower-dimensional than the data space, and the generator learns to map a simple prior distribution over it (often Gaussian or uniform) to the distribution of the real dataset. By sampling different points in the latent space, the generator can create diverse data samples that resemble the training data.

In practical applications, exploring the latent space allows us to generate a wide range of data, even from random inputs. For example, in image generation, different points in the latent space correspond to different styles or variations of generated images.
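
As a rough illustration of exploring the latent space, the sketch below samples two latent vectors and builds a straight-line path between them. The latent size of 100 and the standard normal prior are assumptions chosen for the example; passing each row of `z_path` through a trained generator would give a sequence of gradually changing samples.

```python
import torch

torch.manual_seed(0)

latent_dim = 100   # assumed latent size, not fixed by the text
steps = 5          # number of points along the interpolation path

# Two random points drawn from a standard normal prior.
z_start = torch.randn(latent_dim)
z_end = torch.randn(latent_dim)

# Linear interpolation: row i is (1 - t_i) * z_start + t_i * z_end.
t = torch.linspace(0.0, 1.0, steps).unsqueeze(1)   # shape (steps, 1)
z_path = (1 - t) * z_start + t * z_end             # shape (steps, latent_dim)

print(z_path.shape)
```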

### **Challenges in training GANs**

Training GANs is known to be challenging for several reasons:

- **Mode collapse**: Mode collapse occurs when the generator produces limited diversity in its outputs, effectively collapsing into generating only a few variations of data. This happens when the generator focuses too much on fooling the discriminator in specific ways without covering the entire data distribution.
- **Vanishing gradients**: If the discriminator becomes too strong early in training, it may classify all generator outputs as fake with high confidence. This results in very small gradient updates for the generator, hindering its ability to improve.
- **Training instability**: Since GANs involve two networks learning simultaneously, their losses are interdependent, which can make training unstable. Balancing the learning rates and capacities of the generator and the discriminator is critical for stable training.

### **Applications of GANs**

GANs have gained popularity for their ability to generate high-quality, realistic data across a variety of domains. Some of the common applications of GANs include:

- **Image generation**: GANs can generate realistic images from random noise. They are widely used in tasks like photo-realistic image generation, face synthesis, and style transfer.
- **Data augmentation**: GANs can generate additional data samples to augment training datasets, which is useful in fields where labeled data is scarce.
- **Super-resolution**: GANs can be used to enhance the resolution of images, generating high-quality versions of low-resolution inputs.
- **Video generation**: GANs can generate video frames based on previous ones, leading to applications in video synthesis and prediction.
- **Art and creativity**: GANs are being used in creative fields to generate art, music, and other forms of media, often producing novel and unique outputs.
- **Text-to-image generation**: GANs can be used to generate images from textual descriptions, enabling tasks like automatic illustration.

### **Variants of GANs**

Over time, several variants of the basic GAN framework have been developed to address specific challenges or extend GANs' capabilities:

- **DCGAN (Deep Convolutional GAN)**: This variant uses convolutional neural networks in the generator and discriminator to improve the quality of generated images.
- **Conditional GAN (cGAN)**: In cGANs, both the generator and discriminator are conditioned on additional information, such as class labels. This allows the generator to produce data that belongs to a specific class, enabling more control over the generated outputs.
- **CycleGAN**: This model enables image-to-image translation tasks without the need for paired training examples. It has been applied to tasks such as converting photos to paintings or transferring styles between different image domains.
- **Wasserstein GAN (WGAN)**: The WGAN improves training stability by using a different distance metric (the Wasserstein distance) to measure the difference between real and generated data distributions. This helps mitigate issues related to vanishing gradients and mode collapse.

### **Maths**

#### **The adversarial game**

GANs are built on a game-theoretic framework, where two neural networks—the generator $ G $ and the discriminator $ D $—compete in a zero-sum game. The generator $ G $ aims to generate data that mimics the real data distribution, while the discriminator $ D $ tries to distinguish between real data and fake (generated) data. This is formalized as a minimax problem.

The objective function for GANs is defined as:

$$
\min_G \max_D \mathbb{E}_{x \sim p_{\text{data}}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_z(z)} [\log(1 - D(G(z)))]
$$

Where:
- $ G(z) $ is the generator’s output given a random input $ z $ from the latent space,
- $ D(x) $ is the discriminator’s estimate of the probability that input $ x $ is real,
- $ p_{\text{data}}(x) $ is the distribution of real data,
- $ p_z(z) $ is the distribution of the latent variable $ z $ (often Gaussian or uniform).

The generator tries to minimize this objective, attempting to generate realistic samples that maximize $ D(G(z)) $, while the discriminator tries to maximize the objective by distinguishing between real and fake data.
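
To make the objective concrete, here is a minimal sketch that estimates it from a batch of discriminator outputs, replacing each expectation with a batch mean. The function name `value_function` and the example probabilities are made up for illustration.

```python
import math

def value_function(d_real, d_fake):
    """Batch estimate of the minimax objective from discriminator outputs.

    d_real: D(x) for real samples, each strictly between 0 and 1
    d_fake: D(G(z)) for generated samples, each strictly between 0 and 1
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A confident, correct discriminator pushes the objective towards 0, its upper bound.
print(value_function([0.9, 0.95], [0.1, 0.05]))

# A discriminator that outputs 0.5 everywhere gives log(0.5) + log(0.5) = -log(4).
print(value_function([0.5, 0.5], [0.5, 0.5]))
```

The discriminator wants this number to be large, while the generator wants it to be small.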

#### **Discriminator loss**

The discriminator’s objective is to maximize the probability of correctly identifying real data and to minimize the probability of misclassifying generated (fake) data as real. Written as a loss to be minimized, this is:

$$
L_D = - \mathbb{E}_{x \sim p_{\text{data}}(x)} [\log D(x)] - \mathbb{E}_{z \sim p_z(z)} [\log(1 - D(G(z)))]
$$

The first term $ \mathbb{E}_{x \sim p_{\text{data}}(x)} [\log D(x)] $ represents the discriminator's success in classifying real data correctly, while the second term $ \mathbb{E}_{z \sim p_z(z)} [\log(1 - D(G(z)))] $ reflects its ability to correctly classify generated (fake) data.
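
In PyTorch, this loss is commonly computed as the sum of two binary cross-entropy terms, with target 1 for real samples and target 0 for generated ones. The sketch below checks that equivalence on a small batch; the probability values are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical discriminator outputs (probabilities) for a small batch.
d_real = torch.tensor([0.9, 0.8, 0.6])   # D(x) on real samples
d_fake = torch.tensor([0.2, 0.4, 0.1])   # D(G(z)) on generated samples

# L_D written directly from the formula, with expectations replaced by batch means.
loss_manual = -(torch.log(d_real).mean() + torch.log(1 - d_fake).mean())

# The same quantity as two binary cross-entropy terms:
# real samples get target 1, generated samples get target 0.
loss_bce = (
    F.binary_cross_entropy(d_real, torch.ones_like(d_real))
    + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
)

print(loss_manual.item(), loss_bce.item())
```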

#### **Generator loss**

The generator’s objective is to fool the discriminator, i.e., to generate data such that the discriminator classifies it as real. The generator's loss function is thus:

$$
L_G = - \mathbb{E}_{z \sim p_z(z)} [\log D(G(z))]
$$

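Minimizing $ L_G $ is equivalent to maximizing $ \mathbb{E}_{z \sim p_z(z)} [\log D(G(z))] $, so $ G $ is rewarded when the discriminator misclassifies fake data as real. This is the commonly used non-saturating form of the generator loss. The minimax objective above would instead have $ G $ minimize $ \mathbb{E}_{z \sim p_z(z)} [\log(1 - D(G(z)))] $, which pursues the same goal but provides weaker gradients when the discriminator confidently rejects generated samples.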
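
The following sketch compares the gradient of this loss with the gradient of the minimax generator term $ \log(1 - D(G(z))) $. It assumes the discriminator output is a sigmoid of a scalar logit `a`, so both derivatives have a simple closed form with respect to `a`.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(a):
    """d/da of log(1 - sigmoid(a)), the generator term in the minimax objective."""
    return -sigmoid(a)

def non_saturating_grad(a):
    """d/da of -log(sigmoid(a)), the generator loss L_G."""
    return -(1.0 - sigmoid(a))

# A very negative logit means the discriminator confidently rejects the sample.
for a in (-6.0, 0.0, 6.0):
    print(
        f"logit={a:+.1f}  D={sigmoid(a):.4f}  "
        f"saturating={saturating_grad(a):+.4f}  "
        f"non-saturating={non_saturating_grad(a):+.4f}"
    )
```

When the sample is confidently rejected, the minimax term has a gradient close to zero, while $ L_G $ still gives a gradient close to -1. This is one reason the vanishing-gradient problem is less severe with $ L_G $.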

#### **Training process**

The training of GANs involves alternating between two steps:

1. **Training the discriminator**: The discriminator $ D $ is trained to maximize the probability of correctly classifying real data and generated data. This corresponds to minimizing $ L_D $ (equivalently, maximizing the minimax objective with respect to $ D $).
2. **Training the generator**: The generator $ G $ is trained to fool the discriminator, i.e., to generate data such that $ D(G(z)) $ is close to 1. This corresponds to minimizing $ L_G $.

In practice, the training process alternates between updating $ D $ (to make it better at distinguishing real from fake data) and updating $ G $ (to generate better, more realistic data).
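
The two steps can be sketched as a single training iteration. Everything specific here is an assumption for illustration: the tiny fully connected networks, the 2-D toy "real" data, the Adam settings, and the helper name `train_step`. It is not the notebook's implementation.

```python
import torch
from torch import nn

torch.manual_seed(0)

latent_dim, data_dim, batch_size = 8, 2, 64   # illustrative sizes

G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1), nn.Sigmoid())

opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()

def train_step(real):
    n = real.size(0)
    ones = torch.ones(n, 1)    # target "real"
    zeros = torch.zeros(n, 1)  # target "fake"

    # Step 1: update D. detach() stops gradients from flowing into G.
    fake = G(torch.randn(n, latent_dim)).detach()
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Step 2: update G so that D labels fresh fakes as real (the loss L_G).
    loss_g = bce(D(G(torch.randn(n, latent_dim))), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    return loss_d.item(), loss_g.item()

real_batch = torch.randn(batch_size, data_dim) * 0.5 + 2.0   # toy "real" data
loss_d, loss_g = train_step(real_batch)
print(loss_d, loss_g)
```

In a full training loop, `train_step` would be called once per batch for many epochs.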

#### **Gradients for the discriminator**

For the discriminator, the gradients with respect to its parameters $ \theta_D $ are calculated by applying backpropagation to its loss function:

$$
\frac{\partial L_D}{\partial \theta_D} = - \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \frac{1}{D(x)} \frac{\partial D(x)}{\partial \theta_D} \right] + \mathbb{E}_{z \sim p_z(z)} \left[ \frac{1}{1 - D(G(z))} \frac{\partial D(G(z))}{\partial \theta_D} \right]
$$

This gradient is then used to update the discriminator’s parameters $ \theta_D $ using a gradient descent optimizer.

#### **Gradients for the generator**

For the generator, the gradients with respect to its parameters $ \theta_G $ are computed by applying backpropagation to the generator’s loss function:

$$
\frac{\partial L_G}{\partial \theta_G} = - \mathbb{E}_{z \sim p_z(z)} \left[ \frac{1}{D(G(z))} \frac{\partial D(G(z))}{\partial G(z)} \frac{\partial G(z)}{\partial \theta_G} \right]
$$

This gradient is used to update the generator’s parameters $ \theta_G $ so that it can produce better-quality data that can fool the discriminator more effectively.
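
As a sanity check of the chain rule above, the sketch below compares the analytic generator gradient with a finite-difference estimate. The one-parameter models (a linear generator and a logistic discriminator) and all numeric values are assumptions made only to keep the example small.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Toy one-parameter models (assumptions for illustration only):
#   generator      G(z) = theta_g * z
#   discriminator  D(x) = sigmoid(w * x)
w = 1.5

def generator_loss(theta_g, z):
    return -math.log(sigmoid(w * theta_g * z))

def analytic_grad(theta_g, z):
    x = theta_g * z              # G(z)
    d = sigmoid(w * x)           # D(G(z))
    dD_dx = w * d * (1.0 - d)    # dD(G(z)) / dG(z)
    dG_dtheta = z                # dG(z) / dtheta_G
    return -(1.0 / d) * dD_dx * dG_dtheta

theta_g, z, eps = 0.3, 0.8, 1e-6
numeric = (generator_loss(theta_g + eps, z) - generator_loss(theta_g - eps, z)) / (2 * eps)
print(analytic_grad(theta_g, z), numeric)
```

In practice, PyTorch's autograd computes these gradients automatically when `backward()` is called on the loss.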

#### **Nash equilibrium**

In theory, the optimal point in GAN training is a **Nash equilibrium**, where neither the generator nor the discriminator can improve its objective by changing its own strategy alone. At equilibrium:
- The discriminator $ D $ cannot distinguish between real and generated data, i.e., $ D(x) = 0.5 $ for both real and generated data.
- The generator produces data that perfectly mimics the real data distribution.

Mathematically, at the Nash equilibrium, the generator’s distribution $ p_G $ matches the real data distribution $ p_{\text{data}} $, and the discriminator’s output is:

$$
D(x) = \frac{1}{2}, \forall x
$$

In practice, achieving this equilibrium is difficult, and training often requires careful balancing between the generator and discriminator to prevent one from overpowering the other.
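
One way to see why the equilibrium output is $ \frac{1}{2} $: for a fixed generator, the discriminator that maximizes the objective is known to be $ D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)} $. The sketch below evaluates this expression for 1-D Gaussian densities, which are an assumption chosen purely for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def optimal_discriminator(x, p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_G(x)) for a fixed generator."""
    return p_data(x) / (p_data(x) + p_g(x))

def p_data(x):
    return gaussian_pdf(x, mean=0.0, std=1.0)

def p_g_far(x):
    # A generator whose samples are far from the real data.
    return gaussian_pdf(x, mean=3.0, std=1.0)

# A generator that matches the data distribution exactly.
p_g_match = p_data

for x in (-1.0, 0.0, 1.0):
    print(x, optimal_discriminator(x, p_data, p_g_far), optimal_discriminator(x, p_data, p_g_match))
```

With a mismatched generator, $ D^* $ is close to 1 where real data is likely. Once $ p_G = p_{\text{data}} $, it equals $ \frac{1}{2} $ at every point.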

#### **Mode collapse**

One common issue during GAN training is **mode collapse**, where the generator starts producing a limited variety of outputs, effectively collapsing to a single mode of the data distribution. This happens because the generator finds a subset of data that consistently fools the discriminator, leading to limited diversity in generated samples. Mathematically, the generator’s output distribution $ p_G $ may collapse to a distribution with very low variance, failing to cover the entire range of the real data distribution $ p_{\text{data}} $.

## Setting up the environment


##### **Q1: How do you install the necessary libraries for working with GANs in PyTorch?**


##### **Q2: How do you import the required modules for building and training GANs in PyTorch?**


##### **Q3: How do you set up the environment to use a GPU, and how do you fallback to CPU if necessary in PyTorch?**
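
One common pattern, shown here as a minimal sketch, is to pick the device once and pass it wherever tensors and modules are created. The latent size of 100 and the small linear layer are placeholders.

```python
import torch

# Use a CUDA GPU when one is visible to PyTorch; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Tensors and modules must be placed on the chosen device explicitly.
noise = torch.randn(4, 100, device=device)     # 100 is an assumed latent size
layer = torch.nn.Linear(100, 10).to(device)    # placeholder module
out = layer(noise)
print(out.shape, out.device)
```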


##### **Q4: How do you set a random seed in PyTorch to ensure reproducibility when training a GAN?**
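
A possible approach is a small helper that seeds every random number generator the training code uses. `set_seed` is a hypothetical name, and the seed value 42 is arbitrary.

```python
import random

import torch

def set_seed(seed: int) -> None:
    """Seed Python's and PyTorch's random number generators (hypothetical helper)."""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)   # silently ignored when CUDA is unavailable

set_seed(42)
first = torch.randn(3)
set_seed(42)
second = torch.randn(3)

# Re-seeding reproduces the same random draw.
print(torch.equal(first, second))
```

Seeding alone may not make GPU runs bit-for-bit reproducible, because some CUDA operations are nondeterministic. PyTorch's reproducibility notes describe additional settings for that case.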

## Preparing the dataset


##### **Q5: How do you load an image dataset such as MNIST or CIFAR-10 using `torchvision.datasets` in PyTorch?**


##### **Q6: How do you apply transformations such as resizing and normalization to the dataset to prepare it for training a GAN?**


##### **Q7: How do you create DataLoaders in PyTorch to efficiently load batches of data for GAN training?**
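
The sketch below wraps a dataset in a `DataLoader`. To stay self-contained it uses random tensors as stand-in images; in the notebook the `TensorDataset` would be replaced by a `torchvision` dataset such as MNIST. The batch size of 64 is an assumption.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 256 random 1x28x28 "images" with dummy labels.
images = torch.randn(256, 1, 28, 28)
labels = torch.zeros(256, dtype=torch.long)
dataset = TensorDataset(images, labels)

loader = DataLoader(
    dataset,
    batch_size=64,    # assumed batch size
    shuffle=True,     # reshuffle the data every epoch
    drop_last=True,   # keep every batch the same size
)

batch_images, batch_labels = next(iter(loader))
print(batch_images.shape, len(loader))
```

Dropping the last incomplete batch keeps the batch size constant, which simplifies code that builds real and fake label tensors of a fixed shape.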

## Defining the Generator model


##### **Q8: How do you define the architecture of the Generator model using PyTorch’s `nn.Module`?**
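
A minimal sketch of one possible architecture: a fully connected generator that maps a latent vector to a 1x28x28 image. The layer widths, the latent size of 100, and the image shape are assumptions, not the notebook's actual design.

```python
import torch
from torch import nn

class Generator(nn.Module):
    """Fully connected generator for small images (illustrative sizes)."""

    def __init__(self, latent_dim: int = 100, img_shape=(1, 28, 28)):
        super().__init__()
        self.img_shape = img_shape
        out_features = img_shape[0] * img_shape[1] * img_shape[2]
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, out_features),
            nn.Tanh(),   # outputs in [-1, 1], matching images normalized to that range
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        flat = self.net(z)                             # (batch, C*H*W)
        return flat.view(z.size(0), *self.img_shape)   # (batch, C, H, W)

generator = Generator()
fake_images = generator(torch.randn(16, 100))
print(fake_images.shape)
```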


##### **Q9: How do you create the latent vector (noise) that serves as input to the Generator model?**
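
Latent vectors are usually drawn fresh for every batch. The sketch below shows a few common shapes and priors; the batch size and latent size are assumed values.

```python
import torch

batch_size = 64
latent_dim = 100   # assumed latent size

# Standard normal noise for a fully connected generator: (batch, latent_dim).
z = torch.randn(batch_size, latent_dim)

# Convolutional (DCGAN-style) generators typically expect trailing 1x1 spatial dimensions.
z_conv = torch.randn(batch_size, latent_dim, 1, 1)

# A uniform prior on [-1, 1) is a possible alternative to the Gaussian.
z_uniform = torch.rand(batch_size, latent_dim) * 2 - 1

print(z.shape, z_conv.shape, z_uniform.shape)
```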


##### **Q10: How do you implement the forward pass for the Generator model in PyTorch to output fake data?**

## Defining the Discriminator model


##### **Q11: How do you define the architecture of the Discriminator model using PyTorch’s `nn.Module`?**
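
A minimal sketch of one possible discriminator: a fully connected binary classifier over flattened 1x28x28 images. The layer widths and the image shape are assumptions for illustration.

```python
import torch
from torch import nn

class Discriminator(nn.Module):
    """Fully connected binary classifier for small images (illustrative sizes)."""

    def __init__(self, img_shape=(1, 28, 28)):
        super().__init__()
        in_features = img_shape[0] * img_shape[1] * img_shape[2]
        self.net = nn.Sequential(
            nn.Flatten(),                      # (batch, C, H, W) -> (batch, C*H*W)
            nn.Linear(in_features, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),
            nn.Sigmoid(),                      # probability that the input is real
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

discriminator = Discriminator()
probs = discriminator(torch.randn(16, 1, 28, 28))
print(probs.shape)
```

An alternative design drops the final `nn.Sigmoid()` and trains with `nn.BCEWithLogitsLoss`, which is numerically more stable.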


##### **Q12: How do you implement the forward pass for the Discriminator model to classify real and fake data?**


##### **Q13: How do you initialize the weights for both the Generator and Discriminator models in PyTorch?**
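
One widely used convention, popularized by DCGAN, draws weights from a normal distribution with mean 0 and standard deviation 0.02. The sketch below applies it with `Module.apply`; the helper name `init_weights` and the small stand-in model are hypothetical.

```python
import torch
from torch import nn

def init_weights(module: nn.Module) -> None:
    """DCGAN-style initialization (hypothetical helper)."""
    if isinstance(module, (nn.Linear, nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.normal_(module.weight, mean=1.0, std=0.02)
        nn.init.zeros_(module.bias)

torch.manual_seed(0)

# Stand-in model; the same call works for any generator or discriminator module.
model = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
model.apply(init_weights)   # apply() visits every submodule recursively

print(model[0].weight.std().item())
```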

## Conclusion