# Comprehensive Tutorial on Basic GANs

Generative Adversarial Networks (GANs) are a class of machine learning frameworks introduced by Ian Goodfellow and his colleagues in 2014. They consist of two neural networks, a generator and a discriminator, which are trained simultaneously as opponents in an adversarial game. The generator aims to produce realistic data, while the discriminator tries to distinguish between real and generated data.

## Mathematical Foundations

1. **Generator (G)**: This network takes a random noise vector $\mathbf{z}$ from a prior distribution $p_{\mathbf{z}}$ (often a Gaussian or uniform distribution) and maps it to the data space as $G(\mathbf{z}; \theta_G)$. The generator's objective is to generate data that resembles the true data distribution $p_{\text{data}}$.

2. **Discriminator (D)**: This network takes a data sample (either real or generated) and outputs a single scalar $D(\mathbf{x}; \theta_D)$ representing the probability that the sample is real. The discriminator's objective is to correctly classify real and generated samples.

The networks are trained on the following minimax objective, which the discriminator maximizes and the generator minimizes:
$$
\min_G \max_D V(D, G) = \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}[\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}[\log (1 - D(G(\mathbf{z})))]
$$
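
As a rough illustration of this objective, the sketch below estimates $V(D, G)$ by Monte Carlo on a one-dimensional toy problem. The logistic discriminator `sigmoid(w * x + b)`, the affine generator `a * z + c`, and the choice $p_{\text{data}} = \mathcal{N}(4, 1)$ are assumptions made only for this example; they are not part of the GAN framework itself.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def discriminator(x, w, b):
    """Toy D(x; theta_D) = sigmoid(w * x + b), a probability in (0, 1)."""
    return sigmoid(w * x + b)

def generator(z, a, c):
    """Toy G(z; theta_G) = a * z + c, mapping noise to data space."""
    return a * z + c

def value_estimate(real_xs, zs, w, b, a, c):
    """Monte Carlo estimate of V(D, G) from finite samples."""
    real_term = sum(math.log(discriminator(x, w, b)) for x in real_xs) / len(real_xs)
    fake_term = sum(
        math.log(1.0 - discriminator(generator(z, a, c), w, b)) for z in zs
    ) / len(zs)
    return real_term + fake_term

rng = random.Random(0)
real_xs = [rng.gauss(4.0, 1.0) for _ in range(1000)]  # assumed p_data = N(4, 1)
zs = [rng.gauss(0.0, 1.0) for _ in range(1000)]       # assumed p_z = N(0, 1)

# A discriminator that outputs 0.5 everywhere gives V = log(0.5) + log(0.5).
v_blind = value_estimate(real_xs, zs, w=0.0, b=0.0, a=1.0, c=0.0)
# A discriminator with its decision boundary at x = 2 separates the two
# sample sets better, so it attains a larger value of V.
v_informed = value_estimate(real_xs, zs, w=2.0, b=-4.0, a=1.0, c=0.0)
print(v_blind, v_informed)
```

The discriminator's side of the game is to push this number up; the generator's side is to choose `a` and `c` so that no discriminator can push it much above $-2 \log 2$.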

## Training Procedure

The training of GANs involves the following steps, typically repeated iteratively:

1. **Sample real data** $\mathbf{x} \sim p_{\text{data}}$.
2. **Sample noise** $\mathbf{z} \sim p_{\mathbf{z}}$ and generate fake data $\hat{\mathbf{x}} = G(\mathbf{z})$.
3. **Update Discriminator**:
   - Compute discriminator loss: $L_D = -\left(\mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}[\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}[\log (1 - D(G(\mathbf{z})))]\right)$
   - Perform a gradient descent step on $L_D$ to update $\theta_D$.
4. **Update Generator**:
   - Compute generator loss using the non-saturating loss: $L_G' = -\mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}[\log D(G(\mathbf{z}))]$
   - Perform a gradient descent step on $L_G'$ to update $\theta_G$.
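
The four steps above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not a reference implementation: the one-dimensional data distribution $\mathcal{N}(4, 1)$, the logistic discriminator with parameters `w, b`, the affine generator with parameters `a, c`, and the hyperparameters are all hypothetical choices that allow the gradients to be written by hand with the standard library only.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def softplus(s):
    # log(1 + exp(s)) without overflow.
    # With D = sigmoid(s): -log D = softplus(-s) and -log(1 - D) = softplus(s).
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))

def train(steps=200, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0   # theta_D for the toy D(x) = sigmoid(w * x + b)
    a, c = 1.0, 0.0   # theta_G for the toy G(z) = a * z + c
    history = []
    for _ in range(steps):
        # Steps 1-2: sample real data and noise, then generate fake data.
        xs = [rng.gauss(4.0, 1.0) for _ in range(batch)]  # assumed p_data
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]  # assumed p_z
        fakes = [a * z + c for z in zs]

        # Step 3: gradient descent on L_D, generator held fixed.
        loss_d = gw = gb = 0.0
        for x in xs:
            s = w * x + b
            loss_d += softplus(-s) / batch          # -log D(x)
            gw += -(1.0 - sigmoid(s)) * x / batch
            gb += -(1.0 - sigmoid(s)) / batch
        for xf in fakes:
            s = w * xf + b
            loss_d += softplus(s) / batch           # -log(1 - D(x_hat))
            gw += sigmoid(s) * xf / batch
            gb += sigmoid(s) / batch
        w -= lr * gw
        b -= lr * gb

        # Step 4: gradient descent on the non-saturating L_G', D held fixed.
        loss_g = ga = gc = 0.0
        for z in zs:
            s = w * (a * z + c) + b
            loss_g += softplus(-s) / batch          # -log D(G(z))
            common = -(1.0 - sigmoid(s)) * w        # dL_G'/dx_hat
            ga += common * z / batch
            gc += common / batch
        a -= lr * ga
        c -= lr * gc
        history.append((loss_d, loss_g))
    return (w, b, a, c), history

(w, b, a, c), history = train()
print(f"w={w:.3f} b={b:.3f} a={a:.3f} c={c:.3f}")
```

In this toy setup the generator's offset `c` starts at 0 and is pushed in the direction of the real mean, because the discriminator learns a positive slope `w` and the generator follows it. Real GANs replace the hand-written gradients with automatic differentiation and the two-parameter models with neural networks.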

## Mathematical Derivatives of the GAN Training Process

To delve deeper into the training process of GANs, we need to examine the mathematical derivatives that guide the optimization of both the generator and the discriminator.

### Discriminator Training

The discriminator aims to maximize the probability of correctly classifying real and generated samples. The loss function for the discriminator is:
$$
L_D = -\left( \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}[\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}[\log (1 - D(G(\mathbf{z})))] \right)
$$

To update the discriminator, we compute the gradient of $L_D$ with respect to the discriminator's parameters $\theta_D$, holding the generator fixed. Using $\nabla \log u = \nabla u / u$, and noting that differentiating $\log (1 - D)$ contributes a factor of $-1$ that cancels the leading minus sign on the second term:
$$
\nabla_{\theta_D} L_D = -\mathbb{E}_{\mathbf{x} \sim p_{\text{data}}} \left[ \frac{1}{D(\mathbf{x})} \nabla_{\theta_D} D(\mathbf{x}) \right] + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}} \left[ \frac{1}{1 - D(G(\mathbf{z}))} \nabla_{\theta_D} D(G(\mathbf{z})) \right]
$$
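
The sign of each term is easy to get wrong, so a finite-difference check is a useful sanity test. The sketch below assumes a toy logistic discriminator $D(x) = \sigma(wx + b)$, for which $\partial D / \partial w = D(1 - D)x$, and compares the analytic gradient of $L_D$ with respect to `w` against a numerical estimate. The sample values and parameters are hypothetical.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def loss_d(w, b, reals, fakes):
    """L_D for the toy discriminator D(x) = sigmoid(w * x + b)."""
    real_term = sum(math.log(sigmoid(w * x + b)) for x in reals) / len(reals)
    fake_term = sum(math.log(1.0 - sigmoid(w * x + b)) for x in fakes) / len(fakes)
    return -(real_term + fake_term)

def grad_w_analytic(w, b, reals, fakes):
    """dL_D/dw from the two expectation terms, using dD/dw = D * (1 - D) * x."""
    real_part = 0.0
    for x in reals:
        d = sigmoid(w * x + b)
        real_part += (1.0 / d) * (d * (1.0 - d) * x)
    fake_part = 0.0
    for x in fakes:
        d = sigmoid(w * x + b)
        fake_part += (1.0 / (1.0 - d)) * (d * (1.0 - d) * x)
    # The real term enters with a minus sign, the fake term with a plus sign.
    return -real_part / len(reals) + fake_part / len(fakes)

def grad_w_numeric(w, b, reals, fakes, h=1e-6):
    """Central finite difference of L_D with respect to w."""
    return (loss_d(w + h, b, reals, fakes) - loss_d(w - h, b, reals, fakes)) / (2.0 * h)

reals = [3.2, 4.1, 4.7, 3.9]     # hypothetical real samples
fakes = [-0.5, 0.3, 1.1, -1.2]   # hypothetical generated samples
w, b = 0.4, -0.7

analytic = grad_w_analytic(w, b, reals, fakes)
numeric = grad_w_numeric(w, b, reals, fakes)
print(analytic, numeric)
```

The two printed values should agree to several decimal places; a mismatch usually points to a sign or chain-rule error in the analytic expression.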

### Generator Training

The generator aims to fool the discriminator. In the minimax game, it minimizes the only term of $V(D, G)$ that depends on $\theta_G$:
$$
L_G = \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}[\log (1 - D(G(\mathbf{z})))]
$$

To update the generator, we compute the gradient of $L_G$ with respect to the generator's parameters $\theta_G$, holding the discriminator fixed:
$$
\nabla_{\theta_G} L_G = -\mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}} \left[ \frac{1}{1 - D(G(\mathbf{z}))} \nabla_{\theta_G} D(G(\mathbf{z})) \right]
$$

Here $\nabla_{\theta_G} D(G(\mathbf{z}))$ is obtained by the chain rule: the gradient of the discriminator's output with respect to its input is backpropagated through the generator to $\theta_G$.

### Improved Training with Non-Saturating Loss

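One problem with the original minimax formulation, in which the generator minimizes $\mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}[\log (1 - D(G(\mathbf{z})))]$, is that this loss saturates. When the discriminator confidently rejects generated samples ($D(G(\mathbf{z})) \approx 0$), which is common early in training, the gradient reaching the generator becomes very small; for a sigmoid-output discriminator it scales with $D(G(\mathbf{z}))$. To address this, the non-saturating loss for the generator is often used: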
$$
L_G' = -\mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}[\log D(G(\mathbf{z}))]
$$

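This alternative loss still pushes $D(G(\mathbf{z}))$ toward 1, but for a sigmoid-output discriminator its gradient scales with $1 - D(G(\mathbf{z}))$, so the generator receives its strongest signal precisely when the discriminator rejects its samples.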
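
To make the difference concrete, the sketch below compares the derivative of each generator loss with respect to the discriminator's pre-sigmoid output (its logit) on a generated sample. It assumes $D = \sigma(s)$ for a logit $s$; the function names are illustrative only.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def saturating_grad(s):
    """d/ds of log(1 - sigmoid(s)), the minimax generator loss."""
    return -sigmoid(s)

def non_saturating_grad(s):
    """d/ds of -log(sigmoid(s)), the non-saturating generator loss."""
    return -(1.0 - sigmoid(s))

# Negative logits mean the discriminator is confident the sample is fake.
for s in (-6.0, -3.0, 0.0, 3.0):
    print(
        f"logit={s:+.1f}  D={sigmoid(s):.4f}  "
        f"saturating={saturating_grad(s):+.4f}  "
        f"non_saturating={non_saturating_grad(s):+.4f}"
    )
```

Both derivatives are negative, so both losses push the logit upward, but only the non-saturating one stays large in magnitude when $D(G(\mathbf{z}))$ is close to 0.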

### Training Procedure with Gradients

The training procedure of GANs with the detailed gradient steps is as follows:

1. **Discriminator Update**:
    - Sample real data $\mathbf{x} \sim p_{\text{data}}$.
    - Sample noise $\mathbf{z} \sim p_{\mathbf{z}}$ and generate fake data $\hat{\mathbf{x}} = G(\mathbf{z})$.
    - Compute the discriminator loss:
      $$
      L_D = -\left( \log D(\mathbf{x}) + \log (1 - D(\hat{\mathbf{x}})) \right)
      $$
    - Compute gradients:
      $$
      \nabla_{\theta_D} L_D = -\left( \frac{\nabla_{\theta_D} D(\mathbf{x})}{D(\mathbf{x})} - \frac{\nabla_{\theta_D} D(\hat{\mathbf{x}})}{1 - D(\hat{\mathbf{x}})} \right)
      $$
    - Update $ \theta_D $ using gradient descent.

2. **Generator Update**:
    - Sample noise $\mathbf{z} \sim p_{\mathbf{z}}$.
    - Generate fake data $\hat{\mathbf{x}} = G(\mathbf{z})$.
    - Compute the generator loss using the non-saturating loss:
      $$
      L_G' = -\log D(\hat{\mathbf{x}})
      $$
    - Compute gradients:
      $$
      \nabla_{\theta_G} L_G' = -\frac{\nabla_{\theta_G} D(\hat{\mathbf{x}})}{D(\hat{\mathbf{x}})}
      $$
    - Update $ \theta_G $ using gradient descent.
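
The single-sample updates above can be written out directly in terms of $\nabla D / D$ and $\nabla D / (1 - D)$. The sketch below is an illustration under assumptions: it uses a toy logistic discriminator `D(x) = sigmoid(w * x + b)` and an affine generator `x_hat = a * z + c`, with hypothetical sample values and learning rate, so that every gradient has a closed form.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def D(x, w, b):
    return sigmoid(w * x + b)

def grad_D_theta_D(x, w, b):
    """(dD/dw, dD/db) for the toy D(x) = sigmoid(w * x + b)."""
    d = D(x, w, b)
    return d * (1.0 - d) * x, d * (1.0 - d)

def discriminator_step(x, x_hat, w, b, lr):
    d_real, d_fake = D(x, w, b), D(x_hat, w, b)
    gr_w, gr_b = grad_D_theta_D(x, w, b)
    gf_w, gf_b = grad_D_theta_D(x_hat, w, b)
    # grad L_D = -( grad D(x) / D(x) - grad D(x_hat) / (1 - D(x_hat)) )
    g_w = -(gr_w / d_real - gf_w / (1.0 - d_fake))
    g_b = -(gr_b / d_real - gf_b / (1.0 - d_fake))
    return w - lr * g_w, b - lr * g_b

def generator_step(z, a, c, w, b, lr):
    x_hat = a * z + c
    d_fake = D(x_hat, w, b)
    dD_dx = d_fake * (1.0 - d_fake) * w   # dD/dx evaluated at x_hat
    # Chain rule: grad_{theta_G} D(x_hat) = dD/dx * d(x_hat)/d(theta_G)
    gD_a, gD_c = dD_dx * z, dD_dx
    # grad L_G' = -grad D(x_hat) / D(x_hat), so descent adds lr * grad D / D.
    return a + lr * gD_a / d_fake, c + lr * gD_c / d_fake

def loss_d(x, x_hat, w, b):
    return -(math.log(D(x, w, b)) + math.log(1.0 - D(x_hat, w, b)))

def loss_g(z, a, c, w, b):
    return -math.log(D(a * z + c, w, b))

x, z = 4.0, 0.5                    # one hypothetical real sample and one noise draw
w, b, a, c = 0.1, 0.0, 1.0, 0.0    # hypothetical starting parameters
x_hat = a * z + c

w1, b1 = discriminator_step(x, x_hat, w, b, lr=0.05)
a1, c1 = generator_step(z, a, c, w1, b1, lr=0.05)
print(loss_d(x, x_hat, w, b), "->", loss_d(x, x_hat, w1, b1))
print(loss_g(z, a, c, w1, b1), "->", loss_g(z, a1, c1, w1, b1))
```

Each printed pair shows the loss on the same sample before and after its update; with a small enough learning rate, each step lowers the loss it was computed from.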

## Key Innovations

1. **Adversarial Loss**: The adversarial loss allows the generator to produce highly realistic data by constantly improving to fool the discriminator.
2. **Two-player Game**: The generator and discriminator are trained against each other, so the discriminator acts as a learned, adaptive loss for the generator rather than a fixed, hand-designed one.
3. **Unsupervised Learning**: GANs can learn to generate data without labeled examples, making them powerful tools for unsupervised learning tasks.

## Advantages of GANs

1. **High-Quality Data Generation**: GANs can produce highly realistic data that closely mimics the real data distribution.
2. **Flexible and General**: GANs can be applied to various types of data, including images and audio. Discrete data such as text is harder, because gradients cannot flow directly through sampled tokens.
3. **Unsupervised Feature Learning**: GANs can learn rich features from data without the need for labeled samples.

## Drawbacks of GANs

1. **Training Instability**: GANs can be difficult to train due to issues like non-convergence, vanishing gradients, and oscillatory behavior.
2. **Mode Collapse**: The generator might produce limited varieties of samples, ignoring large parts of the data distribution.
3. **Sensitive Hyperparameters**: The performance of GANs is highly dependent on the choice of hyperparameters and network architecture.
4. **Computationally Intensive**: Training GANs requires significant computational resources.

## Conclusion

GANs recast generative modeling as a two-player game in which a discriminator supplies the training signal for a generator, and this framing has driven advances in domains ranging from image synthesis to unsupervised learning. Training remains challenging because of instability, mode collapse, and hyperparameter sensitivity. Understanding the minimax objective, the gradients of the discriminator and generator losses, and the non-saturating generator loss is the basis for using GANs effectively and for diagnosing these limitations.