# Introduction to GANs

# Generative Adversarial Networks

> A different framework for generative learning.

<br>

Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow et al. in 2014.


<br>

- $\Omega$ is the universe of every possible data sample, no matter how unlikely

- There's a machine that generates fake data samples, using some noise as input. The machine is called the **generator**.

- $r$ denotes real data samples. $R$ is the distribution of the real data samples, and $p_R$ is the pdf of **real** data

- $f$ denotes fake data samples. $F$ is the distribution of the fake data samples, and $p_F$ is the pdf of **fake** data

- The goal is to **iteratively** improve the quality of the generator's samples until they are indistinguishable from the real data samples

- The signal for improvement comes from a binary classifier that detects the fake data samples, a machine called the **discriminator**.

<br>
<br>

-----


# GAN Framework 

<br>
<br>

- $z$ are samples from the latent space $Z$, the space of all possible noise vectors. $p_Z$ is the distribution of the latent noise vector

<br>

- Trained GANs generate realistic data samples (sophisticated fakes) $F \subseteq \Omega$ by learning the data distribution implicitly: $G$ is a sampler, and no density, normalized or unnormalized, is ever evaluated

<br>

- The GAN framework consists of two neural network functions and a loss function, optimized by gradient descent in two alternating phases:

<br>

  -   **Generator (G):** Produces fake samples $f=G(z)$ from noise $z \sim p_Z$, i.e. $G: Z \rightarrow \Omega$

<br>

  -   **Discriminator (D):** Distinguishes real data $r \sim p_R$ from fake samples $f=G(z) \in \Omega$, i.e. $D: \Omega \rightarrow [0, 1]$
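To make the two signatures $G: Z \rightarrow \Omega$ and $D: \Omega \rightarrow [0, 1]$ concrete, here is a toy 1-D sketch. It assumes both spaces are the real line and replaces the neural networks with an affine generator and a logistic-regression discriminator; all parameter values are hypothetical.

```python
import math
import random

# Toy 1-D sketch: Z = Omega = the real line. The parameter values are
# hypothetical stand-ins for trained neural-network weights.


def generator(z: float, a: float = 2.0, b: float = 1.0) -> float:
    """G: Z -> Omega. An affine map standing in for a neural network."""
    return a * z + b


def discriminator(u: float, w: float = 1.5, c: float = -0.5) -> float:
    """D: Omega -> [0, 1]. Logistic regression standing in for a neural network;
    the output is read as the probability that u is a real sample."""
    return 1.0 / (1.0 + math.exp(-(w * u + c)))


rng = random.Random(0)
z = rng.gauss(0.0, 1.0)  # z ~ p_Z, a standard normal in this sketch
f = generator(z)         # fake sample f = G(z)
print(f"z = {z:.3f}, G(z) = {f:.3f}, D(G(z)) = {discriminator(f):.3f}")
```

In a real GAN both functions are deep networks, but the contract is the same: $G$ maps noise to the data space, and $D$ maps any point of the data space to a probability.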

<br>
<br>

---

<br>
<br>

# How GANs Work

<br>

- The focus is to learn a function $G$, the generator, that turns noise into realistic data samples. This avoids directly computing the partition function (the normalizing constant of an explicit density).

<br>

1. Generator

- $G$: Trained to maximize the probability of fooling $D$  

<br>

- For an innovation (a noise draw) $z$, $G(z)$ is a novel observation, so the random variable $G(Z)$ transforms the input noise $Z$ into quality fakes whose distribution $p_F$ is close to the real $p_R$

<br>

2. Discriminator

- $D$: Trained to minimize the probability of being fooled by $G$

<br>

- For a novel observation $u$, $D(u)$ is a probability, $D(u) = p(\text{Real Observation} | u) = 1 - p(\text{Fake Observation} | u)$, which defines a Bernoulli random variable (real vs. fake)
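Because $D(u)$ parameterizes a Bernoulli variable, scoring $D$ on a labelled sample is a Bernoulli log-likelihood (the negative of binary cross-entropy), with label 1 for real and 0 for fake. A minimal sketch; the function name is our own:

```python
import math


def bernoulli_log_likelihood(p_real: float, is_real: bool) -> float:
    """Log-probability of the observed label under Bernoulli(D(u)).

    p_real  -- D(u), the discriminator's probability that u is real
    is_real -- True for a real sample r, False for a fake sample f = G(z)
    """
    return math.log(p_real) if is_real else math.log(1.0 - p_real)


# D(u) = 0.9 on a real sample: the label is well predicted, score log(0.9).
print(bernoulli_log_likelihood(0.9, is_real=True))
# D(u) = 0.9 on a fake sample: D was fooled, score about log(0.1).
print(bernoulli_log_likelihood(0.9, is_real=False))
```

Averaging this score over real samples and over fakes $G(z)$ gives the two expectations in the value function $V(D, G)$.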

<br>

- Loss function:  

<br>

$$
\min_G \max_D V(D, G) = \mathbb{E}_{R}[\log D(R)] + \mathbb{E}_{Z}[\log(1 - D(G(Z)))]
$$  
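The expectations in $V(D, G)$ are rarely available in closed form; in practice they are estimated by averaging over samples. A minimal Monte Carlo sketch, assuming hypothetical choices throughout: real data $R \sim \mathcal{N}(4, 1)$, noise $Z \sim \mathcal{N}(0, 1)$, an untrained identity generator, and a hand-picked logistic discriminator.

```python
import math
import random
from typing import Callable


def estimate_value(
    D: Callable[[float], float],
    G: Callable[[float], float],
    sample_real: Callable[[random.Random], float],
    sample_noise: Callable[[random.Random], float],
    n: int = 10_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of V(D, G) = E_R[log D(R)] + E_Z[log(1 - D(G(Z)))]."""
    rng = random.Random(seed)
    real_term = sum(math.log(D(sample_real(rng))) for _ in range(n)) / n
    fake_term = sum(math.log(1.0 - D(G(sample_noise(rng)))) for _ in range(n)) / n
    return real_term + fake_term


def sample_real(rng: random.Random) -> float:
    return rng.gauss(4.0, 1.0)  # hypothetical p_R = N(4, 1)


def sample_noise(rng: random.Random) -> float:
    return rng.gauss(0.0, 1.0)  # p_Z = N(0, 1)


def identity_generator(z: float) -> float:
    return z  # untrained G: fakes are distributed as N(0, 1)


def logistic_discriminator(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-(u - 2.0)))  # leans "real" when u > 2


def coin_flip_discriminator(u: float) -> float:
    return 0.5  # cannot tell real from fake


print(estimate_value(logistic_discriminator, identity_generator, sample_real, sample_noise))
print(estimate_value(coin_flip_discriminator, identity_generator, sample_real, sample_noise))
```

The constant discriminator $D(u) = 1/2$ gives $V = 2 \log \frac{1}{2} = -\log 4$ whatever $G$ does, which is the value of the game at the GAN equilibrium (Goodfellow et al., 2014); a discriminator that separates the two distributions scores higher.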

<br>
<br>

---

<br>
<br>

# Key Insights

<br>

- At **equilibrium**, $G$ generates samples indistinguishable from real data ($p_F = p_R$), so the best possible $D$ outputs $1/2$ everywhere
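For a fixed generator, Goodfellow et al. (2014) show that the optimal discriminator is $D^*(u) = \frac{p_R(u)}{p_R(u) + p_F(u)}$. A small sketch evaluating it for two hypothetical 1-D Gaussians makes the equilibrium claim concrete: while $p_F \neq p_R$, $D^*$ leans towards whichever density is larger, and once $p_F = p_R$ it returns $1/2$ everywhere.

```python
import math
from typing import Callable

Density = Callable[[float], float]


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def optimal_discriminator(x: float, p_real: Density, p_fake: Density) -> float:
    """D*(x) = p_R(x) / (p_R(x) + p_F(x)) for a fixed generator."""
    pr, pf = p_real(x), p_fake(x)
    return pr / (pr + pf)


def p_R(x: float) -> float:
    return normal_pdf(x, 0.0, 1.0)  # hypothetical real data: N(0, 1)


def p_F_untrained(x: float) -> float:
    return normal_pdf(x, 3.0, 1.0)  # generator far from the data: N(3, 1)


for x in (-1.0, 0.0, 1.5, 3.0):
    before = optimal_discriminator(x, p_R, p_F_untrained)
    at_equilibrium = optimal_discriminator(x, p_R, p_R)  # p_F = p_R
    print(f"x = {x:4.1f}  D* before = {before:.3f}  D* at equilibrium = {at_equilibrium}")
```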

<br>

- $D$ provides feedback to $G$ for improvement  

<br>

- Challenges in GAN training:  

<br>

  - **Mode Collapse:** $G$ covers only a few modes of $p_R$, giving limited variety in generated data

<br>

  - **Training Instability:** Difficult to balance $G$ and $D$  
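Mode collapse can be made visible on toy data. A simple sanity check, sketched here with hypothetical helper names and thresholds, places the real distribution on a ring of 8 Gaussians and counts how many modes attract at least a few generated samples. A healthy generator covers all 8; a collapsed one covers only 1.

```python
import math
import random


def ring_centers(k: int = 8, radius: float = 2.0) -> list[tuple[float, float]]:
    """Centers of k Gaussian modes spaced evenly on a circle (the 'real' modes)."""
    return [
        (radius * math.cos(2 * math.pi * i / k), radius * math.sin(2 * math.pi * i / k))
        for i in range(k)
    ]


def modes_covered(samples, centers, tol: float = 0.5, min_count: int = 5) -> int:
    """Count modes that receive at least min_count samples within distance tol."""
    counts = [0] * len(centers)
    for x, y in samples:
        dists = [math.hypot(x - cx, y - cy) for cx, cy in centers]
        nearest = min(range(len(centers)), key=dists.__getitem__)
        if dists[nearest] < tol:
            counts[nearest] += 1
    return sum(c >= min_count for c in counts)


def noisy(center, rng: random.Random, std: float = 0.05) -> tuple[float, float]:
    """A sample drawn close to the given mode center."""
    cx, cy = center
    return (cx + rng.gauss(0.0, std), cy + rng.gauss(0.0, std))


rng = random.Random(0)
centers = ring_centers()
# Healthy generator: samples spread over every mode.
healthy = [noisy(rng.choice(centers), rng) for _ in range(1000)]
# Collapsed generator: every sample lands near the same mode.
collapsed = [noisy(centers[0], rng) for _ in range(1000)]
print(modes_covered(healthy, centers), modes_covered(collapsed, centers))
```

Note that each collapsed sample still looks perfectly "real" on its own; the failure only shows up when the sample set is compared against all modes of $p_R$.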

<br>
<br>

---

<br>
<br>

# Innovations in GANs

<br>

- Adversarial training using specialized loss functions

<br>

- Presented as a zero-sum **minimax optimization problem** (not needed)

<br>

- Wrapped in Fixed Point Theory (or Nash Equilibrium if you want to be lazy)

<br>

- Roots in Jürgen Schmidhuber's 1990 work [Artificial Curiosity](https://people.idsia.ch/~juergen/artificial-curiosity-since-1990.html) and the 2012 work of Battista Biggio et al. on [Poisoning Attacks](https://arxiv.org/abs/1206.6389)

<br>
<br>

---
<br>
<br>

# Comments: GANs

<br>

- The original paper goes on side tangents about **zero-sum games**, **minimax optimization**, and Nash equilibria.


<br>

- Notation is awkward and inconsistent

<br>

- No obvious evaluation or goodness-of-fit score for GANs, since the likelihood of data under $G$ is unavailable

<br>

- Misses that the generator is a non-invertible mapping from the latent space to the data space, so the density of $G(Z)$ cannot be obtained by a change of variables.

<br>

- Doesn't carefully discuss how the GAN framework avoids the partition function: sampling $G(z)$ never requires a normalized density.

<br>

- Invertible GANs (or normalizing flows) put GANs on firmer statistical ground: with an invertible $G$, the change-of-variables formula gives an exact likelihood.
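The partition-function and invertibility points can be contrasted in a few lines. The sketch assumes a hypothetical 1-D energy-based model $p(x) = e^{-E(x)}/\mathcal{Z}$, which needs the normalizer $\mathcal{Z} = \int e^{-E(x)}\,dx$ (computed here by numerical quadrature) before any density value is usable, and a hypothetical generator $G(z) = z^2$, which produces samples with a forward pass alone but is non-invertible because $G(z) = G(-z)$.

```python
import math
import random


def energy(x: float) -> float:
    """Hypothetical double-well energy; the model density is exp(-E(x)) / Zc."""
    return 0.25 * x**4 - x**2


def partition_function(E, lo: float = -6.0, hi: float = 6.0, n: int = 20_000) -> float:
    """Zc = integral of exp(-E(x)) dx, approximated by the midpoint rule on [lo, hi].
    Grid quadrature is fine in 1-D; its cost grows exponentially with dimension."""
    h = (hi - lo) / n
    return h * sum(math.exp(-E(lo + (i + 0.5) * h)) for i in range(n))


Zc = partition_function(energy)


def ebm_density(x: float) -> float:
    return math.exp(-energy(x)) / Zc  # only usable once Zc is known


def generator(z: float) -> float:
    """Hypothetical generator G(z) = z**2: sampling is one forward pass, no Zc needed.
    G is non-invertible: G(z) == G(-z), so z cannot be recovered from G(z)."""
    return z * z


rng = random.Random(0)
fakes = [generator(rng.gauss(0.0, 1.0)) for _ in range(5)]
print(f"Zc = {Zc:.4f}, p(0) = {ebm_density(0.0):.4f}, fakes = {fakes}")
```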

<br>
<br>



---


<br>
<br>

# Extensions of Vanilla GANs

<br>

- **Deep Convolutional GANs (DC-GANs):** Use CNNs for image generation.

<br>

- **Wasserstein GANs (W-GANs):** Use Wasserstein distance for more stable training.

<br>

- **CycleGANs:** Convert images between domains (e.g., photos → paintings).

<br>

- **Conditional GANs:** Generate samples conditioned on auxiliary information, such as a class label, a text prompt, or an input image.

<br>

- **StyleGANs:** Generate images with specific styles and features.

<br>

- **InfoGANs:** Learn interpretable representations in latent space.

<br>
<br>


---

<br>
<br>

# GANs: Less Hype in 2024

<br>

* 	Initial Hype Over: GANs are no longer the only game in town for generative learning

<br>

* 	Diffusion Models: Superior sample diversity, stability, and quality (e.g., DALL·E 2, Imagen).


<br>

* Normalizing Flows: Stable, interpretable, and tractable for density estimation (e.g., RealNVP, Glow).

<br>

* VAEs: Stable training and better latent space interpretability, but lower sample quality.

<br>

* Autoregressive Models: Best for complex dependencies (e.g., PixelCNN, GPT-3).


<br>
<br>

---

<br>
<br>

# GANs Still SoTA in a Few Areas

<br>

* 	Ultra High-Resolution Image Generation: StyleGAN2/3 produce realistic, detailed images, especially for faces

<br>

* 	Image-to-Image Translation: GANs like Pix2Pix and CycleGAN excel in tasks like style transfer and photo enhancement 

<br>

* 	Creative Content Generation: Used widely in art, design, and animation for producing novel content

<br>

*	Conditional Generation: cGANs generate images from text or sketches with high fidelity

<br>

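*	Real-Time Performance: GANs are preferred in applications requiring fast image generation, like gaming and VR, because sampling takes a single forward pass of $G$ rather than the many denoising steps of a diffusion model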

<br>
<br>

---

<br>
<br>

# Conclusion

<br>

- GANs demonstrate the power of **adversarial training** for synthetic data generation.  

<br>

- GANs are framed as a **zero-sum game** between a generator and a discriminator.

<br>

- Although innovative, GANs have limitations like **mode collapse** and **training instability**.

<br>

- Probably not going to solve AGI

<br>

**References:**  

<br>

For more, see [Goodfellow et al. (2014)](https://arxiv.org/abs/1406.2661).  

<br>
<br>

---

<br>
<br>

# Conditional GANs

<br>

- **Conditional GANs:** Generate samples conditioned on auxiliary information, such as a class label, a text prompt, or an input image.

<br>

- In practice, the generator and discriminator both receive the conditioning information $y$ (e.g., a class label) as an additional input.

<br>

- If the GAN generator has input $Z$ and output $G(Z)$, the conditional GAN generator has input $(Z, y)$ and output $G(Z|y)$.

<br>

- If the GAN discriminator has input $X$ and output $D(X)$, the conditional GAN discriminator has input $(X, y)$ and output $D(X|y)$.
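One simple way to give $y$ to both networks when it is a class label is to one-hot encode it and concatenate it with each network's usual input. A minimal sketch with plain Python lists; the helper names are hypothetical, and real models often embed $y$ with a learned layer instead:

```python
def one_hot(label: int, num_classes: int) -> list[float]:
    """Encode a class label y as a one-hot vector."""
    v = [0.0] * num_classes
    v[label] = 1.0
    return v


def generator_input(z: list[float], y: int, num_classes: int) -> list[float]:
    """Input for G(z | y): the noise vector followed by the encoding of y."""
    return z + one_hot(y, num_classes)


def discriminator_input(x: list[float], y: int, num_classes: int) -> list[float]:
    """Input for D(x | y): a real or fake sample followed by the same encoding of y."""
    return x + one_hot(y, num_classes)


z = [0.3, -1.2, 0.8, 0.1]  # 4-D noise vector
g_in = generator_input(z, y=7, num_classes=10)
print(len(g_in), g_in[4:])  # 14 entries; the label occupies the last 10
```

Because $D$ sees $y$ as well, it can reject samples that look real but do not match their label, which pushes $G$ to respect the condition.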

<br>

- **Conditional GANs Applications:**

  - Image-to-Image Translation
  - Text-to-Image Generation
  - Image Super-Resolution
  - Image Inpainting

<br>
<br>

---

<br>
<br>

# Wasserstein GANs

<br>

- **Wasserstein GANs (W-GANs):** Use Wasserstein distance for more stable training.

<br>

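- **Wasserstein-1 (earth mover's) distance:** The minimum cost of transporting probability mass to turn one distribution into another. By Kantorovich-Rubinstein duality, $W(p_R, p_F) = \sup_{||D||_L \le 1} \mathbb{E}_{R}[D(R)] - \mathbb{E}_{F}[D(F)]$, a supremum over 1-Lipschitz functions $D$ (the critic).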

<br>

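- Wasserstein GAN losses, both minimized (the critic $D$ outputs an unbounded score, and the original W-GAN keeps it 1-Lipschitz by clipping its weights):

$$
L_D = \mathbb{E}_{Z}[D(G(Z))] - \mathbb{E}_{R}[D(R)]
$$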

$$
L_G = -\mathbb{E}_{Z}[D(G(Z))]
$$
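In one dimension the Wasserstein-1 distance has a closed form for two equal-size samples: the optimal transport plan pairs the sorted values, so $W_1$ is the mean absolute difference of the order statistics. The sketch below (hypothetical helper name) shows the property that motivates W-GANs: the distance keeps growing as the fake distribution drifts away, whereas the Jensen-Shannon divergence behind the original GAN loss saturates at $\log 2$ once the supports stop overlapping.

```python
import random


def wasserstein_1d(a: list[float], b: list[float]) -> float:
    """Empirical W1 between two equal-size 1-D samples.

    In 1-D the optimal transport plan matches sorted values, so
    W1 = mean_i |a_(i) - b_(i)| over the order statistics.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(1_000)]
for shift in (0.5, 3.0, 10.0):
    fake = [r + shift for r in real]  # fake distribution = real shifted right
    print(shift, round(wasserstein_1d(real, fake), 6))  # equals the shift
```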


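- Gradient-penalty Wasserstein GAN (WGAN-GP): replaces weight clipping with a soft penalty on the critic's gradient at interpolates $\hat{x} = \epsilon r + (1 - \epsilon) G(z)$, with $\epsilon \sim U[0, 1]$

$$
L_D = \mathbb{E}_{Z}[D(G(Z))] - \mathbb{E}_{R}[D(R)] + \lambda \mathbb{E}_{\hat{x}}[(||\nabla_{\hat{x}} D(\hat{x})||_2 - 1)^2]
$$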

$$
L_G = -\mathbb{E}_{Z}[D(G(Z))]
$$
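A minimal 1-D sketch of the gradient penalty, assuming the WGAN-GP recipe of Gulrajani et al. (2017): interpolate between paired real and fake samples with $\epsilon \sim U[0, 1]$, measure the critic's gradient there, and penalize deviations of its norm from 1. Central differences stand in for the autodiff a deep-learning framework would use; the critic $D(u) = w \tanh(u)$ and the data are illustrative assumptions, and $\lambda = 10$ follows the paper's default.

```python
import math
import random
from typing import Callable

Critic = Callable[[float], float]


def grad_1d(D: Critic, x: float, h: float = 1e-5) -> float:
    """Central-difference estimate of dD/dx (a stand-in for autodiff)."""
    return (D(x + h) - D(x - h)) / (2.0 * h)


def gradient_penalty(
    D: Critic, reals: list[float], fakes: list[float], rng: random.Random
) -> float:
    """Mean of (|dD/dx(x_hat)| - 1)^2 with x_hat = eps * r + (1 - eps) * f, eps ~ U[0, 1]."""
    total = 0.0
    for r, f in zip(reals, fakes):
        eps = rng.random()
        x_hat = eps * r + (1.0 - eps) * f
        total += (abs(grad_1d(D, x_hat)) - 1.0) ** 2
    return total / len(reals)


def critic_loss(
    D: Critic, reals: list[float], fakes: list[float], lam: float = 10.0, seed: int = 0
) -> float:
    """WGAN-GP critic loss to minimize: E[D(fake)] - E[D(real)] + lam * penalty."""
    rng = random.Random(seed)
    mean_fake = sum(D(f) for f in fakes) / len(fakes)
    mean_real = sum(D(r) for r in reals) / len(reals)
    return mean_fake - mean_real + lam * gradient_penalty(D, reals, fakes, rng)


def tanh_critic(u: float, w: float = 2.0) -> float:
    return w * math.tanh(u)  # hypothetical critic with gradient w * (1 - tanh(u)**2)


rng = random.Random(1)
reals = [rng.gauss(2.0, 0.5) for _ in range(256)]
fakes = [rng.gauss(0.0, 0.5) for _ in range(256)]
print(critic_loss(tanh_critic, reals, fakes))
```

A critic with slope exactly 1 everywhere (e.g. $D(u) = u$) pays no penalty, while $D(u) = 2u$ pays $(2 - 1)^2 = 1$ per interpolate.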

<br>
<br>

---

<br>
<br>

# CycleGANs

<br>

- **CycleGANs:** Convert images between domains (e.g., photos → paintings).

<br>

- **Cycle-Consistency Loss:** Encourages an image translated to the other domain and then back again to match the original image.

<br>

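- CycleGAN loss functions, for mappings $G: X \rightarrow Y$ and $F: Y \rightarrow X$ with discriminators $D_X$ and $D_Y$ (here $F$ is a mapping, not the fake distribution):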

$$
L_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]
$$

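$$
L_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[||F(G(x)) - x||_1] + \mathbb{E}_{y \sim p_{data}(y)}[||G(F(y)) - y||_1]
$$

$$
L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + \lambda L_{cyc}(G, F)
$$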
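The cycle-consistency term can be sketched on toy vector "images". Following Zhu et al. (2017), the sketch penalizes both cycles, $x \to G(x) \to F(G(x))$ and $y \to F(y) \to G(F(y))$, with an L1 norm; the mappings in the usage lines are hypothetical.

```python
from typing import Callable

Vector = list[float]
Mapping = Callable[[Vector], Vector]


def l1(u: Vector, v: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


def cycle_consistency_loss(G: Mapping, F: Mapping, xs: list[Vector], ys: list[Vector]) -> float:
    """L_cyc = E_x ||F(G(x)) - x||_1 + E_y ||G(F(y)) - y||_1, averaged over the samples."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


def G(x: Vector) -> Vector:
    return [2.0 * a for a in x]  # hypothetical X -> Y mapping


def F_inverse(y: Vector) -> Vector:
    return [a / 2.0 for a in y]  # exact inverse of G: both cycles close


def F_identity(y: Vector) -> Vector:
    return list(y)  # ignores G, so the cycles do not close


xs = [[1.0, 2.0], [3.0, 4.0]]
ys = [[2.0, 2.0]]
print(cycle_consistency_loss(G, F_inverse, xs, ys))   # 0.0
print(cycle_consistency_loss(G, F_identity, xs, ys))  # 9.0
```

The adversarial terms alone only ask $G(x)$ to look like some member of $Y$; the cycle term is what ties each output back to its specific input.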

<br>
<br>

---
