```{contents}
```

## Architecture

### Autoencoders (AE)

#### Core Idea:

Autoencoders learn to **compress (encode)** data into a lower-dimensional latent space and then **reconstruct (decode)** it back to the original form.

#### Workflow:

* **Encoder:** Converts input data (e.g., an image) into a compact latent vector (representation).
* **Decoder:** Reconstructs the input data from this latent vector.
* **Training goal:** Minimize the reconstruction error between input and output.
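The encode, decode, minimize-error loop above can be sketched with the smallest possible autoencoder. This is a *linear*, tied-weight model that compresses 2-D points to a 1-D latent code and is trained by plain gradient descent on the mean squared reconstruction error. It is an illustrative pure-Python sketch, not a practical model: real autoencoders use nonlinear neural-network encoders and decoders. The toy data, learning rate, and function names (`encode`, `decode`, `train`) are assumptions for illustration.

```python
import random

def encode(w, x):
    """Encoder: project a 2-D input onto the weight vector -> 1-D latent code."""
    return w[0] * x[0] + w[1] * x[1]

def decode(w, z):
    """Decoder (tied weights): map the 1-D code back to 2-D."""
    return [z * w[0], z * w[1]]

def reconstruction_error(w, data):
    """Mean over samples of ||x - decode(encode(x))||^2."""
    total = 0.0
    for x in data:
        x_hat = decode(w, encode(w, x))
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(data)

def train(data, lr=0.1, epochs=300, seed=0):
    """Full-batch gradient descent on the reconstruction error."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x in data:
            z = encode(w, x)
            r = [z * w[i] - x[i] for i in range(2)]  # residual x_hat - x
            r_dot_w = r[0] * w[0] + r[1] * w[1]
            for j in range(2):
                # d/dw_j of sum_i (z * w_i - x_i)^2, where z = w . x
                grad[j] += 2.0 * (x[j] * r_dot_w + z * r[j])
        w = [w[j] - lr * grad[j] / len(data) for j in range(2)]
    return w

# Toy data: noisy points near the line x1 = x2, so one latent number suffices.
rng = random.Random(1)
data = [(t + rng.gauss(0, 0.05), t + rng.gauss(0, 0.05))
        for t in (-1 + 2 * i / 49 for i in range(50))]

w = train(data)
print(w, reconstruction_error(w, data))  # error should be far below an untrained model's
```

With a 1-D bottleneck the model can only keep the dominant direction of the data, which is exactly the "compress, then reconstruct" trade-off described above.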

#### Generative Variant:

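**Variational Autoencoder (VAE):** Instead of mapping each input to a single latent vector, the encoder outputs the parameters (mean and variance) of a *probability distribution* over latent variables. Training adds a KL-divergence term that keeps these distributions close to a standard normal prior, so new data can be generated by sampling a latent vector from the prior and decoding it.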
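Two pieces make a VAE trainable and generative: the *reparameterization trick* (sampling z = μ + σ·ε with ε ~ N(0, I), so gradients can flow through μ and σ) and a closed-form KL divergence to the standard normal prior. The sketch below assumes a diagonal-Gaussian encoder that outputs `mu` and `log_var`, and uses squared error as the reconstruction term. The encoder and decoder networks are omitted, and the function names are illustrative.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps, eps ~ N(0, I); randomness is isolated in eps."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def vae_loss(x, x_hat, mu, log_var):
    """Common VAE objective: reconstruction error + KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + kl_to_standard_normal(mu, log_var)

# Generation after training: sample z from the prior N(0, I), then decode it.
rng = random.Random(0)
z_new = [rng.gauss(0.0, 1.0) for _ in range(2)]  # would be fed to a trained decoder
```

The KL term is zero only when the encoder outputs exactly the prior (μ = 0, log σ² = 0); it is this pull toward the prior that makes sampling from N(0, I) at generation time produce sensible latents.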

#### Applications:

* Image denoising and inpainting
* Anomaly detection
* Generating new faces, text, or molecular structures (typically with VAEs, whose latent space is regularized for sampling)

---

### Generative Adversarial Networks (GANs)

#### Core Idea:

GANs use **two networks competing against each other**:

* **Generator (G):** Creates fake samples.
* **Discriminator (D):** Classifies samples as real or fake.

Through competition, the generator learns to produce data indistinguishable from real samples.

#### Workflow:

1. Generator produces fake data from random noise.
2. Discriminator evaluates real vs. fake.
3. Both networks are updated in alternation: the discriminator gets better at separating real from fake, and the generator gets better at “fooling” the discriminator.
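The alternating updates in this workflow are usually driven by binary cross-entropy losses. The sketch below assumes the discriminator outputs probabilities in (0, 1). It computes the discriminator loss and the commonly used "non-saturating" generator loss from a batch of discriminator outputs. The networks and optimizer steps are omitted, and the function names are illustrative.

```python
import math

EPS = 1e-12  # guards against log(0)

def bce(p, target):
    """Binary cross-entropy for one predicted probability p and a 0/1 target."""
    return -(target * math.log(p + EPS) + (1 - target) * math.log(1 - p + EPS))

def discriminator_loss(d_real, d_fake):
    """D is trained to output 1 on real samples and 0 on generated ones."""
    real = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real + fake

def generator_loss(d_fake):
    """Non-saturating G loss: G is rewarded when D labels its samples as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)

# One training iteration, schematically:
#   1. d_real = D(real_batch); d_fake = D(G(noise)) -> update D on discriminator_loss
#   2. d_fake = D(G(noise))                          -> update G on generator_loss
print(discriminator_loss([0.5], [0.5]))  # D is guessing: about 2 * ln(2)
```

The non-saturating form (maximizing log D(G(z)) instead of minimizing log(1 − D(G(z)))) is a standard choice because it gives the generator stronger gradients early in training, when the discriminator easily rejects its samples.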

#### Key Variants:

* **DCGAN:** Uses deep convolutional layers (strided convolutions in the discriminator, transposed convolutions in the generator) for image synthesis.
* **CycleGAN:** Translates between domains (e.g., horse ↔ zebra).
* **StyleGAN:** Generates high-resolution, photorealistic faces.

#### Applications:

* Image and video generation
* Deepfake creation (and detection)
* Image-to-image translation

---

### Diffusion Models

#### Core Idea:

Generate data by starting from random **noise** and gradually **denoising** it.
During training, a fixed forward process progressively destroys data by adding noise; the model learns to *reverse* that process one step at a time.

#### Workflow:

1. Add small amounts of Gaussian noise to the data over many steps until it becomes (nearly) pure random noise.
2. Train a neural network to learn the reverse denoising steps.
3. Generate new samples by starting from random noise and reversing the process.
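The three steps can be sketched in the style of DDPM. The sketch assumes DDPM's linear noise schedule (β from 1e-4 to 0.02 over T = 1000 steps) and a network `eps_model(x_t, t)` that predicts the added noise. No network is trained here. For a toy "dataset" containing a single point, the exact noise can be computed in closed form, so a hypothetical `oracle` function stands in for the trained model and shows that the reverse loop recovers the data. Plain Python lists keep the example dependency-free.

```python
import math
import random

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T, increasing linearly."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s): how much signal survives to step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bar, rng):
    """Step 1 (forward process), jumping straight to noise level t:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bar[t]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    # Step 2 (training) would regress eps_model(x_t, t) onto eps with a squared error.
    return x_t, eps

def p_sample_loop(eps_model, dim, betas, alpha_bar, rng):
    """Step 3 (generation): start from pure noise and undo one step at a time."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        beta, a_bar = betas[t], alpha_bar[t]
        eps_hat = eps_model(x, t)
        coef = beta / math.sqrt(1.0 - a_bar)
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta) for xi, ei in zip(x, eps_hat)]
        if t > 0:
            # fresh noise with variance beta_t (one of DDPM's variance choices)
            x = [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # the final step adds no noise
    return x

betas = linear_beta_schedule()
alpha_bar = cumulative_alphas(betas)
x_star = [1.5, -0.5]  # toy dataset with a single point

def oracle(x_t, t):
    """Hypothetical stand-in for a trained eps_model: exact noise for the toy data."""
    a = alpha_bar[t]
    return [(xi - math.sqrt(a) * si) / math.sqrt(1.0 - a) for xi, si in zip(x_t, x_star)]

rng = random.Random(0)
x_t, eps = q_sample(x_star, 999, alpha_bar, rng)  # nearly pure noise at t = T
sample = p_sample_loop(oracle, len(x_star), betas, alpha_bar, rng)
print(alpha_bar[-1], sample)  # alpha_bar_T is close to 0; sample lands on x_star
```

With a real dataset the predicted noise is only an estimate, so each reverse step moves toward *plausible* data rather than one fixed point. This is why sampling from noise produces new, varied outputs.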

#### Examples:

* **DDPM (Denoising Diffusion Probabilistic Model)**
* **Stable Diffusion** (text-to-image generation)

#### Applications:

* Text-to-image generation (e.g., DALL·E 2 and 3, Imagen, Midjourney)
* Super-resolution
* Image editing and inpainting

---

### Flow-Based Models

#### Core Idea:

Learn an **invertible transformation** between data space and a simple latent distribution (such as a standard Gaussian).
Because every layer is invertible and has a tractable Jacobian determinant, flows allow *exact likelihood computation* and *exact, reversible generation*.
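Concretely, suppose f maps a data point x to its latent code z = f(x), and p_Z is the simple base density. The change-of-variables formula then gives the exact log-likelihood:

```{math}
\log p_X(\mathbf{x}) = \log p_Z\big(f(\mathbf{x})\big) + \log \left| \det \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} \right|
```

Generation runs the map backwards: draw z ~ p_Z and return x = f⁻¹(z). Flow architectures are designed so that both f⁻¹ and the determinant term are cheap to compute.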

#### Workflow:

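* Forward pass: Data → Latent space (used for likelihood evaluation; the map is bijective, so the latent has the same dimensionality as the data and nothing is compressed).
* Inverse pass: Sampled latent → Data (generation).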
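A building block used by RealNVP is the *affine coupling layer*. It leaves half of the input unchanged and scales and shifts the other half using functions of the unchanged half. This makes the layer trivially invertible and gives it a triangular Jacobian, whose log-determinant is just the log-scale. The 2-D sketch below replaces the learned neural networks with fixed toy functions `s_net` and `t_net` (hypothetical) and assumes a standard normal base distribution. A real flow stacks many such layers and alternates which half is transformed.

```python
import math

def s_net(x1):
    """Hypothetical log-scale function (a learned network in a real flow)."""
    return 0.5 * math.tanh(x1)

def t_net(x1):
    """Hypothetical shift function (a learned network in a real flow)."""
    return 2.0 * x1 - 1.0

def coupling_forward(x):
    """Data -> latent. x1 passes through; x2 is scaled and shifted using x1."""
    x1, x2 = x
    s = s_net(x1)
    z = (x1, x2 * math.exp(s) + t_net(x1))
    return z, s  # Jacobian is triangular, so log|det J| = s

def coupling_inverse(z):
    """Latent -> data: exact inverse, no iterative solve needed."""
    z1, z2 = z
    return (z1, (z2 - t_net(z1)) * math.exp(-s_net(z1)))

def log_prob(x):
    """Exact log-density via change of variables with a standard normal base."""
    z, log_det = coupling_forward(x)
    log_pz = sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)
    return log_pz + log_det

x = (0.7, -1.3)
z, log_det = coupling_forward(x)
print(coupling_inverse(z), log_prob(x))  # first value recovers x up to rounding
```

The design choice is the point: because `s_net` and `t_net` are only ever *evaluated* and never inverted, they can be arbitrarily complex networks without making the layer hard to invert.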

#### Examples:

* **RealNVP (Real-valued Non-Volume Preserving transformations)**
* **Glow (OpenAI)**

#### Applications:

* High-quality image synthesis
* Anomaly detection
* Density estimation

---

### Transformer-Based Generative Models

#### Core Idea:

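Use **self-attention** to relate every position in a sequence (text tokens, image patches, audio frames) to every other position, so all positions can be processed in parallel during training.
Autoregressive (decoder-only) models such as GPT generate by predicting the next element (token or pixel) from the previous ones, one step at a time.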
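The core operation is scaled dot-product attention, softmax(QKᵀ/√d)V. In autoregressive generators a *causal mask* stops each position from attending to later ones, which is what makes "predict the next element from the previous ones" well defined. The pure-Python sketch below assumes the query, key, and value vectors (`Q`, `K`, `V`) have already been produced by learned projections of the input embeddings. Real implementations compute all rows at once with batched matrix multiplications and use multiple attention heads; the loops here are only for readability.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(Q, K, V):
    """Single-head scaled dot-product attention with a causal mask.
    Q, K: lists of d-dimensional vectors; V: list of value vectors (same length)."""
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        # Causal mask: position i only scores keys 0..i. Skipping later keys is
        # equivalent to setting their scores to -inf before the softmax.
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d) for j in range(i + 1)]
        weights = softmax(scores)
        out.append([sum(w * V[j][c] for j, w in enumerate(weights))
                    for c in range(len(V[0]))])
    return out

# With all-zero queries/keys every visible position gets equal weight,
# so each output is the average of the values seen so far.
Q = K = [[0.0, 0.0]] * 3
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(causal_self_attention(Q, K, V))
```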

#### Examples:

* **GPT family:** Text generation (OpenAI)
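* **BERT / T5:** BERT (encoder-only, masked-token prediction) is used mainly for language understanding; T5 (encoder-decoder) frames tasks as text-to-text generation
* **DALL·E (original) / Parti:** Autoregressive text-to-image generation over discrete image tokens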
* **Music Transformer:** Music generation

#### Applications:

* Text generation and completion
* Code generation (e.g., GitHub Copilot)
* Text-to-image and text-to-video tasks

---

### Energy-Based Models (EBMs)

#### Core Idea:

EBMs assign an **energy score** to each possible data configuration.
Low energy = more likely (realistic) data.
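Generation means drawing samples from the distribution this energy defines, p(x) ∝ exp(−E(x)), which usually requires iterative Markov chain Monte Carlo (MCMC) sampling.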
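For differentiable energies, a common sampler is (unadjusted) Langevin dynamics. It draws approximate samples from p(x) ∝ exp(−E(x)) by repeatedly moving downhill on the energy and adding Gaussian noise: x ← x − (η/2)∇E(x) + √η·ξ, with ξ ~ N(0, 1). The sketch below assumes a toy 1-D quadratic energy E(x) = (x − 2)²/2, whose exact distribution is N(2, 1), so the samples can be checked. In a real EBM, E is a neural network and its gradient comes from automatic differentiation. The step size and chain lengths are illustrative choices.

```python
import random

def grad_energy(x, mu=2.0):
    """Gradient of the toy energy E(x) = (x - mu)^2 / 2."""
    return x - mu

def langevin_sample(n_chains=2000, n_steps=200, step=0.1, seed=0):
    """Run independent Langevin chains and keep the final state of each."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_chains):
        x = 0.0  # arbitrary starting point
        for _ in range(n_steps):
            # drift toward low energy + injected Gaussian noise
            x = x - 0.5 * step * grad_energy(x) + (step ** 0.5) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

samples = langevin_sample()
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # close to 2 and 1 (slight bias from the finite step size)
```

Without a Metropolis correction, the finite step size makes the sampled variance slightly too large. Shrinking `step` reduces this bias at the cost of more steps per sample.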

#### Examples:

* Boltzmann Machines
* Deep Energy Models

#### Applications:

* Representation learning
* Anomaly detection
* Unsupervised generative modeling

---

### Hybrid Architectures

Modern generative models often **combine multiple approaches** for better quality and control:

* **VAE + GAN:** Combines stability of VAEs and realism of GANs (e.g., VAE-GAN).
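* **Transformer + Diffusion:** Transformer-based text encoders and attention layers condition a diffusion model on the prompt, as in text-to-image models like **DALL·E 3** and **Stable Diffusion XL**.
* **RNN + VAE:** Used in music and sequence generation (e.g., MusicVAE, Sketch-RNN).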

---

### Summary Table

| **Architecture**  | **Core Principle**                        | **Representative Models**        | **Applications**                   |
| ----------------- | ----------------------------------------- | -------------------------------- | ---------------------------------- |
| Autoencoder / VAE | Latent-space compression & reconstruction | VAE, β-VAE                       | Data synthesis, anomaly detection  |
| GAN               | Adversarial competition                   | DCGAN, StyleGAN, CycleGAN        | Realistic image/video generation   |
| Diffusion Model   | Learned reversal of a noising process     | Stable Diffusion, DDPM           | Text-to-image, super-resolution    |
| Flow-Based        | Invertible transformations                | RealNVP, Glow                    | Density estimation, image generation |
| Transformer       | Attention-based sequence modeling         | GPT, DALL·E, T5                  | Text, image, code generation       |
| EBM               | Energy landscape modeling                 | Boltzmann Machine                | Representation learning            |
| Hybrid            | Combined strengths of multiple models     | VAE-GAN, Diffusion + Transformer | Multimodal generation              |

```{dropdown} Click here for Sections
```{tableofcontents}