```{contents}
```
## Flow-Based Generative Models

---

### Core Idea

Flow-based models are **generative models** that learn a **reversible (invertible) transformation** between complex data (such as images) and a **simple, tractable distribution** (such as a Gaussian).

In simple terms:

> They **learn an exact mathematical mapping** between real data and random noise — allowing both **generation** and **density estimation** directly.

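So, unlike GANs (which define the data distribution only implicitly, with no likelihood) or diffusion models (which generate samples through many iterative steps), **flow-based models** are **explicit and invertible**: they give an exact likelihood and sample in a single pass.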

---

### Intuition

Imagine you have:

* A **complex distribution** $p_X(x)$: real images, audio, etc.
* A **simple distribution** $p_Z(z)$: standard normal $\mathcal{N}(0, I)$.

Flow models learn a **bijective function** $f_\theta$ such that:

$$
z = f_\theta(x) \quad \text{and} \quad x = f_\theta^{-1}(z)
$$

So:

* **Forward pass:** maps data → latent noise
* **Inverse pass:** maps noise → realistic data

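This exact **invertibility** is what sets flows apart: the same network acts as both encoder (data → latent) and generator (latent → data), with no separate inference model.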
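To make the forward and inverse passes concrete, here is a minimal sketch with a hand-written elementwise affine map. The names `scale`, `shift`, `forward`, and `inverse` are illustrative assumptions, and the parameters are fixed by hand rather than learned.

```python
import numpy as np

# Hypothetical fixed parameters; a real flow would learn these from data.
scale = np.array([2.0, 0.5])
shift = np.array([1.0, -3.0])

def forward(x):
    """Data -> latent: z = (x - shift) / scale."""
    return (x - shift) / scale

def inverse(z):
    """Latent -> data: x = z * scale + shift."""
    return z * scale + shift

x = np.array([[3.0, -2.5], [0.0, -3.0]])
z = forward(x)
x_rec = inverse(z)

print("z =", z)
print("round trip recovers x:", np.allclose(x, x_rec))
```

Because the map is a bijection, applying `inverse` after `forward` returns the original input up to floating-point error.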

---

### Probability Transformation (Change of Variables)

The probability density of $x$ is derived from $z = f_\theta(x)$:

$$
p_X(x) = p_Z(f_\theta(x)) \cdot \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|
$$

Taking log:

$$
\log p_X(x) = \log p_Z(f_\theta(x)) + \log \left| \det J_{f_\theta}(x) \right|
$$

where $J_{f_\theta}(x)$ is the **Jacobian matrix** of the transformation.
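As a sanity check on the formula, consider a one-dimensional sketch where the answer is known in closed form. Assume $f(x) = (x - \mu)/\sigma$ with hand-picked toy values for `mu` and `sigma`; the change-of-variables result should then match the log-density of $\mathcal{N}(\mu, \sigma^2)$.

```python
import math

mu, sigma = 2.0, 3.0  # assumed toy parameters, not learned

def log_std_normal(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def flow_log_prob(x):
    """log p_X(x) = log p_Z(f(x)) + log|det J_f(x)| for f(x) = (x - mu) / sigma."""
    z = (x - mu) / sigma
    log_det = math.log(abs(1.0 / sigma))  # the 1x1 Jacobian is df/dx = 1 / sigma
    return log_std_normal(z) + log_det

def gaussian_log_pdf(x):
    """Closed-form log-density of N(mu, sigma^2), used only for comparison."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))

x = 0.7
print(flow_log_prob(x), gaussian_log_pdf(x))
```

The two printed values agree up to floating-point error, which is the change-of-variables formula at work in the simplest possible case.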

Thus, **training** means maximizing the likelihood of real data under this model — i.e.,
$$
\max_\theta \sum_x \log p_X(x)
$$
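The objective can be illustrated with a deliberately tiny flow, $z = (x - b)\,e^{-s}$, trained by plain gradient ascent on the mean log-likelihood. This is a sketch under stated assumptions: the data are synthetic, the gradients are derived by hand for this one-parameter-pair model, and `b`, `s`, and `lr` are illustrative names. Real flows use deep networks and automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=5000)  # synthetic stand-in for real data

b, s = 0.0, 0.0  # shift and log-scale of the flow z = (x - b) * exp(-s)
lr = 0.1         # assumed learning rate

def mean_log_likelihood(x, b, s):
    z = (x - b) * np.exp(-s)
    log_pz = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
    return np.mean(log_pz - s)  # log|det J| = -s for this flow

initial_ll = mean_log_likelihood(data, b, s)

for _ in range(2000):
    z = (data - b) * np.exp(-s)
    grad_b = np.exp(-s) * np.mean(z)  # derivative of the mean log-likelihood w.r.t. b
    grad_s = np.mean(z ** 2) - 1.0    # derivative of the mean log-likelihood w.r.t. s
    b += lr * grad_b                  # gradient ascent: maximize the likelihood
    s += lr * grad_s

final_ll = mean_log_likelihood(data, b, s)
print("learned shift:", b, "learned scale:", np.exp(s))
print("log-likelihood:", initial_ll, "->", final_ll)
```

For this model the maximum-likelihood solution is the sample mean and sample standard deviation, so the learned `b` and `exp(s)` should land on those values.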

---

### Key Requirements

For a transformation $f_\theta$ to be valid in flow-based models, it must be:

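1. **Invertible**: both the forward and the inverse mapping can be computed
2. **Differentiable**: needed for gradient-based learning and for the Jacobian to exist
3. **Equipped with an efficiently computable Jacobian determinant**: a general $D \times D$ determinant costs $O(D^3)$, so layers are usually designed with triangular Jacobians, whose determinant is just the product of the diagonal entries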
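An affine coupling layer is one common way to meet all three requirements at once. The sketch below is a simplified NumPy version: the "conditioner" is a single random linear map (`W_s`, `W_t`, both hypothetical) rather than a trained neural network, and the cheap log-determinant is checked against a finite-difference Jacobian.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 4, 2  # input dimension and size of the untouched half

# Hypothetical fixed weights standing in for a learned conditioner network.
W_s = rng.normal(scale=0.5, size=(d, D - d))
W_t = rng.normal(scale=0.5, size=(d, D - d))

def conditioner(x1):
    """Compute log-scale and shift for the second half from the first half."""
    return np.tanh(x1 @ W_s), x1 @ W_t

def coupling_forward(x):
    x1, x2 = x[:d], x[d:]
    log_scale, shift = conditioner(x1)
    z2 = x2 * np.exp(log_scale) + shift
    # The Jacobian is triangular, so log|det J| is the sum of the log-scales.
    return np.concatenate([x1, z2]), log_scale.sum()

def coupling_inverse(z):
    z1, z2 = z[:d], z[d:]
    log_scale, shift = conditioner(z1)  # z1 == x1, so the same values come out
    return np.concatenate([z1, (z2 - shift) * np.exp(-log_scale)])

def numerical_jacobian(f, x, eps=1e-6):
    """Central finite differences; used only to verify the analytic log-det."""
    J = np.zeros((x.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return J

x = rng.normal(size=D)
z, log_det = coupling_forward(x)
J = numerical_jacobian(lambda v: coupling_forward(v)[0], x)
numeric_log_det = np.linalg.slogdet(J)[1]

print("inverse recovers x:", np.allclose(coupling_inverse(z), x))
print("analytic vs numeric log-det:", log_det, numeric_log_det)
```

The inverse never has to invert the conditioner itself, which is why the conditioner can be an arbitrary network in practice.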

---

### Architecture Overview

Flow-based models are composed of **a sequence of invertible transformations** (flows):

$$
z = f_K \circ f_{K-1} \circ \dots \circ f_1(x)
$$

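Each layer is simple on its own; stacked together, the layers progressively reshape the data distribution toward the base distribution. By the chain rule, the log-determinants of the individual layers add up:

$$
\log \left| \det J_{f_\theta}(x) \right| = \sum_{k=1}^{K} \log \left| \det J_{f_k}(h_{k-1}) \right|, \quad h_0 = x, \; h_k = f_k(h_{k-1})
$$

After training, the final latent variable $z = h_K$ approximately follows a Gaussian distribution.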
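A short sketch of how a stack of flows is evaluated: the forward pass accumulates one log-determinant per layer, and the inverse applies the layer inverses in reverse order. The layers here are hypothetical linear maps $f_k(h) = A_k h + b_k$ with random near-identity matrices, chosen only because their determinants are easy to verify.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 3, 4

# Hypothetical layers: A_k is a small perturbation of the identity, so it is invertible.
layers = [(np.eye(D) + 0.1 * rng.normal(size=(D, D)), rng.normal(size=D))
          for _ in range(K)]

def flow_forward(x):
    """Apply f_1, ..., f_K in order and sum the per-layer log|det J|."""
    h, total_log_det = x, 0.0
    for A, b in layers:
        h = A @ h + b
        total_log_det += np.linalg.slogdet(A)[1]
    return h, total_log_det

def flow_inverse(z):
    """Undo the layers in reverse order."""
    h = z
    for A, b in reversed(layers):
        h = np.linalg.solve(A, h - b)
    return h

x = rng.normal(size=D)
z, total_log_det = flow_forward(x)

# Cross-check: the composed linear map is A_K ... A_1.
composite = np.eye(D)
for A, _ in layers:
    composite = A @ composite

print("sum of layer log-dets:", total_log_det)
print("log-det of composite: ", np.linalg.slogdet(composite)[1])
print("inverse recovers x:", np.allclose(flow_inverse(z), x))
```

The summed per-layer value matches the log-determinant of the full composition, which is why each layer only needs to report its own log-determinant.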

---

### Common Building Blocks

| **Component**                  | **Purpose**                                                                                  |
| ------------------------------ | -------------------------------------------------------------------------------------------- |
| **Affine coupling layer**      | Transforms one part of the input conditioned on the rest; invertible with a cheap Jacobian   |
| **ActNorm**                    | Per-channel scale and bias normalization for stable training                                 |
| **1×1 Invertible Convolution** | Learned channel mixing that generalizes a fixed channel permutation                          |
| **Squeeze Layer**              | Trades spatial resolution for channels to enable hierarchical (multi-scale) processing       |
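Two of these blocks are easy to sketch in NumPy. The example below assumes a channels-last tensor of shape `(height, width, channels)` and a hand-built weight matrix with a known determinant; function names such as `conv1x1_forward` and `squeeze` are illustrative, not taken from any library.

```python
import numpy as np

rng = np.random.default_rng(2)
height, width, channels = 4, 4, 3

# Hypothetical weight: a random rotation times a fixed scaling, so |det| = 0.5 * 1 * 4 = 2.
Q, _ = np.linalg.qr(rng.normal(size=(channels, channels)))
weight = Q @ np.diag([0.5, 1.0, 4.0])

def conv1x1_forward(x):
    """Multiply every pixel's channel vector by `weight`."""
    z = x @ weight.T
    # The same matrix acts on each of the height * width pixels.
    log_det = x.shape[0] * x.shape[1] * np.linalg.slogdet(weight)[1]
    return z, log_det

def conv1x1_inverse(z):
    return z @ np.linalg.inv(weight).T

def squeeze(x):
    """(H, W, C) -> (H/2, W/2, 4C): pure rearrangement, so log|det J| = 0."""
    h, w, c = x.shape
    x = x.reshape(h // 2, 2, w // 2, 2, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h // 2, w // 2, 4 * c)

def unsqueeze(z):
    """Inverse of squeeze: (H/2, W/2, 4C) -> (H, W, C)."""
    h, w, c4 = z.shape
    z = z.reshape(h, w, 2, 2, c4 // 4).transpose(0, 2, 1, 3, 4)
    return z.reshape(2 * h, 2 * w, c4 // 4)

x = rng.normal(size=(height, width, channels))
z, log_det = conv1x1_forward(x)

print("1x1 conv log-det:", log_det, "expected:", height * width * np.log(2.0))
print("1x1 conv inverse ok:", np.allclose(conv1x1_inverse(z), x))
print("squeezed shape:", squeeze(x).shape)
```

Both operations are exactly invertible; only the 1×1 convolution contributes to the log-likelihood, since the squeeze merely rearranges entries.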

---

### Training and Sampling

#### Training (Maximum Likelihood Estimation)

* Compute $z = f_\theta(x)$
* Compute log-likelihood:
  $$
  \log p_X(x) = \log p_Z(z) + \log \left| \det J_{f_\theta}(x) \right|
  $$
* Maximize this with respect to $\theta$

#### Sampling

* Sample $z \sim \mathcal{N}(0, I)$
* Generate $x = f_\theta^{-1}(z)$

Sampling needs only a single pass through the inverse network, so generation is **fast**, and the mapping between $z$ and $x$ is **exact**.
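The sampling procedure can be sketched with a linear flow whose parameters are assumed to be already trained. Here `L` and `mu` are hypothetical values picked by hand; for this flow the generated samples should have mean `mu` and covariance `L @ L.T`.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical "trained" parameters of the flow z = L^{-1} (x - mu).
L = np.array([[2.0, 0.0],
              [1.0, 0.5]])
mu = np.array([1.0, -3.0])

def flow_forward(x):
    """Data -> latent for a batch of shape (N, 2)."""
    return np.linalg.solve(L, (x - mu).T).T

def flow_inverse(z):
    """Latent -> data for a batch of shape (N, 2)."""
    return z @ L.T + mu

z = rng.standard_normal(size=(200_000, 2))  # step 1: z ~ N(0, I)
x = flow_inverse(z)                         # step 2: x = f^{-1}(z), one pass

print("sample mean:", x.mean(axis=0))
print("sample covariance:\n", np.cov(x, rowvar=False))
print("target covariance:\n", L @ L.T)
```

Mapping the samples back with `flow_forward` returns the original noise vectors, which illustrates that nothing is lost in either direction.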

---

### Example Models

| **Model**                                               | **Year** | **Key Idea**                                            |
| ------------------------------------------------------- | -------- | ------------------------------------------------------- |
| **NICE** (Non-linear Independent Components Estimation) | 2014     | Additive coupling layers, invertible mapping            |
| **RealNVP** (Real-valued Non-Volume Preserving)         | 2016     | Affine coupling layers for better flexibility           |
| **Glow** (Generative Flow)                              | 2018     | 1×1 invertible convolutions for improved expressiveness |

---

### Workflow Summary

| **Step** | **Operation**                                                 | **Description**                               |
| -------- | ------------------------------------------------------------- | --------------------------------------------- |
| 1        | Input $x$                                                     | Real data (e.g., image)                       |
| 2        | Apply sequence of invertible functions $f_1, f_2, \dots, f_K$ | Forward transformation (data → latent)        |
| 3        | Compute log-likelihood                                        | Use change-of-variable formula                |
| 4        | Optimize parameters $\theta$                                  | Maximize log-likelihood                       |
| 5        | For generation, sample $z \sim \mathcal{N}(0, I)$             | Reverse transformation $f^{-1}(z)$ to get $x$ |

---

### Intuitive Comparison

| **Model**       | **Key Idea**                            | **Pros**                        | **Cons**                                    |
| --------------- | --------------------------------------- | ------------------------------- | ------------------------------------------- |
| **GANs**        | Learn to fool a discriminator           | Sharp images, fast inference    | No explicit likelihood, unstable training   |
| **VAEs**        | Encode–decode with latent space         | Probabilistic, efficient        | Blurry samples                              |
| **Diffusion**   | Learn to reverse a gradual noising      | High quality, flexible          | Slow sampling                               |
| **Flow Models** | Exact invertible mapping                | Exact likelihood, fast sampling | High memory, constrained architectures      |

---

### Applications

| **Domain**             | **Use Case**                                        |
| ---------------------- | --------------------------------------------------- |
| **Image generation**   | Generate realistic images (Glow, RealNVP)           |
| **Density estimation** | Compute exact data likelihoods                      |
| **Super-resolution**   | Generate high-resolution versions of low-res images |
| **Audio synthesis**    | Models like WaveGlow for speech                     |
| **Anomaly detection**  | Detect out-of-distribution samples by likelihood    |

---

### Visual Summary

```
Forward (Training):                               Inverse (Sampling):

  Data x ─► [Flows f1→f2→f3→...] ─► Latent z      z ~ N(0, I)
                                                  └──► x = f⁻¹(z)
```

---

### Summary Table

| **Concept**        | **Formula / Description**                                                       |
| ------------------ | ------------------------------------------------------------------------------- |
| Mapping            | $z = f_\theta(x)$, invertible                                                   |
| Log-likelihood     | $\log p_X(x) = \log p_Z(f_\theta(x)) + \log \lvert \det J_{f_\theta}(x) \rvert$ |
| Training objective | Maximize likelihood of data                                                     |
| Generation         | $x = f_\theta^{-1}(z)$                                                          |
| Core property      | Exact invertibility and tractable density                                       |