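# DDPM: Denoising Diffusion Probabilistic Models (In-Depth Implementation)

**SOTA teaching standard** | Covers physical intuition, the closed-form derivation, and SNR analysis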

---

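## 1. Physical Intuition: The Link to Non-Equilibrium Thermodynamics ⭐

### 1.1 A Physical Analogy for Diffusion

**Diffusion in nature**:
- A drop of ink in clear water spreads until it is uniformly distributed
- This is an **entropy-increasing** process: the system moves from order to disorder
- Second law of thermodynamics: the entropy of an isolated system never decreases

**The DDPM counterpart**:
- Original data $x_0$ = the "ordered" state (low entropy)
- Pure noise $x_T$ = the "disordered" state (high entropy)
- Forward process $x_0 \to x_T$ = **diffusion** (entropy increases, happens spontaneously)
- Reverse process $x_T \to x_0$ = **reverse diffusion** (entropy decreases, requires energy/learning)

**Fokker-Planck equation**:

In physics, the evolution of a particle density $p(x, t)$ under diffusion is described by the Fokker-Planck equation. In the drift-free case with diffusion coefficient $D$ it reduces to the diffusion equation:
$$\frac{\partial p}{\partial t} = \nabla \cdot (D \nabla p)$$

The DDPM forward process can be viewed as a discrete-time analogue of such a diffusion. Its shrinking factor $\sqrt{1-\beta_t}$ (Section 2.1) acts like an additional drift term that pulls samples toward the origin.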
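
To make the analogy concrete, the sketch below simulates drift-free diffusion as a random walk in pure Python. The function name `simulate_diffusion`, the values of `D` and `dt`, and the particle count are illustrative choices, not part of DDPM. For the 1-D diffusion equation, the variance of particles released at the origin should grow as $2Dt$.

```python
from __future__ import annotations

import random
import statistics


def simulate_diffusion(n_particles: int, n_steps: int, D: float, dt: float, seed: int = 0) -> list[float]:
    """Simulate 1-D Brownian motion: each step adds N(0, 2*D*dt) noise."""
    rng = random.Random(seed)
    step_std = (2.0 * D * dt) ** 0.5
    positions = [0.0] * n_particles  # all particles start at the origin (the "ink drop")
    for _ in range(n_steps):
        positions = [x + rng.gauss(0.0, step_std) for x in positions]
    return positions


D, dt, n_steps = 0.5, 0.01, 100
final_positions = simulate_diffusion(n_particles=4000, n_steps=n_steps, D=D, dt=dt)
empirical_var = statistics.pvariance(final_positions)
theoretical_var = 2.0 * D * dt * n_steps  # variance predicted by the diffusion equation

print(f"empirical variance:   {empirical_var:.3f}")
print(f"theoretical variance: {theoretical_var:.3f}")
```

The empirical value only matches the prediction up to sampling error, which shrinks as the number of particles grows.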

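### 1.2 Why Do Diffusion Models Work?

**Key insights**:

1. **The forward process is simple and fixed**: adding Gaussian noise is a physically realizable process that needs no learning

2. **The reverse process requires learning**: just as a refrigerator must consume energy to lower temperature (reduce entropy), undoing the noise takes a trained model

3. **Score function**: what we learn is not $p(x)$ itself but $\nabla_x \log p(x)$ (a gradient direction)

**Score matching intuition**:
- $\nabla_x \log p(x)$ points in the direction of increasing probability density
- Knowing this gradient lets us move "upstream" and recover data from noise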
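
As a toy illustration of moving "upstream", the sketch below runs Langevin dynamics on a 1-D Gaussian target whose score is known analytically. The target parameters `MU` and `SIGMA`, the step size, and the helper names are assumptions made for this example; in a diffusion model the analytic `score` would be replaced by a learned network.

```python
from __future__ import annotations

import random
import statistics

MU, SIGMA = 3.0, 1.0  # hypothetical target distribution N(MU, SIGMA^2)


def score(x: float) -> float:
    """Analytic score of N(MU, SIGMA^2): d/dx log p(x) = -(x - MU) / SIGMA^2."""
    return -(x - MU) / SIGMA**2


def langevin_sample(n_samples: int, n_steps: int, step: float, seed: int = 0) -> list[float]:
    """Unadjusted Langevin dynamics: x <- x + (step / 2) * score(x) + sqrt(step) * z."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # start from pure noise
    for _ in range(n_steps):
        xs = [x + 0.5 * step * score(x) + step**0.5 * rng.gauss(0.0, 1.0) for x in xs]
    return xs


samples = langevin_sample(n_samples=2000, n_steps=200, step=0.1)
sample_mean = statistics.mean(samples)
sample_std = statistics.pstdev(samples)

print(f"sample mean: {sample_mean:.3f} (target {MU})")
print(f"sample std:  {sample_std:.3f} (target {SIGMA})")
```

The finite step size leaves a small upward bias in the variance; smaller steps reduce it at the cost of more iterations.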

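### 1.3 The Physical Meaning of the Signal-to-Noise Ratio (SNR)

At timestep $t$ we have:
$$x_t = \sqrt{\bar{\alpha}_t} \cdot x_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon$$

**Definition of SNR** (for data normalized to unit variance):
$$\text{SNR}_t = \frac{\text{Signal Power}}{\text{Noise Power}} = \frac{\bar{\alpha}_t}{1 - \bar{\alpha}_t}$$

- $t=0$: $\bar{\alpha}_0 \approx 1$, SNR $\to \infty$ (pure signal)
- $t=T$: $\bar{\alpha}_T \approx 0$, SNR $\to 0$ (pure noise)

**Physical meaning**:
- The forward process steadily lowers the SNR
- The reverse process steadily raises the SNR
- The neural network must predict the noise at every SNR level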
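
For a sense of scale, the sketch below evaluates $\text{SNR}_t$ for a linear $\beta$ schedule in pure Python and finds the first timestep at which noise power exceeds signal power (SNR below 1, i.e. below 0 dB). The schedule endpoints `1e-4` and `0.02` and `T = 1000` are assumed here to match the defaults used in the configuration class later in this notebook; the helper names are illustrative.

```python
from __future__ import annotations

import math


def linear_alpha_bars(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Cumulative products alpha_bar_t for a linear beta schedule (0-indexed)."""
    alpha_bars, running = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def snr_db(alpha_bar: float) -> float:
    """SNR in decibels: 10 * log10(alpha_bar / (1 - alpha_bar))."""
    return 10.0 * math.log10(alpha_bar / (1.0 - alpha_bar))


alpha_bars = linear_alpha_bars()
# First index where SNR < 1, which is equivalent to alpha_bar < 0.5
crossover = next(i for i, a in enumerate(alpha_bars) if a < 0.5)

for i in (0, crossover, len(alpha_bars) - 1):
    print(f"t={i:4d}  alpha_bar={alpha_bars[i]:.6f}  SNR={snr_db(alpha_bars[i]):7.2f} dB")
```

Under this schedule the 0 dB crossover falls in the first third of the trajectory, so most timesteps present the network with inputs that are more noise than signal.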

---

## 2. Mathematical Derivation

### 2.1 Forward Process

**Markov chain definition**:
$$q(x_t | x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t} x_{t-1}, \beta_t I)$$

### 2.2 Closed-Form Derivation ⭐

**Goal**: obtain $x_t$ directly from $x_0$, without iterating $t$ steps.

**Derivation**:

Define $\alpha_t = 1 - \beta_t$. Then, with $\epsilon_t \sim \mathcal{N}(0, I)$ drawn independently at each step:
$$x_t = \sqrt{\alpha_t} x_{t-1} + \sqrt{1-\alpha_t} \epsilon_t$$

Unrolling the recursion:
$$\begin{aligned}
x_1 &= \sqrt{\alpha_1} x_0 + \sqrt{1-\alpha_1} \epsilon_1 \\
x_2 &= \sqrt{\alpha_2} x_1 + \sqrt{1-\alpha_2} \epsilon_2 \\
&= \sqrt{\alpha_2}\left(\sqrt{\alpha_1} x_0 + \sqrt{1-\alpha_1} \epsilon_1\right) + \sqrt{1-\alpha_2} \epsilon_2 \\
&= \sqrt{\alpha_1\alpha_2} x_0 + \sqrt{\alpha_2(1-\alpha_1)} \epsilon_1 + \sqrt{1-\alpha_2} \epsilon_2
\end{aligned}$$

The two noise terms in $x_2$ are independent zero-mean Gaussians, so their sum is again Gaussian and the variances add: $\alpha_2(1-\alpha_1) + (1-\alpha_2) = 1 - \alpha_1\alpha_2$. Repeating this merge up to step $t$ gives:
$$x_t = \sqrt{\prod_{i=1}^t \alpha_i} \cdot x_0 + \sqrt{1 - \prod_{i=1}^t \alpha_i} \cdot \epsilon$$

Defining $\bar{\alpha}_t = \prod_{i=1}^t \alpha_i$, we obtain:
$$\boxed{x_t = \sqrt{\bar{\alpha}_t} \cdot x_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon}$$

where $\epsilon \sim \mathcal{N}(0, I)$.
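
The closed form can be checked numerically without any sampling. The sketch below tracks the coefficient of $x_0$ and the total noise variance through the one-step recursion and compares them with $\sqrt{\bar{\alpha}_t}$ and $1-\bar{\alpha}_t$ at every step. The linear schedule values are assumed for illustration; any $\beta_t \in (0, 1)$ would do.

```python
from __future__ import annotations

import math

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # assumed linear schedule

signal_coef = 1.0  # coefficient multiplying x_0 after t steps
noise_var = 0.0    # variance of the accumulated Gaussian noise
alpha_bar = 1.0    # running product of alpha_i
max_err = 0.0

for beta in betas:
    alpha = 1.0 - beta
    # One forward step: x_t = sqrt(alpha) * x_{t-1} + sqrt(1 - alpha) * eps_t
    signal_coef *= math.sqrt(alpha)
    noise_var = alpha * noise_var + (1.0 - alpha)  # independent variances add
    alpha_bar *= alpha
    # Compare against the closed form at this step
    max_err = max(
        max_err,
        abs(signal_coef - math.sqrt(alpha_bar)),
        abs(noise_var - (1.0 - alpha_bar)),
    )

print(f"max deviation from closed form over {T} steps: {max_err:.2e}")
```

Any deviation reported here is floating-point round-off, since the two expressions are algebraically identical.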

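**Why it matters**:
- During training we can sample $x_t$ for any $t$ directly
- The cost of producing $x_t$ drops from $O(t)$ sequential steps to $O(1)$
- This is the key to efficient DDPM training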
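
Because any $x_t$ is available in one step, a training iteration can draw a random timestep for each example. For reference, the simplified objective from the DDPM paper (Ho et al., 2020) is reproduced below. Here $\epsilon_\theta$ denotes the noise-prediction network, a symbol not introduced in the derivation above.

$$L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\left[\left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\; t\right) \right\|^2\right]$$

In that formulation $t$ is drawn uniformly from $\{1, \dots, T\}$ and $\epsilon \sim \mathcal{N}(0, I)$.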

---

## 3. Code Implementation

In [None]:
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
from torch import Tensor

In [None]:
@dataclass
class DDPMConfig:
    """DDPM configuration."""

    timesteps: int = 1000
    beta_start: float = 1e-4
    beta_end: float = 0.02
    schedule: str = "linear"  # 'linear' or 'cosine'

In [None]:
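class DDPMScheduler:
    """DDPM noise scheduler.

    Core idea:
        Implements the forward diffusion process
        q(x_t|x_0) = N(sqrt(alpha_bar)*x_0, (1-alpha_bar)*I)

    Physical meaning:
        Controls how fast signal is converted into noise, similar to a
        temperature schedule in thermodynamics
    """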

    def __init__(self, config: DDPMConfig) -> None:
        self.config = config

        # Build the beta schedule
        if config.schedule == "linear":
            self.betas = torch.linspace(config.beta_start, config.beta_end, config.timesteps)
        elif config.schedule == "cosine":
            self.betas = self._cosine_schedule(config.timesteps)
        else:
            raise ValueError(f"Unknown schedule: {config.schedule}")

        # Precompute coefficients
        self.alphas = 1.0 - self.betas
        self.alpha_bars = torch.cumprod(self.alphas, dim=0)

        # Frequently used coefficients
        self.sqrt_alpha_bars = torch.sqrt(self.alpha_bars)
        self.sqrt_one_minus_alpha_bars = torch.sqrt(1.0 - self.alpha_bars)

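    def _cosine_schedule(self, timesteps: int, s: float = 0.008) -> Tensor:
        """Cosine schedule (Improved DDPM, Nichol & Dhariwal, 2021).

        Advantages:
            - Keeps more information in the high-SNR region
            - Avoids the linear schedule's excessive destruction of signal in later steps
        """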
        steps = timesteps + 1
        x = torch.linspace(0, timesteps, steps)
        alphas_cumprod = torch.cos(((x / timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
        alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
        betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
        return torch.clip(betas, 0.0001, 0.9999)

    def get_snr(self, t: Tensor) -> Tensor:
        """Compute the signal-to-noise ratio SNR_t = alpha_bar / (1 - alpha_bar)."""
        alpha_bar = self.alpha_bars[t]
        return alpha_bar / (1 - alpha_bar)

    def q_sample(
        self,
        x0: Tensor,
        t: Tensor,
        noise: Optional[Tensor] = None,
    ) -> Tuple[Tensor, Tensor]:
        """Forward diffusion sampling (closed form).

        Formula: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * epsilon

        Args:
            x0: Original data [B, C, H, W]
            t: Timesteps, tensor of shape [B]
            noise: Optional noise; sampled randomly if None

        Returns:
            x_t: The noised data
            noise: The noise that was used
        """
        if noise is None:
            noise = torch.randn_like(x0)

        # Look up the per-sample coefficients
        sqrt_alpha_bar = self.sqrt_alpha_bars[t]
        sqrt_one_minus_alpha_bar = self.sqrt_one_minus_alpha_bars[t]

        # Append trailing dims so they broadcast against [B, C, H, W]
        while len(sqrt_alpha_bar.shape) < len(x0.shape):
            sqrt_alpha_bar = sqrt_alpha_bar.unsqueeze(-1)
            sqrt_one_minus_alpha_bar = sqrt_one_minus_alpha_bar.unsqueeze(-1)

        # Closed-form computation
        x_t = sqrt_alpha_bar * x0 + sqrt_one_minus_alpha_bar * noise

        return x_t, noise

    def p_sample(
        self,
        model: nn.Module,
        x_t: Tensor,
        t: Tensor,
    ) -> Tensor:
        """Single reverse (denoising) step: sample x_{t-1} given x_t.

        Approximates q(x_{t-1} | x_t, x_0) using the model's predicted noise.
        The mean is the epsilon-parameterized form from the DDPM paper:
            mu = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_theta) / sqrt(alpha_t)
        with the fixed variance choice sigma_t^2 = beta_t.
        """
        # The model predicts the noise contained in x_t
        predicted_noise = model(x_t, t)

        # Look up coefficients and reshape [B] -> [B, 1, ..., 1] for broadcasting
        shape = (-1,) + (1,) * (x_t.dim() - 1)
        alpha_t = self.alphas[t].view(shape)
        alpha_bar_t = self.alpha_bars[t].view(shape)
        beta_t = self.betas[t].view(shape)

        # Posterior mean (full derivation in the DDPM paper)
        posterior_mean = (
            x_t - beta_t / torch.sqrt(1 - alpha_bar_t) * predicted_noise
        ) / torch.sqrt(alpha_t)

        # Add noise at every step except the last one (t == 0)
        nonzero_mask = (t > 0).float().view(shape)
        noise = torch.randn_like(x_t)

        return posterior_mean + nonzero_mask * torch.sqrt(beta_t) * noise


# Create the scheduler
config = DDPMConfig(timesteps=1000)
scheduler = DDPMScheduler(config)

print(f"Timesteps: {config.timesteps}")
print(f"Beta range: [{scheduler.betas[0]:.6f}, {scheduler.betas[-1]:.4f}]")
print(f"Alpha_bar range: [{scheduler.alpha_bars[0]:.6f}, {scheduler.alpha_bars[-1]:.8f}]")
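
The mean of a reverse step can be written two ways: as the posterior $q(x_{t-1} \mid x_t, x_0)$ evaluated at the estimate $\hat{x}_0$, or directly in terms of the predicted noise. The pure-Python sketch below checks on scalars that the two forms agree. It uses the posterior-mean coefficients from the DDPM paper; the schedule, the test values, and the function names are assumptions for illustration.

```python
from __future__ import annotations

import math

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # assumed linear schedule
alphas = [1.0 - b for b in betas]
alpha_bars: list[float] = []
running = 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)


def mean_via_x0(x_t: float, eps: float, t: int) -> float:
    """Posterior mean of q(x_{t-1} | x_t, x_0) with x_0 replaced by its estimate."""
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
    x0_hat = (x_t - math.sqrt(1.0 - ab_t) * eps) / math.sqrt(ab_t)
    coef_x0 = math.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
    coef_xt = math.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
    return coef_x0 * x0_hat + coef_xt * x_t


def mean_via_eps(x_t: float, eps: float, t: int) -> float:
    """Equivalent epsilon form: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)."""
    return (x_t - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])


# Arbitrary test values; t >= 1 so that alpha_bars[t - 1] exists
gaps = [abs(mean_via_x0(0.7, -1.3, t) - mean_via_eps(0.7, -1.3, t)) for t in (1, 250, 999)]
print("max gap between the two forms:", max(gaps))
```

The epsilon form needs only one network output and no explicit $\hat{x}_0$, which is why it is the more common way to write the reverse step.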

---

## 4. Visualization and Analysis

In [None]:
def plot_scheduler_components(scheduler: DDPMScheduler) -> None:
    """Visualize the scheduler's components."""
    fig, axes = plt.subplots(1, 4, figsize=(16, 4))
    t = np.arange(scheduler.config.timesteps)

    axes[0].plot(t, scheduler.betas.numpy())
    axes[0].set_xlabel("Timestep t")
    axes[0].set_ylabel(r"$\beta_t$")
    axes[0].set_title("Noise Schedule (Beta)")
    axes[0].grid(True, alpha=0.3)

    axes[1].plot(t, scheduler.alphas.numpy())
    axes[1].set_xlabel("Timestep t")
    axes[1].set_ylabel(r"$\alpha_t = 1 - \beta_t$")
    axes[1].set_title("Alpha")
    axes[1].grid(True, alpha=0.3)

    axes[2].plot(t, scheduler.alpha_bars.numpy())
    axes[2].set_xlabel("Timestep t")
    axes[2].set_ylabel(r"$\bar{\alpha}_t = \prod \alpha_i$")
    axes[2].set_title("Cumulative Alpha (Signal Coefficient)")
    axes[2].grid(True, alpha=0.3)

    # SNR
    snr = scheduler.alpha_bars / (1 - scheduler.alpha_bars)
    axes[3].semilogy(t, snr.numpy())
    axes[3].set_xlabel("Timestep t")
    axes[3].set_ylabel("SNR (log scale)")
    axes[3].set_title("Signal-to-Noise Ratio")
    axes[3].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()


plot_scheduler_components(scheduler)

In [None]:
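# Create a test image


def create_test_image(size: int = 64) -> Tensor:
    """Create a simple test image (a centered square)."""
    img = torch.zeros(1, size, size)
    lo, hi = size // 4, 3 * size // 4  # square spans the middle half of each axis
    img[:, lo:hi, lo:hi] = 1.0
    return img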


x0 = create_test_image(64)

plt.figure(figsize=(4, 4))
plt.imshow(x0.squeeze(), cmap="gray")
plt.title("Original Image $x_0$")
plt.axis("off")
plt.show()

print(f"Image shape: {x0.shape}")

In [None]:
def visualize_diffusion_with_snr(scheduler: DDPMScheduler, x0: Tensor) -> None:
    """Visualize the diffusion process, annotated with SNR."""
    timesteps_to_show = [0, 50, 100, 200, 400, 600, 800, 999]

    fig, axes = plt.subplots(2, 4, figsize=(16, 8))
    axes = axes.flatten()

    for i, t in enumerate(timesteps_to_show):
        x_t, noise = scheduler.q_sample(x0.unsqueeze(0), torch.tensor([t]))
        x_t = x_t.squeeze(0)

        # Compute the SNR shown in the panel title
        alpha_bar = scheduler.alpha_bars[t].item()
        snr = alpha_bar / (1 - alpha_bar)
        snr_db = 10 * torch.log10(torch.tensor(snr) + 1e-10).item()

        axes[i].imshow(x_t.squeeze(), cmap="gray", vmin=-2, vmax=2)
        axes[i].set_title(f"t={t}\nSNR={snr:.4f} ({snr_db:.1f} dB)")
        axes[i].axis("off")

    plt.suptitle(
        "Forward Diffusion with SNR Annotation\n" + "Lower SNR = More Noise = Less Signal",
        fontsize=14,
    )
    plt.tight_layout()
    plt.show()


visualize_diffusion_with_snr(scheduler, x0)

In [None]:
def compare_schedules() -> None:
    """Compare the linear and cosine schedules."""
    config_linear = DDPMConfig(schedule="linear")
    config_cosine = DDPMConfig(schedule="cosine")

    sched_linear = DDPMScheduler(config_linear)
    sched_cosine = DDPMScheduler(config_cosine)

    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    t = np.arange(1000)

    # Alpha_bar comparison
    axes[0].plot(t, sched_linear.alpha_bars.numpy(), label="Linear", alpha=0.8)
    axes[0].plot(t, sched_cosine.alpha_bars.numpy(), label="Cosine", alpha=0.8)
    axes[0].set_xlabel("Timestep t")
    axes[0].set_ylabel(r"$\bar{\alpha}_t$")
    axes[0].set_title("Cumulative Alpha Comparison")
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)

    # SNR comparison
    snr_linear = sched_linear.alpha_bars / (1 - sched_linear.alpha_bars)
    snr_cosine = sched_cosine.alpha_bars / (1 - sched_cosine.alpha_bars)
    axes[1].semilogy(t, snr_linear.numpy(), label="Linear", alpha=0.8)
    axes[1].semilogy(t, snr_cosine.numpy(), label="Cosine", alpha=0.8)
    axes[1].set_xlabel("Timestep t")
    axes[1].set_ylabel("SNR (log scale)")
    axes[1].set_title("SNR Comparison")
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

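    print("\nObservations:")
    print("- The cosine schedule keeps a higher SNR for most of the later steps (more signal retained)")
    print("- This helps the model still predict accurately in the high-noise regime")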


compare_schedules()

In [None]:
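def explain_physics_analogy() -> None:
    """Explain how DDPM relates to physics."""
    print("="*70)
    print("Physical intuition behind DDPM")
    print("="*70)
    print()
    print("1. Thermodynamic analogy:")
    print("   Forward process x_0 -> x_T:")
    print("     Analogy: ink diffusing in clear water")
    print("     Physical process: diffusion")
    print("     Thermodynamics: entropy increases (spontaneous)")
    print()
    print("   Reverse process x_T -> x_0:")
    print("     Analogy: separating the ink back out of cloudy water")
    print("     Physical process: reverse diffusion")
    print("     Thermodynamics: entropy decreases (requires energy/learning)")
    print()
    print("2. Score function:")
    print("   Gradient direction = direction of increasing probability density")
    print("   Knowing the gradient lets us move 'upstream' and recover the data")
    print()
    print("3. Signal-to-noise ratio (SNR):")
    print("   Forward: SNR steadily falls (signal drowns in noise)")
    print("   Reverse: SNR steadily rises (noise recedes, signal emerges)")
    print("="*70)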

explain_physics_analogy()

---

## 5. A Simplified U-Net Noise Predictor ⭐⭐

In [None]:
class SinusoidalPositionEmbedding(nn.Module):
    """Sinusoidal position embedding (used to embed the timestep).

    Like the Transformer positional encoding, it maps a discrete timestep to a continuous vector.
    """

    def __init__(self, dim: int) -> None:
        super().__init__()
        self.dim = dim

    def forward(self, t: Tensor) -> Tensor:
        device = t.device
        half_dim = self.dim // 2
        emb = np.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)
        emb = t[:, None] * emb[None, :]
        emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=-1)
        return emb


class ResBlock(nn.Module):
    """Residual block with a time embedding."""

    def __init__(self, in_ch: int, out_ch: int, time_dim: int) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.time_mlp = nn.Linear(time_dim, out_ch)
        self.norm1 = nn.GroupNorm(8, out_ch)
        self.norm2 = nn.GroupNorm(8, out_ch)
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x: Tensor, t_emb: Tensor) -> Tensor:
        h = self.norm1(torch.relu(self.conv1(x)))
        h = h + self.time_mlp(t_emb)[:, :, None, None]
        h = self.norm2(torch.relu(self.conv2(h)))
        return h + self.skip(x)


class SimpleUNet(nn.Module):
    """Simplified U-Net for noise prediction."""

    def __init__(self, in_ch: int = 1, base_ch: int = 64, time_dim: int = 256) -> None:
        super().__init__()
        self.time_mlp = nn.Sequential(
            SinusoidalPositionEmbedding(time_dim),
            nn.Linear(time_dim, time_dim),
            nn.GELU(),
        )

        # Encoder
        self.enc1 = ResBlock(in_ch, base_ch, time_dim)
        self.enc2 = ResBlock(base_ch, base_ch * 2, time_dim)
        self.pool = nn.MaxPool2d(2)

        # Bottleneck
        self.bot = ResBlock(base_ch * 2, base_ch * 2, time_dim)

        # Decoder
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
        self.dec2 = ResBlock(base_ch * 4, base_ch, time_dim)
        self.dec1 = ResBlock(base_ch * 2, base_ch, time_dim)

        self.out = nn.Conv2d(base_ch, in_ch, 1)

    def forward(self, x: Tensor, t: Tensor) -> Tensor:
        t_emb = self.time_mlp(t.float())

        # Encoder
        e1 = self.enc1(x, t_emb)
        e2 = self.enc2(self.pool(e1), t_emb)

        # Bottleneck
        b = self.bot(self.pool(e2), t_emb)

        # Decoder with skip connections
        d2 = self.dec2(torch.cat([self.up(b), e2], dim=1), t_emb)
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1), t_emb)

        return self.out(d1)


# Smoke-test the U-Net
unet = SimpleUNet(in_ch=1, base_ch=32)
x_test = torch.randn(2, 1, 32, 32)
t_test = torch.randint(0, 1000, (2,))
out = unet(x_test, t_test)
print(f"U-Net parameter count: {sum(p.numel() for p in unet.parameters()):,}")
print(f"Input: {x_test.shape} -> Output: {out.shape}")

---

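## 6. DDIM: Accelerated Sampling ⭐⭐⭐

### 6.1 Core Idea

**The problem with DDPM**: sampling takes 1000 sequential network evaluations, which is slow.

**The DDIM solution**: use a non-Markovian process, which allows sampling on a subsequence of timesteps. Here $t-1$ denotes the previous timestep in that subsequence:

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \cdot \hat{x}_0 + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2} \cdot \epsilon_\theta + \sigma_t \cdot \epsilon$$

where $\hat{x}_0 = (x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta)/\sqrt{\bar{\alpha}_t}$ is the predicted clean sample. When $\sigma_t = 0$, sampling is deterministic.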

In [None]:
class DDIMSampler:
    """DDIM sampler (accelerated sampling).

    Key advantages:
        - Can replace 1000 steps with 50-100 steps
        - Deterministic sampling (eta=0) or stochastic sampling (eta>0)
    """

    def __init__(self, scheduler: DDPMScheduler, ddim_steps: int = 50, eta: float = 0.0) -> None:
        self.scheduler = scheduler
        self.ddim_steps = ddim_steps
        self.eta = eta  # 0 = deterministic, 1 = DDPM

        # Pick an evenly spaced subset of timesteps, in descending order
        self.timesteps = torch.linspace(
            0, scheduler.config.timesteps - 1, ddim_steps, dtype=torch.long
        ).flip(0)

    @torch.no_grad()
    def sample(
        self,
        model: nn.Module,
        shape: Tuple[int, ...],
        device: str = "cpu",
    ) -> Tensor:
        """DDIM sampling."""
        # Start from pure noise
        x = torch.randn(shape, device=device)

        for i, t in enumerate(self.timesteps):
            t_batch = torch.full((shape[0],), int(t), device=device, dtype=torch.long)

            # Predict the noise
            eps_pred = model(x, t_batch)

            # Look up coefficients
            alpha_bar_t = self.scheduler.alpha_bars[t]
            sqrt_alpha_bar_t = torch.sqrt(alpha_bar_t)
            sqrt_one_minus_alpha_bar_t = torch.sqrt(1 - alpha_bar_t)

            # Predict x_0
            x0_pred = (x - sqrt_one_minus_alpha_bar_t * eps_pred) / sqrt_alpha_bar_t
            x0_pred = torch.clamp(x0_pred, -1, 1)  # for stability (assumes data scaled to [-1, 1])

            if i < len(self.timesteps) - 1:
                t_prev = self.timesteps[i + 1]
                alpha_bar_t_prev = self.scheduler.alpha_bars[t_prev]

                # DDIM update rule
                sigma_t = self.eta * torch.sqrt(
                    (1 - alpha_bar_t_prev)
                    / (1 - alpha_bar_t)
                    * (1 - alpha_bar_t / alpha_bar_t_prev)
                )

                dir_xt = torch.sqrt(1 - alpha_bar_t_prev - sigma_t**2) * eps_pred
                noise = torch.randn_like(x) if self.eta > 0 else 0

                x = torch.sqrt(alpha_bar_t_prev) * x0_pred + dir_xt + sigma_t * noise
            else:
                x = x0_pred

        return x


# Compare DDPM vs. DDIM step counts
print("Sampling step comparison:")
print("  DDPM: 1000 steps")
print("  DDIM: 50 steps (20x fewer network evaluations)")
print("  DDIM: 10 steps (100x fewer, at reduced quality)")
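
To see the update rule in isolation, the sketch below runs deterministic DDIM ($\sigma_t = 0$) on scalars in pure Python. In place of a trained network it uses a hypothetical oracle predictor for a toy dataset containing the single value `X0_TRUE`, for which the exact noise is known in closed form. The schedule and all names are assumptions for illustration.

```python
from __future__ import annotations

import math
import random

T = 1000
X0_TRUE = 0.5  # toy dataset: every sample equals this value (hypothetical)

alpha_bars: list[float] = []
running = 1.0
for i in range(T):
    running *= 1.0 - (1e-4 + (0.02 - 1e-4) * i / (T - 1))  # assumed linear betas
    alpha_bars.append(running)


def oracle_eps(x_t: float, t: int) -> float:
    """Exact noise for the single-point dataset: eps = (x_t - sqrt(ab) * x0) / sqrt(1 - ab)."""
    ab = alpha_bars[t]
    return (x_t - math.sqrt(ab) * X0_TRUE) / math.sqrt(1.0 - ab)


def ddim_sample(n_steps: int, seed: int = 0) -> float:
    """Deterministic DDIM (sigma_t = 0) over an evenly spaced subset of timesteps."""
    timesteps = [round(i * (T - 1) / (n_steps - 1)) for i in range(n_steps)][::-1]
    x = random.Random(seed).gauss(0.0, 1.0)  # start from pure noise
    for i, t in enumerate(timesteps):
        eps = oracle_eps(x, t)
        ab_t = alpha_bars[t]
        x0_pred = (x - math.sqrt(1.0 - ab_t) * eps) / math.sqrt(ab_t)
        if i < len(timesteps) - 1:
            ab_prev = alpha_bars[timesteps[i + 1]]
            # Jump straight to the next selected timestep
            x = math.sqrt(ab_prev) * x0_pred + math.sqrt(1.0 - ab_prev) * eps
        else:
            x = x0_pred
    return x


for n in (10, 50):
    print(f"{n:3d} steps -> x_0 = {ddim_sample(n):.6f}")
```

With an exact predictor the step count makes no difference. With a learned network, prediction errors compound differently at each step size, which is where the speed versus quality trade-off comes from.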

---

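## 7. Classifier-Free Guidance (CFG) ⭐⭐⭐

### 7.1 Core Idea

**Problem**: how can we do conditional generation without training a separate classifier?

**The CFG solution**: train one network as both a conditional and an unconditional model (by randomly dropping the condition during training), then blend the two predictions at inference.

$$\tilde{\epsilon}_\theta = \epsilon_\theta(x_t, \emptyset) + w \cdot (\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \emptyset))$$

where $w$ is the guidance scale, $c$ is the condition, and $\emptyset$ denotes the null (unconditional) input.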

In [None]:
class CFGWrapper:
    """Classifier-Free Guidance wrapper.

    Training: randomly drop the condition (replace it with a null token)
    Inference: blend the conditional and unconditional predictions
    """

    def __init__(self, model: nn.Module, guidance_scale: float = 7.5) -> None:
        self.model = model
        self.guidance_scale = guidance_scale

    def __call__(
        self,
        x: Tensor,
        t: Tensor,
        cond: Tensor,
        uncond: Tensor,
    ) -> Tensor:
        """CFG inference.

        Args:
            x: Noisy image
            t: Timesteps
            cond: Conditional embedding (e.g. text)
            uncond: Unconditional embedding (null token)

        Returns:
            The guided noise prediction
        """
        # Batch both passes together: [cond, uncond]
        x_in = torch.cat([x, x], dim=0)
        t_in = torch.cat([t, t], dim=0)
        c_in = torch.cat([cond, uncond], dim=0)

        # Model prediction (the wrapped model must accept (x, t, cond))
        eps_cond, eps_uncond = self.model(x_in, t_in, c_in).chunk(2)

        # CFG formula
        eps_guided = eps_uncond + self.guidance_scale * (eps_cond - eps_uncond)

        return eps_guided


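def visualize_cfg_effect() -> None:
    """Summarize the effect of the CFG scale."""
    scales = [1.0, 3.0, 7.5, 15.0]

    print("Effect of the CFG scale:")
    print("=" * 50)
    for s in scales:
        if s == 1.0:
            desc = "no extra guidance (pure conditional prediction), high diversity, may drift from the condition"
        elif s <= 3.0:
            desc = "light guidance, balances diversity and condition fidelity"
        elif s <= 10.0:
            desc = "standard guidance, good condition fidelity"
        else:
            desc = "strong guidance, may oversaturate or distort"
        print(f"  w={s:5.1f}: {desc}")
    print("=" * 50)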


visualize_cfg_effect()
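
The guidance formula is a linear extrapolation, which the scalar sketch below makes explicit. The two prediction values are made-up numbers standing in for network outputs, and `cfg_combine` is a hypothetical helper name.

```python
from __future__ import annotations


def cfg_combine(eps_uncond: float, eps_cond: float, w: float) -> float:
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    return eps_uncond + w * (eps_cond - eps_uncond)


eps_uncond, eps_cond = 0.25, 0.75  # made-up predictions for illustration

for w in (0.0, 1.0, 3.0, 7.5):
    print(f"w={w:4.1f} -> guided eps = {cfg_combine(eps_uncond, eps_cond, w):.3f}")
```

Setting $w = 0$ reproduces the unconditional prediction and $w = 1$ the conditional one, while $w > 1$ pushes past the conditional prediction, further away from the unconditional one.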

---

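## 8. Comparison of Diffusion Model Variants

| Model | Typical sampling steps | Characteristics |
|:-----|:---------|:-----|
| **DDPM** | 1000 | Original method; high quality but slow |
| **DDIM** | 50-100 | Deterministic sampling; can skip steps |
| **LDM** | 50 | Diffusion in latent space; more efficient |
| **Consistency** | 1-2 | Distillation-style method; extremely fast |

**Further study**: Stable Diffusion, SDXL, DiT, Flow Matching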

---

## 9. Summary

| Concept | Formula / key idea | Meaning |
|:-----|:-----|:---------|
| **Forward process** | $x_t = \sqrt{\bar{\alpha}_t}x_0 + \sqrt{1-\bar{\alpha}_t}\epsilon$ | Diffusion, entropy increase |
| **SNR** | $\text{SNR} = \bar{\alpha}_t / (1-\bar{\alpha}_t)$ | Signal strength relative to noise |
| **DDIM** | Non-Markovian, skips timesteps | 20-100x fewer sampling steps |
| **CFG** | $\tilde{\epsilon} = \epsilon_{\emptyset} + w(\epsilon_c - \epsilon_{\emptyset})$ | Conditional guidance |
| **U-Net** | Encoder-decoder + time embedding | Noise prediction |