# 📄 Paper Summary: Denoising Diffusion Probabilistic Models (DDPM)

**Title**: Denoising Diffusion Probabilistic Models  
**Authors**: Jonathan Ho, Ajay Jain, Pieter Abbeel  
**Published in**: NeurIPS 2020  
**Link**: [https://arxiv.org/abs/2006.11239](https://arxiv.org/abs/2006.11239)  

---

## ✅ Day 1 – Abstract & Introduction  

### 📌 Background & Motivation  
Deep generative models such as GANs, VAEs, autoregressive models, and flow-based models have produced high-quality results across multiple modalities.  
However, each has critical limitations:  
- **VAE**: often generates blurry outputs due to variational approximations  
- **GAN**: unstable training, prone to mode collapse  
- **Flow-based models**: strong inductive biases, complex architectures  

These issues motivate the exploration of new approaches for stable and high-quality image generation.  

---

### 📌 Core Idea  
- A **diffusion model** reframes image generation as a **denoising problem**.  
- **Forward process**: gradually corrupt data by adding small amounts of Gaussian noise until all signal is destroyed, leaving pure noise.  
- **Reverse process**: train a parameterized Markov chain to progressively remove noise, reconstructing data from random noise.  
- When each forward step adds only a small amount of Gaussian noise, the reverse transitions can also be modeled as conditional Gaussians, which allows a simple neural network parameterization.  

---

### 📌 Main Contributions  
1. **High-quality synthesis**: Achieves image quality comparable to or exceeding GANs.  
2. **Stable training**: Avoids adversarial instabilities and mode collapse.  
3. **Simple objective**: Learning reduces to predicting added noise with a mean squared error (MSE) loss.  
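4. **Novel theoretical connection**: Links diffusion models to denoising score matching with Langevin dynamics. Under the noise-prediction parameterization, training resembles denoising score matching over multiple noise levels, and sampling resembles annealed Langevin dynamics.  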

---

### 📌 Early Results  
- **CIFAR-10** (unconditional): Inception Score of **9.46** and FID of **3.17**; the FID was state-of-the-art at the time.  
- **LSUN 256×256**: Sample quality on par with **ProgressiveGAN**.  

---

### 📌 TL;DR Summary  
Diffusion probabilistic models generate data by **adding Gaussian noise step by step** and then **learning to reverse the corruption process**.  
They are stable to train, simple to define, and capable of producing **state-of-the-art image samples**, marking them as a powerful alternative to GANs and VAEs.  

## ✅ Day 2 – Forward Process (Noising) 

### 📌 Step 1: Markov Chain Definition  
The forward diffusion process gradually corrupts data through a Markov chain.  

**Equation (2) in the paper, joint form:**  
\[
q(x_{1:T}|x_0) = \prod_{t=1}^T q(x_t|x_{t-1})
\]  

---

### 📌 Step 2: One-Step Transition  
Each transition is Gaussian with variance schedule $\beta_t$.  

**Equation (2):**  
\[
q(x_t | x_{t-1}) = \mathcal{N}\big(x_t;\,\sqrt{1-\beta_t}\, x_{t-1}, \, \beta_t I\big)
\]  

- $\beta_t$: variance schedule (noise amount).  
- $\alpha_t = 1-\beta_t$  
- $\bar{\alpha}_t = \prod_{s=1}^t \alpha_s$  
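
The three quantities above can be computed in a few lines. The sketch below assumes a linear schedule from $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$ with $T = 1000$, which is the setting reported in the paper's experiments; the function names are illustrative, and plain Python lists are used only to keep the example dependency-free.

```python
def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return beta_1..beta_T spaced linearly (index 0 holds beta_1)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta  # alpha_t = 1 - beta_t
        alpha_bars.append(running)
    return alpha_bars


betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
print(f"alpha_bar_1 = {alpha_bars[0]:.6f}")   # close to 1: almost no noise yet
print(f"alpha_bar_T = {alpha_bars[-1]:.2e}")  # close to 0: signal almost gone
```

Because every $\alpha_t$ is below 1, $\bar{\alpha}_t$ shrinks monotonically with $t$, which is what drives the signal toward zero.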

---

### 📌 Step 3: Marginal Distribution of $x_t$  
$x_t$ can be sampled directly from $x_0$ and Gaussian noise $\epsilon \sim \mathcal{N}(0,I)$.  

**Equation (4):**  
\[
q(x_t|x_0) = \mathcal{N}\big(x_t; \sqrt{\bar{\alpha}_t}\, x_0,\,(1-\bar{\alpha}_t)I\big)
\]  

Thus,  
\[
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon
\]  
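
As a minimal sketch, the closed-form sample above can be written as follows. The name `q_sample` and the toy three-value input are illustrative assumptions, and lists of floats stand in for image tensors.

```python
import math
import random


def q_sample(x0, alpha_bar_t, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; also return the noise that was used."""
    signal = math.sqrt(alpha_bar_t)             # scale applied to the clean data
    noise_scale = math.sqrt(1.0 - alpha_bar_t)  # scale applied to the noise
    eps = [rng.gauss(0.0, 1.0) for _ in x0]     # eps ~ N(0, I)
    x_t = [signal * x + noise_scale * e for x, e in zip(x0, eps)]
    return x_t, eps


rng = random.Random(0)
x0 = [0.5, -1.0, 0.25]  # toy "image" with three pixels (made-up data)
x_t, eps = q_sample(x0, alpha_bar_t=0.5, rng=rng)
print(x_t)
```

Returning `eps` alongside `x_t` is a convenience: the same noise is the regression target during training.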

---

### 📌 Key Notes  
- The forward process is **parameter-free** (no learning required).  
- Thanks to Equation (4), $x_t$ can be **sampled directly from $x_0$** with noise.  
- At $t=T$, $\bar{\alpha}_T \approx 0$ for a suitable schedule, so $x_T$ is approximately distributed as $\mathcal{N}(0,I)$, i.e., nearly pure Gaussian noise.  
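
A short derivation of why the closed form in Equation (4) holds, sketched for two consecutive steps with independent $\epsilon_{t-1}, \epsilon_t \sim \mathcal{N}(0,I)$:

\[
\begin{aligned}
x_t &= \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t \\
    &= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{\alpha_t (1-\alpha_{t-1})}\,\epsilon_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t \\
    &= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_t \alpha_{t-1}}\,\bar{\epsilon}, \quad \bar{\epsilon} \sim \mathcal{N}(0,I)
\end{aligned}
\]

The last line uses the fact that a sum of independent Gaussians is Gaussian with summed variances: $\alpha_t(1-\alpha_{t-1}) + (1-\alpha_t) = 1 - \alpha_t\alpha_{t-1}$. Repeating the substitution down to $x_0$ replaces the product $\alpha_t \alpha_{t-1}$ with $\bar{\alpha}_t$.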

---
### 📌 TL;DR Summary  

The **forward process**, introduced in the paper's background section (Section 2), defines how data is gradually destroyed into Gaussian noise through a simple, tractable Markov chain whose marginals are given in closed form by Equation (4).

## ✅ Day 3 – Reverse Process (Denoising) 

### 📌 Step 1: Goal of Reverse Diffusion
- Forward process: gradually corrupts data into pure Gaussian noise.  
- Reverse process: reconstructs data by modeling the conditional probability  
  \[
  q(x_{t-1} | x_t)
  \]  
- Problem: this distribution is **intractable** because computing it requires marginalizing over the unknown data distribution, so we need an approximation.  

---

### 📌 Step 2: Gaussian Approximation
- Assume the reverse transition is also Gaussian:  
  \[
  p_\theta(x_{t-1} | x_t) = \mathcal{N}\big(x_{t-1}; \mu_\theta(x_t,t), \Sigma_\theta(x_t,t)\big)
  \]  
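- The mean $\mu_\theta$ is parameterized by a neural network. In DDPM the variance is not learned: it is fixed to a time-dependent constant, $\Sigma_\theta(x_t,t) = \sigma_t^2 I$.  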

---

### 📌 Step 3: Noise Prediction Network
- From the forward equation:  
  \[
  x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \quad \epsilon \sim \mathcal{N}(0,I)
  \]  
- Instead of predicting $\mu_\theta$ directly, the network $\epsilon_\theta(x_t,t)$ is trained to predict the noise $\epsilon$.  
- Input: $(x_t, t)$  
- Output: predicted noise $\epsilon_\theta(x_t, t)$  
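
In the paper's parameterization, the predicted noise determines the reverse mean through $\mu_\theta(x_t,t) = \frac{1}{\sqrt{\alpha_t}}\big(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t,t)\big)$. The sketch below turns that into a single reverse step. It assumes the fixed variance choice $\sigma_t^2 = \beta_t$ (one of the two options discussed in the paper), and the function names are illustrative rather than taken from any library.

```python
import math
import random


def reverse_mean(x_t, eps_pred, beta_t, alpha_bar_t):
    """Rebuild mu_theta(x_t, t) from a noise prediction."""
    alpha_t = 1.0 - beta_t
    coef = beta_t / math.sqrt(1.0 - alpha_bar_t)
    return [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(x_t, eps_pred)]


def reverse_step(x_t, eps_pred, beta_t, alpha_bar_t, rng, add_noise=True):
    """Sample x_{t-1} ~ N(mu_theta, beta_t I); pass add_noise=False on the final step."""
    mean = reverse_mean(x_t, eps_pred, beta_t, alpha_bar_t)
    if not add_noise:
        return mean
    sigma = math.sqrt(beta_t)  # assumption: sigma_t^2 = beta_t
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


# Sanity check at t = 1, where alpha_bar_1 = 1 - beta_1: with the exact noise,
# the reverse mean recovers x_0.
beta_1 = 0.02
x0 = [0.3, -0.7]
eps = [1.5, -0.2]
x1 = [math.sqrt(1.0 - beta_1) * a + math.sqrt(beta_1) * e for a, e in zip(x0, eps)]
print(reverse_mean(x1, eps, beta_1, 1.0 - beta_1))
```

Sampling would repeat `reverse_step` from $t = T$ down to $t = 1$, starting from $x_T \sim \mathcal{N}(0,I)$ and using a trained network's output in place of the exact `eps`.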

---

### 📌 Step 4: Training Objective
- Final simplified loss function:  
  \[
  L_{\text{simple}} = \mathbb{E}_{t,x_0,\epsilon}\big[\, \|\epsilon - \epsilon_\theta(x_t,t)\|^2 \,\big]
  \]  
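- Here $t$ is sampled uniformly from $\{1,\dots,T\}$, and $x_t$ is built from $x_0$ and $\epsilon$ using the forward equation.  
- Intuition: train the model to **identify the noise that was added**, so that it can be removed at each reverse step.  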
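
One Monte Carlo term of this objective can be sketched as below. The `zero_model` function is a hypothetical stand-in for a trained network, the three-step schedule uses made-up values, and the loss is the squared norm from the formula (a sum over elements, not a mean).

```python
import math
import random


def simple_loss(x0, t, alpha_bars, eps_model, rng):
    """One sample of L_simple for a single example at step t (1-indexed)."""
    alpha_bar_t = alpha_bars[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # true noise, eps ~ N(0, I)
    x_t = [math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * e
           for x, e in zip(x0, eps)]         # noisy input via the forward equation
    eps_pred = eps_model(x_t, t)             # network's guess of the noise
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred))


def zero_model(x_t, t):
    """Hypothetical placeholder network: always predicts zero noise."""
    return [0.0 for _ in x_t]


rng = random.Random(0)
alpha_bars = [0.9, 0.7, 0.4]         # toy 3-step schedule (made-up values)
x0 = [0.5, -1.0, 0.25, 0.0]
t = rng.randint(1, len(alpha_bars))  # t ~ Uniform({1, ..., T})
print(simple_loss(x0, t, alpha_bars, zero_model, rng))
```

In an actual training loop, this value would be averaged over a minibatch and minimized by gradient descent on the network parameters.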

---

### 📌 Key Notes
- Reverse process is the **learned part** of diffusion models.  
- Training is stable since it reduces to a simple MSE objective.  
- This denoising formulation is the core mechanism behind DDPM.  

---
### 📌 TL;DR Summary
Reverse process = denoising.  
$q(x_{t-1}|x_t)$ is intractable, so approximate with a neural network predicting noise.  
Learning reduces to minimizing **MSE between true noise and predicted noise**.

--- 

## ✅ Day 4 – Experiments & Results (2025/09/19)

### 📌 Experimental Setup
- **Datasets**: CIFAR-10, LSUN (bedroom, church, cat), CelebA HQ  
- **Metrics**:  
  - **FID (Fréchet Inception Distance)**  
  - **IS (Inception Score)**  
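
For reference, the standard definitions of the two metrics are given below; they are not spelled out in this summary. Here $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the mean and covariance of Inception features for real and generated images, and $p(y|x)$ is the Inception classifier's label distribution for image $x$.

\[
\mathrm{FID} = \|\mu_r - \mu_g\|^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)
\]

\[
\mathrm{IS} = \exp\Big(\mathbb{E}_{x \sim p_g}\, D_{\mathrm{KL}}\big(p(y|x)\,\|\,p(y)\big)\Big)
\]

Lower FID is better; higher IS is better.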

---

### 📌 Key Results
- **CIFAR-10**:  
  - IS = **9.46**  
  - FID = **3.17**  
  - → The FID was state-of-the-art for unconditional generation at the time; the Inception Score was competitive with GAN-based methods.  

- **LSUN 256×256**:  
  - Image quality comparable to **ProgressiveGAN**.  

- **CelebA HQ**:  
  - High-resolution sample generation with competitive quality.  

---

### 📌 Observations
- Generated samples are **sharp and diverse**, without the **mode collapse** often observed in GANs.  
- Training remains **stable and straightforward** due to the MSE-based objective.  
- Demonstrates that diffusion models can achieve **GAN-level or superior performance** in image synthesis.  

---

### 📌 TL;DR Summary
DDPM achieves a **state-of-the-art FID** (3.17) and a competitive Inception Score (9.46) on unconditional CIFAR-10, with competitive results on LSUN and CelebA HQ.  
It provides **GAN-quality samples** while avoiding adversarial training instabilities and mode collapse.

