# 📄 Paper Summary: Denoising Diffusion Probabilistic Models (DDPM)

**Title**: Denoising Diffusion Probabilistic Models  
**Authors**: Jonathan Ho, Ajay Jain, Pieter Abbeel  
**Published in**: NeurIPS 2020  
**Link**: [https://arxiv.org/abs/2006.11239](https://arxiv.org/abs/2006.11239)  

---

## ✅ Day 1 – Abstract & Introduction  

### 📌 Background & Motivation  
Deep generative models such as GANs, VAEs, autoregressive models, and flow-based models have produced high-quality results across multiple modalities.  
However, each has critical limitations:  
- **VAE**: often generates blurry outputs due to variational approximations  
- **GAN**: unstable training, prone to mode collapse  
- **Flow-based models**: invertibility requirements impose strong inductive biases and lead to complex architectures  

These issues motivate the exploration of new approaches for stable and high-quality image generation.  

---

### 📌 Core Idea  
- A **diffusion model** reframes image generation as a **denoising problem**.  
- **Forward process**: gradually corrupt data by adding small amounts of Gaussian noise until all signal is destroyed.  
- **Reverse process**: train a parameterized Markov chain to progressively remove noise, reconstructing data from random noise.  
- Because every transition is Gaussian, the forward marginals have a closed form and the training objective stays simple and tractable.  

---

### 📌 Main Contributions  
1. **High-quality synthesis**: Sample quality comparable to contemporary GANs, with a better FID on CIFAR-10.  
2. **Stable training**: Avoids adversarial instabilities.  
3. **Simple objective**: Reduces to predicting noise with MSE.  
4. **Theory**: Connects diffusion models to score matching and Langevin dynamics.  

---

### 📌 Early Results  
- **CIFAR-10**: IS = 9.46, FID = 3.17 (the FID was state of the art for unconditional generation at the time).  
- **LSUN 256×256**: On par with ProgressiveGAN.  

---

### 📌 TL;DR  
Diffusion models = add Gaussian noise step by step → train to reverse the corruption.  
Stable, simple, and capable of SOTA results.  

---

## ✅ Day 2 – Forward Process (Noising)  

### 📌 Step 1: Markov Chain Definition  
The forward diffusion process gradually corrupts data through a Markov chain:  
\[
q(x_{1:T}|x_0) = \prod_{t=1}^T q(x_t|x_{t-1})
\]  

---

### 📌 Step 2: One-Step Transition  
Each step adds Gaussian noise with variance schedule $\beta_t$:  
\[
q(x_t | x_{t-1}) = \mathcal{N}(x_t;\sqrt{1-\beta_t}\,x_{t-1}, \beta_t I)
\]  
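
The chain above can be simulated one step at a time. The following is a minimal sketch in plain Python, assuming a toy three-element "image" and a constant $\beta_t$ chosen only for the demo; `forward_step` and `run_forward_chain` are illustrative names, not code from the paper.

```python
import math
import random

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I), element-wise."""
    scale = math.sqrt(1.0 - beta_t)   # shrinks the signal
    sigma = math.sqrt(beta_t)         # std of the fresh Gaussian noise
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in x_prev]

def run_forward_chain(x0, betas, rng):
    """Return the trajectory [x_0, x_1, ..., x_T] of the forward Markov chain."""
    trajectory = [list(x0)]
    for beta_t in betas:
        # Each state depends only on the previous one (Markov property).
        trajectory.append(forward_step(trajectory[-1], beta_t, rng))
    return trajectory

rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]    # toy data point (assumption for illustration)
betas = [0.1] * 50        # constant schedule, demo only
trajectory = run_forward_chain(x0, betas, rng)
print(len(trajectory) - 1, "noising steps applied")
```

Scaling the previous state by $\sqrt{1-\beta_t}$ before adding noise of variance $\beta_t$ keeps the overall variance bounded instead of letting it grow with every step.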

---

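### 📌 Step 3: Direct Sampling from $x_0$  
With $\alpha_t = 1-\beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$, and $\epsilon \sim \mathcal{N}(0, I)$, $x_t$ can be sampled from $x_0$ in closed form for any $t$:  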
\[
q(x_t|x_0) = \mathcal{N}\big(x_t;\sqrt{\bar{\alpha}_t}x_0,(1-\bar{\alpha}_t)I\big)
\]  
\[
x_t = \sqrt{\bar{\alpha}_t}x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon
\]  
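
The closed form means $x_t$ can be drawn in one shot, without simulating the $t-1$ intermediate steps. A minimal sketch, assuming the linear schedule reported in the DDPM paper ($\beta_1 = 10^{-4}$ to $\beta_T = 0.02$, $T = 1000$); the helper names `linear_betas`, `alpha_bars`, and `q_sample` are illustrative.

```python
import math
import random

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced beta_1..beta_T (endpoints as reported in the DDPM paper)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) directly; t is 1-indexed. Returns (x_t, eps)."""
    a = abar[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps

rng = random.Random(0)
abar = alpha_bars(linear_betas(1000))
x_t, eps = q_sample([1.0, -0.5, 0.25], t=500, abar=abar, rng=rng)
```

Returning `eps` alongside `x_t` is a convenience for training, where the sampled noise is the regression target.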

---

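### 📌 Notes  
- Forward process has **no learned parameters**: $\beta_t$ is a fixed schedule (linear from $10^{-4}$ to $0.02$ with $T=1000$ in the paper).  
- For large $T$, $\bar{\alpha}_T \approx 0$, so $x_T$ is approximately $\mathcal{N}(0,I)$ → almost pure noise.  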
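
How close $x_T$ gets to pure noise depends on the schedule. The sketch below, assuming the paper's linear schedule ($10^{-4}$ to $0.02$, $T=1000$), prints the coefficient on $x_0$ and the noise standard deviation at a few timesteps; the chosen checkpoints are arbitrary.

```python
import math

T = 1000
beta_start, beta_end = 1e-4, 0.02   # linear schedule endpoints from the DDPM paper
betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

alpha_bar = 1.0
for t, beta in enumerate(betas, start=1):
    alpha_bar *= 1.0 - beta                    # running product alpha_bar_t
    if t in (1, 250, 500, 750, 1000):
        signal = math.sqrt(alpha_bar)          # coefficient on x_0
        noise_std = math.sqrt(1.0 - alpha_bar) # std of the Gaussian part
        print(f"t={t:4d}  signal={signal:.4f}  noise_std={noise_std:.4f}")
```

With this schedule the final $\bar{\alpha}_T$ is well below $10^{-4}$, so almost none of $x_0$ survives in $x_T$.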

---

### 📌 TL;DR  
Forward = gradually **destroy** data into Gaussian noise, fully tractable.  

---

## ✅ Day 3 – Reverse Process (Denoising)  

### 📌 Step 1: Goal  
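The true reverse conditional  
\[
q(x_{t-1}|x_t)
\]  
would reconstruct data from noise, but it is intractable because it depends on the unknown data distribution → approximate it with a learned model $p_\theta$.  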

---

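### 📌 Step 2: Gaussian Approximation  
Model each reverse step as a Gaussian whose parameters depend on $(x_t, t)$ (DDPM learns only the mean and fixes $\Sigma_\theta = \sigma_t^2 I$ to untrained constants):  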
\[
p_\theta(x_{t-1}|x_t) = \mathcal{N}(x_{t-1}; \mu_\theta(x_t,t), \Sigma_\theta(x_t,t))
\]  
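
One way to see why a Gaussian reverse step is a reasonable choice: once conditioned on $x_0$, the forward posterior is itself Gaussian in closed form. With $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$, the DDPM paper gives:  
\[
q(x_{t-1}|x_t,x_0) = \mathcal{N}\big(x_{t-1};\tilde{\mu}_t(x_t,x_0),\tilde{\beta}_t I\big)
\]  
\[
\tilde{\mu}_t(x_t,x_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,x_t,
\qquad
\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t
\]  
The learned mean $\mu_\theta(x_t,t)$ is trained to match $\tilde{\mu}_t$ without access to $x_0$.  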

---

### 📌 Step 3: Noise Prediction  
From the forward equation, with $\epsilon \sim \mathcal{N}(0,I)$:  
\[
x_t = \sqrt{\bar{\alpha}_t}x_0 + \sqrt{1-\bar{\alpha}_t}\epsilon
\]  
Train a network $\epsilon_\theta(x_t,t)$ to predict $\epsilon$.  
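
A prediction of $\epsilon$ is enough to take a reverse step. The sketch below assumes the noise-prediction form of the mean from the DDPM paper, $\mu_\theta = \frac{1}{\sqrt{\alpha_t}}\big(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\epsilon_\theta\big)$, and the choice $\sigma_t^2 = \beta_t$. Function names are illustrative, and the true $\epsilon$ stands in for a trained network.

```python
import math
import random

def predict_x0(x_t, eps_hat, abar_t):
    """Invert x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps for x_0."""
    return [(x - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
            for x, e in zip(x_t, eps_hat)]

def reverse_step(x_t, t, eps_hat, betas, abar, rng):
    """One ancestral sampling step x_t -> x_{t-1}; t is 1-indexed.

    Assumes the noise-prediction mean and sigma_t^2 = beta_t.
    """
    beta_t, abar_t = betas[t - 1], abar[t - 1]
    alpha_t = 1.0 - beta_t
    coef = beta_t / math.sqrt(1.0 - abar_t)
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(x_t, eps_hat)]
    if t == 1:
        return mean                    # no noise is added on the final step
    sigma_t = math.sqrt(beta_t)
    return [m + sigma_t * rng.gauss(0.0, 1.0) for m in mean]

rng = random.Random(0)
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
abar, running = [], 1.0
for b in betas:
    running *= 1.0 - b
    abar.append(running)

x0 = [1.0, -0.5, 0.25]                 # toy data point (assumption)
t = 300
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = [math.sqrt(abar[t - 1]) * x + math.sqrt(1.0 - abar[t - 1]) * e
       for x, e in zip(x0, eps)]
x0_hat = predict_x0(x_t, eps, abar[t - 1])   # oracle eps recovers x_0
x_prev = reverse_step(x_t, t, eps, betas, abar, rng)
```

With a perfect noise estimate, `predict_x0` returns the original data point up to floating-point error, which is why predicting $\epsilon$ and predicting $x_0$ carry the same information.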

---

### 📌 Step 4: Training Objective  
\[
L_{\text{simple}} = \mathbb{E}_{t,x_0,\epsilon}[\|\epsilon - \epsilon_\theta(x_t,t)\|^2]
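\]  
Here $t$ is drawn uniformly from $\{1,\dots,T\}$, $x_0$ from the data, and $\epsilon \sim \mathcal{N}(0,I)$; $x_t$ is computed from the forward equation.  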
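
A single-sample Monte Carlo estimate of this loss can be sketched as follows. This only evaluates the objective; a real training loop would backpropagate through a neural network, which plain Python does not provide. `eps_model` is a hypothetical callable `(x_t, t) -> eps_hat`, and `zero_model` is a placeholder that always predicts zero noise.

```python
import math
import random

def simple_loss(x0, eps_model, abar, rng):
    """One-sample estimate of L_simple = E ||eps - eps_theta(x_t, t)||^2."""
    T = len(abar)
    t = rng.randint(1, T)                     # t ~ Uniform{1, ..., T}
    eps = [rng.gauss(0.0, 1.0) for _ in x0]   # eps ~ N(0, I)
    a = abar[t - 1]
    # Noised input from the closed-form forward equation.
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    eps_hat = eps_model(x_t, t)
    return sum((e - eh) ** 2 for e, eh in zip(eps, eps_hat))

def zero_model(x_t, t):
    """Placeholder predictor (not a trained network): always outputs zeros."""
    return [0.0 for _ in x_t]

T = 1000
abar, running = [], 1.0
for i in range(T):
    running *= 1.0 - (1e-4 + (0.02 - 1e-4) * i / (T - 1))
    abar.append(running)

rng = random.Random(0)
x0 = [1.0, -0.5, 0.25, 0.0]                   # toy data point (assumption)
n = 5000
avg_loss = sum(simple_loss(x0, zero_model, abar, rng) for _ in range(n)) / n
print(f"average loss of the zero predictor: {avg_loss:.3f}")
```

The zero predictor's expected loss equals the data dimension (here 4), which gives a baseline that any useful $\epsilon_\theta$ should beat.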

---

### 📌 Notes  
- Reverse process is the **learned part**.  
- Training = stable MSE objective.  

---

### 📌 TL;DR  
Reverse = **denoising**. Train NN to predict noise, minimize MSE.  

---

## ✅ Day 4 – Experiments & Results  

### 📌 Setup  
- **Datasets**: CIFAR-10, LSUN (256×256), CelebA-HQ (256×256)  
- **Metrics**: FID, IS  
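
For reference, the standard definitions of the two metrics are given below (the notes above do not restate them). Both are computed from an Inception network: $p(y|x)$ is its class posterior for a generated image $x$, and $(\mu_r,\Sigma_r)$, $(\mu_g,\Sigma_g)$ are the mean and covariance of its features on real and generated images.  
\[
\text{IS} = \exp\Big(\mathbb{E}_{x \sim p_g}\big[D_{\text{KL}}\big(p(y|x)\,\|\,p(y)\big)\big]\Big)
\]  
\[
\text{FID} = \|\mu_r - \mu_g\|^2 + \operatorname{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r\Sigma_g)^{1/2}\big)
\]  
Higher IS is better; lower FID is better.  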

---

### 📌 Results  
- **CIFAR-10**: IS = 9.46, FID = 3.17 → state-of-the-art FID for unconditional generation at the time.  
- **LSUN**: Sample quality comparable to ProgressiveGAN.  
- **CelebA-HQ**: High-quality 256×256 samples.  

---

### 📌 Observations  
- Samples = sharp & diverse, no mode collapse.  
- Training = stable.  
- On these benchmarks, diffusion matches or exceeds contemporary GANs in sample quality.  

---

### 📌 TL;DR  
DDPM = GAN-quality results without adversarial headaches.  

---

## ✅ Day 5 – Discussion & Conclusion  

### 📌 Key Insights  
- Diffusion = strong alternative to GANs and VAEs.  
- Stable training with simple loss.  
- Theoretical bridge to score matching & Langevin dynamics.  

---

### 📌 Limitations  
- **Slow sampling**: each sample needs hundreds to thousands of sequential network evaluations ($T=1000$ in the paper).  
- **High compute cost**.  
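
The sampling cost follows directly from the structure of the reverse chain: every step needs one network evaluation, and the steps cannot run in parallel. A minimal sketch of the ancestral sampling loop, assuming $\sigma_t^2 = \beta_t$ and the paper's linear schedule; `CountingModel` is a hypothetical stand-in for a trained network that only counts how often it is called.

```python
import math
import random

def sample(eps_model, dim, betas, rng):
    """Ancestral sampling from x_T ~ N(0, I) down to x_0."""
    abar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        abar.append(running)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(len(betas), 0, -1):
        beta_t, abar_t = betas[t - 1], abar[t - 1]
        eps_hat = eps_model(x, t)              # one network evaluation per step
        coef = beta_t / math.sqrt(1.0 - abar_t)
        x = [(xi - coef * e) / math.sqrt(1.0 - beta_t) for xi, e in zip(x, eps_hat)]
        if t > 1:                              # the last step adds no noise
            sigma_t = math.sqrt(beta_t)
            x = [xi + sigma_t * rng.gauss(0.0, 1.0) for xi in x]
    return x

class CountingModel:
    """Hypothetical stand-in for a trained network; it only counts calls."""
    def __init__(self):
        self.calls = 0

    def __call__(self, x_t, t):
        self.calls += 1
        return [0.0 for _ in x_t]              # placeholder, not a real denoiser

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
model = CountingModel()
x_final = sample(model, dim=3, betas=betas, rng=random.Random(0))
print("network evaluations for one sample:", model.calls)
```

Because each $x_{t-1}$ depends on $x_t$, the loop is inherently sequential, which is the cost that later fast samplers aim to reduce.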

---

### 📌 Future Directions  
1. Faster samplers (→ DDIM, latent diffusion).  
2. Broader domains: audio, video, text, multimodal.  
3. Hybrid models combining paradigms.  

---

### 📌 Final Takeaway  
Diffusion models = **add noise → learn to remove it**.  
Simple yet powerful, they laid the foundation for modern generative AI (e.g., Stable Diffusion, Imagen, DALL·E 2).  
