# Lab 8, Module 3: Multi-Step Denoising (Reverse Diffusion)

**Estimated time:** 12-15 minutes

---

## **Opening: The Moment of Magic**

In Module 2, you trained a model to predict noise at any timestep.

Now comes the **exciting part**: using that model to **generate images from pure noise**!

In this module, you'll:
- Implement the **reverse diffusion algorithm**
- Start with random noise and progressively denoise it
- Watch digits **emerge from chaos** step by step!
- Generate multiple samples to see variation
- Compare quality at different step counts (50 vs 100 vs 200 steps)

**This is the same core algorithm behind diffusion-based image generators like DALL-E 2, Midjourney, and Stable Diffusion, just at a much larger scale!**

---

## 📦 **Setup: Import Libraries and Reconstruct Model**

**Important:** This module uses the model you trained in Module 2. We have two options:

### **Option A:** You just finished Module 2 in this same session
- Your model is still in memory
- Skip the "Quick Training" cell
- The model variable `model` is already available

### **Option B:** You're starting fresh (new session)
- We'll train a quick version (takes ~2-3 minutes)
- Run the "Quick Training" cell below
- This gives you a working model

Let's start by setting up!


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import time

print(f"✅ TensorFlow version: {tf.__version__}")
print("✅ Libraries imported!")

# Set seeds
np.random.seed(42)
tf.random.set_seed(42)

# Check if model exists from Module 2
try:
    test = model
    print("\n✅ Model found from Module 2!")
    print("   You can skip the 'Quick Training' section below.")
    MODEL_AVAILABLE = True
except NameError:
    print("\n⚠️  No model found from Module 2.")
    print("   Run the 'Quick Training' section below to train a model.")
    MODEL_AVAILABLE = False

---

## 🚀 **Quick Training (Run ONLY if model not available)**

If you're starting fresh, run this section to quickly train a denoiser.

**If you just finished Module 2**, skip this section!


In [None]:
if not MODEL_AVAILABLE:
    print("Training a quick denoiser model...\n")
    print("This will take ~2-3 minutes with GPU.\n")
    print("="*70)
    
    # Load data
    (x_train_full, y_train_full), _ = keras.datasets.mnist.load_data()
    NUM_CLASSES = 4
    mask = y_train_full < NUM_CLASSES
    x_train = x_train_full[mask]
    
    # Downsample to 16x16
    x_train_resized = tf.image.resize(x_train[..., np.newaxis], [16, 16]).numpy()
    x_train_resized = x_train_resized.astype('float32') / 255.0
    
    # Diffusion schedule
    NUM_TIMESTEPS = 200
    betas = np.linspace(0.0001, 0.02, NUM_TIMESTEPS).astype('float32')
    alphas = 1.0 - betas
    alphas_cumprod = np.cumprod(alphas).astype('float32')
    alphas_cumprod_tf = tf.constant(alphas_cumprod, dtype=tf.float32)
    
    # Build model (simplified U-Net)
    def get_timestep_embedding(timesteps, embedding_dim=32):
        half_dim = embedding_dim // 2
        emb = np.log(10000) / (half_dim - 1)
        emb = tf.exp(tf.range(half_dim, dtype=tf.float32) * -emb)
        emb = tf.cast(timesteps, dtype=tf.float32)[:, None] * emb[None, :]
        emb = tf.concat([tf.sin(emb), tf.cos(emb)], axis=-1)
        return emb
    
    # Input layers
    noisy_image_input = layers.Input(shape=(16, 16, 1), name='noisy_image')
    timestep_input = layers.Input(shape=(), dtype=tf.int32, name='timestep')
    
    # Timestep embedding
    t_emb = layers.Lambda(lambda t: get_timestep_embedding(t, 32))(timestep_input)
    t_emb = layers.Dense(64, activation='relu')(t_emb)
    t_emb = layers.Dense(64, activation='relu')(t_emb)
    t_emb_reshaped = layers.Reshape((1, 1, 64))(t_emb)
    t_emb_broadcast = layers.Lambda(lambda x: tf.tile(x, [1, 16, 16, 1]))(t_emb_reshaped)
    
    # Concatenate
    x = layers.Concatenate()([noisy_image_input, t_emb_broadcast])
    
    # Encoder
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(x)
    skip1 = layers.Conv2D(32, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D(2)(skip1)
    
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    skip2 = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D(2)(skip2)
    
    # Bottleneck
    x = layers.Conv2D(128, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(128, 3, padding='same', activation='relu')(x)
    
    # Decoder
    x = layers.UpSampling2D(2)(x)
    x = layers.Concatenate()([x, skip2])
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    
    x = layers.UpSampling2D(2)(x)
    x = layers.Concatenate()([x, skip1])
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(x)
    
    # Output
    noise_output = layers.Conv2D(1, 3, padding='same', name='noise_pred')(x)
    
    model = keras.Model(inputs=[noisy_image_input, timestep_input], outputs=noise_output)
    model.compile(optimizer=keras.optimizers.Adam(1e-3), loss='mse')
    
    print("\n✓ Model built!")
    
    # Training function
    def create_training_batch(images, batch_size, alphas_cumprod_tf, num_timesteps):
        indices = np.random.randint(0, len(images), batch_size)
        batch_images = images[indices]
        timesteps = np.random.randint(0, num_timesteps, batch_size)
        alpha_bar_t = tf.gather(alphas_cumprod_tf, timesteps)
        noise = tf.random.normal(shape=batch_images.shape)
        sqrt_alpha_bar_t = tf.sqrt(alpha_bar_t)[:, None, None, None]
        sqrt_one_minus_alpha_bar_t = tf.sqrt(1.0 - alpha_bar_t)[:, None, None, None]
        noisy_images = sqrt_alpha_bar_t * batch_images + sqrt_one_minus_alpha_bar_t * noise
        return (noisy_images, timesteps), noise
    
    # Train
    EPOCHS = 8
    BATCH_SIZE = 128
    STEPS_PER_EPOCH = len(x_train_resized) // BATCH_SIZE
    
    start_time = time.time()
    for epoch in range(EPOCHS):
        epoch_losses = []
        for step in range(STEPS_PER_EPOCH):
            (noisy_images, timesteps), noise_targets = create_training_batch(
                x_train_resized, BATCH_SIZE, alphas_cumprod_tf, NUM_TIMESTEPS
            )
            loss = model.train_on_batch([noisy_images, timesteps], noise_targets)
            epoch_losses.append(loss)
        print(f"Epoch {epoch+1}/{EPOCHS} - Loss: {np.mean(epoch_losses):.6f}")
    
    print(f"\n✅ Training complete in {time.time()-start_time:.1f}s!")
    MODEL_AVAILABLE = True
else:
    print("✓ Skipping training - model already available from Module 2!")

---

## 🔧 **Setup Diffusion Schedule**

We need the same schedule from Module 1 for reverse diffusion.


In [None]:
# Recreate diffusion schedule (same as Modules 1 & 2)
NUM_TIMESTEPS = 200
BETA_START = 0.0001
BETA_END = 0.02

betas = np.linspace(BETA_START, BETA_END, NUM_TIMESTEPS).astype('float32')
alphas = 1.0 - betas
alphas_cumprod = np.cumprod(alphas).astype('float32')

print("✅ Diffusion schedule ready!")
print(f"   Timesteps: {NUM_TIMESTEPS}")
print(f"   Beta range: [{BETA_START}, {BETA_END}]")

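To see what the reverse loop has to undo, it helps to check how much of the original image survives at each timestep. In the forward process used for training, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, so $\sqrt{\bar{\alpha}_t}$ is the weight on the clean image and $\sqrt{1-\bar{\alpha}_t}$ the weight on the noise. The sketch below recomputes the same linear schedule so it runs on its own; `signal_noise_table` is a hypothetical helper name used only for this illustration.

```python
import numpy as np

def signal_noise_table(num_timesteps=200, beta_start=0.0001, beta_end=0.02,
                       checkpoints=(0, 50, 100, 150, 199)):
    """Return (t, signal_coef, noise_coef) rows for a linear beta schedule.

    Hypothetical helper for illustration; it mirrors the schedule cell above.
    """
    betas = np.linspace(beta_start, beta_end, num_timesteps)
    alphas_cumprod = np.cumprod(1.0 - betas)
    rows = []
    for t in checkpoints:
        signal = np.sqrt(alphas_cumprod[t])       # weight on the clean image x_0
        noise = np.sqrt(1.0 - alphas_cumprod[t])  # weight on the Gaussian noise
        rows.append((t, float(signal), float(noise)))
    return rows

for t, signal, noise in signal_noise_table():
    print(f"t = {t:3d}: x_t = {signal:.3f} * x_0 + {noise:.3f} * noise")
```

With this short 200-step schedule, $\bar{\alpha}$ at t = 199 is only about 0.13, so the signal weight there is still roughly 0.36: the forward process never quite reaches pure noise, even though sampling starts from pure $\mathcal{N}(0, I)$ noise.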
---

## 🎬 **Implement Reverse Diffusion Algorithm**

### **The Reverse Process:**

Starting from pure noise at t=200, we gradually denoise:

```
1. Start: x_200 = pure random noise
2. For t = 199, 198, 197, ..., 1, 0:
   a. Predict noise: ε̂ = model(x_t, t)
   b. Remove predicted noise (with scaling)
   c. Add tiny random noise (except at t=0)
   d. Result: x_{t-1} (slightly less noisy)
3. End: x_0 = generated digit!
```

### **Mathematical Formula (Simplified DDPM):**

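**Formula:**
$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \hat{\epsilon} \right) + \sigma_t \cdot z$$

**Symbols:** $\alpha_t = 1 - \beta_t$; $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$ (the cumulative product); $\hat{\epsilon}$ is the model's noise prediction; $z \sim \mathcal{N}(0, I)$ is fresh random noise (set to $0$ at the final step); and $\sigma_t = \sqrt{\beta_t}$ in this lab's code.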

**Plain English:**
> "Next step = (current - scaled predicted noise) / signal scale + tiny random noise"

**Pseudocode:**
```python
next_step = (current - noise_removal_amount) / signal_scale + small_random_noise
```
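The formula can be checked on a single step with plain NumPy. The sketch below uses a hypothetical helper, `ddpm_step` (not part of the lab's code), that applies the update with $\sigma_t = \sqrt{\beta_t}$. A useful sanity check: at the final step ($t = 0$, where $\bar{\alpha}_0 = \alpha_0$ and no noise is added), a perfect noise prediction recovers $x_0$ exactly.

```python
import numpy as np

def ddpm_step(x_t, eps_hat, t, betas, rng=None):
    """One reverse DDPM step x_t -> x_{t-1}, with sigma_t = sqrt(beta_t).

    Hypothetical helper for illustration; eps_hat is the model's noise prediction.
    """
    alphas = 1.0 - betas
    alphas_cumprod = np.cumprod(alphas)
    alpha_t, alpha_bar_t = alphas[t], alphas_cumprod[t]

    # Remove the scaled predicted noise, then rescale by 1/sqrt(alpha_t)
    mean = (x_t - (1.0 - alpha_t) / np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_t)

    if t == 0:
        return mean  # z = 0 at the final step
    rng = rng if rng is not None else np.random.default_rng()
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

# Sanity check at t = 0 with a "perfect" noise prediction
betas = np.linspace(0.0001, 0.02, 200)
rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, size=(16, 16))
eps = rng.standard_normal(x0.shape)
alpha_bar_0 = 1.0 - betas[0]
x_noisy = np.sqrt(alpha_bar_0) * x0 + np.sqrt(1.0 - alpha_bar_0) * eps  # forward noising at t = 0
recovered = ddpm_step(x_noisy, eps, t=0, betas=betas)
print(np.max(np.abs(recovered - x0)))  # only floating-point error remains
```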

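**Why add tiny noise back?**
- In DDPM, each reverse step *samples* $x_{t-1}$ from a Gaussian centered on the denoised estimate instead of taking only its mean; $\sigma_t \cdot z$ is that sampling noise
- Variety comes mainly from the random starting noise, but the injected noise adds further stochasticity, so even the same starting noise can lead to different images
- The amount shrinks toward the end: $\beta_t$ decreases as $t$ approaches 0, and no noise is added at the final step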

Let's implement it!


In [None]:
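def reverse_diffusion(model, num_steps=NUM_TIMESTEPS, image_size=16, seed=None):
    """
    Generate an image from pure noise using reverse diffusion (DDPM sampling).
    
    With num_steps == NUM_TIMESTEPS this visits every timestep 199, 198, ..., 0.
    With fewer steps it visits an evenly spaced subset, still starting from the
    noisiest timestep, and treats each jump t -> t_prev as one step with an
    effective beta computed from alphas_cumprod (one common "respacing" option).
    
    Args:
        model: Trained denoiser model (inputs: [noisy_image, timestep])
        num_steps: Number of denoising steps (1 to NUM_TIMESTEPS)
        image_size: Size of image to generate
        seed: Random seed for reproducibility
    
    Returns:
        generated_image: Final denoised image, shape (1, image_size, image_size, 1)
        trajectory: List of (t, image) snapshots along the way
    """
    assert 1 <= num_steps <= NUM_TIMESTEPS, "num_steps must be between 1 and NUM_TIMESTEPS"
    if seed is not None:
        np.random.seed(seed)
        tf.random.set_seed(seed)
    
    # Timesteps to visit, from the noisiest (NUM_TIMESTEPS - 1) down to 0
    timesteps = np.linspace(NUM_TIMESTEPS - 1, 0, num_steps).round().astype(int)
    
    # Start with pure noise
    x_t = np.random.randn(1, image_size, image_size, 1).astype('float32')
    
    # Store ~10 snapshots for visualization
    trajectory = []
    save_every = max(1, num_steps // 10)
    
    # Reverse diffusion loop
    for i, t in enumerate(timesteps):
        if i % save_every == 0:
            trajectory.append((int(t), x_t.copy()))
        
        # Predict noise (calling the model directly is much faster than model.predict in a loop)
        t_input = np.array([t], dtype=np.int32)
        predicted_noise = model([x_t, t_input], training=False).numpy()
        
        # Diffusion parameters for this jump (t -> next visited timestep, or -> x_0)
        is_last = (i == len(timesteps) - 1)
        alpha_bar_t = alphas_cumprod[t]
        alpha_bar_prev = 1.0 if is_last else alphas_cumprod[timesteps[i + 1]]
        alpha_step = alpha_bar_t / alpha_bar_prev  # equals alphas[t] when no steps are skipped
        beta_step = 1.0 - alpha_step               # equals betas[t] when no steps are skipped
        
        # x_{t-1} = (1/sqrt(alpha_t)) * (x_t - ((1-alpha_t)/sqrt(1-alpha_bar_t)) * predicted_noise)
        noise_removal = beta_step / np.sqrt(1.0 - alpha_bar_t)
        x_t = (x_t - noise_removal * predicted_noise) / np.sqrt(alpha_step)
        
        # Add small random noise (except at the final step)
        if not is_last:
            x_t = x_t + np.sqrt(beta_step) * np.random.randn(*x_t.shape).astype('float32')
    
    # Add final image
    trajectory.append((0, x_t.copy()))
    
    return x_t, trajectory

print("✅ Reverse diffusion function ready!")
print("\nReady to generate images from noise!")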

---

## 🎨 **Generate Your First Image!**

Let's start from pure noise and watch a digit emerge!


In [None]:
print("üé® Generating image from pure noise...\n")
print("This will take ~20-30 seconds for 200 steps.\n")

# Generate with fixed seed for reproducibility
generated_image, trajectory = reverse_diffusion(model, num_steps=200, seed=42)

print("✅ Generation complete!\n")
print(f"Generated {len(trajectory)} snapshots along the way.")
print(f"\nFinal image range: [{generated_image.min():.2f}, {generated_image.max():.2f}]")

---

## 📸 **Visualize the Generation Trajectory**

Let's see how the image evolved from noise to digit!


In [None]:
# Select 10 evenly spaced snapshots
snapshot_indices = np.linspace(0, len(trajectory)-1, 10, dtype=int)
snapshots = [trajectory[i] for i in snapshot_indices]

# Visualize trajectory
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
axes = axes.flatten()

for idx, (t, img) in enumerate(snapshots):
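    # Clip to [-2, 2] for display (noisy images extend well beyond [0, 1])
    img_display = np.clip(img[0, :, :, 0], -2, 2)
    
    axes[idx].imshow(img_display, cmap='gray', vmin=-2, vmax=2)
    axes[idx].set_title(f't = {t}', fontsize=12, fontweight='bold')
    # Hide ticks but keep the spines visible (axis('off') would also hide the colored border)
    axes[idx].set_xticks([])
    axes[idx].set_yticks([])
    
    # Color border based on progress
    progress = 1.0 - (t / NUM_TIMESTEPS)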
    if progress < 0.3:
        color = 'red'  # Early: mostly noise
    elif progress < 0.7:
        color = 'orange'  # Middle: emerging structure
    else:
        color = 'green'  # Late: recognizable digit
    
    for spine in axes[idx].spines.values():
        spine.set_edgecolor(color)
        spine.set_linewidth(3)

plt.tight_layout()
plt.suptitle('Generation Trajectory: From Pure Noise to Digit', 
             fontsize=15, fontweight='bold', y=1.02)
plt.show()

print("\nüí° Observations:")
print("  - Early steps (red): Pure noise, no structure")
print("  - Middle steps (orange): Vague shapes emerge")
print("  - Late steps (green): Clear digit structure")
print("  - Final step: Recognizable digit!")
print("\n‚úì The model learned to gradually construct structure from randomness!")

---

## 📝 **Questions Q16-Q17 (Observations)**

### **Q16. Predict: At approximately what timestep (t value) does the digit start to become recognizable?**

*Look at the trajectory above. When can you first identify what digit it might be?*

**Record your answer in the Answer Sheet.**

---

### **Q17. Looking at the full trajectory, describe how structure emerges. Does it appear suddenly or gradually?**

*Think about: Do edges appear first? Does overall shape come before fine details?*

**Record your answer in the Answer Sheet.**

---

## 🎲 **Generate Multiple Samples**

Let's generate several digits from different random seeds!


In [None]:
print("üé≤ Generating 12 different samples...\n")
print("Each starts from different random noise.\n")

# Generate 12 samples with different seeds
num_samples = 12
samples = []

for seed in range(num_samples):
    print(f"  Generating sample {seed+1}/{num_samples}...", end='\r')
    generated, _ = reverse_diffusion(model, num_steps=200, seed=seed)
    samples.append(generated[0, :, :, 0])

print("\n✅ Generated 12 samples!\n")

# Display grid
fig, axes = plt.subplots(3, 4, figsize=(12, 9))
axes = axes.flatten()

for idx, sample in enumerate(samples):
    axes[idx].imshow(np.clip(sample, 0, 1), cmap='gray', vmin=0, vmax=1)
    axes[idx].set_title(f'Sample {idx+1}', fontsize=11, fontweight='bold')
    axes[idx].axis('off')

plt.tight_layout()
plt.suptitle('12 Different Digits Generated from Different Random Starting Noise', 
             fontsize=14, fontweight='bold', y=1.01)
plt.show()

print("\nüí° Key observations:")
print("  - Each sample looks different (different random noise ‚Üí different digit)")
print("  - Some are clearer than others (stochasticity in process)")
print("  - Quality varies (toy model limitations)")
print(f"  - All are digits 0-3 (model only trained on {MODEL_AVAILABLE and 'those classes' or 'limited classes'})")
print("\n‚úì This demonstrates the generative capability of diffusion models!")

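The loop above runs the full 200-step sampler 12 separate times. Because every operation in the update is elementwise, the same loop can process a whole batch of starting noises at once, with one model call per step. The sketch below is hypothetical: `reverse_diffusion_batched` and its `predict_noise(x, t)` argument are names introduced here, where `predict_noise` stands in for the trained model (for example `lambda x, t: model([x, t], training=False).numpy()`). It applies the one-timestep-at-a-time DDPM update from the formula earlier in this module.

```python
import numpy as np

def reverse_diffusion_batched(predict_noise, num_samples, image_size=16,
                              num_timesteps=200, beta_start=0.0001, beta_end=0.02,
                              seed=None):
    """Run DDPM sampling for a batch of independent starting noises.

    predict_noise(x, t) must return an array shaped like x, where x has shape
    (batch, H, W, 1) and t is an int32 array of shape (batch,).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, num_timesteps).astype('float32')
    alphas = 1.0 - betas
    alphas_cumprod = np.cumprod(alphas)

    # Each sample in the batch starts from its own pure noise
    x = rng.standard_normal((num_samples, image_size, image_size, 1)).astype('float32')

    for t in reversed(range(num_timesteps)):
        t_batch = np.full((num_samples,), t, dtype=np.int32)
        eps_hat = predict_noise(x, t_batch)
        coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alphas_cumprod[t])
        x = (x - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # no noise at the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape).astype('float32')
    return x

# Usage with the trained Keras model from this lab (commented out so the sketch runs standalone):
# samples = reverse_diffusion_batched(
#     lambda x, t: model([x, t], training=False).numpy(), num_samples=12, seed=0)
```

Because it draws its random numbers differently from the per-sample loop, this batched version will not reproduce the same 12 images, only the same kind of result.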
---

## 📝 **Question Q18 (Analysis)**

### **Q18. Looking at the 12 generated samples, what do you notice about variety and quality? Are all digits equally clear?**

*Consider: Are some blurry? Do you see variety? What digits appear most often?*

**Record your answer in the Answer Sheet.**

---

## ⏩ **Step Count Comparison**

Does using more denoising steps improve quality? Let's find out!


In [None]:
print("⏩ Comparing different step counts...\n")

step_counts = [50, 100, 200]
results = []

for steps in step_counts:
    print(f"  Generating with {steps} steps...", end='\r')
    # Use same starting noise (seed=42) for fair comparison
    generated, _ = reverse_diffusion(model, num_steps=steps, seed=42)
    results.append((steps, generated[0, :, :, 0]))

print("\n✅ Generated all comparisons!\n")

# Visualize
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

for idx, (steps, img) in enumerate(results):
    axes[idx].imshow(np.clip(img, 0, 1), cmap='gray', vmin=0, vmax=1)
    axes[idx].set_title(f'{steps} Steps', fontsize=13, fontweight='bold')
    axes[idx].axis('off')

plt.tight_layout()
plt.suptitle('Quality vs. Number of Denoising Steps (Same Starting Noise)', 
             fontsize=14, fontweight='bold', y=1.02)
plt.show()

print("\nüí° Analysis:")
print("  - 50 steps: Faster, but may be less detailed")
print("  - 100 steps: Good balance of speed and quality")
print("  - 200 steps: Most detailed (used during training)")
print("\n‚úì More steps generally = better quality, but diminishing returns")
print("   (Real models like DALL-E use 50-100 steps for speed)")

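Using fewer steps only makes sense if sampling still starts from the noisiest timestep (t = 199). One common way to do this, often called timestep respacing (DDIM is another option), is to visit an evenly spaced subset of the 200 training timesteps and treat each jump $t \to t'$ as a single step with effective $\alpha = \bar{\alpha}_t / \bar{\alpha}_{t'}$ and $\beta = 1 - \alpha$. The sketch below only computes this reduced schedule; `respaced_schedule` is a hypothetical helper name, and the linear schedule from this lab is assumed.

```python
import numpy as np

def respaced_schedule(num_steps, num_timesteps=200, beta_start=0.0001, beta_end=0.02):
    """Return (timesteps, step_alphas, step_betas) for a reduced number of sampling steps.

    timesteps runs from num_timesteps - 1 down to 0. step_alphas[i] is the effective
    alpha for the jump from timesteps[i] to the next visited timestep (or to x_0).
    """
    betas = np.linspace(beta_start, beta_end, num_timesteps)
    alphas_cumprod = np.cumprod(1.0 - betas)

    timesteps = np.linspace(num_timesteps - 1, 0, num_steps).round().astype(int)
    alpha_bar = alphas_cumprod[timesteps]
    alpha_bar_prev = np.append(alpha_bar[1:], 1.0)  # after the last jump we reach x_0
    step_alphas = alpha_bar / alpha_bar_prev
    step_betas = 1.0 - step_alphas
    return timesteps, step_alphas, step_betas

for n in (50, 100, 200):
    ts, a, b = respaced_schedule(n)
    print(f"{n:3d} steps: first t = {ts[0]}, last t = {ts[-1]}, largest step beta = {b.max():.4f}")
```

With 200 steps this reproduces the original betas (up to floating-point rounding); with 50 steps each jump covers about four training timesteps at once, which is where the speed/quality trade-off comes from.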
---

## 📝 **Questions Q19-Q20 (Critical Thinking)**

### **Q19. Compare the 50-step vs 200-step generations. Is the quality difference large or small? Is 200 steps necessary?**

*Think about: trade-off between quality and speed. When would you use fewer steps?*

**Record your answer in the Answer Sheet.**

---

### **Q20. Why do you think the generated digits aren't perfect? What limitations does this toy model have?**

*Consider: model size, training data, resolution, number of training epochs*

**Record your answer in the Answer Sheet.**

---

## 🌉 **Bridge to Module 4: From Toy to Professional Models**

### **What You've Accomplished:**
- ✅ Trained a toy denoiser (~500k parameters in the Quick Training version)
- ✅ Generated 16×16 digits from noise
- ✅ Understood the reverse diffusion algorithm
- ✅ Seen authentic limitations of small models

### **What's Coming in Module 4:**
- 🚀 Professional diffusion model (millions of parameters)
- 🎨 32×32 RGB images (vs 16×16 grayscale)
- 🏆 Much higher quality generations
- 🌍 Bridge to DALL-E, Stable Diffusion, Midjourney

### **Key Differences:**

| Aspect | Your Toy Model | DALL-E 2 / Stable Diffusion |
|--------|----------------|-----------------------------|
| **Parameters** | ~500,000 | Hundreds of millions to billions |
| **Training data** | ~24,000 digits | Hundreds of millions to billions of images |
| **Resolution** | 16×16 | 512×512 or higher |
| **Training time** | 2-3 minutes | Weeks on large GPU clusters |
| **Quality** | Blurry digits | Photorealistic images |
| **Control** | Random | Text prompts guide generation |

**But the core algorithm is THE SAME!**
- Forward diffusion: Add noise progressively
- Train denoiser: Predict noise at each timestep
- Reverse diffusion: Remove noise step by step
- Text conditioning: Guide the denoising process
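Text conditioning is the one ingredient this lab does not implement. In Stable Diffusion-style models it is commonly paired with *classifier-free guidance*: the denoiser is run twice per step, once with the prompt and once without, and the two noise predictions are extrapolated. A minimal sketch of just that combination step, assuming both predictions are already available (`guided_noise` is an illustrative name; 7.5 is a commonly used default scale, not a value from this lab):

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, guidance_scale=7.5):
    """Classifier-free guidance: extrapolate from the unconditional toward the
    text-conditioned noise prediction.

    guidance_scale = 1 returns eps_cond unchanged; larger values follow the prompt
    more strongly.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

The guided prediction then simply replaces $\hat{\epsilon}$ in the same reverse-diffusion update used in this module.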

---

## ✅ Module 3 Complete!

You now understand:
- **Reverse diffusion algorithm** (iterative denoising)
- **Image generation from noise** (pure randomness ‚Üí structured output)
- **Trajectory visualization** (how structure emerges gradually)
- **Stochasticity and variety** (different seeds ‚Üí different outputs)
- **Step count trade-offs** (quality vs speed)
- **Toy model limitations** (why professional models need scale)

**Key insights:**
1. **Structure emerges gradually** - not suddenly!
2. **More steps = better quality** - but with diminishing returns
3. **Stochasticity creates variety** - essential for diverse generations
4. **Scale matters** - hundreds of millions to billions of parameters >> a few hundred thousand

**Congratulations!** You've implemented the core algorithm behind diffusion-based generators like DALL-E 2, Midjourney, and Stable Diffusion!

**Ready to see professional quality?**

Move on to **Module 4: Pre-Trained Diffusion Model**, where you'll use a pre-trained model to generate CIFAR-10 images and understand how to scale up to text-to-image generation!

---