# Lab 8, Module 1: Forward Diffusion Demo

**Estimated time:** 8-12 minutes

---

## **Opening: Watching Information Disappear**

In Module 0, you learned the conceptual idea of diffusion: gradually add noise until an image becomes unrecognizable, then train a model to reverse this process.

Now it's time to see it in action!

In this module, you'll:
- Take a single MNIST digit (like a handwritten "3")
- Add progressively more noise over 200 timesteps
- Watch the digit gradually disappear into static
- Understand the **noise schedule** that controls this process

**This is the "easy" part of diffusion.** Destroying information is simple. In Module 2, you'll train a model to do the hard part: reversing this process!

---

## 📦 **Setup: Import Libraries and Load Data**

Let's start by loading a single MNIST digit to experiment with.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize to [0, 1] range
x_train = x_train.astype('float32') / 255.0

# Pick a single digit to work with (let's choose a "3")
# Find the first "3" in the dataset
idx = np.where(y_train == 3)[0][0]
original_image = x_train[idx]

print(f"Selected image: digit {y_train[idx]}")
print(f"Image shape: {original_image.shape}")
print(f"Pixel value range: [{original_image.min():.2f}, {original_image.max():.2f}]")

# Display the original image
plt.figure(figsize=(3, 3))
plt.imshow(original_image, cmap='gray')
plt.title(f"Original Image: Digit {y_train[idx]}", fontsize=14, fontweight='bold')
plt.axis('off')
plt.tight_layout()
plt.show()

print("\n✓ Data loaded! Ready to add noise.")

---

## 🔬 **Understanding the Forward Diffusion Process**

### **The Mathematical Formula**

At each timestep `t`, we add noise according to this formula:

**Symbols:**
$$x_t = \sqrt{\bar{\alpha}_t} \cdot x_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon$$

**Plain English:**
> "The noisy image at time t = (signal weight × original) + (noise weight × random noise)"

**Pseudocode:**
```python
noisy_image = sqrt(alpha_bar_t) * original + sqrt(1 - alpha_bar_t) * random_noise
```

Where:
- `x_0` = original clean image
- `x_t` = noisy image at timestep t
- `ε` (epsilon) = random noise sampled from a standard normal distribution
- `ᾱ_t` (alpha bar) = cumulative noise schedule parameter: the product of `(1 - β_s)` over all steps up to `t`, which controls the signal vs. noise ratio
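
One way to see why the weights are square roots: when the noise is standard normal and independent of the image, the squared weights add up to 1, so a unit-variance input keeps unit variance after noising. The sketch below checks this numerically. The stand-in signal `x0`, the sample count, and `alpha_bar_t = 0.5` are illustrative assumptions, not values from this lab.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for image pixels: unit-variance values (not MNIST data)
x0 = rng.standard_normal(100_000)
eps = rng.standard_normal(100_000)   # epsilon ~ N(0, 1), independent of x0

alpha_bar_t = 0.5                    # illustrative value, not read from the schedule

signal_weight = np.sqrt(alpha_bar_t)
noise_weight = np.sqrt(1.0 - alpha_bar_t)
x_t = signal_weight * x0 + noise_weight * eps

# The squared weights sum to 1 for any alpha_bar_t in [0, 1] ...
weights_sq_sum = signal_weight**2 + noise_weight**2
# ... so a unit-variance input keeps (approximately) unit variance
var_x_t = x_t.var()

print(f"signal_weight^2 + noise_weight^2 = {weights_sq_sum:.4f}")
print(f"Var(x0) = {x0.var():.3f}, Var(x_t) = {var_x_t:.3f}")
```

MNIST pixels scaled to [0, 1] do not have unit variance, so in the lab itself the total variance of `x_t` does change with `t`; the point here is only that the two weights trade off against each other.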

### **The Noise Schedule**

The **noise schedule** determines how quickly noise overtakes the signal:

- At `t=0`: `ᾱ ≈ 1.0` → almost pure signal (clean image)
- At `t=100`: `ᾱ ≈ 0.6` → signal still slightly ahead of noise
- At `t=199`: `ᾱ ≈ 0.13` → mostly noise, with a faint trace of signal

We'll use a **linear schedule** where `β` increases steadily from `β_start = 0.0001` to `β_end = 0.02` over 200 timesteps (indexed `t=0` to `t=199`).
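
The formula above jumps from `x_0` to `x_t` in one shot. A common way to define the same process, assumed here because this module only states the closed form, is one small update per timestep: `x_t = sqrt(1 - β_t) * x_{t-1} + sqrt(β_t) * ε_t`. The sketch below noises a single hypothetical pixel value both ways and compares the resulting mean and standard deviation. The pixel value, sample count, and chosen timestep are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same linear schedule as in this module
num_timesteps = 200
betas = np.linspace(0.0001, 0.02, num_timesteps)
alphas_cumprod = np.cumprod(1.0 - betas)

x0 = 1.0            # hypothetical single pixel value
t = 100             # timestep index at which the two routes are compared
n_samples = 50_000  # number of independent noise draws

# Route 1: step-by-step noising, one small step per beta (indices 0..t)
x = np.full(n_samples, x0)
for beta in betas[: t + 1]:
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(n_samples)

# Route 2: closed form, a single jump from x0 to x_t
x_closed = (np.sqrt(alphas_cumprod[t]) * x0
            + np.sqrt(1.0 - alphas_cumprod[t]) * rng.standard_normal(n_samples))

print(f"stepwise:    mean={x.mean():.3f}, std={x.std():.3f}")
print(f"closed form: mean={x_closed.mean():.3f}, std={x_closed.std():.3f}")
```

Both routes should report a mean near `sqrt(ᾱ_t) * x_0` and a standard deviation near `sqrt(1 - ᾱ_t)`, which is why any timestep can be sampled directly without looping through the earlier ones.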

---

## 🛠️ **Implementing Forward Diffusion**

Let's code the forward diffusion process!


In [None]:
# Diffusion hyperparameters
num_timesteps = 200
beta_start = 0.0001
beta_end = 0.02

# Create linear beta schedule
betas = np.linspace(beta_start, beta_end, num_timesteps)

# Compute alpha values
alphas = 1.0 - betas

# Compute cumulative product of alphas (alpha_bar)
alphas_cumprod = np.cumprod(alphas)

print("Noise Schedule Created!")
print(f"Number of timesteps: {num_timesteps}")
print(f"Beta range: [{beta_start}, {beta_end}]")
print(f"\nAlpha_bar at key timesteps:")
print(f"  t=0:   ᾱ = {alphas_cumprod[0]:.4f} (mostly signal)")
print(f"  t=50:  ᾱ = {alphas_cumprod[50]:.4f}")
print(f"  t=100: ᾱ = {alphas_cumprod[100]:.4f}")
print(f"  t=150: ᾱ = {alphas_cumprod[150]:.4f}")
print(f"  t=199: ᾱ = {alphas_cumprod[199]:.4f} (mostly noise)")

# Visualize the noise schedule
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))

# Plot alpha_bar over time
ax1.plot(alphas_cumprod, linewidth=2, color='blue')
ax1.fill_between(range(num_timesteps), alphas_cumprod, alpha=0.3, color='blue')
ax1.set_xlabel('Timestep (t)', fontsize=12)
ax1.set_ylabel('ᾱ_t (Signal Weight)', fontsize=12)
ax1.set_title('Alpha Bar Schedule: Signal Weight Over Time', fontsize=13, fontweight='bold')
ax1.grid(True, alpha=0.3)
ax1.axhline(y=0.5, color='red', linestyle='--', linewidth=1, label='50% signal')
ax1.legend()

# Plot noise weight (1 - alpha_bar) over time
noise_weight = 1.0 - alphas_cumprod
ax2.plot(noise_weight, linewidth=2, color='orange')
ax2.fill_between(range(num_timesteps), noise_weight, alpha=0.3, color='orange')
ax2.set_xlabel('Timestep (t)', fontsize=12)
ax2.set_ylabel('1 - ᾱ_t (Noise Weight)', fontsize=12)
ax2.set_title('Noise Weight Over Time', fontsize=13, fontweight='bold')
ax2.grid(True, alpha=0.3)
ax2.axhline(y=0.5, color='red', linestyle='--', linewidth=1, label='50% noise')
ax2.legend()

plt.tight_layout()
plt.show()

print("\n✓ Noise schedule visualized!")

### **Understanding the Plots**

**Left plot (Alpha Bar - Signal Weight):**
- Starts just below 1.0 (almost 100% original image)
- Gradually decreases
- Ends at about 0.13 (a small share of the original signal remains)

**Right plot (Noise Weight):**
- Starts near 0 (almost no noise)
- Gradually increases
- Ends at about 0.87 (mostly noise)

**Key observation:** The crossover, where signal and noise are equally weighted, happens around t=117.
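
The crossover can also be located programmatically rather than read off the plot. This is a minimal sketch that rebuilds the same linear schedule and finds the first timestep index where `ᾱ_t` falls below 0.5; the variable names `below_half` and `crossover_t` are introduced here for illustration.

```python
import numpy as np

# Same linear schedule as in this module
num_timesteps = 200
betas = np.linspace(0.0001, 0.02, num_timesteps)
alphas_cumprod = np.cumprod(1.0 - betas)

# Indices where noise outweighs signal (alpha_bar < 0.5)
below_half = np.flatnonzero(alphas_cumprod < 0.5)

# Guard against schedules that never reach the threshold
crossover_t = int(below_half[0]) if below_half.size > 0 else None

print(f"alpha_bar first drops below 0.5 at t={crossover_t}")
print(f"alpha_bar at the final timestep: {alphas_cumprod[-1]:.3f}")
```

The same pattern works for any threshold, which is handy when comparing schedules with different `beta_end` values.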

---

## 🎬 **Adding Noise: The Forward Process**

Now let's apply the forward diffusion formula to our digit!


In [None]:
def forward_diffusion(x0, t, alphas_cumprod):
    """
    Apply forward diffusion to add noise at timestep t.
    
    Args:
        x0: Original clean image
        t: Timestep (0 to num_timesteps-1)
        alphas_cumprod: Cumulative product of alphas
    
    Returns:
        noisy_image: Image with noise added
        noise: The noise that was added
    """
    # Sample random noise from standard normal distribution
    noise = np.random.randn(*x0.shape)
    
    # Get alpha_bar for this timestep
    alpha_bar_t = alphas_cumprod[t]
    
    # Apply forward diffusion formula
    signal_weight = np.sqrt(alpha_bar_t)
    noise_weight = np.sqrt(1.0 - alpha_bar_t)
    
    noisy_image = signal_weight * x0 + noise_weight * noise
    
    return noisy_image, noise

# Test the function at different timesteps
test_timesteps = [0, 50, 100, 150, 199]

print("Testing forward diffusion at different timesteps:\n")
for t in test_timesteps:
    noisy, noise = forward_diffusion(original_image, t, alphas_cumprod)
    signal_ratio = alphas_cumprod[t]
    noise_ratio = 1.0 - alphas_cumprod[t]
    print(f"t={t:3d}: Signal={signal_ratio:.3f} ({100*signal_ratio:5.1f}%), Noise={noise_ratio:.3f} ({100*noise_ratio:5.1f}%)")

print("\n✓ Forward diffusion function ready!")

---

## 📸 **Visualization: Watching the Digit Disappear**

Let's visualize the progressive noise addition across multiple timesteps!


In [None]:
# Timesteps to visualize
timesteps_to_show = [0, 25, 50, 75, 100, 125, 150, 175, 199]

# Set random seed for reproducibility
np.random.seed(42)

# Create figure
fig, axes = plt.subplots(2, len(timesteps_to_show), figsize=(18, 5))

# Generate and display noisy images
for idx, t in enumerate(timesteps_to_show):
    noisy, noise = forward_diffusion(original_image, t, alphas_cumprod)
    
    # Display noisy image
    axes[0, idx].imshow(noisy, cmap='gray', vmin=-2, vmax=2)
    axes[0, idx].set_title(f't={t}', fontsize=11, fontweight='bold')
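    # Hide ticks only; axis('off') would also hide the spines used for the colored border
    axes[0, idx].set_xticks([])
    axes[0, idx].set_yticks([])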
    
    # Add border color based on noise level
    noise_level = 1.0 - alphas_cumprod[t]
    if noise_level < 0.3:
        border_color = 'green'
    elif noise_level < 0.7:
        border_color = 'orange'
    else:
        border_color = 'red'
    
    for spine in axes[0, idx].spines.values():
        spine.set_edgecolor(border_color)
        spine.set_linewidth(3)
    
    # Display signal and noise percentages below each image
    signal_pct = 100 * alphas_cumprod[t]
    noise_pct = 100 * (1.0 - alphas_cumprod[t])
    axes[1, idx].text(0.5, 0.6, f'Signal:\n{signal_pct:.1f}%', 
                     ha='center', va='center', fontsize=10, fontweight='bold', color='blue')
    axes[1, idx].text(0.5, 0.3, f'Noise:\n{noise_pct:.1f}%', 
                     ha='center', va='center', fontsize=10, fontweight='bold', color='red')
    axes[1, idx].set_xlim(0, 1)
    axes[1, idx].set_ylim(0, 1)
    axes[1, idx].axis('off')

# Add main title
fig.suptitle('Forward Diffusion: Progressive Noise Addition', 
             fontsize=15, fontweight='bold', y=0.98)

# Add color legend
fig.text(0.5, 0.02, 'Border Colors: Green = Low Noise | Orange = Medium Noise | Red = High Noise', 
         ha='center', fontsize=11, style='italic')

plt.tight_layout(rect=[0, 0.03, 1, 0.96])
plt.show()

print("\n✓ Visualization complete!")

### **What Do You Observe?**

**Early timesteps (t=0-75):** Green border
- Digit is still clearly visible
- Slight noise/graininess appears
- Signal dominates (>70%)

**Middle timesteps (t=100-150):** Orange border
- Digit becomes grainy and harder to read
- Signal and noise are comparable (each between 30% and 70%)
- This is where diffusion models do most of their work!

**Late timesteps (t=175-199):** Red border
- Digit is very hard or impossible to make out
- Noise dominates (>70%)
- Most of the information is lost

**Key insight:** Information destruction is gradual, not sudden. The digit doesn't vanish all at once; it fades away step by step.

---

## 📊 **Signal-to-Noise Ratio (SNR) Analysis**

Let's plot how the signal-to-noise ratio changes over time.


In [None]:
# Calculate SNR at each timestep
snr = alphas_cumprod / (1.0 - alphas_cumprod + 1e-10)  # Add small epsilon to avoid division by zero
snr_db = 10 * np.log10(snr + 1e-10)  # Convert to decibels

# Create visualization
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8))

# Plot SNR in linear scale
ax1.plot(snr, linewidth=2, color='purple')
ax1.fill_between(range(num_timesteps), snr, alpha=0.3, color='purple')
ax1.set_xlabel('Timestep (t)', fontsize=12)
ax1.set_ylabel('Signal-to-Noise Ratio (SNR)', fontsize=12)
ax1.set_title('SNR Over Time (Linear Scale)', fontsize=13, fontweight='bold')
ax1.grid(True, alpha=0.3)
ax1.axhline(y=1.0, color='red', linestyle='--', linewidth=1, label='SNR = 1 (equal signal & noise)')
ax1.legend()

# Add annotations for key regions
ax1.annotate('Signal Dominates\n(Easy to see digit)', xy=(25, snr[25]), xytext=(25, snr[25]*3),
            arrowprops=dict(arrowstyle='->', color='green', lw=2),
            fontsize=10, color='green', fontweight='bold', ha='center')
ax1.annotate('Balanced\n(Challenging)', xy=(100, snr[100]), xytext=(100, snr[100]*5),
            arrowprops=dict(arrowstyle='->', color='orange', lw=2),
            fontsize=10, color='orange', fontweight='bold', ha='center')
ax1.annotate('Noise Dominates\n(Pure static)', xy=(175, snr[175]), xytext=(175, 2),
            arrowprops=dict(arrowstyle='->', color='red', lw=2),
            fontsize=10, color='red', fontweight='bold', ha='center')

# Plot SNR in decibel scale (log scale)
ax2.plot(snr_db, linewidth=2, color='darkblue')
ax2.fill_between(range(num_timesteps), snr_db, alpha=0.3, color='darkblue')
ax2.set_xlabel('Timestep (t)', fontsize=12)
ax2.set_ylabel('SNR (decibels)', fontsize=12)
ax2.set_title('SNR Over Time (Decibel/Log Scale)', fontsize=13, fontweight='bold')
ax2.grid(True, alpha=0.3)
ax2.axhline(y=0, color='red', linestyle='--', linewidth=1, label='0 dB (equal signal & noise)')
ax2.legend()

plt.tight_layout()
plt.show()

print("\n" + "="*70)
print("KEY OBSERVATIONS:")
print("="*70)
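print(f"\n1. At t=0:   SNR = {snr[0]:.2f} ({snr_db[0]:.1f} dB) - Signal completely dominates")
print(f"2. At t=100: SNR = {snr[100]:.2f} ({snr_db[100]:.1f} dB) - Signal still slightly ahead")
print(f"3. At t=199: SNR = {snr[199]:.4f} ({snr_db[199]:.1f} dB) - Noise dominates")
# np.where returns an empty array if SNR never falls below the threshold,
# so check before indexing to avoid an IndexError
low_snr_steps = np.where(snr < 0.1)[0]
if low_snr_steps.size > 0:
    print(f"\n4. SNR first drops below 0.1 at t={low_snr_steps[0]}")
else:
    print(f"\n4. SNR never drops below 0.1 with this schedule (minimum: {snr.min():.3f})")
print(f"5. By t=150, about {100*alphas_cumprod[150]:.1f}% of original signal remains")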
print("="*70)

### **Understanding SNR**

**Signal-to-Noise Ratio (SNR):**
- SNR > 1: Signal is stronger than noise (digit visible)
- SNR = 1: Signal and noise are equal (digit barely visible)
- SNR < 1: Noise is stronger than signal (digit unrecognizable)

**Why use decibels (dB)?**
- Logarithmic scale makes it easier to see changes across a wide range
- 0 dB = equal signal and noise
- Positive dB = more signal than noise
- Negative dB = more noise than signal

**Connection to reverse diffusion:**
- Early timesteps (high SNR): Easy to denoise, since lots of information remains
- Middle timesteps (SNR ≈ 1): Challenging, since a weak signal must be extracted from noise
- Late timesteps (low SNR): Hardest, since almost no signal is left and the model is essentially generating
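
To make the decibel scale concrete, here is a small sketch that converts a few `ᾱ` values to SNR and then to dB. The helper names `snr_from_alpha_bar` and `to_decibels` are hypothetical, the sample `ᾱ` values are illustrative rather than read from the lab's schedule, and the ratio `ᾱ / (1 - ᾱ)` treats the image as having unit power.

```python
import numpy as np

def snr_from_alpha_bar(alpha_bar):
    """Power SNR implied by the forward formula: alpha_bar / (1 - alpha_bar)."""
    alpha_bar = np.asarray(alpha_bar, dtype=float)
    return alpha_bar / (1.0 - alpha_bar)

def to_decibels(snr):
    """Convert a power ratio to decibels."""
    return 10.0 * np.log10(snr)

# Illustrative alpha_bar values (not read off the lab's schedule)
for alpha_bar in [0.9, 0.5, 0.1]:
    snr = snr_from_alpha_bar(alpha_bar)
    print(f"alpha_bar={alpha_bar:.1f}: SNR={snr:.3f}, {to_decibels(snr):+.2f} dB")
```

An `ᾱ` of 0.5 lands exactly on 0 dB, and the values 0.9 and 0.1 sit symmetrically at roughly +9.5 dB and -9.5 dB, which is the symmetry the log scale makes visible.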

---

## 🎲 **Experiment: Multiple Noise Realizations**

Let's see what happens when we add different random noise to the same image at the same timestep.


In [None]:
# Choose a middle timestep where signal and noise are comparable
test_timestep = 100

# Generate 5 different noise realizations
fig, axes = plt.subplots(1, 6, figsize=(16, 3))

# Show original
axes[0].imshow(original_image, cmap='gray')
axes[0].set_title('Original', fontsize=12, fontweight='bold')
axes[0].axis('off')

# Show 5 different noisy versions
for i in range(5):
    noisy, _ = forward_diffusion(original_image, test_timestep, alphas_cumprod)
    axes[i+1].imshow(noisy, cmap='gray', vmin=-2, vmax=2)
    axes[i+1].set_title(f'Noisy #{i+1}', fontsize=12, fontweight='bold')
    axes[i+1].axis('off')

plt.suptitle(f'Different Random Noise at t={test_timestep} (Same Original Image)', 
             fontsize=14, fontweight='bold', y=1.05)
plt.tight_layout()
plt.show()

print(f"\nAt t={test_timestep}:")
print(f"  - Signal weight: {100*alphas_cumprod[test_timestep]:.1f}%")
print(f"  - Noise weight:  {100*(1-alphas_cumprod[test_timestep]):.1f}%")
print(f"  - Each realization looks different due to random noise")
print(f"  - But all preserve roughly the same amount of original structure")
print("\n✓ Stochasticity is built into the diffusion process!")

### **Key Observation: Randomness Matters**

Even though we started with the same image and used the same timestep:
- Each noisy version looks different
- The random noise creates variation
- This is why diffusion models can generate **diverse** images

**Connection to Module 3:**
- When we reverse the process, different starting noise → different generated images
- This is how DALL-E can create many different "cats riding skateboards"
- The model learns structure, randomness provides variety
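
The split between shared structure and random variation can be checked directly. In this sketch, a hypothetical 28x28 bar image stands in for the digit, `add_noise` is an illustrative helper that takes an explicit random generator, and `alpha_bar_t = 0.6` is an assumed mid-schedule value. It shows that a fixed seed reproduces the same noisy image, a different seed does not, and the average over many realizations converges to `sqrt(ᾱ_t) * x_0`.

```python
import numpy as np

# Hypothetical 28x28 stand-in for a digit: a bright vertical bar on black
x0 = np.zeros((28, 28))
x0[4:24, 12:16] = 1.0

alpha_bar_t = 0.6   # illustrative value, roughly mid-schedule

def add_noise(x0, alpha_bar_t, rng):
    """Closed-form forward diffusion using the supplied random generator."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

# Same seed -> identical noisy image; different seed -> different image
same_a = add_noise(x0, alpha_bar_t, np.random.default_rng(1))
same_b = add_noise(x0, alpha_bar_t, np.random.default_rng(1))
other = add_noise(x0, alpha_bar_t, np.random.default_rng(2))
print("same seed identical:     ", np.array_equal(same_a, same_b))
print("different seed identical:", np.array_equal(same_a, other))

# Averaging many realizations cancels the noise and leaves sqrt(alpha_bar_t) * x0
rng = np.random.default_rng(0)
mean_image = np.mean([add_noise(x0, alpha_bar_t, rng) for _ in range(2000)], axis=0)
max_error = np.abs(mean_image - np.sqrt(alpha_bar_t) * x0).max()
print(f"max deviation of the average from sqrt(alpha_bar)*x0: {max_error:.3f}")
```

Passing a generator explicitly, instead of relying on the global `np.random.seed`, is a design choice made for this sketch so that each realization's randomness is easy to control.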

---

## 📝 **Questions (Q5-Q9)**

Record your answers in the **Answer Sheet**.

---

### **Q5. Looking at the progression visualization, at approximately what timestep does the digit become unrecognizable to you?**

*Hint: This is subjective; there's no single right answer. Consider when you can no longer confidently identify which digit it is.*

**Record your answer in the Answer Sheet.**

---

### **Q6. Compare the noisy images at t=50 vs t=150. Describe the visual differences.**

*Hint: Look at how much of the original digit structure remains visible in each case.*

**Record your answer in the Answer Sheet.**

---

### **Q7. Looking at the SNR plot, at what approximate timestep does noise start to dominate the signal (SNR < 1)?**

*Hint: Look for where the SNR curve crosses the red dashed line: SNR = 1 in the linear plot, or 0 dB in the decibel plot, where the crossing is easier to see.*

**Record your answer in the Answer Sheet.**

---

### **Q8. In the "different noise realizations" experiment, why do all 5 noisy images look different even though they started from the same original?**

*Hint: Think about the random noise that gets added each time.*

**Record your answer in the Answer Sheet.**

---

### **Q9. Why is the forward diffusion process easy to implement, but reversing it requires training a neural network?**

*Hint: Adding noise is simple (just add random numbers), but removing noise requires understanding what the original structure should look like.*

**Record your answer in the Answer Sheet.**

---

## ✅ Module 1 Complete!

You now understand:
- **How forward diffusion works** (progressive noise addition)
- **Noise schedules** (alpha_bar controls signal vs. noise ratio)
- **Information loss** (digit gradually becomes unrecognizable)
- **SNR decay** (signal-to-noise ratio decreases over time)
- **Stochasticity** (random noise creates variation)

**Key insight:** Destroying information is easy; you just add random noise. The hard part is **reversing this process**, which we'll tackle in Module 2!

**Ready to build a denoiser?**

Move on to **Module 2: Training a Toy Denoiser**, where you'll train a neural network to predict and remove noise from images!

---