# 6. Visual Comparison: SD3 Medium vs FLUX.1-schnell
**Head-to-Head on Identical Prompts — The 2024 Diffusion Model Showdown**

This notebook generates the comparison grid for our README and LinkedIn post.

In [None]:
import sys, os, time

# Make the project root (the parent of the notebook directory) importable.
project_root = os.path.dirname(os.getcwd())
if project_root not in sys.path:
    sys.path.insert(0, project_root)

import torch
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from PIL import Image
import numpy as np

from config.default import DiffusionConfig
from models.memory_utils import setup_mps_environment, clear_memory, log_memory_usage
from models.pipeline_factory import load_pipeline, generate_image
from models.prompt_bank import BENCHMARK_PROMPTS

os.makedirs(os.path.join("..", "outputs", "comparisons"), exist_ok=True)
os.makedirs(os.path.join("..", "assets"), exist_ok=True)

## Strategy: Sequential Model Loading on 16GB

Since we only have 16GB of unified memory on Apple Silicon, we load models **sequentially** with aggressive memory cleanup between them. This ensures each model gets the full memory budget without OOM errors.

- **SD3 Medium** runs first: 28 steps, float16, with the T5-XXL text encoder dropped (its weights alone are roughly 9-10GB in float16)
- Memory is fully cleared (`del` pipeline + `gc.collect()` + `torch.mps.empty_cache()`)
- **FLUX.1-schnell** runs second: 4 steps, GGUF Q4_K_S quantization (about 4.5 bits per weight, so roughly 6.8GB for the 12B transformer)

Both models use the same seed (42) and identical prompts, so each run is reproducible and the comparison is as like-for-like as two different architectures allow.
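
The cleanup step between the two phases can be sketched as follows. This is a minimal sketch of what a helper like `clear_memory()` might do, not the project's actual implementation; the name `free_accelerator_memory` is hypothetical, and the MPS calls assume PyTorch 2.0 or newer. The caller has to drop its own reference first (`del pipe`), otherwise the weights stay reachable and nothing is freed.

```python
import gc


def free_accelerator_memory() -> int:
    """Run the garbage collector, then release cached MPS memory if available.

    Hypothetical helper for illustration; returns the number of unreachable
    objects found by the garbage collector.
    """
    collected = gc.collect()
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is no accelerator cache to clear.
        return collected
    if torch.backends.mps.is_available():
        # Hand cached, unused blocks back to the system allocator.
        torch.mps.empty_cache()
    return collected


pipe = {"weights": [0.0] * 1_000}  # stand-in for a loaded pipeline
del pipe                           # drop the last reference first
print(f"collected {free_accelerator_memory()} unreachable objects")
```

Calling `gc.collect()` before `torch.mps.empty_cache()` matters: the cache can only return blocks whose tensors have already been destroyed.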

In [None]:
print("=" * 50)
print("Phase 1: Generating with SD3 Medium")
print("=" * 50)

sd3_config = DiffusionConfig(
    model_name="sd3-medium",
    num_inference_steps=28,
    guidance_scale=7.0,
    height=512, width=512, seed=42,
    dtype="float16",
    enable_cpu_offload=True,
    drop_t5_encoder=True,
)

log_memory_usage("before SD3")
pipe_sd3 = load_pipeline(sd3_config)

sd3_images = []
sd3_times = []

for i, ps in enumerate(BENCHMARK_PROMPTS):
    print(f"[SD3 {i+1}/{len(BENCHMARK_PROMPTS)}] {ps.prompt[:50]}...")
    sd3_config.prompt = ps.prompt
    start = time.perf_counter()
    img = generate_image(pipe_sd3, sd3_config)
    elapsed = time.perf_counter() - start
    sd3_images.append(img)
    sd3_times.append(elapsed)
    print(f"  {elapsed:.1f}s")

# Free SD3 completely
del pipe_sd3
clear_memory()
log_memory_usage("after SD3 cleanup")
print("\nSD3 Medium complete. Memory cleared.")

In [None]:
print("=" * 50)
print("Phase 2: Generating with FLUX.1-schnell")
print("=" * 50)

flux_config = DiffusionConfig(
    model_name="flux-schnell",
    quantization="Q4_K_S",
    num_inference_steps=4,
    guidance_scale=0.0,
    height=512, width=512, seed=42,
    dtype="float16",
    enable_cpu_offload=True,
)

log_memory_usage("before FLUX")
pipe_flux = load_pipeline(flux_config)

flux_images = []
flux_times = []

for i, ps in enumerate(BENCHMARK_PROMPTS):
    print(f"[FLUX {i+1}/{len(BENCHMARK_PROMPTS)}] {ps.prompt[:50]}...")
    flux_config.prompt = ps.prompt
    start = time.perf_counter()
    img = generate_image(pipe_flux, flux_config)
    elapsed = time.perf_counter() - start
    flux_images.append(img)
    flux_times.append(elapsed)
    print(f"  {elapsed:.1f}s")

del pipe_flux
clear_memory()
log_memory_usage("after FLUX cleanup")
print("\nFLUX.1-schnell complete. Memory cleared.")

## The Comparison Grid

Now we'll create a professional side-by-side comparison grid. This is the **hero image** for the README and LinkedIn post — SD3 Medium on the left, FLUX.1-schnell on the right, same prompts and seed throughout.

In [None]:
n_prompts = len(BENCHMARK_PROMPTS)

fig = plt.figure(figsize=(14, 4 * n_prompts + 2))
gs = gridspec.GridSpec(n_prompts + 1, 2, hspace=0.3, wspace=0.05,
                        height_ratios=[0.3] + [1] * n_prompts)

# Header row
ax_header_left = fig.add_subplot(gs[0, 0])
ax_header_left.text(0.5, 0.5, 'SD3 Medium\n28 steps | float16 | no T5',
                     ha='center', va='center', fontsize=14, fontweight='bold',
                     color='#1565C0',
                     bbox=dict(boxstyle='round,pad=0.5', facecolor='#E3F2FD', edgecolor='#1565C0'))
ax_header_left.axis('off')

ax_header_right = fig.add_subplot(gs[0, 1])
ax_header_right.text(0.5, 0.5, 'FLUX.1-schnell\n4 steps | GGUF Q4_K_S',
                      ha='center', va='center', fontsize=14, fontweight='bold',
                      color='#C62828',
                      bbox=dict(boxstyle='round,pad=0.5', facecolor='#FFEBEE', edgecolor='#C62828'))
ax_header_right.axis('off')

# Image rows
for i in range(n_prompts):
    # SD3 image
    ax_sd3 = fig.add_subplot(gs[i + 1, 0])
    ax_sd3.imshow(sd3_images[i])
    ax_sd3.set_ylabel(f"{BENCHMARK_PROMPTS[i].category.upper()}\n{sd3_times[i]:.1f}s",
                       fontsize=10, fontweight='bold', rotation=0, labelpad=80, va='center')
    ax_sd3.set_xticks([])
    ax_sd3.set_yticks([])

    # FLUX image
    ax_flux = fig.add_subplot(gs[i + 1, 1])
    ax_flux.imshow(flux_images[i])
    ax_flux.set_ylabel(f"{flux_times[i]:.1f}s", fontsize=10, fontweight='bold',
                        rotation=0, labelpad=30, va='center')
    ax_flux.yaxis.set_label_position("right")
    ax_flux.set_xticks([])
    ax_flux.set_yticks([])

    # Prompts are not drawn on the grid itself; the per-prompt figures saved
    # later carry the prompt text in their titles.

fig.suptitle("Diffusion Model Evolution: SD3 Medium vs FLUX.1-schnell\nSame prompts, same seed (42), 512\u00d7512",
             fontsize=16, fontweight='bold', y=0.98)

# Save to assets
comparison_path = os.path.join("..", "assets", "comparison_grid.png")
fig.savefig(comparison_path, dpi=150, bbox_inches='tight', facecolor='white')
print(f"Saved comparison grid to {comparison_path}")

plt.show()

In [None]:
print("=" * 70)
print("HEAD-TO-HEAD COMPARISON: SD3 Medium vs FLUX.1-schnell")
print("=" * 70)
print(f"\n{'Prompt':<16} {'Category':<16} {'SD3 (s)':<10} {'FLUX (s)':<10} {'Speedup':<10}")
print("-" * 70)

for ps, sd3_t, flux_t in zip(BENCHMARK_PROMPTS, sd3_times, flux_times):
    speedup = sd3_t / flux_t if flux_t > 0 else float('inf')
    # No width on the last column, so the "x" suffix sits directly after the number
    print(f"{ps.name:<16} {ps.category:<16} {sd3_t:<10.1f} {flux_t:<10.1f} {speedup:.1f}x")

# Ratio of the average times (not the average of the per-prompt ratios)
avg_sd3 = sum(sd3_times) / len(sd3_times)
avg_flux = sum(flux_times) / len(flux_times)
avg_speedup = avg_sd3 / avg_flux if avg_flux > 0 else float('inf')

print("-" * 70)
print(f"{'AVERAGE':<16} {'':<16} {avg_sd3:<10.1f} {avg_flux:<10.1f} {avg_speedup:.1f}x")
print()
print("Configuration:")
print("  SD3 Medium:     28 steps, guidance=7.0, float16, no T5-XXL")
print("  FLUX.1-schnell: 4 steps, guidance=0.0, GGUF Q4_K_S")
print("  Resolution:     512\u00d7512")
print("  Device:         Apple Silicon (16GB unified memory)")

In [None]:
# Save individual side-by-side comparisons for each prompt
for i, ps in enumerate(BENCHMARK_PROMPTS):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

    ax1.imshow(sd3_images[i])
    ax1.set_title(f"SD3 Medium\n{sd3_times[i]:.1f}s | 28 steps", fontsize=11, fontweight='bold', color='#1565C0')
    ax1.axis('off')

    ax2.imshow(flux_images[i])
    ax2.set_title(f"FLUX.1-schnell\n{flux_times[i]:.1f}s | 4 steps", fontsize=11, fontweight='bold', color='#C62828')
    ax2.axis('off')

    fig.suptitle(f"{ps.category.upper()}: {ps.prompt[:70]}...", fontsize=11)
    plt.tight_layout()

    fig.savefig(os.path.join("..", "outputs", "comparisons", f"{ps.name}_comparison.png"),
                dpi=150, bbox_inches='tight', facecolor='white')
    plt.close(fig)  # close this figure explicitly to free its memory

print(f"Saved {len(BENCHMARK_PROMPTS)} individual comparisons to outputs/comparisons/")

## Analysis & Key Findings

### Speed
FLUX.1-schnell is dramatically faster despite having **6x as many parameters** (12B vs 2B); the timing table above gives the measured per-prompt speedup. FLUX.1-schnell is a step-distilled model (Black Forest Labs describes its training as latent adversarial diffusion distillation), so it needs only 4 sampling steps and no classifier-free guidance, whereas SD3 Medium runs 28 guided steps.
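
A rough way to see where the speed gap comes from is to count transformer forward passes per image. The sketch below assumes that classifier-free guidance is active whenever `guidance_scale > 1` and costs two passes per step (conditional plus unconditional); that is how guidance is commonly implemented, but it is an assumption about this project's pipeline, and `forward_passes` is a hypothetical helper.

```python
def forward_passes(num_steps: int, guidance_scale: float) -> int:
    """Transformer forward passes per image.

    Assumption: classifier-free guidance is active when guidance_scale > 1
    and costs two passes per step (conditional + unconditional).
    """
    passes_per_step = 2 if guidance_scale > 1.0 else 1
    return num_steps * passes_per_step


sd3_passes = forward_passes(num_steps=28, guidance_scale=7.0)
flux_passes = forward_passes(num_steps=4, guidance_scale=0.0)

print(f"SD3 Medium:     {sd3_passes} forward passes")      # 56
print(f"FLUX.1-schnell: {flux_passes} forward passes")     # 4
print(f"Ratio:          {sd3_passes / flux_passes:.0f}x")  # 14x
```

Each FLUX pass runs through a much larger model, so the pass count is only intuition for why the gap exists, not a prediction of wall-clock speedup. The measured timings remain the authoritative numbers.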

### Text Rendering
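Both models build on MMDiT-style joint attention, in which text tokens and image patches attend to each other bidirectionally. FLUX tends to render clearer, more legible text, likely due to its larger parameter count and training data. Note that SD3 Medium runs here without its T5-XXL encoder, which the SD3 authors report mainly hurts typography and adherence to complex prompts, so this comparison likely understates SD3's text rendering.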
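
The two-way attention idea can be illustrated with a toy example: text tokens and image patches are concatenated into one sequence, and every position attends over all of it. This is a deliberately simplified sketch (single head, raw vectors used as both queries and keys, random data); real MMDiT blocks use separate learned projections per modality, and `joint_attention_weights` is a hypothetical name.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention_weights(text_tokens, image_patches):
    """Attention weights over the concatenated [text; image] sequence."""
    seq = text_tokens + image_patches
    scale = 1.0 / math.sqrt(len(seq[0]))
    weights = []
    for query in seq:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in seq]
        weights.append(softmax(scores))
    return weights


random.seed(0)
dim = 8
text = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
image = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]

w = joint_attention_weights(text, image)
n_text = len(text)

# Attention mass flowing in each direction across the modality boundary.
image_to_text = sum(w[n_text][:n_text])  # first image patch -> all text tokens
text_to_image = sum(w[0][n_text:])       # first text token -> all image patches
print(f"image patch 0 attends to text with total weight {image_to_text:.3f}")
print(f"text token 0 attends to image with total weight {text_to_image:.3f}")
```

Both totals are non-zero, which is the point: information flows from text to image and from image to text within the same attention operation.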

### Photorealism
FLUX.1 produces more photorealistic results even at 4-bit quantization (Q4_K_S). Both models are trained with a rectified-flow (flow matching) objective rather than DDPM-style noise prediction, so the smoother, more natural look is more plausibly explained by FLUX's scale and training than by the sampling objective.

### Spatial Reasoning
Both models handle spatial prompts well — understanding "on top of", "behind", and relative positioning. This is a major improvement over SDXL and earlier UNet-based architectures, where spatial compositionality was a persistent weakness.

### The 2024 Takeaway
The shift from **UNet to MMDiT** and **DDPM to Flow Matching** represents a genuine paradigm shift in generative AI. These are not incremental improvements — they fundamentally change how diffusion models understand and compose visual concepts.
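
To make the flow-matching idea concrete, here is a toy one-dimensional sketch of rectified-flow sampling. It assumes the straight-line path `x_t = (1 - t) * target + t * noise` and computes the ideal velocity from a known target; a real model predicts that velocity with a neural network, and `euler_sample` is a hypothetical name.

```python
def euler_sample(noise: float, target: float, num_steps: int) -> float:
    """Integrate a toy rectified-flow ODE from t=1 (noise) to t=0 (data).

    The path is x_t = (1 - t) * target + t * noise, so the ideal velocity
    at (x, t) is (x - target) / t. A real model would predict this velocity
    with a neural network; here it is computed from the known target.
    """
    x = noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        velocity = (x - target) / t
        x = x - dt * velocity  # one Euler step toward t = 0
    return x


for steps in (1, 4, 28):
    result = euler_sample(noise=1.7, target=-0.3, num_steps=steps)
    print(f"{steps:>2} steps -> {result:.6f}")
```

Because the path is a straight line, Euler integration lands on the target (up to floating-point rounding) regardless of the step count. Learned velocity fields are only approximately straight, which is one reason SD3 Medium still uses 28 steps and FLUX.1-schnell relies on distillation to get down to 4.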

## What's Next

Now that you've seen the comparison, try these:

- **Run the interactive Gradio demo**: `python scripts/launch_app.py`
- **Try your own prompts**: `python scripts/generate.py --model flux-schnell --prompt "your prompt"`
- **Full benchmark**: `python scripts/benchmark.py`