# 🎨 FOCUS Inference Demo
## Multi-Object Compositional Generation with FLUX and Stable Diffusion 3.5

This notebook demonstrates how to use FOCUS-finetuned models for improved multi-object compositional generation.

**FOCUS** enhances diffusion models' ability to generate multiple distinct objects in a single image with better composition and object fidelity.

We'll cover:
1. **Environment Setup** - Install dependencies and configure GPU
2. **FLUX-FOCUS Inference** - Generate images with the FLUX.1-dev FOCUS model
3. **SD3.5-FOCUS Inference** - Generate images with the Stable Diffusion 3.5 FOCUS model

---

## 📦 Setup & Installation

First, let's install the required dependencies. The cell below pins `transformers` and `diffusers` to versions that work with both FLUX and SD3.5.

In [None]:
# Install required packages
!pip install -q -U transformers==4.53.0 diffusers==0.33.1 accelerate
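
Because the install cell pins exact versions, it can be useful to confirm that the environment actually resolved to them. The following is a minimal sketch using only the standard library; `PINNED` and `find_mismatches` are names introduced here for illustration and are not part of the FOCUS code.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the install cell above.
PINNED = {"transformers": "4.53.0", "diffusers": "0.33.1"}


def find_mismatches(pinned, get_version=version):
    """Return {package: installed_version_or_None} for packages that differ from their pin."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = installed
    return mismatches


for name, installed in find_mismatches(PINNED).items():
    print(f"{name}: expected {PINNED[name]}, found {installed}")
```

If the loop prints nothing, both packages match their pins.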

In [None]:
# Check GPU availability and specs
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ Warning: No GPU detected. This notebook requires a GPU for inference.")
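
The memory figure printed above can feed a simple decision about how to place a pipeline. This is a sketch under stated assumptions: `choose_placement` is a hypothetical helper, and the 24 GB default cutoff simply mirrors the threshold mentioned in the FLUX loading cell of this notebook, so it should be treated as adjustable rather than as a measured requirement.

```python
def choose_placement(total_vram_gb, direct_load_min_gb=24.0):
    """Suggest how to place a pipeline, given total GPU memory in GB.

    total_vram_gb: total device memory in GB, or None when no GPU is present.
    direct_load_min_gb: assumed cutoff for loading straight onto the GPU.
    """
    if total_vram_gb is None:
        return "no_gpu"
    if total_vram_gb >= direct_load_min_gb:
        return "cuda"  # i.e. call pipe.to("cuda")
    return "cpu_offload"  # i.e. call pipe.enable_sequential_cpu_offload() instead


# Illustrative values; in the notebook, pass
# torch.cuda.get_device_properties(0).total_memory / 1e9 instead.
for vram in (None, 16.0, 48.0):
    print(vram, "->", choose_placement(vram))
```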

---

## 🌊 FLUX.1-dev with FOCUS

**FLUX.1-dev** is a 12B-parameter text-to-image model known for high image quality and strong prompt adherence.

The **FOCUS-finetuned version** (`ericbill21/flux_focus`) improves multi-object compositional generation, making it better at:
- Generating multiple distinct objects in one image
- Maintaining object fidelity and characteristics
- Following complex compositional prompts

### Load the FLUX-FOCUS Model

In [None]:
import torch
from diffusers import FluxPipeline

# Load the FLUX-FOCUS model from Hugging Face
flux_pipe = FluxPipeline.from_pretrained(
    "ericbill21/flux_focus",
    torch_dtype=torch.bfloat16
)

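# Option A (GPUs with 24GB+ VRAM): load the whole pipeline onto the GPU.
# Note: the 12B transformer alone is roughly 24 GB in bfloat16, so 24GB is a tight lower bound.
flux_pipe.to("cuda")

# Option B (smaller GPUs): comment out the .to("cuda") line above and use
# sequential CPU offloading instead. It is slower but uses less VRAM.
# flux_pipe.enable_sequential_cpu_offload()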

print("✅ FLUX-FOCUS model loaded successfully!")

### Generate Images with FLUX-FOCUS

The FLUX examples below pass `max_sequence_length=256`, longer than the 77 used for SD3.5 later in this notebook, which leaves room for more detailed and nuanced prompts.

**Recommended settings for FLUX:**
- `guidance_scale`: 3.5 (default)
- `num_inference_steps`: 28-50
- `max_sequence_length`: 256
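
To avoid repeating these values in every call, the recommended settings can be kept in one dictionary and merged with per-call overrides. A minimal sketch follows; `FLUX_DEFAULTS`, `build_call_kwargs`, and the example prompt are illustrative names and values, not part of `diffusers` or FOCUS.

```python
# Recommended FLUX settings from the list above, kept in one place.
FLUX_DEFAULTS = {
    "guidance_scale": 3.5,
    "num_inference_steps": 28,
    "max_sequence_length": 256,
}


def build_call_kwargs(prompt, defaults, **overrides):
    """Merge a prompt, shared defaults, and per-call overrides into one dict."""
    kwargs = dict(defaults)   # copy so the shared defaults are never mutated
    kwargs.update(overrides)  # per-call values win over defaults
    kwargs["prompt"] = prompt
    return kwargs


kwargs = build_call_kwargs("A cat and a dog on a sofa", FLUX_DEFAULTS, num_inference_steps=50)
print(kwargs)
# With the pipeline loaded, the call would be: flux_pipe(**kwargs).images[0]
```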

In [None]:
# Generate an image with multiple objects
flux_image = flux_pipe(
    prompt="A majestic lion and a graceful tiger resting side by side in a lush jungle clearing, sunlight filtering through the canopy",
    num_inference_steps=28,
    guidance_scale=3.5,
    max_sequence_length=256,  # FLUX supports longer prompts
    height=1024,  # FLUX works well at higher resolutions
    width=1024,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]

# Display the image
flux_image

In [None]:
# Try another example: Three different objects
flux_image2 = flux_pipe(
    prompt="A red parrot, a blue butterfly, and a yellow snake in a vibrant tropical rainforest",
    num_inference_steps=28,
    guidance_scale=3.5,
    max_sequence_length=256,
    height=1024,
    width=1024,
    generator=torch.Generator("cpu").manual_seed(123),
).images[0]

flux_image2
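
The next section loads a second pipeline onto the same GPU. On a card that cannot hold both models at once, it may help to release FLUX first. The sketch below assumes that is the situation; `free_gpu_memory` is a hypothetical helper, while `gc.collect()` and `torch.cuda.empty_cache()` are standard Python and PyTorch calls.

```python
import gc


def free_gpu_memory():
    """Run garbage collection, then release PyTorch's cached CUDA memory.

    Returns True if the CUDA cache was cleared, False if no GPU (or no torch) was found.
    """
    gc.collect()  # drop unreachable Python objects that may still hold tensors
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()  # return cached, unused blocks to the driver
    return True


# In the notebook, drop the last reference to the pipeline first:
# del flux_pipe
cleared = free_gpu_memory()
print(f"CUDA cache cleared: {cleared}")
```

Memory is only reclaimed once no variable still refers to the pipeline, which is why the `del` step comes before the helper call.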

---

## 🎯 Stable Diffusion 3.5 with FOCUS

**Stable Diffusion 3.5** is a powerful text-to-image model with excellent prompt adherence and efficiency.

The **FOCUS-finetuned version** (`ericbill21/sd35_focus`) enhances its ability to generate multiple distinct objects in compositional scenes with:
- Better object separation and distinctiveness
- Improved spatial relationship understanding
- More accurate multi-object rendering

### Load the SD3.5-FOCUS Model

In [None]:
import torch
from diffusers import StableDiffusion3Pipeline

# Load the SD3.5-FOCUS model from Hugging Face
sd35_pipe = StableDiffusion3Pipeline.from_pretrained(
    "ericbill21/sd35_focus",
    torch_dtype=torch.bfloat16
)

# Load to GPU
sd35_pipe.to("cuda")

# For smaller GPUs, comment out the .to("cuda") line above and use CPU offloading instead:
# sd35_pipe.enable_model_cpu_offload()

print("✅ SD3.5-FOCUS model loaded successfully!")

### Generate Images with SD3.5-FOCUS

The SD3.5 examples use a shorter `max_sequence_length` (77) than the FLUX examples (256); with FOCUS finetuning, the model still excels at prompt adherence and compositional generation.

**Recommended settings for SD3.5:**
- `guidance_scale`: 4.5-7.0
- `num_inference_steps`: 28-50
- `max_sequence_length`: 77
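
Since the recommended `guidance_scale` is a range rather than a single value, one way to explore it is to generate the same prompt and seed at a few evenly spaced scales. This is a sketch only; `guidance_sweep` is a name introduced here, and the choice of three values is arbitrary.

```python
def guidance_sweep(low=4.5, high=7.0, count=3):
    """Return `count` evenly spaced guidance scales from low to high, inclusive."""
    if count < 2:
        return [low]
    step = (high - low) / (count - 1)
    return [round(low + i * step, 2) for i in range(count)]


scales = guidance_sweep()
print(scales)
# With the pipeline loaded, each value would be passed as:
# sd35_pipe(prompt=..., guidance_scale=scale, num_inference_steps=28, max_sequence_length=77)
```

Keeping the seed fixed across the sweep makes differences between the images attributable to the guidance scale.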

In [None]:
# Generate an image with multiple objects
sd35_image = sd35_pipe(
    prompt="A horse and a bear standing together in a misty forest",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=77,
    height=1024,
    width=1024,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]

# Display the image
sd35_image

In [None]:
# Try another example: Multiple vehicles
sd35_image2 = sd35_pipe(
    prompt="A red sports car and a blue motorcycle parked side by side on a city street",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=77,
    height=1024,
    width=1024,
    generator=torch.Generator("cpu").manual_seed(99),
).images[0]

sd35_image2
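
The images above are only displayed inline. To keep them, each one can be saved under a name that records the model, prompt, and seed. A minimal sketch, assuming the pipelines return PIL images (the `diffusers` default); `output_filename` and its naming scheme are illustrative choices, not part of FOCUS.

```python
import re


def output_filename(model_tag, prompt, seed, max_slug_len=40):
    """Build a filename such as 'sd35_a-horse-and-a-bear_seed42.png'."""
    # Lowercase the prompt and collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower())
    # Trim to a manageable length without leaving a dangling hyphen.
    slug = slug.strip("-")[:max_slug_len].strip("-")
    return f"{model_tag}_{slug}_seed{seed}.png"


name = output_filename("sd35", "A horse and a bear standing together in a misty forest", 42)
print(name)
# With the images from the cells above: sd35_image.save(name)
```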