# Fine-tuning Stable Diffusion XL with DreamBooth and LoRA

This notebook demonstrates the complete pipeline for fine-tuning Stable Diffusion XL using DreamBooth and LoRA techniques. This project was developed as part of a school assignment to explore and understand modern diffusion models.

## Project Overview

This notebook will guide you through:

1. Environment setup and dependency installation
2. Dataset preparation requirements
3. Model fine-tuning using DreamBooth and LoRA
4. Inference with the fine-tuned model
5. Creating an interactive interface
6. (Experimental) Cloud deployment with Modal.com


## Initial Setup and GPU Check

First, let's verify our GPU setup:


In [None]:
import torch

print(f"CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Device: {torch.cuda.get_device_name()}")
    print(
        f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB"
    )

## Installing Required Packages

Clone and install the diffusers repository:


In [None]:
!git clone https://github.com/huggingface/diffusers
%cd diffusers
!pip install -e .

### Install DreamBooth requirements:


In [None]:
%cd examples/dreambooth
!pip install -r requirements_sdxl.txt

### Configure accelerate:


In [None]:
!accelerate config default


## Hugging Face Authentication

Set up authentication with Hugging Face:


In [None]:
from huggingface_hub import login

# IMPORTANT: Replace YOUR_TOKEN with your actual token when running
# login("YOUR_TOKEN")

# Never commit or share your token!
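
One way to keep the token out of the notebook entirely is to read it from an environment variable. The sketch below assumes the token has been exported as `HF_TOKEN` before the notebook was started; the helper name `read_hf_token` is hypothetical and not part of `huggingface_hub`.

```python
import os


def read_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the token stored in the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable before logging in.")
    return token
```

The result can then be passed to the login call, for example `login(token=read_hf_token())`, so the token never appears in a saved cell.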

## Dataset Requirements

Before proceeding with training, ensure your dataset meets these requirements:

- 10-25 high-quality images
- Good variety in angles, lighting, and poses
- Clear, focused images without blur
- Consistent subject matter

Here's a helper script to prepare your dataset metadata:


In [None]:
import pandas as pd
import os

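# Create metadata listing every image and its caption
metadata = {
    "file_name": [
        "image_1.jpg",
        "image_2.jpg",
        "image_3.jpg",
        # Add all your image files
    ],
    # Caption text for each image, in the same order as "file_name"
    "caption": [
        "YOUR_CAPTION_FOR_IMAGE_1",
        "YOUR_CAPTION_FOR_IMAGE_2",
        "YOUR_CAPTION_FOR_IMAGE_3",
        # Add corresponding captions
    ],
}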

# Create directory structure
os.makedirs("dataset", exist_ok=True)
os.makedirs("dataset/images", exist_ok=True)

# Convert to DataFrame
df = pd.DataFrame(metadata)

# Save metadata
df.to_csv("dataset/metadata.csv", index=False)  # Human readable
df.to_parquet("dataset/metadata.parquet")  # For HuggingFace
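
Before training, it can help to confirm that the image folder matches the requirements listed above. The following is a minimal sketch using only the standard library; the function name `check_image_folder`, the extension set, and the default bounds (taken from the 10-25 image guideline) are assumptions for illustration. It checks only the image count and the file names, not image quality.

```python
from pathlib import Path

# File extensions treated as images (an assumption; extend as needed)
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def check_image_folder(image_dir, listed_names=None, min_images=10, max_images=25):
    """Return (image_names, warnings) for the image files found in image_dir."""
    folder = Path(image_dir)
    images = sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )

    warnings = []
    if len(images) < min_images:
        warnings.append(f"only {len(images)} images found; at least {min_images} recommended")
    elif len(images) > max_images:
        warnings.append(f"{len(images)} images found; at most {max_images} recommended")

    # Optionally compare against the names listed in the metadata
    if listed_names is not None:
        missing = sorted(set(listed_names) - set(images))
        if missing:
            warnings.append("listed in metadata but not found: " + ", ".join(missing))

    return images, warnings
```

Calling `check_image_folder("dataset/images", listed_names=metadata["file_name"])` after copying the images returns the file names it found and a list of warnings, which is empty when the folder looks consistent.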

## Model Fine-tuning

Now we'll fine-tune the model using DreamBooth and LoRA:


In [None]:
!accelerate launch train_dreambooth_lora_sdxl.py \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
--dataset_name="YOUR_DATASET_PATH" \
--pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
--output_dir="lora-trained-xl" \
--instance_prompt="YOUR_TRIGGER_WORD person" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=400 \
--validation_prompt="A photo of YOUR_TRIGGER_WORD person in a studio" \
--validation_epochs=5 \
--seed="0" \
--hub_model_id="YOUR_HF_REPO_NAME" \
--push_to_hub \
--mixed_precision="fp16" \
--gradient_checkpointing
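
To get a feel for what these settings mean in practice, the arithmetic can be sketched as below. This assumes that `--max_train_steps` counts optimizer updates (one update per `gradient_accumulation_steps` batches), which is how the diffusers example scripts appear to behave; the function name `training_schedule` and the 20-image dataset size are hypothetical.

```python
import math


def training_schedule(
    num_images,
    train_batch_size=1,
    gradient_accumulation_steps=4,
    max_train_steps=400,
):
    """Estimate the effective batch size and epoch count for a training run."""
    # Samples contributing to each optimizer update
    effective_batch_size = train_batch_size * gradient_accumulation_steps
    # Batches needed to see every image once
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    # Optimizer updates performed per pass over the dataset
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    # Passes over the dataset needed to reach max_train_steps
    epochs = math.ceil(max_train_steps / updates_per_epoch)
    return {
        "effective_batch_size": effective_batch_size,
        "updates_per_epoch": updates_per_epoch,
        "epochs": epochs,
        "samples_processed": max_train_steps * effective_batch_size,
    }


print(training_schedule(num_images=20))
```

Under these assumptions, a 20-image dataset gives an effective batch size of 4, 5 updates per epoch, and roughly 80 epochs over the 400 steps, so `--validation_epochs=5` would trigger validation about 16 times.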

## Basic Inference

After training, generate images with your fine-tuned model:


In [None]:
import gc
import torch
from diffusers import DiffusionPipeline
from IPython.display import display

# Clear CUDA cache
torch.cuda.empty_cache()
gc.collect()

# Load base model and LoRA weights
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

pipeline.load_lora_weights(
    "YOUR_MODEL_PATH_ON_HF"  # Replace with your model path
)

# Generate image
prompt = "A photo of YOUR_TRIGGER_WORD person in a professional setting"
negative_prompt = "blurry, low quality, distorted"

image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=40,
    guidance_scale=2.5,
).images[0]

# Display and save
display(image)
image.save("generated_image.png")

# Cleanup
del pipeline
torch.cuda.empty_cache()
gc.collect()


## Memory-Efficient Inference

For systems with limited GPU memory:


In [None]:
import gc
import torch
from diffusers import AutoPipelineForText2Image
from IPython.display import display

# Clear cache
torch.cuda.empty_cache()
gc.collect()

# Load the pipeline on the CPU. Sequential offload moves submodules to the GPU
# on demand, so the pipeline is not moved to "cuda" up front.
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

# Load LoRA weights
pipeline.load_lora_weights("YOUR_MODEL_PATH_ON_HF")

# Enable memory optimizations
pipeline.enable_sequential_cpu_offload()
pipeline.enable_attention_slicing(1)

# Generate at a reduced resolution to save memory
with torch.cuda.amp.autocast():
    image = pipeline(
        prompt="YOUR_PROMPT_HERE",
        num_inference_steps=35,
        height=512,
        width=512,
        guidance_scale=2.5,
    ).images[0]

image.save("optimized_generation.png")
display(image)

# Cleanup
del pipeline
torch.cuda.empty_cache()
gc.collect()


## Interactive Gradio Interface (Optional)

Create a user-friendly interface for generation:


In [None]:
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image
import gc


class ModelPipeline:
    def __init__(self):
        self.pipeline = None

    def load_model(self):
        if self.pipeline is None:
            # Load on the CPU; sequential offload below manages GPU placement
            self.pipeline = AutoPipelineForText2Image.from_pretrained(
                "stabilityai/stable-diffusion-xl-base-1.0",
                torch_dtype=torch.float16,
            )
            self.pipeline.load_lora_weights("YOUR_MODEL_PATH_ON_HF")
            self.pipeline.enable_sequential_cpu_offload()
            self.pipeline.enable_attention_slicing(1)

    def generate_image(
        self,
        prompt,
        negative_prompt="",
        num_steps=35,
        guidance_scale=2.5,
        width=512,
        height=512,
    ):
        self.load_model()
        with torch.cuda.amp.autocast():
            image = self.pipeline(
                prompt=prompt,
                negative_prompt=negative_prompt,
                num_inference_steps=num_steps,
                guidance_scale=guidance_scale,
                width=width,
                height=height,
            ).images[0]
        return image


# Initialize model
model = ModelPipeline()


# Create interface
def generate(prompt, negative_prompt, steps, guidance, width, height):
    return model.generate_image(
        prompt, negative_prompt, int(steps), float(guidance), int(width), int(height)
    )


interface = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Textbox(label="Negative Prompt", value=""),
        gr.Slider(minimum=20, maximum=100, value=35, label="Steps"),
        gr.Slider(minimum=1.0, maximum=20.0, value=2.5, label="Guidance Scale"),
        gr.Slider(minimum=256, maximum=1024, value=512, step=64, label="Width"),
        gr.Slider(minimum=256, maximum=1024, value=512, step=64, label="Height"),
    ],
    outputs=gr.Image(type="pil"),
    title="Fine-tuned SDXL Image Generator",
    description="Generate images using your fine-tuned SDXL model",
)


interface.launch(share=True)  # Set share=False if you don't want a public URL

## (WIP) Image Refinement

This section demonstrates an experimental feature for refining already generated images. The feature is still under development, and results may vary.


In [None]:
import gc
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image

# Clear CUDA cache before starting
torch.cuda.empty_cache()
gc.collect()

# Initialize the refinement pipeline on the CPU; sequential offload below
# moves submodules to the GPU on demand
pipeline = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
)

# Enable memory optimizations
pipeline.enable_sequential_cpu_offload()
pipeline.enable_attention_slicing(1)

# Load an image to refine
original_image = Image.open("your_generated_image.png")
if original_image.mode != "RGB":
    original_image = original_image.convert("RGB")

# Refinement prompt - focus on aspects to enhance
refinement_prompt = (
    "Ultra high quality photograph, perfect lighting, "
    "no artifacts, clear details, professional photography"
)

# Generate refined image
with torch.cuda.amp.autocast():
    refined_image = pipeline(
        prompt=refinement_prompt,
        image=original_image,
        strength=0.3,  # Adjust this value to control refinement intensity
        num_inference_steps=30,
        guidance_scale=7.5,
    ).images[0]

# Save the refined image
refined_image.save("refined_image.png")

# Display before/after comparison
display(original_image)
print("Original Image ↑")
display(refined_image)
print("Refined Image ↑")

# Clean up
del pipeline
torch.cuda.empty_cache()
gc.collect()

## Notes on Image Refinement

This feature is a work in progress. At its current stage, the refined images did not look better than the output of the original generation pipeline.

The `strength` parameter controls refinement intensity:

- Lower values (0.1-0.3): Subtle improvements
- Higher values (0.4-0.7): More dramatic changes

Choose refinement prompts that focus on the specific aspects you want to improve. Results may vary depending on the input image and prompt. Keep the original image until you're satisfied with the refinement.
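
The effect of `strength` can also be read as arithmetic on the step count. The sketch below assumes the image-to-image pipeline runs roughly `int(num_inference_steps * strength)` denoising steps, which matches the behavior of the diffusers image-to-image pipelines as far as I can tell; the helper name `effective_denoising_steps` is hypothetical.

```python
def effective_denoising_steps(num_inference_steps, strength):
    """Estimate how many denoising steps an image-to-image call actually runs."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Only the last `strength` fraction of the schedule is applied to the input image
    return min(int(num_inference_steps * strength), num_inference_steps)


# Compare the values discussed above for a 30-step schedule
for strength in (0.1, 0.3, 0.5, 0.7):
    print(strength, effective_denoising_steps(30, strength))
```

Under this assumption, the refinement cell above (`strength=0.3`, `num_inference_steps=30`) would run about 9 denoising steps, which is one reason low strength values change the image only slightly.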
