## Fine-tuning Stable Diffusion XL with DreamBooth and LoRA

In this notebook, we fine-tune [Stable Diffusion XL (SDXL)](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_xl) with [DreamBooth](https://huggingface.co/docs/diffusers/main/en/training/dreambooth) and [LoRA](https://huggingface.co/docs/diffusers/main/en/training/lora).

We prepared our own dataset; the details are described in the documentation.
We chose to fine-tune on images of Michael Scott from The Office, but you can use the same code to fine-tune the model on another subject.
Make sure to **connect to a T4 GPU** before running this notebook.


## Setup

In [None]:
!nvidia-smi

**Installing dependencies**

In [None]:
!pip install bitsandbytes transformers accelerate peft -q

In [None]:
!pip install git+https://github.com/huggingface/diffusers.git -q

Next, download the diffusers SDXL DreamBooth LoRA training script:

In [None]:
!wget https://raw.githubusercontent.com/huggingface/diffusers/main/examples/dreambooth/train_dreambooth_lora_sdxl.py

## Dataset (training data)


Upload the example images:

In [None]:
import os
from google.colab import files

local_dir = "./micheal_scott/"
os.makedirs(local_dir, exist_ok=True)  # do not fail if the cell is re-run
os.chdir(local_dir)

uploaded_images = files.upload()
os.chdir("/content")

Preview the images:

In [None]:
from PIL import Image

def image_grid(imgs, rows, cols, resize=256):
    """Paste `imgs` into a rows x cols grid, optionally resizing each to a square."""
    if resize is not None:
        imgs = [img.resize((resize, resize)) for img in imgs]
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        # Fill the grid row by row, left to right.
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

In [None]:
import glob

# change path to display images from your local dir
img_paths = "./micheal_scott/*.jpg"
imgs = [Image.open(path) for path in glob.glob(img_paths)]

num_imgs_to_preview = 5
image_grid(imgs[:num_imgs_to_preview], 1, num_imgs_to_preview)
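
The preview cell above only matches `*.jpg`, while the resizing cell later in this notebook also accepts `.png`, `.jpeg`, and `.webp`. As a minimal sketch, assuming you want the preview to cover the same extensions, a small helper (the name `list_images` is hypothetical, not part of any library) could collect the paths:

```python
from pathlib import Path

# Extensions accepted by the resizing cell later in this notebook.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def list_images(folder):
    """Return sorted paths of image files directly inside `folder` (hypothetical helper)."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

You could then build the preview list with `imgs = [Image.open(p) for p in list_images("./micheal_scott")]`.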

Free some memory

In [None]:
import gc
import torch  # needed for torch.cuda.empty_cache() below

gc.collect()
torch.cuda.empty_cache()

## Training setup

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

!accelerate config default

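Log in with a Hugging Face access token that has write permission so that the trained checkpoints can be pushed to the Hugging Face Hub:

In [None]:
from huggingface_hub import notebook_login

# Prompts for the token interactively; avoid hard-coding it in the notebook.
notebook_login()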

## Training

### Launch training

In [None]:
!pip install datasets -q

 - Use `--output_dir` to specify your LoRA model repository name.
 - Use `--caption_column` to specify the name of the caption column in your dataset. In this example we used "prompt" to save our captions in the metadata file; change this according to your needs.
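
The caption metadata file can be sketched as follows. This assumes the local folder is loaded as a Hugging Face `datasets` image folder, which looks for a `metadata.jsonl` file whose rows contain a `file_name` key plus any extra columns (here `prompt`). The helper name `write_metadata` and the example caption are illustrative only, not taken from the original dataset.

```python
import json
from pathlib import Path


def write_metadata(folder, captions, caption_column="prompt"):
    """Write metadata.jsonl mapping each image file name to its caption.

    `captions` is a dict {file_name: caption}. Hypothetical helper.
    """
    folder = Path(folder)
    out_path = folder / "metadata.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for file_name, caption in sorted(captions.items()):
            if not (folder / file_name).exists():
                raise FileNotFoundError(f"{file_name} is not in {folder}")
            # "file_name" is the key used to match each row to its image.
            row = {"file_name": file_name, caption_column: caption}
            f.write(json.dumps(row) + "\n")
    return out_path
```

For example, `write_metadata("/content/micheal_scott_new", {"img_01.jpg": "a photo of Michael Scott at his desk"})` would write one row for a (hypothetical) `img_01.jpg`. The file has to sit in the folder passed to `--dataset_name`; note that the resizing cell in this notebook copies only image files, so the metadata file needs to be written or copied there separately.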

**Here we resize the images to the same dimensions by adding padding (this step can be skipped). Our target is 1024 pixels; you can change it as needed.**

In [None]:
from PIL import Image, ImageOps
import os

input_folder = '/content/micheal_scott'
output_folder = '/content/micheal_scott_new'
target_size = 1024

os.makedirs(output_folder, exist_ok=True)

for filename in os.listdir(input_folder):
    if not filename.lower().endswith(('.png', '.jpg', '.jpeg', '.webp')):
        continue

    img = Image.open(os.path.join(input_folder, filename)).convert("RGB")
    img.thumbnail((target_size, target_size), Image.LANCZOS)

    delta_w = target_size - img.width
    delta_h = target_size - img.height
    padding = (delta_w // 2, delta_h // 2, delta_w - (delta_w // 2), delta_h - (delta_h // 2))
    img = ImageOps.expand(img, padding, fill=(0, 0, 0))

    img.save(os.path.join(output_folder, filename), quality=95)
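
The padding arithmetic in the cell above can be checked in isolation. The sketch below repeats the same integer arithmetic in a standalone function (`compute_padding` is a hypothetical name) for an image that is already at most `target_size` on each side. Keep in mind that `Image.thumbnail` only shrinks images, so a source image smaller than the target is not upscaled and simply receives wider black borders.

```python
def compute_padding(width, height, target_size=1024):
    """Return (left, top, right, bottom) borders that centre an image on a square canvas."""
    delta_w = target_size - width
    delta_h = target_size - height
    if delta_w < 0 or delta_h < 0:
        raise ValueError("image is larger than the target; shrink it first")
    # Any odd leftover pixel goes to the right / bottom border.
    return (delta_w // 2, delta_h // 2, delta_w - delta_w // 2, delta_h - delta_h // 2)


# A 1024x683 landscape image gets 170 px on top and 171 px at the bottom.
print(compute_padding(1024, 683))  # (0, 170, 0, 171)
```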


In [None]:
%cd /content/

In [None]:
!accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --dataset_name="micheal_scott_new" \
  --output_dir="micheal_scott_LoRA_29_1024" \
  --caption_column="prompt" \
  --mixed_precision="fp16" \
  --instance_prompt="a portrait photo of Michael Scott, office setting, realistic lighting" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --learning_rate=1e-4 \
  --snr_gamma=5.0 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --use_8bit_adam \
  --max_train_steps=2000 \
  --checkpointing_steps=500 \
  --seed="0"
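
To get a feel for what these flags imply, the sketch below works out the effective batch size, the number of images processed, and the steps at which checkpoints are written. It assumes that `--max_train_steps` counts optimizer updates (one update per `gradient_accumulation_steps` batches) and that a checkpoint is saved every `--checkpointing_steps` updates. The dataset size of 29 images is an assumption inferred from the `29` in the output directory name; substitute your own count.

```python
def training_schedule(max_train_steps, train_batch_size,
                      gradient_accumulation_steps, checkpointing_steps, num_images):
    """Summarise a training run from its command-line flags (illustrative helper)."""
    effective_batch = train_batch_size * gradient_accumulation_steps
    images_seen = max_train_steps * effective_batch
    return {
        "effective_batch_size": effective_batch,
        "images_seen": images_seen,
        "approx_epochs": images_seen / num_images,
        "checkpoint_steps": list(
            range(checkpointing_steps, max_train_steps + 1, checkpointing_steps)
        ),
    }


# Values from the launch command above; num_images=29 is an assumption.
schedule = training_schedule(2000, 1, 2, 500, num_images=29)
print(schedule["effective_batch_size"])  # 2
print(schedule["images_seen"])           # 4000
print(schedule["checkpoint_steps"])      # [500, 1000, 1500, 2000]
```

Under these assumptions each image is seen roughly 138 times, so if the outputs look overfit, lowering `--max_train_steps` is a reasonable first adjustment.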

### Save your model to the hub and check it

In [None]:
from huggingface_hub import whoami

output_dir = "micheal_scott_LoRA_29_1024"
# whoami() uses the token stored by the login cell earlier in the notebook.
username = whoami()["name"]
repo_id = f"{username}/{output_dir}_29i_1024"

In [None]:
# @markdown

from train_dreambooth_lora_sdxl import save_model_card
from huggingface_hub import upload_folder, create_repo

repo_id = create_repo(repo_id, exist_ok=True).repo_id

save_model_card(
    repo_id = repo_id,
    images=[],
    base_model="stabilityai/stable-diffusion-xl-base-1.0",
    train_text_encoder=False,
    instance_prompt="a photo of Michael Scott from The Office",
    validation_prompt=None,
    repo_folder=output_dir,
    vae_path="madebyollin/sdxl-vae-fp16-fix",
    use_dora=False
)

upload_folder(
    repo_id=repo_id,
    folder_path=output_dir,
    commit_message="End of training",
    ignore_patterns=["step_*", "epoch_*"],
)

In [None]:
from IPython.display import display, Markdown

link_to_model = f"https://huggingface.co/{repo_id}"
display(Markdown("### Your model has finished training.\nAccess it here: {}".format(link_to_model)))

## Inference

In [None]:
import torch
from diffusers import DiffusionPipeline, AutoencoderKL

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
)
pipe.load_lora_weights(repo_id)
_ = pipe.to("cuda")

In [None]:
prompt = "A side profile shot of Michael Scott from The Office, wearing a dark suit, a striped dress shirt, and a dark patterned tie. He is standing in the office with his hands resting on a counter, looking intently to his right. An EXIT sign is visible in the background." # @param

image = pipe(prompt=prompt, num_inference_steps=25).images[0]
image