# Dreambooth fine-tuning for Stable Diffusion

This notebook shows how to "teach" Stable Diffusion a new concept with Dreambooth, using the 🤗 Hugging Face [🧨 Diffusers library](https://github.com/huggingface/diffusers).

We use the [Dreambooth training script](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) provided in the diffusers repository and follow the [official guide](https://huggingface.co/docs/diffusers/training/dreambooth).

## Configuration
As explained in the guide, Dreambooth is prone to overfitting, and finding the right hyperparameters can be challenging. We tried several configurations to find the one that works best for our use case, following the advice in Hugging Face's [analysis of Dreambooth training](https://huggingface.co/blog/dreambooth).

Overall, Low-Rank Adaptation (LoRA) gave the best results among the configurations we tried. We also used memory-saving tools such as [xFormers](https://github.com/facebookresearch/xformers), which can be installed with

```console
pip install xformers
```
and is enabled by adding the `--enable_xformers_memory_efficient_attention` argument to the training script.

We also use the 8-bit Adam optimizer from [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) (enabled with `--use_8bit_adam`) to reduce the memory taken by optimizer state.
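To make the LoRA idea concrete: instead of updating a pretrained weight matrix `W`, LoRA freezes it and learns a low-rank update `B @ A` of rank `r`, scaled by `alpha / r`, with `B` initialised to zero so training starts from the unmodified model. The NumPy sketch below shows this for a single linear layer; the layer sizes, rank and `alpha` are illustrative values we picked, not the ones used by `train_dreambooth_lora.py`, and diffusers' own implementation differs in its details.

```python
import numpy as np

# Minimal LoRA sketch for one linear layer (illustrative sizes, not the script's values).
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 768, 320, 4, 4

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection (r x d_in)
B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialised


def lora_forward(x: np.ndarray) -> np.ndarray:
    """Compute W x + (alpha / r) * B A x for a batch x of shape (n, d_in)."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


x = rng.standard_normal((2, d_in))
# Because B starts at zero, the adapted layer initially matches the frozen layer.
assert np.allclose(lora_forward(x), x @ W.T)

full_params = W.size           # parameters a full fine-tune would update
lora_params = A.size + B.size  # parameters LoRA trains instead
print(full_params, lora_params)  # 245760 4352
```

Only `A` and `B` receive gradients, which is why LoRA checkpoints are small and why the method tends to be less prone to drifting far from the base model.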

## Setup

Before running the script, we need to install the dependencies. Installing them with Poetry caused some issues, so we recommend using a Python virtual environment manager (such as `venv` or `conda`) instead.

In [1]:
# Clone the huggingface diffusers repo and install dependencies
!git clone https://github.com/huggingface/diffusers ../diffusers
!pip install -e ../diffusers
!pip install -U -r ../diffusers/examples/dreambooth/requirements.txt
!pip install bitsandbytes xformers
# Write a default 🤗 Accelerate configuration file non-interactively
!accelerate config default

Cloning into '../diffusers'...
remote: Enumerating objects: 31055, done.
remote: Counting objects: 100% (932/932), done.
remote: Compressing objects: 100% (506/506), done.
remote: Total 31055 (delta 618), reused 598 (delta 367), pack-reused 30123
Receiving objects: 100% (31055/31055), 22.50 MiB | 4.15 MiB/s, done.
Resolving deltas: 100% (22669/22669), done.
Obtaining file:///Users/jonas/workspace/adomvi/diffusers
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: diffusers
  Building editable for diffusers (pyproject.toml) ... done
  Created wheel for diffusers: filename=diffusers-0.19.0.dev0-0.editable-py3-none-any.whl size=10598 sha256=723bbf222efad7d53e3bee93d537a91e0c4564fd791c469bf63dd5e778de41ed
  Stored in directory: /pr
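After the installation cell, it can help to confirm that the key packages actually resolved, since `xformers` and `bitsandbytes` in particular may fail to install on some platforms. The sketch below uses only the standard library; the list of distribution names is our assumption of what the cell above installs (`transformers` comes in, we believe, through the example's `requirements.txt`), so adjust it to your setup.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names we expect after the installation cell (our assumption; adjust as needed).
REQUIRED = ["diffusers", "accelerate", "transformers", "bitsandbytes", "xformers"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```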

## Download instance dataset

To begin, we train Dreambooth on a sample dataset containing a few images of a dog. You can download the dataset from [Google Drive](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ) or with the following snippet:

In [3]:
from huggingface_hub import snapshot_download

local_dir = "../dog"
snapshot_download(
    "diffusers/dog-example",
    local_dir=local_dir,
    repo_type="dataset",
    ignore_patterns=".gitattributes",
)

Fetching 5 files: 100%|███████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 15.25it/s]


'/Users/jonas/workspace/adomvi/dog'
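Before training, it is worth checking what actually landed in the instance directory. As we read the diffusers Dreambooth dataset class, it opens every entry in `--instance_data_dir` as an image, so a stray non-image file could break training (presumably why `.gitattributes` is excluded above). This sketch lists image files and anything else present; the helper name and the set of accepted suffixes are our own assumptions.

```python
from pathlib import Path

# Suffixes we treat as images (our assumption, not taken from the training script).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def list_instance_images(folder):
    """Return the image files directly inside `folder`, sorted by name (non-recursive)."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


local_dir = "../dog"  # same directory as in the download cell
if Path(local_dir).is_dir():
    images = list_instance_images(local_dir)
    # Anything that is not a recognised image file (stray files, hidden folders, ...).
    others = [p.name for p in Path(local_dir).iterdir() if p not in images]
    print(f"{len(images)} image(s); other entries: {others}")
```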

## Fine-tune the model

We found the following settings to work best: we train with LoRA for 1500 steps, save a checkpoint every 500 steps, and fine-tune the text encoder along with the UNet.
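The training cell's settings (batch size 1, no gradient accumulation, 200 class images, 1500 steps, checkpoints every 500 steps) determine how many epochs the run covers and where checkpoints are written. The sketch below does that arithmetic by hand. It assumes, based on our reading of the diffusers Dreambooth dataset class, that with prior preservation one epoch spans `max(num_instance_images, num_class_images)` examples; the function name is hypothetical, and the multi-GPU handling is a simplification.

```python
import math


def training_schedule(num_instance_images, num_class_images, train_batch_size,
                      gradient_accumulation_steps, max_train_steps,
                      checkpointing_steps, num_processes=1):
    """Hypothetical helper: estimate steps per epoch, epochs, and checkpoint steps."""
    # Assumption: with prior preservation, an epoch covers the larger of the two image sets.
    dataset_len = max(num_instance_images, num_class_images)
    batches_per_epoch = math.ceil(dataset_len / (train_batch_size * num_processes))
    steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    num_epochs = math.ceil(max_train_steps / steps_per_epoch)
    checkpoints = list(range(checkpointing_steps, max_train_steps + 1, checkpointing_steps))
    return steps_per_epoch, num_epochs, checkpoints


# 5 instance images were downloaded above; the other values come from the training cell.
print(training_schedule(5, 200, 1, 1, 1500, 500))  # (200, 8, [500, 1000, 1500])
```

With only 5 instance images, the class images dominate the epoch length, so the step count rather than the epoch count is the meaningful knob here.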

In [5]:
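# `!export` runs in a throwaway subshell, so the variables would not reach the next
# command; `%env` sets them in the kernel's environment instead.
%env MODEL_NAME=runwayml/stable-diffusion-v1-5
%env INSTANCE_DIR=../dog
%env CLASS_DIR=../dogs
%env OUTPUT_DIR=../adomvi-dream-dog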

!accelerate launch ../diffusers/examples/dreambooth/train_dreambooth_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation \
  --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of a dog" \
  --num_class_images=200 \
  --resolution=512 \
  --train_batch_size=1 \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --checkpointing_steps=500 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=1500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=50 \
  --seed="0"
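The `--with_prior_preservation` flag is the main guard against the overfitting mentioned earlier: besides the loss on the instance images ("a photo of sks dog"), the model is also trained on class images ("a photo of a dog", generated into `CLASS_DIR`), and that second loss is weighted by `--prior_loss_weight`. As we read the diffusers script, instance and class examples share one batch and the prediction is split in half. The NumPy sketch below mirrors that computation on toy arrays; it is an illustration, not the script's code.

```python
import numpy as np


def dreambooth_loss(model_pred, target, prior_loss_weight=1.0):
    """Prior-preservation loss sketch: the first half of the batch holds instance
    examples and the second half class examples (both halves the same size)."""
    pred_inst, pred_prior = np.split(model_pred, 2, axis=0)
    tgt_inst, tgt_prior = np.split(target, 2, axis=0)
    instance_loss = np.mean((pred_inst - tgt_inst) ** 2)  # fit the new concept
    prior_loss = np.mean((pred_prior - tgt_prior) ** 2)   # keep the generic class intact
    return instance_loss + prior_loss_weight * prior_loss


# Toy batch: 1 instance + 1 class example, each a 4x8x8 "latent" noise prediction.
rng = np.random.default_rng(0)
pred = rng.standard_normal((2, 4, 8, 8))
print(dreambooth_loss(pred, pred))  # 0.0
```

With `--prior_loss_weight=1.0`, both terms count equally; lowering it lets the model fit the instance images more aggressively at the cost of class drift.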

## Running inference

Once the model has been trained, we can load the LoRA weights on top of the base model and generate new images from a prompt:

In [None]:
from diffusers import StableDiffusionPipeline
import torch

# The training script wrote the LoRA weights to OUTPUT_DIR. If they were pushed to the
# Hub instead, the base model id can be read from the repo card with
# huggingface_hub.repocard.RepoCard.load(repo_id).data.to_dict()["base_model"].
lora_model_id = "../adomvi-dream-dog"
base_model_id = "runwayml/stable-diffusion-v1-5"

pipe = StableDiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.load_lora_weights(lora_model_id)

prompt = "A picture of a sks dog in a bucket."
# Note: diffusers does not interpret the "(text:1.4)" weighting syntax; it is passed as plain text.
negative_prompt = ("(low quality, worst quality:1.4), "
                   "bad composition, inaccurate eyes")

images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=768,
    num_inference_steps=100,
    num_images_per_prompt=4,
    generator=torch.manual_seed(0),
).images
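Since the cell above returns four PIL images, it is convenient to inspect them side by side. The sketch below pastes equally sized images into a grid with Pillow; `image_grid` is our own hypothetical helper, and the demo uses solid-colour placeholders of the same 512x768 size instead of real generations.

```python
from PIL import Image


def image_grid(images, rows, cols):
    """Paste equally sized PIL images into a rows x cols grid (row-major order)."""
    if len(images) != rows * cols:
        raise ValueError(f"expected {rows * cols} images, got {len(images)}")
    w, h = images[0].size
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, img in enumerate(images):
        grid.paste(img, box=((i % cols) * w, (i // cols) * h))
    return grid


# Placeholder images standing in for the pipeline output.
demo = [Image.new("RGB", (512, 768), c) for c in ("red", "green", "blue", "white")]
grid = image_grid(demo, rows=2, cols=2)
print(grid.size)  # (1024, 1536)
```

In the notebook, `image_grid(images, rows=1, cols=4).save("sks-dog-grid.png")` would save the four generations as one file (the filename is only an example).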

In [None]:
from diffusers import DiffusionPipeline
import torch

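# This cell applies to a full (non-LoRA) Dreambooth run, whose output directory holds a
# complete pipeline; "dreambooth" is assumed to be that local output directory.
model_id = "dreambooth"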
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A photo of sks dog in a bucket"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("dog-bucket.png")