
[WIP][Examples] Support LoRA DreamBooth training with SD XL#3896

Closed
sayakpaul wants to merge 58 commits into main from dreambooth/sd-xl-2

Conversation

@sayakpaul
Member

@sayakpaul sayakpaul commented Jun 29, 2023

What does this PR do?

This PR is for me to gather feedback on the structure and the modifications to accommodate the DreamBooth LoRA training with SD XL.

Keep in mind

This PR adds an example to show how to conduct DreamBooth LoRA training with SDXL. Builds on top of #3859.

While reviewing the PR, please restrict yourself to the train_dreambooth_lora_sd_xl.py script.

Others

  • The script should be more or less ready as a good first draft.
  • No text encoder training support yet, to keep the modifications reasonably sized.
  • I have not run make style && make quality so as not to touch the other files.
  • There are many use_auth_token=True calls in the script for obvious reasons.
  • No documentation or test cases yet; I need to gather good enough results and findings first. But getting the script to work took more time than I expected, so I think it would be good to have some 👀 from the get-go.

I am currently training with the following command:

export MODEL_NAME="diffusers/stable-diffusion-xl-base-0.9"
export INSTANCE_DIR="dog"
export CLASS_DIR="dog-class"
export OUTPUT_DIR="lora-trained-xl"

accelerate launch train_dreambooth_lora_sd_xl.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-5 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=100 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=50 \
  --seed="0" \
  --push_to_hub

The dog dataset was downloaded using the following code:

from huggingface_hub import snapshot_download

local_dir = "./dog"
snapshot_download(
    "diffusers/dog-example",
    local_dir=local_dir, repo_type="dataset",
    ignore_patterns=".gitattributes",
)

The training artifacts are available here: https://huggingface.co/diffusers/lora-trained-xl (private, only visible to the diffusers team members for now).

@sayakpaul sayakpaul requested review from patrickvonplaten and pcuenca and removed request for patrickvonplaten June 29, 2023 09:09
def compute_embeddings(prompt, text_encoders, tokenizers):
    original_size = (args.resolution, args.resolution)
    target_size = (args.resolution, args.resolution)
    crops_coords_top_left = (0, 0)
Contributor


We should pass in the crop coords as an optional value. Forcing this to (0, 0) while also forcing a square resolution and cropping the input data goes against the results of the SDXL paper and the purpose of these conditioning inputs.
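A minimal sketch of the suggestion, as a standalone helper (the function name and signature are hypothetical, not the actual script's API): the SDXL micro-conditioning tuple takes the caller's crop coordinates when supplied and only falls back to (0, 0) otherwise.

```python
def make_add_time_ids(original_size, target_size, crops_coords_top_left=None):
    """Build the SDXL micro-conditioning tuple
    (orig_h, orig_w, crop_top, crop_left, target_h, target_w).

    crops_coords_top_left is optional: callers that actually crop their
    input data can pass the real coordinates instead of a hard-coded (0, 0).
    """
    if crops_coords_top_left is None:
        crops_coords_top_left = (0, 0)  # fallback only, not forced
    return (*original_size, *crops_coords_top_left, *target_size)
```

With this shape, a dataloader that random-crops could thread the true top-left offsets through to the conditioning inputs, e.g. `make_add_time_ids((1024, 1024), (1024, 1024), (128, 64))`.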

@bghira
Contributor

bghira commented Jul 4, 2023

I would like these examples to begin using a common module for shared functions. The code duplication is pretty high, and the maintenance cost of updating all of these scripts just grows over time, now that each new model seems to get its own examples with possibly four different scripts.

If we use a common module, I would like it to implement the data bucketing used in the original technical report: scale the smaller side to a fixed value, then use a two-decimal rounded aspect ratio (e.g. 1.78 for 16:9) to put the images into buckets.

It gets tricky, because the original model was trained on 256x256, 512x512, and 1024-based multi-aspect data.

If we had multiple dataloaders for SDXL (one for each base resolution) and a common module to handle this, then the general finetune, LoRA, and TI scripts could all benefit from that.

I implemented aspect bucketing in the diffusers finetuning script for SD 2.1 and can share whatever lessons I learned from that.
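The bucketing scheme described above could be sketched roughly like this (all names are hypothetical, not from any diffusers script; this is just the "scale the smaller side, round the aspect ratio to two decimals" idea in isolation):

```python
from collections import defaultdict

def bucket_key(width, height, ndigits=2):
    """Bucket key: aspect ratio rounded to two decimals (e.g. 1.78 for 16:9)."""
    return round(width / height, ndigits)

def scaled_size(width, height, base=1024):
    """Scale so the shorter side equals `base`, preserving aspect ratio."""
    if width <= height:
        return base, round(height * base / width)
    return round(width * base / height), base

def build_buckets(sizes, base=1024):
    """Group (width, height) pairs into aspect-ratio buckets of scaled sizes."""
    buckets = defaultdict(list)
    for w, h in sizes:
        buckets[bucket_key(w, h)].append(scaled_size(w, h, base))
    return dict(buckets)
```

A real implementation would also snap the scaled sizes to multiples the VAE/UNet accept and run one such bucketer per base resolution (256, 512, 1024), but the grouping logic stays the same.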

@sayakpaul
Member Author

Hey @bghira, thanks for sharing your insights!

i would like for these examples to begin using a common module for shared functions, the code duplication is pretty high and the maintenance cost of updating all of these scripts just grows over time, now that each new model seems to get its own examples with possibly 4 different scripts.

Please note that we purposefully don't do this, as stated in our doc. We also don't want to maintain too many examples here. If you check, the DreamBooth example we currently have in main only caters to SD and IF. We decided to add a separate one for SDXL based on its potential impact. So, unless something significantly more impactful comes up, we likely won't be adding anything new.

As for aspect-ratio bucketing, we purposefully don't want to add it because it introduces complexity we want to avoid, in keeping with the goals of these officially maintained examples: to provide not just working examples but also readability and simplicity baked into them. But we'd be more than happy to welcome a PR from you adding it as a community example, and happy for it to be linked from our official docs :)

@sayakpaul
Member Author

Closing in favor of #4016.

@sayakpaul sayakpaul closed this Jul 10, 2023