diffusers doesn't save and load the LoraConfig, resulting in a wrong lora_alpha during inference. #6087

@YiqunChen1999

Describe the bug

When I run train_dreambooth_lora_sdxl.py, I find that StableDiffusionXLLoraLoaderMixin (and LoraLoaderMixin) does not save or load the LoraConfig.

The default lora_alpha is 8 in LoraConfig but 1 in the peft Linear layer. As a result, lora_alpha is 8 during training but 1 when I load the weights from a checkpoint, because the XXLoraLoaderMixin cannot recover the correct lora_alpha from the saved LoRA weights. The wrong lora_alpha produces the wrong scaling in the peft Linear layer.
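To make the size of the mismatch concrete, here is a minimal arithmetic sketch. It assumes peft's standard rule that a LoRA layer scales its update by lora_alpha / r, and it takes the two lora_alpha values (8 at training, 1 after loading) from the description above rather than from inspecting the loader. The helper name `lora_scaling` is hypothetical.

```python
def lora_scaling(lora_alpha: float, r: int) -> float:
    """Scaling applied to the LoRA update, assuming scaling = lora_alpha / r."""
    return lora_alpha / r


rank = 64  # matches --rank 64 in the reproduction command

# Value used during training (LoraConfig default lora_alpha, per the report).
train_scaling = lora_scaling(8, rank)

# Value after loading, if lora_alpha falls back to 1 as described in the report.
loaded_scaling = lora_scaling(1, rank)

# Factor by which the LoRA contribution changes between training and inference.
mismatch = loaded_scaling / train_scaling

print(train_scaling, loaded_scaling, mismatch)
```

Under these assumptions the LoRA update at inference is applied at one eighth of the strength it had during training, so the fine-tuned behaviour is noticeably weakened even though the weights themselves load correctly.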

I manually reset the scaling to get the correct result, but I think this is a bug that should be fixed.
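The manual workaround could look roughly like the sketch below. It is not the exact code used here: it assumes that each peft LoRA layer exposes dictionaries named `r` and `scaling` keyed by adapter name, and the helper `reset_lora_scaling` and the stand-in class `FakeLoraLayer` are hypothetical names introduced only for illustration.

```python
from typing import Iterable


def reset_lora_scaling(modules: Iterable[object], lora_alpha: float) -> int:
    """Set scaling[adapter] = lora_alpha / r[adapter] on every LoRA-like module.

    Returns the number of adapter entries that were patched.
    """
    patched = 0
    for module in modules:
        r = getattr(module, "r", None)
        scaling = getattr(module, "scaling", None)
        # Only touch modules that look like LoRA layers (assumed attribute layout).
        if isinstance(r, dict) and isinstance(scaling, dict):
            for adapter_name, rank in r.items():
                scaling[adapter_name] = lora_alpha / rank
                patched += 1
    return patched


class FakeLoraLayer:
    """Stand-in for a peft LoRA layer, used only to exercise the helper."""

    def __init__(self, rank: int, scaling: float) -> None:
        self.r = {"default": rank}
        self.scaling = {"default": scaling}


# Two LoRA-like layers with a wrong scaling, plus one unrelated object.
layers = [FakeLoraLayer(64, 1.0), FakeLoraLayer(64, 1.0), object()]
count = reset_lora_scaling(layers, lora_alpha=8)
print(count, layers[0].scaling)
```

With a real pipeline, one would presumably pass something like `pipe.unet.modules()` (and the text encoders' modules, if they were trained) after loading the LoRA weights, using the same lora_alpha as in training. This only patches the symptom; the underlying fix is to persist the LoraConfig with the checkpoint.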

Reproduction

Clone the diffusers repo and install it. Then put a single image in a folder.

In the examples/dreambooth folder, run:

# The "<sks>" instance prompt is used to try to overfit the given image.
python train_dreambooth_lora_sdxl.py \
    --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --pretrained_vae_model_name_or_path madebyollin/sdxl-vae-fp16-fix \
    --instance_data_dir [PATH_TO_YOUR_IMAGE_DIR] \
    --instance_prompt "<sks>" \
    --validation_prompt "<sks>" \
    --num_validation_images 1 \
    --validation_epochs 50 \
    --output_dir [OUTPUT_DIR] \
    --train_batch_size 1 \
    --max_train_steps 300 \
    --checkpointing_steps 100 \
    --learning_rate 2e-4  \
    --rank 64

For example, I tried to overfit the first image below. The second image is the validation image at epoch 250 (from TensorBoard). The third is the result after the pipeline loads the LoRA weights from the saved checkpoint (epoch 300, also available in TensorBoard).

[Three images: the training image, the validation image at epoch 250, and the result after loading the LoRA weights from the checkpoint]

Logs

No response

System Info

  • diffusers version: 0.25.0.dev0
  • Platform: Linux-4.18.0-305.3.1.el8.x86_64-x86_64-with-glibc2.28
  • Python version: 3.11.5
  • PyTorch version (GPU?): 2.1.1 (True)
  • Huggingface_hub version: 0.19.4
  • Transformers version: 4.35.2
  • Accelerate version: 0.25.0
  • xFormers version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@sayakpaul @patri
