
Degraded Image Generation Performance in SD-XL U-Net Fine-tuning #5956

@wddwzwhhxx

Description


Describe the bug

Problem:
I am experiencing degraded image generation quality while fine-tuning the U-Net of the SDXL model with the train_text_to_image_sdxl.py script from the diffusers library. The generated results are noticeably worse than those from training SD1.5 on the exact same dataset.
Here are some examples of generating indoor decoration images:
sd1.5: [sample images]
sd-xl1.0: [sample images]

Then I tried fine-tuning SDXL on the pokemon dataset. As the number of training steps increased, the generated images seemed to deteriorate; the fine-tuning script appeared only to damage the base model's generation quality (a fixed-seed sampling sketch for reproducing this comparison follows the examples below):
steps 100: [image]
steps 300: [image]
steps 1000: [image]
steps 2000: [image]
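
In case it helps with reproduction: the per-step samples can be regenerated by loading each intermediate checkpoint's U-Net into the base pipeline with a fixed seed, so that differences between images come from the weights rather than the noise. A minimal sketch, assuming each checkpoint-* directory written by --checkpointing_steps contains a diffusers-format unet subfolder (prompt and paths are illustrative):

import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel

BASE = "stabilityai/stable-diffusion-xl-base-1.0"
OUT_DIR = "ft_model/pokemon_modify"  # illustrative path

for step in (100, 300, 1000, 2000):
    # Load the U-Net weights saved at this checkpoint (assumed diffusers format).
    unet = UNet2DConditionModel.from_pretrained(
        f"{OUT_DIR}/checkpoint-{step}/unet", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLPipeline.from_pretrained(
        BASE, unet=unet, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    # Fixed seed: any change between steps then reflects the fine-tuned weights.
    generator = torch.Generator(device="cuda").manual_seed(42)
    image = pipe("a pokemon with blue eyes", generator=generator).images[0]
    image.save(f"sample_step_{step}.png")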

Has anyone else encountered a similar issue?
Alternatively, has anyone achieved good results when fine-tuning the SDXL U-Net with this script?
I hope to receive insights from the community on this matter. Thank you!

Reproduction

export MODEL_NAME="stable-diffusion-xl-base-1.0"
export VAE_NAME="sdxl-vae-fp16-fix"
export DATASET_NAME="pokemon-blip-captions"

/aistudio/workspace/system-default/envs/diffusers/bin/python train_sdxl_test.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_NAME \
  --dataset_name=$DATASET_NAME \
  --output_dir=/aistudio/workspace/sd/diffusers/examples/text_to_image/ft_model/pokemon_modify/ \
  --resolution=512 --center_crop --random_flip \
  --proportion_empty_prompts=0.2 \
  --train_batch_size=1 \
  --enable_xformers_memory_efficient_attention \
  --gradient_accumulation_steps=4 --gradient_checkpointing \
  --max_train_steps=2000 \
  --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --checkpointing_steps=100 \
  --dataloader_num_workers=20
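
For checking the finished run, the fine-tuned model can be loaded back for inference. A minimal sketch, assuming the script saved a full diffusers pipeline into --output_dir at the end of training (prompt and seed are illustrative):

import torch
from diffusers import StableDiffusionXLPipeline

# Load the pipeline assumed to have been saved into --output_dir by the script.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "/aistudio/workspace/sd/diffusers/examples/text_to_image/ft_model/pokemon_modify/",
    torch_dtype=torch.float16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "a pokemon with blue eyes", generator=generator, num_inference_steps=30
).images[0]
image.save("finetuned_sample.png")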

Logs

No response

System Info

  • diffusers version: 0.22.0.dev0
  • Platform: Linux-4.19.96-x86_64-with-glibc2.17
  • Python version: 3.10.13
  • PyTorch version (GPU?): 2.1.0+cu121 (True)
  • Huggingface_hub version: 0.17.3
  • Transformers version: 4.35.1
  • Accelerate version: 0.24.1
  • xFormers version: 0.0.22.post7
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?: yes

Who can help?

@sayakpaul @patrickvonplaten
