Description
Describe the bug
Problem:
I am seeing degraded image generation quality when fine-tuning the U-Net of the SDXL model with the train_text_to_image_sdxl.py script from the diffusers library. The generated results are noticeably worse than those from fine-tuning SD1.5 on exactly the same dataset.
Here are some examples of generated interior-decoration images:
sd1.5:


sd-xl1.0:


I then tried fine-tuning SDXL on the Pokémon dataset (pokemon-blip-captions). Image quality seemed to deteriorate as the number of training steps increased, as if the SDXL fine-tuning script only damaged the original model's generation quality:
steps 100:

steps 300:

steps 1000:

steps 2000:

Has anyone else encountered a similar issue?
Alternatively, has anyone achieved good results when fine-tuning the SDXL U-Net with this script?
I hope to receive insights from the community on this matter. Thank you!
Reproduction
export MODEL_NAME="stable-diffusion-xl-base-1.0"
export VAE_NAME="sdxl-vae-fp16-fix"
export DATASET_NAME="pokemon-blip-captions"
/aistudio/workspace/system-default/envs/diffusers/bin/python train_sdxl_test.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_NAME \
  --dataset_name=$DATASET_NAME \
  --output_dir=/aistudio/workspace/sd/diffusers/examples/text_to_image/ft_model/pokemon_modify/ \
  --resolution=512 --center_crop --random_flip \
  --proportion_empty_prompts=0.2 \
  --train_batch_size=1 \
  --enable_xformers_memory_efficient_attention \
  --gradient_accumulation_steps=4 --gradient_checkpointing \
  --max_train_steps=2000 \
  --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --checkpointing_steps=100 \
  --dataloader_num_workers=20
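To make the checkpoints above easier to compare, here is a rough sketch of how many training samples each one corresponds to under the flags in this command. It assumes that each counted training step is one optimizer update after gradient accumulation. `TrainConfig`, `effective_batch_size`, and `samples_seen` are hypothetical helper names, and `num_processes` is a placeholder because the report does not state how many processes were used.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Values copied from the command in this report.
    train_batch_size: int = 1             # --train_batch_size
    gradient_accumulation_steps: int = 4  # --gradient_accumulation_steps
    max_train_steps: int = 2000           # --max_train_steps
    # ASSUMPTION: the number of processes is not stated in the report.
    num_processes: int = 1


def effective_batch_size(cfg: TrainConfig) -> int:
    """Samples consumed per optimizer update across all processes."""
    return cfg.train_batch_size * cfg.gradient_accumulation_steps * cfg.num_processes


def samples_seen(cfg: TrainConfig, step: int) -> int:
    """Total samples processed after `step` optimizer updates."""
    return effective_batch_size(cfg) * step


if __name__ == "__main__":
    cfg = TrainConfig()
    # Steps at which sample images were posted in this report.
    for step in (100, 300, 1000, 2000):
        print(f"step {step}: {samples_seen(cfg, step)} samples seen")
```

With a single process this gives an effective batch size of 4; set `num_processes` to the real value for a distributed run.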
Logs
No response
System Info
- diffusers version: 0.22.0.dev0
- Platform: Linux-4.19.96-x86_64-with-glibc2.17
- Python version: 3.10.13
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- Huggingface_hub version: 0.17.3
- Transformers version: 4.35.1
- Accelerate version: 0.24.1
- xFormers version: 0.0.22.post7
- Using GPU in script?:
- Using distributed or parallel set-up in script?: yes