Closed
Describe the bug
Hello, I have the following error when trying to train a LoRA with SDXL:
ValueError: Attempting to unscale FP16 gradients.
Traceback (most recent call last):
  File "/nfs/horai.dgpsrv/year/zling/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1994, in <module>
    main(args)
  File "/nfs/horai.dgpsrv/year/zling/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1823, in main
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/u8/c/zling/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 2396, in clip_grad_norm_
    self.unscale_gradients()
  File "/u8/c/zling/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 2340, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/u8/c/zling/.local/lib/python3.10/site-packages/torch/amp/grad_scaler.py", line 338, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
  File "/u8/c/zling/.local/lib/python3.10/site-packages/torch/amp/grad_scaler.py", line 260, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
Reproduction
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
# export INSTANCE_DIR="dog"
export INSTANCE_DIR="/scratch/year/zling/progressive-shading/picasso-data/surrealism_images"
export OUTPUT_DIR="lora-trained-xl"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"
accelerate launch --gpu_ids 0,1 train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a drawing in sks style" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A drawing of sks style" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
Logs
System Info
- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Linux-5.15.0-131-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.10.12
- PyTorch version (GPU?): 2.6.0+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.28.1
- Transformers version: 4.48.3
- Accelerate version: 1.3.0
- PEFT version: 0.7.0
- Bitsandbytes version: not installed
- Safetensors version: 0.5.2
- xFormers version: not installed
- Accelerator: NVIDIA RTX A6000, 49140 MiB
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help?
No response
sayakpaul