ValueError: Attempting to unscale FP16 gradients. when using examples/dreambooth/train_dreambooth_lora_sd3.py

### Describe the bug

I was using the example training script `examples/dreambooth/train_dreambooth_lora_sd3.py`. 

To work around `RuntimeError: Input type (float) and bias type (c10::Half) should be the same`, I changed `autocast_ctx = nullcontext()` to `autocast_ctx = torch.autocast(accelerator.device.type, dtype=torch.float32)`.

However, I then got `ValueError: Attempting to unscale FP16 gradients.` around the time validation ran. The traceback points to `accelerator.clip_grad_norm_` in the training loop.
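For context, the exception is raised inside `GradScaler.unscale_`, which refuses to unscale gradients that are themselves FP16. One common way to hit it is having trainable parameters stored in FP16. Below is a minimal sketch for checking that and upcasting only the trainable parameters. It assumes this is the cause here, which I have not confirmed. The helper names `trainable_dtypes` and `upcast_trainable_params` are hypothetical, and the tiny `nn.Sequential` model is only a stand-in for the SD3 transformer with LoRA layers.

```python
import torch
from torch import nn


def trainable_dtypes(model: nn.Module) -> dict:
    """Count trainable parameter elements per dtype (hypothetical helper)."""
    counts = {}
    for p in model.parameters():
        if p.requires_grad:
            counts[p.dtype] = counts.get(p.dtype, 0) + p.numel()
    return counts


def upcast_trainable_params(model: nn.Module) -> None:
    """Cast FP16 parameters that require grad to FP32; frozen ones stay FP16."""
    for p in model.parameters():
        if p.requires_grad and p.dtype == torch.float16:
            p.data = p.data.to(torch.float32)


# Toy stand-in: a frozen FP16 "base" layer plus a trainable FP16 "adapter" layer.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2)).to(torch.float16)
for p in model[0].parameters():
    p.requires_grad_(False)

before = trainable_dtypes(model)   # trainable params are FP16 at this point
upcast_trainable_params(model)
after = trainable_dtypes(model)    # trainable params are now FP32

print("before:", before)
print("after: ", after)
```

In the real script the same check would be run on the transformer before the optimizer is created. As far as I know, `diffusers.training_utils` ships a comparable helper, `cast_training_params`, which may be the better choice than a hand-rolled loop.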

The complete error message:


(/home/ubuntu/code/yafeimao_env/diffusers) ubuntu@ip-172-31-20-87:~/code/yafeimao_code/diffusers_new/diffusers/examples/dreambooth$ accelerate launch train_dreambooth_lora_sd3.py   --mixed_precision="fp16"   --pretrained_model_name_or_path=$MODEL_NAME    --instance_data_dir=$INSTANCE_DIR   --output_dir=$OUTPUT_DIR   --instance_prompt="a photo of green floral dress"   --resolution=512   --train_batch_size=1   --gradient_accumulation_steps=4   --learning_rate=1e-5   --lr_scheduler="constant"   --lr_warmup_steps=0   --max_train_steps=500   --validation_prompt="A photo of a female model wearing green floral dress"   --validation_epochs=25   --seed="0"   --push_to_hub                     
10/09/2024 19:14:52 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'base_shift', 'max_shift', 'use_dynamic_shifting', 'max_image_seq_len', 'base_image_seq_len'} was not found in config. Values will be initialized to default values.
Downloading shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 288.50it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [02:22<00:00, 71.45s/it]
{'mid_block_add_attention'} was not found in config. Values will be initialized to default values.
10/09/2024 19:19:14 - INFO - __main__ - ***** Running training *****
10/09/2024 19:19:14 - INFO - __main__ -   Num examples = 4
10/09/2024 19:19:14 - INFO - __main__ -   Num batches each epoch = 4
10/09/2024 19:19:14 - INFO - __main__ -   Num Epochs = 500
10/09/2024 19:19:14 - INFO - __main__ -   Instantaneous batch size per device = 1
10/09/2024 19:19:14 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
10/09/2024 19:19:14 - INFO - __main__ -   Gradient Accumulation steps = 4
10/09/2024 19:19:14 - INFO - __main__ -   Total optimization steps = 500
Downloading shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 312.30it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [02:22<00:00, 71.10s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [02:22<00:00, 70.46s/it]
Loaded tokenizer_3 as T5TokenizerFast from `tokenizer_3` subfolder of stabilityai/stable-diffusion-3-medium-diffusers.
Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of stabilityai/stable-diffusion-3-medium-diffusers.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of stabilityai/stable-diffusion-3-medium-diffusers.
{'base_shift', 'max_shift', 'use_dynamic_shifting', 'max_image_seq_len', 'base_image_seq_len'} was not found in config. Values will be initialized to default values.
Loaded scheduler as FlowMatchEulerDiscreteScheduler from `scheduler` subfolder of stabilityai/stable-diffusion-3-medium-diffusers.
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 16.60it/s]
10/09/2024 19:22:07 - INFO - __main__ - Running validation... 
 Generating 4 images with prompt: A photo of a female model wearing green floral dress.
Steps:   0%|▍         | 1/500 [04:08<24:06,  2.90s/it, loss=0.176, lr=
Steps:   0%|▍         | 1/500 [04:08<24:06,  2.90s/it, loss=0.301, lr=
Steps:   0%|▍         | 1/500 [04:08<24:06,  2.90s/it, loss=0.134, lr=1e-5]
Traceback (most recent call last):
  File "/home/ubuntu/code/yafeimao_code/diffusers_new/diffusers/examples/dreambooth/train_dreambooth_lora_sd3.py", line 1872, in <module>
    main(args)                                                       
  File "/home/ubuntu/code/yafeimao_code/diffusers_new/diffusers/examples/dreambooth/train_dreambooth_lora_sd3.py", line 1727, in main     
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)  
  File "/home/ubuntu/code/yafeimao_env/diffusers/lib/python3.9/site-packages/accelerate/accelerator.py", line 2346, in clip_grad_norm_    
    self.unscale_gradients()                                         
  File "/home/ubuntu/code/yafeimao_env/diffusers/lib/python3.9/site-packages/accelerate/accelerator.py", line 2290, in unscale_gradients  
    self.scaler.unscale_(opt)                                        
  File "/home/ubuntu/code/yafeimao_env/diffusers/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 307, in unscale_        
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(  
  File "/home/ubuntu/code/yafeimao_env/diffusers/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 229, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")        
ValueError: Attempting to unscale FP16 gradients.                    
Steps:   0%|▍                                                                                                                                                                                                               | 1/500 [04:09<34:32:16, 249.17s/it, loss=0.134, lr=1e-5]
Traceback (most recent call last):
  File "/home/ubuntu/code/yafeimao_env/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/code/yafeimao_env/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/ubuntu/code/yafeimao_env/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1174, in launch_command
    simple_launcher(args)
  File "/home/ubuntu/code/yafeimao_env/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 769, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/code/yafeimao_env/diffusers/bin/python3.9', 'train_dreambooth_lora_sd3.py', '--mixed_precision=fp16', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-3-medium-diffusers', '--instance_data_dir=/home/ubuntu/code/yafeimao_code/diffusers/images/green_floral_dress', '--output_dir=trained-sd3-lora', '--instance_prompt=a photo of green floral dress', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--learning_rate=1e-5', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of a female model wearing green floral dress', '--validation_epochs=25', '--seed=0', '--push_to_hub']' returned non-zero exit status 1.

Can you help? Thanks!




### Reproduction

I was using [this script](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora_sd3.py)

And my command:

accelerate launch train_dreambooth_lora_sd3.py \
  --mixed_precision="fp16" \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of green floral dress" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-5 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of a female model wearing green floral dress" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub

### Logs

_No response_

### System Info

Ubuntu 20.04

NVIDIA A10G Single GPU

Output of `pip list`:


Package                 Version      Editable project location
----------------------- ------------ -----------------------------------------
absl-py                 2.1.0
accelerate              0.34.2
bitsandbytes            0.44.1
certifi                 2024.8.30
charset-normalizer      3.3.2
diffusers               0.31.0.dev0  /home/ubuntu/code/yafeimao_code/diffusers
filelock                3.16.0
fsspec                  2024.9.0
ftfy                    6.2.3
grpcio                  1.66.2
huggingface-hub         0.24.6
idna                    3.8
importlib_metadata      8.4.0
Jinja2                  3.1.4
Markdown                3.7
MarkupSafe              3.0.1
mpmath                  1.3.0
networkx                3.2.1
numpy                   1.24.1
packaging               24.1
peft                    0.11.1
pillow                  10.4.0
pip                     24.2
protobuf                5.28.2
psutil                  6.0.0
PyYAML                  6.0.2
regex                   2024.7.24
requests                2.32.3
safetensors             0.4.5
sentencepiece           0.2.0
setuptools              73.0.1
six                     1.16.0
sympy                   1.12
tensorboard             2.18.0
tensorboard-data-server 0.7.2
tokenizers              0.20.0
torch                   2.1.2+cu121
torchaudio              2.1.2+cu121
torchvision             0.16.2+cu121
tqdm                    4.66.5
transformers            4.45.2
triton                  2.1.0
typing_extensions       4.12.2
urllib3                 2.2.2
wcwidth                 0.2.13
Werkzeug                3.0.4
wheel                   0.44.0
zipp                    3.20.1


### Who can help?

@yiyixuxu @sayakpaul 

ValueError: Attempting to unscale FP16 gradients. when using examples/dreambooth/train_dreambooth_lora_sd3.py #9628
