Dreambooth: RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasLtMatmul` #1082

@enn-nafnlaus

Description

Describe the bug

Hi, I've spent a couple of days trying to get Dreambooth to run, and I can't get past this error:

Steps: 0%| | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/scratch/StableDiffusion/diffusers/examples/dreambooth/train_dreambooth.py", line 765, in <module>
main()
File "/scratch/StableDiffusion/diffusers/examples/dreambooth/train_dreambooth.py", line 712, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
return func(*args, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1673, in forward
loss = self.module(*inputs, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 287, in forward
emb = self.time_embedding(t_emb)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/diffusers/models/embeddings.py", line 75, in forward
sample = self.linear_1(sample)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())
Steps: 0%| | 0/800 [00:00<?, ?it/s]
[2022-10-31 12:46:24,888] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 711745
[2022-10-31 12:46:24,889] [ERROR] [launch.py:292:sigkill_handler] ['/home/stablediffusion/.conda/envs/diffusers/bin/python', '-u', 'train_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--instance_data_dir=training/dataset', '--class_data_dir=classes', '--output_dir=output', '--instance_prompt=MyObject dragon', '--class_prompt=dragon', '--seed=3434554', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=100', '--sample_batch_size=4', '--max_train_steps=800'] exits with return code = 1
Traceback (most recent call last):
File "/home/stablediffusion/.conda/envs/diffusers/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 827, in launch_command
deepspeed_launcher(args)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 540, in deepspeed_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['deepspeed', '--no_local_rank', '--num_gpus', '1', 'train_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--instance_data_dir=training/dataset', '--class_data_dir=classes', '--output_dir=output', '--instance_prompt=MyObject dragon', '--class_prompt=dragon', '--seed=3434554', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=100', '--sample_batch_size=4', '--max_train_steps=800']' returned non-zero exit status 1.

I can run other CUDA apps just fine. No other GPU-using apps are running.
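Since other CUDA applications work, one way to narrow this down is to run the same kind of operation outside the training script. The traceback ends in `F.linear` inside the UNet's time embedding, running under fp16 mixed precision via the DeepSpeed engine. The following is a hypothetical diagnostic sketch: the function name `check_fp16_linear` and the default shapes (assumed to resemble the SD v1 time-embedding layer, 320 to 1280 features) are illustrative assumptions, not taken from the report. It runs a single fp16 linear layer with bias on the GPU without Accelerate or DeepSpeed. It may not exercise exactly the same `cublasLtMatmul` path, so a pass here does not rule out a cuBLAS problem.

```python
def check_fp16_linear(batch: int = 1, in_features: int = 320, out_features: int = 1280) -> str:
    """Run one fp16 linear layer (with bias) on the GPU and report the outcome.

    Returns "ok", "torch-missing", "no-cuda", or "cuda-error: <message>".
    The default shapes are assumptions, not values from the error log.
    """
    try:
        # Import inside the function so the sketch reports a status
        # instead of crashing when PyTorch is not installed.
        import torch
        import torch.nn.functional as F
    except ImportError:
        return "torch-missing"

    if not torch.cuda.is_available():
        return "no-cuda"

    device = torch.device("cuda")
    x = torch.randn(batch, in_features, device=device, dtype=torch.float16)
    weight = torch.randn(out_features, in_features, device=device, dtype=torch.float16)
    bias = torch.randn(out_features, device=device, dtype=torch.float16)
    try:
        F.linear(x, weight, bias)
        # CUDA kernels run asynchronously; synchronize so that any cuBLAS
        # failure is raised here rather than at some later call.
        torch.cuda.synchronize()
    except RuntimeError as exc:
        return f"cuda-error: {exc}"
    return "ok"


if __name__ == "__main__":
    print(check_fp16_linear())
```

If this returns a `cuda-error` string, the PyTorch/CUDA/driver setup is a plausible suspect; if it returns `ok`, comparing runs with and without the DeepSpeed launcher seen in the traceback would be a reasonable next step.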

Reproduction

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="training/dataset"
export CLASS_DIR="classes"
export OUTPUT_DIR="output"

accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="MyObject dragon" \
--class_prompt="dragon" \
--seed=3434554 \
--resolution=512 \
--center_crop \
--train_batch_size=1 \
--mixed_precision="fp16" \
--use_8bit_adam \
--gradient_accumulation_steps=1 --gradient_checkpointing \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=100 \
--sample_batch_size=4 \
--max_train_steps=800

Logs

See above.

System Info

  • diffusers version: 0.7.0.dev0
  • Platform: Linux-5.19.16-200.fc36.x86_64-x86_64-with-glibc2.35
  • Python version: 3.9.13
  • PyTorch version (GPU?): 1.13.0+cu116 (True)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.23.1
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

The GPU is an RTX 3060 (12 GB), hence the need to limit memory usage (fp16 mixed precision, 8-bit Adam, gradient checkpointing, batch size 1).
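To put the 12 GB limit in perspective, here is a rough back-of-envelope sketch. It assumes the Stable Diffusion v1.x UNet has roughly 860M parameters, that weights and gradients are held in fp32 (as with autocast-style mixed precision), and that Adam keeps two fp32 moments per parameter versus roughly 2 bytes per parameter with 8-bit Adam. The names `estimate_training_memory_gib` and `UNET_PARAMS` are hypothetical. It ignores activations, the text encoder, the VAE, CUDA context overhead and any DeepSpeed offloading, so real usage differs.

```python
def estimate_training_memory_gib(
    n_params: float,
    weight_bytes: int = 4,  # assumed fp32 weights
    grad_bytes: int = 4,    # assumed fp32 gradients
    optim_bytes: int = 8,   # two fp32 Adam moments; ~2 with 8-bit Adam
) -> float:
    """Rough estimate of weights + gradients + optimizer state, in GiB.

    Excludes activations and all other models/overheads, so it is only
    a lower-bound style estimate for the UNet alone.
    """
    total_bytes = n_params * (weight_bytes + grad_bytes + optim_bytes)
    return total_bytes / 2**30


UNET_PARAMS = 860e6  # assumed approximate size of the SD v1.x UNet

standard_adam = estimate_training_memory_gib(UNET_PARAMS)
adam_8bit = estimate_training_memory_gib(UNET_PARAMS, optim_bytes=2)
print(f"fp32 Adam: ~{standard_adam:.1f} GiB, 8-bit Adam: ~{adam_8bit:.1f} GiB")
# prints "fp32 Adam: ~12.8 GiB, 8-bit Adam: ~8.0 GiB"
```

Under these assumptions, standard Adam alone would already exceed the card's 12 GB before any activations are stored, which is why 8-bit Adam and gradient checkpointing matter here.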

Metadata

Labels: bug (Something isn't working), stale (Issues that haven't received updates)