train_text_to_image.py | RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16' #3453
Comments
Looks like this is a transformers issue and can be fixed like this. cc @patrickvonplaten
Actually, I think we could redirect this issue directly to PyTorch. I don't think we should solve this in every model in transformers.
Was this issue introduced in torch 2.0.1? It just "appeared" in my Dreambooth app, and I'd definitely like to work around it until it's solved upstream.
Is this a new problem? It does seem to have just been fixed upstream last week, though: pytorch/pytorch#101414
I have faced the same problem when running inference with StableDiffusionPipeline (v2.1) in bfloat16.
Maybe you need to install PyTorch from source to ensure the support is reflected in your installation? |
huggingface/transformers#23942 should fix this issue |
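Until that fix is available, the causal attention mask can be built without `torch.triu` (the op whose BFloat16 CUDA kernel is missing). Below is a minimal sketch of the triu-free approach using `arange` comparisons and `masked_fill_`; the function name is hypothetical, not the actual transformers helper:

```python
import torch

def make_causal_mask_no_triu(seq_len: int, dtype: torch.dtype, device: str = "cpu") -> torch.Tensor:
    # Fill everything with the most negative representable value for `dtype`,
    # then zero out the positions that may attend: column j <= row i.
    # This avoids torch.triu entirely, so it also works for bfloat16 on CUDA
    # builds where the triu/tril kernel is not implemented.
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min, device=device)
    idx = torch.arange(seq_len, device=device)
    mask.masked_fill_(idx <= idx.view(seq_len, 1), 0)  # keep lower triangle + diagonal
    return mask.to(dtype)

mask = make_causal_mask_no_triu(4, torch.bfloat16)
```

The result matches filling a matrix with the dtype minimum and applying `triu_(1)`, just computed with ops that have BFloat16 kernels on all backends.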
Describe the bug
When running train_text_to_image.py, setting --mixed_precision="bf16" causes an error in the transformers CLIP model. I am opening this here because I am not sure how to reproduce it from the transformers repo.
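The failure can likely be reproduced outside the training script, since it comes down to calling `triu` on a bfloat16 CUDA tensor. A hedged standalone sketch (an assumption about the root cause, not code from the script; triggering the error requires a CUDA device and an affected PyTorch build):

```python
import torch

def try_bf16_triu(device: str) -> str:
    # Mimic building a causal attention mask in bfloat16, as CLIP does.
    mask = torch.full((4, 4), float("-inf"), dtype=torch.bfloat16, device=device)
    try:
        mask.triu_(1)  # fails where the BFloat16 CUDA triu/tril kernel is missing
        return "ok"
    except RuntimeError as e:
        # On affected builds this is:
        # "triu_tril_cuda_template" not implemented for 'BFloat16'
        return str(e)

print(try_bf16_triu("cpu"))  # CPU kernels support bfloat16
if torch.cuda.is_available():
    print(try_bf16_triu("cuda"))
```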
Reproduction
```bash
#!/bin/bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"

accelerate launch --mixed_precision="bf16" train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model"
```
Logs
System Info
diffusers version: 0.17.0.dev0
Platform: Linux-5.15.0-69-generic-x86_64-with-glibc2.31
Python version: 3.9.16
PyTorch version (GPU?): 1.13.1 (True)
Huggingface_hub version: 0.14.1
Transformers version: 4.30.0.dev0
Accelerate version: 0.20.0.dev0
xFormers version: 0.0.19
Using GPU in script?: Yes (2 A100)
Using distributed or parallel set-up in script?: Yes