
Add Flash Attention 2.0 for T5 Family #27441

Open

jkswin opened this issue Nov 10, 2023 · 3 comments

Comments


jkswin commented Nov 10, 2023

Encountered the following error when trying to incorporate Flash Attention 2 into an existing byt5-small fine-tuning script.

Code to reproduce:

from transformers import T5ForConditionalGeneration, AutoTokenizer, Trainer, TrainingArguments, DataCollatorForSeq2Seq

model_path = "google/byt5-small"
model = T5ForConditionalGeneration.from_pretrained(model_path,
                                                   use_flash_attention_2=True,
                                                   )

Error:

ValueError: The current architecture does not support Flash Attention 2.0. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
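A minimal guard to avoid the hard failure, as a sketch only: it assumes a recent transformers release in which attn_implementation="flash_attention_2" supersedes use_flash_attention_2, and it relies on the private _supports_flash_attn_2 class flag, which may change between versions.

from transformers import T5ForConditionalGeneration

model_path = "google/byt5-small"

# Only request Flash Attention 2 if the model class advertises support;
# otherwise fall back to the default attention implementation.
# `_supports_flash_attn_2` is a private attribute and may change.
if getattr(T5ForConditionalGeneration, "_supports_flash_attn_2", False):
    model = T5ForConditionalGeneration.from_pretrained(
        model_path, attn_implementation="flash_attention_2"
    )
else:
    model = T5ForConditionalGeneration.from_pretrained(model_path)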
@strankid

Is this issue open for contribution?

sbrunk (Contributor) commented Feb 19, 2024

The main Flash Attention 2 tracking issue and discussion seems to be #26350.

@Ingvarstep

For anyone interested in an optimized T5 version: I just finished a project implementing a Flash Attention variant with fused attention bias calculation. It fixes the major drawbacks of T5 and allows it to run on 100k-token sequences on a single L4 GPU (22.5 GB). Check it here.
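As a rough illustration of why T5 is awkward for stock Flash Attention 2: its relative position bias is added directly to the attention scores, which the standard FA2 kernel does not accept as an input. PyTorch's scaled_dot_product_attention shows the additive-bias path; the shapes and tensors below are illustrative only and are not taken from the project linked above.

import torch
import torch.nn.functional as F

# Illustrative shapes; not tied to any particular T5 checkpoint.
batch, heads, seq_len, head_dim = 2, 8, 128, 64

q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# T5-style relative position bias: one additive score per head and
# query/key position pair. In T5 it comes from a learned bucketed
# embedding; a random tensor stands in for it here.
position_bias = torch.randn(1, heads, seq_len, seq_len)

# SDPA accepts an additive float mask, so the bias can be folded in,
# but materializing it costs O(seq_len^2) memory -- the cost a fused
# flash-attention-with-bias kernel is meant to avoid.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=position_bias)
print(out.shape)  # torch.Size([2, 8, 128, 64])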
