
Add Flash Attention 2.0 for T5 Family #27441

Open

jkswin opened this issue Nov 10, 2023 · 3 comments

Comments


jkswin commented Nov 10, 2023

Encountered the following error when trying to incorporate Flash Attention 2 into an existing byt5-small fine-tuning script.

Code to reproduce:

from transformers import T5ForConditionalGeneration, AutoTokenizer, Trainer, TrainingArguments, DataCollatorForSeq2Seq

model_path = "google/byt5-small"
model = T5ForConditionalGeneration.from_pretrained(model_path,
                                                   use_flash_attention_2=True,
                                                   )

Error:

ValueError: The current architecture does not support Flash Attention 2.0. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
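A minimal guard to avoid the hard failure, as a sketch only: it assumes a recent transformers release in which attn_implementation="flash_attention_2" supersedes use_flash_attention_2, and it relies on the private _supports_flash_attn_2 class flag, which may change between versions.

from transformers import T5ForConditionalGeneration

model_path = "google/byt5-small"

# Only request Flash Attention 2 if the model class advertises support;
# otherwise fall back to the default attention implementation.
# `_supports_flash_attn_2` is a private attribute and may change.
if getattr(T5ForConditionalGeneration, "_supports_flash_attn_2", False):
    model = T5ForConditionalGeneration.from_pretrained(
        model_path, attn_implementation="flash_attention_2"
    )
else:
    model = T5ForConditionalGeneration.from_pretrained(model_path)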
@strankid

Is this issue open for contribution?

sbrunk (Contributor) commented Feb 19, 2024

The main Flash Attention 2 tracking issue and discussion seems to be #26350.

@Ingvarstep

For anyone interested in an optimized T5 version: I just finished a project implementing a Flash Attention variant with fused attention bias calculation. It fixes the major drawbacks of T5 and allows it to run on 100k-token sequences on a single L4 GPU (22.5 GB). Check it here.
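As a rough illustration of why T5 is awkward for stock Flash Attention 2: its relative position bias is added directly to the attention scores, which the standard FA2 kernel does not accept as an input. PyTorch's scaled_dot_product_attention shows the additive-bias path; the shapes and tensors below are illustrative only and are not taken from the project linked above.

import torch
import torch.nn.functional as F

# Illustrative shapes; not tied to any particular T5 checkpoint.
batch, heads, seq_len, head_dim = 2, 8, 128, 64

q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# T5-style relative position bias: one additive score per head and
# query/key position pair. In T5 it comes from a learned bucketed
# embedding; a random tensor stands in for it here.
position_bias = torch.randn(1, heads, seq_len, seq_len)

# SDPA accepts an additive float mask, so the bias can be folded in,
# but materializing it costs O(seq_len^2) memory -- the cost a fused
# flash-attention-with-bias kernel is meant to avoid.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=position_bias)
print(out.shape)  # torch.Size([2, 8, 128, 64])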
