Encountered the following error when trying to incorporate Flash Attention into a previously developed byt5-small fine-tuning script.
Code to reproduce:
from transformers import T5ForConditionalGeneration, AutoTokenizer, Trainer, TrainingArguments, DataCollatorForSeq2Seq

model_path = "google/byt5-small"

# Loading ByT5 with Flash Attention 2 requested raises the ValueError below.
model = T5ForConditionalGeneration.from_pretrained(
    model_path,
    use_flash_attention_2=True,
)
Error:
ValueError: The current architecture does not support Flash Attention 2.0. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
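A minimal workaround sketch, assuming a transformers release new enough to accept the attn_implementation keyword (it replaces the deprecated use_flash_attention_2 flag): request Flash Attention 2 and fall back to the default attention if the architecture rejects it.

from transformers import T5ForConditionalGeneration

model_path = "google/byt5-small"

try:
    # Request Flash Attention 2 via the newer attn_implementation argument.
    model = T5ForConditionalGeneration.from_pretrained(
        model_path,
        attn_implementation="flash_attention_2",
    )
except ValueError:
    # T5/ByT5 raises here because the architecture has no FA2 implementation,
    # so fall back to the library's default attention.
    model = T5ForConditionalGeneration.from_pretrained(model_path)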
For anyone interested in an optimized T5 version: I just finished a project creating a Flash Attention variant with fused attention bias calculation. It fixes the major drawbacks of T5's attention and allows running on 100k-token sequences on a single L4 GPU (22.5 GB). Check it here.
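For context on why a custom kernel helps here: T5's learned relative position bias is an additive term on the attention scores, and the stock Flash Attention 2 path in transformers does not accept such a bias. A minimal sketch of the general idea using PyTorch's scaled_dot_product_attention with the bias passed as a floating-point attn_mask; this is an illustration only, not the code from the linked project, the shapes are made up, and scale= requires PyTorch 2.1 or newer.

import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 6, 128, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# T5 adds a learned relative position bias of shape (1, heads, seq_len, seq_len)
# to the raw attention scores before softmax. Passing it as a float attn_mask
# lets a fused SDPA backend apply it (which backend is selected depends on the
# PyTorch build and the devices involved).
position_bias = torch.randn(1, heads, seq_len, seq_len)

# T5 does not scale its attention scores by 1/sqrt(head_dim), hence scale=1.0.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=position_bias, scale=1.0)
print(out.shape)  # (batch, heads, seq_len, head_dim)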