Mistral with flash attention 2 and right padding #26877

Closed
dakinggg opened this issue Oct 17, 2023 · 9 comments · Fixed by #26912 or huggingface/trl#1290

Comments

@dakinggg
Contributor

System Info

  • transformers version: 4.34.0
  • Platform: Linux-5.4.0-148-generic-x86_64-with-glibc2.31
  • Python version: 3.10.13
  • Huggingface_hub version: 0.17.3
  • Safetensors version: 0.4.0
  • Accelerate version: 0.20.3
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed

Who can help?

@younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

If you run a batch through mistral with flash attention 2 with right padding, you get

ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call `tokenizer.padding_side = 'left'` before tokenizing the input.

I am not doing generation, just calling forward. Is the error message incorrect, and you actually meant to prevent all usage of right padding here? Or is the implementation wrong, and this was meant to prevent only generate's use of right padding? Or perhaps I am missing something else. Thanks!
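
For what it's worth, a minimal repro is just a plain forward pass, no generate() involved. A sketch (the model id, dtype, and prompts here are illustrative; assumes a CUDA GPU with flash-attn installed):

```python
# Minimal repro sketch: a plain forward pass with right padding still raises.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # right padding is what trips the check

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,  # flag name as of transformers 4.34
).to("cuda")

batch = tokenizer(
    ["short prompt", "a noticeably longer prompt that forces padding"],
    padding=True,
    return_tensors="pt",
).to("cuda")

out = model(**batch)  # forward only, no generate(), still raises the ValueError
```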

Expected behavior

Either right padding is OK for calling forward, or the error message should correctly state the problem.

@imraviagrawal

Having the same issue:
ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.

@ArthurZucker
Collaborator

cc @younesbelkada

@younesbelkada
Contributor

Indeed, forward should be supported but not generation; I will raise a patch for this soon.
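
Once that lands, usage should look roughly like this (a sketch of the intended behaviour, not the actual patch; it reuses the model and tokenizer from the repro above and assumes `texts` and `prompts` are lists of strings):

```python
# Training / loss computation: right padding is fine for a plain forward pass.
tokenizer.padding_side = "right"
batch = tokenizer(texts, padding=True, return_tensors="pt").to("cuda")
# For brevity this does not mask pad tokens out of the labels.
loss = model(**batch, labels=batch["input_ids"]).loss

# Batched generation: left padding is still required with Flash Attention 2.
tokenizer.padding_side = "left"
prompts_batch = tokenizer(prompts, padding=True, return_tensors="pt").to("cuda")
generated = model.generate(**prompts_batch, max_new_tokens=32)
```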

@hengjiUSTC

hengjiUSTC commented Jan 10, 2024

Hey @younesbelkada, I am still seeing this error after setting tokenizer.padding_side = 'left'.
This is my demo notebook: https://colab.research.google.com/drive/1sVqbYEOqjJYl7CzNzXzviEBB6A984cMq?usp=sharing

Tokenizer is already set with left padding:
[Screenshot: tokenizer configured with padding_side = 'left']

Still have: ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.
[Screenshot: the same ValueError traceback]

Not sure if this is because of trl or something wrong within transformers?

transformers 4.36.2
trl 0.7.7
peft 0.6.0

@wenhuchen

I am having the same issue. Even after I set tokenizer.padding_side = 'left', this error still occurs during training.

@younesbelkada
Contributor

Thanks everyone for reporting. This might be an issue with TRL, I think; let me have a deeper look and get back ASAP.

@hengjiUSTC

> Thanks everyone for reporting. This might be an issue with TRL, I think; let me have a deeper look and get back ASAP.

I opened an issue in trl as well: huggingface/trl#1217 (comment)

@arkapal3

You need to set use_cache = False for both the main and reference model. See my comment here: huggingface/trl#1217 (comment)
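
Concretely, in a trl DPO setup that would look something like this (a sketch; the model id and arguments are illustrative):

```python
# Workaround sketch: disable the KV cache on both the policy model and the
# reference model before handing them to DPOTrainer.
import torch
from transformers import AutoModelForCausalLM

model_id = "mistralai/Mistral-7B-v0.1"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # transformers >= 4.36
)
model.config.use_cache = False

ref_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
ref_model.config.use_cache = False

# Both then go to DPOTrainer as usual, e.g.
# trainer = DPOTrainer(model=model, ref_model=ref_model, ...)
```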

@nguyen-brat

> You need to set use_cache = False for both the main and reference model. See my comment here: huggingface/trl#1217 (comment)

I tried your solution and it works like a charm, but does setting use_cache to False make tokenizer.padding_side = 'left' during evaluation and 'right' during training? I read the doc about use_cache here: link, but it seems like it just reduces inference time. Can you explain why it works like a charm?
