Mistral with flash attention 2 and right padding #26877

Closed
dakinggg opened this issue Oct 17, 2023 · 9 comments · Fixed by #26912 or huggingface/trl#1290

Comments

@dakinggg
Contributor

System Info

  • transformers version: 4.34.0
  • Platform: Linux-5.4.0-148-generic-x86_64-with-glibc2.31
  • Python version: 3.10.13
  • Huggingface_hub version: 0.17.3
  • Safetensors version: 0.4.0
  • Accelerate version: 0.20.3
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed

Who can help?

@younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

If you run a batch through mistral with flash attention 2 with right padding, you get

ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call `tokenizer.padding_side = 'left'` before tokenizing the input.

I am not doing generation, just calling forward. Is the error message incorrect, and you actually meant to prevent all usage of right padding here? Or is the implementation wrong, and this was meant to prevent only generate's use of right padding? Or perhaps I am missing something else. Thanks!
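
For what it's worth, a minimal repro is just a plain forward pass, no generate() involved. A sketch (the model id, dtype, and prompts here are illustrative; assumes a CUDA GPU with flash-attn installed):

```python
# Minimal repro sketch: a plain forward pass with right padding still raises.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # right padding is what trips the check

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,  # flag name as of transformers 4.34
).to("cuda")

batch = tokenizer(
    ["short prompt", "a noticeably longer prompt that forces padding"],
    padding=True,
    return_tensors="pt",
).to("cuda")

out = model(**batch)  # forward only, no generate(), still raises the ValueError
```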

Expected behavior

Either right padding is OK for calling forward, or the error message should correctly state the problem.

@imraviagrawal

Having the same issue:
ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.

@ArthurZucker
Collaborator

cc @younesbelkada

@younesbelkada
Contributor

Indeed, forward should be supported but not generation; I will raise a patch for this soon.
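
Once that lands, usage should look roughly like this (a sketch of the intended behaviour, not the actual patch; it reuses the model and tokenizer from the repro above and assumes `texts` and `prompts` are lists of strings):

```python
# Training / loss computation: right padding is fine for a plain forward pass.
tokenizer.padding_side = "right"
batch = tokenizer(texts, padding=True, return_tensors="pt").to("cuda")
# For brevity this does not mask pad tokens out of the labels.
loss = model(**batch, labels=batch["input_ids"]).loss

# Batched generation: left padding is still required with Flash Attention 2.
tokenizer.padding_side = "left"
prompts_batch = tokenizer(prompts, padding=True, return_tensors="pt").to("cuda")
generated = model.generate(**prompts_batch, max_new_tokens=32)
```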

@hengjiUSTC

hengjiUSTC commented Jan 10, 2024

Hey @younesbelkada, I am still seeing this error after setting tokenizer.padding_side = 'left'.
This is my demo notebook: https://colab.research.google.com/drive/1sVqbYEOqjJYl7CzNzXzviEBB6A984cMq?usp=sharing

Tokenizer is already set with left padding:
[Screenshot: tokenizer configured with padding_side = 'left']

Still have: ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.
[Screenshot: the same ValueError traceback]

Not sure if this is because of trl or something wrong within transformers?

transformers 4.36.2
trl 0.7.7
peft 0.6.0

@wenhuchen

I am having the same issue. Even after I set tokenizer.padding_side = 'left', this error still occurs during training.

@younesbelkada
Contributor

Thanks everyone for reporting. This might be an issue with TRL, I think; let me have a deeper look and get back ASAP.

@hengjiUSTC

> Thanks everyone for reporting. This might be an issue with TRL, I think; let me have a deeper look and get back ASAP.

I opened an issue in trl as well: huggingface/trl#1217 (comment)

@arkapal3

You need to set use_cache = False for both the main and reference model. See my comment here: huggingface/trl#1217 (comment)
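
Concretely, in a trl DPO setup that would look something like this (a sketch; the model id and arguments are illustrative):

```python
# Workaround sketch: disable the KV cache on both the policy model and the
# reference model before handing them to DPOTrainer.
import torch
from transformers import AutoModelForCausalLM

model_id = "mistralai/Mistral-7B-v0.1"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # transformers >= 4.36
)
model.config.use_cache = False

ref_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
ref_model.config.use_cache = False

# Both then go to DPOTrainer as usual, e.g.
# trainer = DPOTrainer(model=model, ref_model=ref_model, ...)
```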

@nguyen-brat

> You need to set use_cache = False for both the main and reference model. See my comment here: huggingface/trl#1217 (comment)

I tried your solution and it works like a charm, but does setting use_cache to False make tokenizer.padding_side = 'left' during evaluation and 'right' during training? I read the doc about use_cache here: link, but it seems like it just reduces inference time. Can you explain why it works like a charm?
