FIX: Fixes unexpected behaviour for Llava / Llama & AWQ fused modules + revert #30070 at the same time #30317
Conversation
Let's make sure to run at least the slow Llama tests before merging, so we can iterate fast if they fail.
# For AWQ fused + Llama we need to set `config._attn_implementation` = "custom" to avoid unexpected behavior and pass
# `None` attention mask to the fused attention modules as now the attention mask is dropped by our models and dealt
# by the `AttentionMaskConverter` module.
Suggested change:
- # For AWQ fused + Llama we need to set `config._attn_implementation` = "custom" to avoid unexpected behavior and pass
- # `None` attention mask to the fused attention modules as now the attention mask is dropped by our models and dealt
- # by the `AttentionMaskConverter` module.
+ # For AWQ fused + Llama we need to set `config._attn_implementation` = "custom" to avoid unexpected behaviors. We loop over the layers to make sure the vision/text config are
+ # modified only if some of their modules were fused
Nit, but this is more understandable.
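For context, a minimal sketch of what that per-layer loop could look like. This is illustrative only, not the actual patch: it assumes autoawq's QuantAttentionFused class and Llava-style attribute names (language_model / vision_tower, text_config / vision_config).

from awq.modules.fused.attn import QuantAttentionFused

def set_custom_attn_on_fused_configs(model):
    # Walk all submodules; only flip a sub-config to "custom" if one of its
    # own modules was actually fused, so an unfused vision/text tower keeps
    # its original attention implementation.
    for name, module in model.named_modules():
        if isinstance(module, QuantAttentionFused):
            if name.startswith("language_model") and hasattr(model.config, "text_config"):
                model.config.text_config._attn_implementation = "custom"
            elif name.startswith("vision_tower") and hasattr(model.config, "vision_config"):
                model.config.vision_config._attn_implementation = "custom"
            else:
                model.config._attn_implementation = "custom"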
if sliding_window is None or key_value_length < sliding_window:
    ignore_causal_mask = not is_tracing
Please apply this fix here: https://github.com/huggingface/transformers/pull/30311/files
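For reference, a simplified sketch of the logic around those two lines (function name and arguments are illustrative, not the exact AttentionMaskConverter code): the causal mask may only be dropped in favour of SDPA's built-in is_causal path when no sliding window would cut off part of the keys/values.

def can_ignore_causal_mask(attention_mask, key_value_length, sliding_window=None, is_tracing=False):
    # Dropping the mask is only valid when there is no explicit mask to honour
    # and the sliding window (if any) still covers the whole key/value length.
    if attention_mask is None and (sliding_window is None or key_value_length < sliding_window):
        # Under tracing (torch.jit / FX / dynamo export) we keep the mask to
        # avoid baking a data-dependent branch into the graph.
        return not is_tracing
    return False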
Please apply the above fix and LGTM
If you can run
from transformers import WhisperForCausalLM, WhisperForConditionalGeneration, WhisperProcessor
import torch
from datasets import load_dataset
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
assistant_model = WhisperForCausalLM.from_pretrained("distil-whisper/distil-large-v2")
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features
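# Assisted generation: passing the smaller Distil-Whisper model as `assistant_model`
# lets it draft candidate tokens that the main model then verifies.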
predicted_ids = model.generate(input_features, assistant_model=assistant_model)
# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
as well as the Llava AWQ test and the slow Llama tests, just to be sure.
Thanks @fxmarty @ArthurZucker! Just tested, everything seems to pass!
… + revert huggingface#30070 at the same time (huggingface#30317)
* Update awq.py
* style
* revert felix PR
* fix
* add felix comments
What does this PR do?
Fixes a silent behaviour introduced by a recent PR: passing a `None` attention mask results in unexpected behaviour for AWQ fused modules. The fix is simply to force-set a dummy `_attn_implementation` on the config objects of the modules that contain fused modules.

I can confirm the failing slow tests now pass with these changes.
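For anyone wanting to exercise the fixed path, a snippet along these lines should work (the checkpoint name and generation settings are illustrative; requires a GPU and autoawq installed). With `do_fuse=True` the attention modules are fused, which is exactly the case this PR fixes:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

# Enable AWQ module fusing; fuse_max_seq_len bounds the fused cache length.
quant_config = AwqConfig(do_fuse=True, fuse_max_seq_len=512)
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7B-AWQ")
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-AWQ",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    device_map="cuda:0",
)
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))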
cc @ArthurZucker @fxmarty