gpt2 with output_attentions=True has different attentions shape between flash and eager #33417

lapp0 commented Sep 11, 2024

System Info

  • transformers version: 4.44.2
  • Platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
  • Python version: 3.10.14
  • Huggingface_hub version: 0.24.6
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA GeForce RTX 4090

Who can help?

@ArthurZucker @gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

>>> import torch
>>> import transformers

>>> model_flash = transformers.AutoModelForCausalLM.from_pretrained("gpt2", device_map="cuda", attn_implementation="flash_attention_2", torch_dtype=torch.bfloat16)
>>> model_eager = transformers.AutoModelForCausalLM.from_pretrained("gpt2", device_map="cuda", attn_implementation="eager")

>>> input_ids = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]).to("cuda")

>>> eager_attns = model_eager(input_ids, output_attentions=True).attentions
>>> flash_attns = model_flash(input_ids, output_attentions=True).attentions

>>> len(eager_attns)
12
>>> len(flash_attns)
12

>>> eager_attns[0].shape
torch.Size([3, 12, 4, 4])
>>> flash_attns[0].shape
torch.Size([3, 4, 768])
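
A note on interpretation (my reading of the shapes, not confirmed by the maintainers): GPT-2 has 12 attention heads and a hidden size of 768, so the flash tensor's shape (3, 4, 768) matches (batch, seq_len, hidden_size), i.e. it looks like the layer's attention output / hidden states rather than attention probabilities, which eager returns as (batch, num_heads, seq_len, seq_len). A quick dimension check (assumed, not part of the original session):

>>> model_eager.config.n_head, model_eager.config.n_embd
(12, 768)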

Expected behavior

output_attentions=True should result in an error for GPT2FlashAttention2.

Additionally, I'd like to understand what is actually being returned here.
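
As a workaround sketch (my suggestion, not part of the original report): if the real attention weights are needed, loading the model with the eager implementation avoids the ambiguity, since flash attention never materializes the full attention matrix.

import torch
import transformers

# Workaround sketch (hypothetical usage, based on the eager run above):
# force eager attention whenever output_attentions=True is required.
model = transformers.AutoModelForCausalLM.from_pretrained(
    "gpt2",
    device_map="cuda",
    attn_implementation="eager",
)
input_ids = torch.tensor([[1, 2, 3, 4]]).to("cuda")
attns = model(input_ids, output_attentions=True).attentions
# Each element has shape (batch, num_heads, seq_len, seq_len),
# e.g. torch.Size([1, 12, 4, 4]) for this input.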

@lapp0 lapp0 added the bug label Sep 11, 2024