[Flash Attention 2] Add flash attention 2 for GPT-Neo-X #26463
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
LGTM but left a few nits
query = query.to(torch.float16)
key = key.to(torch.float16)
value = value.to(torch.float16)
We should take into account bfloat16 here as well.
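To address the bfloat16 nit, here is a minimal sketch of inferring the compute dtype from the module's weights instead of hard-coding float16. The helper name and the assumption that the fused QKV projection is at hand are mine, not the PR's actual code:

```python
import torch
from torch import nn


def fa2_target_dtype(qkv_proj: nn.Linear, default: torch.dtype = torch.float16) -> torch.dtype:
    """Hypothetical helper: pick the dtype to run flash attention in.

    Infers the compute dtype from the fused QKV projection weights so that
    bfloat16 models stay in bfloat16 instead of being forced to float16.
    """
    dtype = qkv_proj.weight.dtype
    # If the weights were upcast to float32 (e.g. for training stability),
    # fall back to the half-precision default.
    return default if dtype == torch.float32 else dtype


# Usage inside the attention forward pass (attribute names are illustrative):
# target_dtype = fa2_target_dtype(self.query_key_value)
# query, key, value = (t.to(target_dtype) for t in (query, key, value))
```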
# In PEFT, we usually cast the layer norms in float32 for training stability reasons,
# therefore the input hidden states get silently cast to float32. Hence, we need to
# cast them back to float16 just to be sure everything works as expected.
# This might slow down training & inference, so it is recommended to not cast the LayerNorms
# to fp32. (LlamaRMSNorm handles it correctly)
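For context, a sketch of the cast-back guard this comment describes, assuming the flash attention path can receive float32 tensors when PEFT keeps the layer norms in float32 (the function name is hypothetical):

```python
import torch


def maybe_downcast_for_flash_attn(query, key, value, target_dtype=torch.float16):
    """Hypothetical helper: undo a silent float32 upcast before flash attention.

    Flash attention kernels only accept float16/bfloat16 inputs, so if the
    hidden states were upcast (e.g. by float32 layer norms under PEFT), cast
    query/key/value back to the intended compute dtype.
    """
    if query.dtype == torch.float32:
        query = query.to(target_dtype)
        key = key.to(target_dtype)
        value = value.to(target_dtype)
    return query, key, value
```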
Is this also true for GPTNeoX? (Comment is the same as Llama 😓 )
Any plans on completing this or should someone else pick it up? For what it's worth, this implementation is working very well for me 👍
cc @amyeroberts let me know if I need to address anything else in this PR!
Checking on the progress here. What's the ETA on merging this with the main branch? Thanks!
LGTM - thanks for adding!
Just needs a performance example to be added to the docs before merging
What does this PR do?
Adds flash attention support for GPT-Neo-X
Fixes: #26444
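For reference, enabling the new path from user code would look roughly like the sketch below; the checkpoint is just an example GPT-NeoX model, and flash attention 2 additionally requires a supported GPU plus the flash-attn package. Recent transformers versions use attn_implementation="flash_attention_2"; around the time of this PR the flag was use_flash_attention_2=True.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-1.4b"  # example GPT-NeoX checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                # flash attention 2 needs fp16 or bf16
    attn_implementation="flash_attention_2",
).to("cuda")

inputs = tokenizer("Flash attention lets GPT-NeoX handle long prompts", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```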