Llama: always convert the causal mask in the SDPA code path #29663

gante · 2024-03-14T20:24:07Z

What does this PR do?

Removes the `if` condition that gates applying the user-defined `attention_mask` to `causal_mask`: the condition is not required for correctness, and it prevents correct left-padding behavior in compile mode (related PR: #29374).
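
For readers unfamiliar with why the conversion can run unconditionally, here is a minimal, framework-free sketch. It is not the code changed in this PR: the helper names (`build_causal_mask`, `apply_padding`, `unmask_fully_masked_rows`, `convert_mask`) are hypothetical, and plain Python lists stand in for tensors. It assumes the conversion amounts to merging a padding mask into the causal mask and then opening up query rows that end up fully masked, which left padding tends to produce.

```python
# Illustrative only: hypothetical helpers, plain lists instead of tensors.
MASKED = float("-inf")  # additive mask value for "do not attend"


def build_causal_mask(seq_len):
    """Row q may attend to key k only when k <= q."""
    return [[0.0 if k <= q else MASKED for k in range(seq_len)]
            for q in range(seq_len)]


def apply_padding(causal_mask, attention_mask):
    """Additionally mask every key position whose attention_mask entry is 0."""
    return [[MASKED if attention_mask[k] == 0 else value
             for k, value in enumerate(row)]
            for row in causal_mask]


def unmask_fully_masked_rows(mask):
    """Open up rows that attend to nothing (assumed here to be padded queries)."""
    return [[0.0] * len(row) if all(v == MASKED for v in row) else row
            for row in mask]


def convert_mask(attention_mask):
    """Run the whole conversion without first checking whether padding exists."""
    causal = build_causal_mask(len(attention_mask))
    return unmask_fully_masked_rows(apply_padding(causal, attention_mask))


if __name__ == "__main__":
    # Left-padded sequence: the first two positions are padding.
    for row in convert_mask([0, 0, 1, 1]):
        print(row)
```

With an all-ones `attention_mask`, `convert_mask` returns the plain causal mask unchanged, so skipping the conversion in that case saves work but does not change the result. A data-dependent check of that kind is also the sort of branch that is awkward to trace in compile mode.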

I observed no performance degradation with either the eager dynamic cache or the compiled static cache 🙌

HuggingFaceDocBuilderDev · 2024-03-14T20:47:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

fxmarty

LGTM

amyeroberts

Thanks!

gante requested a review from fxmarty March 14, 2024 20:24

fxmarty approved these changes Mar 18, 2024

gante requested a review from amyeroberts March 19, 2024 16:50

gante added 2 commits March 19, 2024 16:51

always convert the mask

a90da39

rebase and fix copies

99edde5

gante force-pushed the always_convert_mask branch from 7fbcb22 to 99edde5 Compare March 19, 2024 16:53

amyeroberts approved these changes Mar 21, 2024

gante merged commit ee38fc3 into huggingface:main Mar 21, 2024
19 checks passed

gante deleted the always_convert_mask branch March 21, 2024 16:30
