Disable FlashAttention for is_causal=True when seqlen q not equal kv #111007
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111007
Note: links to docs will display an error until the docs builds have completed.
✅ No failures as of commit be9bb0e with merge base 652f4c6.
(This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Force-pushed from 5dfadec to 044a820
Force-pushed from e9f3845 to be9bb0e
@pytorchbot merge
Merge started: the change will be merged once all checks pass (ETA 0-4 hours).
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot merge -f "unrelated failures"
@pytorchbot merge
Merge started: the change will be merged once all checks pass (ETA 0-4 hours).
Commit message from follow-up PR #111886, which referenced this pull request:
Summary
We were restricted from updating to the newest version of FlashAttention because of the changes to is_causal described in #108108. Prior to that PR we landed #111007, which enabled us to update beyond 9e5e8bc91e on FlashAttention v2. With #111886 we have updated to this commit: Dao-AILab/flash-attention@02ac572, i.e. tag 2.3.2.
Plans
Following that PR I plan to work more on #110681 in order to expose a CausalVariant attn_mask, with the potential for also exposing a kvcache attn_mask.
Pull Request resolved: #111886
Approved by: https://github.com/cpuhrsch
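The CausalVariant attn_mask mentioned above did not exist at the time of this PR. As a rough sketch only (not the API proposed in #110681), a lower-right-aligned causal mask can already be expressed as an explicit boolean attn_mask; note that explicit masks are not served by the FlashAttention backend, so this routes to the memory-efficient or math kernels.

```python
import torch
import torch.nn.functional as F

def lower_right_causal_mask(q_len: int, kv_len: int, device=None) -> torch.Tensor:
    # Bool mask where True means "may attend". Pinning the diagonal to the
    # bottom-right corner lets the last query token see every key, which is
    # what FlashAttention 2 calls lower_right causal alignment.
    offset = kv_len - q_len
    return torch.ones(q_len, kv_len, dtype=torch.bool, device=device).tril(diagonal=offset)

# Toy inputs: 2 query tokens attending over an 8-token KV sequence.
q = torch.randn(1, 4, 2, 64)   # (batch, heads, q_len, head_dim)
k = torch.randn(1, 4, 8, 64)
v = torch.randn(1, 4, 8, 64)

mask = lower_right_causal_mask(q_len=2, kv_len=8)
# An explicit attn_mask cannot be combined with is_causal=True, and it
# dispatches to the memory-efficient or math backend rather than flash.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask, is_causal=False)
```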
Summary:
This pull request removes support for non-square sequence lengths in causal attention when using FlashAttention V2.
Why are we doing this
// FlashAttention 2 updated the default mask meaning for causal in this PR:
// 9e5e8bc91e. It is now aligned to lower_right, which would be a BC break
// for non-square masks. We will not support non-square masks for causal w/ FAV2.
For more context see:
#108108
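To make the BC break concrete, the following sketch (illustrative only, not code from this PR) builds both mask variants and shows that they coincide for square shapes but diverge as soon as seqlen_q != seqlen_kv:

```python
import torch

def causal_mask(q_len: int, kv_len: int, align: str) -> torch.Tensor:
    # True = attend. upper_left pins the diagonal to the top-left corner
    # (query i sees keys 0..i); lower_right pins it to the bottom-right
    # corner (query i sees keys 0..i + kv_len - q_len), matching FAv2.
    offset = 0 if align == "upper_left" else kv_len - q_len
    return torch.ones(q_len, kv_len, dtype=torch.bool).tril(diagonal=offset)

# Square case: both alignments agree, so is_causal=True is unambiguous.
assert torch.equal(causal_mask(4, 4, "upper_left"), causal_mask(4, 4, "lower_right"))

# Non-square case (e.g. 2 new queries against 8 keys): the masks differ,
# which is why is_causal=True with unequal lengths no longer routes to FAv2.
print(causal_mask(2, 8, "upper_left").int())
print(causal_mask(2, 8, "lower_right").int())
```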
Followup
A large number of people will likely want to use FAV2 with lower_right causal attention for unequal sequence lengths. See this RFC: #110681
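For completeness, here is a minimal sketch of the resulting dispatch behavior, assuming PyTorch 2.1+ on a FlashAttention-capable GPU: with default settings SDPA quietly falls back to another backend for this case, while restricting dispatch to the flash backend alone is expected to raise an error.

```python
import torch
import torch.nn.functional as F

if torch.cuda.is_available():
    # FlashAttention requires CUDA and half-precision inputs.
    q = torch.randn(1, 4, 2, 64, device="cuda", dtype=torch.float16)  # seqlen_q = 2
    k = torch.randn(1, 4, 8, 64, device="cuda", dtype=torch.float16)  # seqlen_kv = 8
    v = torch.randn_like(k)

    # Default dispatch: is_causal=True with seqlen_q != seqlen_kv falls back
    # to the memory-efficient or math backend after this PR.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

    # Forcing the flash backend should now refuse this configuration.
    try:
        with torch.backends.cuda.sdp_kernel(
            enable_flash=True, enable_math=False, enable_mem_efficient=False
        ):
            F.scaled_dot_product_attention(q, k, v, is_causal=True)
    except RuntimeError as e:
        print("flash backend unavailable for non-square causal:", e)
```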