
Conversation

@drisspg (Contributor) commented Oct 11, 2023

Summary:

This pull request removes support for non-square sequence lengths in causal attention when using FlashAttention V2.

Why are we doing this

// FlashAttention 2 updated the default mask meaning for causal in this PR:
// 9e5e8bc91e it is now aligned to lower_right which would be a BC break
// for non-square masks. We will not support non-square masks for causal w/ FAV2

For more context, see #108108.
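
The difference between the two conventions is easiest to see by materializing both masks for a non-square score matrix. The helpers below are a minimal illustrative sketch (not PyTorch's or FlashAttention's internal code), using the usual convention that position (i, j) is True when query i may attend to key j:

```python
# Minimal sketch of the two causal-mask alignments; illustrative only,
# not the kernels' actual mask construction.
import torch

def causal_mask_upper_left(s_q: int, s_kv: int) -> torch.Tensor:
    # Previous default: anchored at the top-left corner,
    # so query i attends to keys 0..i.
    return torch.tril(torch.ones(s_q, s_kv, dtype=torch.bool))

def causal_mask_lower_right(s_q: int, s_kv: int) -> torch.Tensor:
    # FlashAttention 2 default after commit 9e5e8bc91e: anchored at the
    # bottom-right corner, so the last query attends to every key.
    return torch.tril(torch.ones(s_q, s_kv, dtype=torch.bool), diagonal=s_kv - s_q)

s_q, s_kv = 2, 4
print(causal_mask_upper_left(s_q, s_kv))
# tensor([[ True, False, False, False],
#         [ True,  True, False, False]])
print(causal_mask_lower_right(s_q, s_kv))
# tensor([[ True,  True,  True, False],
#         [ True,  True,  True,  True]])
# The shapes match but the semantics differ, so silently switching the
# default would change results for existing non-square callers (a BC break).
```

For s_q == s_kv the two functions return identical masks, which is why square-shaped callers are unaffected by the upstream change.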

Followup

Many users will likely want to use FAV2 with lower_right causal attention for unequal sequence lengths; see the RFC in #110681.
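
For reference, a hedged sketch of the user-visible effect of this PR: it assumes a CUDA build with the flash backend available and uses the torch.backends.cuda.sdp_kernel context manager from this era of PyTorch; the printed message is illustrative, not the exact error text.

```python
# Sketch only: after this PR, is_causal=True with seq_len_q != seq_len_kv
# should no longer dispatch to FlashAttention V2. Requires a CUDA device.
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 2, 64, device="cuda", dtype=torch.float16)  # seq_len_q = 2
k = torch.randn(2, 8, 4, 64, device="cuda", dtype=torch.float16)  # seq_len_kv = 4
v = torch.randn_like(k)

# Forcing the flash backend makes the new constraint observable: with no
# other backend enabled, the non-square causal call is expected to fail.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, k, v, is_causal=True)
    except RuntimeError as err:
        print("flash backend rejected non-square causal input:", err)

# With default backend selection the same call still works; dispatch simply
# falls back to another kernel (memory-efficient or math).
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 2, 64])
```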

@pytorch-bot (bot) commented Oct 11, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111007

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit be9bb0e with merge base 652f4c6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@drisspg marked this pull request as ready for review on October 19, 2023 18:23
@drisspg force-pushed the disable_flash_on_seqlen_q_not_equal_kv branch from 5dfadec to 044a820 on October 19, 2023 18:38
@drisspg requested a review from cpuhrsch on October 19, 2023 20:27
@drisspg added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Oct 19, 2023
@drisspg force-pushed the disable_flash_on_seqlen_q_not_equal_kv branch from e9f3845 to be9bb0e on October 20, 2023 01:24
@drisspg (Contributor, Author) commented Oct 20, 2023

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@drisspg (Contributor, Author) commented Oct 20, 2023

@pytorchbot merge -f "unrelated failures"

@drisspg (Contributor, Author) commented Oct 23, 2023

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Oct 25, 2023
# Summary
We were blocked from updating to the newest version of FlashAttention because of the changes to is_causal described in #108108.

Prior to this PR we landed #111007, which enabled us to update beyond commit 9e5e8bc91e of FlashAttention V2.

With this PR we have updated to commit Dao-AILab/flash-attention@02ac572 (tag 2.3.2).

## Plans
Following this PR I plan to work more on #110681 in order to expose a CausalVariant attn_mask, with the potential for also exposing a kv-cache attn_mask.

Pull Request resolved: #111886
Approved by: https://github.com/cpuhrsch
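
As a purely illustrative sketch of the direction that RFC points in (CausalVariant and the helper below are hypothetical names used for illustration, not a confirmed PyTorch API at the time of this PR):

```python
# Hypothetical sketch: a tagged causal mask that records its alignment, so a
# dispatcher could route LOWER_RIGHT directly to FAv2's native causal path
# instead of materializing a dense boolean mask. Names are illustrative only.
from enum import Enum, auto
import torch

class CausalVariant(Enum):
    UPPER_LEFT = auto()   # classic causal mask, anchored at the top-left corner
    LOWER_RIGHT = auto()  # FAv2-style mask, anchored at the bottom-right corner

def materialize_causal_mask(variant: CausalVariant, s_q: int, s_kv: int) -> torch.Tensor:
    # Fallback path for backends that need an explicit dense mask.
    diagonal = 0 if variant is CausalVariant.UPPER_LEFT else s_kv - s_q
    return torch.tril(torch.ones(s_q, s_kv, dtype=torch.bool), diagonal=diagonal)
```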
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, topic: not user facing
