Require less alignment for attn bias (#114173) #114837
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114837
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 6009c5e with merge base 138e289. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Failures are from a bad cherry pick; I will fix up this morning.
@@ -6441,6 +6441,52 @@ def fn_or(x, y):
            (torch.randn(32), torch.randn(32)),
        )

    @requires_cuda()
    @unittest.skipIf(
        not PLATFORM_SUPPORTS_FUSED_SDPA,
On the main PR this is called PLATFORM_SUPPORTS_MEM_EFF_ATTENTION; why is it named PLATFORM_SUPPORTS_FUSED_SDPA on the cherry pick?
That flag was added in between releases and is not in this cherry pick. That being said, PLATFORM_SUPPORTS_FUSED_SDPA is equivalent to PLATFORM_SUPPORTS_MEM_EFF_ATTENTION in the 2.1 release.
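For reference, a minimal sketch of how the gate is used around the new test on the 2.1 release branch. The import path and the test body are assumptions here, not the actual test file; on main the flag was later renamed to PLATFORM_SUPPORTS_MEM_EFF_ATTENTION.

```python
# Sketch only: gating an SDPA test on platform support in the 2.1 branch.
# The import path below is assumed; adjust to wherever the flag lives
# in this branch's test utilities.
import unittest

import torch
from torch.testing._internal.common_cuda import PLATFORM_SUPPORTS_FUSED_SDPA


class TestSDPAPatternRewriter(unittest.TestCase):
    @unittest.skipIf(not torch.cuda.is_available(), "requires CUDA")
    @unittest.skipIf(
        not PLATFORM_SUPPORTS_FUSED_SDPA,
        "Platform does not support fused scaled_dot_product_attention",
    )
    def test_sdpa_unaligned_mask(self):
        # Body elided; the real test is added in the diff above.
        ...
```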
Merging this commit into the patch:
Improved Fix for Attention Mask Alignment Issue (#112577)
This PR addresses Issue #112577 by refining the previously implemented fix, which was found to be incorrect and caused unneeded memory regressions. The update simplifies how the alignment of the attention mask is handled for mem-eff attention.
Alignment Check and Padding: Initially, the alignment of the attention mask is checked. If misalignment is detected, padding is applied, followed by slicing. During this process, a warning is raised to alert users.
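As a rough illustration of the pad-then-slice idea (a minimal sketch, not the actual ATen code; the alignment constant, function name, and warning text are assumptions):

```python
# Minimal sketch of padding a misaligned attention bias and slicing back.
# ALIGNMENT and pad_bias_to_alignment are illustrative names, not PyTorch API.
import warnings

import torch
import torch.nn.functional as F

ALIGNMENT = 8  # elements; assumed last-dim alignment required by the kernel


def pad_bias_to_alignment(attn_bias: torch.Tensor) -> torch.Tensor:
    last = attn_bias.size(-1)
    if attn_bias.stride(-1) == 1 and attn_bias.stride(-2) % ALIGNMENT == 0:
        return attn_bias  # already aligned, no copy needed
    warnings.warn("attn_bias is not aligned; materializing a padded copy")
    padded_len = (last + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT
    # Pad the last dim up to the aligned length, then slice back to the
    # original size: the view keeps the original shape while the underlying
    # row stride becomes a multiple of ALIGNMENT.
    padded = F.pad(attn_bias, (0, padded_len - last))
    return padded[..., :last]


mask = torch.randn(2, 8, 128, 127)      # last dim 127 -> misaligned
aligned = pad_bias_to_alignment(mask)
print(aligned.shape, aligned.stride())  # same shape, aligned row stride
```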
Should this be warn_once?
We only call expand once, on the aligned mask.
Reference: https://github.com/facebookresearch/xformers/blob/main/xformers/ops/fmha/cutlass.py#L115
@albanD, @mruberry, @jbschlosser, @walterddr, and @mikaylagawarecki.
Pull Request resolved: #114173
Approved by: https://github.com/danthe3rd
Fixes #ISSUE_NUMBER
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler