
Require less alignment for attn bias (#114173) #114837

Merged
1 commit merged on Dec 5, 2023

Conversation

@drisspg (Contributor) commented Nov 30, 2023

Improved Fix for Attention Mask Alignment Issue (#112577)

This PR addresses Issue #112577 by refining the previously implemented fix, which was found to be incorrect and caused unneeded memory regressions. The update simplifies how the attention mask is aligned for memory-efficient attention.

Alignment Check and Padding: The alignment of the attention mask is checked first. If it is misaligned, padding is applied and the result is sliced back to the original size (sketched below). During this process, a warning is raised to alert users.

Should this be warn_once?

We only call expand once, on the aligned mask.
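For illustration, here is a minimal sketch of the pad-then-slice approach described above. The function name, the ALIGNMENT constant of 8 elements, and the warning text are assumptions for this example, not the exact kernel requirement or the code in this PR:

```python
import warnings

import torch
import torch.nn.functional as F

ALIGNMENT = 8  # assumed last-dim alignment; the real kernel requirement may differ


def _pad_bias_if_misaligned(attn_bias: torch.Tensor) -> torch.Tensor:
    """Pad the last dim up to a multiple of ALIGNMENT, then slice back.

    Padding materializes a copy whose row stride is aligned; the trailing
    slice restores the logical shape while keeping that aligned stride.
    """
    last = attn_bias.size(-1)
    if last % ALIGNMENT == 0:
        return attn_bias
    warnings.warn(
        "Attention bias is not aligned; materializing an aligned copy. "
        "Allocate the bias with an aligned last dimension to avoid this."
    )
    pad_len = ALIGNMENT - last % ALIGNMENT
    padded = F.pad(attn_bias, (0, pad_len))
    # The slice is a view into the padded buffer, so the underlying
    # row stride stays a multiple of ALIGNMENT.
    return padded[..., :last]
```

The slice keeps the original logical shape, so downstream broadcasting and expand behave exactly as before; only the storage stride changes.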

Reference
https://github.com/facebookresearch/xformers/blob/main/xformers/ops/fmha/cutlass.py#L115

@albanD, @mruberry, @jbschlosser, @walterddr, and @mikaylagawarecki.

Pull Request resolved: #114173
Approved by: https://github.com/danthe3rd

Fixes #ISSUE_NUMBER

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler


pytorch-bot bot commented Nov 30, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114837

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6009c5e with merge base 138e289:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@drisspg (Contributor, Author) commented Nov 30, 2023

The failures are from a bad cherry-pick; I will fix this up this morning.

@@ -6441,6 +6441,52 @@ def fn_or(x, y):
(torch.randn(32), torch.randn(32)),
)

@requires_cuda()
@unittest.skipIf(
not PLATFORM_SUPPORTS_FUSED_SDPA,
@atalman (Contributor) commented Nov 30, 2023


In the main PR this is called PLATFORM_SUPPORTS_MEM_EFF_ATTENTION; why is it named PLATFORM_SUPPORTS_FUSED_SDPA in the cherry pick?

@drisspg (Contributor, Author) replied:

That flag was added in between releases and is not in this cherry pick. That being said, PLATFORM_SUPPORTS_FUSED_SDPA is equivalent to PLATFORM_SUPPORTS_MEM_EFF_ATTENTION in the 2.1 release.
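For context, a hedged sketch of how the test gate reads on the release branch under the equivalence described above; the import path, class name, and skip message are assumptions for illustration, not taken from this PR:

```python
import unittest

from torch.testing._internal.common_cuda import PLATFORM_SUPPORTS_FUSED_SDPA


class TestSDPAMaskAlignment(unittest.TestCase):
    @unittest.skipIf(
        not PLATFORM_SUPPORTS_FUSED_SDPA,
        "Platform does not support fused SDPA",
    )
    def test_unaligned_attn_bias(self):
        # On the 2.1 release branch this flag gates the same platforms that
        # PLATFORM_SUPPORTS_MEM_EFF_ATTENTION gates on main.
        ...
```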

@drisspg (Contributor, Author) commented Nov 30, 2023

Merging this into this patch: #114909
