[SDPA] update type hint for scaled_dot_product_attention and documentation #94008
Conversation
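For context, the PR updates the Python-facing type hints and docs for `torch.nn.functional.scaled_dot_product_attention`. A minimal sketch of the hinted signature under discussion (parameter names follow the public docs; the defaults here are illustrative, not necessarily the exact diff):

```python
from typing import Optional
from torch import Tensor

def scaled_dot_product_attention(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_mask: Optional[Tensor] = None,
    dropout_p: float = 0.0,
    is_causal: bool = False,
) -> Tensor:
    # Signature sketch only; the real function dispatches to fused
    # kernels or the math fallback discussed later in this thread.
    ...
```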
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94008. Note: links to docs will display an error until the docs builds have completed. ❗ There are 2 currently active SEVs. ✅ No failures as of commit 2a93079. This comment was automatically generated by Dr. CI and updates every 15 minutes.

This PR needs to be approved by an authorized maintainer before merge.
Force-pushed from d764c8b to e0e1f69.
Force-pushed from 0c69708 to 86fe908.
\text{attn\_mask} = \text{torch.ones}(L, S, \text{dtype=torch.bool}).\text{tril}(\text{diagonal}=0) \quad \text{if is\_causal}
\text{attn\_mask} = \text{attn\_mask.masked\_fill}(\lnot \text{attn\_mask}, -\infty) \quad \text{if attn\_mask.dtype} = \text{torch.bool}
\text{attn\_weight} = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}} + \text{attn\_mask}\right)
\text{attn\_weight} = \text{dropout}(\text{attn\_weight}, \text{dropout\_p})
\text{return } \text{attn\_weight} \cdot V

Curious whether we think this LaTeX math should be added to the (admittedly long) docstring to explain the math fallback.
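For readers following along, a minimal Python sketch of that math fallback, written directly from the equations above (shapes and names mirror the equations; this is a reference illustration, not the exact code in the PR):

```python
import math
import torch

def sdpa_math_fallback(query, key, value, attn_mask=None,
                       dropout_p=0.0, is_causal=False):
    # L: target sequence length, S: source sequence length.
    L, S = query.size(-2), key.size(-2)
    if is_causal:
        # Lower-triangular mask so position i attends only to j <= i.
        attn_mask = torch.ones(L, S, dtype=torch.bool).tril(diagonal=0)
    if attn_mask is not None and attn_mask.dtype == torch.bool:
        # Convert the boolean mask to additive form: False -> -inf.
        attn_mask = torch.zeros(L, S, dtype=query.dtype).masked_fill(
            ~attn_mask, float("-inf"))
    # Scaled dot-product scores: QK^T / sqrt(d_k), plus the additive mask.
    attn_weight = query @ key.transpose(-2, -1) / math.sqrt(query.size(-1))
    if attn_mask is not None:
        attn_weight = attn_weight + attn_mask
    attn_weight = torch.softmax(attn_weight, dim=-1)
    attn_weight = torch.dropout(attn_weight, dropout_p, train=True)
    return attn_weight @ value
```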
Force-pushed from 61983fb to 36f569f.
Should I add a warn-once here? Warn-onces are kind of annoying, but we might want to very directly point users to the docs.
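If we did go that route, one minimal sketch of a warn-once (hypothetical helper and message text; a module-level flag guards against repeat emission):

```python
import warnings

_SDPA_DOCS_WARNED = False  # module-level guard for a one-time warning

def _warn_once_about_docs():
    """Hypothetical warn-once pointing users at the SDPA docs."""
    global _SDPA_DOCS_WARNED
    if not _SDPA_DOCS_WARNED:
        _SDPA_DOCS_WARNED = True
        warnings.warn(
            "scaled_dot_product_attention: see the torch.nn.functional "
            "documentation for the dispatch rules and math fallback.",
            UserWarning,
        )
```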
Force-pushed from f744c3a to 55cbd55.
Nice work! I added the usual nitpicky grammar / formatting comments for public-facing docs, nothing major.
FYI there's a style guide here for consistency in formatting module docs. I realize this is not a module, but maybe some of the content there will be useful for maintaining consistency.
Force-pushed from 6c25dcc to d6c8538.
Force-pushed from d6c8538 to 5833a6d.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

The merge job was canceled. If you believe this is a mistake, then you can re-trigger it through pytorch-bot.

@pytorchbot merge -f "all checks are passing"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary
cc @svekars @carljparker