
[SDPA] update type hint for scaled_dot_product_attention and documentation #94008

Closed
wants to merge 21 commits into from

Conversation

drisspg
Contributor

@drisspg drisspg commented Feb 2, 2023

Summary

  • Adds type hinting support for SDPA
  • Updates the documentation, adding warnings and notes on the context manager (see the usage sketch below)
  • Adds scaled_dot_product_attention to the non-linear activation function section of nn.functional docs
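
For context, a minimal usage sketch of the surface being documented. This assumes the 2.0-era API, i.e. `torch.nn.functional.scaled_dot_product_attention` and the `torch.backends.cuda.sdp_kernel` context manager; shapes and flag values are illustrative only:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim).
query = torch.randn(2, 8, 128, 64)
key = torch.randn(2, 8, 128, 64)
value = torch.randn(2, 8, 128, 64)

# Plain call; with this PR the function carries a proper type hint (returns a Tensor).
out = F.scaled_dot_product_attention(query, key, value, is_causal=True)

# The context manager covered by the new documentation notes restricts which
# backends the dispatcher may pick (flash / memory-efficient / math).
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(query, key, value, is_causal=True)
```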

cc @svekars @carljparker

@pytorch-bot

pytorch-bot bot commented Feb 2, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94008

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit 2a93079:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Skylion007 Skylion007 changed the title from "[SDPA] update type hint structure for scaled_dot_proudct_attention" to "[SDPA] update type hint structure for scaled_dot_product_attention" Feb 2, 2023
@pytorch-bot

pytorch-bot bot commented Feb 6, 2023

This PR needs to be approved by an authorized maintainer before merge.

@drisspg drisspg added the module: docs Related to our documentation, both in docs/ and docblocks label Feb 6, 2023
@drisspg drisspg changed the title from "[SDPA] update type hint structure for scaled_dot_product_attention" to "[SDPA] update type hint for scaled_dot_product_attention and documentation" Feb 6, 2023
@drisspg drisspg force-pushed the add_typehints_to_sdpa branch 3 times, most recently from 0c69708 to 86fe908 Compare February 6, 2023 23:00
@drisspg
Contributor Author

drisspg commented Feb 7, 2023

$$\text{attn\_mask} = \text{torch.ones}(L, S, \text{dtype=torch.bool}).\text{tril}(\text{diagonal}=0) \quad \text{if is\_causal}$$

$$\text{attn\_mask} = \text{attn\_mask.masked\_fill}(\lnot\,\text{attn\_mask}, -\infty) \quad \text{if attn\_mask.dtype} == \text{torch.bool}$$

$$\text{attn\_weight} = \text{torch.softmax}\left(\frac{QK^T}{\sqrt{d_k}} + \text{attn\_mask}\right)$$

$$\text{attn\_weight} = \text{torch.dropout}(\text{attn\_weight}, \text{dropout\_p})$$

$$\text{return } \text{torch.matmul}(\text{attn\_weight}, V)$$

[Screenshot of the rendered math, 2023-02-06]

Curious whether we think this LaTeX math should be added to the (already long) docstring to explain the math fallback.
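
For reference, a straight-line Python sketch of the same math. This is just the formulas above transcribed into eager ops, not the actual math backend, and `sdpa_math_fallback` is a made-up name:

```python
import math
import torch

def sdpa_math_fallback(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False):
    L, S = query.size(-2), key.size(-2)
    # Build a lower-triangular boolean mask when is_causal is set.
    if is_causal:
        attn_mask = torch.ones(L, S, dtype=torch.bool, device=query.device).tril(diagonal=0)
    # Convert a boolean mask into an additive float mask (-inf where masked out).
    if attn_mask is not None and attn_mask.dtype == torch.bool:
        attn_mask = torch.zeros_like(attn_mask, dtype=query.dtype).masked_fill(~attn_mask, float("-inf"))
    # softmax(QK^T / sqrt(d_k) + attn_mask)
    attn_weight = query @ key.transpose(-2, -1) / math.sqrt(query.size(-1))
    if attn_mask is not None:
        attn_weight = attn_weight + attn_mask
    attn_weight = torch.softmax(attn_weight, dim=-1)
    attn_weight = torch.dropout(attn_weight, dropout_p, train=True)
    return attn_weight @ value
```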

@drisspg drisspg force-pushed the add_typehints_to_sdpa branch 2 times, most recently from 61983fb to 36f569f Compare February 7, 2023 04:33
@drisspg
Contributor Author

drisspg commented Feb 7, 2023

Should I add a warn-once to scaled_dot_product_efficient_attention and scaled_dot_product_flash_attention saying:
"Memory-efficient/FlashAttention SDPA is a beta feature. See the documentation for torch.nn.scaled_dot_product_attention for further information"

Warn-onces are kind of annoying, but we might want to point users very directly to the docs.
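
If we go that route, something along these lines is what I have in mind (a Python-level sketch only; the real guard would more likely use `TORCH_WARN_ONCE` on the C++ side, and the helper name here is made up):

```python
import warnings
from functools import lru_cache

@lru_cache(maxsize=None)
def _warn_sdpa_beta_once(backend: str) -> None:
    # lru_cache makes this fire at most once per backend name per process.
    warnings.warn(
        f"{backend} SDPA is a beta feature. See the documentation for "
        "torch.nn.functional.scaled_dot_product_attention for further information.",
        UserWarning,
    )

# e.g. _warn_sdpa_beta_once("Memory-efficient") or _warn_sdpa_beta_once("FlashAttention")
```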

@drisspg drisspg force-pushed the add_typehints_to_sdpa branch 3 times, most recently from f744c3a to 55cbd55 Compare February 8, 2023 03:04
[8 resolved inline review threads on torch/nn/functional.py]
Contributor

@jbschlosser jbschlosser left a comment


Nice work! I added the usual nitpicky grammar / formatting comments for public-facing docs, nothing major.

FYI there's a style guide here for consistency in formatting module docs. I realize this is not a module, but maybe some of the content there will be useful for maintaining consistency.

[8 resolved inline review threads on torch/nn/functional.py and torch/backends/cuda/__init__.py]
@drisspg drisspg force-pushed the add_typehints_to_sdpa branch 2 times, most recently from 6c25dcc to d6c8538 Compare February 8, 2023 22:23
@drisspg drisspg requested a review from cpuhrsch February 9, 2023 21:11
@drisspg
Contributor Author

drisspg commented Feb 10, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 10, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

The merge job was canceled. If you believe this is a mistake, then you can re-trigger it through pytorch-bot.

@drisspg
Contributor Author

drisspg commented Feb 10, 2023

@pytorchbot merge -f "all checks are passing"

@drisspg drisspg added the release notes: nn release notes category label Feb 10, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Labels: ciflow/trunk, Merged, module: docs, module: multi-headed-attention, release notes: nn