Update nested tensor + MHA tutorial to include SDPA + torch.compile #2813
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2813
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit a4feb18 with merge base d3cf027.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@svekars is it possible to see a rendered version of the docs? I tried clicking the link `Preview Python docs built from this PR` but I get:
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>NJK553V5M50YSYCH</RequestId>
<HostId>x0Kg4Bq9i51Qa+uIaXFdTbhzI6iAGdRlFYfDhjYjo1vUyaLDXESxv/8jEJMVjFHDU6LUUPFkNsI=</HostId>
</Error>
@jbschlosser - there was an error on one of the workers. The preview only becomes available after the manager finishes building. Now it passes and the preview is available.
An editorial pass. Looks good overall.
Should we also update the usage here: https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html?
good call yes
6cf13ed to c879422
@jbschlosser the PR looks good - do you want to finish up the TODOs?
this will take a fair amount of work; might be worth landing this for now and addressing all that later on when I can carve out some time
@pytorchbot merge
@svekars what's the merging procedure for this repo?
Hi, sorry to resurrect this. I had a qq: is it still the case that fused implementations of sdpa (like the FA kernels) don't support
@kkt-cohere sorry for the delay, I was out the last couple weeks.
This isn't really true anymore. Nested tensors with
As per title. The idea is to show how to implement multi-head attention (MHA) using nested jagged tensors (NJT) + torch.compile and get nice speedups.
TODO (future work):