pad dequantized paged fp8 kv with zeros #4780
Conversation
✅ Deploy Preview for pytorch-fbgemm-docs ready!
This pull request was exported from Phabricator. Differential Revision: D80977902
Summary: X-link: facebookresearch/FBGEMM#1803

Pad with zeros after the end of each used sequence to avoid NaNs in Flash Attention 3 when dequantizing the fp8 paged KV cache. This is analogous to the non-paged case, which was tackled in D69522001.

Differential Revision: D80977902
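The idea can be illustrated with a small sketch (not the PR's actual kernel, which operates on paged fp8 blocks): after dequantization, positions past each sequence's used length may hold garbage bytes that decode to NaN/Inf, so they are overwritten with zeros before attention reads them. The function name and tensor layout below are hypothetical.

```python
import torch

def pad_dequantized_kv(kv: torch.Tensor, seq_lens: torch.Tensor) -> torch.Tensor:
    """Zero out dequantized KV entries past each sequence's used length.

    kv:       [batch, max_seq_len, num_heads, head_dim] dequantized values;
              slots beyond seq_lens[b] may contain NaN/garbage.
    seq_lens: [batch] number of valid tokens per sequence.
    """
    max_seq_len = kv.shape[1]
    positions = torch.arange(max_seq_len, device=kv.device)      # [max_seq_len]
    valid = positions[None, :] < seq_lens[:, None]               # [batch, max_seq_len]
    # masked_fill (rather than multiplying by a 0/1 mask) is essential:
    # NaN * 0 is still NaN, but masked_fill overwrites unconditionally.
    return kv.masked_fill(~valid[:, :, None, None], 0.0)

# Two sequences of lengths 3 and 2 in a max_seq_len=4 cache;
# unused slots simulate garbage that dequantizes to NaN.
kv = torch.full((2, 4, 1, 2), float("nan"))
kv[0, :3] = 1.0
kv[1, :2] = 2.0
out = pad_dequantized_kv(kv, torch.tensor([3, 2]))
```

After padding, `out` contains no NaNs, so a subsequent softmax over attention scores cannot be poisoned by the unused tail of the cache.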
Force-pushed 5a83a60 to c353791
Force-pushed c353791 to b5046be
Force-pushed b5046be to 6b58c11
Force-pushed 6b58c11 to 3cb21bd
This pull request has been merged in 699954b.