
Conversation

@bottler (Contributor) commented Aug 27, 2025

Summary: Pad zeros after the end of used sequences to avoid NaNs in FlashAttention 3 during dequantization of the FP8 paged KV cache. This is analogous to the non-paged case, which was tackled in D69522001.

Differential Revision: D80977902
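
For illustration, a minimal PyTorch sketch of the idea (not the actual FBGEMM kernel, which does this in its CUDA dequantization path): slots past a sequence's used length in a partially filled page may hold uninitialized bytes, and NaN/Inf garbage can propagate through the attention arithmetic even for masked positions, so the dequantized output is zero-filled past the used length. All names and layouts below (`dequantize_paged_kv`, `PAGE_SIZE`, the tensor shapes) are hypothetical.

```python
import torch

PAGE_SIZE = 64  # tokens per page (hypothetical)

def dequantize_paged_kv(
    kv_cache_fp8: torch.Tensor,  # [num_pages, PAGE_SIZE, n_heads, head_dim], uint8 FP8 payload
    scales: torch.Tensor,        # [num_pages, PAGE_SIZE, n_heads], dequant scales
    page_table: torch.Tensor,    # [batch, max_pages_per_seq], page indices per sequence
    seq_lens: torch.Tensor,      # [batch], number of used tokens per sequence
) -> torch.Tensor:
    batch, max_pages = page_table.shape
    _, _, n_heads, head_dim = kv_cache_fp8.shape
    max_len = max_pages * PAGE_SIZE
    out = torch.zeros(batch, max_len, n_heads, head_dim, dtype=torch.bfloat16)
    for b in range(batch):
        pages = page_table[b]  # gather this sequence's pages contiguously
        raw = kv_cache_fp8[pages].reshape(max_len, n_heads, head_dim)
        scale = scales[pages].reshape(max_len, n_heads, 1)
        deq = raw.to(torch.bfloat16) * scale  # stand-in for the real FP8 decode
        # The fix in this PR: zero everything after the used length so that
        # uninitialized bytes in a partially filled page never reach FA3.
        used = int(seq_lens[b])
        deq[used:] = 0
        out[b] = deq
    return out
```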

netlify bot commented Aug 27, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name                 Link
🔨 Latest commit      3cb21bd
🔍 Latest deploy log  https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68aefe4168260600089e576a
😎 Deploy Preview     https://deploy-preview-4780--pytorch-fbgemm-docs.netlify.app

@meta-cla bot added the cla signed label Aug 27, 2025
@facebook-github-bot (Contributor) commented Aug 27, 2025

This pull request was exported from Phabricator. Differential Revision: D80977902


bottler added a commit to bottler/FBGEMM-1 that referenced this pull request Aug 27, 2025

Summary:
Pull Request resolved: pytorch#4780

X-link: facebookresearch/FBGEMM#1803

Pad zeros after the end of used sequences to avoid NaNs in FlashAttention 3 during dequantization of the FP8 paged KV cache. This is analogous to the non-paged case, which was tackled in D69522001.

Differential Revision: D80977902

@facebook-github-bot (Contributor) commented

This pull request has been merged in 699954b.
