Fix backward return count mismatch in _Float8GroupedMM #3956

Merged

danielvegamyhre merged 1 commit into pytorch:main from xiaobochen-amd:dev_fix on Feb 27, 2026

Conversation

@xiaobochen-amd
Contributor

No description provided.
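For context on the bug the title describes: a minimal sketch (illustrative only, not torchao's actual `_Float8GroupedMM` code) of the invariant involved. A `torch.autograd.Function`'s `backward()` must return exactly one value per `forward()` argument (excluding `ctx`), with `None` for non-differentiable arguments; returning the wrong number raises a runtime error during the backward pass.

```python
import torch

# Hypothetical example (names are illustrative): a custom autograd Function
# whose forward takes three arguments, so backward must return three values.
class ScaledMM(torch.autograd.Function):
    @staticmethod
    def forward(ctx, a, b, scale):
        ctx.save_for_backward(a, b)
        ctx.scale = scale
        return (a @ b) * scale

    @staticmethod
    def backward(ctx, grad_out):
        a, b = ctx.saved_tensors
        grad_a = (grad_out @ b.t()) * ctx.scale
        grad_b = (a.t() @ grad_out) * ctx.scale
        # forward() took 3 args (a, b, scale), so backward() returns 3
        # gradients; scale is a plain float, so its slot is None. Returning
        # only (grad_a, grad_b) here would be a return count mismatch and
        # fail at runtime.
        return grad_a, grad_b, None

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 4, requires_grad=True)
ScaledMM.apply(a, b, 2.0).sum().backward()
```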

@pytorch-bot

pytorch-bot bot commented Feb 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3956

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 84d572b with merge base 4e18d87:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 27, 2026
@danielvegamyhre danielvegamyhre self-requested a review February 27, 2026 02:09
@danielvegamyhre
Contributor

surprised 1xh100 CI tests didn't catch this

@xiaobochen-amd
Contributor Author

Before my previous PR, the counts here were mismatched.

@danielvegamyhre
Contributor

i think i see the issue, we need to unskip this test:

@danielvegamyhre danielvegamyhre added module: training quantize_ api training flow moe labels Feb 27, 2026
@xiaobochen-amd
Contributor Author

> i think i see the issue, we need to unskip this test:

When I tested locally, the test was unskipped and it passed. I just noticed that #3788 hasn’t been fixed, so I left it as is.

@danielvegamyhre
Contributor

> i think i see the issue, we need to unskip this test:
>
> When I tested locally, it was set to unskip and the test passed. I just noticed that #3788 hasn’t been fixed, so I left it as is.

I see, let me test on h100 if the issue still exists, i think it may have been a transient env/build issue we never root caused

@danielvegamyhre
Contributor

@xiaobochen-amd I tested, and the original cuBLAS error is resolved, but there is a new error (a numerics mismatch over the threshold). Forward pass outputs are identical under torch.equal; however, the gradients differ slightly, with most columns identical but some columns requiring atol/rtol=1 to pass.

I'll create an issue for this on the CUDA side. It doesn't block this PR; we can leave the test skipped.
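The kind of tolerance comparison described above can be sketched with `torch.testing.assert_close` (the tensors and deviation here are synthetic stand-ins, not the actual test data):

```python
import torch

# Illustrative only: a reference gradient vs. a copy with a per-column
# deviation, mimicking "most columns identical but some columns requiring
# atol/rtol=1 to pass".
ref = torch.randn(8, 8)
other = ref.clone()
other[:, 3] += 0.5  # inject a deviation into one column

# Exact equality fails on the perturbed column...
assert not torch.equal(ref, other)
# ...while a loose tolerance passes: assert_close checks
# |actual - expected| <= atol + rtol * |expected| elementwise.
torch.testing.assert_close(other, ref, atol=1.0, rtol=1.0)
```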

@danielvegamyhre danielvegamyhre merged commit 9bdc0ca into pytorch:main Feb 27, 2026
19 of 22 checks passed
