Fix backward return count mismatch in _Float8GroupedMM #3956
danielvegamyhre merged 1 commit into pytorch:main
Conversation
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3956

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 84d572b with merge base 4e18d87:

BROKEN TRUNK - The following job failed but was also present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Surprised the 1x H100 CI tests didn't catch this.
Before my previous PR, the counts here were mismatched.
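The mismatch in question follows from a PyTorch autograd rule: `backward()` must return exactly one gradient (or `None`) per input to `forward()`. A minimal, hypothetical sketch of that rule (the class name, shapes, and the `scale` argument are illustrative, not the actual `_Float8GroupedMM` code):

```python
import torch

class ScaledMM(torch.autograd.Function):
    @staticmethod
    def forward(ctx, a, b, scale):
        ctx.save_for_backward(a, b)
        ctx.scale = scale
        return (a @ b) * scale

    @staticmethod
    def backward(ctx, grad_out):
        a, b = ctx.saved_tensors
        grad_a = (grad_out * ctx.scale) @ b.t()
        grad_b = a.t() @ (grad_out * ctx.scale)
        # forward() takes three inputs, so backward() must return three
        # values; `scale` is a non-tensor, so its gradient slot is None.
        # Returning the wrong number of values here raises a RuntimeError
        # at backward time -- the kind of count mismatch this PR fixes.
        return grad_a, grad_b, None
```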
I think I see the issue; we need to unskip this test:
When I tested locally, it was set to unskip and the test passed. I just noticed that #3788
I see, let me test on H100 whether the issue still exists; I think it may have been a transient env/build issue we never root-caused.
@xiaobochen-amd I tested, and the original cuBLAS error is resolved, but there is a new error (numerics mismatch over the threshold). Forward pass outputs are identical with torch.equal; however, gradients are slightly different, with most columns identical but some columns requiring atol/rtol=1 to pass. I'll create an issue for this on the CUDA side. It doesn't block this PR; we can leave the test as skipped.
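The per-column comparison described above can be sketched as follows (a hypothetical helper, not code from this PR; `grad_ref` and `grad_test` stand in for the two implementations' gradients, and the tolerance formula mirrors the standard `atol + rtol * |expected|` check):

```python
import torch

def mismatched_columns(grad_ref, grad_test, atol=1e-2, rtol=1e-2):
    # Return the indices of columns whose absolute difference exceeds
    # atol + rtol * |grad_test|, instead of failing wholesale. This makes
    # it easy to see that most columns match and only a few are off.
    diff = (grad_ref - grad_test).abs()
    tol = atol + rtol * grad_test.abs()
    return (diff > tol).any(dim=0).nonzero(as_tuple=True)[0]
```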
No description provided.