fix undefined values for tail elements in act quant kernels #4186
Merged
iamzainhuda merged 1 commit into main on Mar 27, 2026
Conversation
danielvegamyhre approved these changes on Mar 26, 2026

danielvegamyhre (Contributor) left a comment:

Lgtm, thanks for fixing this!
danielvegamyhre (Contributor):

(This fix is indeed needed; just note that the Triton FP8 blockwise GEMMs were experimental, not performant, and not actually used. We use torch._scaled_mm, which dispatches to cuBLAS.)
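For context on the torch._scaled_mm path mentioned above, below is a minimal per-tensor FP8 matmul sketch. It is not the torchao code path and does not show the blockwise scaling this PR is about; torch._scaled_mm is a private API whose exact signature has varied across PyTorch releases, so treat the keyword arguments as assumptions for a recent release with an FP8-capable GPU.

```python
import torch

# Sketch only: per-tensor FP8 scaling, assuming a recent PyTorch and an FP8-capable GPU.
M, K, N = 64, 128, 256
a = torch.randn(M, K, device="cuda")
b = torch.randn(K, N, device="cuda")

fp8_max = torch.finfo(torch.float8_e4m3fn).max
scale_a = a.abs().max() / fp8_max          # dequant scale for A
scale_b = b.abs().max() / fp8_max          # dequant scale for B

a_fp8 = (a / scale_a).to(torch.float8_e4m3fn)
# cuBLAS expects the second operand in column-major layout, hence the transposes.
b_fp8 = (b.t() / scale_b).to(torch.float8_e4m3fn).t()

out = torch._scaled_mm(
    a_fp8,
    b_fp8,
    scale_a=scale_a,
    scale_b=scale_b,
    out_dtype=torch.bfloat16,
)
```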
Summary
Fixed a tail-handling bug in the blockwise FP8 activation quantization kernels that was corrupting reciprocal scale tensors and causing test_triton_fp8_gemm_1x128_128x128 to fail on ragged shapes.

The GEMM consumes the a_s / b_s scale tensors produced by the quantization kernels, and those scale tensors could be corrupted at the tensor edges. In the LHS activation quant kernel, masked tail lanes were still storing scales into a compact, column-major as_strided buffer. Because that buffer has no padding, logically invalid row writes could alias valid scale entries from the next column. The GEMM then used those corrupted scales, and SQNR collapsed on small or ragged M.
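To make the aliasing concrete, here is a small standalone illustration (not the torchao code; M, the number of column groups, and the marker value are made up) of how a store for a logically invalid tail row lands on a valid entry of the next column in a compact, column-major scale buffer:

```python
import torch

M = 5                                    # ragged row count
n_col_groups = 2                         # K // 128 column groups
flat = torch.arange(M * n_col_groups, dtype=torch.float32)

# Compact column-major view: scales[m, g] lives at flat offset g * M + m.
scales = flat.as_strided(size=(M, n_col_groups), stride=(1, M))

# A tail program padded up to BLOCK_M rows may compute a row index m == M.
# Its store offset for column group 0 collides with a valid entry:
m, g = M, 0
assert g * M + m == (g + 1) * M + 0      # same flat slot as scales[0, 1]

# So an unmasked store for that lane overwrites scales[0, 1].
flat[g * M + m] = -1.0
print(scales[0, 1])                      # tensor(-1.)
```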
Updated the RHS and LHS blockwise quant kernels to use tl.load(..., other=0.0) for masked tail loads, and masked the reciprocal-scale tl.store calls so invalid lanes no longer write into the scale buffers (a minimal sketch of this masking pattern follows the screenshot below).

Failures before fix:

Testing
pytest -q test/prototype/blockwise_fp8_training/test_blockwise_kernels.py -k test_triton_fp8_gemm_1x128_128x128

pytest -q test/prototype/blockwise_fp8_training/test_blockwise_kernels.py