
[ROCm][INT4] Configurable ntile size for TilePacked format#3834

Merged
vkuzo merged 8 commits into pytorch:main from ZhiweiYan-96:zhiwei/int4_ut
Mar 9, 2026

Conversation


@ZhiweiYan-96 ZhiweiYan-96 commented Feb 6, 2026

Motivation

Fixes a UT failure:

pytest -sv test/integration/test_integration.py -k test_int4_weight_only_quant_subclass_api_grouped_5

The failing case has shape (m, k, n) = (256, 256, 8). The n dimension is smaller than the AMD Matrix Core nTileSize of 16, although it is valid for the NVIDIA Tensor Core nTileSize of 8.

According to the code at https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/cuda/int4mm.cu#L1116

auto nTiles = (B.size(0) / nTileScaleFactor);

We can infer that:

  1. n/8 > 16, where 16 = nTileScaleFactor * nTileSizeTensor; otherwise, nTiles would be 0. (This is the bug!)
  2. n/8 must be a multiple of 8; otherwise, there would be a fractional number of tiles.

This PR fixes it by using a proper padding size when calling the find_multiple utility.
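For reference, a minimal sketch of the padding behavior (this reimplements the find_multiple helper for illustration; the exact torchao signature may differ):

```python
# Illustrative reimplementation of the find_multiple padding helper
# (assumed behavior: smallest multiple of k that is >= n).
def find_multiple(n: int, k: int) -> int:
    if n % k == 0:
        return n
    return n + k - (n % k)

# With the CUDA tile size of 8, n = 8 needs no padding, so the packed
# n dimension stays too small for the AMD Matrix Core (nTileSize = 16)
# and the tile count rounds down to 0:
assert find_multiple(8, 8) == 8

# Padding with the ROCm tile size of 16 instead yields a shape the
# Matrix Core kernel can tile:
assert find_multiple(8, 16) == 16
```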

Testing

pytest -sv test/integration/test_integration.py -k test_int4_weight_only_quant_subclass_api_grouped_5

@pytorch-bot

pytorch-bot bot commented Feb 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3834

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c04c333 with merge base c17160a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 6, 2026
@ZhiweiYan-96
Contributor Author

@XiaobingSuper Mind taking a look?

@@ -127,7 +127,8 @@ def from_hp(

# Pre-process: pad to required dimensions
in_features = find_multiple(orig_in_features, 1024)
out_features = find_multiple(orig_out_features, 8)
n_tile = 16 if orig_out_features < 16 and torch.version.hip else 8

Should this just be n_tile = 16 if torch.version.hip else 8? find_multiple will pad according to the given tile size.

Contributor Author


That's right.


ZhiweiYan-96 commented Feb 6, 2026

hi, @petrex @jithunnair-amd Could you please review this PR and add the ciflow/rocm label for testing? Thanks.

@XiaobingSuper

@jerryzh168 could you help review it? Thanks!


ZhiweiYan-96 commented Feb 26, 2026

The two failures are unrelated to this PR:

test/prototype/test_parq.py::TestTorchAoConfigIntegration::test_tied_weights_quantization - AttributeError: 'list' object has no attribute 'keys'

 torch.distributed.DistNetworkError: The server socket has failed to listen on any local network address. port: 29500, useIpv6: false, code: -98, name: EADDRINUSE, message: address already in use


vkuzo commented Mar 2, 2026

hi @ZhiweiYan-96 , looks reasonable overall. IMO we should make this user configurable instead of automatically selected; it's confusing when a packing format behaves differently based on the environment. Can we add this to the config instead of selecting it automatically? You can make the config clearly state which value the user needs to select on ROCm.

@ZhiweiYan-96 ZhiweiYan-96 changed the title [ROCm][INT4] Corner case n<Ntile handling [WIP][ROCm][INT4] Corner case n<Ntile handling Mar 5, 2026
@pytorch-bot

pytorch-bot bot commented Mar 5, 2026

Warning: Unknown label ciflow/rocm-mi300.
Currently recognized labels are

  • ciflow/benchmark
  • ciflow/tutorials
  • ciflow/rocm
  • ciflow/4xh100
  • ciflow/xpu

Please add the new label to .github/pytorch-probot.yml

@ZhiweiYan-96
Contributor Author

Thanks, @vkuzo , we can make n_tile_size a configurable attribute in Int4WeightOnlyConfig, defaulting to 8 (NVIDIA), with ROCm users setting it to 16.

`int4_choose_qparams_algorithm`: variants of choose qparams algorithm to use for int4,
currently support TINYGEMM ("tinygemm") and HQQ ("hqq"), used in version 2 only
`set_inductor_config`: if True, adjusts `torchinductor` settings to recommended values. used in both version 1 and 2
`version`: version of the config to use, default is 2

Add an int4_tile_packed_ntile description here? Also note that int4_tile_packed_ntile only works in the Int4PackingFormat.TILE_PACKED_TO_4D case.

Contributor Author


nice catch, moved

Int4ChooseQParamsAlgorithm.TINYGEMM
)
# ntile size for TILE_PACKED_TO_4D format, 8 for CUDA platform, 16 for ROCm platform
int4_tile_packed_ntile: int = 8

Do we need to add a check to ensure it only supports a limited set of values?
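One possible shape for such a check (a hypothetical sketch, not the PR's actual code; Int4TilePackedSettings is a stand-in for Int4WeightOnlyConfig, and the accepted values mirror the two tile sizes discussed in this thread):

```python
from dataclasses import dataclass

# Assumed valid values: 8 for CUDA, 16 for ROCm, per the discussion above.
_SUPPORTED_NTILES = (8, 16)

@dataclass
class Int4TilePackedSettings:  # hypothetical stand-in for Int4WeightOnlyConfig
    int4_tile_packed_ntile: int = 8

    def __post_init__(self):
        # Reject unsupported tile sizes early, at config-construction time.
        if self.int4_tile_packed_ntile not in _SUPPORTED_NTILES:
            raise ValueError(
                f"int4_tile_packed_ntile must be one of {_SUPPORTED_NTILES}, "
                f"got {self.int4_tile_packed_ntile}"
            )
```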

@ZhiweiYan-96 ZhiweiYan-96 left a comment


The requested change is committed.


cls,
hp_tensor: torch.Tensor,
block_size: List[int],
ntile_size: Optional[int] = 8,
Contributor


nit: maybe make this argument last, to avoid breaking any existing callsites that specify arguments positionally
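A hedged sketch of this nit (signature adapted from the quoted hunk; tensor types replaced with plain values so the example is self-contained): moving the new ntile_size argument last, and making it keyword-only, keeps existing positional callsites working unchanged.

```python
from typing import List, Optional

def from_hp(hp_tensor, block_size: List[int], *, ntile_size: Optional[int] = 8):
    # Real code would quantize hp_tensor here; we just echo the inputs
    # to show which value each callsite ends up with.
    return hp_tensor, block_size, ntile_size

# Existing positional callsite, unchanged by the new parameter:
assert from_hp("W", [1, 128])[2] == 8

# New ROCm caller opts in explicitly by keyword:
assert from_hp("W", [1, 128], ntile_size=16)[2] == 16
```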



vkuzo commented Mar 6, 2026

Looks good! Can we just make the new argument last? After that, if CI is green, LGTM!

@pytorch-bot pytorch-bot bot removed the ciflow/rocm label Mar 7, 2026
@ZhiweiYan-96 ZhiweiYan-96 requested a review from vkuzo March 8, 2026 09:14
@ZhiweiYan-96 ZhiweiYan-96 changed the title [WIP][ROCm][INT4] Corner case n<Ntile handling [ROCm][INT4] Configurable ntile size for TilePacked format Mar 8, 2026
@pytorch-bot

pytorch-bot bot commented Mar 8, 2026

Warning: Unknown label ciflow/rocm-mi300.
Currently recognized labels are

  • ciflow/benchmark
  • ciflow/tutorials
  • ciflow/rocm
  • ciflow/4xh100
  • ciflow/xpu

Please add the new label to .github/pytorch-probot.yml

@vkuzo vkuzo merged commit 67e5358 into pytorch:main Mar 9, 2026
23 checks passed

Labels

ciflow/rocm ciflow/rocm-mi300 CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. device: rocm topic: rocm


4 participants