[X86] Update smooth quant fusion pattern by cyxlily · Pull Request #3784 · pytorch/ao

cyxlily · 2026-01-30T08:39:23Z

The SmoothQuant implementation has changed in Torchao recently, so we need to update the fusion patterns in FX graph passes of Inductor accordingly.
Besides, we added a reduce_range argument in Int8Tensor to accommodate CPUs without AVX512_VNNI to avoid overflow.
Unit tests are updated too.

Signed-off-by: Cui, Lily <lily.cui@intel.com>

pytorch-bot · 2026-01-30T08:39:27Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3784

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Rolling out OSDC (ARC) runners on pull & trunk workflows in PyTorch main

❌ 8 New Failures, 1 Cancelled Job

As of commit a3d1fd9 with merge base 9472d7d ():

NEW FAILURES - The following jobs have failed:

Run 1xH100 Tests / test (H100, linux.aws.h100, --pre torch torchvision torchaudio mslk --index-url https://download.... / linux-job (gh)
RuntimeError: Command docker exec -t 8c64cf6bcbd8685edf90a240cd36775af8ac3320506c00dd4c9bf3bdc22c1f10 /exec failed with exit code 1
Run Regression Tests / test (CPU 2.8, linux.4xlarge, torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytor... / linux-job (gh)
test/quantization/pt2e/test_x86inductor_fusion.py::DynamicShapesCppWrapperCpuTests::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_True_is_dynamic_True_dynamic_shapes_cpp_wrapper
Run Regression Tests / test (CPU 2.9, linux.4xlarge, torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytor... / linux-job (gh)
test/quantization/pt2e/test_x86inductor_fusion.py::DynamicShapesCppWrapperCpuTests::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_True_is_dynamic_True_dynamic_shapes_cpp_wrapper
Run Regression Tests / test (CUDA 2.10, linux.g5.12xlarge.nvidia.gpu, torch==2.10.0 torchvision==0.25.0, cuda, 12.6) / linux-job (gh)
test/quantization/pt2e/test_x86inductor_fusion.py::DynamicShapesCppWrapperCpuTests::test_qlinear_add_cpu_use_relu_True_is_qat_True_is_dynamic_True_dynamic_shapes_cpp_wrapper
Run Regression Tests / test (CUDA 2.8, linux.g5.12xlarge.nvidia.gpu, torch==2.8.0 torchvision==0.23.0, cuda, 12.6) / linux-job (gh)
test/quantization/pt2e/test_x86inductor_fusion.py::DynamicShapesCppWrapperCpuTests::test_qlinear_add_cpu_use_relu_True_is_qat_True_is_dynamic_True_dynamic_shapes_cpp_wrapper
Run Regression Tests / test (CUDA 2.9, linux.g5.12xlarge.nvidia.gpu, torch==2.9.1 torchvision==0.24.1, cuda, 12.6) / linux-job (gh)
test/quantization/pt2e/test_x86inductor_fusion.py::DynamicShapesCppWrapperCpuTests::test_qlinear_add_cpu_use_relu_True_is_qat_True_is_dynamic_True_dynamic_shapes_cpp_wrapper
Run Regression Tests / test-nightly (CPU Nightly, linux.4xlarge, --pre torch torchvision --index-url https://download.py... / linux-job (gh)
test/quantization/pt2e/test_x86inductor_fusion.py::DynamicShapesCppWrapperCpuTests::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_True_is_dynamic_True_dynamic_shapes_cpp_wrapper
Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch torchvision --index-url htt... / linux-job (gh)
test/quantization/pt2e/test_x86inductor_fusion.py::DynamicShapesCppWrapperCpuTests::test_qlinear_add_cpu_use_relu_True_is_qat_True_is_dynamic_True_dynamic_shapes_cpp_wrapper

CANCELLED JOB - The following job was cancelled. Please retry:

Run Regression Tests / test (CPU 2.10, linux.4xlarge, torch==2.10.0 torchvision==0.25.0 --index-url https://download.pyt... / linux-job (gh)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Xia-Weiwen

Thanks for the PR. Please also add a UT.

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Moved to the other pr. Signed-off-by: Cui, Lily <lily.cui@intel.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Cui, Lily <lily.cui@intel.com>

claude · 2026-04-09T08:14:28Z

Claude finished @Xia-Weiwen's task in 2m 51s —— View job

Code Review: PR #3784 - Update smooth quant fusion pattern (Round 7)

Gather context and read changed files
Review new commits since round 6
Review x86.py inductor pass changes
Review test changes
Review reduce_range and other file changes
Post review feedback

Summary

Since round 6, two new commits have been added:

67878e1 Fix input_dtype: The test input is now created with torch.bfloat16 dtype when enable_autocast=True, matching the autocast dtype. Previously test inputs were always torch.float32 regardless of autocast setting.
1e029da Change reduce_range to optional: Addresses the backward compatibility concern from round 6 (issue Adding gpu quantization workflows and apis #1). reduce_range is moved from tensor_attribute_names (required) to optional_tensor_attribute_names, and its type is changed from bool to Optional[bool] with default False. The field is also moved to the end of the Int8Tensor.__new__/__init__/from_hp parameter lists to avoid breaking positional callers. The config classes (Int8DynamicActivationInt8WeightConfig, Int8StaticActivationInt8WeightConfig) are similarly updated, with reduce_range moved after set_inductor_config and validation assertions added.

The x86.py inductor pass code is unchanged from round 5/6.

Issues

1. `_pin_memory` doesn't propagate `act_pre_scale`

torchao/quantization/quantize_/workflows/int8/int8_tensor.py:470-480

Not new to this PR, but worth noting since the reduce_range propagation effort surfaced it: the aten._pin_memory implementation propagates reduce_range, zero_point, act_quant_scale, act_quant_zero_point, and act_quant_kwargs, but does not propagate act_pre_scale. The aten.slice.Tensor implementation at line 408-420 does propagate it. This is a pre-existing issue and doesn't need to block this PR, but it could cause data loss if _pin_memory is called on an Int8Tensor that has act_pre_scale set (e.g., SmoothQuant tensors).

2. `QuantizeTensorToInt8Kwargs.reduce_range` is `bool`, not `Optional[bool]`

torchao/quantization/quantize_/workflows/int8/int8_tensor.py:52

QuantizeTensorToInt8Kwargs has reduce_range: bool = False (non-optional), while Int8Tensor has reduce_range: Optional[bool] = False. The docstring says "optional flag" but the type annotation is bool. This is a minor inconsistency — consider making these match. Since QuantizeTensorToInt8Kwargs is a dataclass that users construct explicitly (not deserialized), bool is arguably fine here, but the mismatch could be confusing.

3. `_int8_static_activation_int8_weight_transform` no longer passes `reduce_range` to `from_hp` via keyword before `act_quant_kwargs`

torchao/quantization/quant_api.py:1231-1242

In the diff, reduce_range=config.reduce_range was moved from before act_quant_kwargs to after act_quant_zero_point. Since from_hp now has reduce_range as the last keyword-only parameter, this is correct. However, the weight itself is quantized via from_hp and the reduce_range flag controls the weight's quant range too. Let me verify the call is still correct... yes, reduce_range is passed as a keyword argument to from_hp, so the reordering is fine. No issue here — just confirming.

Stale Nits from Previous Rounds (Still Present)

Stale `type: ignore[possibly-undefined]` comments

torchao/quantization/pt2e/inductor_passes/x86.py:2060,2066

out_shape is always defined at line 2052 via kwargs.get("out_shape", None), so these # type: ignore[possibly-undefined] comments are unnecessary. Harmless but should be cleaned up.

Bias+reshape deduplication

torchao/quantization/pt2e/inductor_passes/x86.py:2053-2067

The reshape logic is identical in both the if bias and else branches. Could be simplified to:

if bias is not None:
    new_out_node = match.graph.call_function(
        aten.add.Tensor, args=(new_out_node, bias)
    )
if out_shape is not None:
    new_out_node = match.graph.call_function(
        aten.reshape.default, args=(new_out_node, out_shape)
    )

`enable_autocast` skip uses `return` instead of `skipTest`

test/quantization/pt2e/test_x86inductor_fusion.py:2776-2777

Using return silently passes. self.skipTest("bf16 not supported") would be more visible in test reports.

Test Coverage Note

All four base_config parametrizations use reduce_range=True. The default (reduce_range=False) is not exercised by test_smooth_quant_pattern. The existing test_da8w8_sym_act_sym_wgt_with_int_mm test doesn't use reduce_range, so it effectively tests the False path, which provides some coverage. This is acceptable.

Looks Good

Backward compatibility fix (1e029da): Moving reduce_range to optional_tensor_attribute_names addresses the serialization concern from round 6. Old checkpoints without reduce_range will deserialize correctly with None as the default.
Input dtype fix (67878e1): Creating bf16 test inputs when enable_autocast=True is more representative of real usage — users passing bf16 inputs under autocast. This also properly exercises the convert_element_type patterns in the graph.
Config field reordering: Moving reduce_range after set_inductor_config in both config classes avoids breaking any existing positional construction of these configs.
Validation assertions: Both config classes now assert reduce_range in (True, False), catching None or other invalid values early.
reduce_range propagation: Thoroughly propagated through slice, _pin_memory, select, and from_hp. All construction sites pass it correctly.
x86.py inductor pass: Unchanged and clean. Pattern composition, registration, bias validation, dtype assertion, and per-channel path precision handling all remain correct.

Verdict

This PR is in good shape. The backward compatibility concern from round 6 has been addressed. The remaining items are all minor nits. Ready for merge.

View job

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Xia-Weiwen · 2026-04-09T08:57:19Z

@claude review again

claude · 2026-04-09T08:57:47Z

Claude finished @Xia-Weiwen's task in 2m 8s —— View job

Code Review: PR #3784 - Update smooth quant fusion pattern (Round 8)

Gather context and read changed files
Review new commits since round 7
Review x86.py inductor pass changes
Review test changes
Review reduce_range and other file changes
Post review feedback

Summary

Since round 7, two new commits have been added:

5721ede Add act_pre_scale in pin_memory: Fixes the pre-existing issue flagged in round 7 (issue Adding gpu quantization workflows and apis #1) where aten._pin_memory and aten.is_pinned did not propagate act_pre_scale. Both implementations now correctly handle this field. The aten.slice.Tensor, aten.select.int, and aten._pin_memory implementations are now consistent in propagating all fields including act_pre_scale.
Round 7 nits addressed: The bias+reshape deduplication from previous rounds has been implemented — the reshape logic is no longer duplicated in the if bias and else branches. The stale type: ignore[possibly-undefined] comments have been removed. The skipTest change (replacing return with self.skipTest("bf16 not supported")) is now in place.

The x86.py inductor pass pattern code is unchanged from rounds 5-7.

Remaining Minor Items

1. `QuantizeTensorToInt8Kwargs.reduce_range` type inconsistency

torchao/quantization/quantize_/workflows/int8/int8_tensor.py:52

QuantizeTensorToInt8Kwargs has reduce_range: Optional[bool] = False while Int8Tensor also has reduce_range: Optional[bool] = False. These now match — good. However, at the call site in torchao/quantization/quantize_/common/quantize_tensor_kwargs.py:66, the value is passed directly from the dataclass field to Int8Tensor.from_hp, so there's no type mismatch in practice. This is fine.

2. `reduce_range` test coverage is one-sided

test/quantization/pt2e/test_x86inductor_fusion.py:2739-2746

All four base_config parametrizations in test_smooth_quant_pattern use reduce_range=True. The default path (reduce_range=False) is not exercised by this specific test. The existing test_da8w8_sym_act_sym_wgt_with_int_mm test doesn't set reduce_range, so it implicitly tests the default False path, providing some coverage. This is acceptable but worth noting — if a future change breaks the reduce_range=False path for the SmoothQuant workflow specifically, this test won't catch it.

3. `_test_common` tolerance behavior (pre-existing, minor)

test/quantization/pt2e/test_x86inductor_fusion.py:203,208

The atol, rtol = 1e-2, 1e-2 unconditionally overwrites the caller's tolerances when autocast is enabled. The test_smooth_quant_pattern test doesn't pass custom tolerances, so this doesn't affect it. No current callers pass wider tolerances under autocast. This is fine for now.

Looks Good

act_pre_scale propagation fix (5721ede): aten.is_pinned and aten._pin_memory now correctly check/propagate act_pre_scale. This fixes the pre-existing issue from round 7 and ensures SmoothQuant tensors survive pin_memory() without data loss.
Bias+reshape deduplication: The reshape logic in the per-channel replacement handler is now clean — bias add and reshape are sequential if blocks instead of duplicated in both branches. This addresses the nit from rounds 5-7.
skipTest usage (test line 2777): self.skipTest("bf16 not supported") replaces the silent return, making skipped tests visible in test reports.
reduce_range as optional attribute: Correctly placed in optional_tensor_attribute_names, preserving backward compatibility with old serialized Int8Tensor checkpoints. The validation assertions in both config classes (assert self.reduce_range in (True, False)) catch invalid values early.
reduce_range propagation is comprehensive: All Int8Tensor construction sites (from_hp, aten.slice.Tensor, aten._pin_memory, aten.select.int) correctly propagate reduce_range. Both config classes (Int8DynamicActivationInt8WeightConfig, Int8StaticActivationInt8WeightConfig) thread reduce_range through to from_hp and QuantizeTensorToInt8Kwargs.
x86.py inductor pass: Pattern composition, registration, bias validation, dtype assertion, and per-channel path precision handling are all correct and clean. The per-channel path uses x_scale_dtype (float32) for intermediate precision and applies output_convert at the end.
Test design: End-to-end SmoothQuantConfig + quantize_() with parametrization over base_config, has_bias, enable_autocast, input_ndim, and dynamic provides good coverage. The check_output_dtype=True addition validates the compiled output dtype matches eager mode.
Parameter ordering consistency: reduce_range is consistently placed as the last keyword argument in all Int8Tensor constructor calls (from_hp, slice, _pin_memory, select), maintaining a clean and predictable API.

Verdict

This PR is ready for merge. All issues from previous review rounds have been addressed. The only remaining item is that test_smooth_quant_pattern only tests reduce_range=True — but this is a minor coverage gap since the default path is exercised elsewhere.

View job

jerryzh168 · 2026-04-09T17:11:50Z

        Union[Granularity, Tuple[Granularity, Granularity], list[Granularity]]
    ] = PerRow()
    set_inductor_config: bool = True
+    reduce_range: Optional[bool] = False


should just be bool

Thanks for reviewing. This is for compatibility purpose (per review comments from claude). If it is not optional, it may break backward compatibility if users saved quantized model with old torchao and now want to load it with the new torchao.

jerryzh168 · 2026-04-09T17:13:23Z

            "Please set it to MappingType.SYMMETRIC or "
            "MappingType.ASYMMETRIC."
        )
+        assert self.reduce_range in (True, False), (


if this is only for x86, maybe do a check to make sure user is using cpu when the flag is set to True?

If it's not on CPU, reduce_range still works so not necessary to check I think.

probably better to add the check if we don't expect other backend to use it, this will help us decrease the user surface and make it less error prone for users I think

jerryzh168 · 2026-04-09T17:13:45Z

            "Please set it to MappingType.SYMMETRIC or "
            "MappingType.ASYMMETRIC."
        )
+        assert self.reduce_range in (True, False), (


jerryzh168 · 2026-04-09T17:15:04Z

seems reduce_range is not tested, I'd recommend split the reduce_range change and the fusion change in two separate PRs to make it clearer

Xia-Weiwen · 2026-04-10T01:36:59Z

seems reduce_range is not tested, I'd recommend split the reduce_range change and the fusion change in two separate PRs to make it clearer

Hi @jerryzh168 Thanks for reviewing. We add this flag because some or all of the CI machines do not have AVX512_VNNI and the test cases would fail. So, we only test reduce_range=True here. This is also the case in pytorch, where we simply skip correctness test of int8 matmul or use small int8 values intentionally to avoid overflow.
A better way to handle this may be that we check AVX512_VNNI availability and set reduce_range accordingly. Though reduce_range=True will still not be tested in CI but code looks better and it can be tested in our own environments. How does that sound to you?

Signed-off-by: Cui, Lily <lily.cui@intel.com>

cyxlily · 2026-04-10T08:05:03Z

seems reduce_range is not tested, I'd recommend split the reduce_range change and the fusion change in two separate PRs to make it clearer

Thanks for comments, split the reduce_range to the PR #4266

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>

jerryzh168

@Xia-Weiwen please feel free to be the codeowner for cpu only changes

you can add yourself to https://github.com/pytorch/ao/blob/main/CODEOWNERS as well for relevant files

jerryzh168 · 2026-04-13T23:46:40Z

Hi @jerryzh168 Thanks for reviewing. We add this flag because some or all of the CI machines do not have AVX512_VNNI and the test cases would fail. So, we only test reduce_range=True here. This is also the case in pytorch, where we simply skip correctness test of int8 matmul or use small int8 values intentionally to avoid overflow.

can this be set automatically when we are in CPU + not in AVX512_VNNI machine? ideally we make the user's life easier

maybe one question here is do we always prepare and serve the quantized model in the same type of machine or not

cyxlily · 2026-04-14T05:41:17Z

Hi @jerryzh168 Thanks for reviewing. We add this flag because some or all of the CI machines do not have AVX512_VNNI and the test cases would fail. So, we only test reduce_range=True here. This is also the case in pytorch, where we simply skip correctness test of int8 matmul or use small int8 values intentionally to avoid overflow.

can this be set automatically when we are in CPU + not in AVX512_VNNI machine? ideally we make the user's life easier

maybe one question here is do we always prepare and serve the quantized model in the same type of machine or not

Thanks for comments, I removed the manual reduce_range setting and set it automatically in int8 tensor.

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Xia-Weiwen · 2026-04-29T15:17:16Z

@claude Review again

claude · 2026-04-29T15:17:49Z

Claude finished @Xia-Weiwen's task in 3m 46s —— View job

Code Review: PR #3784 - Update smooth quant fusion pattern (Round 9)

Gather context and read changed files
Review x86.py inductor pass changes
Review test changes
Review other changed files (intmm.py, utils.py)
Post review feedback

Summary

Since the last review (round 8), the branch has been rebased/merged with upstream main (commit e2b1ccd), incorporating the reduce_range changes from the separately merged PR #4266. The only new commit since the merge is a3d1fd9 Fix lint, which adds a missing clone_inputs assignment in _test_common. All previous round feedback has been addressed. The PR is in good shape overall.

I reviewed the full diff (4 files, ~430 additions, ~200 deletions) against the upstream merge parent.

Changes Overview

torchao/quantization/pt2e/inductor_passes/x86.py — Major rewrite of _register_smooth_quant_int_mm_pattern():
- get_pattern_no_bias() now takes reshape_a, convert_scaled_matmul, and use_u8s8 parameters, composing patterns cleanly
- 16 pattern variants registered (s8s8 + u8s8) × (no_reshape + with_reshape) × (no_bias + with_bias) × (no_convert + with_output_convert)
- Per-channel path correctly uses x_scale_dtype (float32) for intermediate precision
- Defensive x_meta_val null check added at line 465-466
test/quantization/pt2e/test_x86inductor_fusion.py — Test redesign:
- test_smooth_quant_with_int_mm → test_smooth_quant_pattern using end-to-end SmoothQuantConfig + quantize_()
- Parametrized over base_config (4 configs), has_bias, enable_autocast, input_ndim, dynamic
- expand_a_scale parameter removed from test_da8w8_sym_act_sym_wgt_with_int_mm
- check_output_dtype and _clone_inputs additions to _test_common
- _should_use_u8s8() helper for expected node count calculation
torchao/kernel/intmm.py — _cpu_is_amx_tile_supported() moved to torchao/utils.py for reuse
torchao/utils.py — _cpu_is_amx_tile_supported() added and exported

Issues

1. `_test_common`: `clone_inputs` referenced before assignment when `check_code=False`

test/quantization/pt2e/test_x86inductor_fusion.py:307-309

The new clone_inputs assignment is inside the if check_code: block (line 307), but the original code at line 309 already had actual = torch.compile(mod, **compile_options)(*clone_inputs) inside that same block, so this is fine. However, looking at the broader structure of _test_common:

with torch.no_grad(), maybe_autocast:
    if check_code:
        clone_inputs = self._clone_inputs(inputs)  # <-- new line
        expected = mod(*inputs)
        actual = torch.compile(mod, **compile_options)(*clone_inputs)

This correctly ensures inputs isn't mutated before comparing expected vs actual. Good fix — the old code was silently using already-mutated inputs for the compiled path (or vice versa).

2. Bias validation was removed — intentional?

The old _validate_pattern had bias shape checks (ensuring bias is 1D, checking reshape compatibility). These were removed. Since the test now uses end-to-end SmoothQuantConfig + quantize_() which always produces valid bias shapes, this is likely fine in practice. But the validation was a safety net against unexpected graph patterns. If the pattern matcher accidentally matches a non-quantization graph with a similar structure, the bias check would have caught it. The weight get_attr check (lines 2091-2095) still provides some protection.

This is a minor concern — just noting that the defense-in-depth was reduced.

3. Node count comment doesn't document u8s8 counts

torchao/quantization/pt2e/inductor_passes/x86.py:1976-1987

The comment block maps node counts 4-9 to specific patterns, then says "8-13: u8s8 decomposed equivalents with input shift and compensation subtraction." The u8s8 patterns add 4 nodes (convert_i32 + add_128 + convert_u8 + sub_comp) on top of the base patterns, so:

u8s8 base: 8, 9, 10, 11, 12, 13

But this overlaps with the s8s8 node counts 8 and 9. The validator accepts all of [4, 5, 6, 7, 8, 9, 10, 11, 12, 13] which is correct — the pattern matcher distinguishes by structure, not node count. The comment could be clearer that counts 8-9 are shared between s8s8 and u8s8 variants. Minor documentation nit.

Looks Good

Pattern composition via get_pattern_no_bias() with reshape_a, convert_scaled_matmul, and use_u8s8 parameters is clean and avoids the duplication from the old approach.
Per-channel precision: The per-channel path (line 2173) correctly uses x_scale_dtype (float32) for the qlinear output dtype, preserving intermediate precision for mul(x_scale). The output_convert is applied at the end (lines 2198-2202). This correctly matches the original graph's computation order.
Dtype assertion (lines 2032-2033): assert kwargs["output_dtype"] == kwargs["scaled_matmul_dtype"] catches any divergence in the two convert dtypes, which is a clean invariant check.
Pattern registration ordering (lines 1998-2015): More specific patterns (larger, with reshape + bias + output_convert) get lower pass numbers (matched first), preventing partial matches. The 16-pattern registration is well-organized.
Bias + reshape deduplication (lines 2188-2196): Clean sequential if blocks instead of duplicated branches.
has_output_convert flag (line 2136, 2198-2202): Correctly detects and reproduces the output type conversion in the replacement graph.
u8s8 handling: The u8s8 decomposition in both scalar and per-channel paths (lines 2091-2108, 2145-2161) correctly converts int8→int32→add 128→uint8 and subtracts the compensation term, matching the decomposed path in _int_scaled_matmul_cpu.
Defensive null check (line 465-466): x_meta_val = match.kwargs["x"].meta.get("val", None) prevents KeyError when "val" is missing from metadata. Good defensive fix.
_cpu_is_amx_tile_supported refactor: Moving this from intmm.py to utils.py is the right call — it's now reusable by the test helper _should_use_u8s8().
Test redesign: End-to-end SmoothQuantConfig + quantize_() is far more representative and less brittle than the hand-crafted forward methods from the old test. The parametrization matrix (4 configs × 2 bias × 2 autocast × 2 ndim × 2 dynamic = 64 combinations) provides thorough coverage.
skipTest usage (line 2849): Properly uses self.skipTest() instead of silent return, making skipped tests visible in reports.
check_output_dtype (lines 310-311): Useful general-purpose addition that validates compiled model output dtype matches eager mode.
should_reduce_range integration (test lines 2863-2864): Automatically sets reduce_range=True based on hardware capability, making tests portable across CI machines.

Verdict

This PR is ready for merge. All issues from previous review rounds (1-8) have been addressed. The remaining items above are minor documentation nits and a note about the removed bias validation, neither of which should block merging. The code is well-structured, the patterns are comprehensively tested, and the u8s8 support is a valuable addition.

View job

Add convert_element_type to smooth quant pattern

513bddc

Signed-off-by: Cui, Lily <lily.cui@intel.com>

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 30, 2026

Xia-Weiwen requested a review from Copilot January 30, 2026 08:45

This comment was marked as resolved.

Sign in to view

Xia-Weiwen changed the title ~~Add convert_element_type to smooth quant pattern~~ [X86] Add convert_element_type to smooth quant pattern Jan 30, 2026

cyxlily added 3 commits February 2, 2026 11:00

Merge branch 'pytorch:main' into smooth_quant_pattern

7229c42

Cleanup patterns

2c45d12

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Update nodes

22b7ed9

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Xia-Weiwen reviewed Feb 4, 2026

View reviewed changes

cyxlily added 5 commits February 10, 2026 10:17

Merge remote-tracking branch 'upstream/main' into smooth_quant_pattern

866e92e

Remove cpu expand

92cfcee

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Remove expand pattern

7b8b44b

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Rename pattern

be223bf

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Refine codes

2ec8981

Signed-off-by: Cui, Lily <lily.cui@intel.com>

jerryzh168 reviewed Feb 13, 2026

View reviewed changes

Comment thread torchao/kernel/intmm.py Outdated

cyxlily added 4 commits February 13, 2026 18:39

Merge remote-tracking branch 'upstream/main' into smooth_quant_pattern

3fbafad

Add unit test

760e66d

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Merge branch 'pytorch:main' into smooth_quant_pattern

a53e0ea

Revert cpu intmm change

3217cb6

Moved to the other pr. Signed-off-by: Cui, Lily <lily.cui@intel.com>

Xia-Weiwen requested a review from Copilot March 18, 2026 08:40

Copilot started reviewing on behalf of Xia-Weiwen March 18, 2026 08:40 View session

Copilot AI reviewed Mar 18, 2026

View reviewed changes

Comment thread torchao/quantization/pt2e/inductor_passes/x86.py

Comment thread torchao/quantization/pt2e/inductor_passes/x86.py Outdated

Comment thread torchao/quantization/pt2e/inductor_passes/x86.py Outdated

cyxlily added 7 commits March 20, 2026 09:56

Merge branch 'pytorch:main' into smooth_quant_pattern

eb2df47

Merge remote-tracking branch 'upstream/main' into smooth_quant_pattern

00ee982

Refine comments

47f6545

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Update pattern validate

16ff953

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Rename pattern and add unit tests

39b0ec6

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Merge remote-tracking branch 'upstream/main' into smooth_quant_pattern

2a45ad8

Fix unit tests

1873021

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Add act_pre_scale in pin_memory

5721ede

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Xia-Weiwen approved these changes Apr 9, 2026

View reviewed changes

Xia-Weiwen added the ciflow/rocm label Apr 9, 2026

jerryzh168 reviewed Apr 9, 2026

View reviewed changes

Split reduce_range to another PR

acb8fec

Signed-off-by: Cui, Lily <lily.cui@intel.com>

pytorch-bot Bot removed the ciflow/rocm label Apr 10, 2026

Change vnni check import

d1992e6

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>

jerryzh168 approved these changes Apr 13, 2026

View reviewed changes

jerryzh168 mentioned this pull request Apr 14, 2026

Add reduce_range to avoid overflow in int8 tensor #4266

Merged

Cui, Lily added 3 commits April 14, 2026 06:03

Remove the manual reduce_range setting

58b433f

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>

Use should_reduce_range

b205f11

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>

Merge remote-tracking branch 'upstream/main' into smooth_quant_pattern

6b71229

cyxlily requested a review from danielvegamyhre as a code owner April 20, 2026 05:41

Cui, Lily and others added 5 commits April 20, 2026 06:05

Add u8s8 pattern

3645567

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>

Fix x meta val bug

6ffe8b4

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Set x_zp shape 0D

96709d2

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Merge upstream/main into smooth_quant_pattern

e2b1ccd

Fix lint

a3d1fd9

Signed-off-by: Cui, Lily <lily.cui@intel.com>

Conversation

cyxlily commented Jan 30, 2026 • edited by Xia-Weiwen Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot Bot commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3784

❗ 1 Active SEVs

❌ 8 New Failures, 1 Cancelled Job

Uh oh!

This comment was marked as resolved.

Uh oh!

Xia-Weiwen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: PR #3784 - Update smooth quant fusion pattern (Round 7)

Summary

Issues

1. _pin_memory doesn't propagate act_pre_scale

2. QuantizeTensorToInt8Kwargs.reduce_range is bool, not Optional[bool]

3. _int8_static_activation_int8_weight_transform no longer passes reduce_range to from_hp via keyword before act_quant_kwargs

Stale Nits from Previous Rounds (Still Present)

Stale type: ignore[possibly-undefined] comments

Bias+reshape deduplication

enable_autocast skip uses return instead of skipTest

Test Coverage Note

Looks Good

Verdict

Uh oh!

Xia-Weiwen commented Apr 9, 2026

Uh oh!

claude Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: PR #3784 - Update smooth quant fusion pattern (Round 8)

Summary

Remaining Minor Items

1. QuantizeTensorToInt8Kwargs.reduce_range type inconsistency

2. reduce_range test coverage is one-sided

3. _test_common tolerance behavior (pre-existing, minor)

Looks Good

Verdict

Uh oh!

jerryzh168 Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Xia-Weiwen Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

jerryzh168 Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Xia-Weiwen Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

jerryzh168 Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

jerryzh168 Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

jerryzh168 commented Apr 9, 2026

Uh oh!

Xia-Weiwen commented Apr 10, 2026

cyxlily commented Jan 30, 2026 •

edited by Xia-Weiwen

Loading

pytorch-bot Bot commented Jan 30, 2026 •

edited

Loading

claude Bot commented Apr 9, 2026 •

edited

Loading

1. `_pin_memory` doesn't propagate `act_pre_scale`

2. `QuantizeTensorToInt8Kwargs.reduce_range` is `bool`, not `Optional[bool]`

3. `_int8_static_activation_int8_weight_transform` no longer passes `reduce_range` to `from_hp` via keyword before `act_quant_kwargs`

Stale `type: ignore[possibly-undefined]` comments

`enable_autocast` skip uses `return` instead of `skipTest`

claude Bot commented Apr 9, 2026 •

edited

Loading

1. `QuantizeTensorToInt8Kwargs.reduce_range` type inconsistency

2. `reduce_range` test coverage is one-sided

3. `_test_common` tolerance behavior (pre-existing, minor)

jerryzh168 Apr 9, 2026 •

edited

Loading

jerryzh168 commented Apr 13, 2026 •

edited

Loading

claude Bot commented Apr 29, 2026 •

edited

Loading

1. `_test_common`: `clone_inputs` referenced before assignment when `check_code=False`