
[X86] Update smooth quant fusion pattern#3784

Open
cyxlily wants to merge 43 commits into pytorch:main from cyxlily:smooth_quant_pattern

Conversation

@cyxlily
Contributor

@cyxlily cyxlily commented Jan 30, 2026

The SmoothQuant implementation in torchao has changed recently, so the fusion patterns in Inductor's FX graph passes need to be updated accordingly.
In addition, we added a reduce_range argument to Int8Tensor to avoid overflow on CPUs without AVX512_VNNI.
Unit tests are updated as well.
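The effect of a reduce_range flag can be sketched as follows. This is an illustrative stand-in, not torchao's actual Int8Tensor API; the helper names and signatures here are assumptions.

```python
import torch

# Hedged sketch: how a reduce_range flag might narrow int8 quantization
# bounds. On CPUs without AVX512_VNNI, u8s8 matmul kernels accumulate
# pairwise products in 16-bit intermediates, which can overflow with
# full-range int8 values; halving the range to [-64, 63] avoids that.

def int8_qparams(x: torch.Tensor, reduce_range: bool = False):
    # Symmetric quantization parameters; range is halved under reduce_range.
    qmin, qmax = (-64, 63) if reduce_range else (-128, 127)
    scale = x.abs().max().clamp(min=1e-12) / qmax
    return scale, qmin, qmax

def quantize_int8(x: torch.Tensor, reduce_range: bool = False):
    scale, qmin, qmax = int8_qparams(x, reduce_range)
    q = torch.clamp(torch.round(x / scale), qmin, qmax).to(torch.int8)
    return q, scale
```

Under reduce_range=True the quantized values stay within [-64, 63], at the cost of one bit of precision.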

Signed-off-by: Cui, Lily <lily.cui@intel.com>
@pytorch-bot

pytorch-bot Bot commented Jan 30, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3784

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 8 New Failures, 1 Cancelled Job

As of commit a3d1fd9 with merge base 9472d7d:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Jan 30, 2026
@Xia-Weiwen Xia-Weiwen requested a review from Copilot January 30, 2026 08:45

This comment was marked as resolved.

@Xia-Weiwen Xia-Weiwen changed the title Add convert_element_type to smooth quant pattern [X86] Add convert_element_type to smooth quant pattern Jan 30, 2026
Signed-off-by: Cui, Lily <lily.cui@intel.com>
Signed-off-by: Cui, Lily <lily.cui@intel.com>
Collaborator

@Xia-Weiwen Xia-Weiwen left a comment


Thanks for the PR. Please also add a UT.

Comment thread torchao/quantization/pt2e/inductor_passes/x86.py Outdated
Comment thread torchao/quantization/pt2e/inductor_passes/x86.py Outdated
Comment thread torchao/quantization/pt2e/inductor_passes/x86.py Outdated
Comment thread torchao/quantization/pt2e/inductor_passes/x86.py Outdated
Comment thread torchao/quantization/pt2e/inductor_passes/x86.py Outdated
Signed-off-by: Cui, Lily <lily.cui@intel.com>
Signed-off-by: Cui, Lily <lily.cui@intel.com>
Signed-off-by: Cui, Lily <lily.cui@intel.com>
Signed-off-by: Cui, Lily <lily.cui@intel.com>
Comment thread torchao/kernel/intmm.py Outdated
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.



Comment thread torchao/quantization/pt2e/inductor_passes/x86.py
Comment thread torchao/quantization/pt2e/inductor_passes/x86.py Outdated
Comment thread torchao/quantization/pt2e/inductor_passes/x86.py Outdated
@claude

claude Bot commented Apr 9, 2026

Claude finished @Xia-Weiwen's task in 2m 51s


Code Review: PR #3784 - Update smooth quant fusion pattern (Round 7)

  • Gather context and read changed files
  • Review new commits since round 6
  • Review x86.py inductor pass changes
  • Review test changes
  • Review reduce_range and other file changes
  • Post review feedback

Summary

Since round 6, two new commits have been added:

  1. 67878e1 Fix input_dtype: The test input is now created with torch.bfloat16 dtype when enable_autocast=True, matching the autocast dtype. Previously test inputs were always torch.float32 regardless of autocast setting.

  2. 1e029da Change reduce_range to optional: Addresses the backward compatibility concern from round 6 (issue Adding gpu quantization workflows and apis #1). reduce_range is moved from tensor_attribute_names (required) to optional_tensor_attribute_names, and its type is changed from bool to Optional[bool] with default False. The field is also moved to the end of the Int8Tensor.__new__/__init__/from_hp parameter lists to avoid breaking positional callers. The config classes (Int8DynamicActivationInt8WeightConfig, Int8StaticActivationInt8WeightConfig) are similarly updated, with reduce_range moved after set_inductor_config and validation assertions added.

The x86.py inductor pass code is unchanged from round 5/6.
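The backward-compatibility pattern described in commit 1e029da can be illustrated with a simplified stand-in. The class and method names below are assumptions, not torchao's actual tensor-subclass machinery; the point is that attributes listed as optional fall back to a default when absent, so checkpoints saved before the field existed still load.

```python
# Simplified sketch of the required-vs-optional attribute split described
# above. Names are illustrative; the real Int8Tensor uses torchao's
# tensor-subclass serialization machinery, not this plain class.
class Int8TensorSketch:
    tensor_attribute_names = ["scale", "zero_point"]    # must be present
    optional_tensor_attribute_names = ["reduce_range"]  # may be absent

    def __init__(self, data, scale, zero_point, reduce_range=False):
        self.data, self.scale, self.zero_point = data, scale, zero_point
        self.reduce_range = reduce_range

    @classmethod
    def from_state(cls, state: dict) -> "Int8TensorSketch":
        required = {k: state[k] for k in cls.tensor_attribute_names}
        # Optional attributes keep their defaults when an old checkpoint
        # predates the field, preserving backward compatibility.
        optional = {k: state[k] for k in cls.optional_tensor_attribute_names
                    if k in state}
        return cls(state["data"], **required, **optional)
```

A checkpoint dict saved without reduce_range deserializes with the default, while a new checkpoint round-trips the stored value.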


Issues

1. _pin_memory doesn't propagate act_pre_scale

torchao/quantization/quantize_/workflows/int8/int8_tensor.py:470-480

Not new to this PR, but worth noting since the reduce_range propagation effort surfaced it: the aten._pin_memory implementation propagates reduce_range, zero_point, act_quant_scale, act_quant_zero_point, and act_quant_kwargs, but does not propagate act_pre_scale. The aten.slice.Tensor implementation at line 408-420 does propagate it. This is a pre-existing issue and doesn't need to block this PR, but it could cause data loss if _pin_memory is called on an Int8Tensor that has act_pre_scale set (e.g., SmoothQuant tensors).

2. QuantizeTensorToInt8Kwargs.reduce_range is bool, not Optional[bool]

torchao/quantization/quantize_/workflows/int8/int8_tensor.py:52

QuantizeTensorToInt8Kwargs has reduce_range: bool = False (non-optional), while Int8Tensor has reduce_range: Optional[bool] = False. The docstring says "optional flag" but the type annotation is bool. This is a minor inconsistency — consider making these match. Since QuantizeTensorToInt8Kwargs is a dataclass that users construct explicitly (not deserialized), bool is arguably fine here, but the mismatch could be confusing.

3. _int8_static_activation_int8_weight_transform no longer passes reduce_range to from_hp via keyword before act_quant_kwargs

torchao/quantization/quant_api.py:1231-1242

In the diff, reduce_range=config.reduce_range was moved from before act_quant_kwargs to after act_quant_zero_point. Since from_hp now has reduce_range as the last keyword-only parameter, this is correct. However, the weight itself is quantized via from_hp and the reduce_range flag controls the weight's quant range too. Let me verify the call is still correct... yes, reduce_range is passed as a keyword argument to from_hp, so the reordering is fine. No issue here — just confirming.


Stale Nits from Previous Rounds (Still Present)

Stale type: ignore[possibly-undefined] comments

torchao/quantization/pt2e/inductor_passes/x86.py:2060,2066

out_shape is always defined at line 2052 via kwargs.get("out_shape", None), so these # type: ignore[possibly-undefined] comments are unnecessary. Harmless but should be cleaned up.

Bias+reshape deduplication

torchao/quantization/pt2e/inductor_passes/x86.py:2053-2067

The reshape logic is identical in both the if bias and else branches. Could be simplified to:

if bias is not None:
    new_out_node = match.graph.call_function(
        aten.add.Tensor, args=(new_out_node, bias)
    )
if out_shape is not None:
    new_out_node = match.graph.call_function(
        aten.reshape.default, args=(new_out_node, out_shape)
    )

enable_autocast skip uses return instead of skipTest

test/quantization/pt2e/test_x86inductor_fusion.py:2776-2777

Using return silently passes. self.skipTest("bf16 not supported") would be more visible in test reports.


Test Coverage Note

All four base_config parametrizations use reduce_range=True. The default (reduce_range=False) is not exercised by test_smooth_quant_pattern. The existing test_da8w8_sym_act_sym_wgt_with_int_mm test doesn't use reduce_range, so it effectively tests the False path, which provides some coverage. This is acceptable.


Looks Good

  • Backward compatibility fix (1e029da): Moving reduce_range to optional_tensor_attribute_names addresses the serialization concern from round 6. Old checkpoints without reduce_range will deserialize correctly with None as the default.

  • Input dtype fix (67878e1): Creating bf16 test inputs when enable_autocast=True is more representative of real usage — users passing bf16 inputs under autocast. This also properly exercises the convert_element_type patterns in the graph.

  • Config field reordering: Moving reduce_range after set_inductor_config in both config classes avoids breaking any existing positional construction of these configs.

  • Validation assertions: Both config classes now assert reduce_range in (True, False), catching None or other invalid values early.

  • reduce_range propagation: Thoroughly propagated through slice, _pin_memory, select, and from_hp. All construction sites pass it correctly.

  • x86.py inductor pass: Unchanged and clean. Pattern composition, registration, bias validation, dtype assertion, and per-channel path precision handling all remain correct.


Verdict

This PR is in good shape. The backward compatibility concern from round 6 has been addressed. The remaining items are all minor nits. Ready for merge.



Signed-off-by: Cui, Lily <lily.cui@intel.com>
@Xia-Weiwen
Collaborator

@claude review again

@claude

claude Bot commented Apr 9, 2026

Claude finished @Xia-Weiwen's task in 2m 8s


Code Review: PR #3784 - Update smooth quant fusion pattern (Round 8)

  • Gather context and read changed files
  • Review new commits since round 7
  • Review x86.py inductor pass changes
  • Review test changes
  • Review reduce_range and other file changes
  • Post review feedback

Summary

Since round 7, two new commits have been added:

  1. 5721ede Add act_pre_scale in pin_memory: Fixes the pre-existing issue flagged in round 7 (issue Adding gpu quantization workflows and apis #1) where aten._pin_memory and aten.is_pinned did not propagate act_pre_scale. Both implementations now correctly handle this field. The aten.slice.Tensor, aten.select.int, and aten._pin_memory implementations are now consistent in propagating all fields including act_pre_scale.

  2. Round 7 nits addressed: The bias+reshape deduplication from previous rounds has been implemented — the reshape logic is no longer duplicated in the if bias and else branches. The stale type: ignore[possibly-undefined] comments have been removed. The skipTest change (replacing return with self.skipTest("bf16 not supported")) is now in place.

The x86.py inductor pass pattern code is unchanged from rounds 5-7.


Remaining Minor Items

1. QuantizeTensorToInt8Kwargs.reduce_range type inconsistency

torchao/quantization/quantize_/workflows/int8/int8_tensor.py:52

QuantizeTensorToInt8Kwargs has reduce_range: Optional[bool] = False while Int8Tensor also has reduce_range: Optional[bool] = False. These now match — good. However, at the call site in torchao/quantization/quantize_/common/quantize_tensor_kwargs.py:66, the value is passed directly from the dataclass field to Int8Tensor.from_hp, so there's no type mismatch in practice. This is fine.

2. reduce_range test coverage is one-sided

test/quantization/pt2e/test_x86inductor_fusion.py:2739-2746

All four base_config parametrizations in test_smooth_quant_pattern use reduce_range=True. The default path (reduce_range=False) is not exercised by this specific test. The existing test_da8w8_sym_act_sym_wgt_with_int_mm test doesn't set reduce_range, so it implicitly tests the default False path, providing some coverage. This is acceptable but worth noting — if a future change breaks the reduce_range=False path for the SmoothQuant workflow specifically, this test won't catch it.

3. _test_common tolerance behavior (pre-existing, minor)

test/quantization/pt2e/test_x86inductor_fusion.py:203,208

The atol, rtol = 1e-2, 1e-2 unconditionally overwrites the caller's tolerances when autocast is enabled. The test_smooth_quant_pattern test doesn't pass custom tolerances, so this doesn't affect it. No current callers pass wider tolerances under autocast. This is fine for now.


Looks Good

  • act_pre_scale propagation fix (5721ede): aten.is_pinned and aten._pin_memory now correctly check/propagate act_pre_scale. This fixes the pre-existing issue from round 7 and ensures SmoothQuant tensors survive pin_memory() without data loss.

  • Bias+reshape deduplication: The reshape logic in the per-channel replacement handler is now clean — bias add and reshape are sequential if blocks instead of duplicated in both branches. This addresses the nit from rounds 5-7.

  • skipTest usage (test line 2777): self.skipTest("bf16 not supported") replaces the silent return, making skipped tests visible in test reports.

  • reduce_range as optional attribute: Correctly placed in optional_tensor_attribute_names, preserving backward compatibility with old serialized Int8Tensor checkpoints. The validation assertions in both config classes (assert self.reduce_range in (True, False)) catch invalid values early.

  • reduce_range propagation is comprehensive: All Int8Tensor construction sites (from_hp, aten.slice.Tensor, aten._pin_memory, aten.select.int) correctly propagate reduce_range. Both config classes (Int8DynamicActivationInt8WeightConfig, Int8StaticActivationInt8WeightConfig) thread reduce_range through to from_hp and QuantizeTensorToInt8Kwargs.

  • x86.py inductor pass: Pattern composition, registration, bias validation, dtype assertion, and per-channel path precision handling are all correct and clean. The per-channel path uses x_scale_dtype (float32) for intermediate precision and applies output_convert at the end.

  • Test design: End-to-end SmoothQuantConfig + quantize_() with parametrization over base_config, has_bias, enable_autocast, input_ndim, and dynamic provides good coverage. The check_output_dtype=True addition validates the compiled output dtype matches eager mode.

  • Parameter ordering consistency: reduce_range is consistently placed as the last keyword argument in all Int8Tensor constructor calls (from_hp, slice, _pin_memory, select), maintaining a clean and predictable API.


Verdict

This PR is ready for merge. All issues from previous review rounds have been addressed. The only remaining item is that test_smooth_quant_pattern only tests reduce_range=True — but this is a minor coverage gap since the default path is exercised elsewhere.



Comment thread torchao/quantization/quant_api.py Outdated
Union[Granularity, Tuple[Granularity, Granularity], list[Granularity]]
] = PerRow()
set_inductor_config: bool = True
reduce_range: Optional[bool] = False
Contributor


should just be bool

Collaborator


Thanks for reviewing. This is for compatibility purposes (per review comments from Claude). If it were not optional, it could break backward compatibility for users who saved a quantized model with an old torchao and now want to load it with the new torchao.

Comment thread torchao/quantization/quant_api.py Outdated
"Please set it to MappingType.SYMMETRIC or "
"MappingType.ASYMMETRIC."
)
assert self.reduce_range in (True, False), (
Contributor

@jerryzh168 jerryzh168 Apr 9, 2026


If this is only for x86, maybe add a check to make sure the user is on CPU when the flag is set to True?

Collaborator


If it's not on CPU, reduce_range still works, so I think the check is not necessary.

Contributor


It's probably better to add the check if we don't expect other backends to use it; that will decrease the user surface and make it less error-prone for users, I think.

Comment thread torchao/quantization/quant_api.py Outdated
"Please set it to MappingType.SYMMETRIC or "
"MappingType.ASYMMETRIC."
)
assert self.reduce_range in (True, False), (
Contributor


same here

@jerryzh168
Contributor

It seems reduce_range is not tested. I'd recommend splitting the reduce_range change and the fusion change into two separate PRs to make it clearer.

@Xia-Weiwen
Collaborator

seems reduce_range is not tested, I'd recommend split the reduce_range change and the fusion change in two separate PRs to make it clearer

Hi @jerryzh168 Thanks for reviewing. We added this flag because some or all of the CI machines do not have AVX512_VNNI, and the test cases would otherwise fail. So we only test reduce_range=True here. This is also the case in PyTorch, where we simply skip the correctness test of int8 matmul or intentionally use small int8 values to avoid overflow.
A better way to handle this may be to check AVX512_VNNI availability and set reduce_range accordingly. Though reduce_range=True would still not be tested in CI, the code would look better and it could be tested in our own environments. How does that sound to you?
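Deriving reduce_range from hardware capability, as proposed above, might look like the sketch below. cpu_supports_vnni() is a hypothetical probe, stubbed out here; a real implementation would query CPU feature flags (for instance via torch.backends.cpu.get_cpu_capability() or a dedicated torch/torchao helper).

```python
def cpu_supports_vnni() -> bool:
    # Hypothetical probe, stubbed to False so the sketch runs anywhere.
    # A real implementation would inspect CPU feature flags.
    return False

def should_reduce_range(device: str = "cpu") -> bool:
    # Only CPU int8 matmul without AVX512_VNNI needs the halved range;
    # other devices are unaffected.
    return device == "cpu" and not cpu_supports_vnni()
```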

Signed-off-by: Cui, Lily <lily.cui@intel.com>
@pytorch-bot pytorch-bot Bot removed the ciflow/rocm label Apr 10, 2026
@cyxlily
Contributor Author

cyxlily commented Apr 10, 2026

seems reduce_range is not tested, I'd recommend split the reduce_range change and the fusion change in two separate PRs to make it clearer

Thanks for the comments; I split the reduce_range change out into PR #4266.

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
Contributor

@jerryzh168 jerryzh168 left a comment


@Xia-Weiwen please feel free to be the codeowner for cpu only changes

you can add yourself to https://github.com/pytorch/ao/blob/main/CODEOWNERS as well for relevant files

@jerryzh168
Contributor

jerryzh168 commented Apr 13, 2026

Hi @jerryzh168 Thanks for reviewing. We add this flag because some or all of the CI machines do not have AVX512_VNNI and the test cases would fail. So, we only test reduce_range=True here. This is also the case in pytorch, where we simply skip correctness test of int8 matmul or use small int8 values intentionally to avoid overflow.

Can this be set automatically when we are on a CPU without AVX512_VNNI? Ideally we make the user's life easier.

Maybe one question here is whether we always prepare and serve the quantized model on the same type of machine or not.

@cyxlily
Contributor Author

cyxlily commented Apr 14, 2026

Hi @jerryzh168 Thanks for reviewing. We add this flag because some or all of the CI machines do not have AVX512_VNNI and the test cases would fail. So, we only test reduce_range=True here. This is also the case in pytorch, where we simply skip correctness test of int8 matmul or use small int8 values intentionally to avoid overflow.

can this be set automatically when we are in CPU + not in AVX512_VNNI machine? ideally we make the user's life easier

maybe one question here is do we always prepare and serve the quantized model in the same type of machine or not

Thanks for the comments. I removed the manual reduce_range setting and now set it automatically in Int8Tensor.

Cui, Lily added 3 commits April 14, 2026 06:03
Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
Cui, Lily and others added 5 commits April 20, 2026 06:05
Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
Signed-off-by: Cui, Lily <lily.cui@intel.com>
Signed-off-by: Cui, Lily <lily.cui@intel.com>
Signed-off-by: Cui, Lily <lily.cui@intel.com>
@Xia-Weiwen
Collaborator

@claude Review again

@claude

claude Bot commented Apr 29, 2026

Claude finished @Xia-Weiwen's task in 3m 46s


Code Review: PR #3784 - Update smooth quant fusion pattern (Round 9)

  • Gather context and read changed files
  • Review x86.py inductor pass changes
  • Review test changes
  • Review other changed files (intmm.py, utils.py)
  • Post review feedback

Summary

Since the last review (round 8), the branch has been rebased/merged with upstream main (commit e2b1ccd), incorporating the reduce_range changes from the separately merged PR #4266. The only new commit since the merge is a3d1fd9 Fix lint, which adds a missing clone_inputs assignment in _test_common. All previous round feedback has been addressed. The PR is in good shape overall.

I reviewed the full diff (4 files, ~430 additions, ~200 deletions) against the upstream merge parent.


Changes Overview

  1. torchao/quantization/pt2e/inductor_passes/x86.py — Major rewrite of _register_smooth_quant_int_mm_pattern():

    • get_pattern_no_bias() now takes reshape_a, convert_scaled_matmul, and use_u8s8 parameters, composing patterns cleanly
    • 16 pattern variants registered (s8s8 + u8s8) × (no_reshape + with_reshape) × (no_bias + with_bias) × (no_convert + with_output_convert)
    • Per-channel path correctly uses x_scale_dtype (float32) for intermediate precision
    • Defensive x_meta_val null check added at line 465-466
  2. test/quantization/pt2e/test_x86inductor_fusion.py — Test redesign:

    • test_smooth_quant_with_int_mm → test_smooth_quant_pattern using end-to-end SmoothQuantConfig + quantize_()
    • Parametrized over base_config (4 configs), has_bias, enable_autocast, input_ndim, dynamic
    • expand_a_scale parameter removed from test_da8w8_sym_act_sym_wgt_with_int_mm
    • check_output_dtype and _clone_inputs additions to _test_common
    • _should_use_u8s8() helper for expected node count calculation
  3. torchao/kernel/intmm.py — _cpu_is_amx_tile_supported() moved to torchao/utils.py for reuse

  4. torchao/utils.py — _cpu_is_amx_tile_supported() added and exported
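The four independent binary choices behind the "16 pattern variants" noted above compose multiplicatively; a quick enumeration (the axis labels here are descriptive, not the actual parameter names):

```python
from itertools import product

# Each registered pattern picks one option from each of four binary axes,
# so the total is 2 * 2 * 2 * 2 = 16 variants.
variants = list(product(
    ("s8s8", "u8s8"),
    ("no_reshape", "with_reshape"),
    ("no_bias", "with_bias"),
    ("no_output_convert", "with_output_convert"),
))
assert len(variants) == 16
```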


Issues

1. _test_common: clone_inputs referenced before assignment when check_code=False

test/quantization/pt2e/test_x86inductor_fusion.py:307-309

The new clone_inputs assignment is inside the if check_code: block (line 307), but the original code at line 309 already had actual = torch.compile(mod, **compile_options)(*clone_inputs) inside that same block, so this is fine. However, looking at the broader structure of _test_common:

with torch.no_grad(), maybe_autocast:
    if check_code:
        clone_inputs = self._clone_inputs(inputs)  # <-- new line
        expected = mod(*inputs)
        actual = torch.compile(mod, **compile_options)(*clone_inputs)

This correctly ensures inputs isn't mutated before comparing expected vs actual. Good fix — the old code was silently using already-mutated inputs for the compiled path (or vice versa).

2. Bias validation was removed — intentional?

The old _validate_pattern had bias shape checks (ensuring bias is 1D, checking reshape compatibility). These were removed. Since the test now uses end-to-end SmoothQuantConfig + quantize_() which always produces valid bias shapes, this is likely fine in practice. But the validation was a safety net against unexpected graph patterns. If the pattern matcher accidentally matches a non-quantization graph with a similar structure, the bias check would have caught it. The weight get_attr check (lines 2091-2095) still provides some protection.

This is a minor concern — just noting that the defense-in-depth was reduced.

3. Node count comment doesn't document u8s8 counts

torchao/quantization/pt2e/inductor_passes/x86.py:1976-1987

The comment block maps node counts 4-9 to specific patterns, then says "8-13: u8s8 decomposed equivalents with input shift and compensation subtraction." The u8s8 patterns add 4 nodes (convert_i32 + add_128 + convert_u8 + sub_comp) on top of the base patterns, so:

  • u8s8 base: 8, 9, 10, 11, 12, 13

But this overlaps with the s8s8 node counts 8 and 9. The validator accepts all of [4, 5, 6, 7, 8, 9, 10, 11, 12, 13] which is correct — the pattern matcher distinguishes by structure, not node count. The comment could be clearer that counts 8-9 are shared between s8s8 and u8s8 variants. Minor documentation nit.


Looks Good

  • Pattern composition via get_pattern_no_bias() with reshape_a, convert_scaled_matmul, and use_u8s8 parameters is clean and avoids the duplication from the old approach.

  • Per-channel precision: The per-channel path (line 2173) correctly uses x_scale_dtype (float32) for the qlinear output dtype, preserving intermediate precision for mul(x_scale). The output_convert is applied at the end (lines 2198-2202). This correctly matches the original graph's computation order.

  • Dtype assertion (lines 2032-2033): assert kwargs["output_dtype"] == kwargs["scaled_matmul_dtype"] catches any divergence in the two convert dtypes, which is a clean invariant check.

  • Pattern registration ordering (lines 1998-2015): More specific patterns (larger, with reshape + bias + output_convert) get lower pass numbers (matched first), preventing partial matches. The 16-pattern registration is well-organized.

  • Bias + reshape deduplication (lines 2188-2196): Clean sequential if blocks instead of duplicated branches.

  • has_output_convert flag (line 2136, 2198-2202): Correctly detects and reproduces the output type conversion in the replacement graph.

  • u8s8 handling: The u8s8 decomposition in both scalar and per-channel paths (lines 2091-2108, 2145-2161) correctly converts int8→int32→add 128→uint8 and subtracts the compensation term, matching the decomposed path in _int_scaled_matmul_cpu.

  • Defensive null check (line 465-466): x_meta_val = match.kwargs["x"].meta.get("val", None) prevents KeyError when "val" is missing from metadata. Good defensive fix.

  • _cpu_is_amx_tile_supported refactor: Moving this from intmm.py to utils.py is the right call — it's now reusable by the test helper _should_use_u8s8().

  • Test redesign: End-to-end SmoothQuantConfig + quantize_() is far more representative and less brittle than the hand-crafted forward methods from the old test. The parametrization matrix (4 configs × 2 bias × 2 autocast × 2 ndim × 2 dynamic = 64 combinations) provides thorough coverage.

  • skipTest usage (line 2849): Properly uses self.skipTest() instead of silent return, making skipped tests visible in reports.

  • check_output_dtype (lines 310-311): Useful general-purpose addition that validates compiled model output dtype matches eager mode.

  • should_reduce_range integration (test lines 2863-2864): Automatically sets reduce_range=True based on hardware capability, making tests portable across CI machines.
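The u8s8 decomposition noted in the list above can be checked numerically: shifting signed activations by 128 into the uint8 range and subtracting a per-column compensation term reproduces the s8s8 product exactly. This is a standalone sketch of the arithmetic, not torchao's _int_scaled_matmul_cpu code; it accumulates in int64 for simplicity, whereas the real kernel accumulates the u8s8 products in int32.

```python
import torch

def s8s8_via_u8s8(x_s8: torch.Tensor, w_s8: torch.Tensor) -> torch.Tensor:
    # x_s8: [M, K] int8 activations, w_s8: [K, N] int8 weights.
    x_u8 = x_s8.to(torch.int64) + 128             # shift into [0, 255]
    acc = x_u8 @ w_s8.to(torch.int64)             # u8s8 matmul
    comp = 128 * w_s8.to(torch.int64).sum(dim=0)  # per-column compensation
    return acc - comp                             # equals x_s8 @ w_s8

x = torch.randint(-128, 128, (4, 8), dtype=torch.int8)
w = torch.randint(-128, 128, (8, 3), dtype=torch.int8)
assert torch.equal(s8s8_via_u8s8(x, w),
                   x.to(torch.int64) @ w.to(torch.int64))
```

The identity holds because (x + 128) @ w = x @ w + 128 * sum_k w[k, :], so subtracting the compensation term recovers the signed product.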


Verdict

This PR is ready for merge. All issues from previous review rounds (1-8) have been addressed. The remaining items above are minor documentation nits and a note about the removed bias validation, neither of which should block merging. The code is well-structured, the patterns are comprehensively tested, and the u8s8 support is a valuable addition.




Labels

CLA Signed — This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
module: not user facing — Use this tag if you don't want this PR to show up in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants