
Fix ROCM CI: skipped some tests, added checks, install torchvision#4218

Merged
Xia-Weiwen merged 3 commits into pytorch:main from Xia-Weiwen:skip_cpu_ut_on_rocm
Apr 2, 2026

Conversation

Xia-Weiwen (Collaborator) commented Apr 2, 2026

As the title says.
These CPU tests require the latest torch nightly. However, the ROCm CI uses an older torch nightly that lacks the changes these tests require.
An additional torch version check is also added to scaled_int_mm.
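The kind of torch version check mentioned for scaled_int_mm might look like the following sketch. This is not the PR's actual code: version_at_least and the sample version strings are illustrative assumptions.

```python
# Hedged sketch: compare an installed torch version string against a minimum,
# ignoring nightly/local build suffixes such as "+rocm6.2" or "+cu124".
def version_at_least(installed: str, minimum: str) -> bool:
    def key(v: str) -> list[int]:
        # Drop the local-build suffix, then keep only numeric content per part,
        # so "2.9.0.dev20251201+rocm6.2" compares as [2, 9, 0, 20251201].
        parts = []
        for piece in v.split("+")[0].split("."):
            digits = "".join(ch for ch in piece if ch.isdigit())
            parts.append(int(digits) if digits else 0)
        return parts

    return key(installed) >= key(minimum)
```

In a test file this could guard a feature that only newer nightlies provide, e.g. version_at_least(torch.__version__, "2.9.0") (the threshold here is made up for illustration).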

pytorch-bot Bot commented Apr 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4218

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 5e040d2 with merge base 6807454:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the CLA Signed label Apr 2, 2026
pytorch-bot Bot commented Apr 2, 2026

Warning: Unknown label ciflow/rocm-mi300.
Currently recognized labels are

  • ciflow/benchmark
  • ciflow/tutorials
  • ciflow/rocm
  • ciflow/4xh100
  • ciflow/xpu

Please add the new label to .github/pytorch-probot.yml

Xia-Weiwen added the ciflow/rocm and module: not user facing labels and removed the CLA Signed, ciflow/rocm-mi300, and module: rocm labels Apr 2, 2026
@Xia-Weiwen Xia-Weiwen requested a review from jerryzh168 April 2, 2026 02:47
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review April 2, 2026 02:47
@Xia-Weiwen Xia-Weiwen requested a review from Copilot April 2, 2026 02:47
Xia-Weiwen (Collaborator, Author) commented:

@claude review

claude Bot commented Apr 2, 2026

Claude finished @Xia-Weiwen's task in 5m 44s.


Review of PR #4218: Fix CI: skip some CPU tests if torch is ROCM

  • Read PR diff and changed files
  • Analyze skip conditions and patterns
  • Post review feedback

Overall this is a reasonable approach to unblock ROCm CI. The changes are straightforward — skip CPU tests that depend on recent torch nightly features not available in the ROCm torch build. A few observations:

1. Misleading skip message on test_dynamic_quant_gpu_singleline

In test/quantization/test_quant_api.py:167-169, the test test_dynamic_quant_gpu_singleline is skipped with the message "Don't test CPU for ROCM version of torch", but the test name says "gpu". Looking at the implementation, the model defaults to CPU (device="cpu" in example_inputs), so the skip is functionally correct — the test does run on CPU despite the name. However, this is confusing. Consider either:

  • Using a more accurate skip message like "Skipping due to incompatibility with ROCm torch nightly", or
  • Noting that this test actually runs on CPU (it's not a "GPU" test in practice).

2. Inconsistent skip patterns across the PR

The PR uses two different styles for the same purpose:

  • Decorator style (test_integration.py:259): @unittest.skipIf(is_ROCM(), "Don't test CPU for ROCM version of torch")
  • Inline style (everywhere else): if device == "cpu" and is_ROCM(): self.skipTest(...)

The inline style makes sense for parametrized tests where the device varies, but test_per_token_linear_cpu uses the decorator style while test_dynamic_quant_gpu_singleline (also non-parametrized, always CPU) uses the inline style. It would be more consistent to use the decorator for both non-parametrized cases.

3. Existing skip_if_rocm decorator is available but unused

There's an existing skip_if_rocm decorator in torchao/testing/utils.py:74 (already imported in test_quant_api.py). For non-parametrized tests like test_per_token_linear_cpu and test_dynamic_quant_gpu_singleline, you could use @skip_if_rocm("CPU test requires latest torch nightly") for consistency with the rest of the codebase. However, skip_if_rocm uses pytest.skip() while the current code uses unittest.skipIf/self.skipTest — so this may be a deliberate choice for unittest compatibility.

4. Minor: Consider a comment explaining why

The skip message "Don't test CPU for ROCM version of torch" tells us what is happening but not why. The PR description explains it well (ROCm CI uses an older torch nightly), but a brief code comment would help future maintainers understand when these skips can be removed (e.g., # ROCm CI torch nightly doesn't include the latest CPU-related changes yet).

Summary

The changes are correct and will achieve the goal of unblocking ROCm CI. The issues above are minor style/consistency nits that don't affect correctness. The PR looks good to merge as-is, but addressing the inconsistencies would improve maintainability.
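The two skip styles the review contrasts can be sketched as follows. is_rocm_build() here is a hardcoded stand-in for the repo's is_ROCM() / skip_if_rocm helpers, not the actual torchao API, and the test names are illustrative.

```python
# Sketch of the two skip styles discussed above (decorator vs. inline).
import unittest


def is_rocm_build() -> bool:
    # Stand-in for a real build-level ROCm check (e.g. torch.version.hip);
    # hardcoded so this sketch runs without torch installed.
    return False


class ExampleTests(unittest.TestCase):
    # Decorator style: suits non-parametrized tests that always run on CPU.
    @unittest.skipIf(is_rocm_build(), "ROCm torch nightly lacks the latest CPU changes")
    def test_per_token_linear_cpu_style(self):
        self.assertTrue(True)

    # Inline style: needed when the device is parametrized at runtime.
    def test_linear_style(self, device="cpu"):
        if device == "cpu" and is_rocm_build():
            self.skipTest("ROCm torch nightly lacks the latest CPU changes")
        self.assertTrue(True)
```

Either style produces the same skip behavior; the decorator simply evaluates the condition at collection time instead of inside the test body.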


Copilot AI (Contributor) left a comment

Pull request overview

This PR adjusts the test suite to avoid running a subset of CPU-targeted quantization/kernel tests when the installed PyTorch build is ROCm, because ROCm CI is pinned to an older nightly that lacks required changes.

Changes:

  • Added ROCm-based skips for specific CPU test variants in quantization workflow tests.
  • Added ROCm-based skips for specific CPU test variants in kernel autotuner tests.
  • Added ROCm-based skips for specific CPU integration tests (including one decorator-based skip).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

  • test/quantization/test_quant_api.py: Skips a CPU-executed quantization test when running under ROCm.
  • test/quantization/quantize_/workflows/int8/test_int8_tensor.py: Skips multiple CPU parametrizations for int8 workflow tests under ROCm.
  • test/kernel/test_autotuner.py: Skips the CPU parametrization of an autotuner-related test under ROCm.
  • test/integration/test_integration.py: Skips specific CPU integration tests/paths under ROCm (decorator + runtime skip).


Comment on lines 167 to +169

    def test_dynamic_quant_gpu_singleline(self):
        if is_ROCM():
            self.skipTest("Don't test CPU for ROCM version of torch")

Copilot AI commented Apr 2, 2026

is_ROCM() currently requires torch.cuda.is_available(), so a ROCm-built torch running without a visible GPU would not be detected and this skip would not trigger. If the intent is to skip when torch is a ROCm build (as the PR description suggests), prefer checking torch.version.hip is not None or using the existing skip_if_rocm helper so the condition matches the message/reason.

Also consider making the skip reason more specific (e.g., depends on newer torch nightly) to avoid implying that CPU testing is unsupported on ROCm in general.

Suggested change:
  - def test_dynamic_quant_gpu_singleline(self):
  -     if is_ROCM():
  -         self.skipTest("Don't test CPU for ROCM version of torch")
  + @skip_if_rocm(
  +     "test_dynamic_quant_gpu_singleline not supported on ROCm builds (uses CUDA-only path)"
  + )
  + def test_dynamic_quant_gpu_singleline(self):
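The build-level check suggested here can be factored out as below. This is a sketch: the helper name is hypothetical, and the HIP/CUDA version strings on the stand-in objects are purely illustrative. It takes the version module as an argument so it can be demonstrated without torch installed; in real code you would pass torch.version (whose hip attribute is None on non-ROCm builds).

```python
# Sketch of a build-level ROCm check: inspect torch.version.hip rather than
# gating on torch.cuda.is_available(), so it also fires with no visible GPU.
from types import SimpleNamespace


def is_rocm_build(torch_version_module) -> bool:
    """True if this torch wheel is a ROCm (HIP) build, regardless of GPU visibility."""
    return getattr(torch_version_module, "hip", None) is not None


# Illustrative stand-ins for torch.version on different builds:
rocm_build = SimpleNamespace(hip="6.2.41133", cuda=None)
cuda_build = SimpleNamespace(hip=None, cuda="12.4")
```

In the test files themselves the call would be is_rocm_build(torch.version), or simply torch.version.hip is not None inline, as the review suggests.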
        sizes: tuple,
    ):
        """Test linear operation supports including shape and compile"""
        if device == "cpu" and is_ROCM():

Copilot AI commented Apr 2, 2026

Same concern as elsewhere: is_ROCM() is gated on torch.cuda.is_available(), so it won't skip on a ROCm build if no GPU is visible. If the goal is to skip CPU variants specifically for ROCm torch nightlies, consider using torch.version.hip is not None / skip_if_rocm for a build-based check, and keep the skip reason aligned with the actual failure mode (old ROCm nightly).

Suggested change:
  - if device == "cpu" and is_ROCM():
  + if device == "cpu" and torch.version.hip is not None:
        if device == "cuda" and not torch.cuda.is_available():
            self.skipTest(f"{device} not available")

        if device == "cpu" and is_ROCM():

Copilot AI commented Apr 2, 2026

is_ROCM() depends on torch.cuda.is_available(), so this won't skip on a ROCm build if the GPU is not visible to the process. If the skip is meant to apply to ROCm builds (per PR description), prefer torch.version.hip is not None or the existing skip_if_rocm helper for consistency and correctness.

Suggested change:
  - if device == "cpu" and is_ROCM():
  + if device == "cpu" and (torch.version.hip is not None):
        sqnr = compute_error(y_ref, y)
        self.assertTrue(sqnr >= 39.0, f"{sqnr=} too low")

    @unittest.skipIf(is_ROCM(), "Don't test CPU for ROCM version of torch")

Copilot AI commented Apr 2, 2026

Using @unittest.skipIf(is_ROCM(), ...) inherits is_ROCM()'s requirement that torch.cuda.is_available() is true. If this is intended to skip on ROCm torch builds (even when no GPU is visible), use a build-level check like torch.version.hip is not None or the repo's skip_if_rocm helper so the decorator matches the intended condition.

Suggested change:
  - @unittest.skipIf(is_ROCM(), "Don't test CPU for ROCM version of torch")
  + @skip_if_rocm("Don't test CPU for ROCM version of torch")
        ):
            self.skipTest("Need CUDA and SM80+ available.")

        if test_device == "cpu" and is_ROCM():

Copilot AI commented Apr 2, 2026

This skip uses is_ROCM(), which requires torch.cuda.is_available(). If the intent is to skip CPU export tests whenever torch is built with ROCm (regardless of GPU visibility), switch to a build-based condition such as torch.version.hip is not None / skip_if_rocm so the skip reliably triggers in all ROCm environments.

Suggested change:
  - if test_device == "cpu" and is_ROCM():
  + if test_device == "cpu" and getattr(torch.version, "hip", None) is not None:
jerryzh168 (Contributor) left a comment

I feel we should just have an allowlist of tests that run on ROCm; currently it runs all tests: https://github.com/pytorch/ao/blob/main/.github/workflows/regression_test_rocm.yml#L51

jerryzh168 (Contributor) left a comment

OK, we can land this to unblock CI first; we'll figure out the allowlist for ROCm a bit later.

meta-cla Bot added the CLA Signed label Apr 2, 2026

jerryzh168 (Contributor) commented Apr 2, 2026

Can you fix the torchvision issue in ROCm CI following da257b5 as well?

It should be fixed by adding torchvision here, I think: https://github.com/pytorch/ao/blob/main/.github/workflows/regression_test_rocm.yml#L25

Xia-Weiwen (Collaborator, Author) commented:

@jerryzh168 Sure.

@Xia-Weiwen Xia-Weiwen changed the title Fix CI: skip some CPU tests if torch is ROCM Fix ROCM CI: skipped some tests, added checks, install torchvision Apr 2, 2026
@Xia-Weiwen Xia-Weiwen merged commit c49a60e into pytorch:main Apr 2, 2026
20 checks passed

Labels

ciflow/rocm, ciflow/rocm-mi300, CLA Signed, module: not user facing

3 participants