Fix ROCM CI: skipped some tests, added checks, install torchvision #4218
Xia-Weiwen merged 3 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4218
Note: Links to docs will display an error until the docs builds have been completed. ⏳ No Failures, 1 Pending as of commit 5e040d2 with merge base 6807454. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Warning: Unknown label. Please add the new label to .github/pytorch-probot.yml
@claude review
Claude finished @Xia-Weiwen's task in 5m 44s. Review of PR #4218: Fix CI: skip some CPU tests if torch is ROCM
Overall this is a reasonable approach to unblock ROCm CI. The changes are straightforward: skip CPU tests that depend on recent torch nightly features not available in the ROCm torch build. A few observations: 1. Misleading skip message on
Pull request overview
This PR adjusts the test suite to avoid running a subset of CPU-targeted quantization/kernel tests when the installed PyTorch build is ROCm, because ROCm CI is pinned to an older nightly that lacks required changes.
Changes:
- Added ROCm-based skips for specific CPU test variants in quantization workflow tests.
- Added ROCm-based skips for specific CPU test variants in kernel autotuner tests.
- Added ROCm-based skips for specific CPU integration tests (including one decorator-based skip).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test/quantization/test_quant_api.py | Skips a CPU-executed quantization test when running under ROCm. |
| test/quantization/quantize_/workflows/int8/test_int8_tensor.py | Skips multiple CPU parametrizations for int8 workflow tests under ROCm. |
| test/kernel/test_autotuner.py | Skips the CPU parametrization of an autotuner-related test under ROCm. |
| test/integration/test_integration.py | Skips specific CPU integration tests/paths under ROCm (decorator + runtime skip). |
    def test_dynamic_quant_gpu_singleline(self):
        if is_ROCM():
            self.skipTest("Don't test CPU for ROCM version of torch")
is_ROCM() currently requires torch.cuda.is_available(), so a ROCm-built torch running without a visible GPU would not be detected and this skip would not trigger. If the intent is to skip when torch is a ROCm build (as the PR description suggests), prefer checking torch.version.hip is not None or using the existing skip_if_rocm helper so the condition matches the message/reason.
Also consider making the skip reason more specific (e.g., depends on newer torch nightly) to avoid implying that CPU testing is unsupported on ROCm in general.
Suggested change:
-    def test_dynamic_quant_gpu_singleline(self):
-        if is_ROCM():
-            self.skipTest("Don't test CPU for ROCM version of torch")
+    @skip_if_rocm(
+        "test_dynamic_quant_gpu_singleline not supported on ROCm builds (uses CUDA-only path)"
+    )
+    def test_dynamic_quant_gpu_singleline(self):
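To make the distinction concrete, here is a small self-contained model of the two checks. The helper names are hypothetical; neither function is from the torchao codebase, they only reproduce the logic under discussion.

```python
from typing import Optional

def runtime_rocm_check(cuda_available: bool, hip_version: Optional[str]) -> bool:
    """Models is_ROCM(): true only when a GPU is visible AND torch is a ROCm build."""
    return cuda_available and hip_version is not None

def build_rocm_check(hip_version: Optional[str]) -> bool:
    """Models `torch.version.hip is not None`: detects the ROCm build itself."""
    return hip_version is not None

# ROCm-built torch on a CI runner with no visible GPU (the failure mode above):
print(runtime_rocm_check(cuda_available=False, hip_version="6.2"))  # False: skip does not fire
print(build_rocm_check(hip_version="6.2"))                          # True: skip fires
```

The build-level check is the one that matches the PR's stated intent, since it does not depend on GPU visibility in the CI process.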
        sizes: tuple,
    ):
        """Test linear operation supports including shape and compile"""
        if device == "cpu" and is_ROCM():
Same concern as elsewhere: is_ROCM() is gated on torch.cuda.is_available(), so it won’t skip on a ROCm build if no GPU is visible. If the goal is to skip CPU variants specifically for ROCm torch nightlies, consider using torch.version.hip is not None / skip_if_rocm for a build-based check, and keep the skip reason aligned with the actual failure mode (old ROCm nightly).
Suggested change:
-        if device == "cpu" and is_ROCM():
+        if device == "cpu" and torch.version.hip is not None:
        if device == "cuda" and not torch.cuda.is_available():
            self.skipTest(f"{device} not available")

        if device == "cpu" and is_ROCM():
is_ROCM() depends on torch.cuda.is_available(), so this won’t skip on a ROCm build if the GPU is not visible to the process. If the skip is meant to apply to ROCm builds (per PR description), prefer torch.version.hip is not None or the existing skip_if_rocm helper for consistency and correctness.
Suggested change:
-        if device == "cpu" and is_ROCM():
+        if device == "cpu" and (torch.version.hip is not None):
        sqnr = compute_error(y_ref, y)
        self.assertTrue(sqnr >= 39.0, f"{sqnr=} too low")

    @unittest.skipIf(is_ROCM(), "Don't test CPU for ROCM version of torch")
Using @unittest.skipIf(is_ROCM(), ...) inherits is_ROCM()’s requirement that torch.cuda.is_available() is true. If this is intended to skip on ROCm torch builds (even when no GPU is visible), use a build-level check like torch.version.hip is not None or the repo’s skip_if_rocm helper so the decorator matches the intended condition.
Suggested change:
-    @unittest.skipIf(is_ROCM(), "Don't test CPU for ROCM version of torch")
+    @skip_if_rocm("Don't test CPU for ROCM version of torch")
        ):
            self.skipTest("Need CUDA and SM80+ available.")

        if test_device == "cpu" and is_ROCM():
This skip uses is_ROCM() which requires torch.cuda.is_available(). If the intent is to skip CPU export tests whenever torch is built with ROCm (regardless of GPU visibility), switch to a build-based condition such as torch.version.hip is not None / skip_if_rocm so the skip reliably triggers in all ROCm environments.
Suggested change:
-        if test_device == "cpu" and is_ROCM():
+        if test_device == "cpu" and getattr(torch.version, "hip", None) is not None:
jerryzh168
left a comment
I feel we should just have an allowlist of tests that run on ROCm; currently it runs all tests: https://github.com/pytorch/ao/blob/main/.github/workflows/regression_test_rocm.yml#L51
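A minimal sketch of what such an allowlist could look like. The names (`ROCM_TEST_ALLOWLIST`, `filter_for_rocm`) and the file paths are invented for illustration; in practice this logic could live in a `conftest.py` collection hook or directly in the workflow's test invocation.

```python
# Hypothetical allowlist of test files permitted to run on ROCm CI.
ROCM_TEST_ALLOWLIST = frozenset({
    "test/quantization/test_quant_api.py",
    "test/kernel/test_autotuner.py",
})

def filter_for_rocm(test_files, allowlist=ROCM_TEST_ALLOWLIST):
    """Keep only test files explicitly allowlisted for ROCm runs."""
    return [f for f in test_files if f in allowlist]

selected = filter_for_rocm([
    "test/quantization/test_quant_api.py",
    "test/integration/test_integration.py",  # not allowlisted, dropped
])
print(selected)
```

An allowlist flips the default from "run everything, skip what breaks" to "run nothing unless vetted", which is easier to keep green on a pinned ROCm nightly.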
jerryzh168
left a comment
OK, we can land this to unblock CI first; we'll figure out the allowlist for ROCm a bit later.
Can you fix the torchvision issue in ROCm CI following da257b5 as well? It should be fixed by adding torchvision here, I think: https://github.com/pytorch/ao/blob/main/.github/workflows/regression_test_rocm.yml#L25
@jerryzh168 Sure.
As the title says.
These CPU tests require the latest torch nightly; however, ROCm CI uses an older torch nightly that lacks the changes these tests require.
An additional torch version check is also added to scaled_int_mm.
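A version gate like the one described could be sketched as follows. The parsing logic here is an assumption for illustration; the actual check in torchao may compare `torch.__version__` differently or use an existing version utility.

```python
# Hypothetical torch-version gate; parse_torch_version is invented for this sketch.
def parse_torch_version(version: str):
    """Turn a string like '2.6.0.dev20250101+rocm6.2' into a tuple like (2, 6, 0)."""
    core = version.split("+")[0].split(".dev")[0]
    return tuple(int(p) for p in core.split(".") if p.isdigit())

def torch_at_least(version: str, minimum: str) -> bool:
    """True if the installed version string meets the minimum requirement."""
    return parse_torch_version(version) >= parse_torch_version(minimum)

print(torch_at_least("2.6.0.dev20250101+rocm6.2", "2.6.0"))  # True
print(torch_at_least("2.5.1", "2.6.0"))                      # False
```

A gate like this lets scaled_int_mm fall back or skip cleanly on the older ROCm nightly instead of failing at call time.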