[xpu][test][1/N] Enable tests of test_nn.py on Intel GPU - instantiate TestNN with instantiate_device_type_tests #166396
daisyden wants to merge 10 commits into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166396
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit ee010a0 with merge base 24e0e50. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from fb44286 to df99d6c
test/test_nn.py (Outdated)

    for cuda in ['cuda', 'cuda:0' if torch.cuda.device_count() == 1 else 'cuda:1']:
        m2 = m.cuda(device=cuda)
        self.assertIs(m2, m2.to(cuda))
    if torch.cuda.is_available() or torch.xpu.is_available():

Suggested change:
-    if torch.cuda.is_available() or torch.xpu.is_available():
+    if TEST_GPU:
test/test_nn.py (Outdated)

    def test_CTCLoss_zero_lengths(self):
        devices = ['cpu']
        devices += ['cuda'] if TEST_CUDA else []
        devices += [device_type] if TEST_CUDA or TEST_XPU else []

Suggested change:
-        devices += [device_type] if TEST_CUDA or TEST_XPU else []
+        devices += [device_type] if TEST_GPU else []
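The suggested TEST_GPU flag collapses the per-backend conditions into one check. A minimal sketch of the device-list pattern under discussion; the flag values and device_type here are hard-coded stand-ins for the suite's real globals:

```python
# Stand-ins for the real availability flags (assumptions for illustration).
TEST_CUDA, TEST_XPU = False, True
TEST_GPU = TEST_CUDA or TEST_XPU
device_type = "cuda" if TEST_CUDA else "xpu"

devices = ["cpu"]
# One condition covers both backends instead of spelling out each one:
devices += [device_type] if TEST_GPU else []
print(devices)
```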
test/test_nn.py (Outdated)

        self.assertTrue((inp.grad == 0).all().item())

    @unittest.skipIf(not TEST_CUDA, 'CUDA not available')
    @unittest.skipIf(not TEST_CUDA and not TEST_XPU, 'CUDA and XPU not available')

Suggested change:
-    @unittest.skipIf(not TEST_CUDA and not TEST_XPU, 'CUDA and XPU not available')
+    @unittest.skipIf(not TEST_GPU, 'CUDA and XPU not available')
test/test_nn.py (Outdated)

    @unittest.skipIf(not TEST_CUDA, 'CUDA not available')
    @unittest.skipIf(not TEST_CUDA and not TEST_XPU, 'CUDA and XPU not available')

Suggested change:
-    @unittest.skipIf(not TEST_CUDA and not TEST_XPU, 'CUDA and XPU not available')
+    @unittest.skipIf(not TEST_GPU, 'CUDA and XPU not available')
test/test_nn.py (Outdated)

    @unittest.skipIf(not TEST_CUDA, "CUDA unavailable")
    @unittest.skipIf(not TEST_CUDNN, "needs cudnn")
    @unittest.skipIf(not TEST_CUDA and not TEST_XPU, "CUDA and XPU unavailable")

Suggested change:
-    @unittest.skipIf(not TEST_CUDA and not TEST_XPU, "CUDA and XPU unavailable")
+    @unittest.skipIf(not TEST_GPU, "CUDA and XPU unavailable")
test/test_nn.py (Outdated)

    @unittest.skipIf(not TEST_CUDA, "CUDA unavailable")
    @unittest.skipIf(not TEST_CUDNN, "needs cudnn")
    @unittest.skipIf(not TEST_CUDA and not TEST_XPU, "CUDA and XPU unavailable")
    @unittest.skipIf(not TEST_CUDNN and not TEST_XPU, "needs cudnn or xpu")

Suggested change:
-    @unittest.skipIf(not TEST_CUDNN and not TEST_XPU, "needs cudnn or xpu")
+    @unittest.skipIf(not TEST_CUDNN, "needs cudnn")

TEST_XPU shouldn't be needed here, right?

The case is general and can work on XPU, so I didn't skip it.
test/test_nn.py (Outdated)

    @unittest.skipIf(not torch.cuda.is_available(), "CUDA not available")
    @unittest.skipIf(not torch.cuda.is_available() and not torch.xpu.is_available(), "CUDA and XPU not available")

Suggested change:
-    @unittest.skipIf(not torch.cuda.is_available() and not torch.xpu.is_available(), "CUDA and XPU not available")
+    @unittest.skipIf(not TEST_GPU, "CUDA and XPU not available")
test/test_nn.py (Outdated)

    )
    def test_batchnorm(self, dims, mode, memory_format, ref_backend, mixed, dtype):

Suggested change:
-        if torch.version.cuda:
+        if torch.version.cuda or torch.version.xpu:

Originally this meant that CUDA skips these cases but ROCm does not. I think we should run these cases and see what happens.
test/test_nn.py (Outdated)

        _inference(memory_format, ref_backend, mixed, dtype)

    @unittest.skipIf(not torch.cuda.is_available(), "CUDA not available")
    @unittest.skipIf(not torch.cuda.is_available() and not torch.xpu.is_available(), "CUDA and XPU not available")

Suggested change:
-    @unittest.skipIf(not torch.cuda.is_available() and not torch.xpu.is_available(), "CUDA and XPU not available")
+    @unittest.skipIf(not TEST_GPU, "CUDA and XPU not available")
Force-pushed from 0ee4e01 to 343d722
Force-pushed from 343d722 to d7c7830
test/test_nn.py (Outdated)

        self.assertTrue((inp.grad == 0).all().item())

    @unittest.skipIf(not TEST_CUDA, 'CUDA not available')
    @unittest.skipIf(not TEST_GPU, 'CUDA and XPU not available')

Suggested change:
-    @unittest.skipIf(not TEST_GPU, 'CUDA and XPU not available')
+    @unittest.skipIf(not TEST_GPU, 'The current accelerator is not available')
test/test_nn.py (Outdated)

        self.skipTest("Failed on CUDA")
        self.skipTest(f"Failed on {device_type.upper()}")

Suggested change:
-    if torch.version.xpu:
+    if TEST_XPU:
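The distinction behind this suggestion: torch.version.xpu is a build-time property (what the binary was compiled with), while TEST_XPU-style flags reflect runtime availability. A hedged sketch with stand-in objects, no real torch involved, and all values are assumptions:

```python
# _FakeVersion stands in for torch.version; the values are illustrative only.
class _FakeVersion:
    cuda = "12.4"  # build was compiled against CUDA 12.4
    xpu = None     # build has no XPU support compiled in

version = _FakeVersion()

def cuda_is_available():
    # Stand-in for torch.cuda.is_available(): a CUDA build can still run on a
    # machine with no visible GPU, so runtime availability is a separate question.
    return False

built_with_cuda = version.cuda is not None
print(built_with_cuda, cuda_is_available())
```

A build-time check can therefore be true on a machine where no device test could actually run, which is why the review prefers the availability flag.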
Force-pushed from d7c7830 to 092c678
malfet left a comment:

Is it possible to split this change into a few smaller ones? For example, the changes to common_device_types and test_namedtensor.py look totally fine, and the majority of the changes to test_nn are also OK, but the multi-GPU changes seem a bit risky.

Perhaps let's do single-GPU enablement in one PR and then multi-GPU in another.
test/test_nn.py (Outdated)

    if device == 'cuda':
        rnn.cuda()
        input = input.cuda()

Suggested change:
-    if device == 'cuda' or device == 'xpu':
+    if device in ['cuda', 'xpu']:

Thanks for the suggestions! I will split the PR into two: one for single-GPU and one for multi-GPU.

Done, this PR is now only for single GPU.
test/test_nn.py (Outdated)

    import numpy as np

    TEST_GPU = torch.cuda.is_available() or torch.xpu.is_available()
    TEST_MULTIGPU = TEST_GPU and torch.accelerator.device_count() >= 2

Why not change the definition in common_cuda.py?

Can I add the TEST_GPU and TEST_MULTIGPU definitions in common_utils.py?

Hi @malfet, I removed the multi-GPU code updates from this PR and will create another PR for them. I also moved the TEST_GPU definition to common_utils. Please review this PR further. Thanks.

Hi @malfet, TEST_GPU is not a general definition. After consideration, I feel it is not appropriate to put it under common_utils or common_cuda. I define it as

    TEST_GPU = TEST_CUDA or TEST_XPU

because other accelerators have limitations on test_nn, so I only port these cases for XPU.
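A self-contained sketch of how such a module-level flag drives skip decorators. The availability values below are hard-coded assumptions, not real device probes; in the actual suite they come from torch.cuda.is_available() / torch.xpu.is_available():

```python
import unittest

# Hard-coded stand-ins for torch.cuda.is_available() / torch.xpu.is_available()
TEST_CUDA = False
TEST_XPU = True
TEST_GPU = TEST_CUDA or TEST_XPU  # single flag covering both GPU backends

class GpuOnly(unittest.TestCase):
    @unittest.skipIf(not TEST_GPU, "CUDA and XPU not available")
    def test_runs_when_any_gpu_flag_set(self):
        self.assertTrue(TEST_GPU)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(GpuOnly)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful(), len(result.skipped))  # → True 0
```

With TEST_CUDA = TEST_XPU = False the same test would be skipped rather than failed, which is exactly the behavior the suggested one-flag condition buys.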
@claude Could you do another pass on this PR, taking into account previous review comments. Make sure that the functionality being tested was not changed.
Claude finished @daisyden's task in 3m 26s - View job

PR Review: [xpu][test][1/N] Enable tests of test_nn.py on Intel GPU

Summary: This PR migrates
| Feedback | Status |
|---|---|
| Critical: device_list undefined at instantiation | Fixed ✅ - device_list = None default, uses except_for pattern |
| Remove internal device_list loops | Fixed ✅ - All internal loops removed |
| test_to loop variable gpu unused | Fixed ✅ - Now correctly uses gpu inside loop body |
| Missing f-string in test_batchnorm skip list | Fixed ✅ - Line 5259 correctly uses f-string |
| test_upsampling_bfloat16 shadows device | Fixed ✅ - Internal loop removed |
| test_mse_loss_mixed_dtype_grad deleted | Fixed ✅ - Restored with device parameter (line 2672) |
| test_grid_sample_error_checking device-mismatch test | Fixed ✅ - input and grid created on CPU (line 5798), then input.to(device) moves only input to the accelerator while grid stays on CPU |
| test_projections_lstm_args_check model/input device mismatch | Fixed ✅ - get_inputs() now takes and uses a device parameter |
| test_cosine_similarity_mixed_precision x5/x6 not on device | Fixed ✅ - Now uses device=device (lines 5790-5791) |
| largeTensorTest hardcoded device="cuda" | Fixed ✅ - device= argument removed |
| test_interpolate_buffer_overflow internal device loop | Fixed ✅ - Loop removed, uses device parameter directly |
| cudnn tests marked @onlyAccelerator | Partially fixed - test_batchnorm_cudnn_nhwc correctly no longer has @onlyAccelerator. test_batchnorm_cudnn_half still has @onlyAccelerator, but the cudnn portion is gated behind if TEST_CUDNN:, so non-CUDA accelerators only run the THNN portion - acceptable |
| @mikaylagawarecki suggestion on except_for for 3rd-party backends | Fixed ✅ - Uses except_for=device_list pattern |
Remaining Issues

1. Minor: test_cosine_similarity_mixed_precision - x3/x4 CPU-only comparison has a subtle mismatch (test/test_nn.py:5778-5788)

Lines 5778-5779 create x3 and x4 on CPU (no device=device), compute result_cpu, then lines 5784-5788 compare against an accelerator copy. This is intentional for cross-device comparison. However, there is a minor issue: on line 5795, result_cpu refers to torch.ones(2, dtype=torch.float16), which is created on CPU; when device is an accelerator, torch.isclose(result, torch.ones(2, dtype=torch.float16)) compares a device tensor with a CPU tensor. PyTorch handles this automatically, so it works, but it is slightly inconsistent. Not a blocker - just a nit for consistency.

2. Minor: test_interpolate_buffer_overflow dead-code branch (test/test_nn.py:6930)

Line 6930 uses device == 'cuda' and dtype == torch.bfloat16 as a skip condition. When this test runs on a non-CUDA accelerator (e.g., XPU), that branch never matches since device != 'cuda', so bfloat16 is tested on XPU even though the original intent was to skip bfloat16 only on CUDA. If XPU has similar bfloat16 limitations, this may need updating in a follow-up PR. Not a blocker.

Verdict

The PR is ready to merge. All previously identified bugs and functionality changes have been fixed. The migration approach using instantiate_device_type_tests is correct. The two remaining notes are minor and do not affect correctness.
Thank you!

For my understanding -- the main change in this PR was to instantiate TestNN with instantiate_device_type_tests, so TestNN no longer has @onlyCUDA. It seems this PR also migrated some (but not all) @onlyCUDA in TestNNDeviceType to @onlyAccelerator. Will there be a follow-up PR that removes more @onlyCUDA from TestNNDeviceType (or moves them into a separate class if they are truly CUDA-specific)?
test_comprehensive_linalg_lu_factor_xpu_float32
@pytorchbot merge -i

This PR has pending changes requested. Please address the comments and update the PR before merging.
@pytorchbot merge -i

Merge started. Your change will be merged while ignoring the following 1 check: xpu / linux-noble-xpu-n-py3.10 / test (default, 6, 12, linux.idc.xpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 job has failed; the first few are: xpu / linux-noble-xpu-n-py3.10 / test (default, 3, 12, linux.idc.xpu). Details for Dev Infra team: raised by workflow job.

@pytorchbot merge -i

Merge started. Your change will be merged while ignoring the following 2 checks: xpu / linux-noble-xpu-n-py3.10 / test (default, 3, 12, linux.idc.xpu), trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (default, 1, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable" -c autorevert

This PR is attributed to have caused a regression in:
Please investigate and fix the issues.
Claude finished @pytorch-auto-revert[bot]'s task in 5m 16s - View job

Revert Diagnostics for PR #166396

Root Cause

The revert is legitimate. The CI failure is a direct result of this PR's changes. Error: (and in a parallel shard: This occurs at

Why It Happens

The

When This error only manifests in the

How To Fix

The author needs to handle the dynamically-generated tests from

The key insight is that
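The fix the diagnostics point at is the except_for guard that was eventually added before instantiation. A toy model of that pattern; the helper below is a deliberately simplified stand-in for torch's instantiate_device_type_tests, not its real implementation:

```python
import os

def instantiate_device_type_tests(cls, scope, except_for=None):
    # Simplified stand-in: generate one subclass per device, skipping any
    # device listed in except_for (the real torch helper does much more).
    for device in ("cpu", "cuda", "xpu"):
        if except_for and device in except_for:
            continue
        name = f"{cls.__name__}{device.upper()}"
        scope[name] = type(name, (cls,), {"device_type": device})

class TestNN:
    pass

# Guard mirroring the PR: under Dynamo CI, exclude the listed devices.
device_list = None
if os.environ.get("PYTORCH_TEST_WITH_DYNAMO", "0") == "1":
    device_list = ("cpu",)

generated = {}
instantiate_device_type_tests(TestNN, generated, except_for=device_list)
print(sorted(generated))
```

The point of the pattern is that the generated classes exist only after this call runs, so any environment-dependent filtering has to happen via except_for at instantiation time rather than via decorators on the original class.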
@pytorchbot successfully started a revert job. Check the current status here.

@daisyden your PR has been successfully reverted.
…stantiate TestNN with instantiate_device_type_tests (#166396)" This reverts commit b87c36c. Reverted #166396 on behalf of https://github.com/pytorch-auto-revert due to: Reverted automatically by pytorch's autorevert; to avoid this behaviour add the tag autorevert: disable.
malfet left a comment:

Two comments:
- The test plan for a change like this should include links to runs before and after the change, to guarantee there is no significant difference in the number of tests run (a change like this can otherwise skip lots of tests for no good reason).
- Never use the skipIf decorator unless the test hard-crashes; use expectedFailureIf instead.
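The difference behind that second point, in plain unittest terms: a skipped test never executes, so a silent fix or a new regression goes unnoticed, while an expected failure still runs and is flagged if it unexpectedly passes. A minimal sketch using the standard unittest decorators (not PyTorch's own variants):

```python
import unittest

class Demo(unittest.TestCase):
    @unittest.skip("never executed; a fix or new regression goes unnoticed")
    def test_skipped(self):
        self.assertEqual(1, 2)

    @unittest.expectedFailure
    def test_expected_failure(self):
        self.assertEqual(1, 2)  # still runs; recorded as an expected failure

suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped), len(result.expectedFailures), result.wasSuccessful())
# → 1 1 True
```

If test_expected_failure started passing, the run would report an unexpected success and fail, which is exactly the signal a plain skip throws away.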
    _dtype = dtype_name(dtype)
    if torch.version.cuda or device == 'xpu':
        skip_tests = [
            f"test_batchnorm_2D_train_NCHW_vs_cpu_mixed_bfloat16_{device}_{_dtype}",

Err, why do you need to parametrize the test name all of a sudden?
    def test_grid_sample_half_precision(self):
    @onlyAccelerator
    # TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64
    @skipIfMPS

Never use skipIfMPS; rather use @expectedFailureMPS or something of that nature.
    device_list = None
    # https://github.com/pytorch/pytorch/issues/177119
    if os.environ.get('PYTORCH_TEST_WITH_DYNAMO', '0') == '1':
        device_list = ('cpu', )

    instantiate_device_type_tests(TestNN, globals(), except_for=device_list)

Can you elaborate why this is needed?
To make the probability of a revert much smaller, do you mind breaking it down into several smaller PRs? For example, one could add
For #114850, we will port ATen unit tests to Intel GPU. This PR covers the TestNN class in test/test_nn.py, for single-GPU tests only. We enable Intel GPU with the following methods, trying our best to keep the original code style: