Closed
Labels
module: error checking (Bugs related to incorrect/lacking error checking)
module: rocm (AMD GPU support for Pytorch)
rocm (This tag is for PRs from ROCm team)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Description
Hi,
I was trying to reproduce this error:
https://github.com/pytorch/pytorch/actions/runs/14086862973/job/39455645090
PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=1 PYTORCH_TEST_WITH_ROCM=1 python test/test_ops.py TestCommonCUDA.test_noncontiguous_samples_native_layer_norm_cuda_float32
Traceback (most recent call last):
  File "/home/ahmads/personal/pytorch/test/test_ops.py", line 2825, in <module>
    instantiate_device_type_tests(TestCommon, globals())
  File "/home/ahmads/personal/pytorch/torch/testing/_internal/common_device_type.py", line 928, in instantiate_device_type_tests
    device_type_test_class.instantiate_test(
  File "/home/ahmads/personal/pytorch/torch/testing/_internal/common_device_type.py", line 536, in instantiate_test
    instantiate_test_helper(
  File "/home/ahmads/personal/pytorch/torch/testing/_internal/common_device_type.py", line 443, in instantiate_test_helper
    test = decorator(test)
           ^^^^^^^^^^^^^^^
  File "/home/ahmads/personal/pytorch/torch/testing/_internal/common_device_type.py", line 1784, in skipCUDAIfNoCusolver
    not has_cusolver() and not has_hipsolver(), "cuSOLVER not available"
                               ^^^^^^^^^^^^^^^
  File "/home/ahmads/personal/pytorch/torch/testing/_internal/common_device_type.py", line 1776, in has_hipsolver
    rocm_version = _get_torch_rocm_version()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ahmads/personal/pytorch/torch/testing/_internal/common_cuda.py", line 267, in _get_torch_rocm_version
    return tuple(int(x) for x in rocm_version.split("."))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ahmads/personal/pytorch/torch/testing/_internal/common_cuda.py", line 267, in <genexpr>
    return tuple(int(x) for x in rocm_version.split("."))
                 ^^^^^^
ValueError: invalid literal for int() with base 10: 'None'
I think this failure should be more descriptive: instead of a bare ValueError from int(), it should say something like "you need to run this on a ROCm machine".
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd @malfet
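To illustrate the kind of message requested above, here is a minimal sketch of a version parser that fails with a descriptive error. It is not the actual PyTorch code: `parse_rocm_version` is a hypothetical helper, the sketch assumes the version arrives as a string or `None`, and it assumes that anything after a `-` is a build suffix that can be ignored. The sample version string in the usage note is made up.

```python
from typing import Optional, Tuple


def parse_rocm_version(hip_version: Optional[str]) -> Tuple[int, ...]:
    """Hypothetical helper: convert a ROCm/HIP version string to a tuple of ints."""
    if hip_version is None:
        # Assumption: a missing version means this build has no ROCm support,
        # so fail early with an actionable message instead of calling int("None").
        raise RuntimeError(
            "ROCm version is unavailable: you need to run this on a ROCm "
            "machine with a ROCm build of PyTorch"
        )
    # Assumption: text after '-' is a build suffix and is not part of the version.
    numeric_part = hip_version.split("-")[0]
    try:
        return tuple(int(x) for x in numeric_part.split("."))
    except ValueError as exc:
        # Chain the original error so the malformed input stays visible.
        raise RuntimeError(
            f"Unrecognized ROCm version string: {hip_version!r}"
        ) from exc
```

With this sketch, `parse_rocm_version("6.3.0-abc123")` returns a tuple of integers, while `parse_rocm_version(None)` raises a `RuntimeError` that names the likely cause rather than `invalid literal for int()`.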