
Fix get_device_properties crash when CUDA is installed but no GPU #45509

Open
Jah-yee wants to merge 1 commit into huggingface:main from Jah-yee:fix/cuda-guard-get-device-properties


@Jah-yee Jah-yee commented Apr 18, 2026

Good day

Problem

In src/transformers/testing_utils.py, the get_device_properties() function checks IS_CUDA_SYSTEM to determine whether to call torch.cuda.get_device_capability(). However, IS_CUDA_SYSTEM is set to True when torch.version.cuda is not None, regardless of whether a GPU is actually available.

When PyTorch is built with CUDA support but no GPU device is present (e.g., a headless CPU-only cloud instance with the CUDA toolkit installed), torch.cuda.get_device_capability() raises a RuntimeError, and get_device_properties() crashes.
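The mismatch between the two checks can be illustrated without a GPU, or even without PyTorch, by using a stand-in object. This is only a sketch: `fake_torch`, `is_cuda_system`, and `try_capability` are hypothetical names, and the stand-in merely imitates the behavior described above rather than calling the real library.

```python
from types import SimpleNamespace


def _no_gpu_capability():
    # Simulates the RuntimeError described above when no device is present.
    raise RuntimeError("no CUDA device available (simulated)")


# Hypothetical stand-in for a CUDA build of PyTorch on a machine with no GPU.
fake_torch = SimpleNamespace(
    version=SimpleNamespace(cuda="12.1"),
    cuda=SimpleNamespace(
        is_available=lambda: False,
        get_device_capability=_no_gpu_capability,
    ),
)

# Mirrors how the description says IS_CUDA_SYSTEM is derived.
is_cuda_system = fake_torch.version.cuda is not None


def try_capability(torch_like):
    """Return the capability tuple, or a string describing the failure."""
    try:
        return torch_like.cuda.get_device_capability()
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


result = try_capability(fake_torch)
```

Here `is_cuda_system` is true while `fake_torch.cuda.is_available()` is false, which is exactly the combination that the unguarded check lets through.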

Solution

Added a guard check for torch.cuda.is_available() before calling torch.cuda.get_device_capability():

```python
# Before
if IS_CUDA_SYSTEM or IS_ROCM_SYSTEM:

# After
if (IS_CUDA_SYSTEM or IS_ROCM_SYSTEM) and torch.cuda.is_available():
```

When CUDA is installed but no GPU is available, the function will now fall through to the else branch and return (torch_device, None, None) instead of crashing.
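A simplified sketch of the guarded control flow is shown here, under stated assumptions. It is not the actual function from testing_utils.py: `get_device_properties_sketch` and its parameters are hypothetical, the "cuda"/"rocm" labels in the return value are assumed for illustration, and the `torch_like` objects are stand-ins so the example runs without PyTorch or a GPU.

```python
from types import SimpleNamespace


def get_device_properties_sketch(torch_like, is_cuda_system, is_rocm_system, torch_device):
    """Hypothetical, simplified version of the guarded check."""
    # The added is_available() guard keeps us away from the CUDA runtime
    # when no device is present.
    if (is_cuda_system or is_rocm_system) and torch_like.cuda.is_available():
        major, minor = torch_like.cuda.get_device_capability()
        return ("rocm" if is_rocm_system else "cuda", major, minor)
    # Fallback described above: device name with no capability information.
    return (torch_device, None, None)


def _raise_no_gpu():
    raise RuntimeError("no GPU (simulated)")


# Stand-in: CUDA build, no GPU present.
no_gpu = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: False, get_device_capability=_raise_no_gpu)
)
# Stand-in: CUDA build with a GPU reporting an assumed capability of (8, 0).
with_gpu = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: True, get_device_capability=lambda: (8, 0))
)

no_gpu_props = get_device_properties_sketch(no_gpu, True, False, "cpu")
gpu_props = get_device_properties_sketch(with_gpu, True, False, "cuda")
```

With the guard in place, the no-GPU stand-in never reaches `get_device_capability()`, so the simulated RuntimeError is not triggered.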

Testing

  • The fix is a minimal, targeted change: only one line is modified.
  • The reproduction is straightforward: run get_device_properties() on a CUDA-installed but GPU-less environment (e.g., cloud CPU-only instance with CUDA toolkit).
  • The existing test test_get_device_properties should continue to pass on GPU-enabled machines.
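Since a GPU-less CUDA environment is awkward to reproduce in CI, a regression test could in principle mock the CUDA calls instead. The sketch below assumes a hypothetical helper `guarded_capability` that isolates the guard; it is not the repository's test, and adapting it to the real get_device_properties() would need care around any caching or module-level constants.

```python
import unittest
from unittest import mock


def guarded_capability(cuda, is_cuda_system):
    """Hypothetical helper: (major, minor), or (None, None) when no GPU is usable."""
    if is_cuda_system and cuda.is_available():
        return cuda.get_device_capability()
    return (None, None)


class NoGpuRegressionTest(unittest.TestCase):
    def test_capability_not_queried_without_gpu(self):
        cuda = mock.Mock()
        cuda.is_available.return_value = False
        # If the guard were missing, this simulated error would surface.
        cuda.get_device_capability.side_effect = RuntimeError("no GPU (simulated)")
        self.assertEqual(guarded_capability(cuda, is_cuda_system=True), (None, None))
        cuda.get_device_capability.assert_not_called()

    def test_capability_returned_with_gpu(self):
        cuda = mock.Mock()
        cuda.is_available.return_value = True
        cuda.get_device_capability.return_value = (8, 0)
        self.assertEqual(guarded_capability(cuda, is_cuda_system=True), (8, 0))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(NoGpuRegressionTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Mocking keeps the test hardware-independent, at the cost of only checking the control flow rather than real driver behavior.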

Fixes #45341


Thank you for your attention. If there are any issues or suggestions, please leave a comment and I will address them promptly.

Warmly,
RoomWithOutRoof

…available

When torch.cuda.is_available() returns False (e.g., headless cloud instance
with CUDA toolkit installed but no GPU), torch.cuda.get_device_capability()
would raise a RuntimeError. Added a guard check before accessing CUDA.

Fixes huggingface#45341
@github-actions

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45509&sha=92806c


Development

Successfully merging this pull request may close these issues.

A little bug in testing_utils.py
