
fix(testing_utils): guard get_device_capability with torch.cuda.is_available() #45472

Closed
kevinmalana wants to merge 1 commit into huggingface:main from kevinmalana:fix/45341-get-device-properties-cuda-guard
@kevinmalana

What does this PR do?

Fixes a crash in get_device_properties() in testing_utils.py when CUDA is installed on the system but no GPU device is present (e.g., a CPU-only cloud studio with CUDA libraries installed).

The function called torch.cuda.get_device_capability() immediately after checking IS_CUDA_SYSTEM (which is True whenever torch.version.cuda is not None), without first verifying that an actual GPU is available. On CUDA-installed but GPU-less systems, get_device_capability() raises an error.

Fixes #45341

Changes

  • src/transformers/testing_utils.py: Add if not torch.cuda.is_available(): return (torch_device, None, None) guard inside the IS_CUDA_SYSTEM or IS_ROCM_SYSTEM branch of get_device_properties(), before the get_device_capability() call.

Tests

This is a fix to the test infrastructure itself (testing_utils.py). The change prevents a crash that occurs in environments where IS_CUDA_SYSTEM=True but no physical GPU is present.

Notes

…ailable()

Fixes crash in get_device_properties() when CUDA is installed but no GPU
is present (e.g., CPU-only cloud studio with CUDA libraries). The function
checked IS_CUDA_SYSTEM (torch.version.cuda is not None) but then called
torch.cuda.get_device_capability() without verifying an actual GPU exists.

Fixes huggingface#45341
@Rocketknight1
Member

Please check that there aren't 4 other identical duplicate code agent PRs before you open number 5
[image]

Development

Successfully merging this pull request may close these issues.

A little bug in testing_utils.py