Don't cache device_count if we haven't initialized CUDA yet #122815
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122815
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit a2889f2 with merge base 852111e:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Nice!
torch/cuda/__init__.py
Outdated
# NB: Do not cache the device count prior to CUDA initialization, because
# the number of devices can change due to changes to CUDA_VISIBLE_DEVICES
# setting prior to CUDA initialization.
if _cached_device_count is None and _initialized:
nit: `_cached_device_count` can only be None if we're here, so the `is None` check is redundant
cc @wyli
Cool, this potentially fixes #95073
Thanks, I agree.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed. Reason: 2 jobs have failed, first few of them are: .github/workflows/trunk.yml / linux-focal-rocm6.0-py3.8 / build, .github/workflows/trunk.yml / macos-12-py3-arm64 / test (default, 1, 3, macos-m1-stable). Details for Dev Infra team: Raised by workflow job
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours).
…122815)
Before initializing CUDA, it can change by modifying CUDA_VISIBLE_DEVICES
Fixes pytorch#122085
Fixes pytorch#38616
Fixes pytorch#110000
Fixes pytorch#110971
Fixes pytorch#95073
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: pytorch#122815
Approved by: https://github.com/albanD
Stack from ghstack (oldest at bottom):
Before CUDA is initialized, the device count can still change if CUDA_VISIBLE_DEVICES is modified, so it should not be cached until initialization has happened.
Fixes #122085
Fixes #38616
Fixes #110000
Fixes #110971
Fixes #95073
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
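To illustrate the failure mode described here, the sketch below models two counters side by side: one that caches on the first call regardless of initialization state, and one that defers caching until initialization. Everything in it is hypothetical (`DeviceCounter`, `cache_before_init`, and the environment-parsing stand-in for the driver query); it is a toy model of the behavior, not PyTorch code.

```python
import os
from typing import Optional


class DeviceCounter:
    """Toy model of a cached device count (hypothetical, not PyTorch code)."""

    def __init__(self, cache_before_init: bool) -> None:
        self.cache_before_init = cache_before_init
        self.initialized = False
        self._cached: Optional[int] = None

    def _query(self) -> int:
        # Stand-in for the driver query: count entries in CUDA_VISIBLE_DEVICES.
        visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
        return len([d for d in visible.split(",") if d.strip()])

    def count(self) -> int:
        if self._cached is not None:
            return self._cached
        r = self._query()
        # The eager variant caches unconditionally; the deferred variant
        # only caches once initialization has happened.
        if self.initialized or self.cache_before_init:
            self._cached = r
        return r


os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
eager = DeviceCounter(cache_before_init=True)
deferred = DeviceCounter(cache_before_init=False)
eager.count()
deferred.count()

# The variable changes before "initialization", as in the linked issues.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
stale = eager.count()     # still reports the count from the first call
fresh = deferred.count()  # reflects the updated variable
```

In this model the eager counter keeps reporting four devices after the variable is narrowed to one, while the deferred counter reports one.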