fix the device type for with_comms decorator #125798
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125798
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit d9621b6 with merge base bd3cbdb. This comment was automatically generated by Dr. CI and updates every 15 minutes.
# if there are enough GPUs we can use GPU, otherwise we fall back to CPU
if torch.cuda.is_available() and torch.cuda.device_count() < self.world_size:
    self.device_type = "cpu"
I understand your intention, but the `and` here seems a bit weird.
What about:
`if not torch.cuda.is_available() or torch.cuda.device_count() < self.world_size:`
Or maybe I missed your other intention.
Sure, that works too; let me change to that.
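The effect of the suggested condition can be sketched with a small standalone helper. This is an illustration, not the actual PR code: `pick_device_type` is a hypothetical name, and the CUDA state is passed in as plain arguments so the logic is checkable without a GPU.

```python
def pick_device_type(cuda_available: bool, device_count: int, world_size: int) -> str:
    """Sketch of the fallback logic suggested in the review.

    Fall back to "cpu" whenever CUDA is unavailable OR there are fewer
    GPUs than ranks; otherwise use "cuda".
    """
    if not cuda_available or device_count < world_size:
        return "cpu"
    return "cuda"


def pick_device_type_buggy(cuda_available: bool, device_count: int, world_size: int) -> str:
    """The original `and` condition, for comparison: it never falls back
    when CUDA is unavailable, leaving device_type at its "cuda" default."""
    device_type = "cuda"
    if cuda_available and device_count < world_size:
        device_type = "cpu"
    return device_type


print(pick_device_type(False, 0, 4))        # cpu: no CUDA at all
print(pick_device_type(True, 2, 4))         # cpu: not enough GPUs for 4 ranks
print(pick_device_type(True, 4, 4))         # cuda: enough GPUs
print(pick_device_type_buggy(False, 0, 4))  # cuda: the bug this review caught
```

With the original `and`, a CPU-only machine silently keeps `device_type = "cuda"`; the suggested `not ... or ...` form handles both the no-CUDA and the too-few-GPUs cases.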
this might also fix the current CI issues, let's see
Found by yifuwang: it looks like we are wrongly using `self.device_type = "cuda"` for the gloo backend, which is triggering some flakiness, e.g. #125366.
cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k
[ghstack-poisoned]
LG if CI passes!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#125798
Approved by: https://github.com/yifuwang