-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make object-related collective functions accept a device parameter #98130
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98130
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 83ee045: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
@H-Huang could you take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure why not
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
The merge job was canceled. If you believe this is a mistake,then you can re trigger it through pytorch-bot. |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase and merge by leaving the following comment on this PR: Details for Dev Infra teamRaised by workflow job |
@pytorchbot merge -r |
@pytorchbot successfully started a rebase job. Check the current status here |
accept a device parameter.
Successfully rebased |
64ea3ef
to
83ee045
Compare
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
The merge job was canceled. If you believe this is a mistake,then you can re trigger it through pytorch-bot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting a temporary hold here as we were discussing in #97938 (comment) that we may want to move all **_object
collectives to cpu
only. (In that case, custom backend don't need to implement them.)
FYI of similar try: |
Make object-related collective functions accept a device parameter. Then those functions can support custom device to fix #97938
The next step will be to extend the _get_pg_device for supporting the custom device and backend.