Make object-related collective functions accept a device parameter #98130

shaoyf42 · 2023-04-01T06:53:13Z

Make the object-related collective functions accept a device parameter so that they can support custom devices. This fixes #97938.
The next step will be to extend _get_pg_device to support custom devices and backends.
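To make the idea in the description concrete, here is a minimal single-process sketch of an object collective that takes the target device from the caller instead of inferring it from the backend. It is not the PR's code: `FakeTensor`, `object_to_tensor`, `tensor_to_object`, and `broadcast_object` are hypothetical names, and `FakeTensor` only stands in for a real byte tensor so the example runs without PyTorch.

```python
import pickle
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a byte tensor: a payload plus its device."""
    data: bytes
    device: str


def object_to_tensor(obj, device="cpu"):
    # Pickle the object and "place" the bytes on the caller-chosen device.
    return FakeTensor(data=pickle.dumps(obj), device=device)


def tensor_to_object(tensor):
    # Reverse step: unpickle the payload back into a Python object.
    return pickle.loads(tensor.data)


def broadcast_object(obj, device="cpu"):
    """Single-process mock of an object broadcast with an explicit device.

    A real collective would exchange the payload between ranks; here the
    "received" tensor is just a copy of the one that was "sent".
    """
    sent = object_to_tensor(obj, device)
    received = FakeTensor(data=sent.data, device=sent.device)
    return tensor_to_object(received), received.device


if __name__ == "__main__":
    # "custom:0" is an illustrative device string, not a real device name.
    print(broadcast_object({"step": 3}, device="custom:0"))
```

The point of the design is that the device is an argument with a CPU default, so a custom backend only needs to move byte buffers on a device the caller names.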

pytorch-bot · 2023-04-01T06:53:16Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98130

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 83ee045:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

shaoyf42 · 2023-04-04T12:42:11Z

@H-Huang could you take a look

ezyang

Sure why not

ezyang · 2023-04-05T00:56:06Z

@pytorchbot merge

pytorchmergebot · 2023-04-05T00:58:00Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2023-04-05T06:56:29Z

The merge job was canceled. If you believe this is a mistake, then you can retrigger it through pytorch-bot.

ezyang · 2023-04-05T12:12:53Z

@pytorchbot merge

pytorchmergebot · 2023-04-05T12:14:48Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2023-04-05T12:14:49Z

Merge failed

Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase and merge by leaving the following comment on this PR:
@pytorchbot merge -r
Or just rebase by leaving @pytorchbot rebase comment

Details for Dev Infra team

Raised by workflow job

ezyang · 2023-04-05T14:46:12Z

@pytorchbot merge -r

pytorchmergebot · 2023-04-05T14:49:24Z

@pytorchbot successfully started a rebase job. Check the current status here

pytorchmergebot · 2023-04-05T14:49:29Z

Successfully rebased dev_c10d onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout dev_c10d && git pull --rebase)

pytorchmergebot · 2023-04-05T14:50:38Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2023-04-05T20:47:41Z

The merge job was canceled. If you believe this is a mistake, then you can retrigger it through pytorch-bot.

kwen2501

Putting a temporary hold here, as we were discussing in #97938 (comment) that we may want to move all *_object collectives to CPU only. (In that case, custom backends don't need to implement them.)

kwen2501 · 2023-04-28T18:06:53Z

FYI, a similar attempt:
Figure out device to use for object collectives #100238

shaoyf42 · 2023-05-11T12:38:22Z

Closing as the rebased version #100954 has been merged to main.
A thousand thanks to @huihoaan and @kwen2501!

shaoyf42 requested review from mrshenli, zhaojuanmao, rohan-varma, H-Huang, awgu, kwen2501, wanchaol, fegin, kiukchung and d4l3k as code owners April 1, 2023 06:53

pytorch-bot bot added the release notes: distributed (c10d) release notes category label Apr 1, 2023

pytorchbot added the open source label Apr 1, 2023

ezyang approved these changes Apr 5, 2023

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 5, 2023

Commit 83ee045: Make object-related collective functions accept a device parameter.

pytorchmergebot force-pushed the dev_c10d branch from 64ea3ef to 83ee045 Compare April 5, 2023 14:49

kwen2501 requested changes Apr 5, 2023

zou3519 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Apr 10, 2023

shaoyf42 closed this May 11, 2023
