Conversation

@yhcharles (Contributor)

Summary:
Move a bunch of c10d globals into instance members and replace all uses of them.

We move all PG (ProcessGroup) related globals under a World class and use a singleton instance stored in _world.
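In rough terms, the refactor looks like the following minimal sketch. The attribute and property names here are illustrative assumptions; the real class in torch.distributed.distributed_c10d carries more state than this:

```python
# Minimal sketch of the World refactor (names are assumptions, not the
# exact fields of the real class in torch.distributed.distributed_c10d).
class _World:
    """Holds the ProcessGroup state that used to live in module globals."""

    def __init__(self):
        self._default_pg = None  # the default ProcessGroup, once initialized
        self._pg_map = {}        # ProcessGroup -> (backend name, store)
        self._pg_names = {}      # ProcessGroup -> group name

    @property
    def default_pg(self):
        return self._default_pg

    @default_pg.setter
    def default_pg(self, value):
        self._default_pg = value


# Module-level singleton: c10d functions now read state through _world
# instead of touching bare module globals directly.
_world = _World()


def _get_default_group():
    # Helpers consult the singleton, so swapping _world swaps all state.
    if _world.default_pg is None:
        raise RuntimeError("Default process group has not been initialized")
    return _world.default_pg
```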

This creates an undocumented extension point for injecting full control over how c10d state behaves.

One simple hack is to swap _world for an implementation that uses a threadlocal, enabling per-thread PGs (sketched below).
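Building on the sketch above, the threadlocal hack could look roughly like this. This is a hypothetical subclass for illustration, not code from this PR:

```python
import threading

# Hypothetical sketch: back the World state with a threading.local so each
# thread sees its own process groups.
class _ThreadLocalWorld(_World):
    _tls = threading.local()

    @property
    def default_pg(self):
        # Each thread gets an independent default PG (None until set).
        return getattr(self._tls, "default_pg", None)

    @default_pg.setter
    def default_pg(self, value):
        self._tls.default_pg = value


# Swapping the singleton switches all of c10d to per-thread state, e.g.:
# torch.distributed.distributed_c10d._world = _ThreadLocalWorld()
```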

The hack almost gets DDP working; the per-thread PG is still missing an implementation of all_reduce.

This enables notebook usage of PTD (PyTorch Distributed), which is a big deal for learning it:
https://gist.github.com/kumpera/32cb051fa26b8cad8bdf671f968dcd68

This change ensures BC (backward compatibility) by keeping the global variables around and having the default _World wrap them.
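Concretely, the BC shim can be pictured as a variant of the earlier sketch in which the default _World delegates to the legacy module globals instead of owning private state. Again a simplified, illustrative sketch:

```python
# Simplified sketch of the BC shim. The legacy module-level global stays
# importable, and the default _World's property reads it, so old code that
# pokes at the global directly keeps working.
_pg_map = {}  # legacy global kept for backward compatibility


class _World:
    @property
    def pg_map(self):
        # Delegate to the legacy global; mutations happen in place, so both
        # old and new access paths observe the same state.
        return _pg_map
```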

X-link: pytorch/pytorch#86348

Differential Revision: D40236769

Pulled By: yhcharles

@facebook-github-bot added the CLA Signed and fb-exported labels on Nov 3, 2022
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40236769

Summary:
X-link: pytorch/pytorch#88471

Pull Request resolved: meta-pytorch#781

I have relinked this diff to a new GitHub PR so that I can update it. The original PR is
> Pull Request resolved: pytorch/pytorch#86348

Reviewed By: gnadathur

Differential Revision: D40236769

Pulled By: yhcharles

fbshipit-source-id: ebd6080e4923da549800a048f089fa0bb69eb331
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40236769

yhcharles pushed a commit to yhcharles/pytorch that referenced this pull request Nov 4, 2022
Summary:
Pull Request resolved: pytorch#88471

X-link: meta-pytorch/torchrec#781

Reviewed By: gnadathur

Differential Revision: D40236769

Pulled By: yhcharles

fbshipit-source-id: efeaa7990e26a58987769a93cedf7318d5cae445
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Nov 7, 2022
Summary:
X-link: meta-pytorch/torchrec#781

Differential Revision: D40236769

Pulled By: yhcharles

Pull Request resolved: #88471
Approved by: https://github.com/gnadathur, https://github.com/rohan-varma
lequytra pushed a commit to lequytra/torchrec that referenced this pull request Dec 6, 2022
Summary:
X-link: pytorch/pytorch#88471

Pull Request resolved: meta-pytorch#781

Reviewed By: gnadathur

Differential Revision: D40236769

Pulled By: yhcharles

fbshipit-source-id: c6aecff5b0801938713f867827d0d3b4b5c906e6
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Summary:
X-link: meta-pytorch/torchrec#781

Differential Revision: D40236769

Pulled By: yhcharles

Pull Request resolved: pytorch#88471
Approved by: https://github.com/gnadathur, https://github.com/rohan-varma