Conversation

@yhcharles (Contributor)

Summary:
Move a bunch of c10d globals into instance members and replace all uses of them.

We move all PG (ProcessGroup) related globals under a World class and use a singleton instance stored in _world.
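In rough terms, the refactor looks like the following minimal sketch. The attribute and property names here are illustrative assumptions; the real class in torch.distributed.distributed_c10d carries more state than this:

```python
# Minimal sketch of the World refactor (names are assumptions, not the
# exact fields of the real class in torch.distributed.distributed_c10d).
class _World:
    """Holds the ProcessGroup state that used to live in module globals."""

    def __init__(self):
        self._default_pg = None  # the default ProcessGroup, once initialized
        self._pg_map = {}        # ProcessGroup -> (backend name, store)
        self._pg_names = {}      # ProcessGroup -> group name

    @property
    def default_pg(self):
        return self._default_pg

    @default_pg.setter
    def default_pg(self, value):
        self._default_pg = value


# Module-level singleton: c10d functions now read state through _world
# instead of touching bare module globals directly.
_world = _World()


def _get_default_group():
    # Helpers consult the singleton, so swapping _world swaps all state.
    if _world.default_pg is None:
        raise RuntimeError("Default process group has not been initialized")
    return _world.default_pg
```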

This creates an undocumented extension point for injecting full control over how c10d state behaves.

One simple hack is to swap _world for an implementation that uses a threadlocal, enabling per-thread PGs (sketched below).
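Building on the sketch above, the threadlocal hack could look roughly like this. This is a hypothetical subclass for illustration, not code from this PR:

```python
import threading

# Hypothetical sketch: back the World state with a threading.local so each
# thread sees its own process groups.
class _ThreadLocalWorld(_World):
    _tls = threading.local()

    @property
    def default_pg(self):
        # Each thread gets an independent default PG (None until set).
        return getattr(self._tls, "default_pg", None)

    @default_pg.setter
    def default_pg(self, value):
        self._tls.default_pg = value


# Swapping the singleton switches all of c10d to per-thread state, e.g.:
# torch.distributed.distributed_c10d._world = _ThreadLocalWorld()
```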

The hack almost gets DDP working; the per-thread PG is still missing an implementation of all_reduce.

This enables notebook usage of PTD (PyTorch Distributed), which is a big deal for learning it:
https://gist.github.com/kumpera/32cb051fa26b8cad8bdf671f968dcd68

This change ensures BC (backward compatibility) by keeping the global variables around and having the default _World wrap them.
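Concretely, the BC shim can be pictured as a variant of the earlier sketch in which the default _World delegates to the legacy module globals instead of owning private state. Again a simplified, illustrative sketch:

```python
# Simplified sketch of the BC shim. The legacy module-level global stays
# importable, and the default _World's property reads it, so old code that
# pokes at the global directly keeps working.
_pg_map = {}  # legacy global kept for backward compatibility


class _World:
    @property
    def pg_map(self):
        # Delegate to the legacy global; mutations happen in place, so both
        # old and new access paths observe the same state.
        return _pg_map
```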

X-link: pytorch/pytorch#86348

Differential Revision: D40236769

Pulled By: yhcharles

@facebook-github-bot added the CLA Signed and fb-exported labels on Nov 3, 2022
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40236769

Summary:
X-link: pytorch/pytorch#88471

Pull Request resolved: meta-pytorch#781

I have relinked this diff to a new GitHub PR so that I can update it. The original PR is
> Pull Request resolved: pytorch/pytorch#86348

Reviewed By: gnadathur

Differential Revision: D40236769

Pulled By: yhcharles

fbshipit-source-id: ebd6080e4923da549800a048f089fa0bb69eb331
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40236769

yhcharles pushed a commit to yhcharles/pytorch that referenced this pull request Nov 4, 2022
Summary:
Pull Request resolved: pytorch#88471

X-link: meta-pytorch/torchrec#781

Reviewed By: gnadathur

Differential Revision: D40236769

Pulled By: yhcharles

fbshipit-source-id: efeaa7990e26a58987769a93cedf7318d5cae445
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Nov 7, 2022
Summary:
X-link: meta-pytorch/torchrec#781

Differential Revision: D40236769

Pulled By: yhcharles

Pull Request resolved: #88471
Approved by: https://github.com/gnadathur, https://github.com/rohan-varma
lequytra pushed a commit to lequytra/torchrec that referenced this pull request Dec 6, 2022
Summary:
X-link: pytorch/pytorch#88471

Pull Request resolved: meta-pytorch#781

Reviewed By: gnadathur

Differential Revision: D40236769

Pulled By: yhcharles

fbshipit-source-id: c6aecff5b0801938713f867827d0d3b4b5c906e6
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Summary:
X-link: meta-pytorch/torchrec#781

Differential Revision: D40236769

Pulled By: yhcharles

Pull Request resolved: pytorch#88471
Approved by: https://github.com/gnadathur, https://github.com/rohan-varma