Add basic GPU support to distributed autograd. #40312

Closed

Conversation

@pritamdamania87 (Contributor) commented Jun 19, 2020

Stack from ghstack:

As part of #40255, we realized that GPU support for distributed autograd was broken by our multithreaded autograd change.

To fix this in the short term for 1.6, this PR includes the following changes (a minimal sketch of the resulting threading pattern follows the list):

1. Add a long-lived CPU thread to DistEngine to execute GPU->CPU continuations in the autograd graph.
2. The long-lived CPU thread has its own ready_queue, and this queue is used for all GraphTasks created by DistEngine.
3. Because of the new CPU thread added in 1), thread_main() can no longer assume a thread should exit once its GraphTask is done processing.
4. To resolve this, thread_main() now takes a `device_thread` parameter instead of `reentrant_thread`. When `device_thread` is true, we expect this to be a long-lived device thread that does not exit.
5. When `device_thread` is false, thread_main() is expected to run a GraphTask and return once done.
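Here is a minimal, hedged sketch of the pattern described in 1) and 2): a long-lived CPU thread draining a dedicated ready queue. The `NodeTask`, `ReadyQueue`, and `DistEngineSketch` names are simplified stand-ins for illustration, not the actual PyTorch types.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Simplified stand-ins; the real engine's NodeTask and ReadyQueue carry
// much more state (GraphTask ownership, shutdown sentinels, priorities).
using NodeTask = std::function<void()>;

class ReadyQueue {
 public:
  void push(NodeTask task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(task));
    }
    cv_.notify_one();
  }
  NodeTask pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    NodeTask task = std::move(queue_.front());
    queue_.pop_front();
    return task;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<NodeTask> queue_;
};

class DistEngineSketch {
 public:
  DistEngineSketch() {
    // Change 1): a long-lived CPU thread that never exits; it executes
    // GPU->CPU continuations as device threads push them onto the queue.
    // The real engine is a process-long singleton, so detaching a thread
    // that captures `this` is not a lifetime concern there.
    std::thread([this] {
      while (true) {
        cpu_ready_queue_.pop()();  // block for the next task, then run it
      }
    }).detach();
  }

  // Change 2): every GraphTask created by this engine shares this queue,
  // so all CPU continuations funnel through the single long-lived thread.
  ReadyQueue& cpuReadyQueue() { return cpu_ready_queue_; }

 private:
  ReadyQueue cpu_ready_queue_;
};
```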

Differential Revision: [D22146183](https://our.internmc.facebook.com/intern/diff/D22146183/)

@dr-ci bot commented Jun 19, 2020

💊 CI failures summary and remediations

As of commit be9f57d (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_backward_compatibility_check_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

```
Jun 23 02:19:21 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.
Jun 23 02:19:21 processing existing schema:  __str__(__torch__.torch.classes._TorchScriptTesting._StackString _0) -> (str _0)
Jun 23 02:19:21 processing existing schema:  __init__(__torch__.torch.classes._TorchScriptTesting._PickleTester _0, int[] _1) -> (None _0)
Jun 23 02:19:21 processing existing schema:  __getstate__(__torch__.torch.classes._TorchScriptTesting._PickleTester _0) -> (int[] _0)
Jun 23 02:19:21 processing existing schema:  __setstate__(__torch__.torch.classes._TorchScriptTesting._PickleTester _0, int[] _1) -> (None _0)
Jun 23 02:19:21 processing existing schema:  top(__torch__.torch.classes._TorchScriptTesting._PickleTester _0) -> (int _0)
Jun 23 02:19:21 processing existing schema:  pop(__torch__.torch.classes._TorchScriptTesting._PickleTester _0) -> (int _0)
Jun 23 02:19:21 processing existing schema:  get(__torch__.torch.classes._TorchScriptTesting._LiteInterpreterTest _0, Tensor _1) -> (str _0)
Jun 23 02:19:21 processing existing schema:  __getstate__(__torch__.torch.classes._TorchScriptTesting._LiteInterpreterTest _0) -> (int _0)
Jun 23 02:19:21 processing existing schema:  __setstate__(__torch__.torch.classes._TorchScriptTesting._LiteInterpreterTest _0, int _1) -> (None _0)
Jun 23 02:19:21 processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (None _0)
Jun 23 02:19:21 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.
Jun 23 02:19:21
Jun 23 02:19:21 Broken ops: [
Jun 23 02:19:21 	aten::mkldnn_max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[0, 0, 0], int[3] dilation=[1, 1, 1], bool ceil_mode=False) -> (Tensor)
Jun 23 02:19:21 	aten::mkldnn_reorder_conv3d_weight(Tensor self, int[3] padding=[0, 0, 0], int[3] stride=[1, 1, 1], int[3] dilation=[1, 1, 1], int groups=1) -> (Tensor)
Jun 23 02:19:21 ]
Jun 23 02:19:21 + cleanup
Jun 23 02:19:21 + retcode=1
Jun 23 02:19:21 + set +x
Jun 23 02:19:21 =================== sccache compilation log ===================
Jun 23 02:19:21 =========== If your build fails, please take a look at the log above for possible reasons ===========
```


@pritamdamania87 (Contributor, Author) commented:

Verified the RPC examples are working with this PR.

@wanchaol (Collaborator) left a comment:

Looks good, but the `device_thread` naming is confusing, since reentrant backwards could happen in the device threads as well. Maybe call it something like `spin` or `long_live`?

```diff
@@ -443,7 +432,7 @@ void Engine::reentrant_thread_init() {
   // set the local_ready_queue to the ready queue on the graph_task->owner_ device
   local_ready_queue = ready_queue_by_index(graph_task->cpu_ready_queue_, graph_task->owner_);
   total_depth = graph_task->reentrant_depth_;
-  thread_main(graph_task, /* reentrant thread*/ true);
+  thread_main(graph_task, /* device_thread */ false);
```
A collaborator commented on this change:
I'm a bit confused by the argument change here; device threads can also have reentrant backwards, right?

Another collaborator replied:
I would agree here. Since we have multiple use cases for this, it might be clearer to just describe what it does.
Or just remove this flag, which is completely redundant with the first argument :D
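To make the redundancy point concrete, here is a hedged sketch of the loop structure being debated. It is not the engine's actual implementation; the `completed()` helper and the call-site invariant (`device_thread` is true exactly when `graph_task` is null) are assumptions drawn from the discussion above.

```cpp
#include <atomic>

// Assumed stand-in: the real GraphTask tracks outstanding work with more
// elaborate bookkeeping than a single counter.
struct GraphTask {
  std::atomic<int> outstanding_tasks_{0};
  bool completed() const { return outstanding_tasks_.load() == 0; }
};

// Assumed call sites:
//   thread_main(nullptr, /*device_thread=*/true);     // long-lived thread
//   thread_main(graph_task, /*device_thread=*/false); // run one task, return
void thread_main(GraphTask* graph_task, bool device_thread) {
  // device_thread == true: loop forever, executing NodeTasks as they
  // arrive on this thread's local ready queue.
  // device_thread == false: execute NodeTasks only until the supplied
  // GraphTask completes, then return to the caller.
  while (device_thread || !graph_task->completed()) {
    // ... pop one NodeTask from the local ready queue and execute it
    //     (elided; only the loop/exit structure is at issue here) ...
  }
  // The reviewers' observation: at both call sites device_thread equals
  // (graph_task == nullptr), so the flag could be dropped and the loop
  // written as:
  //   while (graph_task == nullptr || !graph_task->completed()) { ... }
}
```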

pritamdamania87 pushed a commit that referenced this pull request Jun 20, 2020
Pull Request resolved: #40312

ghstack-source-id: 106287186

Differential Revision: [D22146183](https://our.internmc.facebook.com/intern/diff/D22146183/)
pritamdamania87 pushed a commit that referenced this pull request Jun 21, 2020
Pull Request resolved: #40312

ghstack-source-id: 106299861

Differential Revision: [D22146183](https://our.internmc.facebook.com/intern/diff/D22146183/)
@albanD (Collaborator) left a comment:

Good catch on the increment/decrement issues! Just a minor point about argument naming, but I think the overall logic is good.


pritamdamania87 pushed a commit that referenced this pull request Jun 22, 2020
Pull Request resolved: #40312

ghstack-source-id: 106352396

Differential Revision: [D22146183](https://our.internmc.facebook.com/intern/diff/D22146183/)
@albanD (Collaborator) left a comment:
Thanks for the update. LGTM

pritamdamania87 pushed a commit that referenced this pull request Jun 23, 2020
Pull Request resolved: #40312

ghstack-source-id: 106391329

Differential Revision: [D22146183](https://our.internmc.facebook.com/intern/diff/D22146183/)
@mrshenli (Contributor) commented:

Shall we land this today before branch cut?

@facebook-github-bot (Contributor) commented:

This pull request has been merged in 54c05fa.

@facebook-github-bot deleted the gh/pritamdamania87/144/head branch June 27, 2020 14:16