Skip to content

Defer pg agent listener thread until contexts are initialized #28013

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

mrshenli
Copy link
Contributor

@mrshenli mrshenli commented Oct 15, 2019

Stack from ghstack:

ProcessGroupAgent currently kicks off the listener thread in its
constructor. However, serving requests requires contexts to be
initialized, e.g., RRefContext and agent_ global var in api.py,
which might not be done yet when the first request arrives.
ProcessGroupAgent does not know what would be the appropriate time
to start the listener thread, hence exposing an API for higher
layer code to explicitly start listeners.

Differential Revision: D17932271

ProcessGroupAgent currently kicks off the listener thread in its
constructor. However, serving requests requires contexts to be
initialized, e.g., RRefContext and agent_ global var in api.py,
which might not be done yet when the first request arrives.
ProcessGroupAgent does not know what would be the appropriate time
to start the listener thread, hence exposing an API for higher
layer code to explicitly start listeners.

[ghstack-poisoned]
@mrshenli
Copy link
Contributor Author

@pytorchbot retest this please

@@ -103,7 +103,8 @@ PyObject* rpc_init(PyObject* /* unused */) {
&ProcessGroupAgent::sync,
py::call_guard<py::gil_scoped_release>());

module.def("_init_rpc_agent", [](std::shared_ptr<RpcAgent> agent) {
module.def("_start_rpc_agent", [](std::shared_ptr<RpcAgent> agent) {
agent->start();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we reverse this order?

@satgera
Copy link
Contributor

satgera commented Oct 15, 2019

LGTM !

…zed"

ProcessGroupAgent currently kicks off the listener thread in its
constructor. However, serving requests requires contexts to be
initialized, e.g., RRefContext and agent_ global var in api.py,
which might not be done yet when the first request arrives.
ProcessGroupAgent does not know what would be the appropriate time
to start the listener thread, hence exposing an API for higher
layer code to explicitly start listeners.

Differential Revision: [D17932271](https://our.internmc.facebook.com/intern/diff/D17932271)

[ghstack-poisoned]
Copy link
Contributor

@xush6528 xush6528 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good now!

…zed"

ProcessGroupAgent currently kicks off the listener thread in its
constructor. However, serving requests requires contexts to be
initialized, e.g., RRefContext and agent_ global var in api.py,
which might not be done yet when the first request arrives.
ProcessGroupAgent does not know what would be the appropriate time
to start the listener thread, hence exposing an API for higher
layer code to explicitly start listeners.

Differential Revision: [D17932271](https://our.internmc.facebook.com/intern/diff/D17932271)

[ghstack-poisoned]
mrshenli added a commit that referenced this pull request Oct 15, 2019
ProcessGroupAgent currently kicks off the listener thread in its
constructor. However, serving requests requires contexts to be
initialized, e.g., RRefContext and agent_ global var in api.py,
which might not be done yet when the first request arrives.
ProcessGroupAgent does not know what would be the appropriate time
to start the listener thread, hence exposing an API for higher
layer code to explicitly start listeners.

ghstack-source-id: 86f4c71
Pull Request resolved: #28013
@facebook-github-bot
Copy link
Contributor

@mrshenli merged this pull request in 59cd0fa.

@facebook-github-bot facebook-github-bot deleted the gh/mrshenli/24/head branch October 28, 2019 22:17
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
…h#28013)

Summary:
Pull Request resolved: pytorch#28013

ProcessGroupAgent currently kicks off the listener thread in its
constructor. However, serving requests requires contexts to be
initialized, e.g., RRefContext and agent_ global var in api.py,
which might not be done yet when the first request arrives.
ProcessGroupAgent does not know what would be the appropriate time
to start the listener thread, hence exposing an API for higher
layer code to explicitly start listeners.

Test Plan: Imported from OSS

Differential Revision: D17932271

Pulled By: mrshenli

fbshipit-source-id: 3b408477594d4d19319e7cd08dd6f383a7ed7670
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants