Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add python excepiton handling catch block to resolve deadlock #35283

Closed
wants to merge 1 commit into from

Conversation

xush6528
Copy link
Contributor

@xush6528 xush6528 commented Mar 24, 2020

Stack:
    :black_circle:  #35283 Add python excepiton handling catch block to resolve deadlock  💛

#34260

Deadlock on destructing py::error_already_set.

There are request callback impls in Python, where Python exceptions could be thrown. For releasing Python exception py::objects, GIL must be held.

Differential Revision: D7753253

Differential Revision: D7753253
Differential Version: 100740035
@dr-ci
Copy link

dr-ci bot commented Mar 24, 2020

💊 CircleCI build failures summary and remediations

As of commit f3f23bf (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no CircleCI failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

This comment has been revised 1 time.

Copy link
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing!

Do we know why rpc_test.py::test_py_function_exception didn't capture this?

@@ -477,6 +477,16 @@ std::shared_ptr<FutureMessage> RequestCallbackImpl::processMessage(
// processMessage().
ClearAutogradContextGuard guard;
processRpc(*rpc, messageType, id, retFuture);
} catch (py::error_already_set& e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only error type we gonna get from Python?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is pybind implementation. If C++ calls pybind, and a Python exception is thrown and not handled in Python, this is the only exception that will be thrown to C++ land.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the scope of things that can be called from procesRpc? We don't always call C++ through pybind in PyTorch.

@xush6528
Copy link
Contributor Author

xush6528 commented Mar 24, 2020

@mrshenli

rpc_test.py::test_py_function_exception only tests PythonRpcHandler::runPythonUdf(..), which handles Python exceptions in Python land.

while this time the Python exception is thrown from PythonRpcHandler::serialize(..).

So this PR is a catch-call fix for all possible pybind python functions called in RPC server-side request handlers.

@xush6528 xush6528 added the module: rpc Related to RPC, distributed autograd, RRef, and distributed optimizer label Mar 24, 2020
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 36e3c00.

mrshenli pushed a commit to mrshenli/pytorch that referenced this pull request Mar 25, 2020
…h#35283)

Summary:
Note: This PR has been merged into master after the 1.5.0 branch cut at
36e3c00 (see original PR: pytorch#35283). This PR is to cherry pick it into 1.5.

---- Original Commit Description Follows ---

Pull Request resolved: pytorch#35283

pytorch#34260

Deadlock on destructing py::error_already_set.

There are request callback impls in Python, where Python exceptions
could be thrown. For releasing Python exception py::objects, GIL must
be held.

Differential Revision: D7753253

fbshipit-source-id: 4bfaaaf027e4254f5e3fedaca80228c8b4282e39
seemethere pushed a commit that referenced this pull request Mar 26, 2020
#35402)

Summary:
Note: This PR has been merged into master after the 1.5.0 branch cut at
36e3c00 (see original PR: #35283). This PR is to cherry pick it into 1.5.

---- Original Commit Description Follows ---

Pull Request resolved: #35283

#34260

Deadlock on destructing py::error_already_set.

There are request callback impls in Python, where Python exceptions
could be thrown. For releasing Python exception py::objects, GIL must
be held.

Differential Revision: D7753253

fbshipit-source-id: 4bfaaaf027e4254f5e3fedaca80228c8b4282e39

Co-authored-by: Shihao Xu <shihaoxu@fb.com>
@facebook-github-bot facebook-github-bot deleted the export-D7753253 branch March 28, 2020 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merged module: rpc Related to RPC, distributed autograd, RRef, and distributed optimizer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants