Release GIL during RPC shutdown. #69586
In certain scenarios during shutdown, the following assert failed: https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/rpc/rpc_agent.cpp#L39. The cause was that `_reset_current_rpc_agent` did not release the GIL. This PR fixes the issue by releasing the GIL in that function. Differential Revision: [D32937687](https://our.internmc.facebook.com/intern/diff/D32937687/)
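The general pattern behind this kind of fix can be illustrated with a minimal, self-contained C++ sketch. It is not the PyTorch change itself: `fake_gil`, `ScopedRelease`, `FakeAgent`, and `reset_agent_releasing_lock` are hypothetical names, and a `std::mutex` stands in for the GIL. The sketch assumes a shutdown path that joins a worker thread which itself needs the lock, so the caller has to drop the lock around the blocking call. In pybind11 bindings the same idea is usually expressed with `py::gil_scoped_release` or `py::call_guard<py::gil_scoped_release>()`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Stand-in for the Python GIL (assumption: the real GIL is not a std::mutex).
std::mutex fake_gil;

// RAII guard in the spirit of pybind11's py::gil_scoped_release:
// releases the caller's lock on construction and re-acquires it on destruction.
class ScopedRelease {
 public:
  explicit ScopedRelease(std::unique_lock<std::mutex>& held) : held_(held) {
    held_.unlock();
  }
  ~ScopedRelease() { held_.lock(); }
  ScopedRelease(const ScopedRelease&) = delete;
  ScopedRelease& operator=(const ScopedRelease&) = delete;

 private:
  std::unique_lock<std::mutex>& held_;
};

// Hypothetical agent: its worker thread must take the lock before it can exit,
// for example to release an object owned by the interpreter.
class FakeAgent {
 public:
  FakeAgent()
      : worker_([this] {
          while (!stop_.load()) {
            std::this_thread::yield();
          }
          std::lock_guard<std::mutex> hold(fake_gil);
          cleaned_up_.store(true);
        }) {}

  ~FakeAgent() { shutdown(); }

  // Blocks until the worker exits. Calling this while holding fake_gil
  // would deadlock, because the worker waits for the same lock.
  void shutdown() {
    stop_.store(true);
    if (worker_.joinable()) {
      worker_.join();
    }
  }

  bool cleaned_up() const { return cleaned_up_.load(); }

 private:
  // Declared before worker_ so they are initialized before the thread starts.
  std::atomic<bool> stop_{false};
  std::atomic<bool> cleaned_up_{false};
  std::thread worker_;
};

// Simulates a binding that is entered with the lock held and releases it
// around the blocking shutdown call.
bool reset_agent_releasing_lock() {
  std::unique_lock<std::mutex> caller(fake_gil);  // caller enters holding the lock
  FakeAgent agent;
  {
    ScopedRelease release(caller);  // drop the lock for the blocking call
    agent.shutdown();
  }
  assert(caller.owns_lock());  // the lock is held again after the guard's scope
  return agent.cleaned_up();
}
```

Without the `ScopedRelease` scope, `agent.shutdown()` in this sketch would wait forever on a worker that can never acquire `fake_gil`. Whether the assert referenced above was triggered by exactly this kind of interaction is not stated in the PR description.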
Thanks for fixing!
Summary:
Pull Request resolved: #69586

In certain scenarios during shutdown the following assert failed: https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/rpc/rpc_agent.cpp#L39.
This was due to _reset_current_rpc_agent not releasing GIL.
Fixed this issue by releasing GIL.

ghstack-source-id: 145062265

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D32937687

fbshipit-source-id: 980adbcc1e3799b40206f7bca6e7695ca67f0fc2
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang