[RPC tests] Fix test_init_(rpc|pg)_then_(rpc|pg) not shutting down RPC #41558
Addresses this bug report: #41474

The problem was caused by the non-deterministic destruction order of two global static objects: the mutexes used by glog, and the RPC agent (which was still set because we didn't call `rpc.shutdown()`). When the TensorPipe RPC agent shuts down, some callbacks may fire with an error and thus attempt to log something. If the glog mutexes have already been destroyed, this causes a SIGABRT. This PR makes the affected tests call `rpc.shutdown()` so that the agent is torn down before the process exits.

Differential Revision: [D22582779](https://our.internmc.facebook.com/intern/diff/D22582779/)
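To make the ordering issue concrete, here is a minimal pure-Python model of it. This is only a sketch: `FakeLogSink`, `FakeAgent`, and `run_test` are hypothetical names invented for illustration, and none of this is the actual C++ agent or glog code. It assumes the unlucky case where the logging state is destroyed before the agent, and shows why an explicit shutdown inside the test avoids the crash.

```python
class FakeLogSink:
    """Hypothetical stand-in for glog's global state (its mutexes)."""

    def __init__(self):
        self.destroyed = False
        self.messages = []

    def log(self, msg):
        # Using the sink after destruction models locking a destroyed mutex.
        if self.destroyed:
            raise RuntimeError("log sink used after destruction")
        self.messages.append(msg)

    def destroy(self):
        self.destroyed = True


class FakeAgent:
    """Hypothetical stand-in for the global RPC agent."""

    def __init__(self, sink):
        self.sink = sink
        self.running = True

    def shutdown(self):
        # A second shutdown is a no-op, so exit-time cleanup has nothing to do.
        if not self.running:
            return
        self.running = False
        # Pending callbacks fire with an error and try to log it.
        self.sink.log("callback failed: agent is shutting down")


def run_test(explicit_shutdown):
    """Return "clean" or "crash" for one simulated test process."""
    sink = FakeLogSink()
    agent = FakeAgent(sink)
    try:
        pass  # the body of the test would go here
    finally:
        if explicit_shutdown:
            agent.shutdown()  # the sink is still alive, so logging is safe

    # Simulated process exit with the bad ordering: sink first, agent second.
    sink.destroy()
    try:
        agent.shutdown()
    except RuntimeError:
        return "crash"
    return "clean"


if __name__ == "__main__":
    print("with explicit shutdown:", run_test(True))
    print("without explicit shutdown:", run_test(False))
```

In this model the explicit call in the `finally` block plays the role of `rpc.shutdown()` in the real tests: the agent's error callbacks run while the logging state is guaranteed to exist, so the destruction order at exit no longer matters.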
Thanks for the catch!
Do we also need to add the shutdown call to other tests, e.g.,
Added it to
Actually,
This pull request has been merged in fced54a.