-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[TorchScript] thread-safe ErrorReport::CallStack #160386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TorchScript] thread-safe ErrorReport::CallStack #160386
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160386
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 4fc831a with merge base f33ce40 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
cc EikanWang jgong5 wenzhe-nrv sanchitintel [ghstack-poisoned]
@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
cc EikanWang jgong5 wenzhe-nrv sanchitintel Differential Revision: [D80054972](https://our.internmc.facebook.com/intern/diff/D80054972) [ghstack-poisoned]
cc EikanWang jgong5 wenzhe-nrv sanchitintel Differential Revision: [D80054972](https://our.internmc.facebook.com/intern/diff/D80054972) [ghstack-poisoned]
@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good!
Context: During jit.script, the TorchScript frontend maintains a callstack of Python frames, which is used to present the corresponding user code in case TorchScript errors. The callstack is maintained via ErrorReport::CallStack RAII guards. Before recursing into a function, an ErrorReport::CallStack guard is created and the CallStack guard pushes the frame information onto a thread_local callstack (a list of calls); and after exiting, the frame information is popped off the callstack. Note that the CallStack guards are also sometimes used in python via pybindings. The problem is that sometimes another thread can obtain a reference to the CallStack guard (if it's a Python CallStack guard). **This means that the destructor for a CallStack guard can be called from a different thread than the constructor was called**. When this happens, it causes a segfault. This PR makes the callstack vector thread-safe to access, and each CallStack guard will store a reference to the callstack vector onto which it pushed. When the CallStack guard is destructed, it pops off the appropriate callstack vector. Although this could potentially lead to mangled callstacks, it should prevent segfaults. Added a test `test_thread_safe_error_stacks` which segfaults prior to these changes, and no longer segfaults. cc EikanWang jgong5 wenzhe-nrv sanchitintel Differential Revision: [D80054972](https://our.internmc.facebook.com/intern/diff/D80054972) [ghstack-poisoned]
@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Context: During jit.script, the TorchScript frontend maintains a callstack of Python frames, which is used to present the corresponding user code in case TorchScript errors. The callstack is maintained via ErrorReport::CallStack RAII guards. Before recursing into a function, an ErrorReport::CallStack guard is created and the CallStack guard pushes the frame information onto a thread_local callstack (a list of calls); and after exiting, the frame information is popped off the callstack. Note that the CallStack guards are also sometimes used in python via pybindings. The problem is that sometimes another thread can obtain a reference to the CallStack guard (if it's a Python CallStack guard). **This means that the destructor for a CallStack guard can be called from a different thread than the constructor was called**. When this happens, it causes a segfault. This PR makes the callstack vector thread-safe to access, and each CallStack guard will store a reference to the callstack vector onto which it pushed. When the CallStack guard is destructed, it pops off the appropriate callstack vector. Although this could potentially lead to mangled callstacks, it should prevent segfaults. Added a test `test_thread_safe_error_stacks` which segfaults prior to these changes, and no longer segfaults. cc EikanWang jgong5 wenzhe-nrv sanchitintel Differential Revision: [D80054972](https://our.internmc.facebook.com/intern/diff/D80054972) [ghstack-poisoned]
@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Context: During jit.script, the TorchScript frontend maintains a callstack of Python frames, which is used to present the corresponding user code in case TorchScript errors. The callstack is maintained via ErrorReport::CallStack RAII guards. Before recursing into a function, an ErrorReport::CallStack guard is created and the CallStack guard pushes the frame information onto a thread_local callstack (a list of calls); and after exiting, the frame information is popped off the callstack. Note that the CallStack guards are also sometimes used in python via pybindings. The problem is that sometimes another thread can obtain a reference to the CallStack guard (if it's a Python CallStack guard). **This means that the destructor for a CallStack guard can be called from a different thread than the constructor was called**. When this happens, it causes a segfault. This PR makes the callstack vector thread-safe to access, and each CallStack guard will store a reference to the callstack vector onto which it pushed. When the CallStack guard is destructed, it pops off the appropriate callstack vector. Although this could potentially lead to mangled callstacks, it should prevent segfaults. Added a test `test_thread_safe_error_stacks` which segfaults prior to these changes, and no longer segfaults. Differential Revision: [D80054972](https://our.internmc.facebook.com/intern/diff/D80054972) Pull Request resolved: #160386 Approved by: https://github.com/eellison
Context: During jit.script, the TorchScript frontend maintains a callstack of Python frames, which is used to present the corresponding user code in case TorchScript errors. The callstack is maintained via ErrorReport::CallStack RAII guards. Before recursing into a function, an ErrorReport::CallStack guard is created and the CallStack guard pushes the frame information onto a thread_local callstack (a list of calls); and after exiting, the frame information is popped off the callstack. Note that the CallStack guards are also sometimes used in python via pybindings. The problem is that sometimes another thread can obtain a reference to the CallStack guard (if it's a Python CallStack guard). **This means that the destructor for a CallStack guard can be called from a different thread than the constructor was called**. When this happens, it causes a segfault. This PR makes the callstack vector thread-safe to access, and each CallStack guard will store a reference to the callstack vector onto which it pushed. When the CallStack guard is destructed, it pops off the appropriate callstack vector. Although this could potentially lead to mangled callstacks, it should prevent segfaults. Added a test `test_thread_safe_error_stacks` which segfaults prior to these changes, and no longer segfaults. Differential Revision: [D80054972](https://our.internmc.facebook.com/intern/diff/D80054972) Pull Request resolved: #160386 Approved by: https://github.com/eellison
Context: During jit.script, the TorchScript frontend maintains a callstack of Python frames, which is used to present the corresponding user code in case TorchScript errors. The callstack is maintained via ErrorReport::CallStack RAII guards. Before recursing into a function, an ErrorReport::CallStack guard is created and the CallStack guard pushes the frame information onto a thread_local callstack (a list of calls); and after exiting, the frame information is popped off the callstack. Note that the CallStack guards are also sometimes used in python via pybindings. The problem is that sometimes another thread can obtain a reference to the CallStack guard (if it's a Python CallStack guard). **This means that the destructor for a CallStack guard can be called from a different thread than the constructor was called**. When this happens, it causes a segfault. This PR makes the callstack vector thread-safe to access, and each CallStack guard will store a reference to the callstack vector onto which it pushed. When the CallStack guard is destructed, it pops off the appropriate callstack vector. Although this could potentially lead to mangled callstacks, it should prevent segfaults. Added a test `test_thread_safe_error_stacks` which segfaults prior to these changes, and no longer segfaults. Differential Revision: [D80054972](https://our.internmc.facebook.com/intern/diff/D80054972) Pull Request resolved: pytorch#160386 Approved by: https://github.com/eellison
Stack from ghstack (oldest at bottom):
Context: During jit.script, the TorchScript frontend maintains a callstack of Python frames, which is used to present the corresponding user code in case TorchScript errors. The callstack is maintained via ErrorReport::CallStack RAII guards. Before recursing into a function, an ErrorReport::CallStack guard is created and the CallStack guard pushes the frame information onto a thread_local callstack (a list of calls); and after exiting, the frame information is popped off the callstack. Note that the CallStack guards are also sometimes used in python via pybindings.
The problem is that sometimes another thread can obtain a reference to the CallStack guard (if it's a Python CallStack guard). This means that the destructor for a CallStack guard can be called from a different thread than the constructor was called. When this happens, it causes a segfault.
This PR makes the callstack vector thread-safe to access, and each CallStack guard will store a reference to the callstack vector onto which it pushed. When the CallStack guard is destructed, it pops off the appropriate callstack vector. Although this could potentially lead to mangled callstacks, it should prevent segfaults.
Added a test
test_thread_safe_error_stacks
which segfaults prior to these changes, and no longer segfaults.cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel
Differential Revision: D80054972