Better handling of Autograd+Fork errors. #33885
Conversation
LGTM
torch/csrc/autograd/engine.cpp
Outdated
void Engine::initialize_threads_pool() {
  track_bad_autograd_forks();
  TORCH_CHECK(!in_bad_autograd_fork,
      "Unable to handle autograd's threading in combination with fork. "
nit: "fork" -> "fork-based multiprocessing"
Maybe some people use the multiprocessing package without knowing what fork is?
💊 CircleCI build failures summary and remediations
As of commit a3c88af: none of the build failures appear to be your fault.
Detailed failure analysis: the probable reasons each build failed can be explored interactively on the Dr. CI website.
❄️ 1 failure recognized as flaky and may not be your fault: pytorch_linux_xenial_cuda10_1_cudnn7_py3_nogpu_test (1/1), Step: "Test" (full log | pattern match details)
Fixes: #32835
Fixes: #5834
This cannot be combined with CUDA's implementation, as each requires its own `std::once_flag` as well as a different `forked_autograd_child` function. The CUDA version relays to the Python module; autograd uses TORCH_CHECK to report the error to both Python and C++.
Differential Revision: [D20144024](https://our.internmc.facebook.com/intern/diff/D20144024)
ghstack-source-id: 5b62de9df7aa09124b075c81a21d0219da99f21f Pull Request resolved: #33885
@@ -821,15 +834,6 @@ def test_is_shared_cuda(self):
         t = torch.randn(5, 5).cuda()
         self.assertTrue(t.is_shared())

-    @unittest.skip('this test occasionally fails and deadlocks; see https://github.com/pytorch/pytorch/issues/5834')
Nice catch!
@VitalyFedyunin merged this pull request in 877ab3a.
Summary: Pull Request resolved: #33885
Fixes: #32835
Fixes: #5834
This cannot be combined with CUDA's implementation, as each requires its own `std::once_flag` as well as a different `forked_autograd_child` function. The CUDA version relays to the Python module; autograd uses TORCH_CHECK to report the error to both Python and C++.
Test Plan: Imported from OSS
Differential Revision: D20144024
Pulled By: VitalyFedyunin
fbshipit-source-id: e7cf30568fff5110e9df7fe5b23f18ed992fa17f
Stack from ghstack:

Fixes: #32835
Fixes: #5834

This cannot be combined with CUDA's implementation, as each requires its own `std::once_flag` as well as a different `forked_autograd_child` function. The CUDA version relays to the Python module; autograd uses TORCH_CHECK to report the error to both Python and C++.

Differential Revision: D20144024