Pytorch multiprocessing sync crash with python barrier. #29855
Comments
@VitalyFedyunin any update here? ;)
The call seems to be segfaulting when trying to acquire the lock from the barrier in the child process.
And the corresponding python stack trace:
btw this is a known upstream problem in CPython: python/cpython#77377

The simplest pure Python repro:

    import multiprocessing as mp
    import faulthandler

    # Just to print the segfault in the child
    faulthandler.enable()

    def _mp_fn(barrier):
        barrier.wait()

    if __name__ == '__main__':
        # The barrier comes from a "fork" context, the process from a "spawn" context
        barrier = mp.get_context("fork").Barrier(1)
        p = mp.get_context("spawn").Process(target=_mp_fn, args=(barrier,))
        p.start()
        p.join()
This is now fixed in CPython and will nicely raise an error on 3.11+ (once the bugfix patch is released).
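For anyone hitting this in the meantime: the repro above mixes a "fork" barrier with a "spawn" process, so one way to sidestep the crash is to create both from the same context. A minimal sketch, assuming that is acceptable for your setup; `_wait_on_barrier` and `run_with_matching_context` are hypothetical names used only for illustration, not anything from PyTorch or CPython:

```python
import multiprocessing as mp


def _wait_on_barrier(barrier):
    # With a single party, wait() returns as soon as this process arrives.
    barrier.wait()


def run_with_matching_context(method="spawn"):
    """Create the Barrier and the Process from the SAME context.

    Returns the child's exit code (0 means the child finished cleanly).
    """
    ctx = mp.get_context(method)
    barrier = ctx.Barrier(1)  # same context as the process below
    p = ctx.Process(target=_wait_on_barrier, args=(barrier,))
    p.start()
    p.join()
    return p.exitcode


if __name__ == "__main__":
    print("child exit code:", run_with_matching_context("spawn"))
```

The only change from the repro is that a single `ctx` object produces both the synchronization primitive and the process, so the two can no longer disagree on the start method.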
This issue was originally found in the pytorch/xla repo by @jysohn23 and @dlibenzi.