os.waitpid(pid) seems racy under libuv on linux #1104
Seen on Travis:

```python
import contextlib
import os


@contextlib.contextmanager
def simple_subprocess(testcase):
    pid = os.fork()
    if pid == 0:
        # Don't raise an exception; it would be caught by the test harness.
        os._exit(72)
    yield None
    pid2, status = os.waitpid(pid, 0)
    testcase.assertEqual(pid2, pid)
    testcase.assertEqual(72 << 8, status)
```
The pid we spawned doesn't match the pid we waited for, even though we explicitly passed exactly that pid to `os.waitpid`.
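For comparison, this is the behavior the test relies on, shown with only the standard library and no gevent monkey-patching (a sketch; `fork_and_wait` is a hypothetical helper name, not part of the test suite). On Linux, a blocking `os.waitpid(pid, 0)` returns exactly the pid that was requested, and for a child that called `os._exit(72)` the raw status is `72 << 8`, which `os.WIFEXITED` and `os.WEXITSTATUS` decode.

```python
import os


def fork_and_wait(exit_code=72):
    """Fork a child that exits immediately with exit_code, then reap exactly that pid."""
    pid = os.fork()
    if pid == 0:
        # Child: exit at once without running cleanup handlers or raising.
        os._exit(exit_code)
    # Parent: block until this specific child has exited and reap it.
    pid2, status = os.waitpid(pid, 0)
    return pid, pid2, status


if __name__ == "__main__":
    pid, pid2, status = fork_and_wait()
    # With the unpatched, blocking waitpid the returned pid always matches.
    assert pid2 == pid
    # On Linux a normal exit stores the exit code in bits 8-15 of the raw status.
    assert status == 72 << 8
    assert os.WIFEXITED(status) and os.WEXITSTATUS(status) == 72
    print(pid == pid2, status)  # prints "True 18432"
```

The assertions in `simple_subprocess` encode exactly these two guarantees, so a mismatched pid points at the cooperative `waitpid` replacement rather than at the test.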
I have been unable to reproduce this on Ubuntu 16.04 (?) with kernel 4.4.0-112 in a virtual machine with 2 CPUs and 4 GB of memory, running CPython 2.7.12. There, test_socketserver.py runs in about 0.7s, half the time it took in the failing run. If I halve the memory, cut the VM back to one processor, and throttle it so that test_socketserver takes 1.4s, I can reproduce it about one time in ten, so that's a start.
I've also been able to reproduce a hang waiting on a process.
The AssertionError above is just an implementation bug, but the hang is a genuine race condition: a new child watcher for the pid can be started at the same time the old child watcher for that pid is running, and the new child watcher is then never called. This can be fixed by controlling more carefully when child watchers run. In libev they are batched, whereas here they were called directly when the signal handler ran; we can defer them to a batch instead.
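A minimal sketch of the batching idea, in plain Python 3 rather than gevent's actual internals. `BatchedChildReaper`, `on_sigchld`, `start_watcher` and `run_batch` are hypothetical names, not gevent or libuv APIs. The assumptions are that the real SIGCHLD handler only calls `on_sigchld()`, and that a single-threaded event loop calls `run_batch()` once per iteration. Reaping with `os.waitpid(-1, os.WNOHANG)` is similar in spirit to libev's SIGCHLD handling; keeping statuses for children that no watcher has claimed yet is a choice made for this sketch.

```python
import os
from collections import defaultdict


class BatchedChildReaper:
    """Hypothetical sketch of batched child-watcher dispatch (not a gevent API).

    The signal handler only records that SIGCHLD arrived; reaping and
    callback dispatch happen later, at one well-defined point per
    event-loop iteration.
    """

    def __init__(self):
        self._pending_sigchld = False
        self._watchers = defaultdict(list)  # pid -> list of callback(pid, status)
        self._unclaimed = {}                # pid -> raw status not yet delivered

    def on_sigchld(self):
        # Meant to be called from the real SIGCHLD handler: only set a flag.
        self._pending_sigchld = True

    def start_watcher(self, pid, callback):
        # In a single-threaded loop this cannot interleave with dispatch,
        # because callbacks only run inside run_batch().
        self._watchers[pid].append(callback)

    def run_batch(self):
        """Meant to be called once per loop iteration, outside the signal handler."""
        if self._pending_sigchld:
            self._pending_sigchld = False
            while True:
                try:
                    pid, status = os.waitpid(-1, os.WNOHANG)
                except ChildProcessError:
                    break  # no children at all
                if pid == 0:
                    break  # children exist, but none has exited yet
                self._unclaimed[pid] = status
        # Deliver every reaped status that now has a watcher, including
        # watchers started after the child was reaped in an earlier batch.
        ready = [pid for pid in self._unclaimed if self._watchers.get(pid)]
        for pid in ready:
            status = self._unclaimed.pop(pid)
            for callback in self._watchers.pop(pid):
                callback(pid, status)
```

Because a reaped status is held until some watcher claims it, an ordering like the one described above (the child is reaped, then a new watcher is started for that pid) still delivers the status on the next `run_batch()` instead of leaving the new watcher waiting forever.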