-
-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regrtest: use process groups #82683
Comments
Problem. When regrtest is interrupted by CTRL+c and regrtest multiprocessing code (-jN command line option) is used, regrtest randomly fails to stop come TestWorkerProcess threads. The problem is that only the direct child process is kill by SIGKILL. If the test spawns a grandchild process, this one will not be killed. Moreover, the grandchild process inherits the child process stdout and stderr. But regrtest mltiprocessing uses pipes for the child process stdout and stderr. At the end, Popen.communicate() hangs randomly in the main regrtest process (in a TestWorkerProcess thread) until the child process *and* the grandchild process completes. Except that the main regrtest does not directly kill the grandchild process. => see bpo-38207 Solution. I propose to: (1) use Popen() with start_new_session=True: a worker process calls setsid() to create a new process group. Attached PR implements this solution. |
Example of the problem: sometimes, interrupting a multiprocessing test hangs. regrtest fails to interrupt a TestWorkerProcess thread. $ ./python -m test test_multiprocessing_fork -j10 -F --timeout=60 --slowest
0:00:00 load avg: 1.09 Run tests in parallel using 10 child processes
^C
Kill <TestWorkerProcess #1 running test=test_multiprocessing_fork pid=4504 time=2.3 sec>
Kill <TestWorkerProcess #2 running test=test_multiprocessing_fork pid=4503 time=2.3 sec>
Kill <TestWorkerProcess #3 running test=test_multiprocessing_fork pid=4506 time=2.3 sec>
Kill <TestWorkerProcess #4 running test=test_multiprocessing_fork pid=4505 time=2.3 sec>
Kill <TestWorkerProcess #5 running test=test_multiprocessing_fork pid=4510 time=2.3 sec>
Kill <TestWorkerProcess #6 running test=test_multiprocessing_fork pid=4509 time=2.3 sec>
Kill <TestWorkerProcess #7 running test=test_multiprocessing_fork pid=4513 time=2.3 sec>
Kill <TestWorkerProcess #8 running test=test_multiprocessing_fork pid=4514 time=2.3 sec>
Kill <TestWorkerProcess #9 running test=test_multiprocessing_fork pid=4516 time=2.3 sec>
Kill <TestWorkerProcess #10 running test=test_multiprocessing_fork pid=4517 time=2.3 sec>
0:00:03 load avg: 1.88 Waiting for <TestWorkerProcess #1 running test=test_multiprocessing_fork pid=4504 time=3.3 sec> thread for 1.0 sec
0:00:04 load avg: 1.88 Waiting for <TestWorkerProcess #1 running test=test_multiprocessing_fork pid=4504 time=4.3 sec> thread for 2.0 sec
0:00:05 load avg: 1.73 Waiting for <TestWorkerProcess #1 running test=test_multiprocessing_fork pid=4504 time=5.3 sec> thread for 3.0 sec
(...) With my PR, I cannot reproduce the "Waiting for ..." issue anymore. |
See also bpo-18969: "test suite: enable faulthandler timeout in assert_python". This issue proposes to use killpg() to send a signal to all Python processes spawned by regrtest: trigger faulthandler to dump the Python traceback of all Python threads. |
I will wait at least one day to see how buildbots like this change before backporting it to 3.7 and 3.8. |
I don't know for sure that this is the cause but both 3.x builds following this commit on my bolen-ubuntu worker (Ubuntu 18.04.3) have had test_pty crash in the first attempt, with the retry succeeding. For example https://buildbot.python.org/all/#/builders/141/builds/2679 In spot checking, the same behavior seems to be occurring on many other non-Windows builders as well, presumably those for which it gets used. There doesn't seem to be anything useful produced by the first crash. Just an entry for "test_pty crashed (Exit code -1)" in the test log. |
I can recreate this manually by running regrtest.py against test_pty. Crashes with any "-j#" option, but fine when run sequentially. Removing the process group change avoids the crash. With the process group change in place, the trigger point appears to be the final "os.close(master_fd)" in PtyTest.test_basic. Commenting that out avoids the crash. |
I created bpo-38547: "test_pty fails when using setsid()". |
The test_pty has been fixed. regrtest now calls setsid() when using multiprocessing (-jN), but only in the main branch. It's a nice feature, but I'm not comfortable about backporting it yet. Maybe if tests become too unstable on other branches, we may backport the feature to 3.7 and 3.8. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: