threading._shutdown() race condition: test_threading test_threads_join_2() fails randomly #80583
https://buildbot.python.org/all/#/builders/168/builds/801

0:23:17 load avg: 5.00 [334/420/1] test_threading crashed (Exit code -6) -- running: test_tools (8 min 42 sec), test_multiprocessing_spawn (5 min 41 sec), test_zipfile (30 sec 787 ms)
Current thread 0x0000000800acd000 (most recent call first):

The test crashed once, but then passed when run again in verbose mode ("Re-running test 'test_threading' in verbose mode"). |
As far as I remember, test_threads_join_2() was already unstable. I created this issue to track whether it's a regression or not. If it's a regression, I would suggest looking at Eric Snow's recent commits. At this point, I simply have no idea whether the test failed exactly once in the lifetime of the buildbot worker, or whether it started to fail frequently on this FreeBSD buildbot. |
New failure: AMD64 FreeBSD CURRENT Shared 3.x: ... Current thread 0x0000000800ac3000 (most recent call first): Stop. |
In the same build: 0:28:57 load avg: 12.93 [208/423/1] test_threading crashed (Exit code 1) -- running: test_lib2to3 (7 min 9 sec), test_multiprocessing_spawn (1 min 36 sec) |
I wrote PR 13889: with this change, I can easily reproduce the crash on Linux:

$ ./python -m test test_threading -m test_threads_join_2 -F
Run tests sequentially
0:00:00 load avg: 0.51 [ 1] test_threading
Fatal Python error: Py_EndInterpreter: not the last thread

Current thread 0x00007f84ad74d740 (most recent call first):

Py_EndInterpreter() calls wait_for_thread_shutdown() to wait until threading._shutdown() completes. When the assertion fails, threading.enumerate() only contains the main thread: the spawned thread is already gone. But the assertion fails, which means that the Python thread state of the thread (which looks to be completed) is still around.

This unit test comes from bpo-18808: commit 7b47699
|
Oh. Using PR 13889, I'm able to reproduce the bug as far back as Python 3.4. Example at commit e76cbc7 (tag: v3.4.10):

$ wget 'https://github.com/python/cpython/pull/13889.patch'
$ git apply 13889.patch
$ ./python -m test -F -m test_threads_join_2 test_threading
[ 1] test_threading
[ 2] test_threading
(...)
[ 10] test_threading
[ 11] test_threading
Fatal Python error: Py_EndInterpreter: not the last thread Current thread 0x00007f418b3280c0 (most recent call first): |
threading._shutdown() uses threading.enumerate(), which iterates over threading._active.

threading.Thread registers itself into threading._active in its _bootstrap_inner() method. It unregisters itself when _bootstrap_inner() completes, even though its is_alive() method still returns true at that point, since the underlying native thread is still running and the Python thread state still exists.

_thread._set_sentinel() creates a lock and registers a tstate->on_delete callback to release this lock. It's called by threading.Thread._set_tstate_lock() to set threading.Thread._tstate_lock. This lock is used by threading.Thread.join() to wait until the thread completes.

_thread.start_new_thread() calls the C function t_bootstrap(), which ends with:

tstate->interp->num_threads--;
PyThreadState_Clear(tstate);
PyThreadState_DeleteCurrent();
PyThread_exit_thread();

_PyThreadState_DeleteCurrent() calls tstate->on_delete(), which releases the threading.Thread._tstate_lock lock.

In the test_threads_join_2() test, PyThreadState_Clear() blocks while clearing thread variables: the destructor of the Sleeper instance sleeps.

The race condition is that:
|
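For illustration, the ordering described above can be reproduced with a small pure-Python model. This is only a sketch under stated assumptions: ToyThread, active, finish_cleanup and shutdown_by_enumerate() are hypothetical names invented for this example, not CPython internals. The dict stands in for threading._active, the lock stands in for _tstate_lock, and an event stands in for the slow Sleeper destructor so that the outcome is deterministic.

```python
import threading

# Hypothetical stand-in for threading._active (not the real registry).
active = {}
active_lock = threading.Lock()


class ToyThread:
    """Toy model of a thread whose state outlives its registry entry."""

    def __init__(self, name):
        self.name = name
        # Stand-in for Thread._tstate_lock: held until the simulated
        # thread state is deleted.
        self.tstate_lock = threading.Lock()
        self.tstate_lock.acquire()
        self.unregistered = threading.Event()
        # Stand-in for the slow destructor run during state clearing.
        self.finish_cleanup = threading.Event()
        self._native = threading.Thread(target=self._bootstrap)

    def start(self):
        with active_lock:
            active[self.name] = self
        self._native.start()

    def _bootstrap(self):
        # Step 1: the Python-level code is done, unregister from `active`.
        with active_lock:
            del active[self.name]
        self.unregistered.set()
        # Step 2: clearing the simulated thread state blocks for a while.
        self.finish_cleanup.wait()
        # Step 3: the on_delete-style callback releases the lock.
        self.tstate_lock.release()

    def join_native(self):
        self._native.join()


def shutdown_by_enumerate():
    """Wait only for threads still listed in `active`; return how many."""
    with active_lock:
        pending = list(active.values())
    for t in pending:
        t.tstate_lock.acquire()
        t.tstate_lock.release()
    return len(pending)


def demo():
    t = ToyThread("worker")
    t.start()
    t.unregistered.wait()               # thread is between step 1 and step 2
    waited_for = shutdown_by_enumerate()
    state_alive_after_shutdown = t.tstate_lock.locked()
    t.finish_cleanup.set()              # let the simulated cleanup finish
    t.join_native()
    state_alive_at_end = t.tstate_lock.locked()
    return waited_for, state_alive_after_shutdown, state_alive_at_end


if __name__ == "__main__":
    print(demo())
```

In this model, demo() returns (0, True, False): the enumerate-style wait finds nothing to join even though the simulated thread state is still held, which is the window in which the "not the last thread" check can fire.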
Other references to test_threads_join_2() failures:
See also bpo-18808: "Thread.join returns before PyThreadState is destroyed" (issue which added the test). |
test_threading: test_threads_join_2() was added by commit 7b47699 in 2013, but the test has failed randomly ever since it was added. It's just that failures were ignored until I created https://bugs.python.org/issue36402 last March. In fact, when the test failed randomly on a buildbot (with tests run in parallel), it was considered fine since test_threading was re-run alone and then the test passed. The buildbot build was seen overall as a success. Previous issues were closed (see my previous comment).

The test shows the bug using subinterpreters (Py_EndInterpreter), but the bug also exists in Py_Finalize(), which has the same race condition (it also calls threading._shutdown()). It's just that Py_EndInterpreter() is stricter; it contains this assertion:

if (tstate != interp->tstate_head || tstate->next != NULL)
    Py_FatalError("Py_EndInterpreter: not the last thread");

The attached py_finalize.patch adds the same assertion to Py_Finalize. I added test_threading.test_finalization_shutdown() to PR 13948. If you run test_finalization_shutdown() with py_finalize.patch, Py_Finalize() fails with a similar assertion error. But py_finalize.patch is incompatible with the principle of daemon threads and so cannot be committed. |
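The Py_EndInterpreter() assertion only inspects the shape of the interpreter's linked list of thread states: the current state must be the head and must have no successor. As a rough model of that check (ToyThreadState, ToyInterpreter and is_last_thread() are hypothetical names for this sketch, and pushing new states at the head of the list is an assumption of the model), it could be written as:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class ToyThreadState:
    """Toy stand-in for a thread state; only `next` matters here."""
    name: str
    next: Optional["ToyThreadState"] = None


@dataclass(eq=False)
class ToyInterpreter:
    """Toy stand-in for an interpreter owning a singly linked list."""
    tstate_head: Optional[ToyThreadState] = None

    def add(self, tstate: ToyThreadState) -> None:
        # Model assumption: new thread states are pushed at the head.
        tstate.next = self.tstate_head
        self.tstate_head = tstate

    def remove(self, tstate: ToyThreadState) -> None:
        prev = None
        cur = self.tstate_head
        while cur is not None:
            if cur is tstate:
                if prev is None:
                    self.tstate_head = cur.next
                else:
                    prev.next = cur.next
                cur.next = None
                return
            prev, cur = cur, cur.next
        raise ValueError("thread state not in list")


def is_last_thread(interp: ToyInterpreter, tstate: ToyThreadState) -> bool:
    # Mirrors the negation of:
    #   tstate != interp->tstate_head || tstate->next != NULL
    return tstate is interp.tstate_head and tstate.next is None


def demo():
    interp = ToyInterpreter()
    main = ToyThreadState("main")
    worker = ToyThreadState("worker")
    interp.add(main)
    interp.add(worker)
    before = is_last_thread(interp, main)   # worker state still linked
    interp.remove(worker)
    after = is_last_thread(interp, main)    # only main remains
    return before, after


if __name__ == "__main__":
    print(demo())
```

In this model the check fails for the main thread as long as any other thread state is still linked, regardless of whether that thread's Python code has already finished.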
The bpo-18808 "Thread.join returns before PyThreadState is destroyed" was not fixed in Python 2.7: threading.Thread has no _tstate_lock attribute. I'm not comfortable backporting the bpo-18808 "feature" or "bugfix" to Python 2.7, nor backporting this change. Python 2.7 works as it is, and it's going to reach its end of life at the end of the year. I guess that people have learnt how to work around Python 2.7 limitations like bpo-18808. |
Ok, the root issue (threading._shutdown() race condition) has been fixed in Python 3.7, 3.8 and master branches. I close the issue. Thanks for the reviews! |
Please note that this fix appears to be the cause of bpo-37788 |