_thread module: Remove redundant PyThread_exit_thread() call to avoid glibc fatal error: libgcc_s.so.1 must be installed for pthread_cancel to work #88600
The glibc pthread_exit() function loads an unwind function from libgcc_s.so.1 using dlopen(). dlopen() can fail to open the libgcc_s.so.1 file for various reasons, but the most likely seems to be that the process has run out of available file descriptors (EMFILE error). If the glibc pthread_exit() fails to open libgcc_s.so.1, it aborts the process.
Extract of pthread_cancel():
  /* Trigger an error if libgcc_s cannot be loaded. */
  {
    struct unwind_link *unwind_link = __libc_unwind_link_get ();
    if (unwind_link == NULL)
      __libc_fatal (LIBGCC_S_SO
                    " must be installed for pthread_cancel to work\n");
  }
Sometimes the libgcc_s.so.1 library is loaded early during Python startup. Sometimes it is only loaded when the first Python thread exits. In a real-world multithreaded application, whether dlopen() fails with EMFILE is not deterministic: it depends on precise timing and on the order in which threads run. It is unlikely in a small application, but more likely on a network server that has thousands of open sockets (file descriptors).
--
The attached scripts reproduce the issue. You may need to run them (especially pthread_cancel_emfile.py) multiple times to trigger it. Sometimes the libgcc_s library is loaded early for an unknown reason, which works around the issue.
(1) pthread_cancel_bug.py:
$ python3.10 pthread_cancel_bug.py
libgcc_s.so.1 must be installed for pthread_cancel to work
Abandon (core dumped)
(2) pthread_cancel_emfile.py:
$ python3.10 ~/pthread_cancel_emfile.py
spawn thread
os.open failed: OSError(24, 'Too many open files')
FDs open by the thread: 2 (max FD: 4)
fd 0 valid? True
fd 1 valid? True
fd 2 valid? True
fd 3 valid? True
fd 4 valid? True
libgcc_s.so.1 must be installed for pthread_cancel to work
Abandon (core dumped)
--
Example of a real-world issue on RHEL8:
The RHEL reproducer uses a very low RLIMIT_NOFILE (5 file descriptors) to trigger the bug faster. It simulates a busy server application.
--
There are different options:
(*) Modify thread_run() in Modules/_threadmodule.c to remove the redundant PyThread_exit_thread() call. This is the simplest option and it sounds perfectly safe to me. I'm not sure why PyThread_exit_thread() is called explicitly. We don't pass any parameter to the function.
(*) Link the Python _thread extension to libgcc_s.so if Python is built with the glibc. Checking whether Python is linked to the glibc is non-trivial, and we have to hardcode the "libgcc_s" library name. I expect a painful maintenance burden with this option.
(*) Explicitly load the libgcc_s.so library in _thread.start_new_thread() when the first thread is created. We need to detect at runtime that we are running the glibc, for example by calling confstr('CS_GNU_LIBC_VERSION'). The problem is that the "libgcc_s.so.1" filename may change depending on the Linux distribution. It will likely have a different filename on macOS (".dylib"). In short, it's tricky to get it right.
(*) Fix the glibc! I discussed this with glibc developers, who explained to me that there are good reasons to keep the unwind code in the compiler (GCC), and so to load it dynamically in the glibc. In short, this is not going to change.
--
The attached PR implements the most straightforward option: remove the redundant PyThread_exit_thread() call in thread_run().
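The effect of a very low RLIMIT_NOFILE can be imitated in a few lines of Python. This is only a sketch and not one of the attached scripts: the function name exhaust_fds and the limit of 64 are assumptions chosen for illustration. It temporarily lowers the soft limit, opens /dev/null until the kernel refuses, and reports the error, which is the same EMFILE condition that makes dlopen() fail in the scenario above.

```python
import errno
import os
import resource


def exhaust_fds(limit=64):
    """Lower the soft RLIMIT_NOFILE, open files until it fails, then restore.

    Returns (errno of the failure, number of descriptors opened).
    `limit` is an arbitrary illustrative value, not taken from the issue.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Only ever lower the soft limit; never try to raise it above its old value.
    if soft == resource.RLIM_INFINITY or soft > limit:
        new_soft = limit
    else:
        new_soft = soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    opened = []
    try:
        while True:
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return exc.errno, len(opened)
    finally:
        # Release the descriptors and put the original limit back.
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    err, count = exhaust_fds()
    print("os.open failed after", count, "opens:", errno.errorcode.get(err, err))
```

While the descriptors are exhausted, any library call that needs a new file descriptor in the same process fails the same way, which is why the timing of the first thread exit matters.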
See also bpo-18748 "io.IOBase destructor silence I/O error on close() by default", which was caused by a bug in an application: the application closed the libgcc_s file descriptor by mistake. It closed the same file descriptor twice, whereas the FD had been reused by dlopen() in the meantime. But the result was the same: the process aborted with this error message: "libgcc_s.so.1 must be installed for pthread_cancel to work"
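The double-close scenario can be sketched in Python. This is a hypothetical illustration, not code from bpo-18748: the second os.open() merely stands in for dlopen() opening libgcc_s, and the function name double_close_demo is an assumption. It relies on the POSIX rule that open() returns the lowest available descriptor number, so in a single-threaded process the number freed by the first close is handed out again.

```python
import os


def double_close_demo():
    """Show how closing the same fd number twice can close an unrelated file.

    Returns (fd number was reused, victim descriptor is still open).
    """
    fd = os.open(os.devnull, os.O_RDONLY)
    os.close(fd)  # first, legitimate close

    # Stand-in for dlopen() opening libgcc_s: it receives the freed number.
    victim = os.open(os.devnull, os.O_RDONLY)
    reused = victim == fd

    try:
        os.close(fd)  # buggy second close: may hit the victim's descriptor
    except OSError:
        pass  # the number was not reused, so the stale close fails harmlessly

    try:
        os.fstat(victim)
        still_open = True
    except OSError:
        still_open = False

    if still_open:
        os.close(victim)
    return reused, still_open


if __name__ == "__main__":
    print(double_close_demo())
```

In a multithreaded program another thread may grab the freed number first, so which file ends up closed by the stale close is not predictable.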
PyThread_exit_thread() was modified in 2011 to fix daemon threads: commit 0d5e52d
 PyThread_exit_thread(void)
 {
     dprintf(("PyThread_exit_thread called\n"));
-    if (!initialized) {
+    if (!initialized)
         exit(0);
-    }
+    pthread_exit(0);
 }
This change remains important for Python/ceval.c. When a daemon thread tries to acquire the GIL after Python has already exited, it calls PyThread_exit_thread() to exit the thread immediately. Example from take_gil():
    if (tstate_must_exit(tstate)) {
        /* bpo-39877: If Py_Finalize() has been called and tstate is not the
           thread which called Py_Finalize(), exit immediately the thread.
           This code path can be reached by a daemon thread after Py_Finalize()
           completes. In this case, tstate is a dangling pointer: points to
           PyThreadState freed memory. */
        PyThread_exit_thread();
    }
See also my articles on daemon thread fixes:
_thread.start_new_thread() has always called "exit thread", ever since the function was added to Python: commit 1984f1e
static void
t_bootstrap(args_raw)
    void *args_raw;
{
    object *args = (object *) args_raw;
    object *func, *arg, *res;
    restore_thread((void *)NULL);
    func = gettupleitem(args, 0);
    arg = gettupleitem(args, 1);
    res = call_object(func, arg);
    DECREF(arg); /* Matches the INCREF(arg) in thread_start_new_thread */
    if (res == NULL) {
        fprintf(stderr, "Unhandled exception in thread:\n");
        print_error(); /* From pythonmain.c */
        fprintf(stderr, "Exiting the entire program\n");
        goaway(1);
    }
    (void) save_thread();
    exit_thread();
}
exit_thread() was partially replaced with PyThread_exit_thread() in: commit bcc2074
Unix pthread_create() manual page: "The new thread terminates in one of the following ways: (...)"
Calling pthread_exit(0) is optional.
--
MSDN _beginthreadex() documentation: "When the thread returns from that routine, it is terminated automatically."
Calling _endthreadex(0) is optional.
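The same rule is visible from Python code. The sketch below is a Python-level analogue only and says nothing about the C code in thread_run(); the names worker and demo_thread_returns are assumptions made for illustration. The worker simply returns, with no explicit exit call, and the thread is reported as terminated once join() completes.

```python
import threading


def worker(results):
    # Plain return at the end: no _thread.exit() or other explicit exit call.
    results.append("worker returned")


def demo_thread_returns():
    """Start a thread whose function just returns and check that it ended.

    Returns (thread terminated, list filled in by the worker).
    """
    results = []
    thread = threading.Thread(target=worker, args=(results,))
    thread.start()
    thread.join(timeout=5)
    return (not thread.is_alive()), results


if __name__ == "__main__":
    print(demo_thread_returns())
```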
See also bpo-44436 "[Windows] _thread.start_new_thread() should close the thread handle".
Ok, the issue is now fixed in 3.9, 3.10 and main branches.
I marked bpo-42888 as a duplicate of this issue. I created PR 26943 based on Alexey's PR 24241 to complete my fix (remove two calls in two tests). Copy of his interesting PR commit message:
PyThread_exit_thread() uses pthread_exit() on POSIX systems. In glibc,
While providing libgcc_s.so is the responsibility of the user
The only exception are calls in take_gil() (Python/ceval_gil.h)
Of course, since PyThread_exit_thread() is a public API,
[1] https://sourceware.org/legacy-ml/libc-help/2014-07/msg00000.html
See also bpo-35866, which looks like a duplicate.
I marked bpo-37395 "Core interpreter should be linked with libgcc_s.so on Linux" as a duplicate of this issue.
On Linux, there is a workaround for Python versions that don't include this fix:
$ LD_PRELOAD=/usr/lib64/libgcc_s.so.1 python3 ...
This preloads the libgcc_s.so.1 library into the Python process when running Python.
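An in-process variant of the same idea is to load the library explicitly at startup, before file descriptors become scarce, after checking for glibc with confstr as discussed in the first comment. The sketch below is an assumption-laden illustration, not part of the fix: the helper names glibc_version and preload_libgcc_s are hypothetical, and the default "libgcc_s.so.1" name may differ between distributions.

```python
import ctypes
import os


def glibc_version():
    """Return the glibc version string (e.g. "2.31"), or None if not on glibc."""
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # os.confstr is missing, or the name is unknown on this platform.
        return None
    if not value:
        return None
    parts = value.split()
    if len(parts) == 2 and parts[0] == "glibc":
        return parts[1]
    return None


def preload_libgcc_s(name="libgcc_s.so.1"):
    """Try to load libgcc_s early; return True only if it was loaded.

    `name` is an assumption: the file name can vary between distributions.
    """
    if glibc_version() is None:
        return False
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("glibc:", glibc_version(), "preloaded:", preload_libgcc_s())
```

Calling preload_libgcc_s() once, early in the main thread, keeps the library mapped for the lifetime of the process, so a later pthread_exit() does not need a free file descriptor to find it.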
Good news: this change fixed bpo-35866 "concurrent.futures deadlock".
I started "Does anyone use threading debug PYTHONTHREADDEBUG=1 env var? Can I remove it?" thread on python-dev:
I created bpo-44584: "Deprecate thread debugging PYTHONTHREADDEBUG=1".