-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python 3.10 hang at exit in drop_gil() (due to resource warning at exit?) #91414
Comments
We had a python hang at shutdown. The gdb python backtrace and C backtraces are below. It is hung in the COND_WAIT(gil->switch_cond, gil->switch_mutex) call in drop_gil(). Py_FinalizeEx -> handle_system_exit() -> PyGC_Collect -> handle_weakrefs -> drop_gil I think from the stack trace it may have been printing the warning: sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/pokybuild/yocto-worker/oe-selftest-fedora/build/build-st-1560250/bitbake-cookerdaemon.log' mode='a+' encoding='UTF-8'> however I'm not sure if it was that or trying to show a different exception. Even if we have a resource leak, it shouldn't really hang! (gdb) py-bt
Traceback (most recent call first):
File "/usr/lib64/python3.10/weakref.py", line 106, in remove
def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):
Garbage-collecting #0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f0f7bd54b20 <_PyRuntime+512>) at futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7f0f7bd54b20 <_PyRuntime+512>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0,
private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2 0x00007f0f7b88979f in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f0f7bd54b20 <_PyRuntime+512>, expected=expected@entry=0, clockid=clockid@entry=0,
abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3 0x00007f0f7b88beb0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f0f7bd54b28 <_PyRuntime+520>, cond=0x7f0f7bd54af8 <_PyRuntime+472>) at pthread_cond_wait.c:504
#4 ___pthread_cond_wait (cond=cond@entry=0x7f0f7bd54af8 <_PyRuntime+472>, mutex=mutex@entry=0x7f0f7bd54b28 <_PyRuntime+520>) at pthread_cond_wait.c:619
#5 0x00007f0f7bb388d8 in drop_gil (ceval=0x7f0f7bd54a78 <_PyRuntime+344>, ceval2=<optimized out>, tstate=0x558744ef7c10)
at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval_gil.h:182
#6 0x00007f0f7bb223e8 in eval_frame_handle_pending (tstate=<optimized out>) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval.c:1185
#7 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval.c:1775
#8 0x00007f0f7bb19600 in _PyEval_EvalFrame (throwflag=0,
f=Frame 0x7f0f7a0c8a60, for file /usr/lib64/python3.10/weakref.py, line 106, in remove (wr=<KeyedRef at remote 0x7f0f7a2f5fd0>, selfref=<weakref at remote 0x7f0f7a2bb880>, _atomic_removal=<built-in method _remove_dead_weakref of module object at remote 0x7f0f7b67aac0>), tstate=0x558744ef7c10)
at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Include/internal/pycore_ceval.h:46
#9 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=1, kwnames=<optimized out>)
at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval.c:5065
#10 0x00007f0f7bb989a8 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x7fff8b815bc8, callable=<function at remote 0x7f0f7a2ab520>,
tstate=0x558744ef7c10) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Include/cpython/abstract.h:114
#11 PyObject_CallOneArg (func=<function at remote 0x7f0f7a2ab520>, arg=<optimized out>) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Include/cpython/abstract.h:184
#12 0x00007f0f7bb0fce1 in handle_weakrefs (old=0x558744edbd30, unreachable=0x7fff8b815c70) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:887
#13 gc_collect_main (tstate=0x558744ef7c10, generation=2, n_collected=0x7fff8b815d50, n_uncollectable=0x7fff8b815d48, nofail=0)
at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:1281
#14 0x00007f0f7bb9194e in gc_collect_with_callback (tstate=tstate@entry=0x558744ef7c10, generation=generation@entry=2)
at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:1413
#15 0x00007f0f7bbc827e in PyGC_Collect () at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:2099
#16 0x00007f0f7bbc7bc2 in Py_FinalizeEx () at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pylifecycle.c:1781
#17 0x00007f0f7bbc7d7c in Py_Exit (sts=0) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pylifecycle.c:2858
#18 0x00007f0f7bbc4fbb in handle_system_exit () at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:775
#19 0x00007f0f7bbc4f3d in _PyErr_PrintEx (set_sys_last_vars=1, tstate=0x558744ef7c10) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:785
#20 PyErr_PrintEx (set_sys_last_vars=1) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:880
#21 0x00007f0f7bbbcece in PyErr_Print () at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:886
#22 _PyRun_SimpleFileObject (fp=<optimized out>, filename=<optimized out>, closeit=1, flags=0x7fff8b815f18) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:462
#23 0x00007f0f7bbbcc57 in _PyRun_AnyFileObject (fp=0x558744ed9370, filename='/home/pokybuild/yocto-worker/oe-selftest-fedora/build/bitbake/bin/bitbake', closeit=1,
flags=0x7fff8b815f18) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:90
#24 0x00007f0f7bbb9e44 in pymain_run_file_obj (skip_source_first_line=0, filename='/home/pokybuild/yocto-worker/oe-selftest-fedora/build/bitbake/bin/bitbake', program_name='python3')
at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/main.c:353
#25 pymain_run_file (config=0x558744edbe30) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/main.c:372
#26 pymain_run_python (exitcode=0x7fff8b815f10) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/main.c:587
#27 Py_RunMain () at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/main.c:666
#28 0x00007f0f7bb8670d in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/main.c:720
#29 0x00007f0f7b82c560 in __libc_start_call_main (main=main@entry=0x558743bfd160 <main>, argc=argc@entry=5, argv=argv@entry=0x7fff8b816168)
at ../sysdeps/nptl/libc_start_call_main.h:58
#30 0x00007f0f7b82c60c in __libc_start_main_impl (main=0x558743bfd160 <main>, argc=5, argv=0x7fff8b816168, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fff8b816158) at ../csu/libc-start.c:409
#31 0x0000558743bfd095 in _start () |
Similar problems are found in Python 3.9.11. The stack is as follows;We have six threads, five of which are daemon threads, two of which periodically wait for the thread lock, and two of which periodically release the lock. When the lock is hanged, both the thread and the main thread that wait for the lock in the two periods are suspended.
|
@itachaaa Can you provide a minimal reproducer? Also can you try main branch? |
The difficulty is that it does not always occur. The code is abstracted as follows, but it cannot be reproduced.Does the main branch refer to the master branch? import gc
import socket
import sys
import threading
import time
class MyQueue(object):
def __init__(self):
self.event = threading.Event()
self.i = 0
def set_(self,):
self.event.set()
time.sleep(1)
gc.collect()
if self.i > 2:
sys.exit()
def wait(self, name):
self.event.wait()
print('waiting')
self.event.clear()
self.i += 1
with open('log.txt', 'a') as fp:
fp.write(str(self.i) + name)
fp.write('\n')
def start(self, daemon_, name):
threading.Thread(
target=loop_wrap(self.wait),
args=(name, ),
name="wait",
daemon=daemon_
).start()
threading.Thread(
target=loop_wrap(self.set_),
name="set",
daemon=True
).start()
def loop_wrap(func):
def wrapper(*args):
while True:
try:
func(*args)
except Exception:
time.sleep(1)
return wrapper
if __name__ == '__main__':
queue1 = MyQueue()
queue2 = MyQueue()
queue1.start(True, 'queue1')
queue2.start(True, 'queue2')
# time.sleep(1)
print('exit main') |
Yes main means master branch. |
Any news on this? I have the same problem but can't create an MWE. I have a periodic task that runs ~300 containers per day with the same code and randomly ~4 of them hang in this state |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: