Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in threaded application when using sys._current_frames #106883

Open
mmohrhard opened this issue Jul 19, 2023 · 6 comments
Open

Deadlock in threaded application when using sys._current_frames #106883

mmohrhard opened this issue Jul 19, 2023 · 6 comments
Labels
3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@mmohrhard
Copy link
Contributor

mmohrhard commented Jul 19, 2023

Bug report

When using sys._current_frames in a threaded application there is a high risk of a deadlock in the interpreter. Below is my analysis from one such hang. We have never seen this problem with Python 3.8 but experience it regularly with Python 3.11. I know that the sys._current_frames documentation warns about the function and one solution to this bug report could be to document that the function is not thread safe.

We have a huge number of threads in this application but I'll limit the output to the two threads that cause the deadlock.

We have a thread calling sys._current_frames with a Python stack trace:

Thread 118868 (idle): "Profile"
    _gc_callback (distributed/utils_perf.py:184)
    _watch (distributed/profile.py:261)
    run (threading.py:975)
    _bootstrap_inner (threading.py:1038)
    _bootstrap (threading.py:995)

and a corresponding C stack trace:

Thread 325 (Thread 0x7f8b992d6700 (LWP 118868) ""):
#0  0x00007f8ce83adde2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004eae32 in PyCOND_TIMEDWAIT (us=<optimized out>, mut=<optimized out>, cond=0x879128 <_PyRuntime+392>) at /usr/local/src/conda/python-3.11.4/Python/condvar.h:73
#2  take_gil (tstate=0x7f8b992d54e0) at /usr/local/src/conda/python-3.11.4/Python/ceval_gil.h:250
#3  0x00000000005134bc in eval_frame_handle_pending (tstate=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:1211
#4  _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:1754
#5  0x000000000055451f in _PyEval_EvalFrame (throwflag=0, frame=0x7f8bb645e248, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_ceval.h:73                                                                    #6  _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, func=0x7f8cdc1d2340, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:6439
#7  _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=<optimized out>, func=<function at remote 0x7f8cdc1d2340>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:393
#8  _PyObject_VectorcallTstate (tstate=0x5c60c20, callable=<function at remote 0x7f8cdc1d2340>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92#9  0x0000000000554172 in method_vectorcall (method=method@entry=<method at remote 0x7f8bce070c00>, args=args@entry=0x7f8b992d57d0, nargsf=<optimized out>, kwnames=kwnames@entry=0x0) at /usr/local/src/conda/python-3.11.4/Objects/classobject.c:89
#10 0x00000000004fe378 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f8b992d57d0, callable=<method at remote 0x7f8bce070c00>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92
#11 _PyObject_CallFunctionVa (is_size_t=0, va=0x7f8b992d57b0, format=<optimized out>, callable=<method at remote 0x7f8bce070c00>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Objects/call.c:536
#12 PyObject_CallFunction (callable=<method at remote 0x7f8bce070c00>, format=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:558
#13 0x00000000005c23d2 in invoke_gc_callback (phase=0x654129 "stop", generation=<optimized out>, collected=<optimized out>, uncollectable=<optimized out>, tstate=<optimized out>, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:1378
#14 0x00000000005c230a in gc_collect_with_callback (tstate=0x5c60c20, generation=0) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:1401
#15 0x000000000053b418 in _PyObject_GC_Link (op=set()) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:2270
#16 gc_alloc (presize=<optimized out>, basicsize=<optimized out>) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:2290
#17 _PyObject_GC_NewVar (nitems=<optimized out>, tp=0x869fe0 <PyFrame_Type>) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:2318
#18 _PyFrame_New_NoTrack (code=<optimized out>, code=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/frameobject.c:1024
#19 _PyFrame_MakeAndSetFrameObject (frame=0x7f8b90e95a68) at /usr/local/src/conda/python-3.11.4/Python/frame.c:33
#20 0x00000000004e19df in _PyFrame_GetFrameObject (frame=0x7f8b90e95a68) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_frame.h:171                                                                                            #21 _PyThread_CurrentFrames () at /usr/local/src/conda/python-3.11.4/Python/pystate.c:1420
#22 0x00000000005061c1 in cfunction_vectorcall_NOARGS (func=<built-in method _current_frames of module object at remote 0x7f8ce8746ca0>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Include/cpython/methodobject.h:52
#23 0x000000000051bde1 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<built-in method _current_frames of module object at remote 0x7f8ce8746ca0>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92
#24 PyObject_Vectorcall (callable=<built-in method _current_frames of module object at remote 0x7f8ce8746ca0>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:299
#25 0x000000000050ed83 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:4774
#26 0x0000000000534f13 in _PyEval_EvalFrame (throwflag=0, frame=0x7f8bb645e188, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_ceval.h:73
#27 _PyEval_Vector (kwnames=<optimized out>, argcount=0, args=0x7f8bce07e978, locals=0x0, func=<optimized out>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:6439 
#28 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7f8bce07e978, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:393
#29 0x000000000053ef0d in _PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=<function at remote 0x7f8cdc121800>, func=0x534da0 <_PyFunction_Vectorcall>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Objects/call.c:257
#30 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=<function at remote 0x7f8cdc121800>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Objects/call.c:328
#31 PyObject_Call (callable=<function at remote 0x7f8cdc121800>, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:355
#32 0x000000000051290a in do_call_core (use_tracing=<optimized out>, kwdict={'thread_id': 140243172636480, 'interval': '10ms', 'cycle': '1000ms', 'log': <collections.deque at remote 0x7f8bc008b1f0>, 'omit': ('profile.py', 'selectors.py'), 'stop': <function at remote 0x7f8bc0058220>}, callargs=(), func=<function at remote 0x7f8cdc121800>, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:7357
#33 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:5381
#34 0x000000000055451f in _PyEval_EvalFrame (throwflag=0, frame=0x7f8bb645e020, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_ceval.h:73
#35 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, func=0x7f8ce0e41b20, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:6439
#36 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=<optimized out>, func=<function at remote 0x7f8ce0e41b20>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:393
#37 _PyObject_VectorcallTstate (tstate=0x5c60c20, callable=<function at remote 0x7f8ce0e41b20>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92
#38 0x00000000005541f8 in method_vectorcall (method=<optimized out>, args=0x8875d0 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0) at /usr/local/src/conda/python-3.11.4/Objects/classobject.c:67 
#39 0x0000000000627f6f in thread_run (boot_raw=0x7f8cc9c987b0) at /usr/local/src/conda/python-3.11.4/Modules/_threadmodule.c:1092
#40 0x00000000005f63d4 in pythread_wrapper (arg=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:241
#41 0x00007f8ce83a9ea5 in start_thread () from /lib64/libpthread.so.0
#42 0x00007f8ce79c9b0d in clone () from /lib64/libc.so.6

So, this thread does not hold the GIL but seems to hold the runtime->interpreters.mutex

We have another thread that seems to hold the GIL:

Thread 388 (Thread 0x7f8b62f7f700 (LWP 120217) ""):
#0  0x00007f8ce83afb3b in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f8ce83afbcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00007f8ce83afc6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0                                                                                                                                                                #3  0x00000000004ea969 in PyThread_acquire_lock_timed (intr_flag=<optimized out>, microseconds=<optimized out>, lock=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:497
#4  PyThread_acquire_lock (lock=0x18bd040, waitflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:751
#5  0x00000000005f7266 in tstate_delete_common (tstate=0x7f8cc417c250, gilstate=0x8791d8 <_PyRuntime+568>) at /usr/local/src/conda/python-3.11.4/Python/pystate.c:1080
#6  0x00000000005f6d93 in _PyThreadState_DeleteCurrent (tstate=0x7f8cc417c250) at /usr/local/src/conda/python-3.11.4/Python/pystate.c:1132
#7  0x00000000005f63d4 in pythread_wrapper (arg=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:241
#8  0x00007f8ce83a9ea5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f8ce79c9b0d in clone () from /lib64/libc.so.6

See the native_thread_id in the following gdb output.

(gdb) p *PyThreadState_Get()
$10 = {prev = 0x7f8cc4269680, next = 0x7f8c94483340, interp = 0x8875d8 <_PyRuntime+58936>, _initialized = 1, _static = 0, recursion_remaining = 1000, recursion_limit = 1000, recursion_headroom = 0, tracing = 0, tracing_what = 0, 
  cframe = 0x7f8cc417c3a0, c_profilefunc = 0x0, c_tracefunc = 0x0, c_profileobj = 0x0, c_traceobj = 0x0, curexc_type = 0x0, curexc_value = 0x0, curexc_traceback = 0x0, exc_info = 0x7f8cc417c390, dict = 0x0, gilstate_counter = 1, 
  async_exc = 0x0, thread_id = 140236637599488, native_thread_id = 120217, trash_delete_nesting = 0, trash_delete_later = 0x0, on_delete = 0x628f10 <release_sentinel>, on_delete_data = 0x7f8b974b1ee0, 
  coroutine_origin_tracking_depth = 0, async_gen_firstiter = 0x0, async_gen_finalizer = 0x0, context = 0x0, context_ver = 1, id = 1085, trace_info = {code = 0x0, bounds = {ar_start = 0, ar_end = 0, ar_line = 0, opaque = {
        computed_line = 0, lo_next = 0x0, limit = 0x0}}}, datastack_chunk = 0x7f8b99522000, datastack_top = 0x7f8b99522020, datastack_limit = 0x7f8b99526000, exc_state = {exc_value = 0x0, previous_item = 0x0}, root_cframe = {
    use_tracing = 0 '\000', current_frame = 0x0, previous = 0x0}}

Your environment

Python 3.11.4 on a CentOS 7 machine with conda-forge binaries

Linked PRs

@mmohrhard mmohrhard added the type-bug An unexpected behavior, bug, or error label Jul 19, 2023
@sunmy2019
Copy link
Member

Oh, after several modifications to the interpreter, sys._current_frames() can trigger GC with the lock held.

Anything can happen during GC.

In this case, it releases GIL and allows other threads to get deadlocked. Also, GC itself can already trigger deadlocks.


"GC being triggered with a lock held" has caused serval problems. See the discussion in #106207

@yilei
Copy link
Contributor

yilei commented Dec 15, 2023

We are also seeing this issue at work. I found one reproducible example running Python 3.11.7 (see below), it's much more difficult to reproduce on higher spec machines.

On my 8-core cloud VM, I had to run 8 processes at the same time. After a few minutes, a few processes no longer produce output. Although they are still using some CPUs doing something.

Since GC is involved here, I haven't been able to reproduce it on Python 3.12.1, likely due to the same 3.12 change to only run GC on the eval breaker as mentioned by @pablogsal here: #106905 (comment)

import argparse
import datetime
import difflib
import gc
import random
import sys
import threading
import time


def do_some_work(iterations, sleep_before_work=False):
    if sleep_before_work:
        # Wait before starting the work, so more concurrent threads could be created
        time.sleep(1)
    for _ in range(iterations):
        seq_a = [str(random.randint(0, 10)) + "\n" for _ in range(100)]
        seq_b = [str(random.randint(0, 10)) + "\n" for _ in range(100)]
        udiff = list(difflib.unified_diff(seq_a, seq_b))
        udiff.append("THE END")
        fibonacci(16)


def fibonacci(n):
    if n <= 2:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


def gc_callback(phase, info):
    del phase, info
    do_some_work(1)


gc.callbacks.append(gc_callback)


class Runner:
    def __init__(self):
        self.watch = True

    def watch_frames(self):
        while self.watch:
            time.sleep(0.1)
            cf = sys._current_frames()
            print(
                f">>>> {datetime.datetime.now()} number of frames: {len(cf)}",
                flush=True,
            )

    def run(self, num_runs, num_threads, work_load):
        self.watch = True
        watcher = threading.Thread(target=self.watch_frames)
        watcher.start()
        for i in range(num_runs):
            start = time.time()
            print(
                f">>>> {datetime.datetime.now()} Run: {i}/{num_runs} started",
                flush=True,
            )

            ts = []
            for _ in range(num_threads):
                t = threading.Thread(target=do_some_work, args=(work_load, True))
                t.start()
                ts.append(t)

            for t in ts:
                t.join()

            print(
                f">>>> {datetime.datetime.now()} Run: {i}/{num_runs} took"
                f" {time.time() - start} seconds",
                flush=True,
            )

        self.watch = False
        watcher.join()


def main():
    print(f">>>> Running on {sys.version_info=}")
    print(f">>>> {datetime.datetime.now()} START")
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_runs", type=int, default=1000)
    parser.add_argument("--num_threads", type=int, default=512)
    parser.add_argument("--work_load", type=int, default=1)
    ns = parser.parse_args()
    runner = Runner()
    runner.run(ns.num_runs, ns.num_threads, ns.work_load)
    print(f">>>> {datetime.datetime.now()} DONE")


if __name__ == "__main__":
    main()

@gpshead gpshead added the 3.11 only security fixes label Dec 16, 2023
@yilei
Copy link
Contributor

yilei commented Dec 16, 2023

FWIW, backporting #97920 to 3.11 makes the above example no longer reproducible.

@colesbury
Copy link
Contributor

Should we temporarily disable the GC during _PyThread_CurrentFrames() and _PyThread_CurrentExceptions() in 3.11?

@diegorusso
Copy link
Contributor

I also confirm that back porting #97920 to 3.11 fixes the issue I have with Dask (#116969)

diegorusso added a commit to diegorusso/cpython that referenced this issue Mar 28, 2024
When using threaded applications, there is a high risk of a deadlock in
the intepreter.
It's a lock ordering deadlock with HEAD_LOCK(&_PyRuntime); and the GIL.

It has been suggested to disable GC during the
_PyThread_CurrentFrames() and _PyThread_CurrentExceptions() calls.

Jira: ENTLLT-7285

Change-Id: I2548d07803fc98db8717057ae3006e6af68b2f86
diegorusso added a commit to diegorusso/cpython that referenced this issue Mar 28, 2024
When using threaded applications, there is a high risk of a deadlock in
the intepreter.
It's a lock ordering deadlock with HEAD_LOCK(&_PyRuntime); and the GIL.

It has been suggested to disable GC during the
_PyThread_CurrentFrames() and _PyThread_CurrentExceptions() calls.

Jira: ENTLLT-7285

Change-Id: I2548d07803fc98db8717057ae3006e6af68b2f86
diegorusso added a commit to diegorusso/cpython that referenced this issue Mar 28, 2024
When using threaded applications, there is a high risk of a deadlock in
the intepreter.
It's a lock ordering deadlock with HEAD_LOCK(&_PyRuntime); and the GIL.

It has been suggested to disable GC during the
_PyThread_CurrentFrames() and _PyThread_CurrentExceptions() calls.
@diegorusso
Copy link
Contributor

@colesbury I've created a PR #117332 trying to fix this issue by implementing what you have suggested.
I tested the code and it is not deadlocking anymore.

Please let me know if this looks reasonable or if I am completely off-road :)

diegorusso added a commit to diegorusso/cpython that referenced this issue Apr 5, 2024
When using threaded applications, there is a high risk of a deadlock in
the intepreter.
It's a lock ordering deadlock with HEAD_LOCK(&_PyRuntime); and the GIL.

By disabling the GC during the _PyThread_CurrentFrames() and
_PyThread_CurrentExceptions() calls fixes the issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

7 participants