gh-149085: Add max_threads keyword to faulthandler.dump_traceback() #149106
efroemling wants to merge 3 commits into python:main
Conversation
…ck()

Add a keyword-only `max_threads` argument to `dump_traceback()` and `dump_traceback_later()`, defaulting to 100 to preserve existing behavior. This allows server processes with many worker threads to dump beyond the historical 100-thread cap (previously a hardcoded `MAX_NTHREADS = 100` in `Python/traceback.c`).

The cap matters in practice: tstates are prepended to the `PyInterpreterState` linked list, so the dump walks newest-first. With more than 100 threads alive, the main thread (oldest, at the tail) is silently elided from watchdog dumps -- exactly the thread that's usually wanted.

The hardcoded value is moved to a new internal macro `_Py_TRACEBACK_MAX_NTHREADS` in `pycore_traceback.h` so the in-tree fatal-signal callers all reference one source of truth.
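The prepend-then-cap failure mode can be modeled in a few lines of pure Python (a simplified sketch, not CPython's actual C implementation; `dump_order` is an illustrative name):

```python
# Sketch (assumption: simplified model of CPython's tstate list handling).
# New PyThreadState objects are prepended to the interpreter's linked list,
# so a capped dump emits newest-first and drops the oldest entries --
# including the main thread -- once the cap is hit.

MAX_NTHREADS = 100  # the historical hardcoded cap

def dump_order(threads_in_creation_order, max_threads=MAX_NTHREADS):
    """Return the thread names a capped dump would emit, newest-first."""
    linked_list = []
    for t in threads_in_creation_order:   # prepend, as tstate creation does
        linked_list.insert(0, t)
    return linked_list[:max_threads]      # the dump stops at the cap

threads = ["main"] + [f"worker-{i}" for i in range(150)]  # 151 threads alive
dumped = dump_order(threads)
assert "main" not in dumped                             # main thread silently elided
assert "main" in dump_order(threads, max_threads=200)   # a raised cap keeps it
```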
6 files changed
vstinner left a comment:
I would prefer to also add max_threads parameter to enable().
    if (all_threads == 1) {
        (void)_Py_DumpTracebackThreads(fd, NULL, tstate);
        /* Fatal-signal path has no caller-supplied cap; use the
           historical default. */
Can you modify faulthandler.enable() to add a max_threads parameter and store the value in fatal_error? See _faulthandler_runtime_state structure in Include/internal/pycore_faulthandler.h.
    #define MAX_NTHREADS 100
    /* The historical default thread-dump cap is declared as
       _Py_TRACEBACK_MAX_NTHREADS in pycore_traceback.h so callers of
       _Py_DumpTracebackThreads can reference it directly. */
I don't think that this comment is useful. I suggest removing it.
    * It is limited to 100 frames per thread, and by default to 100 threads
      total in newest-first order (configurable via *max_threads*).
Suggested change:

    - * It is limited to 100 frames per thread, and by default to 100 threads
    -   total in newest-first order (configurable via *max_threads*).
    + * It is limited to 100 frames per thread, and 100 threads
    +   (configurable via *max_threads*).
      Dump the tracebacks of all threads into *file*. If *all_threads* is
    - ``False``, dump only the current thread.
    + ``False``, dump only the current thread. *max_threads* caps the number
    + of threads dumped; a ``...`` marker is written if there are more.
> a ``...`` marker is written if there are more
That's an implementation detail, please don't document it (remove it from the doc). Same remark for all documentation of this PR.
      .. versionchanged:: 3.5
         Added support for passing file descriptor to this function.

    + .. versionchanged:: 3.15
Suggested change:

    - .. versionchanged:: 3.15
    + .. versionchanged:: next
    Add a *max_threads* keyword argument to :func:`faulthandler.dump_traceback`
    and :func:`faulthandler.dump_traceback_later`, raising the per-call cap on
    the number of threads dumped (previously a hard-coded ``MAX_NTHREADS = 100``
    in :file:`Python/traceback.c`). Useful for server processes with many
    worker or gRPC threads, where dump order (newest-thread-first) means the
    historical 100-thread cap silently elided the main thread. The default of
    ``100`` preserves existing behavior.
Suggested change:

    - Add a *max_threads* keyword argument to :func:`faulthandler.dump_traceback`
    - and :func:`faulthandler.dump_traceback_later`, raising the per-call cap on
    - the number of threads dumped (previously a hard-coded ``MAX_NTHREADS = 100``
    - in :file:`Python/traceback.c`). Useful for server processes with many
    - worker or gRPC threads, where dump order (newest-thread-first) means the
    - historical 100-thread cap silently elided the main thread. The default of
    - ``100`` preserves existing behavior.
    + Add a *max_threads* keyword argument to :func:`faulthandler.dump_traceback`
    + and :func:`faulthandler.dump_traceback_later`.
I don't think that it's needed to mention gRPC usecase here, having it described in the issue is enough. Also, it's not needed to mention that the default is 100 since it's not changed.
      _Py_DumpTracebackThreads(int fd, PyInterpreterState *interp,
    -                          PyThreadState *current_tstate)
    +                          PyThreadState *current_tstate,
    +                          unsigned int max_nthreads)
I suggest renaming the parameter to max_threads for consistency.
    output, _ = process.communicate()
    process.wait()
    # Truncation marker is written when the cap is hit.
    self.assertIn(b"...\n", output)
Check the whole line:
Suggested change:

    - self.assertIn(b"...\n", output)
    + self.assertIn(b"\n...\n", output)
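The difference between the loose and anchored checks can be seen with a small sketch (the sample output bytes below are assumed, modeled loosely on faulthandler's dump format):

```python
# Sketch: why the anchored check is stronger.
sample = (b"Thread 0x01 (most recent call first):\n"
          b'  File "<string>", line 3 in run\n'
          b"...\n")
assert b"...\n" in sample        # loose check passes
assert b"\n...\n" in sample      # anchored check also passes: "..." is its own line

noisy = b"retrying...\n"         # output with no truncation-marker line
assert b"...\n" in noisy         # loose check gives a false positive
assert b"\n...\n" not in noisy   # anchored check correctly rejects it
```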
    t.join()
    """).strip()
    # spawn_python merges stderr into stdout by default.
    with support.SuppressCrashReport():
This process is not supposed to fail. If it fails, I would prefer to use the default crash reporter (such as dumping a coredump file).
    with support.SuppressCrashReport():
        process = script_helper.spawn_python('-c', code)
        with process:
            output, _ = process.communicate()
            process.wait()
assert_python_ok() can be used instead (to make the code shorter):
Suggested change:

    - with support.SuppressCrashReport():
    -     process = script_helper.spawn_python('-c', code)
    -     with process:
    -         output, _ = process.communicate()
    -         process.wait()
    + proc = script_helper.assert_python_ok('-c', code)
    + output = proc.err
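For illustration, here is roughly what `assert_python_ok` boils down to (a simplified sketch using `subprocess`; the real helper in `test.support.script_helper` also isolates the environment and formats failures more helpfully):

```python
# Simplified stand-in for test.support.script_helper.assert_python_ok().
import subprocess
import sys

def assert_python_ok_sketch(*args):
    proc = subprocess.run([sys.executable, *args], capture_output=True)
    assert proc.returncode == 0, proc.stderr
    return proc

# faulthandler writes its dump to stderr, which is why the suggestion
# above reads the output from proc.err rather than stdout.
proc = assert_python_ok_sketch(
    "-c", "import faulthandler; faulthandler.dump_traceback()")
assert b"most recent call first" in proc.stderr
```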
- Drop `_Py_TRACEBACK_MAX_NTHREADS` macro; use 0 as a sentinel for the default of 100 inside `_Py_DumpTracebackThreads` so internal callers don't have to pass the default explicitly.
- Rename `max_nthreads` -> `max_threads` everywhere for naming consistency with the public Python kwarg.
- Add `max_threads` kwarg to `faulthandler.enable()`; store it in `fatal_error.max_threads` and pass it through `faulthandler_dump_traceback` to the fatal-signal dump path on both POSIX and Windows.
- Drop the three redundant explanatory comments vstinner flagged.
- Doc: tighten the limitations bullet, drop implementation-detail mentions of the `...` truncation marker, switch versionchanged directives to "next", document the new `enable()` kwarg.
- Tests: assertEqual exact count, check the whole-line `\n...\n` marker, use `script_helper.assert_python_ok`, drop the default-value test, add `test_enable_max_threads` exercising the fatal-signal path.
- NEWS: trim to two lines, mention all three functions.
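The 0-as-sentinel convention described above can be sketched in pure Python (a simplified model; the real resolution happens in C inside `_Py_DumpTracebackThreads`):

```python
# Python model of the 0-as-sentinel convention for the thread-dump cap.
DEFAULT_MAX_THREADS = 100  # the historical cap

def effective_cap(max_threads):
    # Internal fatal-signal callers pass 0 so they never have to spell
    # out the default themselves.
    return DEFAULT_MAX_THREADS if max_threads == 0 else max_threads

assert effective_cap(0) == 100     # sentinel resolves to the default
assert effective_cap(500) == 500   # explicit caps pass through unchanged
```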
Thanks for the feedback.
The fatal-signal handler only dumps the current thread when the GIL is disabled (gh-104812 / 3.14 versionchanged), so the truncation-marker assertion in test_enable_max_threads fails on free-threading CI runs. Skip it there.
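A skip guard along these lines would express that (a sketch; it assumes the test uses `unittest` and that `sys._is_gil_enabled()` is available, which it is on CPython 3.13+):

```python
# Sketch of a skip guard for free-threaded builds.
import sys
import unittest

def gil_disabled():
    # On regular builds the GIL is always enabled; older versions lack
    # the introspection hook entirely.
    return hasattr(sys, "_is_gil_enabled") and not sys._is_gil_enabled()

class TestEnableMaxThreads(unittest.TestCase):
    @unittest.skipIf(gil_disabled(),
                     "fatal-signal handler dumps only the current thread "
                     "when the GIL is disabled (gh-104812)")
    def test_truncation_marker(self):
        ...  # the marker assertion from the PR would go here
```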
Closes #149085.
Adds a keyword-only `max_threads` argument to `faulthandler.dump_traceback()` and `faulthandler.dump_traceback_later()`, raising the per-call cap on the number of threads dumped (previously a hardcoded `MAX_NTHREADS = 100` in `Python/traceback.c`). The default of 100 preserves existing behavior.

Motivation (covered in the issue): on server processes with many worker or gRPC threads, watchdog dumps silently lose the main thread because tstates are prepended to the interpreter's thread list and the cap chops the tail. This was the failure mode that prompted the issue.

Scope per @vstinner's confirmation in the issue: `max_threads` only; the frame/stack limits raised by @ZeroIntensity are left as-is for now.

The hardcoded 100 is moved to a new internal macro `_Py_TRACEBACK_MAX_NTHREADS` in `pycore_traceback.h` so the in-tree fatal-signal callers (`faulthandler.c`, `pylifecycle.c`) all share one source of truth.

📚 Documentation preview 📚: https://cpython-previews--149106.org.readthedocs.build/