Skip to content

gh-149085: Add max_threads keyword to faulthandler.dump_traceback()#149106

Open
efroemling wants to merge 3 commits intopython:mainfrom
efroemling:faulthandler-max-threads-kwarg
Open

gh-149085: Add max_threads keyword to faulthandler.dump_traceback()#149106
efroemling wants to merge 3 commits intopython:mainfrom
efroemling:faulthandler-max-threads-kwarg

Conversation

@efroemling
Copy link
Copy Markdown

@efroemling efroemling commented Apr 28, 2026

Closes #149085.

Adds a keyword-only max_threads argument to faulthandler.dump_traceback() and faulthandler.dump_traceback_later(), raising the per-call cap on the number of threads dumped (previously a hardcoded MAX_NTHREADS = 100 in Python/traceback.c). Default of 100 preserves existing behavior.

Motivation (covered in the issue): on server processes with many worker or gRPC threads, watchdog dumps silently lose the main thread because tstates are prepended to the interpreter's thread list and the cap chops the tail. This was the failure mode that prompted the issue.

Scope per @vstinner's confirmation in the issue: max_threads only; the frame/stack limits raised by @ZeroIntensity are left as-is for now.

The hardcoded 100 is moved to a new internal macro _Py_TRACEBACK_MAX_NTHREADS in pycore_traceback.h so the in-tree fatal-signal callers (faulthandler.c, pylifecycle.c) all share one source of truth.


📚 Documentation preview 📚: https://cpython-previews--149106.org.readthedocs.build/

…ck()

Add a keyword-only `max_threads` argument to `dump_traceback()` and
`dump_traceback_later()`, defaulting to 100 to preserve existing
behavior. Allows server processes with many worker threads to dump
beyond the historical 100-thread cap (previously a hardcoded
`MAX_NTHREADS = 100` in `Python/traceback.c`).

The cap matters in practice: tstates are prepended to the
PyInterpreterState linked list, so the dump walks newest-first. With
more than 100 threads alive, the main thread (oldest, at the tail) is
silently elided from watchdog dumps -- exactly the thread that's
usually wanted.

The hardcoded value is moved to a new internal macro
`_Py_TRACEBACK_MAX_NTHREADS` in `pycore_traceback.h` so the in-tree
fatal-signal callers all reference one source of truth.
@read-the-docs-community
Copy link
Copy Markdown

read-the-docs-community Bot commented Apr 28, 2026

Documentation build overview

📚 cpython-previews | 🛠️ Build #32456372 | 📁 Comparing 4689f4e against main (8a8d737)

  🔍 Preview build  

6 files changed · ± 6 modified

± Modified

Copy link
Copy Markdown
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to also add max_threads parameter to enable().

Comment thread Modules/faulthandler.c Outdated
if (all_threads == 1) {
(void)_Py_DumpTracebackThreads(fd, NULL, tstate);
/* Fatal-signal path has no caller-supplied cap; use the
historical default. */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you modify faulthandler.enable() to add a max_threads parameter and store the value in fatal_error? See _faulthandler_runtime_state structure in Include/internal/pycore_faulthandler.h.

Comment thread Python/traceback.c Outdated
#define MAX_NTHREADS 100
/* The historical default thread-dump cap is declared as
_Py_TRACEBACK_MAX_NTHREADS in pycore_traceback.h so callers of
_Py_DumpTracebackThreads can reference it directly. */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this comment is useful. I suggest removing it.

Comment thread Doc/library/faulthandler.rst Outdated
Comment on lines +34 to +35
* It is limited to 100 frames per thread, and by default to 100 threads
total in newest-first order (configurable via *max_threads*).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* It is limited to 100 frames per thread, and by default to 100 threads
total in newest-first order (configurable via *max_threads*).
* It is limited to 100 frames per thread, and 100 threads
(configurable via *max_threads*).

Comment thread Doc/library/faulthandler.rst Outdated
Dump the tracebacks of all threads into *file*. If *all_threads* is
``False``, dump only the current thread.
``False``, dump only the current thread. *max_threads* caps the number
of threads dumped; a ``...`` marker is written if there are more.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a ... marker is written if there are more

That's an implementation detail, please don't document it (remove it from the doc). Same remark for all documentation of this PR.

Comment thread Doc/library/faulthandler.rst Outdated
.. versionchanged:: 3.5
Added support for passing file descriptor to this function.

.. versionchanged:: 3.15
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. versionchanged:: 3.15
.. versionchanged:: next

Comment on lines +1 to +7
Add a *max_threads* keyword argument to :func:`faulthandler.dump_traceback`
and :func:`faulthandler.dump_traceback_later`, raising the per-call cap on
the number of threads dumped (previously a hard-coded ``MAX_NTHREADS = 100``
in :file:`Python/traceback.c`). Useful for server processes with many
worker or gRPC threads, where dump order (newest-thread-first) means the
historical 100-thread cap silently elided the main thread. The default of
``100`` preserves existing behavior.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Add a *max_threads* keyword argument to :func:`faulthandler.dump_traceback`
and :func:`faulthandler.dump_traceback_later`, raising the per-call cap on
the number of threads dumped (previously a hard-coded ``MAX_NTHREADS = 100``
in :file:`Python/traceback.c`). Useful for server processes with many
worker or gRPC threads, where dump order (newest-thread-first) means the
historical 100-thread cap silently elided the main thread. The default of
``100`` preserves existing behavior.
Add a *max_threads* keyword argument to :func:`faulthandler.dump_traceback`
and :func:`faulthandler.dump_traceback_later`.

I don't think that it's needed to mention gRPC usecase here, having it described in the issue is enough. Also, it's not needed to mention that the default is 100 since it's not changed.

Comment thread Python/traceback.c Outdated
_Py_DumpTracebackThreads(int fd, PyInterpreterState *interp,
PyThreadState *current_tstate)
PyThreadState *current_tstate,
unsigned int max_nthreads)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest renaming the parameter to max_threads for consistency.

Comment thread Lib/test/test_faulthandler.py Outdated
output, _ = process.communicate()
process.wait()
# Truncation marker is written when the cap is hit.
self.assertIn(b"...\n", output)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the whole line:

Suggested change
self.assertIn(b"...\n", output)
self.assertIn(b"\n...\n", output)

Comment thread Lib/test/test_faulthandler.py Outdated
t.join()
""").strip()
# spawn_python merges stderr into stdout by default.
with support.SuppressCrashReport():
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This process is not supposed to fail. If it fails, I would prefer to use the default crash reporter (such as dumping a coredump file).

Comment thread Lib/test/test_faulthandler.py Outdated
Comment on lines +753 to +757
with support.SuppressCrashReport():
process = script_helper.spawn_python('-c', code)
with process:
output, _ = process.communicate()
process.wait()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_python_ok() can be used instead (to make the code shorter):

Suggested change
with support.SuppressCrashReport():
process = script_helper.spawn_python('-c', code)
with process:
output, _ = process.communicate()
process.wait()
proc = script_helper.assert_python_ok('-c', code)
output = proc.err

- Drop _Py_TRACEBACK_MAX_NTHREADS macro; use 0 as sentinel for default
  100 inside _Py_DumpTracebackThreads so internal callers don't have to
  pass the default explicitly.
- Rename max_nthreads -> max_threads everywhere for naming consistency
  with the public Python kwarg.
- Add max_threads kwarg to faulthandler.enable(); store in
  fatal_error.max_threads and pass through faulthandler_dump_traceback
  to the fatal-signal dump path on both POSIX and Windows.
- Drop the three redundant explanatory comments vstinner flagged.
- Doc: tighten the limitations bullet, drop implementation-detail
  mentions of the "..." truncation marker, switch versionchanged
  directives to "next", document the new enable() kwarg.
- Tests: assertEqual exact count, check whole-line "\n...\n" marker,
  use script_helper.assert_python_ok, drop the default-value test,
  add test_enable_max_threads exercising the fatal-signal path.
- NEWS: trim to two lines, mention all three functions.
@efroemling
Copy link
Copy Markdown
Author

efroemling commented Apr 28, 2026

Thanks for the feedback.
I took a pass and tried to address everything (added max_threads to faulthandler.enable(), scaled back excessive comments, removed _Py_TRACEBACK_MAX_THREADS in favor of 0 for historical default, etc.).
Please holler if I missed anything or you notice anything else I can clean up or improve.

The fatal-signal handler only dumps the current thread when the GIL
is disabled (pythongh-104812 / 3.14 versionchanged), so the truncation
marker assertion in test_enable_max_threads fails on free-threading
CI runs. Skip it there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

faulthandler: make per-call thread dump cap configurable

2 participants