-
-
Notifications
You must be signed in to change notification settings - Fork 33k
gh-139462: Make the ProcessPoolExecutor BrokenProcessPool exception report which child process terminated #139486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Lib/concurrent/futures/process.py
Outdated
if p.exitcode: # Report any nonzero exit codes | ||
errors.append(f"Process {p.pid} terminated abruptly with exit code {p.exitcode}") | ||
if errors: | ||
bpe.__cause__ = _RemoteTraceback("\n".join(errors)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without having swapped in my context on this code... the above cause is not None case surrounds the value within with \n''' joined_value '''
- should we do the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another way to think about this... can this logic just set cause to be this errors list so there's only a single _RemoteTraceback construction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call - initially I was just aiming to modify as little code as possible 😄 but you're right that this is worth a small refactor. Let me know what you think.
Only possible downside is that now the traceback looks a little more funky. Do you know why we have the '''
and the newlines this way?
Lib.concurrent.futures.process._RemoteTraceback:
'''
Process 54534 terminated abruptly with exit code 99'''
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add that I'm more than happy to keep the formatting like this. Consistency is good, and I'm sure that there is at least one user out there directly parsing _RemoteTraceback strings whose use case would break if we made any modifications :)
Just want to confirm that this all looks good to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the bot, please add a news entry, and please also update the "What's New in Python 3.15" document.
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just down to docs nitpicks for me. The rest looks good.
Misc/NEWS.d/next/Library/2025-10-02-22-29-00.gh-issue-139462.VZXUHe.rst
Outdated
Show resolved
Hide resolved
There was a fun issue where the precommit merge conflict checker thought that a perfectly legitimate string was a problem:
I removed one of the |
Thanks @ZeroIntensity, appreciate all the feedback! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks good to me. I'll wait for @gpshead as the multiprocessing expert to decide on the weird triple-quotes in the message.
Hmm, something seems to be causing the tests to hang. |
Hi @ZeroIntensity and @gpshead! Tests are passing, let me know if there are any other changes you'd like me to make, or if this is good to go. |
cause_str = ''.join(cause) | ||
else: | ||
# No cause known, synthesize from child process exitcodes | ||
errors = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there really cases where multiple processes fail together ? If not, the errors
list does not seem necessary. Otherwise, a test would be welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible! Doing
with ProcessPoolExecutor(max_workers=2) as executor:
futures = [
executor.submit(os._exit, 99),
executor.submit(os._exit, 100),
]
for future in as_completed(futures):
try:
future.result()
except BrokenProcessPool as e:
traceback.print_exception(e)
sometimes gives me
Lib.concurrent.futures.process._RemoteTraceback:
'''
Process 84477 terminated abruptly with exit code 100
Process 84478 terminated abruptly with exit code 99'''
But this is basically a race between the subprocesses terminating and when we build the traceback. Testing this in a non-flaky way would be tricky at best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original concern when deciding to report every known termination was that we could potentially end up with a "real" failure and a "red herring" failure, and since we can't say for sure which happened first or which was more important, it would be safer to just dump all known failures into the traceback. And the total size of the traceback would be bounded by the number of processes that could terminate at the same time, i.e.
On Windows, max_workers must be less than or equal to 61. If it is not then ValueError will be raised. If max_workers is None, then the default chosen will be at most 61
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirm the flaky number of catched failed processes. Sometimes there is only one ...
I am wondering if we should not insert a brief comment. It's up to you.
See #139462 for more context.
Short summary: prior to this change, if a child process segfaulted when running in a concurrent.futures.ProcessPoolExecutor, the user would get a BrokenProcessPool exception with no information about which child process terminated or why.
In order to improve the debugging experience, this change attempts to report which child process terminated and with what exit code. For instance, if I have a worker process that segfaults, I'll now see something like: