ProcessPoolExecutor deadlock when a child process crashes while data is being sent in call queue #94777

Closed
lpaulot opened this issue Jul 12, 2022 · 1 comment · Fixed by #94784
Labels
3.11 (only security fixes), topic-multiprocessing, type-bug (an unexpected behavior, bug, or error)

Comments

@lpaulot
Contributor

lpaulot commented Jul 12, 2022

Bug report

When a ProcessPoolExecutor uses forked child processes and one of them dies abruptly (a segmentation fault, not a Python exception) while data is still being sent into the call queue, the parent process hangs forever.

Reproduction

import ctypes
from concurrent.futures import ProcessPoolExecutor


def segfault():
    # Reading a string from address 0 kills the process with a segmentation fault.
    ctypes.string_at(0)


def func(i, data):
    print(f"Start {i}.")
    if i == 1:
        segfault()
    print(f"Done {i}.")
    return i


# Large payload, so the call queue is still being written to when the worker dies.
data = list(range(100_000_000))
count = 10
with ProcessPoolExecutor(2) as pool:
    list(pool.map(func, range(count), [data] * count))
print("OK")

In Python 3.8.10 it raises a BrokenProcessPool exception whereas in 3.9.13 and 3.10.5 it hangs.

Analysis

When a child process crashes, all workers are terminated and stop reading from the communication pipes. If data is still being sent through the call queue at that moment, the call queue thread that writes data from the buffer to the pipe (multiprocessing.queues.Queue._feed) can get stuck in send_bytes(obj) because the Unix pipe it is writing to is full. _ExecutorManagerThread is then blocked in self.join_executor_internals() (called from self.terminate_broken()) on the line

self.call_queue.join_thread()

The main thread itself is blocked on

self._executor_manager_thread.join()

which is called from the __exit__ method of the Executor.
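
The root cause above is that a pipe has a finite kernel buffer, so a writer stalls once nobody reads from the other end. The following is a minimal sketch of that behavior, not code from CPython: the helper name bytes_until_pipe_full is made up for illustration, and it assumes a Unix-like system. It switches the write end to non-blocking mode so that the point where a blocking writer would hang shows up as a BlockingIOError instead.

```python
import os


def bytes_until_pipe_full(chunk_size=4096):
    """Write to a pipe that nobody reads and return how many bytes fit.

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking mode turns "would hang" into BlockingIOError.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * chunk_size)
    except BlockingIOError:
        # The kernel buffer is full. A blocking writer, such as the
        # queue feeder thread, would be stuck at this point.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written
```

Calling bytes_until_pipe_full() reports the pipe capacity on the current system (commonly 64 KiB on Linux), which is far smaller than the pickled 100-million-element list in the reproduction.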

Proposed solution

Drain the call queue buffer, either in the terminate_broken method before calling join_executor_internals or in the queue's close method.
I will create a pull request with a possible implementation.
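
As a rough sketch of what "draining the buffer" could look like, the toy class below stands in for the queue's internal buffer. ToyFeederBuffer and its methods are hypothetical names, not the multiprocessing API and not the implementation from the pull request; the sketch only assumes a deque guarded by a condition variable.

```python
import collections
import threading


class ToyFeederBuffer:
    """Toy stand-in for a queue's pending-items buffer (illustration only)."""

    def __init__(self):
        self._buffer = collections.deque()
        self._notempty = threading.Condition()

    def put(self, item):
        # Producer side: append an item and wake the feeder thread.
        with self._notempty:
            self._buffer.append(item)
            self._notempty.notify()

    def drain(self):
        # Drop everything that has not been handed to the pipe yet,
        # so a feeder thread has nothing left to write.
        with self._notempty:
            dropped = len(self._buffer)
            self._buffer.clear()
        return dropped
```

Draining only discards items that are still waiting in the buffer. It would not by itself release a write that is already blocked inside the pipe.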

Your environment

  • CPython versions tested on: reproduced in 3.10.5 and 3.9.13 (works well in 3.8.10: BrokenProcessPool exception)
  • Operating system and architecture: Linux, x86_64

@lpaulot lpaulot added the type-bug An unexpected behavior, bug, or error label Jul 12, 2022
@lpaulot
Contributor Author

lpaulot commented Jul 12, 2022

This was probably introduced by #31913.
The call queue reader used to be closed by self.call_queue.close() in join_executor_internals() before self.call_queue.join_thread() was called. In pull request #94784 I propose closing this reader end only when the pool is already broken.
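
Why closing the reader end helps can be shown with a plain pipe. This is a minimal sketch under stated assumptions, not the code from #94784: the function name writer_unblocks_when_reader_closes is hypothetical, and it assumes a Unix-like system where Python ignores SIGPIPE, so a write to a pipe with no reader raises BrokenPipeError instead of blocking.

```python
import os
import threading
import time


def writer_unblocks_when_reader_closes(timeout=5.0):
    """Show that closing the read end releases a writer stuck on a full pipe.

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    outcome = []

    def feed():
        try:
            while True:
                # Blocks once the pipe buffer is full and nobody reads.
                os.write(write_fd, b"x" * 4096)
        except BrokenPipeError:
            outcome.append("broken pipe")

    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()
    time.sleep(0.2)      # give the feeder time to fill the pipe and block
    os.close(read_fd)    # no reader left: the pending write fails
    feeder.join(timeout)
    still_blocked = feeder.is_alive()
    os.close(write_fd)
    return outcome, still_blocked
```

In this sketch the feeder thread ends with a BrokenPipeError, so the join returns instead of waiting forever, which is the effect the proposed change relies on for an already broken pool.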

@lpaulot lpaulot changed the title from "ProcessPoolExecutor hangs when a child process crashes while data is being sent in call queue" to "ProcessPoolExecutor deadlock when a child process crashes while data is being sent in call queue" Jul 13, 2022
gpshead pushed a commit that referenced this issue Jul 10, 2023
Fixes a hang in multiprocessing process pool executor when a child process crashes and code could otherwise block on writing to the pipe.  See GH-94777 for more details.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 10, 2023
Fixes a hang in multiprocessing process pool executor when a child process crashes and code could otherwise block on writing to the pipe.  See pythonGH-94777 for more details.
(cherry picked from commit 6782fc0)

Co-authored-by: Louis Paulot <55740424+lpaulot@users.noreply.github.com>
carljm pushed a commit that referenced this issue Jul 10, 2023
gh-94777: Fix deadlock in ProcessPoolExecutor (GH-94784)

Fixes a hang in multiprocessing process pool executor when a child process crashes and code could otherwise block on writing to the pipe.  See GH-94777 for more details.
(cherry picked from commit 6782fc0)

Co-authored-by: Louis Paulot <55740424+lpaulot@users.noreply.github.com>
@gpshead gpshead added the 3.11 only security fixes label Jul 10, 2023