Bug report
Bug description:
When a multiprocessing.Process is used from within a thread managed via concurrent.futures.ThreadPoolExecutor, multiple methods start producing unexpected and even contradictory results. Specifically, my example below exposes these unexpected behaviors:
- proc.join(timeout) returns even though the process has not terminated and the timeout has not expired (see the baseline sketch right after this list for the contract I expect).
- proc.close() throws a ValueError, claiming the process is still running, even though proc.kill() and proc.join() previously returned for the process. EDIT: this was probably a known side effect of using the fork start method in a multithreaded program, which is unsupported anyway.
- proc.join(timeout) runs into a timeout, even though the process should have terminated before the timeout. This might not be a real bug, since scheduling delays, execution times and contention around the available threads in the pool can cause unexpected delays. However, in the test setup below, each thread needs to process 4 tasks on average and each task should not take much longer than 1s, i.e. 4s in total. With a total timeout of 10s, this leaves 6s for any execution or scheduling delays, or 1.5s per task. That seems to me like too big a margin of error for it to fail this consistently.
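For reference, the contract I expect from join(timeout), and which I believe matches the documented behavior, is that it only returns once the process has terminated or the timeout has elapsed. A minimal single-threaded baseline sketch of that expectation (illustration only, not part of the reproducer below):

```python
# Baseline sketch (illustration only, not part of the reproducer below):
# my expectation is that after join(timeout) returns, either the child has
# an exit code or at least `timeout` seconds have elapsed.
import multiprocessing
import time


def sleeper():
    time.sleep(1)


if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    proc = multiprocessing.Process(target=sleeper)
    proc.start()
    start = time.monotonic()
    proc.join(0.1)  # shorter than the child's 1s sleep
    elapsed = time.monotonic() - start
    assert proc.exitcode is not None or elapsed >= 0.1
    proc.kill()
    proc.join()
    proc.close()
```

The reproducer below checks exactly this invariant from inside the thread pool, and there it does not always hold.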
I used this program to reproduce these effects.
```python
import concurrent.futures
import multiprocessing
import time
import traceback

TASK_COUNT = 64
MAX_THREADS = 16


def sub_proc():
    time.sleep(1)


def timeout_proc(expected_end_time: float):
    proc = multiprocessing.Process(target=sub_proc)
    proc.start()
    try:
        timeout = expected_end_time - time.monotonic()
        proc.join(timeout)
        observed_end_time = time.monotonic()
        timeout_remaining = expected_end_time - observed_end_time
        exitcode = proc.exitcode
        if exitcode is None:
            print(f"WARNING: timeout expired: {timeout_remaining:.4f}s")
        assert exitcode is not None or observed_end_time >= expected_end_time
    finally:
        proc.kill()
        proc.join()
        proc.close()


if __name__ == '__main__':
    # EDIT: The problems only manifest with spawn or fork, though the latter doesn't seem
    # relevant, since it is already expected to cause problems in multithreaded programs.
    multiprocessing.set_start_method('spawn')
    end_time = time.monotonic() + 10
    with concurrent.futures.ThreadPoolExecutor(MAX_THREADS) as e:
        futs = [e.submit(timeout_proc, end_time) for _ in range(TASK_COUNT)]
        for fut in futs:
            try:
                fut.result()
            except Exception as e:
                traceback.print_exception(e)
```

The unexpected behaviors can be recognized from the following symptoms:
- Proper timeouts of proc.join(timeout) cause messages like this, with a negative number:

  ```
  WARNING: timeout expired: -0.0072s
  ```

- If the time is instead positive, it means that proc.join(timeout) returned before either the timeout expired or the process terminated. In addition, we get an assertion failure:

  ```
  Traceback (most recent call last):
    File "/home/sven/test.py", line 35, in <module>
      fut.result()
      ~~~~~~~~~~^^
    File "/usr/lib/python3.13/concurrent/futures/_base.py", line 456, in result
      return self.__get_result()
             ~~~~~~~~~~~~~~~~~^^
    File "/usr/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
      raise self._exception
    File "/usr/lib/python3.13/concurrent/futures/thread.py", line 59, in run
      result = self.fn(*self.args, **self.kwargs)
    File "/home/sven/test.py", line 23, in timeout_proc
      assert exitcode is not None or observed_end_time >= expected_end_time
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  AssertionError
  ```

- Unexpected failures of proc.close() cause an additional exception, ValueError: Cannot close a process while it is still running. You should first call join() or terminate().

  ```
  Traceback (most recent call last):
    File "/home/sven/test.py", line 23, in timeout_proc
      assert exitcode is not None or observed_end_time >= expected_end_time
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  AssertionError

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/home/sven/test.py", line 35, in <module>
      fut.result()
      ~~~~~~~~~~^^
    File "/usr/lib/python3.13/concurrent/futures/_base.py", line 449, in result
      return self.__get_result()
             ~~~~~~~~~~~~~~~~~^^
    File "/usr/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
      raise self._exception
    File "/usr/lib/python3.13/concurrent/futures/thread.py", line 59, in run
      result = self.fn(*self.args, **self.kwargs)
    File "/home/sven/test.py", line 27, in timeout_proc
      proc.close()
      ~~~~~~~~~~^^
    File "/usr/lib/python3.13/multiprocessing/process.py", line 181, in close
      raise ValueError("Cannot close a process while it is still running. "
                       "You should first call join() or terminate().")
  ValueError: Cannot close a process while it is still running. You should first call join() or terminate()
  ```

  Curiously, this exception never occurs when the assertion didn't fail.
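As an aside, the deadline handling in timeout_proc could be written defensively so that a spurious early return from join(timeout) is simply retried until the deadline really passes. This is only a sketch of that idea (join_until is a hypothetical helper, not something the reproducer uses, and I haven't verified that it avoids the problem):

```python
import multiprocessing
import time


def join_until(proc: multiprocessing.Process, deadline: float) -> bool:
    """Join `proc` repeatedly until it has an exit code or `deadline`
    (a time.monotonic() timestamp) has passed.

    Returns True if the process terminated, False if the deadline expired.
    """
    while proc.exitcode is None:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        proc.join(remaining)
    return True
```

Of course this would only mask the symptom; join(timeout) returning early without the timeout having expired still looks like a bug to me.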
At first glance this behavior seems similar to #130895, but that issue involves calling join on the same process concurrently from multiple threads, whereas in my case each process is owned by a single thread and is only interacted with by its owning thread.
My example program above doesn't produce the observed behaviors deterministically, so you may need to run it more than once; in my environment it always produces at least some unexpected results, and some runs even produce all three kinds. EDIT: Switching to the spawn start method has reduced the frequency of the problems noticeably, so you might need to run the example a few times before seeing all symptoms.
The rate at which these unexpected behaviors occur seems to correlate with the ratio of TASK_COUNT / MAX_THREADS, with more tasks per thread causing more unexpected results. I've never seen any of these effects when MAX_THREADS >= TASK_COUNT, nor when starting threads by hand (through threading.Thread).
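For clarity, by "starting threads by hand" I mean a variant roughly like the following sketch, which replaces the executor with one threading.Thread per task (it assumes the timeout_proc and TASK_COUNT definitions from the reproducer above; the exact code I used may have differed in details):

```python
# Sketch of the threading.Thread variant: one thread per task instead of a pool.
# Assumes timeout_proc and TASK_COUNT as defined in the reproducer above.
import multiprocessing
import threading
import time

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    end_time = time.monotonic() + 10
    threads = [threading.Thread(target=timeout_proc, args=(end_time,))
               for _ in range(TASK_COUNT)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```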
CPython versions tested on:
3.13
Operating systems tested on:
Linux