test_concurrent_futures: test_gh105829_should_not_deadlock_if_wakeup_pipe_full() hangs on ARM64 macOS 3.x #109917
Comments
I have only looked very briefly but I suspect this is a symptom of a race due to the wakeup mock that is used to force blocking behaviour. I will investigate further. |
If my guess for what goes wrong is correct then I think this would fix it:

    --- a/Lib/test/test_concurrent_futures/test_deadlock.py
    +++ b/Lib/test/test_concurrent_futures/test_deadlock.py
    @@ -280,11 +280,12 @@ def wakeup(self):
                     super().wakeup()
     
                 def clear(self):
    +                super().clear()
                     try:
                         while True:
                             self._dummy_queue.get_nowait()
                     except queue.Empty:
    -                    super().clear()
    +                    pass
     
             with (unittest.mock.patch.object(futures.process._ExecutorManagerThread,
                                              'run', mock_run),

I can't reproduce the failure locally though, so I might be completely wrong about what's going on. |
The only purpose of _ThreadWakeup is to notify _ExecutorManagerThread when something bad happens:

    def wait_result_broken_or_wakeup(self):
        ...
        wakeup_reader = self.thread_wakeup._reader
        ...
        ready = mp.connection.wait(readers + worker_sentinels)
        ...
        elif wakeup_reader in ready:
            is_broken = False

The number of wakeup() calls doesn't matter: 1 or 1000 is the same. I don't get test_gh105829_should_not_deadlock_if_wakeup_pipe_full(). It blocks if wakeup() is called more than once? Why?

_ThreadWakeup.wakeup() has a strange implementation: it writes 0 bytes into the writer side of the pipe:

    class _ThreadWakeup:
        def __init__(self):
            self._closed = False
            self._reader, self._writer = mp.Pipe(duplex=False)

        def wakeup(self):
            if not self._closed:
                self._writer.send_bytes(b"")

Why not send a byte and ignore the error when the pipe is full? asyncio has a similar "self-pipe" pattern, but it just ignores the error when the pipe is full:

    class BaseSelectorEventLoop(base_events.BaseEventLoop):
        def _write_to_self(self):
            try:
                self._csock.send(b'\0')
            except OSError:
                if self._debug:
                    logger.debug("Fail to write a null byte into the "
                                 "self-pipe socket",
                                 exc_info=True)

Also, there is a simple way to avoid filling the pipe: write a single byte and then record in an attribute that a byte was written. |
See also issue gh-105829: concurrent.futures.ProcessPoolExecutor pool deadlocks when submitting many tasks. |
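The "write a single byte and remember that it was written" idea from the comment above could look roughly like the sketch below. This is only an illustration under stated assumptions: the class name `CoalescingWakeup`, the `_pending` flag, and the private lock are all hypothetical and are not part of concurrent.futures. Whether coalescing wakeups like this is safe for the executor is exactly what the later comments in this thread discuss.

```python
import multiprocessing as mp
import threading


class CoalescingWakeup:
    """Hypothetical self-pipe that keeps at most one unread message."""

    def __init__(self):
        self._reader, self._writer = mp.Pipe(duplex=False)
        self._lock = threading.Lock()
        self._pending = False  # True while one message sits unread in the pipe

    def wakeup(self):
        # Write only when nothing is pending, so the pipe can never fill up
        # and this call never blocks while holding the lock.
        with self._lock:
            if not self._pending:
                self._writer.send_bytes(b"\0")
                self._pending = True

    def clear(self):
        # Consume the single pending message, if any, and reset the flag.
        with self._lock:
            if self._pending:
                self._reader.recv_bytes()
                self._pending = False

    def close(self):
        self._writer.close()
        self._reader.close()
```

With this shape, calling `wakeup()` any number of times between two `clear()` calls leaves exactly one message in the pipe, which matches the observation that one wakeup or a thousand should look the same to the reader.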
…p.wakeup() Remove test_gh105829_should_not_deadlock_if_wakeup_pipe_full() of test_concurrent_futures.test_deadlock. The test is no longer relevant.
I proposed PR gh-110129 to fix the issue. |
This is not true. When the main thread pushes a new work item, it needs to make sure the _ExecutorManagerThread wakes up and moves the work item into the worker process call queue. Pushing the work item is not sufficient for that: the _ExecutorManagerThread only waits on results, worker process sentinels, and this wakeup pipe, which can be used to "manually" trigger a wakeup. In the current implementation the main thread pushes one object into the wakeup pipe for every new work item, and the wakeup pipe is then fully drained when the _ExecutorManagerThread wakes up. This makes sure new work items are moved into the call queue and made available to the workers.
We started with that type of fix in #108513, please read through the commits and discussion for the full background on why we ended up with the current state.
The tradeoff we made was to prefer removing the lock over avoiding the write to the pipe. @tomMoral thinks it is possible to do both, and that is likely true, but we went with the safer option for the time being to get the deadlock fix out the door. There are some interesting corner cases surrounding what happens when the call queue is full and the management thread cannot actually move any new work at that time. It is likely ok (assuming it is ok without the change), but at least I didn't feel like risking breaking anything. One thing is very clear, however: there is a lot of contention while new work is submitted from the main thread, and the workers can easily get starved during this period because the _ExecutorManagerThread cannot move work into their queue fast enough. Improvements here are probably good but would require better tests and benchmarks. |
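To make the "one message per work item" scheme concrete, here is a small sketch of why it can block if nothing drains the pipe. It is not taken from CPython: it uses a raw `os.pipe()` and one-byte messages instead of the `mp.Pipe` framing that _ThreadWakeup uses, the function name `count_wakeups_until_full` is made up, and it assumes a POSIX platform where `os.set_blocking()` works on pipe descriptors.

```python
import os


def count_wakeups_until_full(message=b"\0"):
    """Write one message per simulated work item until the pipe would block."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # raise instead of hanging when full
    sent = 0
    try:
        while True:
            os.write(write_fd, message)
            sent += 1
    except BlockingIOError:
        # A blocking writer, like the submitting thread in the issue,
        # would hang at this point until a reader drained the pipe.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return sent
```

The number returned depends on the platform's pipe buffer size, so it is only meaningful as "finite". The point is that a writer submitting faster than the reader drains will eventually stall on a blocking pipe.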
…p.wakeup() Replace test_gh105829_should_not_deadlock_if_wakeup_pipe_full() test which was mocking too many concurrent.futures internals with a new test_wakeup() functional test. Co-Authored-By: elfstrom <elfstrom@users.noreply.github.com>
Add a lock to _ThreadWakeup which is used internally by _ThreadWakeup methods to serialize method calls to make the API thread safe. No longer use the shutdown lock to access _ThreadWakeup.
Hum. You are describing bad synchronization between two threads. Usually the solution to a synchronization problem is adding a new lock. I wrote PR gh-110137 to add a lock to _ThreadWakeup to make it thread safe. Currently, the shutdown lock is used to access _ThreadWakeup, but IMO it's a bad idea to attempt to protect too many things with a single lock. |
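A wakeup object that owns its lock, as proposed above, might be sketched as follows. This is a hypothetical `LockedWakeup` written for illustration and is not the code from PR gh-110137. One design choice is an assumption worth flagging: `clear()` deliberately does not take the lock, because if `wakeup()` ever blocked on a full pipe while holding it, a `clear()` that needed the same lock could never drain the pipe.

```python
import multiprocessing as mp
import threading


class LockedWakeup:
    """Hypothetical wakeup object that serializes wakeup() and close()."""

    def __init__(self):
        self._closed = False
        self._lock = threading.Lock()
        self._reader, self._writer = mp.Pipe(duplex=False)

    def wakeup(self):
        # The lock keeps close() from closing the writer mid-send.
        with self._lock:
            if not self._closed:
                self._writer.send_bytes(b"")

    def clear(self):
        # No lock here (see the note above). This assumes clear() and
        # close() are only ever called from the same thread.
        if self._closed:
            return
        while self._reader.poll():
            self._reader.recv_bytes()

    def close(self):
        with self._lock:
            if not self._closed:
                self._closed = True
                self._writer.close()
                self._reader.close()
```

After `close()`, further `wakeup()` and `clear()` calls in this sketch are silent no-ops rather than errors.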
The purpose of the wakeup object is not about synchronization (atomicity or mutual exclusiveness). The only goal is to communicate that new work may be available. This is currently done by sending messages on a Pipe which is thread safe. The only time the main thread and the _ExecutorManagerThread reads/writes shared state is during shutdown (the |
The test had an instability issue due to the ordering of the dummy queue operation and the real wakeup pipe operations. Both primitives are thread safe but not done atomically as a single update and may interleave arbitrarily. With the old order of operations this can lead to an incorrect state where the dummy queue is full but the wakeup pipe is empty. By swapping the order in clear() I think this can no longer happen in any possible operation interleaving (famous last words).
…honGH-110306) The test had an instability issue due to the ordering of the dummy queue operation and the real wakeup pipe operations. Both primitives are thread safe but not done atomically as a single update and may interleave arbitrarily. With the old order of operations this can lead to an incorrect state where the dummy queue is full but the wakeup pipe is empty. By swapping the order in clear() I think this can no longer happen in any possible operation interleaving (famous last words). (cherry picked from commit a376a72) Co-authored-by: elfstrom <elfstrom@users.noreply.github.com>
…-110306) (#110316) gh-109917: Fix test instability in test_concurrent_futures (GH-110306) The test had an instability issue due to the ordering of the dummy queue operation and the real wakeup pipe operations. Both primitives are thread safe but not done atomically as a single update and may interleave arbitrarily. With the old order of operations this can lead to an incorrect state where the dummy queue is full but the wakeup pipe is empty. By swapping the order in clear() I think this can no longer happen in any possible operation interleaving (famous last words). (cherry picked from commit a376a72) Co-authored-by: elfstrom <elfstrom@users.noreply.github.com>
…-110306) (#110315) gh-109917: Fix test instability in test_concurrent_futures (GH-110306) The test had an instability issue due to the ordering of the dummy queue operation and the real wakeup pipe operations. Both primitives are thread safe but not done atomically as a single update and may interleave arbitrarily. With the old order of operations this can lead to an incorrect state where the dummy queue is full but the wakeup pipe is empty. By swapping the order in clear() I think this can no longer happen in any possible operation interleaving (famous last words). (cherry picked from commit a376a72) Co-authored-by: elfstrom <elfstrom@users.noreply.github.com>
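The ordering problem described in the commit message above can be replayed deterministically. The sketch below is a single-threaded simulation of one interleaving, not the test itself and not a proof: a `deque` stands in for the wakeup pipe, `replay` and its helpers are made-up names, and `main_blocked` models the point where a real `put(block=True)` would wait.

```python
import queue
from collections import deque


def replay(clear_pipe_first):
    """Replay one interleaving of the mock's wakeup() and clear()."""
    dummy_queue = queue.Queue(maxsize=1)  # stand-in for MockWakeup._dummy_queue
    pipe = deque()                        # stand-in for the real wakeup pipe

    def drain_queue():
        try:
            while True:
                dummy_queue.get_nowait()
        except queue.Empty:
            pass

    def wakeup():
        # Same order as the mock: fill the dummy queue, then write to the pipe.
        dummy_queue.put_nowait(None)
        pipe.append(b"")

    if clear_pipe_first:
        first_half, second_half = pipe.clear, drain_queue  # order after the fix
    else:
        first_half, second_half = drain_queue, pipe.clear  # order before the fix

    wakeup()                           # first work item wakes the manager thread
    first_half()                       # manager runs the first half of clear()
    main_blocked = dummy_queue.full()  # a real put(block=True) would wait here
    if not main_blocked:
        wakeup()                       # main thread fits in a whole wakeup()
    second_half()                      # manager finishes clear()
    if main_blocked:
        wakeup()                       # put() proceeds once the queue is drained
    return dummy_queue.full(), len(pipe)
```

In this replay, `replay(False)` returns `(True, 0)`: the dummy queue is full while the pipe is empty, so the manager thread has nothing to wake it and the next `wakeup()` would block forever. `replay(True)` returns `(True, 1)`: the queue is full, but a message is still in the pipe, so the manager thread wakes up and clears both.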
Fixed by a376a72 |
ARM64 macOS 3.x:
The test passed when re-run in verbose mode:
build: https://buildbot.python.org/all/#/builders/725/builds/5749