Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

race condition in ThreadChildWatcher (default) and MultiLoopChildWatcher #86276

Closed
sourcejedi mannequin opened this issue Oct 21, 2020 · 8 comments
Closed

race condition in ThreadChildWatcher (default) and MultiLoopChildWatcher #86276

sourcejedi mannequin opened this issue Oct 21, 2020 · 8 comments
Labels
pending The issue will be closed if no feedback is provided topic-asyncio

Comments

@sourcejedi
Copy link
Mannequin

sourcejedi mannequin commented Oct 21, 2020

BPO 42110
Nosy @asvetlov, @1st1, @sourcejedi

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-10-21.18:44:25.842>
labels = ['3.10', 'expert-asyncio']
title = 'race condition in ThreadChildWatcher (default) and MultiLoopChildWatcher'
updated_at = <Date 2020-10-22.15:09:14.856>
user = 'https://github.com/sourcejedi'

bugs.python.org fields:

activity = <Date 2020-10-22.15:09:14.856>
actor = 'sourcejedi'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['asyncio']
creation = <Date 2020-10-21.18:44:25.842>
creator = 'sourcejedi'
dependencies = []
files = []
hgrepos = []
issue_num = 42110
keywords = []
message_count = 3.0
messages = ['379229', '379296', '379297']
nosy_count = 3.0
nosy_names = ['asvetlov', 'yselivanov', 'sourcejedi']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue42110'
versions = ['Python 3.10']

@sourcejedi
Copy link
Mannequin Author

sourcejedi mannequin commented Oct 21, 2020

## Test program ##

import asyncio
import time
import os
import signal
import sys

# This bug happens with the default, ThreadedChildWatcher
# It also happens with MultiLoopChildWatcher,
# but not the other three watcher types.
#asyncio.set_child_watcher(asyncio.MultiLoopChildWatcher())

# Patch os.kill to call sleep(1) first,
# opening up the window for a race condition
os_kill = os.kill
def kill(p, n):
    time.sleep(1)
    os_kill(p, n)

os.kill = kill

async def main():
    p = await asyncio.create_subprocess_exec(sys.executable, '-c', 'import sys; sys.exit(0)')
    p.send_signal(signal.SIGTERM)
    # cleanup
    await p.wait()

asyncio.run(main())

## Test output ##

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alan-sysop/src/cpython/Lib/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/alan-sysop/src/cpython/Lib/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "<stdin>", line 3, in main
  File "/home/alan-sysop/src/cpython/Lib/asyncio/subprocess.py", line 138, in send_signal
    self._transport.send_signal(signal)
  File "/home/alan-sysop/src/cpython/Lib/asyncio/base_subprocess.py", line 146, in send_signal
    self._proc.send_signal(signal)
  File "/home/alan-sysop/src/cpython/Lib/subprocess.py", line 2081, in send_signal
    os.kill(self.pid, sig)
  File "<stdin>", line 3, in kill
ProcessLookupError: [Errno 3] No such process

## Tested versions ##

  • v3.10.0a1-121-gc60394c7fc
  • python39-3.9.0-1.fc32.x86_64
  • python3-3.8.6-1.fc32.x86_64

## Race condition ##

main thread vs ThreadedChildWatcher._do_waitpid() thread

p=create_subprocess_exec(...)
			waitpid(...)  # wait for process exit
<Process p exits>
			<waitpid() returns.  Process p is reaped, and no longer exists>
p.send_signal(9)

## Result ##

A signal is sent to p.pid, after p.pid has been reaped by waitpid(). It might raise an error because p.pid no longer exists.

In the worst case the signal will be sent successfully - because an unrelated process has started with the same PID.

## How easy is it to reproduce? ##

It turns out the window for this race condition has been kept short, due to mitigations in the subprocess module. IIUC, the mitigation protects against incorrect parallel use of a subprocess object by multiple threads.

        def send_signal(self, sig):
            # bpo-38630: Polling reduces the risk of sending a signal to the
            # wrong process if the process completed, the Popen.returncode
            # attribute is still None, and the pid has been reassigned
            # (recycled) to a new different process. This race condition can
            # happens in two cases [...]
            self.poll()
            if self.returncode is not None:
                # Skip signalling a process that we know has already died.
                return
            os.kill(self.pid, sig)

## Possible workarounds ##

  • SafeChildWatcher and FastChildWatcher should not have this defect. However we use ThreadedChildWatcher and MultiLoopChildWatcher to support running event loops in different threads.

  • PidfdChildWatcher should not have this defect. It is only available on Linux, kernel version 5.3 or above.

It would be possible to avoid the ThreadedChildWatcher race by using native code and pthread_cancel(), so that the corresponding waitpid() call is canceled before sending a signal. Except the current implementation of pthread_cancel() is also unsound, because of race conditions.

@sourcejedi sourcejedi mannequin added 3.10 only security fixes topic-asyncio labels Oct 21, 2020
@sourcejedi
Copy link
Mannequin Author

sourcejedi mannequin commented Oct 22, 2020

There's one way to fix this in MultiLoopChildWatcher (but not ThreadedChildWatcher). Make sure the waitpid() runs on the same thread that created the subprocess. Prototype: sourcejedi@92f979b

The other approach would be to copy subprocess.wait(timeout) - keep waking up regularly and polling to see if the process has exited yet.

I'm not convinced ThreadedChildWatcher is fixable. You can't call waitpid() in one thread, and let someone call kill() in a different thread.

You could try avoiding calling waitpid() until someone does await asyncio.subprocess.Process.wait(). I think I didn't like it because - what happens if you cancel a wait() task? I think you want to cancel the waitpid() so you could safely kill() again... And we don't have any way to cancel the blocking waitpid() call, at least not from python, definitely not since PEP-475.

@sourcejedi
Copy link
Mannequin Author

sourcejedi mannequin commented Oct 22, 2020

Put the other way, if you wanted to fix this bug in ThreadedChildWatcher, and go as far as allowing cancelling Process.wait(), followed by kill() / send_signal(), then I think you need -

  • siginterrupt(SIGCHLD, 1)
  • not to mind about any random C code that doesn't really handle being interrupted. Like printf(), ho hum. https://groups.google.com/g/comp.unix.programmer/c/QZmFw1VytYs/m/BSBXBHTI1REJ
  • add & use a new call like os.waitpid_interruptible(), which doesn't restart on EINTR, as a workaround for PEP-475.
  • set a "stop" flag for the watcher thread
  • use threading.pthread_kill() (available on *most* python platforms) to interrupt the watcher thread. Spoof SIGCHLD, this will avoid conflicts with a python handler for any other signal.
  • wait for the watcher thread to finish using Thread.join()
  • now you can safely find out whether the child process has been reaped, or whether it's safe to kill it.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@saito828koki
Copy link
Contributor

I cannot reproduce the same bug. (Tested environment is 3.10.6)
Is this issue already fixed?

@gvanrossum
Copy link
Member

Is this issue already fixed?

I don't know, the description is hard to follow, and child watchers are a minefield (e.g. #94597).

@kumaraditya303
Copy link
Contributor

kumaraditya303 commented Oct 9, 2022

I am unable to reproduce the reported race. FWIW PidfdChildWatcher is now used by default wherever supported which does not has any of these issue.

@kumaraditya303 kumaraditya303 added pending The issue will be closed if no feedback is provided and removed 3.10 only security fixes labels Oct 9, 2022
@gvanrossum
Copy link
Member

But ThreadedChildWatcher is still used on macOS and other non-Linux UNIX variants. I think the OP is claiming that the race condition is in subprocess.py (not the asyncio version) and due to waiting for a process in a thread that didn't create the process. Maybe we can reproduce this outside asyncio?

@gvanrossum
Copy link
Member

This was fixed by https://github.com/python/cpython/pull/20010/files, in particular the try/except ProcessLookupError around the os.kill() call added there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending The issue will be closed if no feedback is provided topic-asyncio
Projects
Status: Done
Development

No branches or pull requests

3 participants