-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not immediately SIGKILL subprocess child processes upon ^C #70130
Comments
Python 3.3 introduces timeout support in subprocess.call, implemented by sending a SIGKILL if the Popen.wait is interrupted by a TimeoutExpired exception. However, the "except" clause is too broad, and will, for instance, trigger on a KeyboardInterrupt. For practical purposes, this means that sending a Ctrl-C to a python program before 3.3 sent a SIGINT to both the parent and subprocess.call()d child, whereas under 3.3+ sends a SIGINT _and_ a SIGKILL to the child. The child will not be able to clean up appropriately. For a real world example of this, see http://stackoverflow.com/q/34458583/132382 The fix is, I think, simply changing the clause to "except TimeoutExpired". At least, that works for me. See attached patch. |
The code was introduced to solve bpo-12494, so I'm adding Victor to weigh in. |
If I understand correctly, the _try_wait mechanics (or 3.5's syscall behavior) already handle EINTR the way we way: ignore it and try wait()ing again. So, this patch would kill only on a timeout, and never on another error like Ctrl-C, a UserDefinedTimeoutException from a signal handler, etc. That's probably the lesser of two evils, the other being a SIGKILL against an arbitrary child process. Better to document that a non-timeout-parameter interruption to subprocess.call will separate the parent from its child, than to hard kill arbitrary programs when a polite SIGINT was intended. |
Doesn’t this scenario apply equally to run(), or check_output() in 3.4? I suspect the explicit p.wait() call is not needed either. The context manager should already be calling wait(). One possible problem that I can think of: if you set a timeout, then interrupt the call with KeyboardInterrupt or similar, the context manager will now wait without without a timeout. Demo: # Hit Ctrl+C before the 3 s timeout, and it will delay 10 s
call('trap "" INT && sleep 10', shell=True, timeout=3) |
The issue explained in 12494 is that the caller doesn't have access to the I also disagree to say that we kill an "arbitrary" process. IMHO it's part Maybe we need to flag to send SIGTERM on exception and then wait N wait Anyway, not sending any signal on exction is not a good idea. We must read |
The reported problem is when no timeout is given. Perhaps it would be sufficient to:
If a timeout is specified, the current code is good enough, but if no timeout is specified, maybe we should just do the equivalent of: with Popen(...) as p:
return p.wait()
# If interrupted, the context manager will wait again. If the interruption is due to a terminal-wide SIGINT, the child may also have been interrupted. For comparison, the Posix system() function is supposed to ignore SIGINT and SIGQUIT (i.e. signals from the terminal). See the Gnu implementation: <https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/posix/system.c\>. |
Even if we can’t agree on any behaviour change, I think it might be worth documenting how these functions behave on exceptions (interrupts) other than TimeoutExpired. Currently all I can find is “If the timeout expires, the child process will be killed and waited for.” I think this could be expanded to also say what happens if the parent is interrupted by a signal such as KeyboardInterrupt:
|
Again, the problem is that the exception exits from the function which owns the last reference to the Popen object. If the Popen is left alive when you exit the function, you create a zombi process, you can leave open pipes, etc. It's unclear to me if CTRL+c sends SIGTERM to all processes or only a few of them (only the parent process?). Even if SIGTERM is sent to all processes, a child process can decide to completly ignore it, or can block in a deadlock or whatever. Giving a few seconds to the child process to wait until it ends is not easy because it's hard to choose an arbitrary timeout. If the timeout is too low, you kill the child process (SIGKILL) before it flushes files. If the timeout is too long, the parent process is blocked too long when the child process is really blocked. I suggest to keep the current behaviour by default. If you really want to give time to the child process, I suggest to add a *new* optional parameter . For example ctrlc_timeout=5.0 to send SIGTERM and then wait 5 seconds. I don't know if the parent process must always send a SIGTERM to the child process or not. One signal or two don't have the same behaviour. It's possible to explicitly send a SIGTERM to only one process using the kill command. |
I don’t know how it works on Windows, but on Unix in most cases the parent and child will share a controlling terminal. Pressing Ctrl+C in the terminal will broadcast SIGINT to all processes, parent and child. That is probably why os.system() ignores SIGINT. I doubt the usefulness of building in extra timeouts to send SIGTERM and SIGKILL. If the user really cares that much, they can probably design their own timeout mechanism. That is why I suggested above to treat the non-timeout mode differently. |
Yes, standard UNIX terminal behavior is to map Ctrl-C to a SIGINT sent to the foreground process group, so that every member of a pipeline (e.g.) or hidden helper children processes can be terminated by the interactive user and have the chance to clean up. Handling a child process behind a convenience interface, like system() or subprocess.call(), is inherently a bit tricky when things go wrong. My expectation for .call() would be that it behave something like os.system() (or the C library system() for that matter) and _not_ be interrupted by conventional signals. That the EINTR be "swallowed" and the p.wait() resumed, as _try_wait() does already. That way a user timeout= does what we want, but a Ctrl-C has familiar semantics. Yes, it will be possible for a coder to contrive to throw some other exception during the wait() ... in that case we should close the pipes but _not_ kill and reap the child. There will be zombies. Zombies are better than SIGKILLing a 3rd-party process that perhaps needs a graceful shutdown. |
When no timeout is specified, these are the options as I see them:
Here is my proposal, taking into account Victor’s desire to never leave a zombie, and Mike’s desire to let the child handle SIGINT in its own time: After the first KeyboardInterrupt or other exception, wait() a second time, and only use SIGKILL if the second wait() is interrupted. It’s a bit complicated, but I think this would solve everyone’s concerns raised so far: def call(*popenargs, timeout=None, **kwargs):
p = Popen(*popenargs, **kwargs)
try:
if timeout is None:
try:
return p.wait()
except:
p.wait() # Let the child handle SIGINT
raise
else:
return p.wait(timeout=timeout)
except:
p.kill() # Last resort to avoid leaving a zombie
p.wait()
raise |
I don’t think Victor likes #2 because of the zombie. I would be interested in #4 (one of the documented purposes of subprocess is to replace os.system), but it might need careful consideration and discussion. Ignoring signals is such a significant change I think it would have to be a new feature for 3.6+ only. Mike/Victor, what do you think of my proposal (call it #6) about waiting a second time before resorting to SIGKILL? Posting a patch which implements this. |
Re: #2, I'd rather have a zombie than a hard kill on a child whose code I perhaps don't control. Zombies are a fact of life (er, a fact of undeath?) in UNIX process management, and are the historical and IMHO expected behavior. |
#4283 adds a secondary timeout, which defaults to 1 s when there is no main timeout. But this seems complicated and arbitrary. As I understand, the main use case discussed here was waiting without a timeout for a child that exits soon after the interrupt. But are there any practical use cases or demand for:
|
I changed the issue title to "Add a new optional cleanup_timeout parameter to subprocess.call()" to make it more positive and update it to the currently proposed change ;-) |
you'll notice I added an alternate PR. I don't like the complication of adding yet another knob (cleanup_timeout) to subprocesses already giant API surface. It will rarely be used. My PR tries to take a practical approach: Just wait a little while (arbitrary value of little chosen in the code) for the child after receiving SIGINT before reraising the exception and triggering a .kill() matching existing behavior. The one controversial thing in my PR (which could be undone, it is independent of the other changes) is that I also modify the context manager __exit__ behavior to not do an infinite wait() upon KeyboardInterrupt. This means context managers of Popen _will_ complete potentially leaving a dangling process around (which our existing __del__ will pick up and put in the internal subprocess._active list). Relatively harmless, but a change none-the-less. I went that far to try and better match the Python 2.7 and 3.2 behavior: On SIGINT our process sees the KeyboardInterrupt "right away" (not quite as instantaneously here given the wait timeouts, but close enough for interactive command line tool ^C happiness). It seems like an improvement all around and is IMNSHO less complicated. |
I went with my change to give the child process a small amount of time to cleanup by default. Not perfect, but this should be more similar to the Python <=3.2 behavior. Lets see if any issues crop up during the 3.7 betas. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: