Allow subprocesses to continue on after KeyboardInterrupt #406

Closed
bitprophet opened this Issue Nov 8, 2016 · 14 comments

Projects

None yet

1 participant

@bitprophet
Member

Semi-spinoff of #58 - some interactive programs want to handle Ctrl-C themselves, but right now, we store KeyboardInterrupt while sending the interrupt downstream, and then raise it like any other exception (as well as re-catching it and exiting 130 at the very end. Life's complicated.)

This means that mashing Ctrl-C in a REPL (even one that is itself doing except KeyboardInterrupt, if it's a Python repl!) or an editor or etc, will immediately exit, even if the subprocess itself would not have.

Ideally, fixing this will still allow us to exit with 130 if the subprocess itself does so - a quick attempt at a fix seems to not cause this (e.g. sleep 60 + Ctrl-C normally exits 130, but within run(), exits 0.)

@bitprophet
Member

Rather, there's no exit code (Runner.process.returncode is None), which I think makes sense given Ctrl-C is ending up as a signal to the subprocess - it's one's shell that is putting out the exit code 130, not the program itself (IIRC).

So we probably do need to preserve the fact that a KeyboardInterrupt occurred (as a nearby TODO mentioned, actually) and put in code 130 if the subproc has none of its own.

@bitprophet
Member

Ah, another (or the real) issue is that our current logic is structured as:

try:
    self.wait()
except BaseException as e:
    # handle exceptions here, including KeyboardInterrupt
# continue to shutdown step

So even if we "neuter" the KeyboardInterrupt, it doesn't matter, we stop waiting and instead act as if the subprocess has closed - which is at least one reason why we end up with a None exit status. (Practically, this also means that even in a real vim session, it will exit out prematurely because our controlling process & its IO threads & their pipes, all get closed down.)

Need sound logic that can handle arbitrary numbers of KeyboardInterrupts and return to waiting, but which will break out otherwise.

@bitprophet
Member
bitprophet commented Nov 8, 2016 edited

Got that sorted, but amusingly, vim doesn't actually register the Ctrl-C keystroke unless I preface it with Ctrl-V. Which makes sense, because, we're never actually submitting anything on stdin (unless we use Ctrl-V to prevent the actual interrupt)! (And it is, presumably, self-resistant to SIGINT. I think.)

Fabric (and 2) do send \x03 down the pipe, partly because there is no good OOB process control when connected to a remote shell. Wondering if Local.send_interrupt has been wrong all along and we should simply be sending the escape as bytes & relying solely on that to trigger remote shutdown. In other words, this is yet more of the confusing conflation of foreground Ctrl-C with external SIGINT.

Will experiment.

@bitprophet
Member
bitprophet commented Nov 8, 2016 edited

Yup, that makes vim work fine. Also appears to work for sleep - it exits right away - but the exit code is still funky (it's still 0, not 130 or None).

Plus, when pty=False, sleep exits with -2 (which then becomes 254 because reasons, presumably unsigned integer ones).

Other things like tail -f have identical behavior, exit 0 when pty=True, exit -2 when pty=False.

Fabric v2's got some preemptory comments in its send_interrupt noting how (presumably for remote only) the escape sequence is ignored when no PTY is allocated, thus the need to raise the interrupt exception so we shut down. Guessing that's remote-only since, as above, it's still doing something locally, even if it's resulting in a somewhat odd exit code. (May be due to Ctrl-C's interactions with a controlling PTY.)


This is somewhat problematic because it means it's harder for us to reliably act like "regular" apps / straight up execs of subprocesses - even if we track that a KeyboardInterrupt happened during execution, we can only honestly turn that into an exit of 130 if the subprocess doesn't have a "real" return code of its own.

But here, in either case, it appears to have a real return code - how can we tell a WEXITSTATUS->0 from something like a Ctrl-C'd tail (which "should" become an exit of 130), apart from a happy Vim exit from a session that used Ctrl-C (which should exit 0)? How can we tell these -2s apart from a subprocess "really" exiting -2? Etc.

@bitprophet
Member

Seems like it's a tradeoff between "be as-correct-as-possible re: Ctrl-C submission to subprocess" and "have a super correct exit code". Personally, I'd much rather have apps that care hard about Ctrl-C get as-useful-as-possible behavior, than have a super-duper-correct exit code in situations where it's kinda screwy.

In other words, this feels like another case of interactive, side-effect-oriented run() calls vs noninteractive, results-oriented run() calls. The former almost never care about exit codes, and will care more about how Ctrl-C is handled; the latter may or may not care about exit codes, but are unlikely to care about Ctrl-C as long as it allows the script's executor to halt execution.

Main downside I can see is pty=True causing Ctrl-C to result in exits of 0, which will not trigger a halt of the Invoke process. Wonder if this is some sort of edge case re: how os.waitpid and/or its status return value (and/or os.WEXITSTATUS) behave.

@bitprophet
Member

Behavior seems same on Linux so far, FTR.

@bitprophet
Member

Cool, glad I looked. WEXITSTATUS' result is meaningless if WIFEXITED is False, and sure enough, it is False in this situation. Means the program did not use the exit(2) syscall to exit.

Of course, then I have to wonder how one does get the "right" exit code. Offhand guess is that it's up to us to determine, since we're basically now the "shell" running it, so we could use this as the guide re: whether to continue exiting 130. Question would then be, what other situations besides Ctrl-C could cause WIFEXITED==False? Presumably any other sort of signal-related termination like SIGKILL...

@bitprophet
Member
bitprophet commented Nov 9, 2016 edited

Right - feels like one of the other siblings of WIFEXITED would be true in every situation: core dump, stopped/continued via job control, signaled (this is probably what's true for Ctrl-C and I'd assume also SIGKILL etc), and there's also os.W(STOP|TERM)SIG which let you tell which signal caused a stoppage.

EDIT: ok, yea, man wait lays all of this out clearly. All exited procs should be True for one of IFEXITED, IFSIGNALED or IFSTOPPED.

On the non-pty (subprocess) front, the -2 exit I was seeing is in line with the docs, -2 means signal 2, which is INT, which is appropriate. Presumably we can get the same info in the pty use case via WIFSIGNALED + WTERMSIG and unify things that way. (Would be nice if there was generic code we could use for this, but I haven't seen any in stdlib so far.)

Still leaves the question of whether to try and act shell-like and transmute a propogated Ctrl-C + an exit of -2 into us exiting 130. Leaning towards 'yes' for now.

@bitprophet
Member
bitprophet commented Nov 9, 2016 edited

Found some nice details on the shell behavior in this SO answer; a tl;dr could be "the 130 you get in bash/zsh is 128 + abs(exit)". But it's very shell specific and varies between shell families.

Since Invoke is not itself a shell and doesn't want to be terrifically shelly (outside of how we may still default to bash for certain situations requiring a shell wrapper...but that is or will be configurable) it makes me think it's probably best to just pass on the subprocess' exit directly, and our earlier "exit 130 on Ctrl-C" was a bit naive.

Reckon if users complain enough (rather, can make a compelling argument that they absolutely need this "signal + 128" behavior because just the signal value is insufficient...) we could entertain the idea of being 'shell-like' in some situations.

Though it'd need to be a good argument. As noted above, my chief concern right now is that we're not exiting 0 incorrectly, and as long as that is the case, arguing over the exit status of a 'real' Ctrl-C'd human-driven Invoke process is secondary.

@bitprophet
Member
bitprophet commented Nov 14, 2016 edited

Remaining TODO:

  • Exit 1 on regular KeyboardInterrupt - impl + test
  • WIFEXITED/WIFSIGNALED exit status stuff - impl + Test
  • Integration tests need an update re: all of this; unclear offhand if the complex and annoying bullshit around the sub-sub-subprocess' detection/expectation of SIGINT is still necessary (i.e. do we keep all that crap but have it assert the stdin bytes instead? or can it be simplified?)
@bitprophet
Member

Discrepancy alert, one I should have noticed above: WTERMSIG will return 2 on interrupt, but subprocess.returncode says -2 in the same situation, as per subprocess docs. We need to normalize these somehow; right now if we just pass either value straight to Result / UnexpectedExit / sys.exit we get different behavior depending on pty-ness.

Subprocess' docs do not say why it uses negative numbers here (and not e.g. a tuple); I'm guessing it's just a way of packing signal-or-regular-exit into a single integer "field"?

Either way, this is a bit of an impedance mismatch (or whatever) - there is no "real" exit status code, and we aren't technically a shell, so I think the least-crappy normalization is to actually follow subprocess and turn signal-derived numbers negative.

@bitprophet
Member
bitprophet commented Nov 14, 2016 edited

OR I could be smarter and only 'normalize' at exit time, leaving the two statuses separate inside Result. Otherwise advanced users would have no clean way to tell (besides manually looking at negativity, which, ugh).

EDIT: Double-or: that's actually harder than it looks as it requires us to rip apart more of the return code handling functionality. Already too deep in this rabbit hole so fuckit. Negative numbers it is.

@bitprophet
Member
bitprophet commented Nov 14, 2016 edited

Don't see a real point to the integration test part of this; it slays me to simply delete all the sweat & tears that went into it during #152 , but all an integration test would do now is prove what I can easily test at the unit level: that the \x03 I send to stdin arrives on the subprocess' stdin.

Much simpler functionality == much simpler tests. Which is nice. Just wish I'd done this before wasting all that time earlier. Programming :(

@bitprophet bitprophet added a commit that referenced this issue Nov 14, 2016
@bitprophet bitprophet Changelog re #406 65cbd93
@bitprophet bitprophet added a commit that referenced this issue Nov 14, 2016
@bitprophet bitprophet Bulk of #406 implemented & tested.
Still need to deal with the exit code shenanigans and integration tests
b235bd4
@bitprophet
Member

Tests all passing locally and on Travis - we're done! Unless I run into brokenness again with my real world use cases. Sigh.

@bitprophet bitprophet closed this Nov 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment