Explicit support for backgrounded subprocesses #682
Comments
See also #194, which has a Windows-specific wrinkle on this (and should perhaps be mentioned in the changelog, since it really is the same issue as this one, unless we punt on the Windows angle and just make it about that one).
I dug into the mysterious case of the missing subprocesses when you do […]. It may be some sort of bug with Python's […].

Interestingly, on macOS (Mojave/10.14) while trying to recreate the repro case, I found that even without […]. This is true even if the child is not explicitly trying to print anything (which I could imagine triggers a pipe I/O error) - though it just struck me that we might still be seeing Python trying to write a traceback and then hitting the same thing. OTOH, even trying […].

At any rate it's clear that […].
Confirmed that the trivial in-Invoke repro of […].
The terminology for this is a bit fraught. I think we really want two flavors/modes here: […]

Annoyingly though, both of those could be accurately described by the terms/kwargs that spring to mind: […]

Brainstorm: […]

Probably best to avoid explicit mention of threads as they're arguably an implementation detail and we could replace them in future with true Python 3-only async (#2020IsComing), or greenlets, or etc. Offhand, I think I like the combo of […].
More rambling: the "threads still in play" use case has subtleties around what the threads do exactly / how we implicitly mutate other arguments: […]
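A sketch of the "implicitly mutate other arguments" idea above, using purely illustrative names (not Invoke's real internals): an asynchronous run probably shouldn't mirror local stdin into a background process, so a hypothetical normalizer might default `in_stream` off unless the caller explicitly set it.

```python
def normalize_kwargs(asynchronous=False, in_stream=None, **kwargs):
    """Hypothetical example of one kwarg implying a default for another."""
    if asynchronous and in_stream is None:
        # Background runs can't sensibly share the caller's stdin,
        # so turn mirroring off unless the user opted in explicitly.
        in_stream = False
    kwargs.update(asynchronous=asynchronous, in_stream=in_stream)
    return kwargs

print(normalize_kwargs(asynchronous=True))
# → {'asynchronous': True, 'in_stream': False}
```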
Lightning bolt thought: what if […]? The downside is that it's slightly more boilerplate for the truly-just-disown-it use case; now they have to do […].

Not entirely related but inspired by the above: what if we made […]?

Also inspired: does it make sense for us to automagically plop in the […]?

Conversely: is this the right time to examine running things "directly" without a shell (i.e. […])?
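The shell-less question at the end of that comment can be illustrated with stdlib `subprocess` (a sketch, not Invoke code): exec'ing the argv directly skips shell features such as `&`, pipes, and `$VAR` expansion entirely.

```python
import subprocess

# Direct exec: the argv is passed to the program untouched, no shell parsing.
direct = subprocess.run(["echo", "$HOME"], capture_output=True, text=True)

# Via a shell: /bin/sh parses the string, so $HOME gets expanded.
via_shell = subprocess.run(
    "echo $HOME", shell=True, capture_output=True, text=True
)

print(direct.stdout)     # → '$HOME\n' (literal; no expansion happened)
print(via_shell.stdout)  # the shell expanded $HOME to an actual path
```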
Real world […]. That prompted me to get mad and poke […].
If I care to continue this (e.g. if a doc-driven fix like "if you really, really need pty=True with this, and don't want to use screen/tmux instead, use nohup" is insufficient) it may be worth digging into my code TODO around the extra pty-oriented stuff we don't do yet, such as […].
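For reference, the classic detach move in this territory is giving the child its own session so it loses the controlling terminal and can outlive the parent. A stdlib sketch (illustrative only, not what Invoke does today):

```python
import os
import subprocess

# start_new_session=True calls setsid() in the child, making it its own
# session/process-group leader with no controlling terminal.
proc = subprocess.Popen(
    ["sleep", "5"],
    start_new_session=True,
    stdin=subprocess.DEVNULL,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
pgid = os.getpgid(proc.pid)
print(pgid == proc.pid)  # → True: the child leads its own group

# Clean up the demo child.
proc.terminate()
proc.wait()
```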
…ally' Includes some comment shuffling/rewriting for clarity. Also I'm not sure what was up with the BaseException here anyway, maybe holdover from my Python 2.4-2.6 days? Moot now... Kinda re: #682
- I hate how we're half and half re: local block state vs object state, so let's lean harder into the latter as it simplifies subroutine use.
- Dry-run result was missing a bunch of potentially non-default attrs vs the real result, so let's unify that quickly for now
- Moar subroutines

Re #682
- Mentally group timer thread w/ IO thread
- Sort default config value keys
If I might throw some support around this work: we utilise background processes a fair bit, doing Kafka-in-Kubernetes work that requires starting background port forwarders and then executing commands through them.
@jgrasett I've got my internal users testing the […].
Notes to self - I see a couple possible options for the details around […]: […]
I haven't gone deep enough to see whether one of these has a serious implementation hangup requiring use of the other, but the alternate plan "feels good" in its simplicity and obviousness (and its lack of the potential for confusion with the original plan) - while I also worry that it's not hitting the convenience factor. OTOH, I can't see too many use cases that care strongly about doing stuff with an AsyncResult, as most cases I can imagine are either: […]
So having the "convenience" of a one-shot object probably doesn't matter a ton and it comes more down to what's easier to write. Related thought: do we want these objects to act like contextmanagers for easier cleanup?
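The contextmanager idea could look roughly like this. All names here are hypothetical stand-ins for illustration, not Invoke's actual API: exiting the block joins automatically, so callers can't forget to reap the background run.

```python
class Promise:
    """Hypothetical sketch: a joinable handle that reaps itself on exit."""

    def __init__(self, runner):
        self.runner = runner
        self.result = None

    def join(self):
        # Block until the wrapped runner finishes; stash its result.
        self.result = self.runner.wait()
        return self.result

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.join()  # always reap, even if the block raised


class FakeRunner:
    """Stand-in for a real runner, purely for demonstration."""

    def wait(self):
        return "exited 0"


with Promise(FakeRunner()) as p:
    pass  # caller does other work while the "subprocess" runs

print(p.result)  # → 'exited 0'
```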
… filled in yet

The `disown` internal use case has been tested; reported to work fine \o/
More bits as I dig in: […]
I'm super dumb, there's no real point for an additional thread here; all we need is the Promise itself plus the related Runner refactoring. Those allow the user to regain control (the IO worker threads keep running in the background) and defer the "[…]".

The main thrust of what users want to be running "asynchronously" here is already in the background in either case: the subprocess and the threads shoveling data in/out of its pipes.

As I wrote this I also found that this "almost" just wants Promise to fold back into Runner (all it does so far is call methods on the wrapped Runner), though I think for clarity/responsibility purposes (not to mention future potential changes) it makes more sense to keep the separate class.
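The structure described above can be sketched with stdlib primitives (illustrative, not Invoke's actual Runner internals): the subprocess and its IO-shoveling thread are already running concurrently, so handing the caller a joinable handle is all the "async" that's needed; no extra coordinating thread required.

```python
import subprocess
import threading

captured = []

# The subprocess is already "in the background" the moment it's spawned.
proc = subprocess.Popen(["echo", "hello"], stdout=subprocess.PIPE, text=True)

def shovel():
    # IO worker: moves data out of the child's pipe concurrently.
    for line in proc.stdout:
        captured.append(line.rstrip())

t = threading.Thread(target=shovel)
t.start()

# ... the caller regains control here and can do other work ...

proc.wait()  # "join the promise": wait for the process,
t.join()     # then for its IO thread, when the caller is ready
print(captured)  # → ['hello']
```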
In the weeds at this point, trying to clean up any low hanging TODOs. Right now it's "can we easily let folks interrupt/terminate an async Promise?". Seems like it should be easy - just have them call (directly or via a wrapper) […]. However in practice (as in, using […]) […].

The code called in both cases should be the same (the literal only difference is whether the new […]). Been a while since I wrote/read that stuff so need to check, but if I'm right, it means that I should be able to reproduce it in the synchronous case by saying […].
That itself isn't it, in either case - cannot repro w/ async=False, in_stream=False; and temporarily making it so async=True does not imply in_stream=False also does not make a difference.

Even more interesting, I realized that Ctrl-C after calling […]. What's the difference? Wondering if this boils down simply to the lack of a real Ctrl-C; is this just a process group thing? Ctrl-C in the controlling terminal is signaling both Python (KeyboardInterrupt) and the subprocess (SIGINT, iirc), and our writing of the interrupt sequence to proc stdin is, at least for my test case ([…]), […].

This seems to be correct - in the regular Python REPL (which does not mask Ctrl-C like ipython does - glad I noticed that early), Ctrl-C'ing after obtaining the Promise and before calling Promise.join() causes the subsequent join() to immediately return and see exitcode 2, as expected (I never wrote this down above, but in the bad case, the exitcode is always 0, as if the subproc never got interrupted).

Arguably this means […]. If I can't get that to work nicely soon I'll just punt on it, as it's non-critical.
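The process-group theory is easy to check with the stdlib (a sketch, not Invoke code): a real Ctrl-C is the terminal delivering SIGINT to the foreground process group, so signaling the child's group directly behaves like the genuine article, and the exit status shows the SIGINT actually landed (unlike the always-0 exitcode in the bad case above).

```python
import os
import signal
import subprocess
import time

# Child in its own group, loosely analogous to a pty-spawned subprocess.
proc = subprocess.Popen(["sleep", "30"], start_new_session=True)
time.sleep(0.2)  # give it a moment to start

# Deliver the "real" Ctrl-C: SIGINT to the whole process group.
os.killpg(os.getpgid(proc.pid), signal.SIGINT)
proc.wait()

print(proc.returncode)  # → -2 (terminated by SIGINT), not 0
```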
I might punt - I can see how to make the interrupt work for Local, but it's thornier for Fabric's Remote runner. There's no clean way to send signals to a remote subprocess w/o additional out-of-band subprocesses, and on top of that, while all we "really" need here is […].

Probably gonna push a branch/issue with the partial work on this, and pick it up later if enough folks clamor for it.
OK, this is done enough besides making sure that, as-is (without the explicit interrupt stuff above), Fabric 2 can work well enough with it. Once that's squared away (hopefully in a way where Fabric >=2,<=2.5 can still use Invoke >=1.4 w/o exploding) it should be time to release.
After spending entirely too much time getting fabric's test suite un-slow again (for 100% unrelated reasons. Entropy: a thing) it looks like this all Just Works over there with no additions required and no major issues - both keywords tested.
1.4.0 going up momentarily!
Overview
Kind of an odd duck, but this came out of an internal need at my dayjob and I think it's worth examining for public use.
The tl;dr use case is to have a (long running in their case, but that's actually orthogonal) Python process using Fabric 1 or Invoke to kick off subprocesses that then live outside the control of the main program. Roughly analogous to sourcing a shell script that uses `&` a bunch.

An obvious counter-argument to this case is "use real process supervision" -- but I've determined that my local users have a good-enough need for true "orphaned children" subprocesses that this falls under a legit, if corner, use case for a toolset like Invoke. Plus the normal adage that if one set of users is doing an unusual thing, there will be others, and I'd rather the software cope gracefully with this instead of surprising users.
Investigation
- With `local(..., capture=False)` (the default behavior), it's possible to do `local("shell command &")` to background a shell command in a way that's completely disassociated from the parent Python process.
  - This amounts to `/bin/sh -c "shell command &"`, which (for reasons I don't fully understand yet) results in the subprocess being a child of PID 1/init, instead of the Python process as normal.
- With `pty=False` (the default), the ampersand appears to be ignored; the `run` call hangs out until the subprocess completes.
- With `pty=True`, we get even stranger behavior where the subprocess appears to either not run or exit instantly (pending investigation - this is less critical for now)
- If the IO threads are skipped, control returns to the `run` caller immediately, and the Python process can even exit without affecting the now orphaned child.
  - Because we hand `None` to the child process' pipes, and the child process is not internally redirecting output, its out/err still goes to our controlling terminal; this doesn't seem super useful though.

Upshot
A base case here could be to add a kwarg allowing the caller to say "hey, I don't care about this program's output - don't use any IO threads"; in tandem with a trailing ampersand (or e.g. `nohup`) this enables the use case under question.

What this leads to is then the question of what should happen to the `Result` from the `run` call, as it will generally be full of lies.

Furthermore, this looks very much like an actual "async `run()`" where a user may want to continue interacting with the subprocess - the only real difference is that I'd expect a naive async setup to "clean up" subprocesses before Python exits, and we don't truly want that here. But that feels orthogonal enough that it can wait for an iterative improvement.

TODO
- Refactor `Runner._run_body` some more - I've wanted to do this for some time anyway, it's the longest single function in probably the whole codebase
- Figure out how `Result` behaves in this scenario - guessing we'd yield a different subclass?
- On the Fabric side, `asynchronous` should "just work", but `disown` is possibly a no-op (then again, openssh sshd does very roughly the same crap we do locally re: fork+execv, so it's plausible that if we "behave weirdly" and just cut off all our pipes, it will persist remotely?)
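The Investigation's working case can be demonstrated with stdlib `subprocess` (a sketch, not Fabric/Invoke code): waiting on `/bin/sh -c "cmd &"` returns immediately because the shell exits as soon as it has forked the backgrounded child, which then gets reparented away from our process.

```python
import subprocess
import time

start = time.time()

# We spawn a shell that backgrounds a long-running command and then exits.
proc = subprocess.Popen(["/bin/sh", "-c", "sleep 5 &"])
proc.wait()  # waits on the *shell*, not on the backgrounded sleep

elapsed = time.time() - start
print(proc.returncode, elapsed < 1)  # → 0 True: the shell exited instantly
```

The `sleep 5` keeps running afterwards with no living parent in our process tree, which is exactly the disassociation the issue is after.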