Explicit support for backgrounded subprocesses #682

Closed · 3 of 7 tasks
bitprophet opened this issue Dec 6, 2019 · 18 comments

bitprophet commented Dec 6, 2019

Overview

Kind of an odd duck, but this came out of an internal need at my dayjob and I think it's worth examining for public use.

The tl;dr use case is to have a (long running in their case, but that's actually orthogonal) Python process using Fabric 1 or Invoke to kick off subprocesses that then live outside the control of the main program. Roughly analogous to sourcing a shell script that uses & a bunch.

An obvious counter-argument to this case is "use real process supervision" -- but I've determined that my local users have a good-enough need for truly orphaned child subprocesses that this falls under a legitimate, if corner-case, use for a toolset like Invoke. Plus the usual adage applies: if one set of users is doing an unusual thing, there will be others, and I'd rather the software cope gracefully with this instead of surprising users.

Investigation

  • Under Fabric 1's local(..., capture=False) (the default behavior), it's possible to do local("shell command &") to background a shell command in a way that's completely disassociated from the parent Python process.
    • at least under Linux, this is equivalent to /bin/sh -c "shell command &", which (for reasons I don't fully understand yet) results in the subprocess being a child of PID 1/init, instead of the Python process as normal (see the sketch after this list).
  • Under Invoke, this doesn't currently work because we keep much tighter track of the subprocess, including the reader threads hooked up to its pipes, which don't shut down until forced to by the subprocess closing those pipes.
    • If one says pty=False (the default), the ampersand appears to be ignored; the run call hangs out until the subprocess completes.
    • If one says pty=True, we get even stranger behavior where the subprocess appears to either not run or exit instantly (pending investigation - this is less critical for now)
  • I've determined that the child process does actually get assigned as a child of init even in this scenario, but Invoke still "blocks" to completion because of those reader threads.
  • If we temporarily neuter the reader threads, we get the same behavior as under Fabric 1 - control returns to the run caller immediately, and the Python process can even exit without affecting the now orphaned child.
    • If we pass None to the child process' pipes, and the child process is not internally redirecting output, its out/err still goes to our controlling terminal; this doesn't seem super useful though.
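
Not Invoke code, but a minimal, Linux-only sketch of the reparenting described in the first bullet (command and paths here are purely illustrative):

```python
import os
import subprocess
import time

# /bin/sh backgrounds "sleep 60", prints its PID, then exits immediately; the
# orphaned sleep gets reparented (to PID 1/init, or the nearest subreaper).
shell = subprocess.run(
    ["/bin/sh", "-c", "sleep 60 & echo $!"], capture_output=True, text=True
)
child_pid = int(shell.stdout.strip())

time.sleep(0.1)  # give the shell a moment to finish exiting
with open("/proc/%d/status" % child_pid) as fh:  # Linux-only check
    ppid = next(line.split()[1] for line in fh if line.startswith("PPid:"))
print("sleep %d is now a child of PID %s, not of us (%d)" % (child_pid, ppid, os.getpid()))
```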

Upshot

A base case here could be to add a kwarg allowing the caller to say "hey, I don't care about this program's output - don't use any IO threads"; in tandem with a trailing ampersand (or e.g. nohup) this enables the use case in question.
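
For illustration, the rough shape that base case ends up taking (kwarg name per later discussion in this thread; treat this as a sketch, not a spec):

```python
from invoke import run

# No IO threads are attached, and the trailing & lets the intermediate shell
# return immediately, so the child survives even after this process exits.
run("some-long-running-daemon --flags &", disown=True)
# Control comes back here right away.
```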

What this leads to is then the question of what should happen to the Result from the run call, as it will generally be full of lies.

Furthermore, this looks very much like an actual "async run()" where a user may want to continue interacting with the subprocess - the only real difference is that I'd expect a naive async setup to "clean up" subprocesses before Python exits, and we don't truly want that here. But that feels orthogonal enough that it can wait for an iterative improvement.

TODO

  • Cut up Runner._run_body some more - I've wanted to do this for some time anyway, it's the longest single function in probably the whole codebase
  • Add an option controlling whether the IO threads are used or not
  • Confirm whether that alone suffices for the real world internal use case
  • Update how Result behaves in this scenario - guessing we'd yield a different subclass?
  • Consider extending that to continue interacting with the subproc in some fashion - perhaps the threads actually always run (allowing access to stdout/err) but we simply control whether they get join()'d before returning control.
  • Examine the ramifications for interpreter shutdown, both re: IO threads if they do run, and re: an option for reaping or ignoring child processes
  • Figure out how Fabric 2 cares about this; I think asynchronous should "just work", but disown is possibly a no-op (then again, openssh sshd does very roughly the same crap we do locally re: fork+execv, so it's plausible that if we "behave weirdly" and just cut off all our pipes, it will persist remotely?)

bitprophet commented Dec 10, 2019

See also #194 which has a Windows specific wrinkle on this (and should perhaps be mentioned in changelog, since it really is the same issue as this one, unless we punt on the Windows angle and just make it about that one).

bitprophet added this to the Priority milestone Dec 10, 2019

bitprophet commented Dec 10, 2019

I dug into the mysterious case of the missing subprocesses when you do run("command &", pty=True) out of curiosity, and found no clear reason why the subprocess never runs (and it definitely does not run; even injecting execv tracing under Linux shows that the execve never actually completes enough to be visible).

It may be some sort of bug with Python's pty.fork, since a stripped-down reproduction (with no Invoke code in play) has the same issue, but the same case with pty.fork swapped out for os.fork seems to behave normally.
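
The stripped-down repro was roughly of this shape (my reconstruction, not the exact script): fork via pty, exec in the child, and have the parent go away without ever servicing the child's fd.

```python
import os
import pty

pid, fd = pty.fork()
if pid == 0:
    # Child: exec something long-running onto the fresh pty.
    os.execv("/bin/sleep", ["/bin/sleep", "60"])
else:
    # Parent: bail out immediately without ever reading from fd.
    os._exit(0)
```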


Interestingly, on macOS (Mojave/10.14) while trying to recreate the repro case, I found that even without execv in play at all, pty.fork can make the child/forked process crash or die early if the parent process isn't always hanging around with a read on the child's communication file descriptor. It strikes me that this might be the cause of the issue on Linux as well (the repro case there was not bothering to do any FD reads, iirc).

This is true even if the child is not explicitly trying to print anything (which I could imagine triggers a pipe I/O error) - though it just struck me that we might still be seeing Python trying to write a traceback and then hitting the same thing. OTOH, even trying try/except and atexit is insufficient to capture what exactly is happening here.

At any rate it's clear that pty.fork has some unintuitive behaviors between platforms, so for the time being I'm going to focus on trying to address the high level need here, and for pty=False first; we can "grow" the fixes to cover other cases as we go / later on.

bitprophet commented:

Confirmed that the trivial in-Invoke repro of run("sleep 60 &", pty=False) behaves the same on macOS as on Linux, in both situations - without & the execve'd forked process is a child of the Python process, but once you put & on, it becomes a child of init instead (and as identified on Linux, the Python process blocks until end-of-file on the reader threads). So I can continue with basic dev on this under macOS and it'll probably apply to both.

bitprophet commented Dec 10, 2019

The terminology for this is a bit fraught. I think we really want two flavors/modes here:

  • One that's probably more commonly useful and what people think of by "async" which returns a fast Result but keeps the I/O threads running, with the expectation that users will eventually choose to join/await the Result object and take out the final stdout/err strings, or whatever.
  • The other implies the first as well (though it is arguably still an either-or), but goes further and doesn't even spin any of those threads up - "fire and forget" - allowing the "my command here &" trick to persist the subprocess beyond Invoke's own lifetime.

Annoyingly though, both of those could be accurately described by the terms/kwargs that spring to mind: background, detach(ed), etc. And I can't use async for the first because it's a keyword under later Python 3.x versions. (Guess I could just use asynchronous in full...)

Brainstorm:

  • Use background or detach for the 2nd use case, and something else for the 1st use case, such as:
    • short_circuit
    • return_fast
    • asynchronous (as above)
    • other async-like terms implying the format of the result, such as promise
    • ...
  • Use background or detach for the 1st use case, and something else for the 2nd, such as:
    • disown
    • disable_io
    • disable_thread(s|ing)
    • disable_io_threads
    • no_io (ew)
    • no_thread(s|ing)
    • ...

Probably best to avoid explicit mention of threads as they're arguably an implementation detail and we could replace them in future with true Python 3-only async (#2020IsComing), or greenlets, or etc.

Offhand, I think the combo of asynchronous and disown is the most pleasing to the eye and relatively intuitive (usage sketched just below):

  • asynchronous: run will return right away and you must interact with the result to manage its lifecycle / returned data
  • disown: run returns right away and you don't care about the result (which is good because the result value will be largely useless).
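
To make the intent concrete, a usage sketch of the two flavors with those leading-candidate names (nothing here is committed API at this point in the thread):

```python
from invoke import run

# "asynchronous": return immediately, IO threads keep running; interact with
# the returned object later (here assumed to grow a join()) to get the real
# stdout/err/exit data.
promise = run("make build", asynchronous=True)
# ...do other things...
result = promise.join()

# "disown": return immediately, no IO threads at all; the child is left to
# outlive us entirely, and the return value is largely useless.
run("my-daemon &", disown=True)
```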

bitprophet commented Dec 11, 2019

More rambling: the "threads still in play" use case has subtleties around what the threads do exactly / how we implicitly mutate other arguments:

  • It's tempting to say asynchronous=True implies hide=True, since in the common case that's what we want - if you're being async you almost surely don't want the subprocess stdout/err in your terminal as your Invoke code moves on to doing other things (including, perhaps, other subprocesses).
  • However that would kill the ability to manually redirect out/err "inline" by overriding eg out_stream, so that's one problem that will harm advanced users.
  • Further, I could see some folks wanting to use this to effect parallelism (especially in lieu of Parallel execution #63 and friends) by spinning up a bunch of still-attached-to-foreground-terminal subprocesses.
    • However I suspect that out of the box this runs into the problems Fabric 1 did, where you need to coerce each subprocess into line-buffered output, or stuff looks REAL messed up. So this may be n/a.
    • I'm also not sure offhand how our stdin workers would behave here, they'd all be fighting over the same stdin and we'd probably have bizarro behavior where bytes from that pipe end up randomly scattered around the running subprocesses.
      • This use case - broadcasting stdin to multiple subprocs - is always very strange and weird because you can never expect all subprocs to be in exactly the same place. The "real" solution is to use Watchers to script that interaction, typically. Certainly it's not worth upending the model further for, at this time.
    • So for now I think we just make it disassociate and should enough users come in complaining, we add Yet Another Kwarg like Fab's old linewise=True that undoes this disassociation and tweaks thread behavior to internally line-buffer. Or something. (Plus they can always work around it and get bytewise output back by explicitly assigning out_stream=sys.stdout...)
  • How can we make the default act like hide=True but not force it? Maybe examine the stream args: for any that are None, we ensure they are hidden automatically (see the sketch after this list). This looks like it should be possible.
  • Torn on whether the same applies to in_stream, we could make it act like in_stream=False and totally disable the stdin worker, but again - if a user actually wants their own FLO standing in for stdin (heh), there's no obvious reason why that shouldn't function correctly.
    • So I think it gets similar-but-distinct logic - if it's None, it acts like it was False. That's basically "hiding" for stdin.
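
A rough sketch of that "only auto-hide what the user left at its default" logic from the list above (names and structure are illustrative, not Invoke's actual internals):

```python
import sys

def massage_async_io(out_stream, err_stream, in_stream):
    # Streams left at their default (None) get hidden so an asynchronous run
    # doesn't fight over the caller's terminal; anything passed explicitly is
    # respected as-is.
    hide_out = out_stream is None
    hide_err = err_stream is None
    # Similar-but-distinct logic for stdin: None acts as if it were False,
    # i.e. don't even start a stdin worker.
    if in_stream is None:
        in_stream = False
    return hide_out, hide_err, in_stream

# User gave an explicit err_stream, so only stdout gets auto-hidden:
print(massage_async_io(None, sys.stderr, None))  # (True, False, False)
```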

bitprophet commented Dec 11, 2019

Lightning bolt thought: what if disown was not a kwarg on run(), but was simply a method on the AsyncResult (or w/e) returned from asynchronous=True invocations? It helps with the weird disconnect between the two kwargs (where one is "a lot like" the other but not quite) and still enables the intended use case, while allowing greater flexibility (what if, for w/e reason, you wanted to interact with the subproc for a while and then eventually tie it off and let it go free?).

The downside is that it's slightly more boilerplate for the truly-just-disown-it use case, now they have to do c.run("foo &", asynchronous=True).disown(). And I'm honestly not sure if we can cleanly shut down threads that way (esp given the out/err ones currently rely primarily on the other end closing to terminate them) - it's a lot nicer to simply never start them up? I'll have to experiment.


Not entirely related but inspired by the above: what if we made asynchronous=True its own method instead: Context.async_run or something? This I'm less sold on (plus you'd then have to make async_sudo, uggggh) but putting it down as it occurs to me. It'd be slightly nicer as just Context.async("blah") but again, keyword on Python 3.


Also inspired: does it make sense for us to automagically plop in the & when disown=True? It's been bothering me that users have to remember to do it, since otherwise the intermediate shell will itself block and we're back to square one from a disowning perspective. That's always a nasty API smell.

Conversely: is this the right time to examine running things "directly" without a shell (i.e. os.execv*("/program", ["/program", "arg", "arg2"]))? It would also solve that problem. But I'd rather make that a later iteration, probably.
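
For contrast with the shell-wrapped default, running something "directly" looks like this at the plain-subprocess level (no Invoke API implied); note the & trick is unavailable since no shell parses the string:

```python
import subprocess

# argv-list form: no /bin/sh in between, so no shell job control; any
# backgrounding/detaching would have to happen on the Python side instead.
subprocess.Popen(
    ["/bin/sleep", "5"],
    stdin=subprocess.DEVNULL,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
```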

bitprophet commented Dec 13, 2019

Real world disown=True testing: simply short-circuiting within _run_body (before any thread work, but after self.start()) seems to work relatively well so far, and no ampersands are needed either (suspect that may have been part of all the confusion around the pty use case or something).


That prompted me to get mad and poke pty=True in tandem with this update:

  • My base case w/ pty=False and disown=True was run("python -c 'print(\"hello\"); import time; time.sleep(60)'") - works great, text appears in file if redirection is added to command string, sleeping observed to work correctly
  • The same with pty=True sees the process disappear entirely, as found earlier
  • However, rubbing nohup on the front of the command (nohup python ...) seems to work just fine!
  • I probably didn't discover that earlier because...when you add & at the end, things go back to not working. Weird.
  • Ditto trying to replicate nohup with ye olde 2>&1 >/path/to/log </dev/null - with or without an ampersand, this just always dies.

If I care to continue this (eg if a doc-driven fix like "if you really, really need pty=True with this, and don't want to use screen/tmux instead, use nohup" is insufficient) it may be worth digging into my code TODO around the extra pty-oriented stuff we don't do yet, such as tty.setraw, getrlimit, etc - though I suspect most of that doesn't matter.
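
In other words, the doc-driven workaround for the pty case would look something like this (kwarg names as they later shipped; nohup's output lands in ./nohup.out by default):

```python
from invoke import run

# Per the findings above: prefix with nohup and leave off the trailing &,
# which made things worse in testing.
run(
    "nohup python -c 'print(\"hello\"); import time; time.sleep(60)'",
    pty=True,
    disown=True,
)
```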

bitprophet added a commit that referenced this issue Dec 16, 2019
bitprophet added a commit that referenced this issue Dec 16, 2019
…ally'

Includes some comment shuffling/rewriting for clarity.

Also I'm not sure what was up with the BaseException here anyway,
maybe holdover from my Python 2.4-2.6 days? Moot now...

Kinda re: #682
bitprophet added a commit that referenced this issue Dec 16, 2019
- I hate how we're half and half re: local block state vs object state,
  so let's lean harder into the latter as it simplifies subroutine use.
- Dry-run result was missing a bunch of potentially non-default attrs
  vs the real result, so let's unify that quickly for now
- Moar subroutines

Re #682
bitprophet added a commit that referenced this issue Dec 16, 2019
bitprophet added a commit that referenced this issue Dec 16, 2019
- Mentally group timer thread w/ IO thread
- Sort default config value keys

jgrasett commented:

If I might throw some support behind this work: we utilise background processes a fair bit, doing Kafka-in-Kubernetes work that requires starting background port forwarders and then executing commands through them.

bitprophet commented:

@jgrasett I've got my internal users testing the disown=True part of this and it works for them. I'm guessing your use case is more the async=True side, which I'm hoping to wrap up soon. It's also the trickier part :D but I'm hoping to just rub some more threading on it. Everybody loves threads, right?

bitprophet commented Dec 20, 2019

Notes to self - I see a couple possible options for the details around asynchronous=True:

  • Original plan:
    • Fast-return an AsyncResult which "is" a Result but whose attributes are "lazy" and may be lies, partially filled, or None until the subprocess completes. (This could include a new, not present on regular Results, stdin attribute that can be written to 🤔 )
    • Torn on whether that "partially filled" should be real - it sounds cute, to eg have async_result.stdout be whatever stdout has already been observed, but I could see it being confusing or just not truly useful - you'd need to pair consuming that data with checking of whether the process has actually exited. (Similar to manual subprocess.Popen() usage, basically.)
      • In fact, this gets us into "ok so why WOULD you use invoke.run(async=True) instead of manually using Popen?" territory...but let's assume there's enough reasons for this already (offhand: you still get all the rest of Invoke re: configuration, watchers, etc; you may be wanting to mix sync and async behavior; surely more).
      • Additionally, if your main async reason is to be reading incomplete stdout/err, you should probably be throwing custom FLOs into out_stream et al - that would work well enough with any flavor of async'ness, and does not require the result object to behave this way.
    • AsyncResult has a join which stands in for the current blocking/non-async behavior - wait til subprocess completes, shut down all attendant threads, and you're done / can reference the Result's attributes 100% safely.
  • Alternate plan:
    • Fast-return a Promise or Future styled object, which still has a join or whatnot, but is not itself Result-like
    • Instead, the join() method is what returns a (completely normal) Result.

I haven't gone deep enough to see whether one of these has a serious implementation hangup requiring use of the other, but the alternate plan "feels good" in its simplicity and obviousness (and its lack of the potential for confusion with the original plan) - while I also worry that it's not hitting the convenience factor.

OTOH, I can't see too many use cases that care strongly about doing stuff with an AsyncResult, as most cases I can imagine are either:

  • Doing something periodically with the output streams - where again, you should potentially be using a custom FLO anyways, and then either NOT calling asynchronously, or calling asynchronously and only caring to join later on when some other work is done.
  • Running the background subprocess purely for its side effect (forwarding a port, running a daemon, etc) and here, again, you're not likely to be wanting anything but join()

So having the "convenience" of a one-shot object probably doesn't matter a ton and it comes more down to what's easier to write.


Related thought: do we want these objects to act like contextmanagers for easier cleanup?

  • It makes some sense - e.g. it looks a lot like Fabric 2's Connection.forward_local/remote methods (in fact that is exactly @jgrasett's use case, perhaps? or a more narrow interpretation of it, if their forwarding isn't SSH-borne)
  • Still seems to fit the AsyncResult pattern most closely, in that you're incentivized to yield an object that's useful within and/or after the block (basically, it just means we add __enter__ and __exit__ to AsyncResult, where __enter__ returns self, iirc, and now it can be used regularly - toy sketch after this list)
  • Conversely, there isn't actually much "cleanup" for this problem domain - all run() really does is join worker threads and raise any exceptions from within them, then return the result.
  • Largely I'm just hampered here by not actually having this use case myself, my internal use case at work is the not-entirely-related disown part, and the rest of the time I tend to be doing very straightforward blocking things (the "Invoke as Pythonic shell script" case).
  • Suspect we could make this a future iteration if I don't feel like going the extra mile right now - whatever we return just needs to add those enter/exit methods.
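
A toy sketch of the shape being weighed here (names and internals are illustrative, not Invoke's real code): the fast-returned object is not itself a Result, join() produces one, and contextmanager support is just __enter__ returning self plus __exit__ calling join().

```python
class Promise:
    """Toy stand-in for the fast-returned object; it wraps an already-started
    runner whose subprocess and IO threads are chugging along."""

    def __init__(self, runner):
        self._runner = runner

    def join(self):
        # Wait for completion, re-raise any worker-thread exceptions, and
        # hand back a completely normal Result. (finish() is a placeholder.)
        return self._runner.finish()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.join()

# Intended usage shape (c.run(...) is illustrative):
#     with c.run("make slow-build", asynchronous=True) as promise:
#         do_other_work()
#     # __exit__ has join()'d by this point, so the build is done here.
```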

bitprophet added a commit that referenced this issue Dec 21, 2019
… filled in yet

disown internal use case has been tested, reported works fine \o/

bitprophet commented Dec 31, 2019

More bits as I dig in:

  • Still feeling the "Promise is not a Result but has a .join that returns one" approach is best and trying to execute that one.
  • .join should also yield exceptions if they were raised by the background thread (of any stripe - unexpected kaboom, IO sub-thread, or Failure); basically, it should behave exactly like a synchronous run() would (see the sketch after this list).
  • Promise and Result both reveal that Result should probably be exposing ALL run kwargs, not just the subset that made obvious sense as "what you care about as a result" initially. Yes, stdout/stderr/exited are the most useful, but for completeness' sake we really ought to expose hide and warn (and of course now asynchronous and disown) and so on.
    • Sadly when I started poking this it feels like A Bunch More Work so I may have to punt on it for now...a story as old as time...
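
A sketch of that "join() behaves exactly like a synchronous run()" point from the list above, assuming the kwarg/exception names that shipped:

```python
from invoke import run
from invoke.exceptions import UnexpectedExit

promise = run("exit 1", asynchronous=True)
try:
    promise.join()  # failures surface here, not back at run() time
except UnexpectedExit as e:
    print("command failed with exit code", e.result.exited)
```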

bitprophet commented Dec 31, 2019

I'm super dumb - there's no real point to an additional thread here; all we need is the Promise itself plus the related Runner refactoring. Those allow the user to regain control (the IO worker threads keep running in the background) and defer the "wait(), create Result, wrap up" sequence til whenever they call promise.join().

The main thrust of what users want to be running "asynchronously" here is already in the background in either case: the subprocess and the threads shoveling data in/out of its pipes.


As I wrote this I also found that this "almost" just wants Promise to fold back into Runner (all it does so far is call methods on the wrapped Runner) though I think for clarity/responsibility purposes (not to mention future potential changes) it makes more sense to keep the separate class.

bitprophet commented:

In the weeds at this point, trying to clean up any low hanging TODOs. Right now it's "can we easily let folks interrupt/terminate an async Promise?". Seems like it should be easy - just have them call (directly or via a wrapper) Runner.send_interrupt().

However in practice (as in, using Local by hand), something falls down here. wait runs to completion because neither of the 'done' sentinels (process is finished, dead threads) gets tripped, even though we appear to be writing the interrupt to process stdin the same way we do synchronously.

The code called in both cases should be the same (the literal only difference is whether the new Runner._finish() is called immediately or via the Promise's join) so my gut says it's to do with how async deigns not to spin up the stdin handling thread - a Ctrl-C from a human operator is probably triggering EOF or similar in our stdin, which kills its thread, which may be triggering the dead-io-threads check.

Been a while since I wrote/read that stuff so need to check, but if I'm right, it means that I should be able to reproduce it in the synchronous case by saying in_stream=False.

bitprophet commented:

That itself isn't it, in either case - I cannot repro w/ async=False, in_stream=False; and temporarily making it so async=True does not imply in_stream=False also makes no difference.

Even more interesting, I realized that Ctrl-C after calling join also works great to shut things down.

What's the difference? Wondering if this boils down simply to the lack of a real Ctrl-C; is this just a process group thing? Ctrl-C in the controlling terminal signals both Python (KeyboardInterrupt) and the subprocess (SIGINT, iirc), so our writing of the interrupt sequence to proc stdin is, at least for my test case (sleep 10), extraneous?

This seems to be correct - in the regular Python REPL (which does not mask Ctrl-C like ipython does - glad I noticed that early), Ctrl-C'ing after obtaining the Promise and before calling Promise.join() causes the subsequent join() to immediately return and see exitcode 2, as expected (I never wrote this down above, but in the bad case, the exitcode is always 0, as if the subproc never got interrupted).
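
A quick plain-subprocess illustration of that difference: a real Ctrl-C has the terminal deliver SIGINT to the foreground process group, which is quite separate from writing a ^C byte to the child's stdin.

```python
import signal
import subprocess

child = subprocess.Popen(["sleep", "10"])
child.send_signal(signal.SIGINT)  # what the terminal driver effectively does
print("wait() returned:", child.wait())  # -2 on POSIX: killed by SIGINT
```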

Arguably this means send_interrupt() wants to both submit the interrupt code AND signal the child explicitly, but I'd rather not change its semantics at this point. Seems better to have Promise grow a method to do that - which I wanted to do anyway because send_interrupt has a slightly weird API requiring an exception argument, having folks stick a None in there would have been stupid.

If I can't get that to work nicely soon I'll just punt on it, as it's non-critical.

bitprophet commented:

I might punt - I can see how to make the interrupt work for Local, but it's thornier for Fabric's Remote runner. There's no clean way to send signals to a remote subprocess w/o additional out-of-band subprocesses; and on top of that, while all we "really" need here is send_interrupt, having to deal with a send_signal method that is implemented in Local but is a quiet no-op in Runner (allowing Promise's one-two punch of "interrupt sequence + actual signal" to succeed) feels gross.

Probably gonna push a branch/issue with the partial work on this, and pick it up later if enough folks clamor for that.

bitprophet commented:

OK this is done enough besides making sure that as-is (without the explicit interrupt shit above) Fabric 2 can work well enough with it. Once that's squared away (hopefully in a way where Fabric >=2,<=2.5 can still use Invoke >=1.4 w/o exploding) should be time to release.

bitprophet commented:

After spending entirely too much time getting Fabric's test suite un-slow again (for 100% unrelated reasons. Entropy: a thing), it looks like this all Just Works over there with no additions required and no major issues - both keywords tested.
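
For the record, the Fabric-side usage exercised there looks roughly like this (host and commands are placeholders):

```python
from fabric import Connection

c = Connection("deploy@example.com")

# Fire and forget on the remote end:
c.run("nohup ./start-daemon.sh &", disown=True)

# Async with an eventual join:
promise = c.run("./run-tests.sh", asynchronous=True)
result = promise.join()
print(result.exited)
```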

bitprophet added a commit that referenced this issue Jan 4, 2020
Fixes #637

Also re #682 (insofar as it impacts an implicit hide=True when asynchronous=True)

bitprophet commented:

1.4.0 going up momentarily!
