Convenient sudo() wrapper #294

Closed
bitprophet opened this Issue Dec 21, 2015 · 18 comments

Projects

None yet

1 participant

@bitprophet
Member
bitprophet commented Dec 21, 2015 edited

tl;dr Fabric 1's sudo() except implemented on top of generic, user-exposed features (#289, #293) and available at Invoke-on-up, since it's not actually specific to remote execution.

Given those user-exposed features, there's no technical necessity to implement it in core, because it should just turn into something like this:

@task
def identity(c):
    assert c.run("whoami").stdout.strip() == "myuser"
    sudo_prompt = "Sudo password plz:"
    with c.prefix("sudo -S -p '{0}' ".format(sudo_prompt):
        result = c.run("whoami", responses={sudo_prompt: "mysudopassword\n"})
        assert result.stdout.strip() == "root"

But clearly, even if you remove the asserts, that's annoying boilerplate we can erase by implementing a commonly used pattern as a convenience method.


MUCH LATER EDIT: There is a wrinkle here - in Fabric 1, this functionality includes "prompt the user ourselves when the sudo-prompt response value isn't pre-set, then cache the result from then on". That prompt+cache happens "inline" within the sudo() call. Brainstorm:

  • The most-obvious answer is "you should store it in your config up front so no prompting is necessary".
  • But storing that data on-disk is insecure; a runtime prompt is better.
    • A runtime "pulled from a secure store over the network, flash key, etc" is best. No prompt, no on-disk.
  • But there's nothing saying we can't explicitly prompt up-front in sudo.
  • Prompting "lazily" / as late as possible, is "nice" insofar as it means users with passwordless sudo never have to deal with a prompt at all - no inner prompt seen, no outer prompt displayed.
  • In most cases, that is still just "nice" and not required - users who know up front that they will need the password, can simply enter it up front.
    • But the Fabric-specific use case of remote sudo is more complex - users with large fleets may not know whether the prompt will ever appear.
    • Counterpoint: users with large fleets will usually want "raise/store an exception and keep trucking" instead...
  • Users may also have heterogenous sudo passwords across their fleets, so a single up-front prompt is no longer enough - it'd need to be one prompt per host/group/whatever.
    • Counterpoint, if you have that setup, you could be using Fabric 2's per-host/group configs
    • But then you're back to storing the data on-disk
    • So you really do need one prompt per remote password, no matter when it pops up.
    • If you know ahead of time where the passwords differ, we could then say "ok, you just need to prompt for each group" (perhaps with a config setting like e.g. {'hostname': {'prompt_for_password': True}})
    • If you don't, that is when runtime inline prompting becomes truly "necessary". Whether that's enough of a use case to be worth implementing this ourselves, hard to say.
    • NOTE that all of these concerns pop up with Fabric connection/login passwords too - with the exception that the point of prompting is at connection time and not anywhere in Runner.
  • One way to enable "late but not inline" prompting is to have the prompt machinery raise an exception when it encounters a prompt key whose value is None, then sudo() can treat this as "prompt, fill config, try that same command again"
    • This isn't exactly the same as Fabric 1's behavior, because it involves running the command literally twice in a row.
    • But it feels cleaner because it doesn't require the truly-inline prompting.
    • It also arguably lets us push the entirety of "dealing with missing password caches" on the user, which is better for 'training' them to do the right thing, but worse from a usability standpoint.
  • If we did go with fully inline prompting again, at least we can make it more generic - again, anything whose value is None could trigger the pause-and-prompt, not just sudo specific passwords.
    • Though there's the added wrinkle of "when to use getpass.getpass vs (raw_)input"...
  • Yet another reason not to do inline prompting is that it adds implementation complexity - e.g. in Fabric 1 we need the prompt machinery to talk to the stdin handler so the stdin handler pauses itself while prompting is happening (otherwise getpass or raw_input never see the stdin, as it's being intercepted).
@bitprophet bitprophet referenced this issue Jun 4, 2016
Closed

Prompt/etc autoresponder #289

14 of 15 tasks complete
@bitprophet
Member
bitprophet commented Jun 12, 2016 edited

In the interests of getting the fab 2 alpha out the door, I think my initial target for this feature, re: sudo password prompting, will be "it requires the autoresponse password value to be non-None". Users configuring it - as many do (despite it being technically insecure) in Fabric 1 via ~/.fabricrc + sudo_password = xxx - never have to see a prompt; or it will prompt up front, prior to actual command execution, if the value has been left unset.

We can revisit "prompt lazily & rerun", "prompt inline during autoresponding", and "options to control whether prompting is done at all" later.

@bitprophet
Member
bitprophet commented Jun 12, 2016 edited

Another minor fun wrinkle that just occurred to me: Fabric 2 rebinds Invoke's run as local...what should it do with Invoke's sudo? Grump.

Upon reflection, while it's technically possible to offer a local sudo, I just don't see it being that commonly used (the number of legitimate local privilege-escalation needs is vanishingly small compared to that of remote ones), and it's also not cross-platform like run is either. So that's a poor tradeoff (losing API space in the client lib of Fabric 2 for little gain in Invoke itself).

Should I somehow be wrong about this we can certainly change things later.

So for now, this is just for "make the Fabric 2 implementation possible" (which should be almost entirely #293 now) and probably closed soon.

@bitprophet
Member
bitprophet commented Jun 20, 2016 edited

Occurred to me recently: the old "run a remote-oriented script locally instead" use case for Fabric (fabric/fabric#98) is one major reason to make sudo implemented in Invoke proper. (Also had a user asking about it on IRC recently too...)

The problem of how to expose the API in both libs remains open, none of the solutions I can think of are awesome unfortunately:

  • Don't rebind, just retarget: Context.sudo executes local sudo, Remote.sudo executes remote sudo, and there is no Remote.local_sudo or similar.
    • "Solves" the API collision problem, and leans on the part where I sincerely doubt a task targeting remote servers would need to do anything superuser-ly locally. And is of no consequence in the fab-98 situation - sudo is sudo is sudo.
    • But is inconsistent with run-vs-local, and perhaps unnecessarily annoys anyone who does inexplicably need to sudo on both ends.
  • Just rebind it, you doofus: Context.sudo becomes Remote.local_sudo.
    • Inelegant, sure, but does what it says on the tin, and because it's barely going to be used by anybody, who really cares? It's like 3 lines in the API doc.
  • Somehow turn "sudo-ing" into a kwarg to run/local, e.g. Context.run('xxx', sudo=True).
    • Not amazing as it pushes the sudo implementation (including all the phase-2 prompt related logic noted in ticket desc) inside run in some fashion, which at the very least means users tweaking the sudo implementation is now intertwined with any manipulation of run
    • OTOH, this stuff can all still happen at the Context level (i.e. would not actually impact Runner) so it could maybe be as simple as `return (self._sudo if sudo else self._run)(command, **kwargs).
    • But still feels like a lot of mental overhead for implementers and users just to avoid some renaming...
    • Also doesn't play nice with fabric/fabric#25, aka "how to select internal sudo for things like put/get and friends", since it removes sudo as a distinct top level execution target.

Given that brain dump I think "oh fuck it who cares if the names aren't 100% beautiful and if we don't expect sudo to be used much" is the obvious choice here, and will probably pull the stuff I've been hacking on Fabric 2 back into Invoke as Context.sudo.

What I probably will do is not mention sudo in the tutorial, since that was what prompted all this in the first place - it feels dumb to use up valuable new-user mental time for something we do not expect them to really use. It should still be listed in the Fabric 2 tutorial though.

@bitprophet
Member

Moved this back to Invoke and continued actually implementing it. Mostly there, sticking-up bits that need addressing now instead of later:

  • Thought I'd be clever by selecting an obviously-not-a-real-sudo-prompt string as the default forced password prompt, so that debugging would be more obvious. Unfortunately, if not paired with a Fabric 1 style "implicitly strip out that prompt from the mirrored+captured stderr stream", this looks pretty ugly.
    • Even if it was a regular-looking prompt, I assume it could still be pretty confusing to some users, especially if execution somehow got lagged or hung while the prompt was the last thing on the screen. Which is another argument for implementing "strip out things being responded to automatically".
    • But a counterargument is that magic is bad, and also that this would require more poking around inside run just to support sudo (though - if done it probably should be done for all auto-responses anyways).
    • For now I'll just revert to a "regular-looking" prompt, maybe with some text in it still making it clear it's part of the auto-response system.
  • Integration testing this feels hard to do cleanly; even if we mock out getpass, there's the glaring issue of requiring all developers running integration tests, mutate their local system config so the tests can pass w/o human interaction.
    • Not only is that ugly and fragile, it's flat-out insecure as either this test-only user's creds need persisting in the public repo, or the user must store them in e.g. ~/.invoke.yml, which is still putting a sudo-capable user's creds on disk.
    • I think the best middle ground is to make a selected-by-Travis-only integration test/suite, which of course can then rely on something always being present, and is ephemeral / not anybody's real system.
@bitprophet
Member

Made #365 for the hide-sudo-prompts-maybe angle.

@bitprophet
Member

And as noted in #365 too...without that in place, we get a silly thing where subsequent stdout starts on same line as the sudo prompt (which is stderr) as the autoresponded newline (if one's even present; it's not automatically added at this point) is not echoed to the local terminal.

I bet that'll come up in the alpha, but I'll wait until it does, for now. Not critical.

@bitprophet
Member

Reverted the sudo prompt to something more realistic looking (as implied above).

Integration test, grump, I forgot we'd switched to Travis' sudo-less (faster) architecture. Would be nice not to have to switch back, especially since they will presumably remove sudo-capable test runners entirely sometime. (OTOH, we hard require it for Fabric 1's suite so we can fix some frustrating and random-ass system level bugs, so same boat there if still applicable.)

Double checked their docs; maybe they will keep it around or we can work around it some other way, going by https://docs.travis-ci.com/user/ci-environment#Virtualization-environments and https://docs.travis-ci.com/user/trusty-ci-environment the 'Trusty' env is still being poked at / lives side by side with both legacy and containerized setups.

May well still be the case that tests are slower on Trusty than containerized, but if that's the price we have to pay, so be it?

For now, I'm going to punt on that, may make it another spinoff ticket.

@bitprophet bitprophet added a commit that referenced this issue Jun 22, 2016
@bitprophet bitprophet Back out expectations of literal local sudo().
This branch now specifically about anything necessary
to implement Fabric 2 sudo.

See #294 (comment)
9f31bcc
@bitprophet
Member
bitprophet commented Jun 26, 2016 edited

Ran into another snag: I'd totally forgotten how Fabric 1 not only autoresponds with sudo password, but also detects the "Sorry, try again" output from sudo and uses that to determine whether it succeeded / whether to re-prompt / etc.

Without that, any incorrect password results in a mildly silly/annoying back & worth waiting for sudo to hit its max number of retries (typically 3) before things exit & fail.

In a generic, Invoke/Fab 2 world, that requires #289's functionality to get beefed up sooner instead of later, so it can do more than simple always-respond-to-x-with-y. Will spin off another ticket for that soon after I have a brief think.

@bitprophet
Member

See #369 for thoughts on that. Think I see a way forwards that wouldn't even be that much code.

@bitprophet
Member
bitprophet commented Jun 27, 2016 edited

Aside from #369, which can help implement this, there's still the bigger question of exactly how to handle failed sudo passwords in a few ways - regardless of how exactly #369 lets us detect failure:

  • How much to retry, if at all?
    • Could "only ever except", but that's not helpful to adhoc/interactive users.
    • Following Fab 1 still seems best: try forever until success or user KeyboardInterrupts, unless config/kwarg says to only-except.
    • Given the increased emphasis on "make this work for libraries/correctness first, interactive users second, when feasible" - maybe we make excepting the default behavior, and let users seriously wanting "full interactivity" turn on the "no, please prompt me if the password I gave was wrong/nonexistent" behavior?
  • Should newly submitted passwords replace the old one the config (as in Fab 1)?
    • Not preserving the new password in some fashion feels bad, because if a user fat-fingered the first prompt, they shouldn't then have to re-submit the correct password for all subsequent sudo() calls.
    • But nuking the old password can also be bad, both because it's implicit, behind the scenes manipulation of config the user supplied; and because in Fabric 2 (or similar remote exec situations) the original password might still be valid for other connections not yet contacted.
      • Though those would be distinct Context instances - meaning the question is also "how far back up the chain does it get persisted?" (Which ties into #309 re: whether contexts/configs should bleed together, or not.)
    • Should an interactively prompted password be stored as a separate, not user-facing config path (e.g. sudo.prompted_password) or should it truly overwrite sudo.password?
    • Seems obvious right now that there should be a separate "register" for a live-prompted password, and it should be preferred over sudo.password if non-empty.
    • The question of whether this new value should be transmitted to other, "future" Contexts then lives in #309 and is less relevant here.
@bitprophet
Member

So given that, I think what this means is that the sudo version of Responder can simply be "detect failure, raise exception instead of responding again" (and thus could potentially be a generic, idk, FailureDetectingResponder).

Then sudo itself is responsible for noticing this exception & either reraising it or prompting, saving the new result as above, and trying again.

@bitprophet
Member
bitprophet commented Sep 9, 2016 edited

Have been mostly taking notes in TODO comments...bad habit.

At this point I have a working FailureResponder which takes a failure sentinel and raises ResponseFailure if it is seen. Still need to determine what sudo does when it sees one of these, as above.

On top of that: in the case(s) where we continue bubbling up the exception instead of retrying, it'd be nice for the exception to gain info about the sudo call it is bubbling up through...i.e. to turn into something like Failure/Result. Except that right now Result assumes it's a "finished" command, with attrs like exit_code, ok, etc - things which do not apply in a "sudo authentication failed partway" scenario.

So feels like what needs doing is:

  • Rip out the "this is a regular command that failed during execution" language/assumptions from Failure to a new subclass called e.g. CommandFailure
  • Create e.g. AuthFailure, subclassing Failure, for use by sudo
  • Update Context.sudo so it transmutes ResponseFailure into AuthFailure, so that receivers of sudo auth failures have the same info as receivers of regular command failures.
    • clients of sudo probably don't need to care about the details of the response failure itself, so it can just be discarded. Maybe someday we'll attach it to the new exception object, but that seems pretty pointless right now.
    • This actually becomes "update run so it lets callers such as sudo tell it how to deal with unexpected errors" - because we need the innards of run, which generate Failure/Response, to still do that, even when our ResponseFailure occurs. Unclear how best to handle this:
      • callbacks/injection, e.g. "here, call this with your Response when you see a FooException", hand it something that reraises the AuthException around the Response"
      • run learns to attach any inner exceptions to the raised Failure and then the caller is responsible for deciding whether to open and raise the inner exception, continue raising Failure, or etc.
        • This feels "best" given it can handle any other 'internal to run' exceptions without contortions
        • But it's bad in the sense that it messes with normal exception handling flow; it means by default, anything low level or otherwise unexpected would bubble up as Failure instead of e.g. TypeError or IOError or etc
          • But we could work around this by saying "hey, watchers may raise exceptions, but they should all inherit from eg WatcherException as a signal to run that it needs to staple them to a Failure", then anything not in that exception tree is not masked.
        • Still gets weird insofar as we could let this obviate the mucking of exception classes higher up in sudo; "a sudo auth fail" could just be "a Failure whose wrapped exception is a ResponseFailure. This would avoid some complexity, at the cost of making it impossible to tell different Failures apart via except...which is bad so yea.
        • Which brings me back to the 1st option above, make it so run can "natively" raise different failure exceptions via callbacks; except that is still "complicated", just with the exception code being pushed around. So also probably not worth it.
      • refactor run some more so the response-generating bits are torn out of the actually-running bits. (Not ideal since run is already split up into a few bits for various reasons.)
  • Document clearly how auth failure interacts, or doesn't, with warn=True - rather, that warn=True only affects command failures, and not auth failures.
  • Standardize on where exceptions live, I'm now split up between exceptions.py and per-responsibility modules. Kinda leaning towards the latter, honestly.
    • Informal twitter pool sees most people still doing the former, and mostly because you can easily get into circular import problems otherwise. Makes sense; might not hit that early here but certainly later, and/or anytime exceptions begin to be referenced in multiple places.
@bitprophet
Member
bitprophet commented Sep 12, 2016 edited

More ramblings about the watcher -> runner -> post-run exception handling BS:

  • Out of the above, the 'best' scenario seemed to be "have Runner create Failure subclass instances for both command failure (normally) and watcher failure (by catching a Watcher-specific exception type)".
    • This avoids one tier of exceptions & exception handling - sudo no longer has to care at all, both subtypes of Failure will bubble up.
  • Had another thought today of: why not pass the Result object in question (because Failure itself is mostly just a vehicle for the Result informing the catcher exactly what command is failing) into the watchers, allowing them to raise only one single exception that can bubble all the way up?
    • But that's probably tighter coupling than I want; it complicates Watcher and also spreads around the "how to turn a being-executed command request into a failure exception" logic to multiple classes.
    • Which doesn't seem worth it; I've managed to dilute my original overdone idea with 3 exception-handling steps, to 2, and going down to 1 isn't that much bigger a savings.
  • Though the issue of "are all Failures identical or do we want to raise subtypes?" is still open; if we do want e.g. CommandFailure vs AuthFailure, there still needs to be more logic somewhere.
  • I don't see a clean, generic solution to that problem: they're either non-generic (specific logic in sudo to 'unpack' Failures with ResponseFailures attached) or complex (eg some variant on "Watchers indicate a specific user-facing exception type to be raised when they notice problems").
    • For now I'm gonna go with the "logic in sudo" option because this rabbit hole is already too deep (and the "attach watcher errors to Failures" approach is at least halfway generic anyways).
    • However it'll always feel slightly off, because of the 'transmuting' of a 'regular' Failure into an AuthFailure. Which should otherwise not require any changes - all of the tweaking of Result to deal with a lack of regular shell exit, needs doing within run/Failure - so it's just a class change for easier user catching. But it's a code smell.
@bitprophet
Member

Pretty unhappy with how rabbit-hole-y this all got. Hope it pays off for somebody in the future somehow.

Still need to go thru some of the TODOs I spawned, and document the warn-vs-auth-failure bit. Then try this out via Fab 2 to make sure it works well that way.

@bitprophet
Member
bitprophet commented Sep 16, 2016 edited

My tolerance for which TODOs truly can't wait is now much lower, but there are still a few that should really be done now:

  • Solve, or at least workaround/document, the problem where Context.sudo wants to submit its watcher to run(..., watchers=[watcher]) but doesn't want to stomp on any actual user watchers.
    • Right now the implementation gives watchers as a kwarg, meaning user-modified config watchers will be overridden (kwarg always wins), and users passing in their own watchers= kwarg will encounter a "you gave the same kwarg twice!" error.
    • We can at least merge a user-supplied kwarg with our value, but that still leaves "user supplied config, did not supply kwarg, is confused when their config values are ignored because of our kwarg"
    • Feels sorta like the old "merging vs overriding" configuration cascade problem - which isn't truly solved for our config stuff yet either.
    • Maybe we just need to be fully explicit and check the config and kwarg and slot ours in alongside the highest-importance such value, if any are given. (And if none were, then we can just do what we do now and stick it in our own kwarg value.)
  • AuthFailure handling at the Program level:
    • Currently not handled so stacktrace + implicit exit 1
    • Ideally would match UnexpectedExitFailure re: not showing stacktrace
    • Except right now, that actually does nothing but ensure the exit code matches; if you have stderr hidden, you'll see NO explanation of why it exited badly. In sudo's case, we definitely want to at least sys.exit(str(e)) so our message is shown.
      • Probably also want to print more of our Result so the command generating the error is displayed...?
    • Ideally, Program should do as little as possible; I want these kinds of exits to look very similar between inv and eg python -c "from invoke import blah; blah()", just with stacktraces hidden by default when you run via Program.
  • There's some outstanding naming worries:
    • s/Failure/Error because that's Python/PEP8 style
    • stuff in the responder classes, some of the kwargs are a bit verbose
  • Changelog entry for all this, including backwards compat notice for renames/etc
@bitprophet
Member

Minor wrinkle with the watcher kwarg/config merging - mutating a config object feels bad/wrong: if user introspects during debugging they'll wonder why the config does not actually reflect what they submitted; and we don't have a good story for merge-or-override in configs yet either.

Best solution for now is to rely on "kwarg beats config" by copying the config to be the kwarg, when only config exists. Then it resembles the base case of "user only gave kwarg", we append, and everyone's happy.

@bitprophet
Member

Punting the AuthFailure non-traceback stuff to #393.

@bitprophet
Member

Kinda wanted to s/Failure/CommandFailure/g but I can't reeeeeally justify the backwards compatibility hit. Maybe I'll do it just prior to 1.0. Otherwise tho, I dropped Failure-as-a-suffix from most other classes. (AuthFailure is purposefully semi-generic so it can stay.)

@bitprophet bitprophet added a commit that referenced this issue Sep 19, 2016
@bitprophet bitprophet Changelog re #294, closes #294 407fd4f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment