Allow skipping of "bad" host connections #8

Closed
bitprophet opened this Issue Aug 19, 2011 · 7 comments

Comments

Projects
None yet
2 participants
Owner

bitprophet commented Aug 19, 2011

Description

Make it possible for Fabric to "skip" hosts that timeout (either via our explicit timing out, or any other pre-existing timeout condition) instead of aborting. This should not be the default behavior, but it should be configurable.

We may also want to extend this to skip hosts that encounter any connection problem, such as authentication failures, and even down to actual run/sudo-failure issues (in that case, almost a sort of post-warn_only, continue-Python-keyword like behavior. See #448 for a PR for this angle.)

See the comments for some more detailed thoughts/approaches.

(Note: Ticket was originally created to deal with allowing control over network timeouts, which has been moved over to #249)

(Also note: There are some likely-applicable patches over in #189 which really belong here. Clean that up sometime; #189 will be closed with a more limited implementation but those patches will still be there.)


Originally submitted by Jeff Forcier (bitprophet) on 2009-07-20 at 05:26pm EDT

Relations

  • Related to #249: Add timeout support
  • Related to #348: Retry support for put, run, sudo
  • Duplicated by #131: Allow graceful user-controlled handling of connection failures
  • Related to #96: Remote 3-strikes auth failure causes traceback
  • Related to #189: Add option: skip password prompting upon failure

bitprophet was assigned Aug 19, 2011

Owner

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


Took a crack at this tonight while thinking about it after a user request.

The basics of skipping over a bad connect() are easy (just declare env var, test for it to select abort vs warn function). What to do other than that is a bit tougher.

This is mostly because of the lazy connections -- if you have a task that makes no connections and you tell it to run against a host list, it'll happily run per host in the list, even if none of the hosts are valid. If you have a task that does some non cxn stuff and then calls run() halfway down, that will then blow up on bad hosts.

What should that latter type of task do? It's already partially executed; if we want the entire task to not run for bad hosts, then connections have to occur before the task is even executed -- but we can't detect whether a task even needs connections, other than the fact that it is being run against a host list.

Should this indicate a change in when connections happen? After all, it might be more intuitive anyway, to connect at task start -- it's always a tad jarring when you see a few lines go by and then get a connection prompt. But I'm sure there's at least one or two reasons why the lazy version is equally valid (or rather, I'm sure such a change would screw up other folks, even as it would fix the issue for a different set of users).

And otherwise, what if we just say that all run/sudo/put/get functions simply warn and continue? (Which again would be easy to code -- just have _execute() return None or something if it gets back a bad connection object, perhaps.) That strikes me as being very error-prone and not very intuitive, though it would solve the above problems re: lazy connections.

Will definitely have to think about this a bit more.


on 2010-09-08 at 09:36pm EDT

Owner

bitprophet commented Aug 19, 2011

Axel Rutz (aexl) posted:


very good point.
wouldn't everything be easier if we first iterate hosts then iterate tasks (instead the other way round)?
for me this would be more intuitive anyway.


on 2010-09-09 at 01:47am EDT

Owner

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


Axel -- IMO that's really just a special case of the "connect before executing" scenario, and if one assumes we're going down that road, then I'm not sure flipping the order around really makes a big difference.

What we really need to do is break out the "call X function on Y hosts" stuff (which has been mentioned many times) so that these sorts of execution scenarios are more easily swapped out (so that one could specify to do hosts-then-tasks, for example). Will take a bit of care to ensure nothing is seriously screwed up by doing so, though.


on 2010-09-09 at 09:54pm EDT

Owner

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


Did some brainstorming with Gekitsuu on IRC and came up with the following ideas:

  • Add one new setting, env.test_hosts or env.preconnect or similar, that determines whether we preconnect/test hosts before running tasks.
    • when False, we get the current lazy-connect behavior;
    • when True, connections are initiated as early as possible, so usually when running a task on a host list.
    • This allows users to set it to True to preserve atomicity of tasks so they don't run at all for a given host if that host is down.
  • Add another new setting, env.skip_bad_hosts or similar (as in earlier comment).
    • Only really makes sense if env.preconnect is True (perhaps automatically sets the other/implies it?)
    • Allows one to not fail-fast if some hosts aren't up.

I think that by combining both of these we can preserve backwards compatibility while allowing the overall skip-bad-hosts behavior to be implemented.


on 2010-09-09 at 10:13pm EDT

Owner

bitprophet commented Aug 19, 2011

Jeff Forcier (bitprophet) posted:


So, I'm an idiot, and the skip-bad-hosts feature is related to, but not actually intimate with, the idea of allowing control over various timeouts. Since this ticket has more content in it for the former than the latter, I'm repurposing it and will spin off the timeout stuff to its own ticket. Sorry for any confusion, Axel!


on 2010-10-29 at 12:13pm EDT

@bitprophet bitprophet added a commit that referenced this issue Nov 23, 2011

@bitprophet bitprophet WIP re #8 8a6388a
Owner

bitprophet commented Nov 23, 2011

Started taking another stab at this today (though I'm pausing again, found workaround in my local codebase. meh.)

The approach I'm going with for now:

  • Base everything on having network.connect() raise NetworkException (should probably rename to NetworkError?) for everything network related (timeout, resolution fail, low level).
  • Add a new env var dict, env.use_exceptions_for, starting out with just 'network': False. Idea is to pad this out as we start moving more things towards using custom exceptions re #277.
  • In state.connections (the connection cache), test env.use_exceptions_for['network'], and if it is not true, abort() with the exception's message attribute.
    • This results in nearly every valid/documented use of Fabric -- fab or API -- still seeing abort() behavior by default -- fully backwards compatible.
    • But one can flip that env var to True and it will raise the NetworkException, making it easier for callers to work around.
  • For example, the next step could be to patch execute() to try/except NetworkException and test for e.g. the abovementioned env.skip_bad_hosts flag to determine whether to continue or raise.

@bitprophet bitprophet added a commit that referenced this issue Jan 17, 2012

@bitprophet bitprophet Tests re #8 21cd576

Not sure if I can use this to not exit my python script if a password is incorrect. I am aware of the existence of pub/private key pairs. However I need to set them this way.

Can I just do:
env.use_exceptions_for['network']
And capture the exception? Or does that only work if the connection to the host is failing?

Thanks in advance for a quick response. Otherwise I'll have to fix this using an expect script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment