Allow skipping of "bad" host connections #8

bitprophet · 2011-08-19T01:56:55Z

Description

Make it possible for Fabric to "skip" hosts that time out (either via our explicit timing out, or any other pre-existing timeout condition) instead of aborting. This should not be the default behavior, but it should be configurable.

We may also want to extend this to skip hosts that encounter any connection problem, such as authentication failures, and even down to actual run/sudo failures. In that case the behavior would be a sort of post-warn_only equivalent of Python's continue keyword; see #448 for a PR taking this angle.

See the comments for some more detailed thoughts/approaches.

(Note: Ticket was originally created to deal with allowing control over network timeouts, which has been moved over to #249)

(Also note: There are some likely-applicable patches over in #189 which really belong here. Clean that up sometime; #189 will be closed with a more limited implementation but those patches will still be there.)

Originally submitted by Jeff Forcier (bitprophet) on 2009-07-20 at 05:26pm EDT

Relations

Related to #249: Add remote-program timeout support
Related to #348: Retry support for put, run, sudo
Duplicated by #131: Allow graceful user-controlled handling of connection failures
Related to #96: Remote 3-strikes auth failure causes traceback
Related to #189: Add option: skip password prompting upon failure

bitprophet · 2011-08-19T02:03:11Z

Jeff Forcier (bitprophet) posted:

Took a crack at this tonight while thinking about it after a user request.

The basics of skipping over a bad connect() are easy (just declare env var, test for it to select abort vs warn function). What to do other than that is a bit tougher.
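A minimal sketch of that "easy" part, assuming a hypothetical skip_bad_hosts flag and stand-in abort()/warn()/connect() helpers (these are illustrations, not Fabric's real internals):

```python
# Hypothetical stand-ins for illustration; not Fabric's real internals.
class Abort(Exception):
    """Raised by abort() to stop the whole run."""

env = {"skip_bad_hosts": False}  # assumed flag name
warnings = []                    # collected warning messages

def abort(msg):
    raise Abort(msg)

def warn(msg):
    warnings.append(msg)

def connect(host, reachable):
    """Return a fake connection, or report the failure via abort() or warn()."""
    if host in reachable:
        return "connection to %s" % host
    # The env flag selects which failure handler to call.
    handler = warn if env["skip_bad_hosts"] else abort
    handler("could not connect to %s" % host)
    return None  # only reached when the handler was warn()
```

With the flag off, a bad host raises Abort; with it on, connect() records a warning and returns None, which leaves open the question of what the caller should do next.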

This is mostly because of the lazy connections -- if you have a task that makes no connections and you tell it to run against a host list, it'll happily run per host in the list, even if none of the hosts are valid. If you have a task that does some non-connection work and then calls run() halfway down, that will then blow up on bad hosts.

What should that latter type of task do? It's already partially executed; if we want the entire task to not run for bad hosts, then connections have to occur before the task is even executed -- but we can't detect whether a task even needs connections, other than the fact that it is being run against a host list.
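The partial-execution problem can be illustrated with a small simulation. Everything here (the run() stand-in, the deploy task, the reachable set) is hypothetical and only mimics lazy connection behavior:

```python
# Hypothetical stand-ins for illustration; none of these are Fabric's real API.
class ConnectionFailed(Exception):
    pass

log = []  # records every side effect, in order

def run(host, command, reachable):
    """Connect lazily: the failure only surfaces when a command is first run."""
    if host not in reachable:
        raise ConnectionFailed(host)
    log.append("ran %r on %s" % (command, host))

def deploy(host, reachable):
    log.append("built package for %s" % host)  # local work, no connection needed
    run(host, "install package", reachable)    # first connection attempt happens here

for host in ["web1", "db1"]:
    try:
        deploy(host, reachable={"web1"})
    except ConnectionFailed:
        log.append("connection to %s failed mid-task" % host)
```

For the unreachable host, the local build step has already run by the time the connection fails, so the task is left half-done for that host.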

Should this indicate a change in when connections happen? After all, it might be more intuitive anyway, to connect at task start -- it's always a tad jarring when you see a few lines go by and then get a connection prompt. But I'm sure there's at least one or two reasons why the lazy version is equally valid (or rather, I'm sure such a change would screw up other folks, even as it would fix the issue for a different set of users).

And otherwise, what if we just say that all run/sudo/put/get functions simply warn and continue? (Which again would be easy to code -- just have _execute() return None or something if it gets back a bad connection object, perhaps.) That strikes me as being very error-prone and not very intuitive, though it would solve the above problems re: lazy connections.
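To see why warn-and-continue could be error-prone, here is a rough sketch in which a stand-in run() returns None for a bad host. The function names and the reachable set are assumptions made for illustration:

```python
# Hypothetical stand-in: run() returns None instead of aborting on a bad host.
def run(host, command, reachable):
    if host not in reachable:
        return None  # "warn and continue": no output to give back
    return "output of %r on %s" % (command, host)

def disk_report(host, reachable):
    output = run(host, "df -h", reachable)
    # Every call site now needs this guard; without it, output.upper()
    # would raise AttributeError for unreachable hosts.
    if output is None:
        return "skipped %s" % host
    return output.upper()
```

The guard has to be repeated after every remote call in every task, which is the kind of burden the comment above is wary of.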

Will definitely have to think about this a bit more.

on 2010-09-08 at 09:36pm EDT

bitprophet · 2011-08-19T02:03:12Z

Axel Rutz (aexl) posted:

Very good point.
Wouldn't everything be easier if we first iterated over hosts and then over tasks (instead of the other way round)?
For me this would be more intuitive anyway.

on 2010-09-09 at 01:47am EDT

bitprophet · 2011-08-19T02:03:12Z

Jeff Forcier (bitprophet) posted:

Axel -- IMO that's really just a special case of the "connect before executing" scenario, and if one assumes we're going down that road, then I'm not sure flipping the order around really makes a big difference.

What we really need to do is break out the "call X function on Y hosts" stuff (which has been mentioned many times) so that these sorts of execution scenarios are more easily swapped out (so that one could specify to do hosts-then-tasks, for example). Will take a bit of care to ensure nothing is seriously screwed up by doing so, though.
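One way to picture a swappable "call X function on Y hosts" layer is to make the iteration order a pluggable function. This is only a sketch; the names below are invented for illustration and do not reflect how Fabric is structured:

```python
# Hypothetical sketch: the iteration order is a pluggable strategy.
def tasks_then_hosts(tasks, hosts):
    """Run each task across all hosts before moving to the next task."""
    for task in tasks:
        for host in hosts:
            yield task, host

def hosts_then_tasks(tasks, hosts):
    """Run all tasks on one host before moving to the next host."""
    for host in hosts:
        for task in tasks:
            yield task, host

def execute(tasks, hosts, order=tasks_then_hosts):
    """Call every task on every host, in the order the strategy dictates."""
    return [task(host) for task, host in order(tasks, hosts)]

# Example tasks that just report where they ran.
def build(host):
    return "build@%s" % host

def deploy(host):
    return "deploy@%s" % host
```

Calling execute([build, deploy], hosts) uses the tasks-first order, while passing order=hosts_then_tasks gives the order Axel suggested without touching the tasks themselves.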

on 2010-09-09 at 09:54pm EDT

bitprophet · 2011-08-19T02:03:13Z

Jeff Forcier (bitprophet) posted:

Did some brainstorming with Gekitsuu on IRC and came up with the following ideas:

- Add one new setting, env.test_hosts or env.preconnect or similar, that determines whether we preconnect/test hosts before running tasks.
  - When False, we get the current lazy-connect behavior.
  - When True, connections are initiated as early as possible, so usually when running a task on a host list.
  - This allows users to set it to True to preserve atomicity of tasks so they don't run at all for a given host if that host is down.
- Add another new setting, env.skip_bad_hosts or similar (as in the earlier comment).
  - Only really makes sense if env.preconnect is True (perhaps it automatically sets/implies the other?)
  - Allows one to not fail-fast if some hosts aren't up.

I think that by combining both of these we can preserve backwards compatibility while allowing the overall skip-bad-hosts behavior to be implemented.
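How the two proposed settings might interact can be sketched as follows. The preconnect and skip_bad_hosts parameters mirror the names floated above; can_connect(), run_task() and the reachable set are hypothetical helpers for this illustration only:

```python
# Hypothetical sketch of the two proposed settings working together.
class HostUnreachable(Exception):
    pass

def can_connect(host, reachable):
    """Stand-in for a real connection test."""
    return host in reachable

def run_task(task, hosts, reachable, preconnect=False, skip_bad_hosts=False):
    """Run task on each host; return the hosts it actually ran on."""
    ran_on = []
    for host in hosts:
        if preconnect and not can_connect(host, reachable):
            if skip_bad_hosts:
                continue                 # task never starts for this host
            raise HostUnreachable(host)  # fail fast, before the task body runs
        # With preconnect=False nothing is checked up front, so a connection
        # failure would only surface later, inside the task itself.
        task(host)
        ran_on.append(host)
    return ran_on
```

With both options on, a down host is skipped before any part of the task runs for it; with only preconnect on, the run stops at the first bad host; with both off, the existing lazy behavior is unchanged.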

on 2010-09-09 at 10:13pm EDT

bitprophet · 2011-08-19T02:03:13Z

Jeff Forcier (bitprophet) posted:

So, I'm an idiot, and the skip-bad-hosts feature is related to, but not actually intimate with, the idea of allowing control over various timeouts. Since this ticket has more content in it for the former than the latter, I'm repurposing it and will spin off the timeout stuff to its own ticket. Sorry for any confusion, Axel!

on 2010-10-29 at 12:13pm EDT

bitprophet · 2011-11-23T00:07:14Z

Started taking another stab at this today (though I'm pausing again, found workaround in my local codebase. meh.)

The approach I'm going with for now:

- Base everything on having network.connect() raise NetworkException (should probably rename to NetworkError?) for everything network related (timeout, resolution fail, low level).
- Add a new env var dict, env.use_exceptions_for, starting out with just 'network': False. Idea is to pad this out as we start moving more things towards using custom exceptions re #277 (Use real exceptions for errors).
- In state.connections (the connection cache), test env.use_exceptions_for['network'], and if it is not true, abort() with the exception's message attribute.
  - This results in nearly every valid/documented use of Fabric -- fab or API -- still seeing abort() behavior by default -- fully backwards compatible.
  - But one can flip that env var to True and it will raise the NetworkException, making it easier for callers to work around.
- For example, the next step could be to patch execute() to try/except NetworkException and test for e.g. the abovementioned env.skip_bad_hosts flag to determine whether to continue or raise.
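The approach above could look roughly like this. It is a self-contained simulation: env is a plain dict, and connect(), get_connection(), execute() and the Aborted exception are simplified stand-ins written for this sketch rather than the code in the referenced commits:

```python
# Hypothetical simulation of the approach; not the actual Fabric implementation.
env = {"use_exceptions_for": {"network": False}, "skip_bad_hosts": False}

class NetworkException(Exception):
    """Any network-related failure (timeout, resolution failure, low level)."""

class Aborted(Exception):
    """Stand-in for what abort() does by default."""

def connect(host, reachable):
    if host not in reachable:
        raise NetworkException("timed out connecting to %s" % host)
    return "connection to %s" % host

def get_connection(host, reachable):
    """Connection-cache lookup: abort by default, raise when opted in."""
    try:
        return connect(host, reachable)
    except NetworkException as exc:
        if env["use_exceptions_for"]["network"]:
            raise                 # let callers handle the real exception
        raise Aborted(str(exc))   # default, backwards-compatible behavior

def execute(task, hosts, reachable):
    """Run task per host, skipping bad hosts only when the flag allows it."""
    results = {}
    for host in hosts:
        try:
            conn = get_connection(host, reachable)
        except NetworkException:
            if env["skip_bad_hosts"]:
                continue
            raise
        results[host] = task(conn)
    return results
```

Because the default path still ends in an abort, existing callers see no change; only code that flips use_exceptions_for['network'] to True gets a catchable exception, and execute() can then decide whether to skip or re-raise.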

Re #8

th3penguinwhisperer · 2014-09-02T14:26:03Z

Not sure if I can use this to not exit my python script if a password is incorrect. I am aware of the existence of pub/private key pairs. However I need to set them this way.

Can I just do:
env.use_exceptions_for['network']
And capture the exception? Or does that only work if the connection to the host is failing?

Thanks in advance for a quick response. Otherwise I'll have to fix this using an expect script.

ghost assigned bitprophet Aug 19, 2011

bitprophet mentioned this issue Oct 24, 2011

Failures in parallel tasks do not stop execution #457

Closed

bitprophet mentioned this issue Nov 18, 2011

Add remote-program timeout support #249

Closed

bitprophet added a commit that referenced this issue Nov 23, 2011

WIP re #8

8a6388a

bitprophet added a commit that referenced this issue Jan 17, 2012

Switch network aborts to exceptions wrapping the real errors

bfd519b

Re #8

bitprophet added a commit that referenced this issue Jan 17, 2012

Tests re #8

21cd576

bitprophet closed this as completed in 7fe3f2d Jan 17, 2012

bitprophet added a commit that referenced this issue Jan 17, 2012

Document, changelog, add CLI flag re #8

3b9f2f3

bitprophet mentioned this issue Jan 19, 2012

Option for skipping auth failures in addition to connection ones #533

Open

This was referenced Feb 29, 2012

Remote 3-strikes auth failure causes traceback #96

Open

Allow graceful user-controlled handling of connection failures #131

Closed

Add option: skip password prompting upon failure #189

Closed

Retry support for put, run, sudo #348

Closed

antoniobarbuzzi mentioned this issue Feb 29, 2012

skip_on_failure #448

Closed

sirosen mentioned this issue Mar 25, 2013

Adding require-auth flag to fab #866

Closed

ploxiln referenced this issue in ploxiln/fab-classic Jun 5, 2018

Merge pull request #8 from mathspace/py3-pip-progressbar-unicode

b3ff39a

Fix UnicodeDecodeError when installing Django on a remote server
