Make it possible for Fabric to "skip" hosts that timeout (either via our explicit timing out, or any other pre-existing timeout condition) instead of aborting. This should not be the default behavior, but it should be configurable.
We may also want to extend this to skip hosts that encounter any connection problem, such as authentication failures, and even down to actual run/sudo-failure issues (in that case, almost a sort of post-warn_only, continue-Python-keyword like behavior. See #448 for a PR for this angle.)
See the comments for some more detailed thoughts/approaches.
(Note: Ticket was originally created to deal with allowing control over network timeouts, which has been moved over to #249)
(Also note: There are some likely-applicable patches over in #189 which really belong here. Clean that up sometime; #189 will be closed with a more limited implementation but those patches will still be there.)
Originally submitted by Jeff Forcier (bitprophet) on 2009-07-20 at 05:26pm EDT
Jeff Forcier (bitprophet) posted:
Took a crack at this tonight while thinking about it after a user request.
The basics of skipping over a bad connect() are easy (just declare env var, test for it to select abort vs warn function). What to do other than that is a bit tougher.
This is mostly because of the lazy connections -- if you have a task that makes no connections and you tell it to run against a host list, it'll happily run per host in the list, even if none of the hosts are valid. If you have a task that does some non cxn stuff and then calls run() halfway down, that will then blow up on bad hosts.
What should that latter type of task do? It's already partially executed; if we want the entire task to not run for bad hosts, then connections have to occur before the task is even executed -- but we can't detect whether a task even needs connections, other than the fact that it is being run against a host list.
Should this indicate a change in when connections happen? After all, it might be more intuitive anyway, to connect at task start -- it's always a tad jarring when you see a few lines go by and then get a connection prompt. But I'm sure there's at least one or two reasons why the lazy version is equally valid (or rather, I'm sure such a change would screw up other folks, even as it would fix the issue for a different set of users).
And otherwise, what if we just say that all run/sudo/put/get functions simply warn and continue? (Which again would be easy to code -- just have _execute() return None or something if it gets back a bad connection object, perhaps.) That strikes me as being very error-prone and not very intuitive, though it would solve the above problems re: lazy connections.
Will definitely have to think about this a bit more.
on 2010-09-08 at 09:36pm EDT
Axel Rutz (aexl) posted:
very good point.
wouldn't everything be easier if we first iterate hosts then iterate tasks (instead the other way round)?
for me this would be more intuitive anyway.
on 2010-09-09 at 01:47am EDT
Axel -- IMO that's really just a special case of the "connect before executing" scenario, and if one assumes we're going down that road, then I'm not sure flipping the order around really makes a big difference.
What we really need to do is break out the "call X function on Y hosts" stuff (which has been mentioned many times) so that these sorts of execution scenarios are more easily swapped out (so that one could specify to do hosts-then-tasks, for example). Will take a bit of care to ensure nothing is seriously screwed up by doing so, though.
on 2010-09-09 at 09:54pm EDT
Did some brainstorming with Gekitsuu on IRC and came up with the following ideas:
I think that by combining both of these we can preserve backwards compatibility while allowing the overall skip-bad-hosts behavior to be implemented.
on 2010-09-09 at 10:13pm EDT
So, I'm an idiot, and the skip-bad-hosts feature is related to, but not actually intimate with, the idea of allowing control over various timeouts. Since this ticket has more content in it for the former than the latter, I'm repurposing it and will spin off the timeout stuff to its own ticket. Sorry for any confusion, Axel!
on 2010-10-29 at 12:13pm EDT
WIP re #8
Started taking another stab at this today (though I'm pausing again, found workaround in my local codebase. meh.)
The approach I'm going with for now:
Switch network aborts to exceptions wrapping the real errors
Tests re #8
Document, changelog, add CLI flag re #8
Not sure if I can use this to not exit my python script if a password is incorrect. I am aware of the existence of pub/private key pairs. However I need to set them this way.
Can I just do:
And capture the exception? Or does that only work if the connection to the host is failing?
Thanks in advance for a quick response. Otherwise I'll have to fix this using an expect script.