Failures in parallel tasks do not stop execution #457
No, that's not the intended workflow, unless I'm mis-remembering the intent (paging @goosemo to aisle 457 -- he might remember more). I'll check it out -- thanks for the report.
As it's written, the behavior you're seeing is how it's meant to happen. Return codes are essentially ignored, so that if one host's execution dies, it doesn't knock over the whole task. When tasks have dependencies, I've always either kept them in a single task, or had the second, subordinate task check whether the prerequisite task did what was needed.
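For illustration, a minimal sketch of that guard-check pattern in a Fabric 1.x fabfile might look like the following (the artifact path and restart command are hypothetical):

```python
from fabric.api import parallel, run, settings

@parallel
def update():
    # Guard: confirm the prerequisite upload actually landed on this
    # host before acting, since a failure elsewhere in a parallel run
    # won't stop this task from executing here.
    with settings(warn_only=True):
        check = run("test -f /srv/app/artifact.tar.gz")  # hypothetical path
    if check.failed:
        return  # skip this host; the artifact never arrived
    run("service app restart")  # hypothetical restart command
```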
Just chatted with @goosemo on IRC about this, @sdcooke. When he wrote the original implementation, his use case was of the "I'm running on a ton of hosts and don't want one or two failures to sink the entire run" variety. However, to be consistent with Fabric's original "fail fast" philosophy, the default behavior needs to be what you and I discussed above -- a parallelized task should abort the overall run when any of its hosts fail. We may need an additional setting to allow overriding this (i.e. at a higher level than the existing warn_only-style settings).
Yes, I suppose it's two different use cases. In our case, if one host fails we'll end up with inconsistent code running across our servers, which we want to avoid. In the case I posted above, `test_thing2` actually ran on both hosts even though the previous task had failed. That makes it possible to end up in situations where steps 1 and 3 of a deploy have run even though step 2 didn't. Thanks guys!
I've patched this, and if all looks good it'll look like this (commit link). Also, here is the branch (branch link).
That's great -- works exactly as I'd expect it to. Thanks guys.
Hi guys, I would like to talk about this issue again (yeah, I'm ⛏). Here is my use case for Fabric's parallel paradigm:

```python
from fabric.api import env, execute, parallel, runs_once

# assumed definition; the original snippet left amount_servers undefined
amount_servers = len(env.roledefs['servers'])

@runs_once
def deploy():
    execute(upload)
    execute(update)

@parallel
def upload():
    # upload app artifact
    pass

@parallel(pool_size=amount_servers/4)
def update():
    # stop and start app
    pass
```

It is called like the following:

```
> fab -R servers deploy
```

I would expect Fabric to stop the parallel execution of the `update` task as soon as it fails on one host. That's quite problematic if the app could not start on the first pool of servers: Fabric will keep stopping and restarting the app on the remaining pools anyway. Do you have an idea on how to solve this?
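One way to approximate that behavior without patching Fabric is to have each phase report success through its return value and gate the later `execute()` call on the results; `execute()` in Fabric 1.x returns a dict mapping each host to the task's return value. A minimal sketch, assuming `upload` is rewritten to run with `warn_only` and return a boolean (the upload command itself is hypothetical):

```python
from fabric.api import abort, execute, parallel, run, runs_once, settings

@parallel
def upload():
    # warn_only turns a failed command into a result we can inspect
    # instead of an immediate abort.
    with settings(warn_only=True):
        result = run("./upload-artifact.sh")  # hypothetical command
    return not result.failed

@runs_once
def deploy():
    results = execute(upload)  # {host: return value}
    if not all(results.values()):
        abort("upload failed on at least one host; not running update")
    execute(update)  # update as defined in the snippet above
```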
Ping @bitprophet @goosemo -- should I open a new issue with my comment above?
Having a similar issue. +1 on a solution.
It turns out this PR fixes one, but not all, of my problems. My issue is that the instances are attached to a load balancer on AWS and can be terminated at random. In testing, it seems the terminated instance does not cause a failure; Fabric just hangs in perpetuity, waiting for the now non-existent host to finish.
@jsalva Interesting -- are they being killed in the middle of a workload? Or is it more that the host list is stale at the time of execution?
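As a partial hedge against that kind of hang (nothing in this thread confirms it as a complete fix), Fabric 1.x exposes two timeout settings that can bound how long a run waits on a dead host: `env.timeout` for the connection attempt and, from Fabric 1.6 on, `env.command_timeout` for each remote command.

```python
from fabric.api import env

# Bound how long Fabric waits on an unresponsive host so a terminated
# instance fails the run instead of hanging it forever.
env.timeout = 10           # seconds to wait when establishing a connection
env.command_timeout = 300  # seconds to wait for any single run()/sudo() call
```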
Hi @goosemo, oh that's a nice PR. It does exactly what we need at @captaintrain! Thanks!

Here's the output without the PR:

```
> fab -H local1,local2 deploy
[local1] Executing task 'deploy'
[local1] Executing task 'upload'
[local2] Executing task 'upload'
[localhost] local: # hello upload
[localhost] local: # hello upload
[local1] Executing task 'update'
[local2] Executing task 'update'
[localhost] local: # hello update
[localhost] local: # hello update   < I DON'T want this second one to run, as the first update fails
Fatal error: One or more hosts failed while executing task 'update'
Aborting.
```

And now the output with the PR and the `--parallel-exit-on-errors` flag:

```
> fab --parallel-exit-on-errors -H local1,local2 deploy
[local1] Executing task 'deploy'
[local1] Executing task 'upload'
[local2] Executing task 'upload'
[localhost] local: # hello upload
[localhost] local: # hello upload
[local1] Executing task 'update'
[local2] Executing task 'update'
[localhost] local: # hello update   < \o/ It ran only once!
Fatal error: One or more hosts failed while executing task 'update'
Aborting.
```
@goosemo I'm intentionally terminating the hosts mid run()/sudo(), so mid-workload rather than stale hosts.
It looks like we could, in principle, detect the state where `transport.send_ignore()` raises a `ProxyCommandFailure` and use that to signal that the connection is closed, terminating the queued process with a failed status code of some sort.
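A minimal sketch of that detection idea, assuming direct access to the underlying Paramiko transport (this is not something Fabric does today, just an illustration of the probe):

```python
from paramiko.ssh_exception import ProxyCommandFailure

def connection_alive(transport):
    """Probe an SSH transport; return False if the peer is gone."""
    if transport is None or not transport.is_active():
        return False
    try:
        # send_ignore() pushes a throwaway SSH packet; a dead or
        # proxy-severed connection raises instead of returning.
        transport.send_ignore()
        return True
    except (EOFError, ProxyCommandFailure):
        return False
```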
The following test case runs `test.sh`, which returns an error exit code. Without the `@parallel` decorator, Fabric aborts and exits as expected after the first failure on the first host (there are two hosts in the 'web_workers' role). With the `@parallel` decorator, the `test_thing` task exits before running `date`, but `test_thing2` is then run.

I'm assuming this isn't the intended workflow; if it is, or if I'm making a mistake, I'd be interested to know what it is.
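The test case snippet itself isn't shown above; a hypothetical fabfile matching the description (the script, role, and task names come from the text, the rest is assumed) might look like:

```python
from fabric.api import parallel, roles, run

@parallel
@roles('web_workers')
def test_thing():
    run("./test.sh")  # exits non-zero, so this task fails here
    run("date")       # never reached on a failing host

@parallel
@roles('web_workers')
def test_thing2():
    # Reported behavior: this still runs on both hosts even though
    # test_thing failed -- the subject of this issue.
    run("date")
```

Invoked as something like `fab test_thing test_thing2`.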