Consul exec silently fails to wait for command to complete #2757

Closed
richardlarocque opened this issue Feb 17, 2017 · 2 comments
Labels
type/bug Feature does not function as expected

Comments

@richardlarocque

richardlarocque commented Feb 17, 2017

consul version for both Client and Server

Client: v0.6.4
Server: v0.6.4

consul info for both Client and Server

N/A.

Operating system and Environment details

Ubuntu 14.04

Description of the Issue (and unexpected/desired result)

We use consul exec to run jobs that sometimes take a long time (hours) to complete. We expect consul exec to block until this execution has completed on all nodes. This usually works as expected.

I witnessed one incident where a network hiccup caused Consul to lose quorum and elect a new leader while consul exec was in progress. (The flood of logs related to this event can be shared on request.)

Then consul exec emitted the following messages:

Session renew failed: Unexpected response code: 500
0 / 1 node(s) completed / acknowledged

and returned an exit status of 0.

Looking at the code, I see a plausible explanation. We hit the timeout branch in the big for loop, which leads to a break statement that ends the loop. At the very end, the logic checks whether any command returned a non-zero exit status. But no commands had completed, so there were no exit statuses, zero or otherwise.

I think that function should have another if branch to return a non-zero status if exitCount < ackCount.
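
To make the proposal concrete, here is a minimal sketch of the exit-status decision with the extra branch. It is not the actual Consul code: the execOutcome type, the exitStatus function, and the specific return values 1 and 2 are hypothetical names and numbers chosen for illustration. Only exitCount and ackCount come from the description above.

```go
package main

// execOutcome is a hypothetical summary of what the wait loop had seen
// when it stopped (on completion or on timeout).
type execOutcome struct {
	ackCount  int  // nodes that acknowledged the job
	exitCount int  // nodes that reported an exit status
	badExit   bool // true if any reported exit status was non-zero
}

// exitStatus picks the process exit code once the wait loop has ended.
// The return values 1 and 2 are assumptions made for this sketch.
func exitStatus(o execOutcome) int {
	// Existing behavior as described: fail if any command failed.
	if o.badExit {
		return 2
	}
	// Proposed extra branch: some nodes acknowledged the job but never
	// reported an exit status, so the run cannot be called a success.
	if o.exitCount < o.ackCount {
		return 1
	}
	return 0
}
```

With the values from the incident ("0 / 1 node(s) completed / acknowledged"), exitStatus(execOutcome{ackCount: 1, exitCount: 0}) returns 1 in this sketch instead of 0.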

Reproduction steps

N/A

Log Fragments or Link to gist

More detailed logs available on request.

@slackpad
Contributor

Hi @richardlarocque thanks for the issue - seems like a legit bug.

@slackpad slackpad added the type/bug Feature does not function as expected label Apr 12, 2017
@slackpad slackpad added this to the Triaged milestone Apr 12, 2017
@deckarep
Contributor

I have attempted to submit a patch that fixes the issue above by adding a check for exitCount < ackCount in the timeout branch.

@slackpad slackpad removed this from the Triaged milestone Apr 18, 2017