"consul lock" process doesn't handle signals until lock is acquired #4439
Comments
I'm opening this issue because I'm using "consul lock" in a systemd service. If Consul has the lock, everything works fine. However, if I want to stop the service and Consul doesn't currently hold the lock, systemd will time out and SIGTERM it |
@JohnKiller Thanks for the report. It sounds perfectly reasonable to expect |
I don't know Go very well, but I've noticed that in the Lines 20 to 24 in 8cdba96
which in Lines 64 to 68 in 8cdba96
However that variable is referenced here: Lines 207 to 212 in 8cdba96
So maybe it's just missing that. Thanks |
@JohnKiller Maybe you know more Go than you thought. That is a good catch and certainly looks suspect. |
So I looked into it a bit, and that is part of the issue. Right now, not having a shutdown channel means that it will just keep issuing the lock request until it gets the lock. A second piece to note is that the blocking query issued to Consul to acquire the lock could block for up to 15 seconds. @JohnKiller, do you know how long it takes before systemd times out and issues a SIGTERM? |
Default is 90s. Until the timeout, it will just keep trying to get the lock, so I tried another thing:
This means that there is in fact a signal handler that just discards everything. |
Forked the repo, made the changes to have
Any suggestion on where to look? |
We are having the same problem. Any idea when this will be fixed? (Unfortunately, I'm not fluent in Go myself.) |
Sorry, I did not dig further since it's beyond my abilities. My workaround is a SIGKILL after a timeout. |
Fixed by #5909 |
Hi @freddygv is this backported to 1.6 or should I wait for 1.7? Thanks |
Hi @JohnKiller I just saw that it didn't get backported to 1.6.3. That means it will be in 1.7, which is coming very soon. |
OK, just upgraded to 1.7.0 and the fix is working.
Is there a way to abort immediately instead of waiting for the lock timeout? It still took about 10 seconds to quit.
Overview of the Issue
When you start "consul lock", it attaches signal handlers so it can pass signals to the child process. However, until the lock is acquired, every signal is discarded, so there is no graceful way to stop it.
Reproduction Steps
Run "consul lock" on two nodes. On the node that didn't acquire the lock, press CTRL+C to kill it. It won't exit.
Operating system and Environment details
Ubuntu 18 LTS, nothing fancy