Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"consul lock" process doesn't handle signals until lock is acquired #4439

Closed
JohnKiller opened this issue Jul 24, 2018 · 13 comments
Closed

"consul lock" process doesn't handle signals until lock is acquired #4439

JohnKiller opened this issue Jul 24, 2018 · 13 comments
Labels
type/bug Feature does not function as expected

Comments

@JohnKiller
Copy link

Overview of the Issue

If you start "consul lock" it will attach signal handlers so it can pass them to the child process. However, until the lock is acquired, every signal is discarded, so there isn't any graceful way for stopping it.

Reproduction Steps

Run "consul lock" on two nodes. On the node that didn't acquire the lock, try pressing CTRL+C for killing it. It won't close.

Operating system and Environment details

Ubuntu 18 LTS, nothing fancy

@JohnKiller
Copy link
Author

I'm opening this issue because I'm using "consul lock" on a systemd service. If Consul has the lock, everything works fine. However, if I want to stop the service, and Consul doesn't currently hold the lock, Systemd will timeout and SIGTERM it

@mkeeler mkeeler added the type/bug Feature does not function as expected label Jul 26, 2018
@mkeeler
Copy link
Member

mkeeler commented Jul 26, 2018

@JohnKiller Thanks for the report. It sounds perfectly reasonable to expect consul lock to still handle signals and exit even when the lock isn't held.

@JohnKiller
Copy link
Author

I don't know GO very well, but I've noticed that in the watch command there is this line:

func New(ui cli.Ui, shutdownCh <-chan struct{}) *cmd {
c := &cmd{UI: ui, shutdownCh: shutdownCh}
c.init()
return c
}

which in lock is missing:

func New(ui cli.Ui) *cmd {
c := &cmd{UI: ui}
c.init()
return c
}

However that variable is referenced here:

consul/command/lock/lock.go

Lines 207 to 212 in 8cdba96

// Monitor for shutdown, child termination, or lock loss
select {
case <-c.ShutdownCh:
if c.verbose {
c.UI.Info("Shutdown triggered, killing child")
}

So maybe it's just missing that. Thanks

@mkeeler
Copy link
Member

mkeeler commented Jul 26, 2018

@JohnKiller Maybe you know more GO than you thought. That is a good catch and certainly looks suspect.

@mkeeler mkeeler added the good first issue A well-defined bug or improvement with sufficient context which should be approachable for new contr label Jul 26, 2018
@mkeeler
Copy link
Member

mkeeler commented Jul 26, 2018

So I looked into it a bit and that is part of the issue. Right now not having a shutdown chan means that it will just keep issuing the lock until it gets it.

There is a second piece to note in that the blocking query issued to consul to gain the lock could block for up to 15 seconds.

@JohnKiller do you know how long before systemd times out and issues a sigterm

@JohnKiller
Copy link
Author

Default is 90s. Until timeout, it will just keep trying to get the lock, so I did try another thing:

  • Start lock 2 times
  • Press CTRL+C on the waiting node (no effect)
  • Press CTRL+C on the running node (closes process and correctly releases lock)
  • The other one gets the lock, but won't close
  • Press CTRL+C again, it now works

This means that there is in fact a signal handler that just discards everything.

@JohnKiller
Copy link
Author

Forked the repo, made the changes to have ShutdownCh with MakeShutdownCh() and now I observe two things:

  • On the process which helds the lock, 3 signals gets sent to the child process instead of one
  • On the other process, the shutdown works but only after the timeout you mentioned.

Any suggestion on where to look?

@roback
Copy link

roback commented Feb 25, 2019

We are having the same problem.

Any ideas when this will be fixed? (I'm unfortunately not fluent in GO myself)

@JohnKiller
Copy link
Author

Sorry, i did not dig further since it's behind my possibilities. My workaround is a SIGKILL after a timeout.

@freddygv freddygv removed the good first issue A well-defined bug or improvement with sufficient context which should be approachable for new contr label Oct 4, 2019
@freddygv
Copy link
Contributor

Fixed by #5909

@JohnKiller
Copy link
Author

Hi @freddygv is this backported to 1.6 or should I wait for 1.7? Thanks

@freddygv
Copy link
Contributor

freddygv commented Feb 4, 2020

Hi @JohnKiller I just saw that it didn't get backported to 1.6.3. That means it will be in 1.7, which is coming very soon.

@JohnKiller
Copy link
Author

OK, just upgraded to 1.7.0 and the fix is working.
However, this is the output:

Setting up lock at path: test/.lock
Attempting lock acquisition
^CShutdown triggered or timeout during lock acquisition

Is there a way to abort immediately instead of waiting the lock timeout? It still took about 10 seconds to quit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Feature does not function as expected
Projects
None yet
Development

No branches or pull requests

4 participants