Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: add option to trigger a graceful shutdown if RPC certs expire while we're active #4761

Closed
Roasbeef opened this issue Nov 11, 2020 · 3 comments
Labels
beginner Issues suitable for new developers safety General label for issues/PRs related to the safety of using the software tls

Comments

@Roasbeef
Copy link
Member

In a recent version of lnd, we added the --tlsautorefresh option which'll rotate any certs on disk on start up, if we detect that the old one was expired, or if we're not quite in sync with the config (the specified extra domains, etc). In certain container set ups, it's also useful to optionally have lnd just shutdown if it detects that its certs are expired, as assuming there's a hypervisor to restart the container/pod, then upon restart, lnd will have fully up to date certs.

Implementation-wise, this can likely be implemented as a new predicate in the healthcheck package.

@Roasbeef Roasbeef added beginner Issues suitable for new developers safety General label for issues/PRs related to the safety of using the software tls labels Nov 11, 2020
@murtyjones
Copy link
Contributor

Seeking some feedback on this before I attempt a PR.

It sounds like this new healthcheck is configurable by the user? In which case I'm assuming we'd add a new flag, e.g. --tlsautoshutdown.

Then in server.go, optionally include the new check with the others:

	chainHealthCheck := healthcheck.NewObservation(
		...
	)

	diskCheck := healthcheck.NewObservation(
		...
	)

	tlsHealthCheck := healthcheck.NewObservation(
		"tls",
		func() {
			// logic to shut down if certs expired
		},
		cfg.HealthChecks.TlsCheck.Interval,
		cfg.HealthChecks.TlsCheck.Timeout,
		cfg.HealthChecks.TlsCheck.Backoff,
		cfg.HealthChecks.TlsCheck.Attempts,
	)

	checks := []*healthcheck.Observation{
		chainHealthCheck, diskCheck,
	}

	if s.cfg.TLSAutoShutdown {
		checks = append(checks, tlsHealthCheck)
	}

	// If we have not disabled all of our health checks, we create a
	// liveliness monitor with our configured checks.
	s.livelinessMonitor = healthcheck.NewMonitor(
		&healthcheck.Config{
			Checks:   checks,
			Shutdown: srvrLog.Criticalf,
		},
	)

@Roasbeef is that the right idea?

@guggero
Copy link
Collaborator

guggero commented Nov 21, 2020

Yeah, that looks pretty much how I'd approach this as well.
I don't think there needs to be an additional flag like --tlsautoshutdown, though, as you would enable/disable this by setting --healthcheck.tlscheck.attempts either to non-zero or zero, the same way the disk and backend checks work.

@Roasbeef
Copy link
Member Author

Roasbeef commented Dec 2, 2020

Fixed by #4792.

@Roasbeef Roasbeef closed this as completed Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beginner Issues suitable for new developers safety General label for issues/PRs related to the safety of using the software tls
Projects
None yet
Development

No branches or pull requests

3 participants