rpc: add option to trigger a graceful shutdown if RPC certs expire while we're active #4761

Roasbeef · 2020-11-11T01:40:05Z

In a recent version of lnd, we added the --tlsautorefresh option, which rotates the certs on disk at startup if we detect that the old one has expired or is out of sync with the config (the specified extra domains, etc.). In certain container setups, it's also useful to optionally have lnd simply shut down if it detects that its certs have expired while it's running: assuming there's a hypervisor to restart the container/pod, lnd will have fully up-to-date certs after the restart.

Implementation-wise, this can likely be implemented as a new predicate in the healthcheck package.


murtyjones · 2020-11-21T15:00:27Z

Seeking some feedback on this before I attempt a PR.

It sounds like this new health check should be configurable by the user? If so, I'm assuming we'd add a new flag, e.g. --tlsautoshutdown.

Then in server.go, optionally include the new check with the others:

	chainHealthCheck := healthcheck.NewObservation(
		...
	)

	diskCheck := healthcheck.NewObservation(
		...
	)

	tlsHealthCheck := healthcheck.NewObservation(
		"tls",
		func() error {
			// Load the cert and return an error if it has
			// expired. The check itself doesn't shut lnd down:
			// the monitor calls Shutdown once all attempts fail.
		},
		cfg.HealthChecks.TLSCheck.Interval,
		cfg.HealthChecks.TLSCheck.Timeout,
		cfg.HealthChecks.TLSCheck.Backoff,
		cfg.HealthChecks.TLSCheck.Attempts,
	)

	checks := []*healthcheck.Observation{
		chainHealthCheck, diskCheck,
	}

	if s.cfg.TLSAutoShutdown {
		checks = append(checks, tlsHealthCheck)
	}

	// If we have not disabled all of our health checks, we create a
	// liveliness monitor with our configured checks.
	s.livelinessMonitor = healthcheck.NewMonitor(
		&healthcheck.Config{
			Checks:   checks,
			Shutdown: srvrLog.Criticalf,
		},
	)

@Roasbeef is that the right idea?

guggero · 2020-11-21T16:59:33Z

Yeah, that looks pretty much like how I'd approach this as well.
I don't think there needs to be an additional flag like --tlsautoshutdown, though: you would enable or disable this by setting --healthcheck.tlscheck.attempts to a non-zero value or to zero, the same way the disk and backend checks work.

Roasbeef · 2020-12-02T02:34:50Z

Fixed by #4792.

Roasbeef added the beginner, safety, and tls labels Nov 11, 2020

murtyjones mentioned this issue Nov 21, 2020

tls: Add healthcheck to shutdown if certificate is expired #4792

Merged

Roasbeef closed this as completed Dec 2, 2020
