Always set expiry on server resource in heartbeats #5008

awly · 2020-11-30T22:34:45Z

All servers (ssh/kube/app/db) should get cleaned up some time after
deletion. If a server implements GetServerInfo without populating
Expiry, default expiry to ServerTTL in the heartbeat code.

If we don't clean up deleted server objects, things like kube routing
will try to reach obsolete endpoints and users will forever see deleted
kube clusters in tsh.
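
For illustration, the defaulting described above could look roughly like the sketch below. It is not the code from this PR: the `server` type, the `ensureExpiry` helper, and the 10-minute `serverTTL` value are hypothetical stand-ins, and only the `Expiry().IsZero()` check mirrors the diff.

```go
package main

import (
	"fmt"
	"time"
)

// serverTTL stands in for the ServerTTL constant named in the description.
// The 10-minute value is an assumption made for this example.
const serverTTL = 10 * time.Minute

// server is a hypothetical, stripped-down stand-in for the server resource.
type server struct {
	name   string
	expiry time.Time
}

// Expiry returns the time after which the backend may drop the resource.
func (s *server) Expiry() time.Time { return s.expiry }

// SetExpiry sets the time after which the backend may drop the resource.
func (s *server) SetExpiry(t time.Time) { s.expiry = t }

// ensureExpiry (hypothetical helper) defaults the expiry to now+serverTTL,
// but only when the announcing server left it unset.
func ensureExpiry(s *server, now time.Time) {
	if s.Expiry().IsZero() {
		s.SetExpiry(now.Add(serverTTL))
	}
}

// epoch is a fixed instant so the example is deterministic.
var epoch = time.Date(2020, time.December, 1, 0, 0, 0, 0, time.UTC)

func main() {
	kube := &server{name: "kube-1"} // expiry left unset by the announcer
	ensureExpiry(kube, epoch)
	fmt.Println("expires in:", kube.Expiry().Sub(epoch))
}
```

An expiry that is already set is left alone, so servers that populate Expiry themselves would be unaffected by a default like this.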

russjones · 2020-12-01T01:48:39Z

lib/srv/heartbeat.go

@@ -313,6 +313,11 @@ func (h *Heartbeat) fetch() error {
 		h.reset(HeartbeatStateInit)
 		return trace.Wrap(err)
 	}
+	// Always set server expiry, no server resource should linger forever after
+	// deletion.
+	if server.Expiry().IsZero() {


Shouldn't this be logic internal to services.Server? Why not add this to CheckAndSetDefaults for a services.Server?

a-palchikov · 2020-12-01

I only skimmed the implementations of GetServerInfo, but some return references to internal values while others generate a new value, so there is a chance this will race, since the heartbeat's Run executes in a separate goroutine.
Also, it looks like only TLSServer.GetServerInfo in lib/kube/proxy does not update the TTL. Should the fix go there instead, since that code has more context and can protect the state if necessary?
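
To make the aliasing concern concrete, here is a minimal sketch of the two styles of accessor. The `announcer` and `serverInfo` types and both method names are hypothetical and are not taken from the Teleport code base; the point is only that a caller such as the heartbeat can safely modify a copy, while a write through a shared pointer changes the owner's state.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// serverInfo is a hypothetical stand-in for the server resource.
type serverInfo struct {
	Name   string
	Expiry time.Time
}

// announcer is a hypothetical component that owns a serverInfo and exposes
// it to the heartbeat.
type announcer struct {
	mu   sync.Mutex
	info serverInfo
}

// sharedInfo returns a pointer into internal state. A caller that writes
// through it from another goroutine races with the owner.
func (a *announcer) sharedInfo() *serverInfo { return &a.info }

// copiedInfo returns a fresh value taken under the lock. A shallow copy is
// enough here only because serverInfo has no slice, map, or pointer fields.
func (a *announcer) copiedInfo() *serverInfo {
	a.mu.Lock()
	defer a.mu.Unlock()
	c := a.info
	return &c
}

// epoch is a fixed instant so the example is deterministic.
var epoch = time.Date(2020, time.December, 1, 0, 0, 0, 0, time.UTC)

func main() {
	a := &announcer{info: serverInfo{Name: "kube-1"}}

	snapshot := a.copiedInfo()
	snapshot.Expiry = epoch // touches only the copy
	fmt.Println("owner unchanged after writing to copy:", a.info.Expiry.IsZero())

	alias := a.sharedInfo()
	alias.Expiry = epoch // writes straight into the owner's state
	fmt.Println("owner changed after writing to alias:", a.info.Expiry.Equal(epoch))
}
```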

awly · 2020-12-01

Good point @a-palchikov, I didn't think about GetServerInfo returning data with references to shared memory.

I tried making a deep-copy on the heartbeat side, but for some reason proto.Clone fails.
I tried setting the expiry in services.Server.CheckAndSetDefaults, but it broke a bunch of tests and also resulted in lots of plumbing changes to pass clockwork.Clock in.

Changed the PR to only update GetServerInfo in lib/kube/proxy for now and set Expiry like all the other servers do. At some point, I do want to centralize this logic, since all servers use the same constant TTL.
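
As a rough sketch of that approach (not the actual change in lib/kube/proxy), the server can stamp the expiry itself each time it builds its announcement. Everything below is hypothetical: the `kubeServer` and `serverInfo` types, the simplified `GetServerInfo` signature, the small `clock` interface standing in for an injected clock such as clockwork.Clock, and the 10-minute `serverTTL` value.

```go
package main

import (
	"fmt"
	"time"
)

// serverTTL stands in for the shared TTL constant; the value is assumed.
const serverTTL = 10 * time.Minute

// clock is a minimal hypothetical interface playing the role of an
// injected clock, so tests can control the current time.
type clock interface {
	Now() time.Time
}

// fixedClock always reports the same instant.
type fixedClock struct{ t time.Time }

func (c fixedClock) Now() time.Time { return c.t }

// serverInfo is a hypothetical stand-in for the server resource.
type serverInfo struct {
	Name   string
	Expiry time.Time
}

// kubeServer is a hypothetical stand-in for the kube proxy server.
type kubeServer struct {
	name  string
	clock clock
}

// GetServerInfo builds a fresh value on every call and stamps it with
// now+serverTTL, so each heartbeat pushes the expiry forward and no
// internal state is shared with the caller.
func (k *kubeServer) GetServerInfo() *serverInfo {
	return &serverInfo{
		Name:   k.name,
		Expiry: k.clock.Now().Add(serverTTL),
	}
}

// epoch is a fixed instant so the example is deterministic.
var epoch = time.Date(2020, time.December, 1, 0, 0, 0, 0, time.UTC)

func main() {
	k := &kubeServer{name: "kube-1", clock: fixedClock{t: epoch}}
	info := k.GetServerInfo()
	fmt.Println(info.Name, "expires in", info.Expiry.Sub(epoch))
}
```

Once the server stops heartbeating, the last announced expiry passes and the backend can drop the record.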

a-palchikov · 2020-12-03

> I tried making a deep-copy on the heartbeat side, but for some reason proto.Clone fails.

This might be related to this work. Which error did you get from proto.Clone?

awly · 2020-12-03

The error was also reflect-related, but different.
Unfortunately, I don't have it on hand right now.

Without this, deleted kube_services linger in the backend and show up as obsolete kubernetes clusters in tsh. Ideally, this TTL logic should be enforced centrally, but I'd like to fix the bug first, and do a larger refactoring later.

awly added the backport-required label Nov 30, 2020

awly added this to the 5.0.1 milestone Nov 30, 2020

awly requested review from a-palchikov, fspmarshall, klizhentas, r0mant, russjones and webvictim as code owners November 30, 2020 22:34

r0mant approved these changes Dec 1, 2020

awly force-pushed the andrew/expire-kube-service branch 2 times, most recently from e630208 to c35db3c Compare December 1, 2020 01:42

russjones reviewed Dec 1, 2020

awly force-pushed the andrew/expire-kube-service branch from c35db3c to adf8f27 Compare December 1, 2020 21:25

awly requested a review from russjones December 1, 2020 21:29

awly mentioned this pull request Dec 3, 2020

InternalError when accessing kubernetes cluster with kubernetes service #5031

Closed

a-palchikov approved these changes Dec 3, 2020

Set TTL on kube_service resources

feb4ef0

awly force-pushed the andrew/expire-kube-service branch from adf8f27 to feb4ef0 Compare December 3, 2020 23:35

awly merged commit 11f5dc6 into master Dec 3, 2020

awly deleted the andrew/expire-kube-service branch December 3, 2020 23:51

awly mentioned this pull request Dec 3, 2020

Backport: set TTL on kube_service resources #5049

Merged
