Always set expiry on server resource in heartbeats #5008

Merged
merged 1 commit into from Dec 3, 2020


@awly awly commented Nov 30, 2020

All servers (ssh/kube/app/db) should get cleaned up some time after
deletion. If a server implements GetServerInfo without populating
Expiry, default expiry to ServerTTL in the heartbeat code.

If we don't clean up deleted server objects, things like kube routing
will try to reach obsolete endpoints and users will forever see deleted
kube clusters in tsh.

lib/srv/heartbeat.go (outdated)
@awly awly force-pushed the andrew/expire-kube-service branch 2 times, most recently from e630208 to c35db3c Compare December 1, 2020 01:42
@@ -313,6 +313,11 @@ func (h *Heartbeat) fetch() error {
		h.reset(HeartbeatStateInit)
		return trace.Wrap(err)
	}
	// Always set server expiry, no server resource should linger forever after
	// deletion.
	if server.Expiry().IsZero() {
Contributor

Shouldn't this be logic internal to services.Server? Why not add this to CheckAndSetDefaults for a services.Server?

Contributor

I only skimmed through the implementations of GetServerInfo but some do return references to internal values while others generate a new value instead, so there's a chance that this will race as the heartbeat's Run is executed in a separate goroutine.
Also, it looks like only the lib/kube/proxy#TLSServer.GetServerInfo does not update the TTL - should it be done there instead since it has more context and can protect the state if necessary?

Contributor Author

Good point @a-palchikov, didn't think about GetServerInfo returning data with references to shared memory.

I tried making a deep-copy on the heartbeat side, but for some reason proto.Clone fails.
I tried setting the expiry in services.Server.CheckAndSetDefaults, but it broke a bunch of tests and also resulted in lots of plumbing changes to pass clockwork.Clock in.

Changed the PR to only update GetServerInfo in lib/kube/proxy for now and set Expiry like all the other servers do. At some point, I do want to centralize this logic, since all servers use the same constant TTL.

Contributor

> I tried making a deep-copy on the heartbeat side, but for some reason proto.Clone fails.

This might be related to this work. Which error did you get from proto.Clone?

Contributor Author

The error was also reflect-related, but different.
I don't have it on hand right now, unfortunately.

Without this, deleted kube_services linger in the backend and show up as
obsolete kubernetes clusters in tsh.

Ideally, this TTL logic should be enforced centrally, but I'd like to
fix the bug first, and do a larger refactoring later.
@awly awly force-pushed the andrew/expire-kube-service branch from adf8f27 to feb4ef0 Compare December 3, 2020 23:35
@awly awly merged commit 11f5dc6 into master Dec 3, 2020
@awly awly deleted the andrew/expire-kube-service branch December 3, 2020 23:51
awly pushed a commit that referenced this pull request Dec 3, 2020
awly pushed a commit that referenced this pull request Dec 8, 2020
4 participants