Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node leases not being GC'd #84981

Open
shyamjvs opened this issue Nov 8, 2019 · 6 comments
Assignees

Comments

@shyamjvs
Copy link
Member

@shyamjvs shyamjvs commented Nov 8, 2019

I recently observed that node lease objects are staying around even after nodes got deleted. It seems like there is no reconcile logic for deleting orphaned node lease objects anywhere in the k8s control-plane. This is leading to stale objects piling up in etcd over time. Even worse if nodes are being created/deleted at a high rate.

Wojtek pointed me to kubelet code here where we come close to identifying this situation, but there also no lease cleanup is happening. Nevertheless, we need some controller to be doing this cleanup outside of kubelet (as kubelet can go poof anytime and node be deleted). Maybe node-lifecycle-controller?

@wojtek-t - can you share thoughts you were suggesting in our offline discussion?

@kubernetes/sig-scalability-bugs

@wojtek-t

This comment has been minimized.

Copy link
Member

@wojtek-t wojtek-t commented Nov 8, 2019

I think that at the kubelet level, we should introduce an invariant that Lease object is created after the Node object is created - i.e. by changing
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/nodelease/controller.go#L92

That said, given that we may have leaked some Leases already, we need to extend nodelifecyclecontroller to:

  • periodically list leases (from the local cache)
  • find those that doesn't have owner-refs set
  • for those if the corresponding Node exists, set the owner-refs, otherwise delete the Lease
@wojtek-t

This comment has been minimized.

Copy link
Member

@wojtek-t wojtek-t commented Nov 8, 2019

How does it going to help? NodeLifecycleController is currrently not updating leases (by design), so changing anything in its internal state won't change anything.

@wojtek-t

This comment has been minimized.

Copy link
Member

@wojtek-t wojtek-t commented Nov 8, 2019

Actually, I looked deeper into the code, and generally, even if we create the Lease without OwnerReferences, in the next update we will try to update it:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/nodelease/controller.go#L101

So the only situation when we will not try to update it is when going via the fallback path:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/nodelease/controller.go#L108

I'm not sure we really need to add the GC component, so the probability of it is really low
[we would have to always hit the "error path" for the whole lifetime of the kubelet]

I'm leaning towards adding a test that ensure that OwnerReferences are set and adding a condition that Lease will always be created with OwnerReferences.
I started working on that already in #84998

@wojtek-t

This comment has been minimized.

Copy link
Member

@wojtek-t wojtek-t commented Nov 8, 2019

@shyamjvs #84998 is generally ready modulo unit test that I would like to add for it

@shyamjvs

This comment has been minimized.

Copy link
Member Author

@shyamjvs shyamjvs commented Nov 9, 2019

/release-blocking

@wojtek-t

This comment has been minimized.

Copy link
Member

@wojtek-t wojtek-t commented Nov 14, 2019

#84998 was merged - I don't think the other part is release blocking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.