-
Notifications
You must be signed in to change notification settings - Fork 39k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apiserver lease garbage collector #95895
Conversation
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project. |
943ab29
to
6aa7787
Compare
/triage accepted |
78533eb
to
5d45512
Compare
Actually, even in that case. I don't think the identity system should have external-to-apiserver dependencies. |
It only manages apiserver leases. The reason for running the GC in KCM was that nothing in KAS today runs leader election-- but that's totally achievable. Leases of apiservers other than KAS:
I like the idea that identity system should have external-to-apiserver dependencies. I will update the PR to do that. |
Actually I won't add leader election to KAS. This GC has no retry logic, since all it does is a few GET and DELETE every hour. Adding leader election to KAS will cost more leader election traffic. |
Yeah we shouldn't need leader election for this, it is (or should be) safe to be run multiple times concurrently. |
5d45512
to
258bcd8
Compare
leaseInformer := informers.NewFilteredLeaseInformer( | ||
clientset, | ||
metav1.NamespaceSystem, | ||
12*time.Hour, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to resync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. Changed to 0
} | ||
selector = selector.Add(*r) | ||
|
||
leases, err := c.leaseLister.Leases(metav1.NamespaceSystem).List(selector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The informer has already filtered with the labelselector, do you need the selector here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the selector and added test cases to make sure we don't GC leases that we don't intend to manage
continue | ||
} | ||
if errors.IsNotFound(err) || lease == nil { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any component other than this GC expected to delete a lease? If not, let's log a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multiple GC controllers can delete the lease simultaneously. Added comments
// Copied from controlplane.IdentityLeaseComponentLabelKey and controlplane.KubeAPIServer | ||
// to avoid import cycle. | ||
identityLeaseComponentLabelKey = "k8s.io/component" | ||
kubeAPIServer = "kube-apiserver" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move them to a common package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made both the label selector and the namespace parameters to the gc controller. I think this way better reflects what the controller is capable of-- it operates on Leases in one namespace with a specific label set. Not saying we will make the code public/reusable, but we could if we want in future.
27111d3
to
ddaac64
Compare
possible github issue, please link if you run into this on other PR's kubernetes/test-infra#19910 |
@spiffxp Got it. Thanks |
@@ -117,6 +118,7 @@ func NewServerRunOptions() *ServerRunOptions { | |||
EndpointReconcilerType: string(reconcilers.LeaseEndpointReconcilerType), | |||
IdentityLeaseDurationSeconds: 3600, | |||
IdentityLeaseRenewIntervalSeconds: 10, | |||
IdentityLeaseGCCheckPeriodSeconds: 3600, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be separate from IdentityLeaseDurationSeconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. If we know all leases have the same duration (which is the case here), there is no point to configure the check period differently. Reverted
the knob made sense when/if:
- the controller lived in KCM-- not true anymore
- we want the controller to manage leases from aggregated servers (with different duration)-- the more I think about it, aggregated servers shouldn't rely on this GC controller:
a. they don't need heartbeat-based identity at all-- downward API plus owner reference is more efficient
b. if they really want, they can start a similar GC controller that operates their leases in a user namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just adding some thoughts why we don't need a separate flag) another benefit of having this flag is to make the GC more responsive-- the controller deletes a lease if the apiserver failed to refresh its lease in:
(leaseDuration, leaseDuration+checkPeriod]
seconds
instead of (without the flag)
(leaseDuration, 2*leaseDuration]
seconds
but that is not our goal. We prefer that expired Leases remain for a longer duration as opposed to collecting them quickly. That's why we choice a super long leaseDuration by default (1h, compared with 10s renewInterval). If people want to GC leases sooner after an apiserver is not active, they should start with shortening the leaseDuration. Having a separate flag for this level of control seems like an overkill for an alpha feature.
if err := c.kubeclientset.CoordinationV1().Leases(c.leaseNamespace).Delete(context.TODO(), lease.Name, metav1.DeleteOptions{}); err != nil && !errors.IsNotFound(err) { | ||
// If we get a 404, the lease was deleted by the same GC controller | ||
// in another apiserver | ||
if !errors.IsNotFound(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably not needed to call IsNotFound twice :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now I see why @caesarxuchao asks me to break the long lines.. this is what happens :)
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lavalamp, roycaihw The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit. Otherwise lgtm.
/lgtm
continue | ||
} | ||
if errors.IsNotFound(err) || lease == nil { | ||
// the lease was deleted by the same GC controller in another apiserver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you state this could be the case if it's an HA-master? Also state that we don't expect other components to delete the lease?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a klog.V(4)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. Also squashed the commits
cadc6c2
to
e363709
Compare
/lgtm |
Implements KEP. Split off from #95222.
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/sig api-machinery
/assign @caesarxuchao