Kubelet creates and manages node leases #66257
Conversation
[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process
pkg/kubelet/nodelease/controller.go
Outdated
	break
}
// log error
glog.Errorf("failed to ensure node lease exists, error: %v", err)
if we calculate the sleep duration first, we can log the sleep time in the error message which is sometimes useful to see how long something is in a backoff loop.
done
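For reference, a minimal sketch of the suggested pattern, using the injected clock discussed further down; ensureLease, minDuration, and maxBackoff are illustrative names, not necessarily the real controller code:

sleep := 100 * time.Millisecond
for {
	err := c.ensureLease()
	if err == nil {
		break
	}
	// compute the next backoff before logging, so the error line shows
	// how long this iteration will wait
	sleep = minDuration(2*sleep, maxBackoff)
	glog.Errorf("failed to ensure node lease exists, will retry in %v, error: %v", sleep, err)
	c.clock.Sleep(sleep)
}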
pkg/master/controller.go
Outdated
@@ -92,7 +92,7 @@ func (c *completedConfig) NewBootstrapController(legacyRESTStorage corerest.Lega
 	EndpointReconciler: c.ExtraConfig.EndpointReconcilerConfig.Reconciler,
 	EndpointInterval:   c.ExtraConfig.EndpointReconcilerConfig.Interval,

-	SystemNamespaces: []string{metav1.NamespaceSystem, metav1.NamespacePublic},
+	SystemNamespaces: []string{metav1.NamespaceSystem, metav1.NamespacePublic, metav1.NamespaceNodeLease},
only if the feature is enabled
done
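A hedged sketch of the gated version, assuming the usual feature-gate helpers (utilfeature from k8s.io/apiserver/pkg/util/feature and a NodeLease entry in pkg/features); the actual wiring in the bootstrap controller may differ:

systemNamespaces := []string{metav1.NamespaceSystem, metav1.NamespacePublic}
if utilfeature.DefaultFeatureGate.Enabled(features.NodeLease) {
	// only reserve kube-node-lease when the NodeLease feature is enabled
	systemNamespaces = append(systemNamespaces, metav1.NamespaceNodeLease)
}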
@@ -282,6 +282,8 @@ const (
 	NamespaceSystem string = "kube-system"
 	// NamespacePublic is the namespace where we place public info (ConfigMaps)
 	NamespacePublic string = "kube-public"
+	// NamespaceNodeLease is the namespace where we place node lease objects (used for node heartbeats)
+	NamespaceNodeLease string = "kube-node-lease"
would we only put leases in this namespace? would this be a logical place to put configmaps with node config as well?
whatever name is selected here needs to be communicated well in advance, so cluster operators can reserve the namespace for system use. it does raise an interesting question about what namespaces the kube system components are entitled to claim in the future. anything prefixed with kube-? anything prefixed with kube-system?
also, this doesn't really belong in meta/v1. it's fundamental to kube-apiserver and kubelet, but not to apimachinery (same really goes for kube-system and kube-public, but we can deal with those separately)
kube-node would make sense to store node configmaps and leases. Thoughts?
would we only put leases in this namespace?
My understanding is just leases. This is what the KEP says:
for each Node there will be a corresponding Lease object with Name equal to Node name in a newly created dedicated namespace (we considered using kube-system namespace but decided that it's already too overloaded). That namespace should be created automatically (similarly to "default" and "kube-system", probably by NodeController) and never be deleted (so that nodes don't require permission for it).
Which implies you just use it for node leases, because not prefixing the lease names means you can't cleanly add any other type of lease to that namespace in the future.
needs to be communicated well in advance
My understanding is that we had already reserved the kube- prefix for namespaces? Maybe this isn't true? @thockin do you know?
this doesn't really belong in meta/v1
That's fair, but then where should we put the constants for standard system namespaces? Multiple consumers need them, so they should be in a common place.
Maybe just core/v1?
looks like the others are already in pkg/apis/core/types.go ... there or in core/v1/types.go is better than meta/v1
I talked to @bgrant0607 about kube-, given that this is the convention we think we should just explicitly reserve it: kubernetes/community#2379
Why aren't the other ones in core/v1 in addition to core? Just because they're in meta/v1 instead (odd, as you said)?
Should we put a constant for the node lease namespace in both core and core/v1?
moved to both for now
pkg/kubelet/nodelease/controller.go
Outdated
// log error
glog.Errorf("failed to ensure node lease exists, error: %v", err)
// backoff wait
time.Sleep(sleep)
is there a reason we're not using the injected clock here?
oops, good catch!
done
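The fix is mechanical; a sketch, assuming the controller already holds a clock.Clock from k8s.io/apimachinery/pkg/util/clock:

// before: real sleep, which unit tests can't fake
time.Sleep(sleep)
// after: sleep via the injected clock, so tests can advance time instantly
c.clock.Sleep(sleep)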
pkg/kubelet/nodelease/controller.go
Outdated
glog.Errorf("failed to ensure node lease exists, error: %v", err)
// backoff wait
time.Sleep(sleep)
sleep = minDuration(2*sleep, 7*time.Second)
7 seconds hard-coded? this looks weird...
I can make it a constant. I'm just keeping it consistent with what node registration does, for lack of a concrete principle to actually base the number on.
Not sure these need to match the node registration exactly, but if you want to, factoring them out to use in both places sounds good.
I'd rather keep them decoupled, but just set to the same number for now.
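A sketch of the decoupled-but-equal approach; the constant name is illustrative, and minDuration is the helper already used in the snippet above:

const (
	// maxBackoff caps the retry backoff for ensuring the lease exists.
	// Currently the same value as node registration's cap, but kept as a
	// separate constant so the two can diverge independently.
	maxBackoff = 7 * time.Second
)

func minDuration(a, b time.Duration) time.Duration {
	if a < b {
		return a
	}
	return b
}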
lease, err := c.client.Get(c.holderIdentity, metav1.GetOptions{})
if apierrors.IsNotFound(err) {
	// lease does not exist, create it
	lease, err := c.client.Create(c.newLease(nil))
is this working today in a test cluster with authz enabled? I expected to see modifications to the Node authorizer and NodeRestriction admission plugin to let kubelets mess with their own leases
Good point, I forgot about that. I will add those changes to this PR, and it probably makes sense to ensure the e2e test also runs in the standard e2e suite, instead of just the e2e node suite.
We also don't currently run the node authorizer/node restriction in the e2e node tests. Maybe I should revisit #60172?
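For context, the restriction being asked for pins each kubelet to the lease matching its own node name. A hedged sketch of the shape of that check (illustrative, not the actual NodeRestriction plugin code; admission.Attributes is from k8s.io/apiserver/pkg/admission):

// a kubelet authenticated as system:node:<nodeName> should only be able to
// get/create/update the Lease named after its own node, in the lease namespace
func leaseAllowedForNode(nodeName string, a admission.Attributes) bool {
	return a.GetNamespace() == "kube-node-lease" && a.GetName() == nodeName
}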
lease, err := c.client.Get(c.holderIdentity, metav1.GetOptions{})
if apierrors.IsNotFound(err) {
	// lease does not exist, create it
	lease, err := c.client.Create(c.newLease(nil))
Do these client calls include a request timeout?
heartbeatClient is explicitly constructed with a timeout to avoid eternal hangs, and a distinct QPS to avoid starvation by other kube API calls, but is currently fixed to corev1. need to adapt/widen that for use here
I believe that is configured on the client before it is passed to the controller.
If we want the timeout to match the heartbeat frequency, then we need a distinct client, since the Lease-based heartbeat potentially has a different frequency than the node status update, right?
Taking the shorter of the two would probably be fine
done
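A minimal sketch of "the shorter of the two"; nodeStatusInterval and leaseInterval stand in for the two heartbeat frequencies discussed above, and the rest is ordinary client-go plumbing (restclient = k8s.io/client-go/rest, clientset = k8s.io/client-go/kubernetes), not the exact kubelet code:

heartbeatConfig := restclient.CopyConfig(clientConfig)
// the request timeout is the shorter of the node-status and lease heartbeat
// intervals, so a hung call can't outlive either heartbeat period
heartbeatConfig.Timeout = nodeStatusInterval
if leaseInterval < heartbeatConfig.Timeout {
	heartbeatConfig.Timeout = leaseInterval
}
heartbeatClient, err := clientset.NewForConfig(heartbeatConfig)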
pkg/kubelet/kubelet.go
Outdated
@@ -833,6 +834,9 @@ func NewMainKubelet(kubeCfg *kubeletconfiginternal.KubeletConfiguration,
 	klet.appArmorValidator = apparmor.NewValidator(containerRuntime)
 	klet.softAdmitHandlers.AddPodAdmitHandler(lifecycle.NewAppArmorAdmitHandler(klet.appArmorValidator))
 	klet.softAdmitHandlers.AddPodAdmitHandler(lifecycle.NewNoNewPrivsAdmitHandler(klet.containerRuntime))
+
+	klet.nodeLeaseController = nodelease.NewController(klet.clock, klet.kubeClient, string(klet.nodeName), kubeCfg.NodeLeaseDurationSeconds, klet.onRepeatedHeartbeatFailure)
this is expecting to use the node name as the name of the lease object. Is that compatible with the currently allowed names of the lease objects? I can't remember if the initial PR limited them to single dns labels (e.g. no "." characters). If so, we likely want to relax that
// ValidateLease validates a Lease.
func ValidateLease(lease *coordination.Lease) field.ErrorList {
allErrs := validation.ValidateObjectMeta(&lease.ObjectMeta, true, validation.NameIsDNSSubdomain, field.NewPath("objectMeta"))
allErrs = append(allErrs, ValidateLeaseSpec(&lease.Spec, field.NewPath("spec"))...)
return allErrs
}
Looks like we just require the name to be a DNS subdomain, via validation.NameIsDNSSubdomain.
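So a dotted node name should already work as a lease name; for instance (illustrative, with validation here being k8s.io/apimachinery/pkg/api/validation):

errs := validation.NameIsDNSSubdomain("node-1.example.com", false)
// errs is empty: "." is allowed in DNS subdomains, so dotted node names
// can be used directly as Lease object names without relaxing validation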
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mtaufen, wojtek-t, yujuhong The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment
New changes are detected. LGTM label has been removed.
re-add label after rebase
@mtaufen - please rebase one more time.
This extends the Kubelet to create and periodically update leases in a new kube-node-lease namespace. Based on [KEP-0009](https://github.com/kubernetes/community/blob/master/keps/sig-node/0009-node-heartbeat.md), these leases can be used as a node health signal, and will allow us to reduce the load caused by over-frequent node status reporting.
- add NodeLease feature gate
- add kube-node-lease system namespace for node leases
- add Kubelet option for lease duration
- add Kubelet-internal lease controller to create and update lease
- add e2e test for NodeLease feature
- modify node authorizer and node restriction admission controller to allow Kubelets access to corresponding leases
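Note that the behavior is off by default behind the feature gate listed above; enabling it on the kubelet uses the standard feature-gate flag syntax, i.e. --feature-gates=NodeLease=true.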
New changes are detected. LGTM label has been removed.
re-add label after rebase
/retest
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.
Great to see that merged.
Make periodic NodeStatus updates cheaper: cherry-pick from master of create lease api (kubernetes#64246), node lifecycle controller (kubernetes#69241), and kubelet (kubernetes#66257). See merge request !184866
I would like to determine a standard policy for lease renewal frequency, based on the configured lease duration, so that we don't need to expose frequency as an additional knob. The renew interval is currently calculated as 1/3 of the lease duration.
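Concretely, a sketch of that policy with an illustrative duration:

leaseDuration := 40 * time.Second // e.g. NodeLeaseDurationSeconds = 40
renewInterval := leaseDuration / 3 // ≈ 13.3s between lease updates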
/cc @wojtek-t @liggitt