Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubelet creates and manages node leases #66257

Merged
merged 1 commit into from
Aug 27, 2018

Conversation

mtaufen
Copy link
Contributor

@mtaufen mtaufen commented Jul 16, 2018

This extends the Kubelet to create and periodically update leases in a
new kube-node-lease namespace. Based on KEP-0009,
these leases can be used as a node health signal, and will allow us to
reduce the load caused by over-frequent node status reporting.

  • add NodeLease feature gate
  • add kube-node-lease system namespace for node leases
  • add Kubelet option for lease duration
  • add Kubelet-internal lease controller to create and update lease
  • add e2e test for NodeLease feature

I would like to determine a standard policy for lease renewal frequency, based on the configured lease duration, so that we don't need to expose frequency as an additional knob. The renew interval is currently calculated as 1/3 of the lease duration.

kubelet: Users can now enable the alpha NodeLease feature gate to have the Kubelet create and periodically renew a Lease in the kube-node-lease namespace. The lease duration defaults to 40s, and can be configured via the kubelet.config.k8s.io/v1beta1.KubeletConfiguration's NodeLeaseDurationSeconds field.

/cc @wojtek-t @liggitt

@mtaufen mtaufen added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. kind/feature Categorizes issue or PR as related to a new feature. status/approved-for-milestone labels Jul 16, 2018
@mtaufen mtaufen added this to the v1.12 milestone Jul 16, 2018
@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jul 16, 2018
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 16, 2018
@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@mtaufen @yujuhong

Pull Request Labels
  • sig/node: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/feature: New functionality.
Help

break
}
// log error
glog.Errorf("failed to ensure node lease exists, error: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we calculate the sleep duration first, we can log the sleep time in the error message which is sometimes useful to see how long something is in a backoff loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -92,7 +92,7 @@ func (c *completedConfig) NewBootstrapController(legacyRESTStorage corerest.Lega
EndpointReconciler: c.ExtraConfig.EndpointReconcilerConfig.Reconciler,
EndpointInterval: c.ExtraConfig.EndpointReconcilerConfig.Interval,

SystemNamespaces: []string{metav1.NamespaceSystem, metav1.NamespacePublic},
SystemNamespaces: []string{metav1.NamespaceSystem, metav1.NamespacePublic, metav1.NamespaceNodeLease},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only if the feature is enabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -282,6 +282,8 @@ const (
NamespaceSystem string = "kube-system"
// NamespacePublic is the namespace where we place public info (ConfigMaps)
NamespacePublic string = "kube-public"
// NamespaceNodeLease is the namespace where we place node lease objects (used for node heartbeats)
NamespaceNodeLease string = "kube-node-lease"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we only put leases in this namespace? would this be a logical place to put configmaps with node config as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whatever name is selected here needs to be communicated well in advance, so cluster operators can reserve the namespace for system use. it does raise an interesting question about what namespaces the kube system components are entitled to claim in the future. anything prefixed with kube-? anything prefixed with kube-system?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, this doesn't really belong in meta/v1. it's fundamental to kube-apiserver and kubelet, but not to apimachinery (same really goes for kube-system and kube-public, but we can deal with those separately)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kube-node would make sense to store node configmaps and leases. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we only put leases in this namespace?

My understanding is just leases. This is what the KEP says:

for each Node there will be a corresponding Lease object with Name equal to Node name in a newly created dedicated namespace (we considered using kube-system namespace but decided that it's already too overloaded). That namespace should be created automatically (similarly to "default" and "kube-system", probably by NodeController) and never be deleted (so that nodes don't require permission for it).

Which implies you just use it for node leases, because not prefixing the lease names means you can't cleanly add any other type of lease to that namespace in the future.

needs to be communicated well in advance

My understanding is that we had already reserved the kube- prefix for namespaces? Maybe this isn't true? @thockin do you know?

this doesn't really belong in meta/v1

That's fair, but then where should we put the constants for standard system namespaces? Multiple consumers need them, so they should be in a common place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just core/v1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the others are already in pkg/apis/core/types.go... there or in core/v1/types.go is better than meta/v1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to @bgrant0607 about kube-, given that this is the convention we think we should just explicitly reserve it: kubernetes/community#2379

Copy link
Contributor Author

@mtaufen mtaufen Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't the other ones in core/v1 in addition to core? Just because they're in meta/v1 instead (odd, as you said)?
Should we put a constant for the node lease namespace in both core and core/v1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to both for now

// log error
glog.Errorf("failed to ensure node lease exists, error: %v", err)
// backoff wait
time.Sleep(sleep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we're not using the injected clock here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, good catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

glog.Errorf("failed to ensure node lease exists, error: %v", err)
// backoff wait
time.Sleep(sleep)
sleep = minDuration(2*sleep, 7*time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 seconds hard-coded? this looks weird...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make it a constant. I'm just keeping it consistent with what node registration does, for lack of a concrete principle to actually base the number on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure these need to match the node registration exactly, but if you want to, factoring them out to use in both places sounds good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep them decoupled, but just set to the same number for now.

lease, err := c.client.Get(c.holderIdentity, metav1.GetOptions{})
if apierrors.IsNotFound(err) {
// lease does not exist, create it
lease, err := c.client.Create(c.newLease(nil))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this working today in a test cluster with authz enabled? I expected to see modifications to the Node authorizer and NodeRestriction admission plugin to let kubelets mess with their own leases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I forgot about that. I will add those changes to this PR, and it probably makes sense to ensure the e2e test also runs in the standard e2e suite, instead of just the e2e node suite.

We also don't currently run the node authorizer/node restriction in the e2e node tests. Maybe I should revisit #60172?

lease, err := c.client.Get(c.holderIdentity, metav1.GetOptions{})
if apierrors.IsNotFound(err) {
// lease does not exist, create it
lease, err := c.client.Create(c.newLease(nil))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these client calls include a request timeout?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heartbeatClient is explicitly constructed with a timeout to avoid eternal hangs, and a distinct QPS to avoid starvation by other kube API calls, but is currently fixed to corev1. need to adapt/widen that for use here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that is configured on the client before it is passed to the controller.

Copy link
Contributor Author

@mtaufen mtaufen Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want the timeout to match the heartbeat frequency, then we need a distinct client, since the Lease-based heartbeat potentially has a different frequency than the node status update, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking the shorter of the two would probably be fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -833,6 +834,9 @@ func NewMainKubelet(kubeCfg *kubeletconfiginternal.KubeletConfiguration,
klet.appArmorValidator = apparmor.NewValidator(containerRuntime)
klet.softAdmitHandlers.AddPodAdmitHandler(lifecycle.NewAppArmorAdmitHandler(klet.appArmorValidator))
klet.softAdmitHandlers.AddPodAdmitHandler(lifecycle.NewNoNewPrivsAdmitHandler(klet.containerRuntime))

klet.nodeLeaseController = nodelease.NewController(klet.clock, klet.kubeClient, string(klet.nodeName), kubeCfg.NodeLeaseDurationSeconds, klet.onRepeatedHeartbeatFailure)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is expecting to use the node name as the name of the lease object. Is that compatible with the currently allowed names of the lease objects? I can't remember if the initial PR limited them to single dns labels (e.g. no . characters). If so, we likely want to relax that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// ValidateLease validates a Lease.
func ValidateLease(lease *coordination.Lease) field.ErrorList {
	allErrs := validation.ValidateObjectMeta(&lease.ObjectMeta, true, validation.NameIsDNSSubdomain, field.NewPath("objectMeta"))
	allErrs = append(allErrs, ValidateLeaseSpec(&lease.Spec, field.NewPath("spec"))...)
	return allErrs
}

Looks like we just require the name to be a DNS subdomain, via validation.NameIsDNSSubdomain.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 17, 2018
@k8s-github-robot k8s-github-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Jul 17, 2018
@mtaufen mtaufen force-pushed the node-lease branch 4 times, most recently from 59a0b33 to 4166d10 Compare July 17, 2018 23:44
@yujuhong
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 23, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mtaufen, wojtek-t, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 23, 2018
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 23, 2018
@mtaufen
Copy link
Contributor Author

mtaufen commented Aug 23, 2018

re-add label after rebase

@mtaufen mtaufen added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 23, 2018
@wojtek-t
Copy link
Member

@mtaufen - please rebase one more time.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2018
This extends the Kubelet to create and periodically update leases in a
new kube-node-lease namespace. Based on [KEP-0009](https://github.com/kubernetes/community/blob/master/keps/sig-node/0009-node-heartbeat.md),
these leases can be used as a node health signal, and will allow us to
reduce the load caused by over-frequent node status reporting.

- add NodeLease feature gate
- add kube-node-lease system namespace for node leases
- add Kubelet option for lease duration
- add Kubelet-internal lease controller to create and update lease
- add e2e test for NodeLease feature
- modify node authorizer and node restriction admission controller
to allow Kubelets access to corresponding leases
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 26, 2018
@mtaufen
Copy link
Contributor Author

mtaufen commented Aug 26, 2018

re-add label after rebase

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 26, 2018
@mtaufen mtaufen added lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 26, 2018
@k8s-ci-robot k8s-ci-robot added the sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. label Aug 26, 2018
@mtaufen
Copy link
Contributor Author

mtaufen commented Aug 27, 2018

/retest

@mtaufen mtaufen removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2018
@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit aec2702 into kubernetes:master Aug 27, 2018
@wojtek-t
Copy link
Member

Great to see that merged.

mars1024 pushed a commit to mars1024/kubernetes that referenced this pull request Nov 26, 2019
Make periodic NodeStatus updates cheaper

cherry-pick 
master, create lease api, kubernetes#64246
master, node lifecycle controller, kubernetes#69241
kubelet, kubernetes#66257

See merge request !184866
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/kubeadm area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet