
reduce kubelet getting node lease #81174

Merged

k8s-ci-robot merged 1 commit into kubernetes:master from answer1991:reduce-node-lease-get on Aug 10, 2019


@answer1991
Contributor

commented Aug 8, 2019

What type of PR is this?

/kind bug

What this PR does / why we need it:

In a large-scale cluster, kubelets GET their node leases too often, which can cause performance issues in etcd and kube-apiserver. In an 8k-node cluster, kubelets issue about 50k lease GETs per minute.

A node's lease is only updated by the kubelet on that node, so the kubelet does not need to GET the lease before every update; it can simply reuse the latest lease returned by its own last update.

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

None

Does this PR introduce a user-facing change?:

Fix potential performance issues with the kubelet NodeLease. The kubelet now tries to update the lease using its cached copy instead of getting it from the API server every time.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

None

@k8s-ci-robot

Contributor

commented Aug 8, 2019

Hi @answer1991. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mattjmcnaughton

Contributor

commented Aug 8, 2019

/ok-to-test

@answer1991

Contributor Author

commented Aug 9, 2019

/test pull-kubernetes-e2e-gce

@answer1991

Contributor Author

commented Aug 9, 2019

/retest

@ricky1993

Contributor

commented Aug 9, 2019

/cc

@k8s-ci-robot

Contributor

commented Aug 9, 2019

@ricky1993: GitHub didn't allow me to request PR reviews from the following users: ricky1993.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wojtek-t
Member

left a comment

I added a couple of comments, but in general I really like this.

@@ -87,7 +88,18 @@ func (c *controller) Run(stopCh <-chan struct{}) {
}

func (c *controller) sync() {
if c.latestLease != nil {
lease, err := c.leaseClient.Update(c.newLease(c.latestLease))

@wojtek-t

wojtek-t Aug 9, 2019

Member

This relies on certain assumptions: the net result would be much worse if another component were also frequently updating that lease object.
kubernetes/enhancements#1116 is one proposal that would change that (though updates to the lease object by that controller would be very rare and shouldn't really matter).

So this seems reasonable to me in its current state, but it needs an extensive comment about the assumptions.
Something like:

// As long as the node lease is not (or only very rarely) updated by any agent other than Kubelet,
// we can optimistically assume it didn't change since our last update and try updating
// based on the version from that time. This lets us avoid a GET call and reduce load
// on etcd and kube-apiserver.
// If at some point other agents also frequently update the Lease object, this
// can result in performance degradation, because we will end up calling an additional
// PUT; at that point this whole "if" should be removed.

@answer1991

answer1991 Aug 9, 2019

Author Contributor

thanks, done

pkg/kubelet/nodelease/controller.go (outdated)
},
}

gr := schema.GroupResource{Group: "v1", Resource: "lease"}

@wojtek-t

wojtek-t Aug 9, 2019

Member

Group isn't actually v1.

import "k8s.io/kubernetes/pkg/apis/coordination"

notFoundErr := apierrors.NewNotFound(coordination.Resource("lease"), "foo")

coordination.Resource("lease")

@answer1991

answer1991 Aug 9, 2019

Author Contributor

Should it be coordinationv1?

import coordinationv1 "k8s.io/api/coordination/v1"

apierrors.NewNotFound(coordinationv1.Resource("lease"), "foo")

@wojtek-t

wojtek-t Aug 9, 2019

Member

In this context both are actually fine (because the version is dropped from a GroupResource).

@answer1991

answer1991 Aug 9, 2019

Author Contributor

ok, thanks
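Why both imports work can be illustrated with minimal stand-ins for the apimachinery schema types. GroupVersionResource, GroupResource and notFoundMessage below are hypothetical re-implementations for illustration, not the real package; they mirror the behavior that converting to a GroupResource drops the version and that NotFound messages use the "resource.group" form. The internal coordination package registers its types under the version "__internal", while coordination/v1 uses "v1"; both share the group "coordination.k8s.io". The same sketch shows why Group: "v1" is wrong: it would name a nonexistent "v1" group in the error.

```go
package main

import "fmt"

// GroupVersionResource and GroupResource are hypothetical, minimal
// stand-ins for the apimachinery schema types, for illustration only.
type GroupVersionResource struct {
	Group, Version, Resource string
}

type GroupResource struct {
	Group, Resource string
}

// GroupResource drops the version, so only group and resource remain.
func (gvr GroupVersionResource) GroupResource() GroupResource {
	return GroupResource{Group: gvr.Group, Resource: gvr.Resource}
}

// String renders the "resource.group" form used in NotFound messages.
func (gr GroupResource) String() string {
	if gr.Group == "" {
		return gr.Resource
	}
	return gr.Resource + "." + gr.Group
}

// notFoundMessage mirrors the message format produced for a NotFound error.
func notFoundMessage(gr GroupResource, name string) string {
	return fmt.Sprintf("%s %q not found", gr.String(), name)
}

func main() {
	internal := GroupVersionResource{Group: "coordination.k8s.io", Version: "__internal", Resource: "lease"}.GroupResource()
	v1 := GroupVersionResource{Group: "coordination.k8s.io", Version: "v1", Resource: "lease"}.GroupResource()
	fmt.Println(internal == v1)          // prints: true
	fmt.Println(notFoundMessage(v1, "foo")) // prints: lease.coordination.k8s.io "foo" not found
	wrong := GroupResource{Group: "v1", Resource: "lease"}
	fmt.Println(notFoundMessage(wrong, "foo")) // prints: lease.v1 "foo" not found
}
```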

updateReactor func(action clienttesting.Action) (bool, runtime.Object, error)
getReactor func(action clienttesting.Action) (bool, runtime.Object, error)
createReactor func(action clienttesting.Action) (bool, runtime.Object, error)
onRepeatedHeartbeatFailure func()

@wojtek-t

wojtek-t Aug 9, 2019

Member

It seems to be nil in all cases; please remove it.

@answer1991

answer1991 Aug 9, 2019

Author Contributor

done

return true, nil, notFoundErr
},
createReactor: func(action clienttesting.Action) (b bool, object runtime.Object, e error) {
return true, &coordinationv1.Lease{ObjectMeta: metav1.ObjectMeta{ResourceVersion: "1"}}, nil

@wojtek-t

wojtek-t Aug 9, 2019

Member

Currently it doesn't change much, but I think we should set at least the name and namespace.

So let's introduce a simple helper function
makeLease(name, resourceVersion) *coordinationv1.Lease
and use it here and below.

@answer1991

answer1991 Aug 9, 2019

Author Contributor

done
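A minimal sketch of the suggested helper. To keep it compilable on its own, ObjectMeta and Lease are hypothetical stand-ins for metav1.ObjectMeta and coordinationv1.Lease (with ObjectMeta embedded, as in the real type); in the actual test the helper would return *coordinationv1.Lease and the namespace would come from corev1.NamespaceNodeLease ("kube-node-lease"). The node name used below is illustrative.

```go
package main

import "fmt"

// ObjectMeta and Lease are hypothetical stand-ins for metav1.ObjectMeta and
// coordinationv1.Lease, reduced to the fields the test cares about.
type ObjectMeta struct {
	Name, Namespace, ResourceVersion string
}

type Lease struct {
	ObjectMeta
}

// nodeLeaseNamespace matches the value of corev1.NamespaceNodeLease.
const nodeLeaseNamespace = "kube-node-lease"

// makeLease builds a lease with name, namespace and resourceVersion set, so
// fake reactors return realistic objects instead of nearly empty ones.
func makeLease(name, resourceVersion string) *Lease {
	return &Lease{ObjectMeta: ObjectMeta{
		Name:            name,
		Namespace:       nodeLeaseNamespace,
		ResourceVersion: resourceVersion,
	}}
}

func main() {
	l := makeLease("node-1", "1")
	fmt.Println(l.Namespace, l.Name, l.ResourceVersion)
}
```

A reactor such as the createReactor above could then return makeLease(nodeName, "1") instead of a Lease that carries only a ResourceVersion.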

pkg/kubelet/nodelease/controller_test.go (outdated)
case 2:
return true, &coordinationv1.Lease{ObjectMeta: metav1.ObjectMeta{ResourceVersion: "3"}}, nil
default:
return true, &coordinationv1.Lease{}, nil

@wojtek-t

wojtek-t Aug 9, 2019

Member

This should never happen, so instead of failing silently, it's better to call t.Fatalf(...).

@answer1991

answer1991 Aug 9, 2019

Author Contributor

done

@dims

Member

commented Aug 9, 2019

/uncc

@k8s-ci-robot k8s-ci-robot removed the request for review from dims Aug 9, 2019

@answer1991 answer1991 force-pushed the answer1991:reduce-node-lease-get branch from a88d42c to acdac6e Aug 9, 2019

@wojtek-t

Member

commented Aug 9, 2019

LGTM, but I will let others also take a look before approving.

@alejandrox1

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2019

/cc

@k8s-ci-robot k8s-ci-robot requested a review from alejandrox1 Aug 9, 2019

@wojtek-t

Member

commented Aug 10, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 10, 2019

@k8s-ci-robot

Contributor

commented Aug 10, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: answer1991, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit c0997e8 into kubernetes:master Aug 10, 2019

23 checks passed

cla/linuxfoundation answer1991 authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-node-e2e-containerd Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
pull-publishing-bot-validate Skipped.
tide In merge pool.

@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Aug 10, 2019
