
kubelet: change node-lease-renew-interval to 0.25 of lease-renew-duration #80429

Merged

Conversation

@gaorong
Contributor

commented Jul 22, 2019

What type of PR is this?
/kind feature

Ref: kubernetes/enhancements#589

What this PR does / why we need it:
Change node-lease-renew-interval to 0.25 of lease-renew-duration instead of the hardcoded 10s.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kubelet: change node-lease-renew-interval to 0.25 of lease-renew-duration

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@wojtek-t @liggitt @wangzhen127 @mtaufen
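
A minimal sketch of the derivation this PR describes (illustrative only; the constant and function names are assumptions, not necessarily those used in the actual patch):

package main

import (
	"fmt"
	"time"
)

// Fraction of the lease duration after which the kubelet renews its Lease.
// 0.25 of the default 40s NodeLeaseDurationSeconds reproduces the previous
// hardcoded 10s renew interval. (Constant name is illustrative.)
const nodeLeaseRenewIntervalFraction = 0.25

// renewInterval derives the lease renew interval from the configured
// NodeLeaseDurationSeconds instead of using a hardcoded 10s value.
func renewInterval(nodeLeaseDurationSeconds int32) time.Duration {
	leaseDuration := time.Duration(nodeLeaseDurationSeconds) * time.Second
	return time.Duration(float64(leaseDuration) * nodeLeaseRenewIntervalFraction)
}

func main() {
	fmt.Println(renewInterval(40)) // 10s for the default 40s lease duration
}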

@k8s-ci-robot

Contributor

commented Jul 22, 2019

Hi @gaorong. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot

Contributor

commented Jul 22, 2019

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot requested review from dims and feiskyer Jul 22, 2019

@gaorong gaorong force-pushed the gaorong:expose-node-lease-renewal-frequency branch from fc62414 to 1cf7e41 Jul 22, 2019

@gaorong gaorong force-pushed the gaorong:expose-node-lease-renewal-frequency branch from 1cf7e41 to a37a299 Jul 22, 2019

@fejta-bot


commented Jul 22, 2019

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@mattjmcnaughton
Contributor

left a comment

Thanks for your PR :)

Have you chatted with anyone about whether we consider this change an API change? My guess is yes, in which case we will want to follow the API review process, which is documented here: https://github.com/kubernetes/community/blob/master/sig-architecture/api-review-process.md.

@@ -170,6 +170,8 @@ type KubeletConfiguration struct {
// frequency and post node status immediately if any change is detected. It is
// only used when node lease feature is enabled.
NodeStatusReportFrequency metav1.Duration
// nodeLeaseRenewFrequency is the frequency that kubelet renews its corresponding Lease.

wojtek-t Jul 22, 2019

Member

when NodeLease feature is enabled:

NodeLease: {Default: true, PreRelease: featuregate.Beta},

// NodeLifecycleController.
// Default: "10s"
// +optional
NodeLeaseRenewFrequency metav1.Duration `json:"nodeLeaseRenewFrequency,omitempty"`

wojtek-t Jul 22, 2019

Member

Yes - this formally requires API review.

@mtaufen mtaufen changed the title add a new filed: NodeLeaseRenewFrequency in kubelet config add a new field: NodeLeaseRenewFrequency in kubelet config Jul 22, 2019

@mtaufen

Contributor

commented Jul 22, 2019

@wojtek-t @bgrant0607 Didn't we discuss (a long time ago) that it would be better to determine a reasonable fraction of the lease duration at which to renew (say 1/2, or 3/4, or something) that works well in virtually all cases, probably based on operational data, and just derive the period between renewals from that?

@gaorong

Contributor Author

commented Jul 23, 2019

As a reminder, there was some discussion about this in the first PR regarding node lease (see here).

IMHO, cluster config is a great suggestion in the long run, but it may take a long time to land and become production-ready; we should not treat it as a blocker for other functionality (such as node lease).

@wojtek-t

Member

commented Jul 23, 2019

@wojtek-t @bgrant0607 Didn't we discuss (a long time ago) that it would be better to determine a reasonable fraction of the lease duration at which to renew (say 1/2, or 3/4, or something) that works well in virtually all cases, probably based on operational data, and just derive the period between renewals from that?

I can't remember it (though that doesn't mean we didn't).
I missed the fact that NodeLeaseDuration is already exposed via this config (I thought it was hardcoded similarly to renew-interval).
In that case, that sounds like a reasonable option to me. I think 1/2 would be way too much (one missed update would mean that we could lose the lease), but 1/4 I think would actually make sense.
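
As a rough illustration of the tradeoff (assuming the kubelet's default 40s lease duration): a 1/2 fraction gives a 20s renew interval, so a single missed renewal already runs into the 40s expiry, while a 1/4 fraction gives a 10s interval and leaves room for up to three missed renewals before the lease expires. The 10s result also matches the previously hardcoded renew interval.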

@gaorong

Contributor Author

commented Jul 23, 2019

determine a reasonable fraction of the lease duration to renew

What about node-monitor-grace-period?
For now, the NodeLifecycleController in controller-manager marks a node as "Ready: Unknown" if it does not get a heartbeat within every "node-monitor-grace-period" cycle, and it does not care about lease-duration at all.

If a user sets a large lease duration (as we allow users to specify this value), say 4min, then the lease renew interval is 1min (1/4 * 4min); if controller-manager has a node-monitor-grace-period of 40s, the node will then always be in the "Ready: Unknown" state.

If we want to determine the renew interval based on the lease duration, we must make NodeLifecycleController follow lease-duration and ignore the node-monitor-grace-period value when the node lease feature is enabled. I'm not sure whether this is wanted.

Aside: IMHO, before we GA the node lease feature, we may want to make NodeLifecycleController honor lease-duration; for now, leaseDurationSeconds is only propagated as part of the Lease object and not really used in any meaningful way. This may be a good chance to make that change.
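
A small sketch of the mismatch described above (illustrative only, not code from this PR; the function name and the 0.25 fraction are assumptions based on the discussion):

package main

import (
	"fmt"
	"time"
)

// leaseIntervalsCompatible reports whether the renew interval derived from the
// lease duration is still shorter than the controller-manager's
// node-monitor-grace-period. If it is not, the NodeLifecycleController stops
// seeing heartbeats in time and marks the node "Ready: Unknown".
func leaseIntervalsCompatible(leaseDuration, nodeMonitorGracePeriod time.Duration) bool {
	renewInterval := time.Duration(float64(leaseDuration) * 0.25)
	return renewInterval < nodeMonitorGracePeriod
}

func main() {
	// The example from the comment: a 4-minute lease duration yields a 1-minute
	// renew interval, which is longer than a 40s node-monitor-grace-period.
	fmt.Println(leaseIntervalsCompatible(4*time.Minute, 40*time.Second)) // false
}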

@dims

Member

commented Jul 23, 2019

/ok-to-test
/uncc

@k8s-ci-robot k8s-ci-robot removed the request for review from dims Jul 23, 2019

@wojtek-t

Member

commented Jul 25, 2019

If this makes sense to you, I'm going to push this forward (open a PR to update the node lease KEP and get more people involved in the discussion).

I think it's worth starting a discussion about.
Though it shouldn't be a modification of the existing KEP - it should be a new KEP.

BTW - based on the whole discussion above - I guess we want to change the logic to set renew-interval based on node-lease-duration - would you be able to change the PR to do that (or open a new one)?

@gaorong gaorong force-pushed the gaorong:expose-node-lease-renewal-frequency branch from a37a299 to d7c77e6 Jul 26, 2019

@gaorong gaorong force-pushed the gaorong:expose-node-lease-renewal-frequency branch 3 times, most recently from 702c1bb to 8924839 Jul 26, 2019

@spiffxp

Member

commented Jul 26, 2019

/remove-area conformance

@gaorong

Contributor Author

commented Jul 27, 2019

/retest

@gaorong

Contributor Author

commented Jul 29, 2019

The CI is green now.

@wojtek-t

Member

commented Jul 29, 2019

@gaorong - also please update PR title and description

change node-lease-renew-interval to 0.25 of lease-renew-duration

0.25 is a deliberate value chosen to align with the previous default
renew-interval while deriving the interval from the lease duration

@gaorong gaorong force-pushed the gaorong:expose-node-lease-renewal-frequency branch from 8924839 to cda7836 Jul 29, 2019

@gaorong gaorong changed the title add a new field: NodeLeaseRenewFrequency in kubelet config kubelet: change node-lease-renew-interval to 0.25 of lease-renew-duration Jul 29, 2019

@wojtek-t

Member

commented Jul 29, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 29, 2019

@k8s-ci-robot

Contributor

commented Jul 29, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaorong, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit f18833e into kubernetes:master Jul 29, 2019

22 of 23 checks passed

tide Not mergeable. Needs approved label.
cla/linuxfoundation gaorong authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-node-e2e-containerd Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
pull-publishing-bot-validate Skipped.

@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Jul 29, 2019
