Move frequent Kubelet heartbeats to Lease API #589

Open
mtaufen opened this Issue Jul 17, 2018 · 26 comments

Comments

@mtaufen
Contributor

mtaufen commented Jul 17, 2018

Feature Description

  • One-line feature description (can be used as a release note): Kubelet creates and periodically renews a Lease on the node; node lifecycle controller treats this lease as a health signal
  • Primary contact (assignee): @wangzhen127
  • Responsible SIGs: sig-node
  • Design proposal link (community repo): KEP-0009
  • Link to e2e and/or unit tests: (coming soon)
  • Reviewer(s) - (for LGTM) recommend having 2+ reviewers (at least one from code-area OWNERS file) agreed to review. Reviewers from multiple companies preferred: @yujuhong @dchen1107 @wojtek-t
  • Approver (likely from SIG/area to which feature belongs): @yujuhong
  • Feature target (which target equals to which milestone):
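The one-line description above can be illustrated with a minimal sketch. This is not the Kubelet or node lifecycle controller code. The `Lease` class, `renew`, and `is_fresh` are hypothetical names, the fields are loosely modeled on the Lease spec (`holderIdentity`, `leaseDurationSeconds`, `renewTime`), and the 40-second duration and grace period are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Lease:
    """Hypothetical stand-in for a node's Lease object."""
    holder_identity: str          # name of the node holding the lease
    lease_duration_seconds: int   # how long a renewal is considered valid
    renew_time: datetime          # last time the holder renewed the lease


def renew(lease: Lease, now: datetime) -> None:
    """Kubelet side: a heartbeat only bumps the renew time."""
    lease.renew_time = now


def is_fresh(lease: Lease, now: datetime, grace: timedelta) -> bool:
    """Controller side: treat a recently renewed lease as a health signal."""
    return now - lease.renew_time <= grace


# Illustrative values only (assumptions, not taken from the issue).
start = datetime(2018, 7, 17, tzinfo=timezone.utc)
lease = Lease(holder_identity="node-1", lease_duration_seconds=40, renew_time=start)
grace = timedelta(seconds=lease.lease_duration_seconds)

print(is_fresh(lease, start + timedelta(seconds=10), grace))
```

The point of the design is that each heartbeat updates one small timestamp rather than rewriting a large status object.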

@mtaufen mtaufen added this to the v1.12 milestone Jul 17, 2018

@mtaufen mtaufen self-assigned this Jul 17, 2018

@mtaufen mtaufen added the tracked/yes label Jul 17, 2018

@justaugustus

Member

justaugustus commented Jul 18, 2018

@mtaufen --

It looks like this feature is currently in the Kubernetes 1.12 Milestone.

If that is still accurate, please ensure that this issue is up-to-date with ALL of the following information:

  • One-line feature description (can be used as a release note):
  • Primary contact (assignee):
  • Responsible SIGs:
  • Design proposal link (community repo):
  • Link to e2e and/or unit tests:
  • Reviewer(s) - (for LGTM) recommend having 2+ reviewers (at least one from code-area OWNERS file) agreed to review. Reviewers from multiple companies preferred:
  • Approver (likely from SIG/area to which feature belongs):
  • Feature target (which target equals to which milestone):
    • Alpha release target (x.y)
    • Beta release target (x.y)
    • Stable release target (x.y)

Set the following:

  • Description
  • Assignee(s)
  • Labels:
    • stage/{alpha,beta,stable}
    • sig/*
    • kind/feature

Once this feature is appropriately updated, please explicitly ping @justaugustus, @kacole2, @robertsandoval, @rajendar38 to note that it is ready to be included in the Features Tracking Spreadsheet for Kubernetes 1.12.


Please note that the Features Freeze is July 31st, after which any incomplete Feature issues will require an Exception request to be accepted into the milestone.

In addition, please be aware of the following relevant deadlines:

  • Docs deadline (open placeholder PRs): 8/21
  • Test case freeze: 8/28

Please make sure all PRs for features have relevant release notes included as well.

Happy shipping!

@mtaufen

Contributor Author

mtaufen commented Jul 19, 2018

@justaugustus this is ready to be included in the v1.12 milestone.

Thanks!

@justaugustus

Member

justaugustus commented Jul 22, 2018

Added. Thanks for confirming, @mtaufen!

@mtaufen

Contributor Author

mtaufen commented Aug 15, 2018

We decided to push this to 1.13, given the limited time we have left.
Additionally, @wangzhen127 is going to own this feature going forward, as I have been pulled into other work.

@mtaufen

Contributor Author

mtaufen commented Aug 15, 2018

We will still try to merge kubernetes/kubernetes#66257 for 1.12, since it is almost done and we don't want it to rot.

@kacole2

Member

kacole2 commented Aug 15, 2018

Thanks for the update, @mtaufen. This has been removed from the 1.12 tracking sheet and added to the "Removed from Milestone" tab.

cc @justaugustus

@justaugustus

Member

justaugustus commented Aug 15, 2018

@mtaufen -- thanks for being proactive about updating us!
@kacole2 -- thanks for getting the sheet updated

/unassign @mtaufen
/assign @wangzhen127

@justaugustus justaugustus added tracked/no and removed tracked/yes labels Aug 15, 2018


@kacole2 kacole2 added the tracked/yes label Oct 8, 2018

@kacole2

Member

kacole2 commented Nov 8, 2018

@wangzhen127 looks like there is steady progress on these k/k issues. Is kubernetes/test-infra#9942 the last one? Any other issues that should be tracked to make sure everything is firm going into code slush tomorrow 11/9?

@wangzhen127

Member

wangzhen127 commented Nov 8, 2018

kubernetes/kubernetes#70034 is the last one. Should be able to submit by 11/9.

kubernetes/test-infra#9942 is to update the CI job to cover node lease tests. It is not blocking the feature and I don't think it needs to be submitted by 11/9. I plan to submit this test-infra PR next week.

And I will submit the doc PR kubernetes/website#10699 next week, too.

@k8s-ci-robot

Contributor

k8s-ci-robot commented Nov 19, 2018

@kacole2

Member

kacole2 commented Nov 19, 2018

/reopen

@wangzhen127 we keep issues open until they have graduated to stable. This has 2 more release cycles if this is an Alpha implementation.

@k8s-ci-robot

Contributor

k8s-ci-robot commented Nov 19, 2018

@kacole2: Reopened this issue.

In response to this:

/reopen

@wangzhen127 we keep issues open until they have graduated to stable. This has 2 more release cycles if this is an Alpha implementation.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@claurence


claurence commented Jan 14, 2019

@wangzhen127 Hello - I’m the enhancements lead for 1.14 and I’m checking in on this issue to see what work (if any) is being planned for the 1.14 release. Enhancements freeze is Jan 29th and I want to remind you that all enhancements must have a KEP. It looks like a KEP already exists for this issue here: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md - let me know if that is not the correct KEP.

@wangzhen127

Member

wangzhen127 commented Jan 14, 2019

That KEP is the correct one, and the implementation is complete. @wojtek-t knows the latest enhancement needs, if there are any.

@luxas

Member

luxas commented Jan 15, 2019

And the implementation is complete

For what level of stability? alpha/beta or GA for the next release?

@dchen1107 dchen1107 added stage/beta and removed stage/alpha labels Jan 15, 2019

@dchen1107 dchen1107 modified the milestones: v1.13, v1.14 Jan 15, 2019

@dchen1107

Member

dchen1107 commented Jan 15, 2019

The plan is to promote the feature to beta in the 1.14 release cycle.

@wojtek-t

Member

wojtek-t commented Jan 16, 2019

@claurence - is there a list of things somewhere that has to happen for Beta?
We believe that we are code-complete for Beta already.
The docs have been created for Alpha, so it may only require some update about the stage.

Is there anything else that is missing?

@spiffxp

Member

spiffxp commented Feb 20, 2019

@wojtek-t The current KEP is lacking the following information:

  • what is the implementation history? has this already been launched as alpha?
  • what is the test plan? ie: what jobs or test cases can we look to for CI signal to know that this is being tested and is working?
  • what graduation criteria are you using to verify this is ready for beta? ie: what behaviors or metrics are you looking for? should this be exercised by scalability tests? what meaningfully makes this "beta" and not "alpha"? there is an unchecked checkbox in this issue description that mentions "tuning", has that been done? etc.
  • what upgrade/downgrade issues might arise when using this? ie: what is the migration path to use this? how does this interoperate with LeaderElectionRecord?

@wojtek-t

Member

wojtek-t commented Feb 20, 2019

#846 should hopefully address those.

In short:

  • yes it was launched to Alpha in 1.13
  • there are a couple dedicated e2e tests, but in general since enabling this feature in December (right at the beginning of 1.14), all tests are implicitly testing that feature (if this feature didn't work properly, we wouldn't have proper healthiness signal from nodes)
  • For Beta we expected to see a reduction in etcd size (2x+ in large clusters) and at least no drop in scalability SLIs; this is now confirmed
  • risks and mitigations are described in the KEP already. There is no migration: we simply switch the feature on, first in the controller manager, and then you can start using it in the Kubelet. It doesn't interoperate with LeaderElectionRecord; those are two separate things.
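
One plausible reading of why the controller manager is switched on first can be sketched as follows. This is an illustration under stated assumptions, not the node lifecycle controller's logic. The function `node_looks_healthy`, its parameters, and the timings are all hypothetical, and the sketch assumes that a lease-aware controller takes the most recent of the two signals (status update or lease renewal).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def node_looks_healthy(
    now: datetime,
    last_status_update: datetime,
    last_lease_renewal: Optional[datetime],
    controller_reads_leases: bool,
    grace: timedelta,
) -> bool:
    """Hypothetical health check: use the freshest signal the controller understands."""
    latest = last_status_update
    if controller_reads_leases and last_lease_renewal is not None:
        latest = max(latest, last_lease_renewal)
    return now - latest <= grace


# Illustrative timeline (assumed values): the Kubelet relies on frequent lease
# renewals and reports full status only rarely.
t0 = datetime(2019, 2, 20, tzinfo=timezone.utc)
status_at = t0
lease_at = t0 + timedelta(seconds=290)
now = t0 + timedelta(seconds=300)
grace = timedelta(seconds=40)

print(node_looks_healthy(now, status_at, lease_at, True, grace))
print(node_looks_healthy(now, status_at, lease_at, False, grace))
```

Under these assumptions, a Kubelet that leans on leases before the controller can read them would look unhealthy, which is consistent with enabling the controller side first.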