migrate leader election to lease API #81030

Merged
merged 1 commit into from Sep 26, 2019

ricky1993
Contributor

Change-Id: I21fd5cdc1af59e456628cf15fc84b2d79db2eda0

What type of PR is this?

/kind cleanup

What this PR does / why we need it:
Re-implement #80508, due to the suggestion at #80508 (comment)

If a user uses the endpoint lock and wants to migrate to the lease lock, they can switch all components (kcm, scheduler, etc.) to the "endpointlease" lock and then switch to the lease lock safely. Note that the old endpoint lock object will not be cleaned up.
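
To illustrate why the intermediate step matters, here is a minimal sketch, not the code in this PR. It uses a hypothetical in-memory `lockStore` in place of the API server and assumes, for simplicity, that a replica takes all of its required locks atomically. It counts how many replicas end up as leader when two replicas with different lock configurations run side by side during a rolling update.

```go
package main

import "fmt"

// lockStore is a hypothetical in-memory stand-in for the API server:
// it maps a lock object name to the identity currently holding it.
type lockStore map[string]string

// acquireAll takes every named lock for id, or none of them.
// Simplifying assumption: the check-and-take is atomic here.
func (s lockStore) acquireAll(id string, names ...string) bool {
	for _, n := range names {
		if h, ok := s[n]; ok && h != id {
			return false // someone else holds one of the required locks
		}
	}
	for _, n := range names {
		s[n] = id
	}
	return true
}

// leaders counts how many replicas become leader when each replica tries,
// in order, to take the set of locks its configuration requires.
func leaders(configs [][]string) int {
	store := lockStore{}
	count := 0
	for i, names := range configs {
		if store.acquireAll(fmt.Sprintf("replica-%d", i), names...) {
			count++
		}
	}
	return count
}

func main() {
	// Direct switch: one replica still uses endpoints, another already uses lease.
	direct := [][]string{{"endpoints"}, {"lease"}}
	// Migration step 1: old replicas use endpoints, upgraded ones need both.
	step1 := [][]string{{"endpoints"}, {"endpoints", "lease"}}
	// Migration step 2: replicas needing both coexist with lease-only replicas.
	step2 := [][]string{{"endpoints", "lease"}, {"lease"}}

	fmt.Println("direct switch leaders:", leaders(direct))
	fmt.Println("step 1 leaders:", leaders(step1))
	fmt.Println("step 2 leaders:", leaders(step2))
}
```

In this model the direct switch lets both replicas lead at once because they contend on different objects, while each migration step always shares at least one lock object between the old and new configuration.
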
Which issue(s) this PR fixes:

Ref #80289

Special notes for your reviewer:
Implement two composite resource locks for migrating from the endpoint and configmap locks to the lease lock.

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/sig scalability
/assign @wojtek-t
/cc @mikedanese @timothysc any suggestions?

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Aug 6, 2019
@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 6, 2019
@k8s-ci-robot
Contributor

Hi @ricky1993. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 6, 2019
@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Aug 6, 2019
@wojtek-t
Member

wojtek-t commented Aug 7, 2019

/ok-to-test

This looks reasonable to me.
@mikedanese @timothysc - thoughts?

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 7, 2019
@timothysc
Member

I'm still confused by your motivations for this change, and imo release notes are required:

If a user uses the endpoint lock and wants to migrate to the lease lock, they can switch all components (kcm, scheduler, etc.) to the "endpointlease" lock and then switch to the lease lock safely. Note that the old endpoint lock object will not be cleaned up.

^ What is the user story driving this? You want to switch the locking on the fly, that seems like a really bad idea? What am I missing here?

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 7, 2019
@ricky1993
Contributor Author

^ What is the user story driving this? You want to switch the locking on the fly, that seems like a really bad idea? What am I missing here?

The ref issue.
"Given that both of these are watched by different components, this is generating a lot of unnecessary load.

We should migrate all leader-election to use Lease API (that was designed exactly for this case)."

@wojtek-t
Member

wojtek-t commented Aug 7, 2019

You want to switch the locking on the fly, that seems like a really bad idea? What am I missing here?

@timothysc - Endpoints and ConfigMaps are not the objects that we should be doing leader election against. We have an API that was designed exactly for this purpose, which is coordination.Lease.
We would like to use in our components what we recommend to others.
In addition to that, using the Endpoints object for leader election is bad from a performance/scalability POV: Endpoints are watched by kube-proxy on every single node, so every update to the lock object is sent to all of them. We really don't want to use the Endpoints object for this.

And we specifically don't want to change it in one step, but in two phases:
(1) acquire the lock on the old object; once acquired, acquire the lock on the new object; only when both are acquired do we hold the lock and can proceed
(2) (next release) since the master now has to acquire the lock on the new object (Lease) to be master, we can drop the need to acquire the old object (Endpoints or ConfigMap)
And then we're using Lease as it should be.
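
The two phases above can be sketched roughly as follows. This is an illustrative model only: `lock` and `compositeLock` are hypothetical in-memory types, not the client-go resource lock API, and lease expiry and renewal are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// errHeld is returned when another candidate already holds a lock.
var errHeld = errors.New("held by another candidate")

// lock is a hypothetical in-memory stand-in for one lock object
// (Endpoints, ConfigMap, or Lease). It is not the client-go API.
type lock struct {
	name   string
	holder string // "" means nobody holds the lock
}

// tryAcquire succeeds if the lock is free or already held by id.
func (l *lock) tryAcquire(id string) error {
	if l.holder != "" && l.holder != id {
		return fmt.Errorf("%s lock: %w", l.name, errHeld)
	}
	l.holder = id
	return nil
}

// release frees the lock if id is the current holder.
func (l *lock) release(id string) {
	if l.holder == id {
		l.holder = ""
	}
}

// compositeLock models phase (1): the old object is acquired first, then the
// new one, and the caller is leader only if both succeed.
type compositeLock struct {
	oldLock *lock
	newLock *lock
}

func (c *compositeLock) tryAcquire(id string) error {
	if err := c.oldLock.tryAcquire(id); err != nil {
		return err // never touch the new object without the old one
	}
	return c.newLock.tryAcquire(id)
}

func main() {
	endpoints := &lock{name: "endpoints"}
	lease := &lock{name: "lease"}

	// Replica "a" has not been upgraded and only knows the old lock.
	fmt.Println("a, old lock only:", endpoints.tryAcquire("a"))

	// Upgraded replica "b" uses the composite lock and is blocked by "a".
	b := &compositeLock{oldLock: endpoints, newLock: lease}
	fmt.Println("b, composite:", b.tryAcquire("b"))

	// "a" steps down; "b" can now take both objects.
	endpoints.release("a")
	fmt.Println("b, composite:", b.tryAcquire("b"))

	// Phase (2), next release: replica "c" uses only the lease and is
	// blocked, because the composite-lock leader holds the lease too.
	fmt.Println("c, lease only:", lease.tryAcquire("c"))
}
```

Acquiring the old object first means an upgraded replica can never lead while a not-yet-upgraded leader holds the old lock. Holding the new object as well is what lets the next release drop the old lock without a gap.
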

Is that more clear now?

@ricky1993
Contributor Author

/test pull-kubernetes-e2e-gce

@fedebongio
Contributor

/cc @liggitt

@timothysc timothysc added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 12, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 12, 2019
@timothysc
Member

We have an API that was designed exactly for this purpose, which is coordination.Lease.
We would like to use in our components what we recommend to others.

lol, we argued for this years ago but Brian Grant said no way. I don't know how you slid this through, but works for me.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 12, 2019
@timothysc
Member

/assign @liggitt
^ I wasn't involved in this api creation so deferring to api-approver.

@ricky1993 ricky1993 force-pushed the leader_election_migrate branch 3 times, most recently from 66455ac to 496ac23 Compare August 25, 2019 03:45
@ricky1993
Contributor Author

@wojtek-t @mikedanese PTAL. I will add some unit tests for multilock in "staging/src/k8s.io/client-go/tools/leaderelection/leaderelection_test.go" soon.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 4, 2019
@ricky1993
Contributor Author

/test pull-kubernetes-kubemark-e2e-gce-big

Member

@wojtek-t wojtek-t left a comment

I haven't carefully reviewed the test yet, but the non-test logic looks reasonable to me.
But before I review the test, I would prefer that someone else also reviews this.

@mikedanese - can you please take a look?

@ricky1993
Contributor Author

friendly ping~ @mikedanese

@timothysc timothysc removed their request for review September 10, 2019 20:59
@mikedanese
Member

I would drop RawRecord from the public resource lock API. I don't think you need it. Other than that, this is what I would expect.

@ricky1993 ricky1993 force-pushed the leader_election_migrate branch 4 times, most recently from fb1c1e6 to 944bddd Compare September 26, 2019 07:49
Member

@wojtek-t wojtek-t left a comment

Just two minor comments - other than that lgtm

Change-Id: I21fd5cdc1af59e456628cf15fc84b2d79db2eda0
@wojtek-t
Member

LGTM - will let @mikedanese take a final look

@mikedanese
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 26, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mikedanese, ricky1993

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 26, 2019
@k8s-ci-robot k8s-ci-robot merged commit d14943b into kubernetes:master Sep 26, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Sep 26, 2019
@ricky1993 ricky1993 deleted the leader_election_migrate branch September 27, 2019 04:39
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

8 participants