
Use separate client for leader election in scheduler #53793

Conversation

@wojtek-t (Member) commented Oct 12, 2017

Ref #53327

@kubernetes/sig-scheduling-bugs @bsalamat @davidopp

Use a separate client for leader election in the scheduler, so that regular scheduler operations cannot starve leader election.

@wojtek-t wojtek-t added cherrypick-candidate release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Oct 12, 2017
@wojtek-t wojtek-t added this to the v1.7 milestone Oct 12, 2017
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Oct 12, 2017
@wojtek-t wojtek-t added release-note-none Denotes a PR that doesn't merit a release note. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. release-note-none Denotes a PR that doesn't merit a release note. labels Oct 12, 2017
@wojtek-t (Member Author)

/approve

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 12, 2017
}

kubeconfig.ContentType = s.ContentType
// Override kubeconfig qps/burst settings from flags
kubeconfig.QPS = s.KubeAPIQPS
kubeconfig.Burst = int(s.KubeAPIBurst)

cli, err := clientset.NewForConfig(restclient.AddUserAgent(kubeconfig, "leader-election"))
kubeClient, err := clientset.NewForConfig(restclient.AddUserAgent(kubeconfig, "scheduler"))
Member: Use NewForConfigOrDie here too?

Member Author: No - this is by design.
With the first call here, I want to catch an error, if any.
With the second call, I have already validated the config (via the first call), so it's safe to call ...OrDie - something really bad would need to happen for it to actually die.

Member: Interesting... Thanks for the explanation.

@@ -56,22 +56,22 @@ func createRecorder(kubecli *clientset.Clientset, s *options.SchedulerServer) re
return eventBroadcaster.NewRecorder(api.Scheme, v1.EventSource{Component: s.SchedulerName})
}

func createClient(s *options.SchedulerServer) (*clientset.Clientset, error) {
func createClient(s *options.SchedulerServer) (*clientset.Clientset, *clientset.Clientset, error) {
Member: s/createClient/createClients?

Member Author: done

@shyamjvs (Member)

Couple of nits and lgtm.

@wojtek-t wojtek-t force-pushed the separate_leader_election_in_scheduler branch from 7b3bb54 to 234e20b on October 12, 2017 11:44
@shyamjvs (Member)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 12, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shyamjvs, wojtek-t

Associated issue: 53327

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 9c1796a into kubernetes:master Oct 12, 2017
@wojtek-t wojtek-t changed the title User separate client for leader election in scheduler Use separate client for leader election in scheduler Oct 13, 2017
@wojtek-t wojtek-t added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Oct 13, 2017
k8s-github-robot pushed a commit that referenced this pull request Oct 13, 2017
…93-upstream-release-1.7

Automatic merge from submit-queue.

Automated cherry pick of #53793 upstream release 1.7 

Cherry pick of #53793 on release-1.7.

#53793: Use separate client for leader election in scheduler
@k8s-cherrypick-bot

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

@timothysc (Member)

This totally seems like a hack.

I'm just getting through my backlog, why exactly did you do this? LE should really switch to configmaps for scale concerns. Otherwise we're just shuffling around the problem.

@wojtek-t (Member Author)

I'm just getting through my backlog, why exactly did you do this? LE should really switch to configmaps for scale concerns. Otherwise we're just shuffling around the problem.

I don't understand this at all. It doesn't matter whether we are using configmaps, endpoints or a dedicated object - the client used for it should be a separate client. Otherwise, if for some reason the scheduler (or any other component) generates too many API calls, they may starve leader election and thus cause unnecessary leader re-election.
BTW - a separate leader election client is exactly what we are doing in other components (like the controller manager).
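
A toy, self-contained illustration of the starvation argument (not scheduler code; k8s.io/client-go/util/flowcontrol is the client-side rate limiter used by generated clients, and the numbers are arbitrary):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// One shared client-side token bucket, like a single clientset shared by
	// the scheduler's regular traffic and its leader-election heartbeats.
	limiter := flowcontrol.NewTokenBucketRateLimiter(5, 5) // 5 QPS, burst 5

	// Bulk "scheduler" calls drain the bucket faster than it refills...
	for i := 0; i < 20; i++ {
		limiter.Accept()
	}

	// ...so the next "leader-election heartbeat" has to wait for a token,
	// even though the apiserver itself was never involved.
	start := time.Now()
	limiter.Accept()
	fmt.Printf("heartbeat delayed by %v\n", time.Since(start))
}
```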

@timothysc (Member)

Part of the reason for my configmap comment is that LE is a source of endpoint spamming whose fanout still causes the api-server to notify all the watchers.

I don't see how the LE client affects the api-server in any significant way.

k8s-github-robot pushed a commit that referenced this pull request Oct 14, 2017
…93-upstream-release-1.8

Automatic merge from submit-queue.

Automated cherry pick of #53793 upstream release 1.8 

Cherry pick of #53793 on release-1.8.

#53793: Use separate client for leader election in scheduler
@wojtek-t (Member Author)

Part of the reason for my configmap comment is that LE is a source of endpoint spamming whose fanout still causes the api-server to notify all the watchers.
I don't see how the LE client affects the api-server in any significant way.

Whether it is the same or a different client doesn't impact the apiserver at all.

As I wrote above, if we use the same client for regular API calls and for leader election, then (due to how the client library is implemented) calls generated faster than the QPS limit accumulate on the client side, and that may block refreshing the leader lock - which means losing the lock and an unnecessary leader re-election.
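
A hedged sketch of the overall pattern using today's client-go API (lease-based locks and context-aware callbacks; the PR-era scheduler used an endpoints-based lock, and the namespace, lock name, and timings below are illustrative):

```go
import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// startWithDedicatedLeaderElectionClient builds two clientsets from the same
// rest.Config. Each clientset gets its own client-side rate limiter (as long as
// no shared RateLimiter is set on the config), so a backlog of scheduler calls
// cannot delay the lock renewal requests.
func startWithDedicatedLeaderElectionClient(ctx context.Context, cfg *rest.Config, id string, run func(context.Context)) {
	kubeClient := kubernetes.NewForConfigOrDie(rest.AddUserAgent(cfg, "scheduler"))
	leaderElectionClient := kubernetes.NewForConfigOrDie(rest.AddUserAgent(cfg, "leader-election"))
	_ = kubeClient // regular API traffic (informers, binding, events, ...) goes through this one

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Namespace: "kube-system", Name: "kube-scheduler"},
		Client:     leaderElectionClient.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // how long a lease is valid after the last renewal
		RenewDeadline: 10 * time.Second, // the leader must renew within this window or give up
		RetryPeriod:   2 * time.Second,  // how often to try acquiring/renewing the lock
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: run,
			OnStoppedLeading: func() { /* lost the lease; a real component typically exits here */ },
		},
	})
}
```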

@timothysc (Member)

I've seen this a couple of times before, ~ a year ago, before we did all the api-server improvements.

FWIW, you can always change the default leader election time too.

@wojtek-t (Member Author)

I've seen this a couple of times before, ~ a year ago, before we did all the api-server improvements.

It's not about the apiserver at all. If there is a bug in any component (and it tries to send too many API calls), the client-side QPS limit may starve leader election heartbeats even if the apiserver were infinitely fast and efficient.

FWIW, you can always change the default leader election time too.

That's not a good workaround because it affects the legitimate re-elections that should happen.

BTW - I don't understand your concerns around this. Is there any scenario you have in mind that this PR is making worse?

@timothysc (Member)

I don't understand your concerns around this. Is there any scenario you have in mind that this PR is making worse?

I think I misread the original problem, but it still leaves me unsettled b/c it looks like it's tap-dancing around the root of the problem, and makes the code cryptic to anyone reading it for the 1st time.

  1. We should have updated the QPS limits a while ago, and used back-pressure, but we pushed it off...
  2. Why is there now a spike in traffic, to the point where it exhausts QPS, and why isn't the QPS limit adjusted appropriately?

@shyamjvs (Member)

Why is there now a spike in traffic, to the point where it exhausts QPS, and why isn't the QPS limit adjusted appropriately?

I don't think we saw a spike in traffic. This change is more of a correctness fix, IIUC from @wojtek-t's comment.

@wojtek-t (Member Author)

@timothysc - this isn't solving any problem we've seen. It's just something that I think we may potentially see at some point.

Regarding using backpressure - I agree. But that won't fully solve this problem. There is no way we can guarantee that the apiserver will reject other requests rather than these ones.

The feature that we would need (and that would solve this problem) is priority of API requests. Those coming from leader election should have the highest priority and should always be processed before others. The trick with a separate client is a bit of a workaround for this problem.

So in general, I agree that this isn't the perfect solution. But to solve it properly we would need:

  • backpressure in the apiserver
  • priority of API calls

Those two features are both quite big and no one really has the capacity to do them. That's why we're doing things like this.

OTOH, these changes are pretty local and don't pollute the code, so it's not that big a deal in my opinion.

openshift-merge-robot added a commit to openshift/origin that referenced this pull request Oct 25, 2017
Automatic merge from submit-queue.

UPSTREAM: 53989: Remove repeated random string generations in scheduler volume predicate

@sjenning @smarterclayton

Though the upstream PR 53793 has been backported to the kube 1.7 branch (53884), I am not sure if we have a plan for another origin rebase to the latest kube 1.7, or if we would want to wait for that.

So this backports following 3 PRs:
kubernetes/kubernetes#53793 
kubernetes/kubernetes#53720 (partial)
kubernetes/kubernetes#53989
@wojtek-t wojtek-t deleted the separate_leader_election_in_scheduler branch February 1, 2018 13:39