Fix incorrect call to 'bind' in scheduler #50028

Merged
k8s-merge-robot merged 1 commit into kubernetes:master from julia-stripe:fix-incorrect-scheduler-bind-call on Aug 4, 2017

Contributor

julia-stripe commented Aug 2, 2017

I previously submitted #49661 -- I'm not sure if that PR is too big or what, but this is an attempt at a smaller PR that makes progress on the same issue and is easier to review.

What this PR does / why we need it:

In this refactor (ecb962e#diff-67f2b61521299ca8d8687b0933bbfb19R223), the scheduler code was split into separate bind and assume functions. When that happened, bind started being called with pod as its argument. The argument to bind should be the assumed pod, not the original pod. Evidence that assumedPod, not pod, is the correct argument to bind:

if err := sched.config.SchedulerCache.FinishBinding(&assumed); err != nil {
glog.Errorf("scheduler cache FinishBinding failed: %v", err)
}
if err != nil {
glog.V(1).Infof("Failed to bind pod: %v/%v", pod.Namespace, pod.Name)
if err := sched.config.SchedulerCache.ForgetPod(&assumed); err != nil {
(The function signature for bind also names its parameter assumed, even though bind is not called with the assumed pod as an argument.)

This is an issue (and causes #49314, where pods that fail to bind to a node get stuck indefinitely) in the following scenario:

  1. The pod fails to bind to the node
  2. bind calls ForgetPod with the pod argument
  3. Since ForgetPod expects the assumed pod as its argument (because that's what's in the scheduler cache), it fails with an error like: scheduler cache ForgetPod failed: pod test-677550-rc-edit-namespace/nginx-jvn09 state was assumed on a different node
  4. The pod gets lost forever because of some incomplete error handling (which I haven't addressed here in the interest of making a simpler PR)
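
The scenario above can be reproduced in miniature. This is only a sketch under stated assumptions: `Pod` and `toyCache` are hypothetical stand-ins, not the real `v1.Pod` or scheduler cache, and the only behavior modeled is that `ForgetPod` rejects a pod whose node differs from the one recorded when it was assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a pared-down, hypothetical stand-in for v1.Pod.
type Pod struct {
	Namespace, Name, NodeName string
}

// toyCache is a hypothetical stand-in for the scheduler cache. It remembers
// the node each assumed pod was assumed on, keyed by namespace/name.
type toyCache struct {
	assumedNode map[string]string
}

func key(p *Pod) string { return p.Namespace + "/" + p.Name }

// AssumePod records the node the pod has been assumed on.
func (c *toyCache) AssumePod(p *Pod) {
	c.assumedNode[key(p)] = p.NodeName
}

// ForgetPod removes an assumed pod, but only when the caller passes a pod
// whose NodeName matches what the cache recorded.
func (c *toyCache) ForgetPod(p *Pod) error {
	node, ok := c.assumedNode[key(p)]
	if !ok {
		return errors.New("pod " + key(p) + " was not assumed")
	}
	if node != p.NodeName {
		return fmt.Errorf("pod %s state was assumed on a different node", key(p))
	}
	delete(c.assumedNode, key(p))
	return nil
}

func main() {
	cache := &toyCache{assumedNode: map[string]string{}}
	pod := &Pod{Namespace: "default", Name: "nginx"} // NodeName is still empty

	assumed := *pod // copy the pod, then set the suggested host on the copy only
	assumed.NodeName = "node-1"
	cache.AssumePod(&assumed)

	// Buggy path: forgetting with the original pod is rejected.
	fmt.Println("ForgetPod(pod):     ", cache.ForgetPod(pod))
	// Fixed path: forgetting with the assumed pod succeeds.
	fmt.Println("ForgetPod(&assumed):", cache.ForgetPod(&assumed))
}
```

In this toy model the first call fails because the original pod has no node set, which mirrors step 3; the second call succeeds because the assumed copy carries the node the cache knows about.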

In this PR I've fixed the call to bind and modified the tests to make sure that ForgetPod gets called with the correct argument (the assumed pod) when binding fails.
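
For readers who want to see the shape of such a check, here is a rough sketch and not the actual test from this PR. `recordingCache`, `failingBinder`, and the simplified `bind` and `scheduleOne` below are hypothetical names invented for illustration; the idea is just a test double that records which pod `ForgetPod` receives when binding fails.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a pared-down, hypothetical stand-in for v1.Pod.
type Pod struct {
	Name, NodeName string
}

// recordingCache is a hypothetical test double that remembers the pod
// passed to ForgetPod.
type recordingCache struct {
	forgotten *Pod
}

func (c *recordingCache) ForgetPod(p *Pod) error {
	c.forgotten = p
	return nil
}

// failingBinder always fails, to exercise the error path.
func failingBinder(*Pod) error { return errors.New("binder failed") }

// bind mimics the error path quoted in the description: when the binder
// fails, the assumed pod is forgotten.
func bind(cache *recordingCache, assumed *Pod, binder func(*Pod) error) error {
	err := binder(assumed)
	if err != nil {
		if ferr := cache.ForgetPod(assumed); ferr != nil {
			fmt.Println("scheduler cache ForgetPod failed:", ferr)
		}
	}
	return err
}

// scheduleOne copies the pod, sets the host on the copy, and binds the copy.
func scheduleOne(cache *recordingCache, pod *Pod, host string, binder func(*Pod) error) error {
	assumed := *pod
	assumed.NodeName = host
	return bind(cache, &assumed, binder)
}

func main() {
	cache := &recordingCache{}
	pod := &Pod{Name: "nginx"}

	err := scheduleOne(cache, pod, "node-1", failingBinder)
	fmt.Println("bind error:", err)
	fmt.Println("node on forgotten pod:", cache.forgotten.NodeName)
	fmt.Println("original pod untouched:", pod.NodeName == "")
}
```

A real test would assert on the recorded pod instead of printing it, but the point is the same: the pod handed to `ForgetPod` should carry the assumed node, while the original pod stays unchanged.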

Which issue this PR fixes: fixes #49314

Special notes for your reviewer:

Release note:

Fix bug in scheduler that caused initially unschedulable pods to get stuck in the Pending state forever.
Collaborator

k8s-ci-robot commented Aug 2, 2017

Hi @julia-stripe. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

cblecker commented Aug 2, 2017

/ok-to-test

Contributor

julia-stripe commented Aug 2, 2017

/retest

Owner

dchen1107 commented Aug 3, 2017

From the original issue description, it is a serious regression for the 1.7 release, and we should patch it. I marked this PR as a cherrypick-candidate.

@julia-stripe Looks like there are two PRs (this one and #49661) addressing the same issue. I don't see much difference between the two. Could you please close one?

@dchen1107 dchen1107 added this to the v1.7 milestone Aug 3, 2017

Owner

dchen1107 commented Aug 3, 2017

The pr looks good to me based on the problem described, and I prefer someone from @kubernetes/sig-scheduling-pr-reviews to take a look too. @bsalamat

Member

k82cn commented Aug 3, 2017

/assign

if err != nil {
return
}
// bind the pod to its host asynchronously (we can do this b/c of the assumption step above).
go func() {
- err := sched.bind(pod, &v1.Binding{
- ObjectMeta: metav1.ObjectMeta{Namespace: pod.Namespace, Name: pod.Name, UID: pod.UID},
+ err := sched.bind(&assumedPod, &v1.Binding{

bsalamat Aug 3, 2017

Contributor

Isn't this the same as before?

k82cn Aug 3, 2017

Member

We used pod directly before this fix; before the refactor it was assumedPod.

wojtek-t Aug 3, 2017

Member

It's not the same: the assume() method above modifies the given pod.

bsalamat Aug 3, 2017

Contributor

I meant this piece of code is logically the same in this PR.

bsalamat Aug 3, 2017

Contributor

Is sched.assume() going to modify assumedPod?

wojtek-t Aug 3, 2017

Member

@bsalamat I don't understand this question. Yes, it modifies assumedPod. Yes, that is what we are trying to achieve.

bsalamat Aug 3, 2017

Contributor

Thanks, @wojtek-t! I don't know why I missed it. I guess the GitHub UI still confuses me.

@@ -264,15 +263,16 @@ func (sched *Scheduler) scheduleOne() {
// Tell the cache to assume that a pod now is running on a given node, even though it hasn't been bound yet.
// This allows us to keep scheduling without waiting on binding to occur.
- err = sched.assume(pod, suggestedHost)
+ assumedPod := *pod

wanghaoran1988 Aug 3, 2017

Contributor

Why do you make a copy here? It seems you moved the copy from inside the assume func to here.

k82cn Aug 3, 2017

Member

We should not use the object from the cache directly.

julia-stripe Aug 3, 2017

Contributor

Basically, we need to make a copy somewhere, and @k82cn suggested it would be clearer if we made the copy outside of the assume function (which I agree with).
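
To illustrate what copying with `assumedPod := *pod` does and does not isolate, here is a small sketch. The `Pod` and `PodSpec` types and the simplified `assume` are hypothetical stand-ins rather than the real Kubernetes types; the behavior shown is plain Go struct-copy semantics.

```go
package main

import "fmt"

// PodSpec and Pod are pared-down, hypothetical stand-ins for the v1 types.
type PodSpec struct {
	NodeName string
}

type Pod struct {
	Name   string
	Labels map[string]string
	Spec   PodSpec
}

// assume sets the host on the pod it is given, so callers pass in a copy
// when they want the original left alone.
func assume(assumed *Pod, host string) {
	assumed.Spec.NodeName = host
}

func main() {
	pod := &Pod{Name: "nginx", Labels: map[string]string{"app": "web"}}

	// Dereferencing copies the struct by value, including the nested Spec.
	assumedPod := *pod
	assume(&assumedPod, "node-1")

	fmt.Printf("original NodeName: %q\n", pod.Spec.NodeName)
	fmt.Printf("assumed NodeName:  %q\n", assumedPod.Spec.NodeName)

	// The copy is shallow: maps, slices, and pointers are still shared.
	assumedPod.Labels["touched"] = "yes"
	fmt.Printf("original sees label: %q\n", pod.Labels["touched"])
}
```

Setting the node name on the copy leaves the original untouched, which is all assume() needs here. Because the copy is shallow, any change made through a shared map, slice, or pointer would still be visible through the original.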

Contributor

wanghaoran1988 commented Aug 3, 2017

@k8s-ci-robot k8s-ci-robot requested a review from aveshagarwal Aug 3, 2017

@davidopp davidopp assigned davidopp and unassigned thockin Aug 3, 2017

Member

wojtek-t commented Aug 3, 2017

Thanks a lot for this PR. I added some minor nits but this overall looks good to me.

Member

k82cn commented Aug 3, 2017

@wojtek-t, can you also help review #49661? I think FinishBinding & ForgetPod are also important to cherrypick.

plugin/pkg/scheduler/scheduler.go
@@ -185,14 +185,13 @@ func (sched *Scheduler) schedule(pod *v1.Pod) (string, error) {
}
// assume signals to the cache that a pod is already in the cache, so that binding can be asnychronous.
-func (sched *Scheduler) assume(pod *v1.Pod, host string) error {
+func (sched *Scheduler) assume(assumed *v1.Pod, host string) error {

wojtek-t Aug 3, 2017

Member

Please add an explicit comment that the assumed pod is modified by this function.

plugin/pkg/scheduler/scheduler.go
@@ -264,15 +263,16 @@ func (sched *Scheduler) scheduleOne() {
// Tell the cache to assume that a pod now is running on a given node, even though it hasn't been bound yet.
// This allows us to keep scheduling without waiting on binding to occur.
- err = sched.assume(pod, suggestedHost)
+ assumedPod := *pod
+ err = sched.assume(&assumedPod, suggestedHost)

wojtek-t Aug 3, 2017

Member

Please add a comment that assume() modifies the pod here (by setting suggestedHost on it).

Contributor

julia-stripe commented Aug 3, 2017

@wojtek-t I've added the comments you requested!

Contributor

bsalamat commented Aug 3, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 3, 2017

Contributor

julia-stripe commented Aug 3, 2017

/retest

Member

wojtek-t commented Aug 3, 2017

@julia-stripe - please squash commits and I will approve.

Contributor

julia-stripe commented Aug 3, 2017

done!

@k8s-merge-robot k8s-merge-robot removed the lgtm label Aug 3, 2017

Contributor

julia-stripe commented Aug 3, 2017

/retest

Member

wojtek-t commented Aug 4, 2017

/lgtm

/retest

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 4, 2017

Collaborator

k8s-merge-robot commented Aug 4, 2017

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, julia-stripe, wojtek-t

Associated issue: 49314

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Collaborator

k8s-merge-robot commented Aug 4, 2017

Automatic merge from submit-queue

@k8s-merge-robot k8s-merge-robot merged commit 898b1b3 into kubernetes:master Aug 4, 2017

9 of 10 checks passed

Submit Queue: Required Github CI test is not green: pull-kubernetes-e2e-gce-etcd3
cla/linuxfoundation: julia-stripe authorized
pull-kubernetes-bazel: Job succeeded.
pull-kubernetes-e2e-gce-etcd3: Jenkins job succeeded.
pull-kubernetes-e2e-kops-aws: Jenkins job succeeded.
pull-kubernetes-federation-e2e-gce: Jenkins job succeeded.
pull-kubernetes-kubemark-e2e-gce: Jenkins job succeeded.
pull-kubernetes-node-e2e: Jenkins job succeeded.
pull-kubernetes-unit: Jenkins job succeeded.
pull-kubernetes-verify: Jenkins job succeeded.

Contributor

bsalamat commented Aug 4, 2017

Thanks a lot, @julia-stripe for debugging this issue and the fix!

Owner

davidopp commented Aug 4, 2017

+1

@julia-stripe julia-stripe deleted the julia-stripe:fix-incorrect-scheduler-bind-call branch Aug 4, 2017

Owner

davidopp commented Aug 4, 2017

BTW I assume we should cherrypick this into 1.7, right?

Owner

davidopp commented Aug 4, 2017

Ah, @dchen1107 had already marked this as cherrypick-candidate.

Member

wojtek-t commented Aug 7, 2017

Yes - this will be cherrypicked to 1.7. I will take care of it.

Member

wojtek-t commented Aug 7, 2017

Cherrypick in #50240

k8s-merge-robot added a commit that referenced this pull request Aug 7, 2017

Merge pull request #50240 from wojtek-t/automated-cherry-pick-of-#50028-#50106-upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #50028 #50106 upstream release 1.7

Cherry pick of #50028 and #50106 on release-1.7.

#50028: Fix incorrect call to 'bind' in scheduler
#50106: Retry scheduling pods after errors more consistently in scheduler

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.