Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect call to 'bind' in scheduler #50028

Conversation

@julia-stripe
Copy link
Contributor

commented Aug 2, 2017

I previously submitted #49661 -- I'm not sure if that PR is too big or what, but this is an attempt at a smaller PR that makes progress on the same issue and is easier to review.

What this PR does / why we need it:

In this refactor (ecb962e#diff-67f2b61521299ca8d8687b0933bbfb19R223) the scheduler code was refactored into separate bind and assume functions. When that happened, bind was called with pod as an argument. The argument to bind should be the assumed pod, not the original pod. Evidence that assumedPod is the correct argument bind and not pod:

if err := sched.config.SchedulerCache.FinishBinding(&assumed); err != nil {
glog.Errorf("scheduler cache FinishBinding failed: %v", err)
}
if err != nil {
glog.V(1).Infof("Failed to bind pod: %v/%v", pod.Namespace, pod.Name)
if err := sched.config.SchedulerCache.ForgetPod(&assumed); err != nil {
. (and it says assumed in the function signature for bind, even though it's not called with the assumed pod as an argument).

This is an issue (and causes #49314, where pods that fail to bind to a node get stuck indefinitely) in the following scenario:

  1. The pod fails to bind to the node
  2. bind calls ForgetPod with the pod argument
  3. since ForgetPod is expecting the assumed pod as an argument (because that's what's in the scheduler cache), it fails with an error like scheduler cache ForgetPod failed: pod test-677550-rc-edit-namespace/nginx-jvn09 state was assumed on a different node
  4. The pod gets lost forever because of some incomplete error handling (which I haven't addressed here in the interest of making a simpler PR)

In this PR I've fixed the call to bind and modified the tests to make sure that ForgetPod gets called with the correct argument (the assumed pod) when binding fails.

Which issue this PR fixes: fixes #49314

Special notes for your reviewer:

Release note:

Fix bug in scheduler that caused initially unschedulable pods to stuck in Pending state forever.
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 2, 2017

Hi @julia-stripe. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cblecker

This comment has been minimized.

Copy link
Member

commented Aug 2, 2017

/ok-to-test

@julia-stripe

This comment has been minimized.

Copy link
Contributor Author

commented Aug 2, 2017

/retest

@dchen1107

This comment has been minimized.

Copy link
Member

commented Aug 3, 2017

From the original issue description, it is a serial regression for 1.7 release, and we should patch it. I marked this pr for cherrypick-candidate.

@julia-stripe Looks like there are two prs (this one and #49661) addressing the same issue. I didn't see much difference from both prs. Could you please close one?

@dchen1107

This comment has been minimized.

Copy link
Member

commented Aug 3, 2017

The pr looks good to me based on the problem described, and I prefer someone from @kubernetes/sig-scheduling-pr-reviews to take a look too. @bsalamat

@k82cn

This comment has been minimized.

Copy link
Member

commented Aug 3, 2017

/assign

if err != nil {
return
}

// bind the pod to its host asynchronously (we can do this b/c of the assumption step above).
go func() {
err := sched.bind(pod, &v1.Binding{
ObjectMeta: metav1.ObjectMeta{Namespace: pod.Namespace, Name: pod.Name, UID: pod.UID},
err := sched.bind(&assumedPod, &v1.Binding{

This comment has been minimized.

Copy link
@bsalamat

bsalamat Aug 3, 2017

Member

Isn't this the same as before.

This comment has been minimized.

Copy link
@k82cn

k82cn Aug 3, 2017

Member

we used pod directly before the fix; and it was assumedPod before the refactor.

This comment has been minimized.

Copy link
@wojtek-t

wojtek-t Aug 3, 2017

Member

It's not the same - assume() method above is modifying the given pod.

This comment has been minimized.

Copy link
@bsalamat

bsalamat Aug 3, 2017

Member

I meant this piece of code is logically the same in this PR.

This comment has been minimized.

Copy link
@bsalamat

bsalamat Aug 3, 2017

Member

Is sched.assume() going to modify assumedPod?

This comment has been minimized.

Copy link
@wojtek-t

wojtek-t Aug 3, 2017

Member

@bsalamat I don't understand this question. Yes it is modifying. Yes it is what we are trying to achieve.

This comment has been minimized.

Copy link
@bsalamat

bsalamat Aug 3, 2017

Member

Thanks, @wojtek-t! I don't know why I missed it. I guess github UI still confuses me.

@@ -264,15 +263,16 @@ func (sched *Scheduler) scheduleOne() {

// Tell the cache to assume that a pod now is running on a given node, even though it hasn't been bound yet.
// This allows us to keep scheduling without waiting on binding to occur.
err = sched.assume(pod, suggestedHost)
assumedPod := *pod

This comment has been minimized.

Copy link
@wanghaoran1988

wanghaoran1988 Aug 3, 2017

Member

Why you make a copy here? seems you moved the copy from inside the assume func here.

This comment has been minimized.

Copy link
@k82cn

k82cn Aug 3, 2017

Member

we should not use obj from cache directly.

This comment has been minimized.

Copy link
@julia-stripe

julia-stripe Aug 3, 2017

Author Contributor

Basically we need to make a copy somewhere and @k82cn suggested it would be more clear if we made the copy outside of the assume function. (which I agree with)

@wanghaoran1988

This comment has been minimized.

Copy link
Member

commented Aug 3, 2017

@k8s-ci-robot k8s-ci-robot requested a review from aveshagarwal Aug 3, 2017

@davidopp

This comment has been minimized.

Copy link
Member

commented Aug 3, 2017

@davidopp davidopp assigned davidopp and unassigned thockin Aug 3, 2017

@wojtek-t

This comment has been minimized.

Copy link
Member

commented Aug 3, 2017

Thanks a lot for this PR. I added some minor nits but this overall looks good to me.

@k82cn

This comment has been minimized.

Copy link
Member

commented Aug 3, 2017

@wojtek-t , can you also help to review #49661 ? I think FinishBinding & ForgetPod is also important to 'cherrypick'.

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 3, 2017

@julia-stripe

This comment has been minimized.

Copy link
Contributor Author

commented Aug 3, 2017

/retest

@wojtek-t

This comment has been minimized.

Copy link
Member

commented Aug 3, 2017

@julia-stripe - please squash commits and I will approve.

@julia-stripe julia-stripe force-pushed the julia-stripe:fix-incorrect-scheduler-bind-call branch from b65bb0a to d584bf4 Aug 3, 2017

@julia-stripe

This comment has been minimized.

Copy link
Contributor Author

commented Aug 3, 2017

done!

@k8s-github-robot k8s-github-robot removed the lgtm label Aug 3, 2017

@julia-stripe

This comment has been minimized.

Copy link
Contributor Author

commented Aug 3, 2017

/retest

@wojtek-t

This comment has been minimized.

Copy link
Member

commented Aug 4, 2017

/lgtm

/retest

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2017

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, julia-stripe, wojtek-t

Associated issue: 49314

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@fejta-bot

This comment has been minimized.

Copy link

commented Aug 4, 2017

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

1 similar comment
@fejta-bot

This comment has been minimized.

Copy link

commented Aug 4, 2017

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2017

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 898b1b3 into kubernetes:master Aug 4, 2017

9 of 10 checks passed

Submit Queue Required Github CI test is not green: pull-kubernetes-e2e-gce-etcd3
Details
cla/linuxfoundation julia-stripe authorized
Details
pull-kubernetes-bazel Job succeeded.
Details
pull-kubernetes-e2e-gce-etcd3 Jenkins job succeeded.
Details
pull-kubernetes-e2e-kops-aws Jenkins job succeeded.
Details
pull-kubernetes-federation-e2e-gce Jenkins job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Jenkins job succeeded.
Details
pull-kubernetes-node-e2e Jenkins job succeeded.
Details
pull-kubernetes-unit Jenkins job succeeded.
Details
pull-kubernetes-verify Jenkins job succeeded.
Details
@bsalamat

This comment has been minimized.

Copy link
Member

commented Aug 4, 2017

Thanks a lot, @julia-stripe for debugging this issue and the fix!

@davidopp

This comment has been minimized.

Copy link
Member

commented Aug 4, 2017

+1

@julia-stripe julia-stripe deleted the julia-stripe:fix-incorrect-scheduler-bind-call branch Aug 4, 2017

@davidopp

This comment has been minimized.

Copy link
Member

commented Aug 4, 2017

BTW I assume we should cherrypick this into 1.7, right?

@davidopp

This comment has been minimized.

Copy link
Member

commented Aug 4, 2017

Ah, @dchen1107 had already marked this as cherypick-candidate.

@wojtek-t

This comment has been minimized.

Copy link
Member

commented Aug 7, 2017

Yes - this will be cherrypicked to 1.7. I will take care of it.

@wojtek-t

This comment has been minimized.

Copy link
Member

commented Aug 7, 2017

Cherrypick in #50240

k8s-github-robot pushed a commit that referenced this pull request Aug 7, 2017

Kubernetes Submit Queue
Merge pull request #50240 from wojtek-t/automated-cherry-pick-of-#50028-
#50106-upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #50028 #50106 upstream release 1.7

Cherry pick of #50028 and #50106 on release-1.7.

#50028: Fix incorrect call to 'bind' in scheduler
#50106: Retry scheduling pods after errors more consistently in scheduler
@k8s-cherrypick-bot

This comment has been minimized.

Copy link

commented Aug 7, 2017

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.