Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect call to 'bind' in scheduler #50028

Conversation

julia-stripe
Copy link
Contributor

@julia-stripe julia-stripe commented Aug 2, 2017

I previously submitted #49661 -- I'm not sure if that PR is too big or what, but this is an attempt at a smaller PR that makes progress on the same issue and is easier to review.

What this PR does / why we need it:

In this refactor (ecb962e6585#diff-67f2b61521299ca8d8687b0933bbfb19R223) the scheduler code was refactored into separate bind and assume functions. When that happened, bind was called with pod as an argument. The argument to bind should be the assumed pod, not the original pod. Evidence that assumedPod is the correct argument bind and not pod:

if err := sched.config.SchedulerCache.FinishBinding(&assumed); err != nil {
glog.Errorf("scheduler cache FinishBinding failed: %v", err)
}
if err != nil {
glog.V(1).Infof("Failed to bind pod: %v/%v", pod.Namespace, pod.Name)
if err := sched.config.SchedulerCache.ForgetPod(&assumed); err != nil {
. (and it says assumed in the function signature for bind, even though it's not called with the assumed pod as an argument).

This is an issue (and causes #49314, where pods that fail to bind to a node get stuck indefinitely) in the following scenario:

  1. The pod fails to bind to the node
  2. bind calls ForgetPod with the pod argument
  3. since ForgetPod is expecting the assumed pod as an argument (because that's what's in the scheduler cache), it fails with an error like scheduler cache ForgetPod failed: pod test-677550-rc-edit-namespace/nginx-jvn09 state was assumed on a different node
  4. The pod gets lost forever because of some incomplete error handling (which I haven't addressed here in the interest of making a simpler PR)

In this PR I've fixed the call to bind and modified the tests to make sure that ForgetPod gets called with the correct argument (the assumed pod) when binding fails.

Which issue this PR fixes: fixes #49314

Special notes for your reviewer:

Release note:

Fix bug in scheduler that caused initially unschedulable pods to stuck in Pending state forever.

@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 2, 2017
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 2, 2017
@k8s-ci-robot
Copy link
Contributor

Hi @julia-stripe. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 2, 2017
@cblecker
Copy link
Member

cblecker commented Aug 2, 2017

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 2, 2017
@julia-stripe
Copy link
Contributor Author

/retest

@dchen1107
Copy link
Member

From the original issue description, it is a serial regression for 1.7 release, and we should patch it. I marked this pr for cherrypick-candidate.

@julia-stripe Looks like there are two prs (this one and #49661) addressing the same issue. I didn't see much difference from both prs. Could you please close one?

@dchen1107 dchen1107 added cherrypick-candidate release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Aug 3, 2017
@dchen1107 dchen1107 added this to the v1.7 milestone Aug 3, 2017
@dchen1107
Copy link
Member

The pr looks good to me based on the problem described, and I prefer someone from @kubernetes/sig-scheduling-pr-reviews to take a look too. @bsalamat

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Aug 3, 2017
@k82cn
Copy link
Member

k82cn commented Aug 3, 2017

/assign

if err != nil {
return
}

// bind the pod to its host asynchronously (we can do this b/c of the assumption step above).
go func() {
err := sched.bind(pod, &v1.Binding{
ObjectMeta: metav1.ObjectMeta{Namespace: pod.Namespace, Name: pod.Name, UID: pod.UID},
err := sched.bind(&assumedPod, &v1.Binding{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same as before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we used pod directly before the fix; and it was assumedPod before the refactor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not the same - assume() method above is modifying the given pod.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant this piece of code is logically the same in this PR.

Copy link
Member

@bsalamat bsalamat Aug 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is sched.assume() going to modify assumedPod?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bsalamat I don't understand this question. Yes it is modifying. Yes it is what we are trying to achieve.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @wojtek-t! I don't know why I missed it. I guess github UI still confuses me.

@@ -264,15 +263,16 @@ func (sched *Scheduler) scheduleOne() {

// Tell the cache to assume that a pod now is running on a given node, even though it hasn't been bound yet.
// This allows us to keep scheduling without waiting on binding to occur.
err = sched.assume(pod, suggestedHost)
assumedPod := *pod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you make a copy here? seems you moved the copy from inside the assume func here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not use obj from cache directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically we need to make a copy somewhere and @k82cn suggested it would be more clear if we made the copy outside of the assume function. (which I agree with)

@wanghaoran1988
Copy link
Contributor

/cc @aveshagarwal

@davidopp
Copy link
Member

davidopp commented Aug 3, 2017

@wojtek-t

@davidopp davidopp assigned davidopp and unassigned thockin Aug 3, 2017
@wojtek-t
Copy link
Member

wojtek-t commented Aug 3, 2017

Thanks a lot for this PR. I added some minor nits but this overall looks good to me.

@k82cn
Copy link
Member

k82cn commented Aug 3, 2017

@wojtek-t , can you also help to review #49661 ? I think FinishBinding & ForgetPod is also important to 'cherrypick'.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 3, 2017
@julia-stripe
Copy link
Contributor Author

/retest

@wojtek-t
Copy link
Member

wojtek-t commented Aug 3, 2017

@julia-stripe - please squash commits and I will approve.

@julia-stripe julia-stripe force-pushed the fix-incorrect-scheduler-bind-call branch from b65bb0a to d584bf4 Compare August 3, 2017 20:55
@julia-stripe
Copy link
Contributor Author

done!

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 3, 2017
@julia-stripe
Copy link
Contributor Author

/retest

@wojtek-t
Copy link
Member

wojtek-t commented Aug 4, 2017

/lgtm

/retest

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 4, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, julia-stripe, wojtek-t

Associated issue: 49314

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2017
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 898b1b3 into kubernetes:master Aug 4, 2017
@bsalamat
Copy link
Member

bsalamat commented Aug 4, 2017

Thanks a lot, @julia-stripe for debugging this issue and the fix!

@davidopp
Copy link
Member

davidopp commented Aug 4, 2017

+1

@julia-stripe julia-stripe deleted the fix-incorrect-scheduler-bind-call branch August 4, 2017 18:52
@davidopp
Copy link
Member

davidopp commented Aug 4, 2017

BTW I assume we should cherrypick this into 1.7, right?

@davidopp
Copy link
Member

davidopp commented Aug 4, 2017

Ah, @dchen1107 had already marked this as cherypick-candidate.

@wojtek-t
Copy link
Member

wojtek-t commented Aug 7, 2017

Yes - this will be cherrypicked to 1.7. I will take care of it.

@wojtek-t wojtek-t added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Aug 7, 2017
@wojtek-t
Copy link
Member

wojtek-t commented Aug 7, 2017

Cherrypick in #50240

k8s-github-robot pushed a commit that referenced this pull request Aug 7, 2017
#50106-upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #50028 #50106 upstream release 1.7

Cherry pick of #50028 and #50106 on release-1.7.

#50028: Fix incorrect call to 'bind' in scheduler
#50106: Retry scheduling pods after errors more consistently in scheduler
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pod that failed to bind, stuck in Pending state forever