Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: check "ok" first to avoid panic #44152

Merged
merged 1 commit into from
Apr 11, 2017

Conversation

NickrenREN
Copy link
Contributor

@NickrenREN NickrenREN commented Apr 6, 2017

Check "ok" and then check if "currState.pod.Spec.NodeName != pod.Spec.NodeName", here if currState is nil, it will panic.

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 6, 2017
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Apr 6, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@@ -219,9 +219,6 @@ func (sched *Scheduler) scheduleOne() {
// If binding succeeded then PodScheduled condition will be updated in apiserver so that
// it's atomic with setting host.
err := sched.config.Binder.Bind(b)
if err := sched.config.SchedulerCache.FinishBinding(&assumed); err != nil {
glog.Errorf("scheduler cache FinishBinding failed: %v", err)
}
if err != nil {
glog.V(1).Infof("Failed to bind pod: %v/%v", pod.Namespace, pod.Name)
if err := sched.config.SchedulerCache.ForgetPod(&assumed); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take some time to find a case to trigger the following case:
https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/schedulercache/cache.go#L164

If it's triggered, the assumed pod will not be deleted after this PR.

Copy link
Contributor Author

@NickrenREN NickrenREN Apr 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. But i doubt if we should delete the assumed pod if the case triggered as you said.

@NickrenREN
Copy link
Contributor Author

NickrenREN commented Apr 7, 2017

Another bug. we should check "ok" and then check if "currState.pod.Spec.NodeName != pod.Spec.NodeName", here if currState is nil, it will panic.
Modified . Correct me if my understanding is wrong.
ping @k82cn @bsalamat

@@ -236,6 +233,9 @@ func (sched *Scheduler) scheduleOne() {
})
return
}
if err := sched.config.SchedulerCache.FinishBinding(&assumed); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we shouldn't call FinishBinding when Bind fails. FinishBinding marks the fact that binding has finished in the state of the pod. This is used later to clean up the pod. I think we should clean up these pods even if their binding has failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for explaining. But if currState.pod.Spec.NodeName == pod.Spec.NodeName, then the pod will be deleted in ForgetPod() and I want to know on what condition the case "currState.pod.Spec.NodeName != pod.Spec.NodeName" will be triggered.
Anyway, i moved the FinishBinding(...) back

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay :). We should delete the assumedPod if binding failed; the assumed pod was added into cache and account into "PodFitResource" predicates, there'll be resources "lost" if failed binding pod not deleted.

@NickrenREN NickrenREN changed the title Put FinishBinding() after the err check Fix: check "ok" first to avoid panic Apr 8, 2017
@@ -161,7 +161,7 @@ func (cache *schedulerCache) ForgetPod(pod *v1.Pod) error {
defer cache.mu.Unlock()

currState, ok := cache.podStates[key]
if currState.pod.Spec.NodeName != pod.Spec.NodeName {
if ok && currState.pod.Spec.NodeName != pod.Spec.NodeName {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this code improvement, although did not get time for this case/logic :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@k82cn
Copy link
Member

k82cn commented Apr 9, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 9, 2017
@bsalamat
Copy link
Member

/lgtm

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: NickrenREN, bsalamat, k82cn
We suggest the following additional approver: @davidopp

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@NickrenREN
Copy link
Contributor Author

Thanks @k82cn @bsalamat
and kindly ping @timothysc @davidopp for approval

@timothysc timothysc added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 11, 2017
@timothysc timothysc self-assigned this Apr 11, 2017
@timothysc
Copy link
Member

lgtm, green lighting.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 43900, 44152, 44324)

@k8s-github-robot k8s-github-robot merged commit 3c46109 into kubernetes:master Apr 11, 2017
@NickrenREN NickrenREN deleted the scheduler-bind branch April 12, 2017 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants