Fix: check "ok" first to avoid panic #44152
plugin/pkg/scheduler/scheduler.go
Outdated
@@ -219,9 +219,6 @@ func (sched *Scheduler) scheduleOne() {
         // If binding succeeded then PodScheduled condition will be updated in apiserver so that
         // it's atomic with setting host.
         err := sched.config.Binder.Bind(b)
-        if err := sched.config.SchedulerCache.FinishBinding(&assumed); err != nil {
-            glog.Errorf("scheduler cache FinishBinding failed: %v", err)
-        }
         if err != nil {
             glog.V(1).Infof("Failed to bind pod: %v/%v", pod.Namespace, pod.Name)
             if err := sched.config.SchedulerCache.ForgetPod(&assumed); err != nil {
Let me take some time to find a scenario that triggers the following case:
https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/schedulercache/cache.go#L164
If it is triggered, the assumed pod will no longer be deleted after this PR.
Thanks for pointing this out. But I doubt whether we should delete the assumed pod if that case is triggered as you describe.
53a8e25 to 976aec3
plugin/pkg/scheduler/scheduler.go
Outdated
@@ -236,6 +233,9 @@ func (sched *Scheduler) scheduleOne() {
             })
             return
         }
+        if err := sched.config.SchedulerCache.FinishBinding(&assumed); err != nil {
I am not sure why we shouldn't call FinishBinding when Bind fails. FinishBinding marks the fact that binding has finished in the state of the pod. This is used later to clean up the pod. I think we should clean up these pods even if their binding has failed.
OK, thanks for explaining. But if currState.pod.Spec.NodeName == pod.Spec.NodeName, the pod will be deleted in ForgetPod(), and I would like to know under what condition the case "currState.pod.Spec.NodeName != pod.Spec.NodeName" is triggered.
Anyway, I moved the FinishBinding(...) call back.
Sorry for the delay :). We should delete the assumed pod if binding failed: the assumed pod was added to the cache and is counted by the "PodFitResource" predicates, so its resources are "lost" if a pod whose binding failed is not deleted.
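To illustrate the "lost" resources point, here is a rough sketch of a node whose requested CPU includes assumed pods. The nodeInfo type, its fields, and the millicore figures are hypothetical and only mimic how a resource-fit check could behave; they are not the scheduler's actual data structures.

```go
package main

import "fmt"

// nodeInfo is a hypothetical, simplified view of one node in the cache.
type nodeInfo struct {
	allocatableCPU int64 // millicores the node can offer
	requestedCPU   int64 // millicores claimed by cached pods, assumed ones included
}

// fits mimics a resource-fit predicate: the pod fits only if its request
// plus everything already accounted for stays within the allocatable amount.
func (n *nodeInfo) fits(cpu int64) bool {
	return n.requestedCPU+cpu <= n.allocatableCPU
}

// assume accounts for a pod before its binding is confirmed.
func (n *nodeInfo) assume(cpu int64) { n.requestedCPU += cpu }

// forget releases the accounting of a pod whose binding failed.
func (n *nodeInfo) forget(cpu int64) { n.requestedCPU -= cpu }

func main() {
	node := &nodeInfo{allocatableCPU: 1000}

	node.assume(800) // pod assumed, then its binding fails

	// While the failed pod stays in the cache, its 800m are still counted.
	fmt.Println(node.fits(500)) // false

	node.forget(800) // removing the assumed pod frees the capacity again
	fmt.Println(node.fits(500)) // true
}
```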
976aec3 to 26482e3
@@ -161,7 +161,7 @@ func (cache *schedulerCache) ForgetPod(pod *v1.Pod) error {
     defer cache.mu.Unlock()

     currState, ok := cache.podStates[key]
-    if currState.pod.Spec.NodeName != pod.Spec.NodeName {
+    if ok && currState.pod.Spec.NodeName != pod.Spec.NodeName {
I'm OK with this code improvement, although I did not get time to look into this case/logic :).
Thanks
/lgtm
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: NickrenREN, bsalamat, k82cn
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing
Thanks @k82cn @bsalamat
lgtm, green lighting.
Automatic merge from submit-queue (batch tested with PRs 43900, 44152, 44324)
Check "ok" first, and only then check whether "currState.pod.Spec.NodeName != pod.Spec.NodeName". Without the "ok" check, a missing key leaves currState nil and dereferencing it panics.
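The pattern behind this fix can be shown in isolation. The sketch below assumes a map of pointer values, where a missing key yields a nil pointer; podState and forget are hypothetical simplifications and do not reproduce the real ForgetPod.

```go
package main

import "fmt"

// podState is a hypothetical stand-in for the cache's per-pod entry.
type podState struct {
	nodeName string
}

// forget is a toy version of the guarded check. A missing key returns a
// nil *podState, so ok must be tested before any field is read.
func forget(states map[string]*podState, key, nodeName string) error {
	curr, ok := states[key]
	if ok && curr.nodeName != nodeName {
		return fmt.Errorf("pod %v was assumed on %v but assigned to %v", key, nodeName, curr.nodeName)
	}
	if !ok {
		return fmt.Errorf("pod %v is not in the cache", key)
	}
	delete(states, key)
	return nil
}

func main() {
	states := map[string]*podState{"default/web": {nodeName: "node-a"}}

	// Missing key: returns an error instead of dereferencing nil.
	fmt.Println(forget(states, "default/missing", "node-a"))

	// Present key, different node: returns the mismatch error.
	fmt.Println(forget(states, "default/web", "node-b"))

	// Present key, same node: the entry is removed.
	fmt.Println(forget(states, "default/web", "node-a"))
}
```

Because Go evaluates && left to right and short-circuits, curr.nodeName is read only when ok is true.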
Release note: