New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preempting: do not delete the victim if it just exits in WaitingPods #100325
Preempting: do not delete the victim if it just exits in WaitingPods #100325
Conversation
@cwdsuzhou: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @Huang-Wei |
/sig scheduling |
} else { | ||
if err := util.DeletePod(cs, victim); err != nil { | ||
klog.ErrorS(err, "preempting pod", "pod", klog.KObj(victim)) | ||
return framework.AsStatus(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preempting ->Preempting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
/retest |
1 similar comment
/retest |
629c046
to
4a8852a
Compare
4a8852a
to
19295ca
Compare
/retest |
/assign @Huang-Wei |
Logic makes sense to me, but leaving review to Wei. |
/retest |
1 similar comment
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cwdsuzhou . Some comments below.
} else { | ||
if err := util.DeletePod(cs, victim); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
} else { | |
if err := util.DeletePod(cs, victim); err != nil { | |
} else if err := util.DeletePod(cs, victim); err != nil { |
} else { | ||
if err := util.DeletePod(cs, victim); err != nil { | ||
klog.ErrorS(err, "preempting pod", "pod", klog.KObj(victim)) | ||
return framework.AsStatus(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
}); err != nil { | ||
t.Error("Expected the waiting pod to get preempted and deleted") | ||
t.Error("Expected the waiting pod to get preempted") | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a step to call API server to verify the waited Pod is not deleted physically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there another test for non-permit case were we do verify that the pod is deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can add another low-priority regular Pod and let it be running, so that the preemptor needs to preempt two pods, then we can verify one is removed from waitingMap, and the other is deleted physically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test added
w := false | ||
permitPlugin.fh.IterateOverWaitingPods(func(wp framework.WaitingPod) { w = true }) | ||
return !w, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Probably a bit picky) Can we change the timeout of TestPermitPlugin
from 10 seconds to 30 seconds; otherwise if the preemption logic lasts more than 10 seconds, we cannot tell if the waitingPod is rejected by preemption logic or the timer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I checked the UT...
19295ca
to
68fc0f6
Compare
PR addressed |
/retest |
And this : #100325 (comment) :) |
Have added a check |
if waitingPod := fh.GetWaitingPod(victim.UID); waitingPod != nil { | ||
waitingPod.Reject(pluginName, "preempted") | ||
} else if err := util.DeletePod(cs, victim); err != nil { | ||
klog.ErrorS(err, "Preempting pod", "pod", klog.KObj(victim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add the preemptor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
}); err != nil { | ||
t.Error("Expected the waiting pod to get preempted and deleted") | ||
t.Error("Expected the waiting pod to get preempted") | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there another test for non-permit case were we do verify that the pod is deleted?
if _, err := getPod(testCtx.ClientSet, waitingPod.Name, waitingPod.Namespace); err != nil { | ||
return false, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to move this to L1896 as we don't need to run the API check inside this loop - only check it once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
68fc0f6
to
9a05872
Compare
4f52cd6
to
f780d03
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits. LGTM otherwise.
wait.Poll(100*time.Millisecond, 30*time.Second, func() (bool, error) { | ||
pod, err := getPod(testCtx.ClientSet, runningPod.Name, runningPod.Namespace) | ||
if err != nil { | ||
return false, nil | ||
} | ||
if len(pod.Spec.NodeName) != 0 { | ||
return true, nil | ||
} | ||
return false, nil | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
wait.Poll(100*time.Millisecond, 30*time.Second, func() (bool, error) { | |
pod, err := getPod(testCtx.ClientSet, runningPod.Name, runningPod.Namespace) | |
if err != nil { | |
return false, nil | |
} | |
if len(pod.Spec.NodeName) != 0 { | |
return true, nil | |
} | |
return false, nil | |
}) | |
wait.Poll(100*time.Millisecond, 30*time.Second, podScheduled(testCtx.ClientSet, runningPod.Name, runningPod.Namespace)) |
t.Error("Expected the waiting pod to get preempted and deleted") | ||
t.Error("Expected the waiting pod to get preempted") | ||
} | ||
// check waitingPod not deleted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// check waitingPod not deleted | |
// Expect the waitingPod to be still present. |
if _, err := getPod(testCtx.ClientSet, waitingPod.Name, waitingPod.Namespace); err != nil { | ||
t.Error("Get waiting pod in waiting pod failed.") | ||
} | ||
// check runningPod deleted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// check runningPod deleted | |
// Expect the runningPod to be deleted physically. |
f780d03
to
850759e
Compare
Done, thanks |
Thanks @cwdsuzhou ! /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cwdsuzhou, Huang-Wei The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Do not delete the victim if it just exits in WaitingPods.
We do
deleting pods
to free resources. But if a pod exists in WaitingPods, it actually does not scheduled. So we just need reject it to free the resources fromNodeInfo
.Which issue(s) this PR fixes:
Fixes #100235
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: