Preempting: do not delete the victim if it just exits in WaitingPods #100325

Merged: 1 commit into kubernetes:master from the donot_delete_waitingpod branch on Apr 9, 2021

cwdsuzhou
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

Do not delete the victim if it only exists in WaitingPods.

We delete victim pods to free resources. But a pod that exists in WaitingPods has not actually been scheduled yet, so we only need to reject it to release its resources from NodeInfo.
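
The distinction can be sketched with a toy model. Everything below (`Pod`, `Scheduler`, `preemptVictim`) is a hypothetical stand-in rather than the kube-scheduler's own types; it only illustrates that a waiting victim is rejected while any other victim is deleted.

```go
package main

import "fmt"

// Pod is a hypothetical, stripped-down stand-in for a Kubernetes pod.
type Pod struct {
	UID  string
	Name string
}

// Scheduler is a toy model (not the real kube-scheduler). It tracks which
// pods are parked in the Permit phase and which exist in a simulated API server.
type Scheduler struct {
	waiting map[string]bool // UIDs of pods waiting on Permit
	stored  map[string]bool // UIDs of pods present in the simulated API server
}

// preemptVictim frees the victim's resources and reports which action it took.
// A victim that is still waiting on Permit was never bound, so rejecting it is
// enough; any other victim is deleted.
func (s *Scheduler) preemptVictim(victim Pod) string {
	if s.waiting[victim.UID] {
		delete(s.waiting, victim.UID) // reject: the pod object itself is kept
		return "rejected"
	}
	delete(s.stored, victim.UID) // delete: the pod object is removed
	return "deleted"
}

func main() {
	s := &Scheduler{
		waiting: map[string]bool{"w1": true},
		stored:  map[string]bool{"w1": true, "r1": true},
	}
	fmt.Println(s.preemptVictim(Pod{UID: "w1", Name: "waiting-pod"})) // rejected
	fmt.Println(s.preemptVictim(Pod{UID: "r1", Name: "running-pod"})) // deleted
	fmt.Println(s.stored["w1"], s.stored["r1"])                       // true false
}
```

In this model the waiting pod stays in the store after preemption, which is the behavior the PR description asks for.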

Which issue(s) this PR fixes:

Fixes #100235

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the release-note-none, kind/bug, size/S, cncf-cla: yes, and do-not-merge/needs-sig labels on Mar 17, 2021
@k8s-ci-robot
Contributor

@cwdsuzhou: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage and needs-priority labels on Mar 17, 2021
@cwdsuzhou
Member Author

/assign @Huang-Wei

@cwdsuzhou
Member Author

/sig scheduling

@k8s-ci-robot added the sig/scheduling label on Mar 17, 2021
@k8s-ci-robot removed the do-not-merge/needs-sig label on Mar 17, 2021
} else {
	if err := util.DeletePod(cs, victim); err != nil {
		klog.ErrorS(err, "preempting pod", "pod", klog.KObj(victim))
		return framework.AsStatus(err)
Contributor

preempting ->Preempting

Member

+1

@cwdsuzhou
Member Author

/retest

@cwdsuzhou
Member Author

/retest

@k8s-ci-robot added the area/test and sig/testing labels on Mar 17, 2021
@cwdsuzhou
Member Author

/retest

@alculquicondor
Member

/assign @Huang-Wei

@alculquicondor
Member

Logic makes sense to me, but leaving review to Wei.

@cwdsuzhou
Member Author

/retest

@cwdsuzhou
Member Author

/retest

Member

@Huang-Wei left a comment

Thanks @cwdsuzhou . Some comments below.

Comment on lines 701 to 702
} else {
	if err := util.DeletePod(cs, victim); err != nil {
Member

nit:

Suggested change
-} else {
-	if err := util.DeletePod(cs, victim); err != nil {
+} else if err := util.DeletePod(cs, victim); err != nil {

}); err != nil {
-	t.Error("Expected the waiting pod to get preempted and deleted")
+	t.Error("Expected the waiting pod to get preempted")
}

Member

Can you add a step that calls the API server to verify that the waiting Pod is not physically deleted?

Member

Bump.

Member

Is there another test for the non-permit case where we verify that the pod is deleted?

Member

We can add another low-priority regular Pod and let it run, so that the preemptor needs to preempt two pods. Then we can verify that one is removed from waitingMap and the other is physically deleted.

Member Author

test added

Comment on lines +1887 to +1905
w := false
permitPlugin.fh.IterateOverWaitingPods(func(wp framework.WaitingPod) { w = true })
return !w, nil
Member

(Probably a bit picky) Can we change the timeout of TestPermitPlugin from 10 seconds to 30 seconds; otherwise if the preemption logic lasts more than 10 seconds, we cannot tell if the waitingPod is rejected by preemption logic or the timer.

Member Author

Member

I see. I checked the UT...

@cwdsuzhou
Member Author

PR addressed

@cwdsuzhou
Member Author

/retest

@Huang-Wei
Member

> PR addressed

And this: #100325 (comment) :)

@cwdsuzhou
Member Author

> > PR addressed
>
> And this: #100325 (comment) :)

I have added a check.

if waitingPod := fh.GetWaitingPod(victim.UID); waitingPod != nil {
	waitingPod.Reject(pluginName, "preempted")
} else if err := util.DeletePod(cs, victim); err != nil {
	klog.ErrorS(err, "Preempting pod", "pod", klog.KObj(victim))
Member

should we add the preemptor?

Member Author

added

Comment on lines 1889 to 1891
if _, err := getPod(testCtx.ClientSet, waitingPod.Name, waitingPod.Namespace); err != nil {
	return false, nil
}
Member

I meant to move this to L1896 as we don't need to run the API check inside this loop - only check it once.

Member Author

done

@k8s-ci-robot added the size/M label and removed the size/S label on Mar 21, 2021
@cwdsuzhou force-pushed the donot_delete_waitingpod branch 5 times, most recently from 4f52cd6 to f780d03 on March 21, 2021 08:54
Member

@Huang-Wei left a comment

Some nits. LGTM otherwise.

Comment on lines 1873 to 1882
wait.Poll(100*time.Millisecond, 30*time.Second, func() (bool, error) {
	pod, err := getPod(testCtx.ClientSet, runningPod.Name, runningPod.Namespace)
	if err != nil {
		return false, nil
	}
	if len(pod.Spec.NodeName) != 0 {
		return true, nil
	}
	return false, nil
})
Member

nit:

Suggested change
-wait.Poll(100*time.Millisecond, 30*time.Second, func() (bool, error) {
-	pod, err := getPod(testCtx.ClientSet, runningPod.Name, runningPod.Namespace)
-	if err != nil {
-		return false, nil
-	}
-	if len(pod.Spec.NodeName) != 0 {
-		return true, nil
-	}
-	return false, nil
-})
+wait.Poll(100*time.Millisecond, 30*time.Second, podScheduled(testCtx.ClientSet, runningPod.Name, runningPod.Namespace))
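
The idea behind the suggestion, replacing an inline closure with a function that returns the condition, can be sketched as follows. The name `podScheduled` is borrowed from the suggestion, but the signature, the `podGetter` type, and the map-backed lookup here are assumptions made for illustration and are not the test helper's actual API.

```go
package main

import "fmt"

// conditionFunc mirrors the shape of a polling condition: done or not, plus an error.
type conditionFunc func() (bool, error)

// podGetter is a hypothetical lookup that returns the node a pod is bound to.
type podGetter func(namespace, name string) (nodeName string, err error)

// podScheduled returns a condition that is true once the pod has a node.
// The name follows the suggestion above; this signature is an assumption.
func podScheduled(get podGetter, namespace, name string) conditionFunc {
	return func() (bool, error) {
		nodeName, err := get(namespace, name)
		if err != nil {
			// Treat lookup failures as "not yet" so a caller keeps polling.
			return false, nil
		}
		return nodeName != "", nil
	}
}

func main() {
	nodes := map[string]string{} // "namespace/name" -> node name
	get := func(namespace, name string) (string, error) {
		node, ok := nodes[namespace+"/"+name]
		if !ok {
			return "", fmt.Errorf("pod %s/%s not found", namespace, name)
		}
		return node, nil
	}

	cond := podScheduled(get, "default", "running-pod")
	before, _ := cond()
	nodes["default/running-pod"] = "node-1"
	after, _ := cond()
	fmt.Println(before, after) // false true
}
```

Because the factory captures the pod identity once, the same condition can be handed to any polling loop without repeating the lookup logic at each call site.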

-	t.Error("Expected the waiting pod to get preempted and deleted")
+	t.Error("Expected the waiting pod to get preempted")
}
// check waitingPod not deleted
Member

Suggested change
-// check waitingPod not deleted
+// Expect the waitingPod to be still present.

if _, err := getPod(testCtx.ClientSet, waitingPod.Name, waitingPod.Namespace); err != nil {
	t.Error("Get waiting pod in waiting pod failed.")
}
// check runningPod deleted
Member

Suggested change
-// check runningPod deleted
+// Expect the runningPod to be deleted physically.

@cwdsuzhou
Member Author

Done, thanks

@Huang-Wei
Member

Thanks @cwdsuzhou !

/lgtm
/approve

@k8s-ci-robot added the lgtm label on Mar 23, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cwdsuzhou, Huang-Wei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Mar 23, 2021
@k8s-ci-robot merged commit 4b94216 into kubernetes:master on Apr 9, 2021
@k8s-ci-robot added this to the v1.22 milestone on Apr 9, 2021
@cwdsuzhou deleted the donot_delete_waitingpod branch on April 15, 2021 07:36
Labels: approved, area/test, cncf-cla: yes, kind/bug, lgtm, needs-priority, needs-triage, release-note-none, sig/scheduling, sig/testing, size/M
Development

Successfully merging this pull request may close these issues.

Do not delete pod if pod is in waiting pods but not scheduled.
5 participants