
Enforce timeout for podsReady #498

Merged
1 commit merged from the pods-ready-timeout branch into kubernetes-sigs:main on Jan 17, 2023

Conversation

@mimowo (Contributor) commented Dec 23, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Part of: #349

Special notes for your reviewer:

@k8s-ci-robot (Contributor):

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot added the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress), kind/feature (categorizes issue or PR as related to a new feature), and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels on Dec 23, 2022
@k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Dec 23, 2022
@mimowo changed the title from "WIF: Implementation of enforcing timeout for podsReady [placeholder PR]" to "WIP: Enforce timeout for podsReady [placeholder PR]" on Dec 23, 2022
@mimowo changed the title from "WIP: Enforce timeout for podsReady [placeholder PR]" to "WIP: Enforce timeout for podsReady" on Dec 23, 2022
@kerthcet (Contributor):

/ok-to-test
In case you want to test it.

@k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) on Dec 26, 2022
@mimowo marked this pull request as ready for review on January 2, 2023 at 15:49
@mimowo force-pushed the pods-ready-timeout branch 12 times, most recently from 9f1fb33 to 54dca06, on January 3, 2023 at 14:17
Outdated review threads (resolved) on: apis/config/v1alpha2/configuration_types.go, apis/config/v1alpha2/defaults.go, main_test.go (×2), pkg/controller/core/core.go, test/integration/scheduler/podsready/suite_test.go (×2), test/integration/scheduler/podsready/scheduler_test.go (×2)
@mimowo force-pushed the pods-ready-timeout branch 3 times, most recently from d3027d1 to 6c0c0d4, on January 4, 2023 at 08:51
@k8s-ci-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Jan 16, 2023
@mimowo force-pushed the pods-ready-timeout branch 2 times, most recently from 1e474b2 to 54042c1, on January 16, 2023 at 07:55
@mimowo (Contributor, Author) commented Jan 16, 2023

@ahg-g @alculquicondor please do another pass on the PR; all the comments are addressed for now.

  waitForPodsReady:
    enable: true
  webhook:
    port: 9443
`), os.FileMode(0600)); err != nil {
Contributor:

nit: I think testing every API field in this package is overkill. A test in apis/config/v1alpha2 should be enough.

The test in this file is to make sure the defaulting pipeline is properly enabled in main.go, but it shouldn't get into the details of all possible defaults.

@mimowo (Contributor, Author), Jan 17, 2023:

I see, that makes sense. I deleted the test on waitForPodsReady here and covered the defaulting logic in defaults_test.go.
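
For illustration, the coverage in defaults_test.go could look roughly like the sketch below. The WaitForPodsReady Enable/Timeout field names, the expected default, and the SetDefaults_Configuration entry point are assumptions for the sake of the example, not code taken from this PR.

package v1alpha2

import "testing"

// Illustrative sketch only: the WaitForPodsReady.Enable/Timeout fields and the
// SetDefaults_Configuration function are assumed, not taken from this PR.
func TestDefaults_WaitForPodsReady(t *testing.T) {
	cfg := &Configuration{
		WaitForPodsReady: &WaitForPodsReady{Enable: true},
	}
	SetDefaults_Configuration(cfg)
	if cfg.WaitForPodsReady.Timeout == nil || cfg.WaitForPodsReady.Timeout.Duration <= 0 {
		t.Errorf("expected a positive default timeout when waitForPodsReady is enabled, got %v", cfg.WaitForPodsReady.Timeout)
	}
}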

	return ctrl.Result{RequeueAfter: recheckAfter}, nil
} else {
	klog.V(2).InfoS("Cancelling admission of the workload due to exceeding the PodsReady timeout", "workload", req.NamespacedName.String())
	wl.Spec.Admission = nil
Contributor:

Don't we need to clone first? I can't remember if the controller-runtime client does a copy.

Although I have a different proposal for how to patch the admission that doesn't require cloning: https://github.com/kubernetes-sigs/kueue/pull/514/files#diff-a7490e7f5efa2a59ab508f592b67dd6e7530b9d2e1b19a1b6a7438e6c2386905

WDYT?

Contributor (Author):

I think updating the workload in place without a copy was working OK, because a new instance is created at the beginning of Reconcile and it is not used after cancelling the admission.

However, I agree it is better to avoid in-place modification, and I like the approach you suggested, so I've applied it here too.
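
For context, here is a minimal sketch of one common controller-runtime pattern for this kind of change (not necessarily the exact approach from #514; the kueue API import path is an assumption): take the patch base from a copy, clear the admission, and send a merge patch containing only that change.

package core

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1alpha2" // assumed import path
)

// cancelAdmission clears the workload's admission via a merge patch computed
// against a copy taken before the mutation (a sketch, not the PR's code).
func cancelAdmission(ctx context.Context, c client.Client, wl *kueue.Workload) (ctrl.Result, error) {
	orig := wl.DeepCopy()
	wl.Spec.Admission = nil
	if err := c.Patch(ctx, wl, client.MergeFrom(orig)); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}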

Outdated review threads (resolved) on: test/integration/scheduler/podsready/scheduler_test.go (×3)
var _ = ginkgo.Context("Short PodsReady timeout", func() {

	ginkgo.BeforeEach(func() {
		beforeEachWithTimeout(3 * time.Second)
Contributor:

Maybe 5s to be on the safe side.

Sometimes the bots can be very slow to schedule.

Contributor (Author):

I was thinking about stability, so I tested it locally with 50 repeats, and with 10 repeats under stress (N equal to my number of cores). I also tested with a 1s timeout locally, and 10 repeats passed again: #498 (comment).

However, I agree there is a chance it could flake on the CI infra. Still, it seems easier to increase the timeout later if it turns out flaky than to ever decrease it (and thus pay an unnecessary 2s on every build).

Outdated review threads (resolved) on: test/integration/scheduler/podsready/scheduler_test.go (×2)
		return prodWl1.Spec.Admission
	}, util.Timeout, util.Interval).Should(gomega.BeNil())

	ginkgo.By("verify the 'prod2' workload gets admitted and the 'prod1' is waiting")
Contributor:

Could this be flaky?
Because prod1 would be added back to the queue and it has an old StartTime. Maybe there should be a grace period in which prod1 doesn't enter the queue. Or we could delete the Workload object so that the Job controller recreates it with a newer StartTime. This would be good because we know this workload couldn't schedule, so we should minimize the chances of it clogging the queue again.

However, this behavior might make it harder to reset the node selector. WDYT?

Contributor (Author):

The test isn't flaky, because it first asserts that the other workload is waiting (util.ExpectWorkloadsToBeWaiting(ctx, k8sClient, prodWl2)). The waiting workload is admitted first as it completes the previous scheduler cycle.

As for clogging the queue, deleting the workload sounds unnecessary and does not seem to reflect the intention, IMO.

I guess another relatively simple solution would be to adjust the queues logic to use the max of CreationTimestamp and the LastTransitionTime of the Admitted=False condition. Then the re-admitted workloads would be last again. Let me know what you think; we could also defer it to a follow-up PR.
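
A rough sketch of that idea (illustrative only; the helper name and the kueue API import path are assumptions, and the real queue ordering code lives elsewhere):

package queue

import (
	"time"

	apimeta "k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1alpha2" // assumed import path
)

// queueOrderingTime returns the later of the workload's creation time and the
// last transition of the Admitted condition to False, so a workload whose
// admission was cancelled is ordered as if it had just been created.
func queueOrderingTime(wl *kueue.Workload) time.Time {
	t := wl.CreationTimestamp.Time
	cond := apimeta.FindStatusCondition(wl.Status.Conditions, kueue.WorkloadAdmitted)
	if cond != nil && cond.Status == metav1.ConditionFalse && cond.LastTransitionTime.After(t) {
		t = cond.LastTransitionTime.Time
	}
	return t
}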

Contributor:

That could work, and I'm fine having it in a follow-up PR.

Maybe it could be an option about what to do after preemption. WDYT @ahg-g?

Contributor:

Ah right, there is a wait inside the scheduling cycle. Good.

@ahg-g (Contributor), Jan 17, 2023:

For this test, the two workloads have different priorities, so the timestamps don't affect the ordering in the queue, because we first order by priority.

Maybe it could be an option about what to do after preemption. WDYT @ahg-g?

I am not sure. I think preemption and exceeding the ready timeout should probably be treated differently. The workload is not at "fault" in the former, but it is in the latter, so it seems unfair to "punish" workloads in the preemption case.

Another two criteria we could incorporate in the ordering (sketched below):

  1. Job size for the preemption case: smaller preempted jobs get preference over larger ones.
  2. Backoffs for the not-ready case.
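
A rough, self-contained sketch of how such criteria could compose (purely illustrative; the queuedWorkload type and the exact ordering rules here are hypothetical, not Kueue's actual implementation):

package queue

import "time"

// queuedWorkload is a hypothetical, simplified view of a queued workload used
// only for this sketch.
type queuedWorkload struct {
	priority  int32
	podCount  int32         // job size, used for the preemption criterion
	preempted bool          // re-queued after preemption
	backoff   time.Duration // backoff accumulated from not-ready re-queues
	created   time.Time
}

// orderingTime is the timestamp used for FIFO ordering, pushed back by any
// not-ready backoff.
func (w *queuedWorkload) orderingTime() time.Time {
	return w.created.Add(w.backoff)
}

// less orders the queue: priority first, then smaller preempted jobs, then the
// backoff-adjusted timestamp.
func less(a, b *queuedWorkload) bool {
	if a.priority != b.priority {
		return a.priority > b.priority
	}
	if a.preempted && b.preempted && a.podCount != b.podCount {
		return a.podCount < b.podCount
	}
	return a.orderingTime().Before(b.orderingTime())
}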

test/integration/scheduler/podsready/scheduler_test.go (outdated review thread, resolved)
pkg/controller/core/workload_controller.go (review thread, resolved)
Comment on lines 290 to 300
cqName, cqOk := r.queues.ClusterQueueForWorkload(wl)
if cqOk {
Contributor:

Suggested change
cqName, cqOk := r.queues.ClusterQueueForWorkload(wl)
if cqOk {
if cqName, cqOk := r.queues.ClusterQueueForWorkload(wl); cqOk {

Contributor (Author):

This comment appears to be outdated now. I've adjusted the QueueAssociatedInadmissibleWorkloadsAfter and DeleteWorkload methods to deal with workloads with Spec.Admission=nil, as suggested by @alculquicondor here: #498 (comment)

pkg/queue/manager.go (review thread, resolved)
pkg/controller/core/workload_controller.go (review thread, resolved)
@mimowo force-pushed the pods-ready-timeout branch 8 times, most recently from 40ac862 to 2f66414, on January 17, 2023 at 09:40
@alculquicondor (Contributor):

Generally LGTM, but I'll leave the last pass to @ahg-g

@ahg-g (Contributor) left a comment:

two nits

pkg/controller/core/workload_controller.go (review thread, resolved)
pkg/controller/core/workload_controller.go (outdated review thread, resolved)
gomega.Eventually(func() bool {
	gomega.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(prodWl1), prodWl1)).Should(gomega.Succeed())
	return apimeta.IsStatusConditionTrue(prodWl1.Status.Conditions, kueue.WorkloadAdmitted)
}, util.Timeout, util.Interval).Should(gomega.BeTrue())
Contributor:

Is it possible that prod1's admission gets cancelled by the time we do this check?

Contributor:

For example, the test process gets evicted and the 3 seconds pass before we get a chance to do this check.

Contributor (Author):

Unlikely, because adding the condition is the next thing the workload_controller does in Reconcile once admission != nil. Also, parking the other workload as waiting is the next thing the scheduler would do.

Still, it is possible if the testing infra is under load, but it hasn't happened in my testing. I could bump the timeout to 4s or 5s proactively, but maybe we should bump it only once there is evidence this is needed. Just now I ran it in a loop 30 times with stress --cpu ${number of my cores} and saw no flakes.

Contributor:

I would add a comment here then: "We assume that the test will get to this check before the timeout expires and Kueue cancels the admission. Mentioning this in case this test flakes in the future."

Still, it is possible if the testing infra is under load, but hasn't happened in my testing,

Did you test with a smaller timeout?

Contributor (Author):

Pushed a commit adding the comment.

Did you test with a smaller timeout?

Just now I did 15 runs with timeout=1s, all under the same stress, and all passed. With 200ms, 1 out of 2 runs failed.
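
For reference, with the suggested wording in place the check would read roughly like this (a sketch stitched together from the snippet and the review comment above; the final code in the PR may differ slightly):

	// We assume that the test will get to this check before the timeout expires
	// and Kueue cancels the admission. Mentioning this in case this test flakes
	// in the future.
	gomega.Eventually(func() bool {
		gomega.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(prodWl1), prodWl1)).Should(gomega.Succeed())
		return apimeta.IsStatusConditionTrue(prodWl1.Status.Conditions, kueue.WorkloadAdmitted)
	}, util.Timeout, util.Interval).Should(gomega.BeTrue())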

@mimowo force-pushed the pods-ready-timeout branch 3 times, most recently from 26e3a55 to d1183e9, on January 17, 2023 at 16:40
@ahg-g (Contributor) commented Jan 17, 2023

/lgtm
/approve

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Jan 17, 2023
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Jan 17, 2023
@k8s-ci-robot merged commit 406cc73 into kubernetes-sigs:main on Jan 17, 2023