Enforce timeout for podsReady #498

Merged: 1 commit into kubernetes-sigs:main on Jan 17, 2023

Conversation

@mimowo (Contributor) commented Dec 23, 2022:

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Part of: #349

Special notes for your reviewer:

@k8s-ci-robot (Contributor) commented:

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot added the do-not-merge/work-in-progress, kind/feature, and cncf-cla: yes labels on Dec 23, 2022
@k8s-ci-robot added the size/L label (100-499 changed lines, ignoring generated files) on Dec 23, 2022
@mimowo changed the title from "WIF: Implementation of enforcing timeout for podsReady [placeholder PR]" to "WIP: Enforce timeout for podsReady [placeholder PR]" on Dec 23, 2022
@mimowo changed the title from "WIP: Enforce timeout for podsReady [placeholder PR]" to "WIP: Enforce timeout for podsReady" on Dec 23, 2022
@kerthcet (Contributor) commented:

/ok-to-test
In case you want to test it.

@k8s-ci-robot added the ok-to-test label on Dec 26, 2022
@mimowo mimowo marked this pull request as ready for review January 2, 2023 15:49
@mimowo force-pushed the pods-ready-timeout branch 12 times, most recently from 9f1fb33 to 54dca06, on January 3, 2023 14:17
Resolved review threads on outdated diffs: apis/config/v1alpha2/configuration_types.go, apis/config/v1alpha2/defaults.go, main_test.go (×2), pkg/controller/core/core.go, test/integration/scheduler/podsready/suite_test.go (×2), test/integration/scheduler/podsready/scheduler_test.go (×2)
@mimowo force-pushed the pods-ready-timeout branch 3 times, most recently from d3027d1 to 6c0c0d4, on January 4, 2023 08:51
@k8s-ci-robot removed the needs-rebase label on Jan 16, 2023
@mimowo force-pushed the pods-ready-timeout branch 2 times, most recently from 1e474b2 to 54042c1, on January 16, 2023 07:55
@mimowo (Contributor, Author) commented Jan 16, 2023:

@ahg-g @alculquicondor please do another pass on the PR; all the comments are addressed for now.

waitForPodsReady:
enable: true
webhook:
port: 9443
`), os.FileMode(0600)); err != nil {
Contributor:

nit: I think testing every API field in this package is overkill. A test in apis/config/v1alpha2 should be enough.

The test in this file is to make sure the defaulting pipeline is properly enabled in main.go, but it shouldn't get into the details of all possible defaults.

@mimowo (Contributor, Author), Jan 17, 2023:

I see, that makes sense. I deleted the waitForPodsReady test here and covered the defaulting logic in defaults_test.go.
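
For context, a minimal sketch of the kind of case that moved to apis/config/v1alpha2/defaults_test.go, assuming it lives in that package with the standard testing import. SetDefaults_Configuration follows the usual Kubernetes defaulting convention, and the WaitForPodsReady/Enable/Timeout field names are assumptions based on the YAML excerpt above, not a copy of the PR's actual test.

func TestWaitForPodsReadyDefaults(t *testing.T) {
	// Enabling waitForPodsReady without an explicit timeout should leave a
	// non-nil default timeout in place after defaulting runs.
	cfg := &Configuration{
		WaitForPodsReady: &WaitForPodsReady{Enable: true},
	}
	SetDefaults_Configuration(cfg)
	if cfg.WaitForPodsReady.Timeout == nil {
		t.Error("expected a default podsReady timeout to be set when waitForPodsReady is enabled")
	}
}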

return ctrl.Result{RequeueAfter: recheckAfter}, nil
} else {
klog.V(2).InfoS("Cancelling admission of the workload due to exceeding the PodsReady timeout", "workload", req.NamespacedName.String())
wl.Spec.Admission = nil
Contributor:

Don't we need to clone first? I can't remember if the controller-runtime client does a copy.

Although I have a different proposal for how to patch admission that doesn't require cloning: https://github.com/kubernetes-sigs/kueue/pull/514/files#diff-a7490e7f5efa2a59ab508f592b67dd6e7530b9d2e1b19a1b6a7438e6c2386905

WDYT?

Contributor (Author):

I think that updating the workload in-place without a copy was working ok, because a new instance is created at the beginning of Reconcile and it is not used after cancelling admission.

However, I agree it is better to avoid in-place modification, and I like the approach you suggested, so I've applied it here too.
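
For illustration, a minimal sketch of cancelling admission through a merge patch rather than mutating the object obtained in Reconcile. It relies only on controller-runtime's DeepCopy, client.MergeFrom, and Client.Patch; the helper, receiver, and field names (cancelAdmission, r.client) are assumptions, not necessarily the code from the linked diff.

// Clear .spec.admission via a merge patch computed against the original
// object, so the workload fetched in Reconcile is never modified in place.
func (r *WorkloadReconciler) cancelAdmission(ctx context.Context, wl *kueue.Workload) error {
	patched := wl.DeepCopy()
	patched.Spec.Admission = nil
	// MergeFrom(wl) diffs against the unmodified original, so only
	// spec.admission is sent to the API server.
	return r.client.Patch(ctx, patched, client.MergeFrom(wl))
}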

Resolved review threads on outdated diffs in test/integration/scheduler/podsready/scheduler_test.go (×3)
var _ = ginkgo.Context("Short PodsReady timeout", func() {

ginkgo.BeforeEach(func() {
beforeEachWithTimeout(3 * time.Second)
Contributor:

maybe 5s to be on the safe side.

Sometimes the bots can be very slow to schedule.

Contributor (Author):

I was thinking about stability, so I tested it locally with 50 repeats, and 10 repeats under stress (N equal to my number of cores). I also tested with a 1s timeout locally, and 10 repeats passed again: #498 (comment).

However, I agree that there is a chance it could flake on the CI infra. Still, it seems easier to increase the timeout in the future if it turns out to be flaky than to ever decrease it (and thus pay an unnecessary 2s on every build).

Resolved review threads on outdated diffs in test/integration/scheduler/podsready/scheduler_test.go (×2)
return prodWl1.Spec.Admission
}, util.Timeout, util.Interval).Should(gomega.BeNil())

ginkgo.By("verify the 'prod2' workload gets admitted and the 'prod1' is waiting")
Contributor:

Could this be flaky?
Because prod1 would be added back to the queue and it has an old StartTime. Maybe there should be a grace period in which prod1 doesn't enter the queue. Or we could delete the Workload object so that the Job controller recreates it with a newer StartTime. This would be good because we know this workload couldn't schedule, so we should minimize the chances of it clogging the queue again.

However, this behavior might make it harder to reset the node selector. WDYT?

Contributor (Author):

The test isn't flaky because it first asserts that the other workload is waiting (util.ExpectWorkloadsToBeWaiting(ctx, k8sClient, prodWl2)). The waiting workload is admitted first as it completes the previous scheduler cycle.

As for the clogging of the queue, deleting a workload sounds unnecessary and does not seem to reflect the intention, IMO.

I guess another relatively simple solution would be to adjust the queue logic to use the max of CreationTimestamp and the LastTransitionTime of the Admitted=False condition. Then the re-admitted workloads would be last again. Let me know what you think; we could also defer it to a follow-up PR.

Contributor:

That could work, and I'm fine having it in a follow-up PR.

Maybe it could be an option about what to do after preemption. WDYT @ahg-g ?

Contributor:

Ah right, there is a wait inside the scheduling cycle. Good.

@ahg-g (Contributor), Jan 17, 2023:

For this test, the two workloads have different priorities, so the timestamps don't have an effect on ordering in the queue because we first order by priority.

Maybe it could be an option about what to do after preemption. WDYT @ahg-g ?

I am not sure. I think preemption and exceeding ready timeout should probably be treated differently. The workload is not at "fault" in the former, but it is in the latter, and so it seems unfair to "punish" workloads in the preemption case.

Two more criteria we could incorporate in the ordering:

  1. Job size for the preemption case: smaller preempted jobs get preference over larger ones.
  2. Backoffs for the not-ready case.
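
To make the ordering idea above concrete, a sketch of a queue-ordering timestamp that takes the later of the creation time and the last Admitted=False transition, so re-admitted workloads go to the back of their priority band. This was deferred to a follow-up, so the helper name and placement are purely illustrative.

// queueOrderingTimestamp returns the timestamp to order a workload by within
// its priority: the later of its creation time and the last time its Admitted
// condition transitioned to False (e.g. after the PodsReady timeout fired).
func queueOrderingTimestamp(wl *kueue.Workload) time.Time {
	ts := wl.CreationTimestamp.Time
	c := apimeta.FindStatusCondition(wl.Status.Conditions, kueue.WorkloadAdmitted)
	if c != nil && c.Status == metav1.ConditionFalse && c.LastTransitionTime.Time.After(ts) {
		ts = c.LastTransitionTime.Time
	}
	return ts
}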

Resolved review threads: test/integration/scheduler/podsready/scheduler_test.go (outdated), pkg/controller/core/workload_controller.go
Comment on lines 290 to 300
cqName, cqOk := r.queues.ClusterQueueForWorkload(wl)
if cqOk {
Contributor:

Suggested change:

    cqName, cqOk := r.queues.ClusterQueueForWorkload(wl)
    if cqOk {

becomes

    if cqName, cqOk := r.queues.ClusterQueueForWorkload(wl); cqOk {

Contributor (Author):

This comment appears to be outdated. I have now adjusted the QueueAssociatedInadmissibleWorkloadsAfter and DeleteWorkload methods to deal with workloads with Spec.Admission=nil, as suggested by @alculquicondor here: #498 (comment)

Resolved review threads: pkg/queue/manager.go, pkg/controller/core/workload_controller.go
@mimowo force-pushed the pods-ready-timeout branch 8 times, most recently from 40ac862 to 2f66414, on January 17, 2023 09:40
@alculquicondor (Contributor) commented:

Generally LGTM, but I'll leave the last pass to @ahg-g

@ahg-g (Contributor) left a review comment:

two nits

Resolved review threads: pkg/controller/core/workload_controller.go (×2, one on an outdated diff)
gomega.Eventually(func() bool {
gomega.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(prodWl1), prodWl1)).Should(gomega.Succeed())
return apimeta.IsStatusConditionTrue(prodWl1.Status.Conditions, kueue.WorkloadAdmitted)
}, util.Timeout, util.Interval).Should(gomega.BeTrue())
Contributor:

Is it possible that prod1's admission gets cancelled by the time we do this check?

Contributor:

For example, the test process gets evicted and the 3 seconds pass before we get a chance to do this check.

Contributor (Author):

Unlikely, because adding the condition is the next thing workload_controller would do in the Reconcile function after admission!=nil. Also, parking the other workload as waiting is the next thing the scheduler would do.

Still, it is possible if the testing infra is under load, but it hasn't happened in my testing. I guess I could bump the timeout to 4s or 5s proactively, but maybe we should bump it once there is some evidence this is needed. Just now I tested it in a loop 30 times with stress --cpu ${number of my cores} and saw no flakes.

Contributor:

I would add a comment here then: "We assume that the test will get to this check before the timeout expires and the kueue cancels the admission. Mentioning this in case this test flakes in the future."

Still, it is possible if the testing infra is under load, but hasn't happened in my testing,

Did you test with a smaller timeout?

Contributor (Author):

Pushed a change adding the comment.

Did you test with a smaller timeout?

Just now I did 15 runs with the timeout set to 1s and all passed, all under the same stress. With 200ms, 1 out of 2 runs failed.
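
Presumably the pushed change looks roughly like the snippet above with the suggested note attached; a sketch:

// We assume that the test will get to this check before the timeout expires
// and Kueue cancels the admission. Mentioning this in case this test flakes
// in the future.
gomega.Eventually(func() bool {
	gomega.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(prodWl1), prodWl1)).Should(gomega.Succeed())
	return apimeta.IsStatusConditionTrue(prodWl1.Status.Conditions, kueue.WorkloadAdmitted)
}, util.Timeout, util.Interval).Should(gomega.BeTrue())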

@mimowo force-pushed the pods-ready-timeout branch 3 times, most recently from 26e3a55 to d1183e9, on January 17, 2023 16:40
@ahg-g (Contributor) commented Jan 17, 2023:

/lgtm
/approve

@k8s-ci-robot added the lgtm label on Jan 17, 2023
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Jan 17, 2023
@k8s-ci-robot k8s-ci-robot merged commit 406cc73 into kubernetes-sigs:main Jan 17, 2023
Labels: approved, cncf-cla: yes, kind/feature, lgtm, ok-to-test, size/XL
5 participants