Enforce timeout for podsReady #498
Conversation
Skipping CI for Draft Pull Request.
/ok-to-test
@ahg-g @alculquicondor please do another pass on the PR, all the comments are addressed for now.
waitForPodsReady:
  enable: true
webhook:
  port: 9443
`), os.FileMode(0600)); err != nil {
nit: I think testing every API field in this package is overkill. A test in apis/config/v1alpha2 should be enough.
The test in this file is only meant to verify that the defaulting pipeline is properly wired up in main.go; it shouldn't get into the details of all possible defaults.
I see, that makes sense. I've now deleted the one waitForPodsReady test here and covered the defaulting logic in defaults_test.go.
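For reference, a minimal sketch of what such a test in apis/config/v1alpha2/defaults_test.go could look like; the `SetDefaults_Configuration` function, the `WaitForPodsReady`/`Timeout` field names, and the 5-minute default value are assumptions for illustration, not taken from this PR:

```go
package v1alpha2

import (
	"testing"
	"time"

	"github.com/google/go-cmp/cmp"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// TestDefaultWaitForPodsReady checks that enabling waitForPodsReady fills in
// the timeout default. Field names and the 5m value are assumptions.
func TestDefaultWaitForPodsReady(t *testing.T) {
	cfg := &Configuration{
		WaitForPodsReady: &WaitForPodsReady{Enable: true},
	}
	SetDefaults_Configuration(cfg)

	want := &WaitForPodsReady{
		Enable:  true,
		Timeout: &metav1.Duration{Duration: 5 * time.Minute},
	}
	if diff := cmp.Diff(want, cfg.WaitForPodsReady); diff != "" {
		t.Errorf("unexpected defaulted waitForPodsReady (-want,+got):\n%s", diff)
	}
}
```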
		return ctrl.Result{RequeueAfter: recheckAfter}, nil
	} else {
		klog.V(2).InfoS("Cancelling admission of the workload due to exceeding the PodsReady timeout", "workload", req.NamespacedName.String())
		wl.Spec.Admission = nil
Don't we need to clone first? I can't remember if the controller-runtime client does a copy.
Although I have a different proposal for how to patch admission that doesn't require cloning: https://github.com/kubernetes-sigs/kueue/pull/514/files#diff-a7490e7f5efa2a59ab508f592b67dd6e7530b9d2e1b19a1b6a7438e6c2386905
WDYT?
I think updating the workload in place without a copy was working OK, because a new instance is created at the beginning of Reconcile and it is not used after cancelling admission.
However, I agree it is better to avoid in-place modification, and I like the approach you suggested, so I've applied it here too.
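For illustration, one way to clear the admission without mutating the cached object is to send a raw merge patch against a minimal stub; this is only a sketch of the general controller-runtime pattern, not necessarily the exact approach from #514 (the `cancelAdmission` name and the kueue import path are assumptions):

```go
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1alpha2"
)

// cancelAdmission clears spec.admission on the workload via a merge patch,
// without mutating any in-memory (possibly cache-backed) copy of the object.
func cancelAdmission(ctx context.Context, c client.Client, key types.NamespacedName) error {
	stub := &kueue.Workload{
		ObjectMeta: metav1.ObjectMeta{Name: key.Name, Namespace: key.Namespace},
	}
	patch := client.RawPatch(types.MergePatchType, []byte(`{"spec":{"admission":null}}`))
	return c.Patch(ctx, stub, patch)
}
```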
var _ = ginkgo.Context("Short PodsReady timeout", func() {

	ginkgo.BeforeEach(func() {
		beforeEachWithTimeout(3 * time.Second)
Maybe 5s to be on the safe side.
Sometimes the bots can be very slow to schedule.
I was thinking about stability, so I tested it locally with 50 repeats, plus 10 repeats under stress (N equal to my number of cores). I also tested locally with a 1s timeout and 10 repeats passed again: #498 (comment).
However, I agree there is a chance it could flake on the infra. Still, it seems easier to increase it in the future if it turns out to be flaky than to ever decrease it (and thus pay an unnecessary 2s on every build).
		return prodWl1.Spec.Admission
	}, util.Timeout, util.Interval).Should(gomega.BeNil())

	ginkgo.By("verify the 'prod2' workload gets admitted and the 'prod1' is waiting")
Could this be flaky?
Because prod1 would be added back to the queue and it has an old StartTime. Maybe there should be a grace period in which prod1 doesn't enter the queue. Or we could delete the Workload object so that the Job controller recreates it with a newer StartTime. This would be good because we know this workload couldn't schedule, so we should minimize the chances of it clogging the queue again.
However, this behavior might make it harder to reset the node selector. WDYT?
The test isn't flaky, because it first asserts that the other workload is waiting (util.ExpectWorkloadsToBeWaiting(ctx, k8sClient, prodWl2)). The waiting workload is admitted first as it completes the previous scheduler cycle.
As for clogging the queue, deleting the workload sounds unnecessary and doesn't seem to reflect the intention, IMO.
I guess another relatively simple solution would be to adjust the queue logic to use the max of CreationTimestamp and LastTransitionTime of the Admitted=False condition. Then the re-admitted workloads would be last again. Let me know what you think; we could also defer it to a follow-up PR.
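A rough sketch of the ordering key that idea implies; the `queueOrderingTimestamp` helper name and the kueue import path are made up for illustration:

```go
import (
	apimeta "k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1alpha2"
)

// queueOrderingTimestamp returns the time used to order a workload in the
// queue: the later of its creation time and its last transition to
// Admitted=False, so workloads whose admission was cancelled go to the back
// of the queue again.
func queueOrderingTimestamp(wl *kueue.Workload) metav1.Time {
	ts := wl.CreationTimestamp
	cond := apimeta.FindStatusCondition(wl.Status.Conditions, kueue.WorkloadAdmitted)
	if cond != nil && cond.Status == metav1.ConditionFalse && cond.LastTransitionTime.Time.After(ts.Time) {
		ts = cond.LastTransitionTime
	}
	return ts
}
```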
That could work, and I'm fine having it in a follow-up PR.
Maybe it could be an option about what to do after preemption. WDYT @ahg-g?
Ah right, there is a wait inside the scheduling cycle. Good.
For this test, the two workloads have different priorities, so the timestamps don't affect the ordering in the queue, because we order by priority first (a rough comparator sketch follows this comment).

> Maybe it could be an option about what to do after preemption. WDYT @ahg-g?

I am not sure. I think preemption and exceeding the ready timeout should probably be treated differently. The workload is not at "fault" in the former, but it is in the latter, so it seems unfair to "punish" workloads in the preemption case.
Two other criteria we could incorporate into the ordering:
- Job size for the preemption case: smaller preempted jobs get preference over larger ones.
- Backoffs for the not-ready case.
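For context, the priority-then-timestamp ordering mentioned above could look roughly like this simplified comparator; it reuses the hypothetical `queueOrderingTimestamp` helper sketched earlier and assumes `Spec.Priority` is a `*int32`, so it is not the actual Kueue heap ordering:

```go
import (
	"k8s.io/utils/pointer"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1alpha2"
)

// less reports whether a should be popped from the queue before b:
// higher priority first, then the earlier ordering timestamp.
func less(a, b *kueue.Workload) bool {
	pa := pointer.Int32Deref(a.Spec.Priority, 0)
	pb := pointer.Int32Deref(b.Spec.Priority, 0)
	if pa != pb {
		return pa > pb
	}
	ta, tb := queueOrderingTimestamp(a), queueOrderingTimestamp(b)
	return ta.Before(&tb)
}
```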
	cqName, cqOk := r.queues.ClusterQueueForWorkload(wl)
	if cqOk {
Suggested change:

	if cqName, cqOk := r.queues.ClusterQueueForWorkload(wl); cqOk {
This comment appears to be outdated. I've now adjusted the QueueAssociatedInadmissibleWorkloadsAfter and DeleteWorkload methods to deal with workloads with Spec.Admission=nil, as suggested by @alculquicondor here: #498 (comment)
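As a rough illustration of the kind of nil-admission handling this implies (the standalone helper and the map parameter are stand-ins for the queue manager's internal state, not the actual methods named above):

```go
import (
	kueue "sigs.k8s.io/kueue/apis/kueue/v1alpha2"
)

// clusterQueueForWorkload resolves the ClusterQueue a workload belongs to.
// Admitted workloads record it in Spec.Admission; for workloads whose
// admission was cancelled (Spec.Admission == nil) we fall back to the
// LocalQueue -> ClusterQueue mapping so they can still be requeued or removed.
func clusterQueueForWorkload(wl *kueue.Workload, localQueueToCQ map[string]string) (string, bool) {
	if wl.Spec.Admission != nil {
		return string(wl.Spec.Admission.ClusterQueue), true
	}
	cq, ok := localQueueToCQ[wl.Namespace+"/"+wl.Spec.QueueName]
	return cq, ok
}
```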
Generally LGTM, but I'll leave the last pass to @ahg-g
Two nits:
	gomega.Eventually(func() bool {
		gomega.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(prodWl1), prodWl1)).Should(gomega.Succeed())
		return apimeta.IsStatusConditionTrue(prodWl1.Status.Conditions, kueue.WorkloadAdmitted)
	}, util.Timeout, util.Interval).Should(gomega.BeTrue())
Is it possible that prod1's admission gets cancelled by the time we do this check?
For example, the test process gets evicted and the 3 seconds pass before we get a chance to do this check.
Unlikely, because adding the condition is the next thing workload_controller does in the Reconcile function once admission != nil. Also, parking the other workload as waiting is the next thing the scheduler does.
Still, it is possible if the testing infra is under load, but it hasn't happened in my testing. I guess I could bump the timeout to 4s or 5s proactively, but maybe we should bump it once there is some evidence this is needed. Just now I tested it in a loop 30 times with stress --cpu ${number of my cores} and there were no flakes.
I would add a comment here then: "We assume that the test will get to this check before the timeout expires and Kueue cancels the admission. Mentioning this in case this test flakes in the future."

> Still, it is possible if the testing infra is under load, but it hasn't happened in my testing.

Did you test with a smaller timeout?
Pushed with the comment added.

> Did you test with a smaller timeout?

Just now I did 15 passes with the timeout set to 1s and all passed, under the same stress. With 200ms, 1 out of 2 runs failed.
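With the suggested comment folded in, the check from the snippet above would read roughly as follows:

```go
// We assume that the test will get to this check before the timeout expires
// and Kueue cancels the admission. Mentioning this in case this test flakes
// in the future.
gomega.Eventually(func() bool {
	gomega.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(prodWl1), prodWl1)).Should(gomega.Succeed())
	return apimeta.IsStatusConditionTrue(prodWl1.Status.Conditions, kueue.WorkloadAdmitted)
}, util.Timeout, util.Interval).Should(gomega.BeTrue())
```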
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, mimowo

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Part of: #349
Special notes for your reviewer: