Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trigger the workloads eviction on admission check rejection. #1562

Merged

Conversation

trasc
Copy link
Contributor

@trasc trasc commented Jan 9, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

Allow the workload reconciler to trigger the eviction of a workload that has Rejected admission checks and is Admitted, before making it as Finished.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Trigger an eviction for an admitted Job after an admission check changed state to Rejected.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. release-note-none Denotes a PR that doesn't merit a release note. labels Jan 9, 2024
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 9, 2024
Copy link

netlify bot commented Jan 9, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit fbca3bf
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/65a13596e0fa3000085df840

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jan 9, 2024
@trasc
Copy link
Contributor Author

trasc commented Jan 9, 2024

/cc @alculquicondor

Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a release note if this change makes a user-facing impact (which seems to be the case)

pkg/controller/core/workload_controller.go Outdated Show resolved Hide resolved
@alculquicondor
Copy link
Contributor

/approve
/hold
Leaving LGTM to @mimowo

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 9, 2024
pkg/controller/core/workload_controller.go Outdated Show resolved Hide resolved
pkg/controller/core/workload_controller.go Outdated Show resolved Hide resolved
pkg/controller/core/workload_controller.go Outdated Show resolved Hide resolved
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 10, 2024
@alculquicondor
Copy link
Contributor

/approve cancel

@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 10, 2024
@trasc trasc force-pushed the evict-workloads-with-rejected-AC branch 2 times, most recently from d7c44c3 to 89c647e Compare January 11, 2024 08:50
pkg/workload/workload.go Show resolved Hide resolved
cqName, cqOk := r.queues.ClusterQueueForWorkload(&wl)
if cqOk {
if updated, err := r.reconcileSyncAdmissionChecks(ctx, &wl, cqName); updated || err != nil {
return ctrl.Result{}, err
}
}

if workload.SyncAdmittedCondition(&wl) {
// If the workload is not admitted update its admitted condition if necessary.
if !workload.IsAdmitted(&wl) && workload.SyncAdmittedCondition(&wl) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change needed, or just a performance optimization? If needed, does the order matter?

Also, is it correct, I'm asking because if the workload was already admitted, then (IIUC) we don't enter the branch, we also skip if workload.HasQuotaReservation(&wl) {, then we enter if rejectedChecks := workload.GetRejectedChecks(&wl); len(rejectedChecks) > 0 {. So we would mark the workload as finished.
I think the idea was to go via eviction rather than marking the workload as finished? Can you explain what is the code path leading to workload eviction in that scenario now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed, and the order is important since SyncAdmittedCondition can update the Admitted cind. in the wl.

If the workload is admitted, it will skip this and just check if the eviction is needed.

Copy link
Contributor

@mimowo mimowo Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that the integration test passes when the check !workload.IsAdmitted(&wl) is removed. Can we add / extend the integration test to demonstrate its importance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

(If we don't keep this the Admitted condition is cleared before the eviction starts)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If we don't keep this the Admitted condition is cleared before the eviction starts)

Why does that matter?

To be clear I'm fine with having Admitted=True and Evicted=True, I'm also fine with clearing the Admitted before adding Evicted=True. Trying to understand what is the difference, to avoid unnecessary code complications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment updated.

To be clear I'm fine with having Admitted=True and Evicted=True, I'm also fine with clearing the Admitted before adding Evicted=True. Trying to understand what is the difference, to avoid unnecessary code complications.

It's more logical in my opinion and is the same flow as in case of premonition based eviction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm, I buy the argument of consistent flow with preemption-based eviction

@trasc trasc force-pushed the evict-workloads-with-rejected-AC branch from c42c52e to 598ba28 Compare January 11, 2024 15:30
@trasc trasc force-pushed the evict-workloads-with-rejected-AC branch from 598ba28 to 683d38c Compare January 11, 2024 15:52
@mimowo
Copy link
Contributor

mimowo commented Jan 11, 2024

/lgtm
I don't consider the remaining comments as blocking, but nice-to-haves.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 11, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: fd03579ded1ca0a3f9f48e31766aaf8683edb151

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 11, 2024
@trasc trasc force-pushed the evict-workloads-with-rejected-AC branch from 9edd7f2 to b10409e Compare January 12, 2024 09:00
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think we also need a release note, I suggest "Fix the bug that a job would continue execution if the admission check state is changed to Rejected during execution."

pkg/controller/core/workload_controller.go Outdated Show resolved Hide resolved
Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jan 12, 2024
@trasc
Copy link
Contributor Author

trasc commented Jan 12, 2024

LGTM, I think we also need a release note, I suggest "Fix the bug that a job would continue execution if the admission check state is changed to Rejected during execution."

done

Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 12, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: dba641e2cb63a7d9e8df71013d46316a634496cc

@alculquicondor
Copy link
Contributor

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, trasc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 12, 2024
@alculquicondor
Copy link
Contributor

/release-note-edit

Trigger an eviction for an admitted Job after an admission check changed state to Rejected.

@k8s-ci-robot k8s-ci-robot merged commit ea5bf98 into kubernetes-sigs:main Jan 12, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.6 milestone Jan 12, 2024
@trasc trasc deleted the evict-workloads-with-rejected-AC branch January 15, 2024 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants