Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update workload when job changes before admission #1223

Merged

Conversation

yaroslava-serdiuk
Copy link
Contributor

@yaroslava-serdiuk yaroslava-serdiuk commented Oct 18, 2023

What type of PR is this?

/kind feature

Which issue(s) this PR fixes:

Fixes #1038

Does this PR introduce a user-facing change?

Jobs preserve their position in the queue if the number of pods change before being admitted

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Oct 18, 2023
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 18, 2023
@netlify
Copy link

netlify bot commented Oct 18, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 9df2318
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/653fc8cb0ab3aa0008e041be

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 18, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @yaroslava-serdiuk. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 18, 2023
@alculquicondor
Copy link
Contributor

/ok-to-test
/assign @mimowo

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 18, 2023
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, just a question regarding the scenario when the job is running.

pkg/controller/jobframework/reconciler.go Show resolved Hide resolved
@mimowo
Copy link
Contributor

mimowo commented Oct 19, 2023

/lgtm
/assign @alculquicondor

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 19, 2023
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to modify the workload_webhook, as it currently has PodSets as immutable.

Also please add a integration test for Job that starts with one that doesn't fit in the quota, (record the creationTimestmap of the Worklad at this point) and gets admitted after being updated to a smaller size (ensure that the creationTimestamp didn't change)

pkg/controller/jobframework/reconciler.go Outdated Show resolved Hide resolved
pkg/controller/jobframework/reconciler.go Show resolved Hide resolved
@alculquicondor
Copy link
Contributor

/lgtm cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 19, 2023
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 26, 2023
@yaroslava-serdiuk yaroslava-serdiuk force-pushed the workload-update branch 2 times, most recently from f2debd9 to 07bbe48 Compare October 26, 2023 10:48
@yaroslava-serdiuk
Copy link
Contributor Author

@alculquicondor
I added a test that modifies parallelism and verify the creation stamp doesn't changed. The pod template requests are immutable fields, so I wanted to write the test you suggested in a way that job doesn't tolerate spot taint, so the job can't fit and with the toleration job can run. However I hit that adding toleration to the job does't update workload (or recreate in the old/current logic): #1264

@yaroslava-serdiuk yaroslava-serdiuk force-pushed the workload-update branch 3 times, most recently from 3e1243f to 4a3546e Compare October 27, 2023 08:41
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 27, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: ae7a2bf2a9c60f07070d2f22969c82beea38bf60

@alculquicondor
Copy link
Contributor

/release-note-edit

Jobs preserve their position in the queue if their resource requests change before being admitted

@@ -874,6 +874,26 @@ func TestReconciler(t *testing.T) {
},
wantErr: jobframework.ErrNoMatchingWorkloads,
},
"non-matching non-admitted workload is updated": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a case for when the Workload has a quota reservation. In that case we would remove the existing Workload, right?

We might want to change what to do with a Workload that is admitted when the requests change, but that's for #77

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have the case for that just above current test ("non-matching admitted workload is deleted")

@yaroslava-serdiuk
Copy link
Contributor Author

Jobs preserve their position in the queue if their resource requests change before being admitted

Resources requests are immutable, the user can't change them

@alculquicondor
Copy link
Contributor

/release-note-edit

Jobs preserve their position in the queue if the number of pods change before being admitted

allErrs = append(allErrs, apivalidation.ValidateImmutableField(newObj.Spec.PriorityClassSource, oldObj.Spec.PriorityClassSource, specPath.Child("priorityClassSource"))...)
allErrs = append(allErrs, apivalidation.ValidateImmutableField(newObj.Spec.PriorityClassName, oldObj.Spec.PriorityClassName, specPath.Child("priorityClassName"))...)

if workload.HasQuotaReservation(oldObj) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale for only checking oldObj?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to update the workload that doesn't have a quota reservation and this is oldObj.
I think if the update function is called for new object that has quota reservation and the old doesn't (not sure what the real case for that), it's should be the same as we call update first for new obj without quota reservation and then reserve the quota.
My understanding why we check the quota is that we can't guarantee the quota for new object, so we want to do updates when the quota is not reserved. If the quota is fine for new object, than we are fine.
Do I miss something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that's fine.
I just remembered that it is not possible to update Spec and Status at the same time.

pkg/webhooks/workload_webhook_test.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 30, 2023
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 31, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: a1a77f08b74d82f96dfe792cdb45de5e582e1ff1

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, yaroslava-serdiuk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2023
@k8s-ci-robot k8s-ci-robot merged commit 0adc8c4 into kubernetes-sigs:main Oct 31, 2023
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.6 milestone Oct 31, 2023
return nil, fmt.Errorf("can't construct workload for update: %w", err)
}
wl.Spec = newWl.Spec
if err = r.client.Update(ctx, wl); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the key point here is we'll replace the unadmitted workload with the newest spec. right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. However note, that currently we can update only the amount of pods/containers: #1264

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update workload when job changes before admission
5 participants