-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: remove finalizer if workload finished #1454
fix: remove finalizer if workload finished #1454
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
Welcome @woehrl01! |
Hi @woehrl01. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
309db57
to
816ec06
Compare
cc @achernevskii |
Please add unit tests |
Actually Can you debug why this logic didn't remove the finalizer in this case?
UPDATE: #1450 (comment) |
func (r *WorkloadReconciler) removeFinalizer(ctx context.Context, wl *kueue.Workload) error { | ||
if controllerutil.RemoveFinalizer(wl, kueue.ResourceInUseFinalizerName) { | ||
return r.client.Update(ctx, wl) | ||
} | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a similar method in pkg/controller/jobframework/reconciler.go
.
kueue/pkg/controller/jobframework/reconciler.go
Lines 661 to 666 in e12a871
func (r *JobReconciler) removeFinalizer(ctx context.Context, wl *kueue.Workload) error { | |
if controllerutil.RemoveFinalizer(wl, kueue.ResourceInUseFinalizerName) { | |
return r.client.Update(ctx, wl) | |
} | |
return nil | |
} |
We could convert that method into an util function and use it in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would hold from this, as I believe that the fix should be in the reconciler only #1450 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I just pushed a change which deletes the finalizer in the case that the stopJob results in the error that the object is not found. In any other case (and failure of removing the finalizer) results in returning an error on reconcile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhmmm is the not found the case that you faced in production?
If somehow removing the finalizer fails, I think the next reconcile wouldn't be able to reach to this piece of the code, would it?
Maybe a better solution is closer what you had initially. But we shouldn't try to remove finalizers from two places. So perhaps just doing it in the workload controller only is better.
Whatever you do, add a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the back and forth, but I think I have a better understanding of the problem now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the problem I faced is that the Pod is already deleted, but the finalizer is still on the workload resource.
Alright, let me think of an alternative implementation. If meanwhile you have some ideas, please let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarize, I think the solution should be:
- Do finalizer removal from Workload controller
- Do NOT do any workload finalizer removal from the job reconciler, for better separation of concerns and to avoid the controllers from stumbling into each other.
4063476
to
ee2a5f7
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: woehrl01 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
88556d8
to
81b8ab1
Compare
@woehrl01: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@alculquicondor I'm encountering issues with the final unit tests. Although the logic in the changes seems correct, the jobs need an additional reconciliation loop to be marked as completed, setting the 'JobCompleted' condition on the workload to true. Do you have suggestions for an elegant solution to implement this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, in unit tests you should just test the effect of one reconcile loop.
If something requires more than one reconcile loop, you can consider them 2 different test cases. And if the interaction could be complex enough, it's worth considering an integration test.
@@ -0,0 +1,80 @@ | |||
package util |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need an additional mockClient? Prefer to use the fake client provided by controller-runtime
github.com/prometheus/common v0.44.0 // indirect | ||
github.com/prometheus/procfs v0.11.1 // indirect | ||
github.com/sirupsen/logrus v1.9.3 // indirect | ||
github.com/spf13/cobra v1.7.0 // indirect | ||
github.com/spf13/pflag v1.0.5 // indirect | ||
github.com/stoewer/go-strcase v1.3.0 // indirect | ||
github.com/stretchr/objx v0.5.0 // indirect | ||
github.com/stretchr/testify v1.8.4 // indirect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alculquicondor because of the upcoming season I have to abandon this PR for a few weeks. Feel free to pick it up in the meantime, I see that there are already some PR depending on this and I don't want to introduce a blocker. Thanks! |
Superseded by #1523 |
What type of PR is this?
/kind bug
What this PR does / why we need it:
It removes the finalizer of workloads after it's completed, if the finalizer is not deleted by the job deletion (e.g. because of controller restart)
Which issue(s) this PR fixes:
Fixes #1450
Special notes for your reviewer:
Does this PR introduce a user-facing change?