New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1791022: Ensure removal of 4.4-only resources on downgrade to 4.3 #93
Bug 1791022: Ensure removal of 4.4-only resources on downgrade to 4.3 #93
Conversation
@marun: No Bugzilla bug is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@marun: This pull request references Bugzilla bug 1791022, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/bugzilla refresh |
@marun: This pull request references Bugzilla bug 1791022, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
32939ed
to
f7323cb
Compare
/bugzilla refresh |
@marun: This pull request references Bugzilla bug 1791022, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/hold pending #89 |
/bugzilla refresh |
@marun: This pull request references Bugzilla bug 1791022, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/hold cancel |
@marun: This pull request references Bugzilla bug 1791022, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
f7323cb
to
192ce3e
Compare
/bugzilla refresh |
@marun: This pull request references Bugzilla bug 1791022, which is valid. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test all |
/retest |
pkg/operator/starter.go
Outdated
if err != nil { | ||
klog.Warningf("Failed to retrieve 4.4 deployment: %v", err) | ||
} | ||
err = deployClient.Delete(deployName, &metav1.DeleteOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not delete directly, why a GET?
Shouldn't we stop the loop as soon as it succeeds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original code is still in place, did you add your changes to your latest commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get is still there. But I don't care too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I would prefer to see the loop stop when the deployment is gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted the changes as per @deads2k. iirc his assertion is that continual reconciliation is required to avoid confusing users when behavior is inconsistent (removes a deployment on startup not subsequently).
As to why get, I thought it would be preferable to see 'get' in the log vs 'delete' if continually reconciling but maybe that's specious logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fine with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am still confused why the deployment is different from the other resources in cleanupUnifiedDeployment
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I asked in #93 (comment). Those other resources don't have an operational impact for the operator, so deleting them isn't so critical, but the principle of consistent behavior suggests continual reconciliation for them too.
I guess I've answered my own question. Maybe reconcile on a longer timeline - 20m?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. 20m is also fine, so is 1m.
ca636ac
to
4a632a4
Compare
4a632a4
to
b91b80d
Compare
pkg/operator/sync.go
Outdated
// | ||
// The 4.4 deployment does have an operational impact, and is continually monitored | ||
// for removal via a goroutine started in RunOperator. | ||
if err := once.Do(func() error { return cleanupUnifiedDeployment(c) }); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is there no loop around these like it is in the other case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The enclosing function is the operator sync loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The combination with sync.Once
confused me. Anyway, seems to be good enough.
/hold Need to revert the deployment cleanup to be continuous rather than once as per @deads2k |
b91b80d
to
b2e457c
Compare
/hold cancel Updated to be in keeping with openshift/cluster-openshift-apiserver-operator#313, PTAL. |
b2e457c
to
5cdc9d2
Compare
pkg/operator/sync.go
Outdated
@@ -14,6 +36,16 @@ func syncControllers(c serviceCAOperator, operatorConfig *operatorv1.ServiceCA) | |||
return err | |||
} | |||
|
|||
// Remove resources related to the 4.4 controller deployment at most once. These |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deads2k Is it acceptable to remove the resources related to the deployment at most once rather than continually, since the resources in question have no operational impact without the deployment (which is continually removed by the starter)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't care too much. It's good enough to do it once.
/retest |
/hold |
5cdc9d2
to
766a703
Compare
766a703
to
8afc576
Compare
/hold cancel Updated to attempt removal of 4.4 resources every 20 minutes. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: marun, sttts The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
@marun: All pull requests linked via external trackers have merged. Bugzilla bug 1791022 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This PR is in support of a 4.4 PR that unifies all service ca controllers into a single deployment to simplify maintenance and administration: #89