New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1822943: return sync error to add to queue #15
Conversation
@sallyom: This pull request references Bugzilla bug 1822943, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@@ -94,12 +94,7 @@ func (c *TargetController) sync() error { | |||
default: | |||
} | |||
|
|||
forceRequeue, err := c.syncKubeStorageVersionMigrator(spec, status, objectMetaGeneration) | |||
if forceRequeue && err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't just changing &&
to ||
fix the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i can try that instead but will also need to bubble up the errors from syncKubeStorageVersionMigrator - as/is forceRequeue is only true when err is nil, and also, the error from syncKubeStorageVersionMigrator that happens when, for example, the operand ns is deleted, does not get returned, it's buried in the operator status message, while the fn returns nil - so errors like that will never trigger a requeue - that's what I've been observing when I delete the operand ns, the operator gets stuck, only recovers when i refresh the operator pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, ptal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks wrong. Instead we should return the error if non-nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, updated, thanks
manageOperatorStatusDegraded(syncErr, operatorStatus) | ||
manageOperatorStatusProgressing(nil, syncErr, operatorStatus, generation) | ||
return syncErr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry here that we might of also gone Available==False
and we would not be reflecting that if we don't check the deployment also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, yes, i'll look at that in the morning : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated this PR, made it a much smaller diff-still fixed the issue of operand ns disappearing forever, ptal, thanks!
aeba19a
to
e7810a9
Compare
… resources apply errors
/approve Leaving final lgtm to @sanchezl. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sallyom, sanchezl, sttts The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@sallyom: All pull requests linked via external trackers have merged: openshift/cluster-kube-storage-version-migrator-operator#15. Bugzilla bug 1822943 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
When sync errors encountered, nothing is added to queue, so operator is stuck until operator pod refreshed. This PR ensures resync upon error.