operator: prevent upgrades on degraded pools #2231
WIP: the last commit (9d8ca3b) is untested. Commits up to and including d8cf99c get Upgradeable=False on the master pool.
/test e2e-gcp-op
Currently we never get to the state of Upgradeable=False in MCO, even for a master pool. Add a check here to pick up if any pool is degraded.
Force-pushed from 9d8ca3b to 35c4cb9.
/test e2e-agnostic-upgrade
/skip
💯 @sinnykumari @yuqi-zhang ptal
/test e2e-agnostic-upgrade
Just a harmless comment, #2231 (comment), which we can drop later if that's the case. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: kikisdeliveryservice, runcom
/retest Please review the full test history for this PR and help us cut down flakes.
c6e2fa8 (add Upgradeable = False to syncUpgradeableStatus, 2020-11-20, openshift#2231) added this check, but Reason is supposed to be a machine-readable slug, while the human-oriented string goes in Message [1].
[1]: https://github.com/openshift/api/blob/4b79815405ec40f1d72c3a74bae0ae7da543e435/config/v1/types_cluster_operator.go#L128-L136
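To illustrate the Reason/Message distinction raised in that commit message, here is a minimal sketch of a slug check. The `isReasonSlug` helper and its CamelCase pattern are assumptions made for this example; they are not part of the MCO or of openshift/api, and the sample strings are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// slugPattern encodes an assumed convention for Reason: a CamelCase token
// made of letters and digits only, with no spaces or punctuation.
var slugPattern = regexp.MustCompile(`^[A-Z][A-Za-z0-9]*$`)

// isReasonSlug (hypothetical helper) reports whether s looks like a
// machine-readable slug rather than a human-oriented sentence.
func isReasonSlug(s string) bool {
	return slugPattern.MatchString(s)
}

func main() {
	// A slug-like value that would be suitable for Reason.
	fmt.Println(isReasonSlug("DegradedPool"))
	// A sentence like this belongs in Message instead.
	fmt.Println(isReasonSlug("One or more machine config pools are degraded"))
}
```

A check along these lines could live in a unit test so that a sentence is not placed in Reason by accident.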
This finally uses our Upgradeable operator status, which is currently always True.
One of the biggest compounding problems we see is users rolling out upgrades while pools are already degraded. This makes troubleshooting more difficult, leads users to think that a previous upgrade finished successfully, and compounds any existing bugs or problems.
This PR leverages the Upgradeable status so that users can fix their degraded pools before upgrading (when it is easier to do so) and so that the MCO's status better reflects the pools it manages.
We end up with Upgradeable = False when any pool is degraded and toggle back to Upgradeable = True when no pools are degraded.