Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

operator: prevent upgrades on degraded pools #2231

Merged

Conversation

kikisdeliveryservice
Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice commented Nov 16, 2020

Finally using our Upgradeable operator status which is currently always at true..

One of the biggest problems and compounding issues we see are upgrades rolled out by users when there are already degraded pools. This makes troubleshooting more difficult, makes users think that a previous upgrade is finished an successful and compounds any existings bugs/problems.

This pr finally leverages the upgradeable status so that users can fix their degraded pools before upgrading (which is the easier time) and have the MCO's status be a better reflection of the pools that it manages.

So we end up with Upgradeable = False when pools are degraded and toggle back to Upgradeable = True when no pools are degraded.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 16, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 16, 2020
@kikisdeliveryservice kikisdeliveryservice added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 16, 2020
@kikisdeliveryservice
Copy link
Contributor Author

WIP last commit (9d8ca3b) untested. up to and incl commit (d8cf99c) gets upgradeable false on master pool

  - lastTransitionTime: "2020-11-17T02:51:01Z"
    reason: One or more machine config pool is degraded, please see `oc get mcp` for
      further details and resolve before upgrading
    status: "False"
    type: Upgradeable
  extension:
    master: 'pool is degraded because rendering fails with "": "Failed to render configuration
      for pool master: parsing Ignition config failed: unknown version. Supported
      spec versions: 2.2, 3.0, 3.1"'
    worker: all 3 nodes are at latest configuration rendered-worker-e28c7ac562ab60d143ec4afbde4e8393
  - lastTransitionTime: "2020-11-17T02:52:48Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension:
    master: all 3 nodes are at latest configuration rendered-master-77519c04d1536ccb2cb303acb5a7106c
    worker: all 3 nodes are at latest configuration rendered-worker-e28c7ac562ab60d143ec4afbde4e8393

@kikisdeliveryservice
Copy link
Contributor Author

/test e2e-gcp-op

@kikisdeliveryservice kikisdeliveryservice removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 23, 2020
@kikisdeliveryservice
Copy link
Contributor Author

/test e2e-agnostic-upgrade

@kikisdeliveryservice
Copy link
Contributor Author

/skip

@kikisdeliveryservice kikisdeliveryservice removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 24, 2020
@kikisdeliveryservice kikisdeliveryservice changed the title WIP: sync experiments operator: prevent upgrades on degraded pools Nov 24, 2020
@runcom
Copy link
Member

runcom commented Nov 24, 2020

💯 @sinnykumari @yuqi-zhang ptal
/approve

@kikisdeliveryservice
Copy link
Contributor Author

/test e2e-agnostic-upgrade

@runcom
Copy link
Member

runcom commented Nov 25, 2020

just a not harmful comment #2231 (comment) which we can drop if that's the case later on

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 25, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kikisdeliveryservice, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

4 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot
Copy link
Contributor

openshift-merge-robot commented Nov 26, 2020

@kikisdeliveryservice: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/okd-e2e-aws 35c4cb9 link /test okd-e2e-aws
ci/prow/e2e-ovn-step-registry 35c4cb9 link /test e2e-ovn-step-registry

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 58779e4 into openshift:master Nov 26, 2020
wking added a commit to wking/machine-config-operator that referenced this pull request Mar 15, 2021
c6e2fa8 (add Upgradeable = False to syncUpgradeableStatus,
2020-11-20, openshift#2231) added this check, but Reason is supposed to be a
machine-readable slug, while the human-oriented string goes in Message
[1].

https://github.com/openshift/api/blob/4b79815405ec40f1d72c3a74bae0ae7da543e435/config/v1/types_cluster_operator.go#L128-L136
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/machine-config-operator that referenced this pull request Mar 20, 2021
c6e2fa8 (add Upgradeable = False to syncUpgradeableStatus,
2020-11-20, openshift#2231) added this check, but Reason is supposed to be a
machine-readable slug, while the human-oriented string goes in Message
[1].

https://github.com/openshift/api/blob/4b79815405ec40f1d72c3a74bae0ae7da543e435/config/v1/types_cluster_operator.go#L128-L136
@kikisdeliveryservice kikisdeliveryservice deleted the block-on-degraded branch May 10, 2021 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. team-mco
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants