Bug 1999556: annotate rendered config with OCP version #2918


kikisdeliveryservice
Contributor

@kikisdeliveryservice kikisdeliveryservice commented Jan 19, 2022

All rendered configs must be generated from the correct OCP release version before the MCO is allowed to report itself as upgraded. Previously, we added checks to verify that MCs were generated with the correct MCO commit (version.Hash) and to verify the osImageURL. However, on rare occasions (usually only in CI) an OCP release includes neither a new MCO commit nor a new RHCOS image. Because we aren't checking against the OCP version in that case, we prematurely report upgraded before the new MCO/MCC/rendered MCs are generated. With this change, rendered configs carry an annotation like the following:

    "metadata": {
        "annotations": {
            "machineconfiguration.openshift.io/release-image-version": "4.10.0-0.ci.test-2022-01-20-003840-ci-op-mnyl122f-latest"
        }
    }
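
As a rough illustration of the check described above, the sketch below compares the release-image-version annotation on each rendered config against the target release version. Only the annotation key comes from this PR; the `renderedConfig` type, the function names, and the sample version strings are hypothetical stand-ins, not the MCO's actual types or code.

```go
package main

import "fmt"

// Annotation key as shown in the PR description.
const releaseImageVersionAnnotation = "machineconfiguration.openshift.io/release-image-version"

// renderedConfig is a hypothetical, stripped-down stand-in for a rendered
// MachineConfig; the real object carries far more fields.
type renderedConfig struct {
	Name        string
	Annotations map[string]string
}

// generatedForRelease reports whether cfg carries the annotation and whether
// its value equals the given OCP release version.
func generatedForRelease(cfg renderedConfig, releaseVersion string) bool {
	got, ok := cfg.Annotations[releaseImageVersionAnnotation]
	return ok && got == releaseVersion
}

// staleConfigs returns the names of configs that were not rendered for
// releaseVersion, in input order.
func staleConfigs(cfgs []renderedConfig, releaseVersion string) []string {
	var stale []string
	for _, c := range cfgs {
		if !generatedForRelease(c, releaseVersion) {
			stale = append(stale, c.Name)
		}
	}
	return stale
}

func main() {
	// Sample data; names and versions are made up for the example.
	cfgs := []renderedConfig{
		{Name: "rendered-master-new", Annotations: map[string]string{releaseImageVersionAnnotation: "4.10.1"}},
		{Name: "rendered-worker-old", Annotations: map[string]string{releaseImageVersionAnnotation: "4.10.0"}},
	}
	stale := staleConfigs(cfgs, "4.10.1")
	if len(stale) > 0 {
		fmt.Println("not reporting upgraded yet; stale configs:", stale)
		return
	}
	fmt.Println("all rendered configs match the release version")
}
```

A config with no annotation at all is treated as stale here, which is one possible choice for configs rendered before the annotation existed.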

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 19, 2022
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 19, 2022
@kikisdeliveryservice kikisdeliveryservice force-pushed the sync-release-image-version branch 2 times, most recently from d7fb906 to f8a5118 Compare January 20, 2022 02:56
@kikisdeliveryservice kikisdeliveryservice changed the title [WIP] ignore [WIP] [ignore] Bug 1999556 Jan 20, 2022
@kikisdeliveryservice kikisdeliveryservice changed the title [WIP] [ignore] Bug 1999556 [WIP] Bug 1999556: [ignore] Jan 20, 2022
@openshift-ci openshift-ci bot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Jan 20, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jan 20, 2022

@kikisdeliveryservice: This pull request references Bugzilla bug 1999556, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request.

In response to this:

[WIP] Bug 1999556: [ignore]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jan 20, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jan 24, 2022

@kikisdeliveryservice: This pull request references Bugzilla bug 1999556, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request.

In response to this:

[WIP] Bug 1999556: [ignore]


@kikisdeliveryservice
Contributor Author

/test e2e-aws-disruptive

@openshift-ci
Contributor

openshift-ci bot commented Jan 25, 2022

@kikisdeliveryservice: This pull request references Bugzilla bug 1999556, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request.

In response to this:

[WIP] Bug 1999556: [ignore]


@kikisdeliveryservice kikisdeliveryservice force-pushed the sync-release-image-version branch 5 times, most recently from 17332ce to 5ba686b Compare January 25, 2022 22:50
@kikisdeliveryservice kikisdeliveryservice changed the title [WIP] Bug 1999556: [ignore] [WIP] Bug 1999556: annotate rendered config with OCP version Jan 25, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jan 25, 2022

@kikisdeliveryservice: This pull request references Bugzilla bug 1999556, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request.

In response to this:

[WIP] Bug 1999556: annotate rendered config with OCP version


@kikisdeliveryservice
Contributor Author

The MCO side of this PR is looking good, but an unrelated failure blocked the tests from passing:

`multus-validating-config.k8s.io: x509: certificate signed by unknown authority`

/test e2e-agnostic-upgrade

@kikisdeliveryservice
Contributor Author

aws not doing great
/test e2e-aws

@kikisdeliveryservice
Contributor Author

error: error creating buildah builder: Error reading blob ...invalid status code from registry 503 (Service Unavailable) on both retests =/

/retest-required

@kikisdeliveryservice kikisdeliveryservice changed the title [WIP] Bug 1999556: annotate rendered config with OCP version Bug 1999556: annotate rendered config with OCP version Jan 26, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 26, 2022
Member

@cgwalters cgwalters left a comment


Overall, this looks sane to me and I think we could ship it as is. Some comments below, all of which can be considered optional/informational.

pkg/operator/status.go (outdated, resolved)
pkg/operator/status.go (outdated, resolved)
pkg/operator/sync.go (resolved)
pkg/controller/common/constants.go (resolved)
@cgwalters
Member

cgwalters commented Jan 26, 2022

Previous related PRs:

@kikisdeliveryservice
Contributor Author

A couple of changes are incoming

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2022
@kikisdeliveryservice kikisdeliveryservice force-pushed the sync-release-image-version branch 2 times, most recently from f63d071 to 82ed8fd Compare January 26, 2022 23:37
@kikisdeliveryservice
Contributor Author

Updated PR after doing some testing to implement some of @cgwalters suggestions. This is now ready. 😸

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2022
@yuqi-zhang
Contributor

yuqi-zhang commented Jan 27, 2022

Code generally lgtm. I have a question on the BZ, for situations that would trigger the Bugzilla error:
the "master" pool should be updated before the CVO reports available at the new version occurred

The master pool would have to be marked as Updating when the MCO reports updated, right? In the case where neither the MCO commit nor the OSImage changes, wouldn't the master pool not roll out any update? I.e., the pool never goes to Updating and we wouldn't even see the error from the Bugzilla?

Looking through CI search (https://search.ci.openshift.org/?search=pool+should+be+updated+before+the+CVO+reports+available+at+the+new+version&maxAge=336h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job), there were only 2 hits of this in the past 2 weeks. Looking at the first link, a new rendered-master was generated:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.7-e2e-metal-ipi-upgrade/1483817445192896512/artifacts/e2e-metal-ipi-upgrade/gather-extra/artifacts/oc_cmds/machineconfigs

Comparing those 2 in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.7-e2e-metal-ipi-upgrade/1483817445192896512/artifacts/e2e-metal-ipi-upgrade/gather-extra/artifacts/machineconfigs.json

it seems that the upgrade only contained a diff in
/etc/kubernetes/manifests/keepalived.yaml

Which seems to imply that to trigger this error, a template change would be the only diff between the two OCP release versions...? Or is that due to another clusteroperator change that changed the dynamic variables populating that file?

Would this PR be covering this scenario? Sorry, am a bit confused as to how we can trigger this error after your previous fixes, so trying to think this through.

pkg/operator/status.go (outdated, resolved)
The controllerconfig is also annotated to plumb the version through to
the render controller.

Also added a check to verify that annotations match the OCP release
image version, to prevent the MCO from reporting upgraded before a new
rendered config is rolled out in cases where there is no new MCO commit
or new osImageURL.

Related-to: BZ 1999556
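
The plumbing described in this commit message could look roughly like the sketch below: the release version is read from an annotation on the controller config and stamped onto the rendered config when it is generated. This is a simplified illustration under assumptions, not the code from this PR; the `annotated` type and `stampReleaseVersion` function are hypothetical, and only the annotation key is taken from the PR description.

```go
package main

import (
	"errors"
	"fmt"
)

// Annotation key as shown in the PR description.
const releaseVersionKey = "machineconfiguration.openshift.io/release-image-version"

// annotated is a hypothetical minimal object with a name and annotations,
// standing in for both the controller config and a rendered config.
type annotated struct {
	Name        string
	Annotations map[string]string
}

// stampReleaseVersion copies the release version annotation from the
// controller config onto the rendered config. It returns an error if the
// controller config does not carry a non-empty version.
func stampReleaseVersion(controllerCfg, rendered *annotated) error {
	v := controllerCfg.Annotations[releaseVersionKey]
	if v == "" {
		return errors.New("controller config has no release version annotation")
	}
	if rendered.Annotations == nil {
		rendered.Annotations = map[string]string{}
	}
	rendered.Annotations[releaseVersionKey] = v
	return nil
}

func main() {
	// Sample objects; names and version are made up for the example.
	cc := &annotated{
		Name:        "machine-config-controller",
		Annotations: map[string]string{releaseVersionKey: "4.10.1"},
	}
	rendered := &annotated{Name: "rendered-master-example"}

	if err := stampReleaseVersion(cc, rendered); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rendered.Name, "annotated with", rendered.Annotations[releaseVersionKey])
}
```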
@kikisdeliveryservice
Contributor Author

kikisdeliveryservice commented Jan 27, 2022

@yuqi-zhang The intention of this PR is to fix the known issue where an OCP upgrade contains no new MCO commit and no new osImageURL. We have had known failures there in the past, so this PR closes the loop on that case by always waiting for a new rendered config to roll out on every upgrade.

In the run that you pointed to, the osImageURL is the same for both releases, and the old and new rendered configs both point to the same controller version, 51dc0801ed7d705820f557fcabf04eff023bf568. So this PR should cover that case (ofc ymmv since that cluster hit many problems and didn't actually upgrade) :)
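
To make the gap concrete, here is a minimal sketch of why the two earlier checks cannot tell the releases apart in such a run, while the added release version comparison can. The `renderState` type, both functions, the image reference, and the version strings are hypothetical; only the controller version hash is taken from the comment above.

```go
package main

import "fmt"

// renderState is a hypothetical summary of what a rendered config was
// generated from.
type renderState struct {
	ControllerVersion string // MCO commit hash that generated the config
	OSImageURL        string
	ReleaseVersion    string // OCP release image version
}

// legacyUpToDate mirrors the two earlier checks: controller version and
// osImageURL only.
func legacyUpToDate(cur, want renderState) bool {
	return cur.ControllerVersion == want.ControllerVersion &&
		cur.OSImageURL == want.OSImageURL
}

// upToDate additionally requires the release version to match.
func upToDate(cur, want renderState) bool {
	return legacyUpToDate(cur, want) && cur.ReleaseVersion == want.ReleaseVersion
}

func main() {
	const hash = "51dc0801ed7d705820f557fcabf04eff023bf568"
	// The image reference and release versions are placeholders.
	old := renderState{hash, "example.invalid/rhcos@sha256:aaaa", "4.7.0-old"}
	want := renderState{hash, "example.invalid/rhcos@sha256:aaaa", "4.7.0-new"}

	fmt.Println("legacy checks say up to date:", legacyUpToDate(old, want))
	fmt.Println("with release version check:", upToDate(old, want))
}
```

With identical commit and image, the legacy comparison reports the old config as current, which is the premature "upgraded" report; the extra comparison keeps the operator waiting until a config for the new release exists.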

@yuqi-zhang
Contributor

Yeah ok that makes sense, so we don't report before the newest render gets generated.

/lgtm

To give it a safer chance of making it in

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 27, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jan 27, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, kikisdeliveryservice, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,kikisdeliveryservice,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 27, 2022

@kikisdeliveryservice: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-single-node | c5b9bf0 | link | false | /test e2e-aws-single-node |
| ci/prow/e2e-aws-disruptive | c5b9bf0 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-aws-upgrade-single-node | c5b9bf0 | link | false | /test e2e-aws-upgrade-single-node |
| ci/prow/e2e-gcp-op-single-node | c5b9bf0 | link | false | /test e2e-gcp-op-single-node |
| ci/prow/e2e-aws-serial | c5b9bf0 | link | false | /test e2e-aws-serial |
| ci/prow/e2e-aws-workers-rhel7 | c5b9bf0 | link | false | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-vsphere-upgrade | c5b9bf0 | link | false | /test e2e-vsphere-upgrade |
| ci/prow/okd-e2e-aws | c5b9bf0 | link | false | /test okd-e2e-aws |
| ci/prow/e2e-aws-workers-rhel8 | c5b9bf0 | link | false | /test e2e-aws-workers-rhel8 |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 310c053 into openshift:master Jan 27, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jan 27, 2022

@kikisdeliveryservice: All pull requests linked via external trackers have merged:

Bugzilla bug 1999556 has been moved to the MODIFIED state.

In response to this:

Bug 1999556: annotate rendered config with OCP version


@kikisdeliveryservice
Contributor Author

/cherry-pick release-4.9

@openshift-cherrypick-robot

@kikisdeliveryservice: #2918 failed to apply on top of branch "release-4.9":

Applying: mco/mcc: annotate rendered config with OCP version
Using index info to reconstruct a base tree...
M	pkg/controller/common/constants.go
M	pkg/controller/render/render_controller.go
M	pkg/operator/status.go
M	pkg/operator/sync.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/operator/sync.go
CONFLICT (content): Merge conflict in pkg/operator/sync.go
Auto-merging pkg/operator/status.go
Auto-merging pkg/controller/render/render_controller.go
Auto-merging pkg/controller/common/constants.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 mco/mcc: annotate rendered config with OCP version
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.9


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

6 participants