Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let payload provider render feature gate yaml #2664

Merged

Conversation

enxebre
Copy link
Member

@enxebre enxebre commented Jun 8, 2023

Currently we render the feature gate at the HCP level which results in a discrepancy when mco consume it along with a given NodePool payload-version that differs from the one used by the HCP to generate it. Related #2643

CI Failure example
bootstrap.go:47] error running MCC[BOOTSTRAP]: error creating feature gate access: unable to determine features: missing desired version "4.14.0-0.ci.test-2023-06-08-101141-ci-op-gvin17ww-initial" in featuregates.config.openshift.io/cluster\n"}

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/3724/pull-ci-openshift-machine-config-operator-master-e2e-hypershift/1666749702499995648

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/3724/pull-ci-openshift-machine-config-operator-master-e2e-hypershift/1666749702499995648/artifacts/e2e-hypershift/run-e2e/artifacts/TestNodePool_PreTeardownClusterDump/namespaces/e2e-clusters-wvxmd-example-fmxk8/core/pods/logs/ignition-server-768699fd7b-w49lq-ignition-server.log

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci openshift-ci bot requested review from csrwng and sjenning June 8, 2023 16:47
@openshift-ci openshift-ci bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-area labels Jun 8, 2023
Copy link
Contributor

@csrwng csrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment, but if it fixes the current issue, we can definitely follow up.

@@ -437,7 +437,7 @@ func reconcileDeployment(deployment *appsv1.Deployment,
},
Args: []string{
"-c",
invokeFeatureGateRenderScript("/shared", releaseVersion, string(featureGateYAML)),
invokeFeatureGateRenderScript("/shared", string(featureGateYAML)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all this container is doing is writing out the feature gate yaml, we can likely do without it. The local ignition provider could read the hcp directly and create the yaml.

Copy link
Member Author

@enxebre enxebre Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'm wanting to keep this PR scope to the minimum.
In follow up we can also consider to let the NodePool controller store the featureGate yaml into the token secret so then we just pass it through the getPayload signature.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng, enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@csrwng
Copy link
Contributor

csrwng commented Jun 8, 2023

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2023
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 9, 2023
@enxebre
Copy link
Member Author

enxebre commented Jun 9, 2023

cluster config operator script failed with

{"level":"info","ts":"2023-06-08T17:11:00Z","logger":"get-payload","msg":"cluster-config-operator process completed","time":"0s","output":"/bin/bash: line 19: /usr/bin/cluster-config-operator: No such file or directory\n"}

and

{"level":"info","ts":"2023-06-08T17:11:32Z","logger":"get-payload","msg":"cluster-config-operator process completed","time":"0s","output":"mkdir: cannot create directory 'input': File exists\nmkdir: cannot create directory 'output': File exists\nmkdir: cannot create directory 'manifests': File exists\n"}

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/2664/pull-ci-openshift-hypershift-main-e2e-aws/1666849350090756096/artifacts/e2e-aws/run-e2e/artifacts/TestCreateCluster_PreTeardownClusterDump/namespaces/e2e-clusters-gtp7b-example-jqvp2/core/pods/logs/ignition-server-7b587d498c-vslgl-ignition-server.log
updated

@enxebre enxebre force-pushed the feature-gate-nodepools branch 4 times, most recently from aab9280 to ae768a3 Compare June 9, 2023 12:35
@enxebre
Copy link
Member Author

enxebre commented Jun 9, 2023

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/2664/pull-ci-openshift-hypershift-main-e2e-aws/1667120840149307392/artifacts/e2e-aws/run-e2e/artifacts/TestNodePool_PreTeardownClusterDump/namespaces/e2e-clusters-55smd-example-498fs/core/pods/logs/ignition-server-7b9886b8b7-cf9d6-ignition-server.log

{"level":"info","ts":"2023-06-09T11:10:02Z","logger":"get-payload","msg":"cluster-config-operator process completed","time":"0s","output":"F0609 11:10:02.959705      33 render.go:53] failed rendering assets: lstat /usr/share/bootkube/manifests/bootstrap-manifests: no such file or directory\n"}

@netlify
Copy link

netlify bot commented Jun 13, 2023

Deploy Preview for hypershift-docs ready!

Name Link
🔨 Latest commit 8093e27
🔍 Latest deploy log https://app.netlify.com/sites/hypershift-docs/deploys/64888c3454879e0008f603c0
😎 Deploy Preview https://deploy-preview-2664--hypershift-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@enxebre
Copy link
Member Author

enxebre commented Jun 13, 2023

we are back to only upgrade failing, now with
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/2664/pull-ci-openshift-hypershift-main-e2e-aws/1668534680921575424/artifacts/e2e-aws/run-e2e/artifacts/TestNodePool_PreTeardownClusterDump/namespaces/e2e-clusters-kzpgc-example-sqbgk/core/pods/logs/ignition-server-567875cf5b-z8rws-ignition-server.log

{"level":"info","ts":"2023-06-13T08:52:01Z","logger":"get-payload","msg":"cluster-config-operator process completed","time":"0s","output":"Error: unknown flag: --rendered-manifest-files\n"}

@JoelSpeed
Copy link
Contributor

we are back to only upgrade failing, now with https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/2664/pull-ci-openshift-hypershift-main-e2e-aws/1668534680921575424/artifacts/e2e-aws/run-e2e/artifacts/TestNodePool_PreTeardownClusterDump/namespaces/e2e-clusters-kzpgc-example-sqbgk/core/pods/logs/ignition-server-567875cf5b-z8rws-ignition-server.log

{"level":"info","ts":"2023-06-13T08:52:01Z","logger":"get-payload","msg":"cluster-config-operator process completed","time":"0s","output":"Error: unknown flag: --rendered-manifest-files\n"}

This looks like a mismatch between the payload version and the flags, is the test upgrading from a 4.14 to 4.14 or from 4.13 to 4.14? If it is upgrading from 4.13, we need to make sure it only passes the --rendered-manifest-files flag to the 4.14 payload not the 4.13 payload

@enxebre enxebre force-pushed the feature-gate-nodepools branch 4 times, most recently from 482658f to 07d031b Compare June 13, 2023 12:49
%[1]s render \
--config-output-file config \
--asset-input-dir %[2]s/input \
--asset-output-dir %[2]s/output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need to pass --payload-version here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, only for >= 4.14

@muraee
Copy link
Contributor

muraee commented Jun 13, 2023

lgtm

@enxebre
Copy link
Member Author

enxebre commented Jun 13, 2023

/test e2e-aws-4-12

Ignore, this should make no difference as this cpo is not running there.

@csrwng
Copy link
Contributor

csrwng commented Jun 13, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 13, 2023
@enxebre
Copy link
Member Author

enxebre commented Jun 13, 2023

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 13, 2023
Currently we render the feature gate at the HCP level which results in a discrepancy when mco consume it along with a given NodePool payload-version that differs from the one used by the HCP to generate it.
Related openshift#2643

CI Failure example
bootstrap.go:47] error running MCC[BOOTSTRAP]: error creating feature gate access: unable to determine features: missing desired version \"4.14.0-0.ci.test-2023-06-08-101141-ci-op-gvin17ww-initial\" in featuregates.config.openshift.io/cluster\n"}

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/3724/pull-ci-openshift-machine-config-operator-master-e2e-hypershift/1666749702499995648

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/3724/pull-ci-openshift-machine-config-operator-master-e2e-hypershift/1666749702499995648/artifacts/e2e-hypershift/run-e2e/artifacts/TestNodePool_PreTeardownClusterDump/namespaces/e2e-clusters-wvxmd-example-fmxk8/core/pods/logs/ignition-server-768699fd7b-w49lq-ignition-server.log
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 13, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 13, 2023

New changes are detected. LGTM label has been removed.

@enxebre
Copy link
Member Author

enxebre commented Jun 13, 2023

rebased to pick Red Hat Trusted App Pipeline CI fix
/lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 13, 2023

@enxebre: you cannot LGTM your own PR.

In response to this:

rebased to pick Red Hat Trusted App Pipeline CI fix
/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@enxebre enxebre added the lgtm Indicates that a PR is ready to be merged. label Jun 13, 2023
@enxebre
Copy link
Member Author

enxebre commented Jun 13, 2023

{Failed === RUN TestUpgradeControlPlane/EnsureNoCrashingPods
util.go:457: Container oauth-server in pod oauth-openshift-74bf787c59-6gh9p has a restartCount > 0 (2)

/test e2e-aws

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD af3de4f and 2 for PR HEAD 8093e27 in total

@enxebre
Copy link
Member Author

enxebre commented Jun 13, 2023

/test images
/test e2e-kubevirt-aws-ovn

@openshift-merge-robot openshift-merge-robot merged commit f8a14a8 into openshift:main Jun 14, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/testing Indicates the PR includes changes for e2e testing lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants