
Ensure FeatureGate is copied from cluster to MCO render source #2581

Merged · 1 commit · Jun 1, 2023

Conversation

JoelSpeed (Contributor):

What this PR does / why we need it:

The FeatureGate is a required file for the MCO bootstrap once openshift/machine-config-operator#3688 merges.
To avoid breaking HyperShift, make sure that we pass the FeatureGate manifest into the MCC dir so that the bootstrap stage can pick it up before it becomes required.

Note this will now mean that the MCO can use feature gates in templates, which I think may have been a feature gap in how HyperShift supports FeatureGates up until now.

Which issue(s) this PR fixes (optional; use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue or a Jira story):
Fixes #

Checklist

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci openshift-ci bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release and removed do-not-merge/needs-area labels May 22, 2023
```go
// Write out the feature gate manifest to the MCC dir.
if err := func() error {
	featureGate := &configv1.FeatureGate{}
	if err := p.Client.Get(ctx, client.ObjectKey{Name: "cluster"}, featureGate); err != nil {
```
enxebre (Member):
You are getting the FeatureGate resource from the management cluster here.
We should be getting the one from the guest cluster, which you can infer from hcp.Spec.Configuration.
See related https://github.com/openshift/hypershift/pull/2543/files

JoelSpeed (Contributor, author):

Ahh gotcha, wasn't sure which client I had there, will rework

enxebre (Member), May 22, 2023:

Also, as written there's no trigger event / listener relation here; this would fetch whatever happens to be there at any point in time. I think this might be good enough for starters. In future we might want to make the feature gate input part of the payload version + config + pull secret signature, and let it be a trigger for generating a new payload appropriately.

@openshift-ci openshift-ci bot added the area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release label May 22, 2023
@JoelSpeed JoelSpeed force-pushed the mco-fg-render branch 2 times, most recently from 4faa380 to a60b38c Compare May 23, 2023 08:50
```go
@@ -467,3 +538,22 @@ func convertRegistryOverridesToCommandLineFlag(registryOverrides map[string]string
	// this is the equivalent of null on a StringToString command line variable.
	return "="
}

func fetchFeatureGateScript(featureSet configv1.FeatureSet, payloadVersion string) string {
	var script = `#!/bin/sh
```
enxebre (Member), May 23, 2023:

One concern with this approach is that if any of these criteria aren't met for "reasons", this init container will crash-loop, preventing the main container from running, without clear logs or any indication of what's going on.
Alternatively we could fetch this from the CPO and pass it through to make it more obvious.

Also, have we considered running /usr/bin/cluster-config-operator render here instead of doing this?

Also, what happens if the feature gate input is changed day 2?

Thoughts? cc @csrwng @sjenning

JoelSpeed (Contributor, author):

> One concern with this approach is that if any of these criteria aren't met for "reasons", this init container will crash-loop, preventing the main container from running, without clear logs or any indication of what's going on. Alternatively we could fetch this from the CPO and pass it through to make it more obvious.

The crash loop was semi-deliberate, but you're right, it's not obvious. We need to make sure that when the ignition server reads the FeatureGate, it has the correct FeatureSet and the correct payload version present.

> Also, have we considered running /usr/bin/cluster-config-operator render here instead of doing this?

@sjenning just changed the way this works in #2585 so that there is currently only one place where the render happens. If you would rather, I can add it here as well, but that would duplicate the render step; is that an issue?

> Also, what happens if the feature gate input is changed day 2?

If the FeatureSet changes, then the FeatureGate needs to be regenerated (perhaps this needs to be a field within the KASO deployment so that it gets redeployed?). If the payload version changes, it also needs to be redeployed, but I had assumed a payload version change would already trigger a rollout.

As an alternative, would it be an issue to just give the ignition server creds to fetch the FeatureGate directly from the guest cluster? I don't know if that goes against some design decisions within HyperShift, so I'll defer to the maintainers on that.

JoelSpeed (Contributor, author):

For now I've switched to rendering the feature gate using the CCO, as in the KAS; let's see if that works.

```go
VolumeSource: corev1.VolumeSource{
	Secret: &corev1.SecretVolumeSource{
		// Mount the kubeconfig for the guest cluster to fetch the FeatureGate.
		SecretName: hcp.Spec.KubeConfig.Name,
```
enxebre (Member):

This kubeconfig is meant for external consumers. This should be using KASServiceKubeconfigSecret at most.

JoelSpeed (Contributor, author):

Do you know if KASServiceKubeconfigSecret has permission to read the FeatureGate resource in the guest cluster? If not, where would I look to add that?

JoelSpeed (Contributor, author):

/test e2e-aws

@JoelSpeed JoelSpeed force-pushed the mco-fg-render branch 3 times, most recently from f2eb393 to 9add569 Compare May 24, 2023 08:54
JoelSpeed (Contributor, author):

The failure reported by the cluster version in e2e-aws seems unrelated:

```text
Cluster operator console should not be upgraded between minor versions: ConsoleDefaultRouteSyncUpgradeable: no ingress for host console-openshift-console.apps.example-t2r88.ci.hypershift.devcluster.openshift.com in route console in namespace openshift-console
DownloadsDefaultRouteSyncUpgradeable: no ingress for host downloads-openshift-console.apps.example-t2r88.ci.hypershift.devcluster.openshift.com in route downloads in namespace openshift-console
```

Otherwise this seems like it's working; no complaints from the ignition server scripts, at least.

sjenning (Contributor) left a comment:

Two things: flag rename and Lookup relocation. Thanks!

```go
@@ -39,6 +40,7 @@ func NewRunLocalIgnitionProviderCommand() *cobra.Command {
	cmd.Flags().StringVar(&opts.Image, "image", opts.Image, "Release image")
	cmd.Flags().StringVar(&opts.TokenSecret, "token-secret", opts.TokenSecret, "Token secret name")
	cmd.Flags().StringVar(&opts.WorkDir, "dir", opts.WorkDir, "Working directory (default: temporary dir)")
	cmd.Flags().StringVar(&opts.FeatureGate, "feature-gate", opts.FeatureGate, "Path to a rendered featuregates.config.openshift.io/v1 file")
```
sjenning (Contributor):

Maybe feature-gate-file or feature-gate-manifest (and struct field names to match), so we don't confuse it with the FeatureSet name or something.

JoelSpeed (Contributor, author):

Thanks for the review. I've gone through and applied the suggestions, and squashed everything into a single commit; it should be ready to go now, I think.

```go
@@ -39,6 +40,7 @@ func NewRunLocalIgnitionProviderCommand() *cobra.Command {
	cmd.Flags().StringVar(&opts.Image, "image", opts.Image, "Release image")
	cmd.Flags().StringVar(&opts.TokenSecret, "token-secret", opts.TokenSecret, "Token secret name")
	cmd.Flags().StringVar(&opts.WorkDir, "dir", opts.WorkDir, "Working directory (default: temporary dir)")
	cmd.Flags().StringVar(&opts.FeatureGateManifest, "feature-gate-manifest", opts.FeatureGateManifest, "Path to a rendered featuregates.config.openshift.io/v1 manifest")
```
sjenning (Contributor):

I don't see this flag used anywhere 🤔

JoelSpeed (Contributor, author), Jun 1, 2023:

Line 106 is using it, no?

```go
@@ -88,6 +89,7 @@ func NewStartCommand() *cobra.Command {
	cmd.Flags().StringVar(&opts.Platform, "platform", "", "The cloud provider platform name")
	cmd.Flags().StringVar(&opts.WorkDir, "work-dir", opts.WorkDir, "Directory in which to store transient working data")
	cmd.Flags().StringVar(&opts.MetricsAddr, "metrics-addr", opts.MetricsAddr, "The address the metric endpoint binds to.")
	cmd.Flags().StringVar(&opts.FeatureGate, "feature-gate", opts.FeatureGate, "Path to a rendered featuregates.config.openshift.io/v1 file")
```
sjenning (Contributor):

Should change this to feature-gate-manifest as well.

```go
@@ -322,6 +379,7 @@ func ReconcileIgnitionServer(ctx context.Context,
	"--key-file", "/var/run/secrets/ignition/serving-cert/tls.key",
	"--registry-overrides", util.ConvertRegistryOverridesToCommandLineFlag(registryOverrides),
	"--platform", string(hcp.Spec.Platform.Type),
	"--feature-gate=/shared/99_feature-gate.yaml",
```
sjenning (Contributor):

This should be feature-gate-manifest now, right?

```go
@@ -56,6 +56,7 @@ type Options struct {
	Platform    string
	WorkDir     string
	MetricsAddr string
	FeatureGate string
```
sjenning (Contributor):

and FeatureGateManifest


openshift-ci bot commented Jun 1, 2023

@JoelSpeed: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-kubevirt-aws-ovn | 7972d49 | false | /test e2e-kubevirt-aws-ovn |



sjenning commented Jun 1, 2023

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 1, 2023

openshift-ci bot commented Jun 1, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed, sjenning


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2023
@openshift-merge-robot openshift-merge-robot merged commit 040f742 into openshift:main Jun 1, 2023
11 of 12 checks passed