Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

describe how the CVO can support TechPreviewNoUpgrade #957

Merged
merged 1 commit into from
Nov 18, 2021

Conversation

deads2k
Copy link
Contributor

@deads2k deads2k commented Nov 15, 2021

Operators can provide a manifest annotated as .metadata.annotations["config.openshift.io/feature-gate"]="TechPreviewNoUpgrade" . If this annotation is set, the manifest is only reconciled when featuregates.config.openshift.io|.spec.featureSet="TechPreviewNoUpgrade" is set.

/assign @sdodson @wking @JoelSpeed

If `featuregates.config.openshift.io|.spec.featureSet="TechPreviewNoUpgrade"`, then the manifest is reconciled as normal
for whatever stage the CVO is in.
The `featuregates.config.openshift.io|.spec.featureSet="TechPreviewNoUpgrade"` can be included in the installation
manifests provided by a customer (we do this some CI scenarios as well), but there is no expected CVO change for bootstrapping.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Customer-provided manifests go up through the installer to the bootstrap node and are applied by cluster-bootstrap. There's no CVO connection there.

The CVO-bootstrapping bit I'm concerned about is:

  1. Bootstrap node comes up.
  2. Bootstrap CVO launched in a static pod.
  3. Bootstrap CVO tries to fetch FeatureGates, but because it's early, that fetch fails.
  4. CVO assumes no gates, and skips TechPreviewNoUpgrade manifests.
  5. Something locks up, because bootstrapping this particular cluster depended on a tech-preview feature like "include support for $NEW_HARDWARE".

I think we can resolve my concern via either:

a. Declare that there won't be any TechPreviewNoUpgrade manifests that are needed for bootstrapping, so locking up if someone tries to add something like that is acceptable CVO behavior.
b. Declare that we might have that sort of thing, and explain how the CVO is expected to handle it (e.g. installer sets an environment variable or somehow informs the static bootstrap CVO pod that for this particular install, TechPreviewNoUpgrade is enabled).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a. Declare that there won't be any TechPreviewNoUpgrade manifests that are needed for bootstrapping, so locking up if someone tries to add something like that is acceptable CVO behavior.

This is what I was trying to describe with "there is no expected CVO change for bootstrapping". How can I phrase it more clearly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop:

The featuregates.config.openshift.io|.spec.featureSet="TechPreviewNoUpgrade" can be included in the installation manifests provided by a customer...

Because while folks could do that, it's completely orthogonal to this enhancement and would have no effect unless cluster-bootstrap or the installer grows similar handling.

Instead, say something like:

During bootstrapping, the CVO will assume no feature sets are enabled until it can successfully retrieve featuregates.config.openshift.io from the Kubernetes API server.

enhancements/update/cvo-techpreview-manifests.md Outdated Show resolved Hide resolved

### Version Skew Strategy

Because upgrades are prevented when this feature is active, there is no supported skew.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can still be skew between 4.y.z -> 4.y.z', and the CVO does not block those on TechPreviewNoUpgrade today, because the current TechPreviewNoUpgrade block is based on Upgradeable=False in a ClusterOperator, and since the SCC kerfuffle in 4.3, that no longer blocks patch updates. We could teach the CVO to block patch updates on this property, or we could link the release.openshift.io/delete enhancement and point out that component maintainers have the same skew exposure as everyone else for patch updates.

5. create a clusteroperator that every customer has ignore

This fans the problem out from a single point (CVO), to every tech preview operator.
It also means that a cluster-admins have to ignore a clusteroperator in the list that is tech preview.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on why this ignoring is a burden. Is it something we need to fix for the image registry operator's managementState: Unmanaged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on why this ignoring is a burden. Is it something we need to fix for the image registry operator's managementState: Unmanaged?

The image-registry operator is not a techpreview operator. It is fully supported. When you search for it, you find docs, if it has a problem you know you are fully supported. Seeing a techpreview operator on your not techpreview cluster doesn't seem comparable to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, so the concern isn't:

huh, a ClusterOperator for $COMPONENT? I didn't think I had that installed...

We're ok with that conflation between the inert-mode operator and its backing operands. The concern is:

huh, a ClusterOperator for $COMPONENT? I wonder what that does... Getting the docs... What?! TechPreviewNoUpgrade! My cluster is doomed!

Can we be more specific in the wording? Something like:

It also increases the chance that a cluster admin would misinterpret an inert-mode operator as tainting their cluster with unsupported, tech-preview bits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to note, Tech Preview features should always be documented.


### Goals

1. CVO honors an "only create or update this manifest in clusters with tech preview enabled".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be only create and update instead of only create or update. Better if we can write Create, reconcile and update

@deads2k
Copy link
Contributor Author

deads2k commented Nov 16, 2021

/retest

@JoelSpeed
Copy link
Contributor

Have given this a read through and have nothing really to comment on. This seems like a sensible improvement to CVO and would save our CAPI project a lot of effort in the long run.

+1 from me

@deads2k
Copy link
Contributor Author

deads2k commented Nov 18, 2021

/retest

## Summary

[TechPreviewNoUpgrade](https://github.com/openshift/api/blob/be1be0e89115702f8b508d351c4f5c9a16e5ae95/config/v1/types_feature.go#L32-L34)
is the canonical way to enable tech preview features.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the canonical method to enable them or a result of enabling tech preview features? I thought they were enabled via per feature gates but an outcome of enabling some may be TechPreviewNoUpgrade having been set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess part of my confusion is that it's not clear to me if this line is saying what will be the case after this enhancement lands or how things are today.

5. create a clusteroperator that every customer has ignore

This fans the problem out from a single point (CVO), to every tech preview operator.
It also means that a cluster-admins have to ignore a clusteroperator in the list that is tech preview.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to note, Tech Preview features should always be documented.

@sdodson
Copy link
Member

sdodson commented Nov 18, 2021

/lgtm
/approve
Some of my questions were just as I came to them but clarified later in the doc or by reading up on additional resources, so approving. Feel free to update if you think others may have the same questions I had.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 18, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 18, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 18, 2021
@openshift-merge-robot openshift-merge-robot merged commit f74ffe7 into openshift:master Nov 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants