New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
data/manifests/bootkube/cvo-overrides: Drop the explicit upstream #4112
data/manifests/bootkube/cvo-overrides: Drop the explicit upstream #4112
Conversation
The cluster-version operator has been falling back to a default URI when the ClusterVersion upstream is empty since way back [1,2], and this behavior is enshrined in the API [3]. Drop the installer-side opinion, so: * There is a single location where we version-control the default upstream (the CVO). * Folks consuming in-cluster ClusterVersion objects have one less property to distract them (unless they want to override the default, in which case it's not distracting). [1]: https://github.com/openshift/cluster-version-operator/blame/2c4931dc283c551938be1a00fee290de0b79d99c/pkg/cvo/availableupdates.go#L27-L31 [2]: openshift/cluster-version-operator@ab4d84a#diff-78f2af341fa49292dd6930f378018867R24 [3]: https://github.com/openshift/api/blame/0422dc17083e9e8df18d029f3f34322e96e9c326/config/v1/types_cluster_version.go#L56-L57
OKD installer is using a fork, eventually we'll most likely template that file to override upstream URL. So +1 from me on this |
You're not concerned about "what if a user clears |
OKD is open to experiments and there is no official support :) In any case I'd prefer to avoid forking CVO - that'd require additional tests, which are hard to keep track of as most of them are optional |
So an alternative option would be to have something about preferred upstream get baked into the release as metadata, and have the CVO load that dynamically. Not sure how that would work if in the future we decide to shard by region or some such, but we could set the metadata differently when we build OCP and OKD releases. Anyhow, sounds like we don't need to work out how to handle ODK defaulting before we're comfortable removing this for the OCP installer in this PR :). |
If the CVO team feels that the defaulting in CVO's code should be good enough, I don't see a reason why the installer would enforce this for now. If that changes we can bring it back. /approve |
/retest |
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest |
/assign @vrutkovs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: abhinavdahiya, vrutkovs The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes. |
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest Please review the full test history for this PR and help us cut down flakes. |
…sion Hopefully we never actually have to stuff a CVO-generated ClusterVersion into the cluster; it's just for recovery after admins accidentally delete their existing ClusterVersion. But if we ever do hit this code, we want to push it without spec.upstream, to allow later ClusterVersion-consuming code to say "ah, user doesn't care which upstream I use, so I'll use the best default I'm aware of". This effectively pushes default choice from the CVO that creates the ClusterVersion out to the CVO that consumes the ClusterVersion, and that later CVO is almost certainly more current on which default is best. Similar to openshift/installer@c9095b3451 (data/manifests/bootkube/cvo-overrides: Drop the explicit upstream, 2020-08-28, openshift/installer#4112).
…sion Hopefully we never actually have to stuff a CVO-generated ClusterVersion into the cluster; it's just for recovery after admins accidentally delete their existing ClusterVersion. But if we ever do hit this code, we want to push it without spec.upstream, to allow later ClusterVersion-consuming code to say "ah, user doesn't care which upstream I use, so I'll use the best default I'm aware of". This effectively pushes default choice from the CVO that creates the ClusterVersion out to the CVO that consumes the ClusterVersion, and that later CVO is almost certainly more current on which default is best. Similar to openshift/installer@c9095b3451 (data/manifests/bootkube/cvo-overrides: Drop the explicit upstream, 2020-08-28, openshift/installer#4112).
In 4.1, the installer used to explicitly set upstream to our default URI. But in openshift/installer#c9095b34518a0 (data/manifests/bootkube/cvo-overrides: Drop the explicit update, 2020-08-28, openshift/installer#4112), which landed in 4.7 and was not backported, I'd stopped doing that. In clusters born in 4.7 and later, the installer will leave upstream unset, and the cluster-version operator will default to making a reasonable choice. We still need to talk about explicit upstreams in the case where folks are pointing their cluster at a local OpenShift Update Service, but this commit drops the properties where we were incidentally pointing at the default, Red-Hat-hosted location, because explicitly setting that value is an anti-pattern that makes it harder for clusters to adapt if we try to move our default location elsewhere in the future. Also restore a closing brace and dangling comma to clean up after c0fc03d (osdocs-2368: updating 4.8 references to 4.9, 2021-10-01, openshift#36974), which also removed some of the stale 'upstream' references.
In 4.1, the installer used to explicitly set upstream to our default URI. But in openshift/installer#c9095b34518a0 (data/manifests/bootkube/cvo-overrides: Drop the explicit update, 2020-08-28, openshift/installer#4112), which landed in 4.7 and was not backported, I'd stopped doing that. In clusters born in 4.7 and later, the installer will leave upstream unset, and the cluster-version operator will default to making a reasonable choice. We still need to talk about explicit upstreams in the case where folks are pointing their cluster at a local OpenShift Update Service, but this commit drops the properties where we were incidentally pointing at the default, Red-Hat-hosted location, because explicitly setting that value is an anti-pattern that makes it harder for clusters to adapt if we try to move our default location elsewhere in the future. Also restore a closing brace and dangling comma to clean up after c0fc03d (osdocs-2368: updating 4.8 references to 4.9, 2021-10-01, openshift#36974), which also removed some of the stale 'upstream' references.
In 4.1, the installer used to explicitly set upstream to our default URI. But in openshift/installer#c9095b34518a0 (data/manifests/bootkube/cvo-overrides: Drop the explicit update, 2020-08-28, openshift/installer#4112), which landed in 4.7 and was not backported, I'd stopped doing that. In clusters born in 4.7 and later, the installer will leave upstream unset, and the cluster-version operator will default to making a reasonable choice. We still need to talk about explicit upstreams in the case where folks are pointing their cluster at a local OpenShift Update Service, but this commit drops the properties where we were incidentally pointing at the default, Red-Hat-hosted location, because explicitly setting that value is an anti-pattern that makes it harder for clusters to adapt if we try to move our default location elsewhere in the future. Also restore a closing brace and dangling comma to clean up after c0fc03d (osdocs-2368: updating 4.8 references to 4.9, 2021-10-01, openshift#36974), which also removed some of the stale 'upstream' references.
In 4.1, the installer used to explicitly set upstream to our default URI. But in openshift/installer#c9095b34518a0 (data/manifests/bootkube/cvo-overrides: Drop the explicit update, 2020-08-28, openshift/installer#4112), which landed in 4.7 and was not backported, I'd stopped doing that. In clusters born in 4.7 and later, the installer will leave upstream unset, and the cluster-version operator will default to making a reasonable choice. We still need to talk about explicit upstreams in the case where folks are pointing their cluster at a local OpenShift Update Service, but this commit drops the properties where we were incidentally pointing at the default, Red-Hat-hosted location, because explicitly setting that value is an anti-pattern that makes it harder for clusters to adapt if we try to move our default location elsewhere in the future.
In 4.1, the installer used to explicitly set upstream to our default URI. But in openshift/installer#c9095b34518a0 (data/manifests/bootkube/cvo-overrides: Drop the explicit update, 2020-08-28, openshift/installer#4112), which landed in 4.7 and was not backported, I'd stopped doing that. In clusters born in 4.7 and later, the installer will leave upstream unset, and the cluster-version operator will default to making a reasonable choice. We still need to talk about explicit upstreams in the case where folks are pointing their cluster at a local OpenShift Update Service, but this commit drops the properties where we were incidentally pointing at the default, Red-Hat-hosted location, because explicitly setting that value is an anti-pattern that makes it harder for clusters to adapt if we try to move our default location elsewhere in the future.
…e throttling Since [1], clusters born in 4.7 or later default to not having an explicit spec.upstream, so they rely on the CVO's internal default. However, in that case, u.Upstream will be an empty string, while by this point in syncAvailableUpdates, upstream will have been set to the default value. This commit splits up the "do we really need to check?" logic into a number of distinct cases, and gives them more specific logging, to make it easier to understand and confirm the desired behavior: a. If we have no cached data, we need to pull a graph. b. If it's been over minimumUpdateCheckInterval since our last check, we need to pull a graph. Even if nothing has changed on our side, our data is sufficiently stale to need a refresh. c. If the channel has changed, we have different interests, and we need to pull a graph to hear what the upstream recommends for this new set of interests. d. If the upstream hasn't changed, because: i. The current upstream (explicitly or by default) matches the old explicit upstream, or ii. The current upstream (explicitly or by default) matches the default, and the old upstream was unset. then everything's the same on our side, and our cached graph is recent, so we don't need to do anything. e. Otherwise, the upstream has changed, and we need to pull a graph to see what our new guide has to suggest. Cases for upstream: * A -> A: Handled by d.i. * A -> B: Handled by e. * A -> unset (defaulted) or default: Handled by e. * Unset or default -> A: Handled by e. * Default -> default: Handled in d.i. * Default -> unset (defaulted): Handled in d.i. * Unset -> default: Handled by d.ii, new in this commit, previously resulted in an excessive pull. * Unset -> unset (defaulted): Handled by d.i, new in this commit, previously resulted in an excessive pull. [1]: openshift/installer#4112
…e throttling Since [1], clusters born in 4.7 or later default to not having an explicit spec.upstream, so they rely on the CVO's internal default. However, in that case, u.Upstream will be an empty string, while by this point in syncAvailableUpdates, upstream will have been set to the default value. This commit splits up the "do we really need to check?" logic into a number of distinct cases, and gives them more specific logging, to make it easier to understand and confirm the desired behavior: a. If we have no cached data, we need to pull a graph. b. If it's been over minimumUpdateCheckInterval since our last check, we need to pull a graph. Even if nothing has changed on our side, our data is sufficiently stale to need a refresh. c. If the channel has changed, we have different interests, and we need to pull a graph to hear what the upstream recommends for this new set of interests. d. If the upstream hasn't changed, because: i. The current upstream (explicitly or by default) matches the old explicit upstream, or ii. The current upstream (explicitly or by default) matches the default, and the old upstream was unset. then everything's the same on our side, and our cached graph is recent, so we don't need to do anything. e. Otherwise, the upstream has changed, and we need to pull a graph to see what our new guide has to suggest. Cases for upstream: * A -> A: Handled by d.i. * A -> B: Handled by e. * A -> unset (defaulted) or default: Handled by e. * Unset or default -> A: Handled by e. * Default -> default: Handled in d.i. * Default -> unset (defaulted): Handled in d.i. * Unset -> default: Handled by d.ii, new in this commit, previously resulted in an excessive pull. * Unset -> unset (defaulted): Handled by d.ii, new in this commit, previously resulted in an excessive pull. [1]: openshift/installer#4112
…e throttling Since [1], clusters born in 4.7 or later default to not having an explicit spec.upstream, so they rely on the CVO's internal default. However, in that case, u.Upstream will be an empty string, while by this point in syncAvailableUpdates, upstream will have been set to the default value. This commit splits up the "do we really need to check?" logic into a number of distinct cases, and gives them more specific logging, to make it easier to understand and confirm the desired behavior: a. If we have no cached data, we need to pull a graph. b. If it's been over minimumUpdateCheckInterval since our last check, we need to pull a graph. Even if nothing has changed on our side, our data is sufficiently stale to need a refresh. c. If the channel has changed, we have different interests, and we need to pull a graph to hear what the upstream recommends for this new set of interests. d. If the upstream hasn't changed, because: i. The current upstream (explicitly or by default) matches the old explicit upstream, or ii. The current upstream (explicitly or by default) matches the default, and the old upstream was unset. then everything's the same on our side, and our cached graph is recent, so we don't need to do anything. e. Otherwise, the upstream has changed, and we need to pull a graph to see what our new guide has to suggest. Cases for upstream: * A -> A: Handled by d.i. * A -> B: Handled by e. * A -> unset (defaulted) or default: Handled by e. * Unset or default -> A: Handled by e. * Default -> default: Handled in d.i. * Default -> unset (defaulted): Handled in d.i. * Unset -> default: Handled by d.ii, new in this commit, previously resulted in an excessive pull. * Unset -> unset (defaulted): Handled by d.ii, new in this commit, previously resulted in an excessive pull. [1]: openshift/installer#4112
Not bug-worthy, but floating now for 4.7 to centralize discussion.
The cluster-version operator has been falling back to a default URI when the ClusterVersion upstream is empty since way back, and this behavior is enshrined in the API. Drop the installer-side
opinion, so:
Related to openshift/openshift-docs#22252.
CC @vrutkovs , since I'm not sure how this will square with FCOS and a71f424. FCOS options:
Continue to override the CVO default with explicit ClusterVersion. No change from today, but breaks nominal API of:
Teach the CVO to recognize FCOS vs. OCP and use the right default for each.
Fork the CVO and add an fcos branch, as you currently do for the installer.
Thoughts?