Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data/manifests/bootkube/cvo-overrides: Drop the explicit upstream #4112

Merged

Conversation

wking
Copy link
Member

@wking wking commented Aug 28, 2020

Not bug-worthy, but floating now for 4.7 to centralize discussion.

The cluster-version operator has been falling back to a default URI when the ClusterVersion upstream is empty since way back, and this behavior is enshrined in the API. Drop the installer-side
opinion, so:

  • There is a single location where we version-control the default upstream (the CVO).
  • Folks consuming in-cluster ClusterVersion objects have one less property to distract them (unless they want to override the default, in which case it's not distracting).

Related to openshift/openshift-docs#22252.

CC @vrutkovs , since I'm not sure how this will square with FCOS and a71f424. FCOS options:

  1. Continue to override the CVO default with explicit ClusterVersion. No change from today, but breaks nominal API of:

    By default it will use the appropriate update server for the cluster"

  2. Teach the CVO to recognize FCOS vs. OCP and use the right default for each.

  3. Fork the CVO and add an fcos branch, as you currently do for the installer.

Thoughts?

The cluster-version operator has been falling back to a default URI
when the ClusterVersion upstream is empty since way back [1,2], and
this behavior is enshrined in the API [3].  Drop the installer-side
opinion, so:

* There is a single location where we version-control the default
  upstream (the CVO).
* Folks consuming in-cluster ClusterVersion objects have one less
  property to distract them (unless they want to override the default,
  in which case it's not distracting).

[1]: https://github.com/openshift/cluster-version-operator/blame/2c4931dc283c551938be1a00fee290de0b79d99c/pkg/cvo/availableupdates.go#L27-L31
[2]: openshift/cluster-version-operator@ab4d84a#diff-78f2af341fa49292dd6930f378018867R24
[3]: https://github.com/openshift/api/blame/0422dc17083e9e8df18d029f3f34322e96e9c326/config/v1/types_cluster_version.go#L56-L57
@vrutkovs
Copy link
Member

OKD installer is using a fork, eventually we'll most likely template that file to override upstream URL. So +1 from me on this

@wking
Copy link
Member Author

wking commented Aug 28, 2020

...eventually we'll most likely template that file to override upstream URL...

You're not concerned about "what if a user clears upstream in their FCOS cluster, expecting the CVO to default to 'the appropriate update server for the cluster' as defined in the API, but they get pointed at the OCP update service instead"?

@vrutkovs
Copy link
Member

OKD is open to experiments and there is no official support :) In any case I'd prefer to avoid forking CVO - that'd require additional tests, which are hard to keep track of as most of them are optional

@wking
Copy link
Member Author

wking commented Aug 28, 2020

In any case I'd prefer to avoid forking CVO...

So an alternative option would be to have something about preferred upstream get baked into the release as metadata, and have the CVO load that dynamically. Not sure how that would work if in the future we decide to shard by region or some such, but we could set the metadata differently when we build OCP and OKD releases. Anyhow, sounds like we don't need to work out how to handle ODK defaulting before we're comfortable removing this for the OCP installer in this PR :).

@abhinavdahiya
Copy link
Contributor

If the CVO team feels that the defaulting in CVO's code should be good enough, I don't see a reason why the installer would enforce this for now. If that changes we can bring it back.

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2020
@wking
Copy link
Member Author

wking commented Oct 6, 2020

/retest

@openshift-ci-robot
Copy link
Contributor

@wking: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-crc c9095b3 link /test e2e-crc
ci/prow/e2e-aws-workers-rhel7 c9095b3 link /test e2e-aws-workers-rhel7
ci/prow/e2e-ovirt c9095b3 link /test e2e-ovirt
ci/prow/e2e-libvirt c9095b3 link /test e2e-libvirt
ci/prow/e2e-aws c9095b3 link /test e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wking
Copy link
Member Author

wking commented Nov 4, 2020

/retest

@wking
Copy link
Member Author

wking commented Nov 4, 2020

/assign @vrutkovs

@wking wking changed the title data/manifests/bootkube/cvo-overrides: Drop the explicit update data/manifests/bootkube/cvo-overrides: Drop the explicit upstream Nov 4, 2020
Copy link
Member

@vrutkovs vrutkovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 4, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot
Copy link
Contributor

@wking: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-crc c9095b3 link /test e2e-crc
ci/prow/e2e-ovirt c9095b3 link /test e2e-ovirt

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit e176973 into openshift:master Nov 4, 2020
@wking wking deleted the drop-upstream-opinion branch November 4, 2020 23:34
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 20, 2021
…sion

Hopefully we never actually have to stuff a CVO-generated
ClusterVersion into the cluster; it's just for recovery after admins
accidentally delete their existing ClusterVersion.  But if we ever do
hit this code, we want to push it without spec.upstream, to allow
later ClusterVersion-consuming code to say "ah, user doesn't care
which upstream I use, so I'll use the best default I'm aware of".
This effectively pushes default choice from the CVO that creates the
ClusterVersion out to the CVO that consumes the ClusterVersion, and
that later CVO is almost certainly more current on which default is
best.

Similar to openshift/installer@c9095b3451
(data/manifests/bootkube/cvo-overrides: Drop the explicit upstream,
2020-08-28, openshift/installer#4112).
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 20, 2021
…sion

Hopefully we never actually have to stuff a CVO-generated
ClusterVersion into the cluster; it's just for recovery after admins
accidentally delete their existing ClusterVersion.  But if we ever do
hit this code, we want to push it without spec.upstream, to allow
later ClusterVersion-consuming code to say "ah, user doesn't care
which upstream I use, so I'll use the best default I'm aware of".
This effectively pushes default choice from the CVO that creates the
ClusterVersion out to the CVO that consumes the ClusterVersion, and
that later CVO is almost certainly more current on which default is
best.

Similar to openshift/installer@c9095b3451
(data/manifests/bootkube/cvo-overrides: Drop the explicit upstream,
2020-08-28, openshift/installer#4112).
wking added a commit to wking/openshift-docs that referenced this pull request Nov 17, 2021
In 4.1, the installer used to explicitly set upstream to our default
URI.  But in openshift/installer#c9095b34518a0
(data/manifests/bootkube/cvo-overrides: Drop the explicit update,
2020-08-28, openshift/installer#4112), which landed in 4.7 and was not
backported, I'd stopped doing that.  In clusters born in 4.7 and
later, the installer will leave upstream unset, and the
cluster-version operator will default to making a reasonable choice.

We still need to talk about explicit upstreams in the case where folks
are pointing their cluster at a local OpenShift Update Service, but
this commit drops the properties where we were incidentally pointing
at the default, Red-Hat-hosted location, because explicitly setting
that value is an anti-pattern that makes it harder for clusters to
adapt if we try to move our default location elsewhere in the future.

Also restore a closing brace and dangling comma to clean up after
c0fc03d (osdocs-2368: updating 4.8 references to 4.9, 2021-10-01, openshift#36974),
which also removed some of the stale 'upstream' references.
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/openshift-docs that referenced this pull request Nov 17, 2021
In 4.1, the installer used to explicitly set upstream to our default
URI.  But in openshift/installer#c9095b34518a0
(data/manifests/bootkube/cvo-overrides: Drop the explicit update,
2020-08-28, openshift/installer#4112), which landed in 4.7 and was not
backported, I'd stopped doing that.  In clusters born in 4.7 and
later, the installer will leave upstream unset, and the
cluster-version operator will default to making a reasonable choice.

We still need to talk about explicit upstreams in the case where folks
are pointing their cluster at a local OpenShift Update Service, but
this commit drops the properties where we were incidentally pointing
at the default, Red-Hat-hosted location, because explicitly setting
that value is an anti-pattern that makes it harder for clusters to
adapt if we try to move our default location elsewhere in the future.

Also restore a closing brace and dangling comma to clean up after
c0fc03d (osdocs-2368: updating 4.8 references to 4.9, 2021-10-01, openshift#36974),
which also removed some of the stale 'upstream' references.
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/openshift-docs that referenced this pull request Nov 17, 2021
In 4.1, the installer used to explicitly set upstream to our default
URI.  But in openshift/installer#c9095b34518a0
(data/manifests/bootkube/cvo-overrides: Drop the explicit update,
2020-08-28, openshift/installer#4112), which landed in 4.7 and was not
backported, I'd stopped doing that.  In clusters born in 4.7 and
later, the installer will leave upstream unset, and the
cluster-version operator will default to making a reasonable choice.

We still need to talk about explicit upstreams in the case where folks
are pointing their cluster at a local OpenShift Update Service, but
this commit drops the properties where we were incidentally pointing
at the default, Red-Hat-hosted location, because explicitly setting
that value is an anti-pattern that makes it harder for clusters to
adapt if we try to move our default location elsewhere in the future.

Also restore a closing brace and dangling comma to clean up after
c0fc03d (osdocs-2368: updating 4.8 references to 4.9, 2021-10-01, openshift#36974),
which also removed some of the stale 'upstream' references.
kalexand-rh pushed a commit to openshift/openshift-docs that referenced this pull request Nov 18, 2021
In 4.1, the installer used to explicitly set upstream to our default
URI.  But in openshift/installer#c9095b34518a0
(data/manifests/bootkube/cvo-overrides: Drop the explicit update,
2020-08-28, openshift/installer#4112), which landed in 4.7 and was not
backported, I'd stopped doing that.  In clusters born in 4.7 and
later, the installer will leave upstream unset, and the
cluster-version operator will default to making a reasonable choice.

We still need to talk about explicit upstreams in the case where folks
are pointing their cluster at a local OpenShift Update Service, but
this commit drops the properties where we were incidentally pointing
at the default, Red-Hat-hosted location, because explicitly setting
that value is an anti-pattern that makes it harder for clusters to
adapt if we try to move our default location elsewhere in the future.
kalexand-rh pushed a commit to kalexand-rh/openshift-docs that referenced this pull request Nov 18, 2021
In 4.1, the installer used to explicitly set upstream to our default
URI.  But in openshift/installer#c9095b34518a0
(data/manifests/bootkube/cvo-overrides: Drop the explicit update,
2020-08-28, openshift/installer#4112), which landed in 4.7 and was not
backported, I'd stopped doing that.  In clusters born in 4.7 and
later, the installer will leave upstream unset, and the
cluster-version operator will default to making a reasonable choice.

We still need to talk about explicit upstreams in the case where folks
are pointing their cluster at a local OpenShift Update Service, but
this commit drops the properties where we were incidentally pointing
at the default, Red-Hat-hosted location, because explicitly setting
that value is an anti-pattern that makes it harder for clusters to
adapt if we try to move our default location elsewhere in the future.
wking added a commit to wking/cluster-version-operator that referenced this pull request Dec 17, 2021
…e throttling

Since [1], clusters born in 4.7 or later default to not having an
explicit spec.upstream, so they rely on the CVO's internal default.
However, in that case, u.Upstream will be an empty string, while by
this point in syncAvailableUpdates, upstream will have been set to the
default value.  This commit splits up the "do we really need to
check?" logic into a number of distinct cases, and gives them more
specific logging, to make it easier to understand and confirm the
desired behavior:

a. If we have no cached data, we need to pull a graph.
b. If it's been over minimumUpdateCheckInterval since our last check,
   we need to pull a graph.  Even if nothing has changed on our side,
   our data is sufficiently stale to need a refresh.
c. If the channel has changed, we have different interests, and we
   need to pull a graph to hear what the upstream recommends for this
   new set of interests.
d. If the upstream hasn't changed, because:
   i. The current upstream (explicitly or by default) matches the old
      explicit upstream, or
   ii. The current upstream (explicitly or by default) matches the
       default, and the old upstream was unset.
   then everything's the same on our side, and our cached graph is
   recent, so we don't need to do anything.
e. Otherwise, the upstream has changed, and we need to pull a graph to
   see what our new guide has to suggest.

Cases for upstream:

* A -> A: Handled by d.i.
* A -> B: Handled by e.
* A -> unset (defaulted) or default: Handled by e.
* Unset or default -> A: Handled by e.
* Default -> default: Handled in d.i.
* Default -> unset (defaulted): Handled in d.i.
* Unset -> default: Handled by d.ii, new in this commit, previously
  resulted in an excessive pull.
* Unset -> unset (defaulted): Handled by d.i, new in this commit,
  previously resulted in an excessive pull.

[1]: openshift/installer#4112
wking added a commit to wking/cluster-version-operator that referenced this pull request Dec 17, 2021
…e throttling

Since [1], clusters born in 4.7 or later default to not having an
explicit spec.upstream, so they rely on the CVO's internal default.
However, in that case, u.Upstream will be an empty string, while by
this point in syncAvailableUpdates, upstream will have been set to the
default value.  This commit splits up the "do we really need to
check?" logic into a number of distinct cases, and gives them more
specific logging, to make it easier to understand and confirm the
desired behavior:

a. If we have no cached data, we need to pull a graph.
b. If it's been over minimumUpdateCheckInterval since our last check,
   we need to pull a graph.  Even if nothing has changed on our side,
   our data is sufficiently stale to need a refresh.
c. If the channel has changed, we have different interests, and we
   need to pull a graph to hear what the upstream recommends for this
   new set of interests.
d. If the upstream hasn't changed, because:
   i. The current upstream (explicitly or by default) matches the old
      explicit upstream, or
   ii. The current upstream (explicitly or by default) matches the
       default, and the old upstream was unset.
   then everything's the same on our side, and our cached graph is
   recent, so we don't need to do anything.
e. Otherwise, the upstream has changed, and we need to pull a graph to
   see what our new guide has to suggest.

Cases for upstream:

* A -> A: Handled by d.i.
* A -> B: Handled by e.
* A -> unset (defaulted) or default: Handled by e.
* Unset or default -> A: Handled by e.
* Default -> default: Handled in d.i.
* Default -> unset (defaulted): Handled in d.i.
* Unset -> default: Handled by d.ii, new in this commit, previously
  resulted in an excessive pull.
* Unset -> unset (defaulted): Handled by d.ii, new in this commit,
  previously resulted in an excessive pull.

[1]: openshift/installer#4112
wking added a commit to wking/cluster-version-operator that referenced this pull request Dec 17, 2021
…e throttling

Since [1], clusters born in 4.7 or later default to not having an
explicit spec.upstream, so they rely on the CVO's internal default.
However, in that case, u.Upstream will be an empty string, while by
this point in syncAvailableUpdates, upstream will have been set to the
default value.  This commit splits up the "do we really need to
check?" logic into a number of distinct cases, and gives them more
specific logging, to make it easier to understand and confirm the
desired behavior:

a. If we have no cached data, we need to pull a graph.
b. If it's been over minimumUpdateCheckInterval since our last check,
   we need to pull a graph.  Even if nothing has changed on our side,
   our data is sufficiently stale to need a refresh.
c. If the channel has changed, we have different interests, and we
   need to pull a graph to hear what the upstream recommends for this
   new set of interests.
d. If the upstream hasn't changed, because:
   i. The current upstream (explicitly or by default) matches the old
      explicit upstream, or
   ii. The current upstream (explicitly or by default) matches the
       default, and the old upstream was unset.
   then everything's the same on our side, and our cached graph is
   recent, so we don't need to do anything.
e. Otherwise, the upstream has changed, and we need to pull a graph to
   see what our new guide has to suggest.

Cases for upstream:

* A -> A: Handled by d.i.
* A -> B: Handled by e.
* A -> unset (defaulted) or default: Handled by e.
* Unset or default -> A: Handled by e.
* Default -> default: Handled in d.i.
* Default -> unset (defaulted): Handled in d.i.
* Unset -> default: Handled by d.ii, new in this commit, previously
  resulted in an excessive pull.
* Unset -> unset (defaulted): Handled by d.ii, new in this commit,
  previously resulted in an excessive pull.

[1]: openshift/installer#4112
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants