
move CVO upgrades doc to enhancements #695

Merged · 4 commits into openshift:master · Jul 26, 2021

Conversation

@deads2k (Contributor) commented Mar 15, 2021

Moves the CVO docs to enhancements. The tombstone at the original location is openshift/cluster-version-operator#631. These docs cover integration with other components, what those components need to do to participate, what impact they have on each other, and how a cluster upgrades.

See https://github.com/openshift/enhancements/#is-my-thing-an-enhancement for criteria.

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull request has been approved by: (no approvers yet)
To complete the pull request process, please ask for approval from deads2k after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -0,0 +1,83 @@
# Upgrades and order
Member:
Can this go under the existing enhancements/update/ directory?

@@ -0,0 +1,83 @@
# Upgrades and order

In the CVO, upgrades are performed in the order described in the payload (lexicographic), with operators only running in parallel if they share the same file prefix `0000_NN_` and differ by the next chunk `OPERATOR`, i.e. `0000_70_authentication-operator_*` and `0000_70_image-registry-operator_*` are run in parallel.
Member:

They are parallel graph-node entries (it might be worth bringing over this graph to clarify graph nodes). That means they could run in parallel, but they might not; it depends on how many workers are available.

Also, let's use some modern examples, like:

$ oc adm release extract --to=manifests quay.io/openshift-release-dev/ocp-release:4.7.2-x86_64
Extracted release payload from digest sha256:83fca12e93240b503f88ec192be5ff0d6dfe750f81e8b5ef71af991337d7c584 created at 2021-03-10T11:32:57Z
$ ls manifests/0000_70_* | cat
manifests/0000_70_cluster-network-operator_00_namespace.yaml
...
manifests/0000_70_cluster-network-operator_05_clusteroperator.yaml
manifests/0000_70_console-operator.crd.yaml
manifests/0000_70_dns-operator_00-cluster-role.yaml
...
manifests/0000_70_dns-operator_03-cluster-operator.yaml

So:

..., i.e. 0000_70_cluster-network-operator_* and 0000_70_dns-operator_* are parallel graph nodes, and may be run in parallel.

deads2k (Contributor, author):

> They are parallel graph-node entries (it might be worth bringing over this graph to clarify graph nodes). That means they could run in parallel, but they might not; it depends on how many workers are available.

How much of that doc should move over? Reading through it, it actually answers questions I've had.

Member:

Also, if the expectation is covering the API around operator-provided manifests, we should include the 0000 defaulting stuff from oc:

> Operators are expected to host the config they need to be installed to a cluster in the /manifests directory in their image. This command iterates over a set of operator images and extracts those manifests into a single, ordered list of Kubernetes objects that can then be iteratively updated on a cluster by the cluster version operator when it is time to perform an update. Manifest files are renamed to 0000_70_<image_name>_<filename> by default, and an operator author that needs to provide a global-ordered file (before or after other operators) should prepend 0000_NN_<component>_ to their filename, which instructs the release builder to not assign a component prefix. Only images in the input that have the image label io.openshift.release.operator=true will have manifests loaded.

Because folks who don't care about their order in the payload can reduce some mental overhead by leaving 0000_NN_<component>_ off and relying on oc to inject it for them.
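
To make that defaulting concrete, here is a rough sketch; the operator name and file names are invented for illustration, not taken from a real payload:

# Files a hypothetical operator ships in its image (labeled io.openshift.release.operator=true):
/manifests/00-namespace.yaml
/manifests/01-deployment.yaml
# In the assembled release payload, the release builder applies the default prefix:
0000_70_example-operator_00-namespace.yaml
0000_70_example-operator_01-deployment.yaml
# A file shipped already prefixed, e.g. /manifests/0000_50_example-operator_02-rbac.yaml,
# keeps its explicit run level instead of receiving the default 0000_70_ component prefix.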

@wking (Member) commented Mar 15, 2021:

> How much of that doc should move over? Reading through it, it actually answers questions I've had.

That doc is aimed at users trying to understand how the CVO works. That's distinct from, but not completely orthogonal to, the stuff that operator maintainers need to understand to set up their manifests. Not clear to me whether you want to attempt a single doc aimed at both, or you're scoping this to operator maintainers, or scoping this to just admins, or what.


kube-apiserver

kcm/ksched
Member:

For an enhancement-level doc, I think it's probably worth dropping the code block here and making this whole section an enumerated list with English entries:

  1. config-operator
  2. Kubernetes API-server
  3. Kubernetes controller manager and scheduler
    ...

@deads2k (Contributor, author) commented Mar 15, 2021

I'd rather move the collateral we already have and tidy from there (unless the information is wrong). I'm happy to move whatever other CVO docs make sense to start with, though. https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md also looks prime for moving to me.


During upgrade we bias towards predictable ordering for operators that lack sophistication about detecting their prerequisites. Over time, operators should be better at detecting their prerequisites without overcomplicating or risking the predictability of upgrades.

Currently, upgrades proceed in operator order without distinguishing between node and control plane components. Future improvements may allow nodes to upgrade independently and at different schedules to reduce production impact. This in turn complicates the investment operator teams must make in testing and understanding how to version their control plane components independently of node infrastructure.
@wking (Member) commented Mar 15, 2021:

Worth including this blurb:

> Ordering is only applied during upgrades, where some components rely on another component being updated first. As a convenience, the CVO guarantees that components at an earlier run level will be created or updated before your component is invoked. Note however that components without ClusterOperator objects defined may not be fully deployed when your component is executed, so always ensure your prerequisites know that they must correctly obey the ClusterOperator protocol to be available. More sophisticated components should observe the prerequisite ClusterOperators directly and use the versions field to enforce safety.

What about the Kind hints? The extension restrictions? The create-only annotation? The `image-references` docs? Probably also worth linking the profile enhancement, because the CVO will ignore manifests that don't declare themselves to be members of the current profile.
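
As a rough sketch of the "observe the prerequisite ClusterOperators directly" suggestion in the quoted blurb: from the CLI, checking a prerequisite's reported versions and availability could look like the following (the operator name and version are illustrative; an in-cluster component would do the equivalent with a watch on the ClusterOperator API):

$ oc get clusteroperator kube-apiserver -o jsonpath='{.status.versions[?(@.name=="operator")].version}'
4.7.2
$ oc get clusteroperator kube-apiserver -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
True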

- 10-29 - Kubernetes operators (master team)
- 30-39 - Machine API
- 50-59 - Operator-lifecycle manager
- 60-69 - OpenShift core operators (master team)
Member:

This list is stale vs. the one in upgrades.md. Can we drop it?

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci bot added the lifecycle/stale label on Jun 14, 2021.
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label on Jul 14, 2021.
@deads2k removed the lifecycle/rotten label on Jul 23, 2021.
@LalatenduMohanty (Member) left a comment:

/lgtm

@openshift-ci bot added the lgtm label on Jul 23, 2021.
@sdodson (Member) commented Jul 26, 2021

/approve
Because it makes sense and this was agreed upon.
/lgtm cancel
Because it doesn't pass the linter; needs an update to pass the linter.

@openshift-ci bot removed the lgtm label on Jul 26, 2021.
@openshift-ci bot commented Jul 26, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: LalatenduMohanty, sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label on Jul 26, 2021.
@deads2k (Contributor, author) commented Jul 26, 2021

Serve the machine! The linter is now happy; reapplying lgtm.

@deads2k added the lgtm label on Jul 26, 2021.
@openshift-merge-robot merged commit f978768 into openshift:master on Jul 26, 2021.
LalatenduMohanty added a commit to LalatenduMohanty/cluster-version-operator that referenced this pull request on Jul 26, 2021:
As part of openshift#631 the CVO docs have been removed, as the docs were moved to the
enhancements repository [1]

[1] openshift/enhancements#695

Signed-off-by: Lalatendu Mohanty <lmohanty@redhat.com>
wking added a commit to wking/openshift-enhancements that referenced this pull request on Sep 22, 2022:
…oesn't create

The "resources to create" wording was presumably a typo back in
openshift/cluster-version-operator@a2ded12726 (Update readme to
include fast fill logic, 2020-04-27,
openshift/cluster-version-operator#363), and we pulled it over here
with f8a9ee5 (straight move of cluster-version-operator docs
folder, 2021-07-23, openshift#695).