-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move CVO upgrades doc to enhancements #695
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -0,0 +1,83 @@ | |||
# Upgrades and order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this go under the existing enhancements/update/
directory?
@@ -0,0 +1,83 @@ | |||
# Upgrades and order | |||
|
|||
In the CVO, upgrades are performed in the order described in the payload (lexographic), with operators only running in parallel if they share the same file prefix `0000_NN_` and differ by the next chunk `OPERATOR`, i.e. `0000_70_authentication-operator_*` and `0000_70_image-registry-operator_*` are run in parallel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are parallel graph-node entries (it might be worth bringing over this graph to clarify graph nodes). That means they could run in parallel, but they might not; it depends on how many workers are available.
Also, let's use some modern examples, like:
$ oc adm release extract --to=manifests quay.io/openshift-release-dev/ocp-release:4.7.2-x86_64
Extracted release payload from digest sha256:83fca12e93240b503f88ec192be5ff0d6dfe750f81e8b5ef71af991337d7c584 created at 2021-03-10T11:32:57Z
$ ls manifests/0000_70_* | cat
manifests/0000_70_cluster-network-operator_00_namespace.yaml
...
manifests/0000_70_cluster-network-operator_05_clusteroperator.yaml
manifests/0000_70_console-operator.crd.yaml
manifests/0000_70_dns-operator_00-cluster-role.yaml
...
manifests/0000_70_dns-operator_03-cluster-operator.yaml
So:
..., i.e.
0000_70_cluster-network-operator_*
and0000_70_dns-operator_*
are parallel graph nodes, and may be run in parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are parallel graph-node entries (it might be worth bringing over this graph to clarify graph nodes). That means they could run in parallel, but they might not; it depends on how many workers are available.
How much of that doc should move over? Read through, it actually answers question I've had.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if the expectation is covering the API around operator-provided manifests, we should include the 0000 defaulting stuff from oc
:
Operators are expected to host the config they need to be installed to a cluster in the
/manifests
directory in their image. This command iterates over a set of operator images and extracts those manifests into a single, ordered list of Kubernetes objects that can then be iteratively updated on a cluster by the cluster version operator when it is time to perform an update. Manifest files are renamed to0000_70_<image_name>_<filename>
by default, and an operator author that needs to provide a global-ordered file (before or after other operators) should prepend0000_NN_<component>_
to their filename, which instructs the release builder to not assign a component prefix. Only images in the input that have the image labelio.openshift.release.operator=true
will have manifests loaded.
Because folks who don't care about their order in the payload can reduce some mental overhead by leaving 0000_NN_<component>_
off and relying on oc
to inject it for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much of that doc should move over? Read through, it actually answers question I've had.
That doc is aimed at users trying to understand how the CVO works. That's distinct from, but not completely orthogonal to, the stuff that operator maintainers need to understand to set up their manifests. Not clear to me whether you want to attempt a single doc aimed at both, or you're scoping this to operator maintainers, or scoping this to just admins, or what.
|
||
kube-apiserver | ||
|
||
kcm/ksched |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an enhancement-level doc, I think it's probably worth dropping the code block here and making this whole section an enumerated list with English entries:
- config-operator
- Kubernetes API-server
- Kubernetes controller manager and scheduler
...
I'd rather move the collateral we already have an tidy from there (unless the information is wrong). I'm happy to move whatever other CVO docs make sense to start though. https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md also looks prime for moving to me. |
|
||
During upgrade we bias towards predictable ordering for operators that lack sophistication about detecting their prerequisites. Over time, operators should be better at detecting their prerequisites without overcomplicating or risking the predictability of upgrades. | ||
|
||
Currently, upgrades proceed in operator order without distinguishing between node and control plane components. Future improvements may allow nodes to upgrade independently and at different schedules to reduce production impact. This in turn complicates the investment operator teams must make in testing and understanding how to version their control plane components independently of node infrastructure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth including this blurb:
Ordering is only applied during upgrades, where some components rely on another component being updated first. As a convenience, the CVO guarantees that components at an earlier run level will be created or updated before your component is invoked. Note however that components without
ClusterOperator
objects defined may not be fully deployed when your component is executed, so always ensure your prerequisites know that they must correctly obey theClusterOperator
protocol to be available. More sophisticated components should observe the prerequisiteClusterOperator
s directly and use theversions
field to enforce safety.
What about the Kind
hints? The extension restrictions? The create-only
annotation? The `image-references docs? Probably also worth linking the profile enhancement, because the CVO will ignore manifests that don't declare themselves to be members of the current profile.
- 10-29 - Kubernetes operators (master team) | ||
- 30-39 - Machine API | ||
- 50-59 - Operator-lifecycle manager | ||
- 60-69 - OpenShift core operators (master team) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This list is stale vs. the one in upgrades.md
. Can we drop it?
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: LalatenduMohanty, sdodson The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
serve the machine! linter is now happy, reapplying lgtm |
As part of openshift#631 CVO docs has been removed as the docs are moved to enhancements repository [1] [1] openshift/enhancements#695 Signed-off-by: Lalatendu Mohanty <lmohanty@redhat.com>
…oesn't create The "resources to create" wording was presumably a typo back in openshift/cluster-version-operator@a2ded12726 (Update readme to include fast fill logic, 2020-04-27, openshift/cluster-version-operator#363), and we pulled it over here with f8a9ee5 (straight move of cluster-version-operator docs folder, 2021-07-23, openshift#695).
Moves CVO docs to enhancements. The tombstone in the original location is here: openshift/cluster-version-operator#631 . These docs cover integration with other components, what those components need to do to participate, what impact they have on each other, and how a cluster upgrades.
See https://github.com/openshift/enhancements/#is-my-thing-an-enhancement for criteria.