Docs: add architecture overview, remove outdated HACKING guide. #1078
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: squeed The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Do you plan to update the HACKING guide in a separate PR? It seems that an updated version would be useful to folks making changes for CNO, agree?
@vpickard most of the things in HACKING are now in the wiki, and are more correct. So I'd rather keep those sorts of things in a non-CI-gated wiki.

I was looking for the same, but Casey's right: we have a run-locally reference in the wiki at https://github.com/openshift/cluster-network-operator/wiki/Running-a-local-cluster-network-operator-for-plugin-development#run-hackrun-locallysh-to-start-a-cluster-with-your-custom-image (which was probably one of the most useful things there)

/hold

@squeed can you add a section about debugging and troubleshooting the CNO? Also, can you expand on what it will take to extend the CNO with a new CRD, i.e. the expected workflow?
4. **Bootstrap** - gather existing cluster state, and create any non-Kubernetes resources (e.g. OpenStack objects)
5. **Render** - process template files in `/bindata` and generate Kubernetes objects
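The render step amounts to filling manifest templates with per-cluster values. A minimal sketch of that idea (the template text and data names here are illustrative, not the real CNO templates under `/bindata`):

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Illustrative manifest template standing in for a file under /bindata.
const manifestTmpl = `apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: {{.Name}}
  namespace: {{.Namespace}}
`

// renderManifest fills the template with per-cluster values, loosely
// mirroring what a render step does before objects are applied.
func renderManifest(data map[string]string) (string, error) {
	t, err := template.New("manifest").Parse(manifestTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderManifest(map[string]string{"Name": "sdn", "Namespace": "openshift-sdn"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The rendered text would then be decoded into unstructured Kubernetes objects and applied; this sketch stops at the templating step.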
Maybe it is worth adding a subsection for the upgrade logic (and the dual-stack conversion I've added), which makes use of these two steps to retain some of the changes?
docs/architecture.md
## CNO as SLO

CNO is a so-called second-level-operator (SLO), which means it is installed by the Cluster Version Operator (CVO). Owing to it's critical position in the installation flow, it is installed quite early. However, no other operators wait for the CNO -- their pods just have to wait for the network to come up.
"its"
(also, line-wrap?)
> it is installed quite early

IIRC, the CVO originally installed things in a particular order at install time, but it no longer does. (It does still do ordering during upgrades.) It's just that most operators don't tolerate `node.kubernetes.io/not-ready` and so won't be scheduled until after the SDN has come up.
This is perhaps worth clarifying some more:

- CNO tolerates `node-role.kubernetes.io/master`, `node.kubernetes.io/not-ready`, and `node.kubernetes.io/network-unavailable`, and is `hostNetwork: true`, to ensure that it can be started before the workers are created and before there is any SDN.
- CNO deploys the network plugin, which has similar tolerations.
- CNO also deploys some other operands (link to operands.md) which will not be able to come up yet, because they don't have the same tolerations, or because they depend on other operators that haven't started yet.
- Once the network plugin starts up on each node, the node untaints itself and other less-tolerant/non-host-network second-level operators become able to run there.
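As a sketch of how those tolerations look in practice (illustrative fragment, not the exact CNO deployment manifest), the pod spec would carry something like:

```yaml
spec:
  hostNetwork: true
  tolerations:
  # Tolerating with operator: Exists matches the taint regardless of effect.
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
  - key: "node.kubernetes.io/network-unavailable"
    operator: "Exists"
```

Operands without these tolerations stay unschedulable until the network plugin removes the relevant taints from each node.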
Updated based on feedback - thanks @danwinship and @aojea. All that's left is @aojea's suggestion to talk about upgrade & migration logic.
/lgtm
/hold
feel free to fix or merge
The CVO has a notion of [run levels](https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#how-do-i-get-added-as-a-special-run-level), which dictate the order in which components are **upgraded**. Presently, the CNO (and thus its operands) are runlevel 07, which is comparatively early. At
Perhaps worth clarifying that MCO updates very late, and thus during an upgrade the new networking components will initially be running against an N-1 RHCOS and in particular an N-1 OVS.
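For reference, the linked CVO doc describes run levels as being encoded in manifest filename prefixes of the form `0000_<runlevel>_<component>_<name>.yaml`. A small sketch of extracting the run level from such a name (a hypothetical helper, assuming that naming convention):

```go
package main

import (
	"fmt"
	"strings"
)

// runLevel extracts the run-level component from a CVO manifest filename
// of the form 0000_<runlevel>_<component>_<name>.yaml.
// It returns "" if the name doesn't follow the convention.
func runLevel(filename string) string {
	parts := strings.SplitN(filename, "_", 4)
	if len(parts) < 4 || parts[0] != "0000" {
		return ""
	}
	return parts[1]
}

func main() {
	// prints "07": the CNO's comparatively early run level.
	fmt.Println(runLevel("0000_07_cluster-network-operator_00_namespace.yaml"))
}
```

Because the MCO runs at a much later level, the new networking components from an upgrade initially run against the previous (N-1) RHCOS, and in particular an N-1 OVS, as noted above.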
Added something here.
@danwinship updated based on your suggestions (I assume I can't self-lgtm).
@squeed: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danwinship, squeed The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/hold cancel

Manually adding valid-bug, since this is a doc change.
@squeed: /override requires a failed status context or a job name to operate on.
Only the following contexts were expected:
In response to this:
/override ci/prow/e2e-aws-ovn-windows |
@squeed: Overrode contexts on behalf of squeed: ci/prow/e2e-aws-ovn-windows, ci/prow/e2e-gcp, ci/prow/e2e-gcp-ovn, ci/prow/e2e-metal-ipi-ovn-ipv6 In response to this:
/override ci/prow/verify |
@squeed: Overrode contexts on behalf of squeed: ci/prow/e2e-agnostic-upgrade, ci/prow/e2e-aws-sdn-multi, ci/prow/images, ci/prow/unit, ci/prow/verify In response to this:
Adds an architecture overview. This isn't a detailed reference; rather, it's an overview of the intentions and structure of the code.
/cc @danwinship
/cc @rcarrillocruz
/cc @vpickard
/cc @dcbw