Docs: add architecture overview, remove outdated HACKING guide. #1078

Merged (1 commit) on Aug 3, 2021

Contributor

@squeed squeed commented Apr 28, 2021

Adds an architecture overview. This isn't a detailed reference, rather an overview of the intentions and structure of the code.

/cc @danwinship
/cc @rcarrillocruz
/cc @vpickard
/cc @dcbw

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: squeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 28, 2021

@vpickard vpickard left a comment

Do you plan to update the HACKING guide in a separate PR? It seems that an updated version would be useful to folks making changes for CNO, agree?

@squeed
Contributor Author

squeed commented Apr 28, 2021

@vpickard most of the things in HACKING are now in the wiki, and are more correct. So I'd rather keep those sorts of things in a non-ci-gated wiki.

@dougbtv
Member

dougbtv commented Apr 28, 2021

I was looking for the same, but Casey's right, we have a run-locally reference in the wiki @ https://github.com/openshift/cluster-network-operator/wiki/Running-a-local-cluster-network-operator-for-plugin-development#run-hackrun-locallysh-to-start-a-cluster-with-your-custom-image (which was probably one of the most useful things there)

@squeed
Contributor Author

squeed commented May 3, 2021

/hold
Just to stop the retests.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 3, 2021
@squeed squeed removed the request for review from rcarrillocruz May 3, 2021 13:58
@msherif1234
Contributor

@squeed can you add a section about debugging and troubleshooting CNO? Also, can you expand on what it would take to extend CNO with a new CRD, i.e. what the workflow looks like?

Comment on lines +110 to +135
4. **Bootstrap** - gather existing cluster state, and create any non-Kubernetes resources (i.e. OpenStack objects)
5. **Render** - process template files in `/bindata` and generate Kubernetes objects
Contributor

Maybe it is worth adding a subsection on the upgrade logic (and the dual-stack conversion I've added)?
Both make use of these 2 steps to retain one of the changes.
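
For readers unfamiliar with the **Render** step quoted above, here is a minimal sketch of the idea, assuming the template files use Go `text/template` syntax. The function name `renderManifest`, the template contents, and the data keys are hypothetical and are not taken from the CNO code.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// daemonSetTmpl is a hypothetical manifest template, standing in for a file
// that would live under /bindata.
const daemonSetTmpl = `apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: {{.Name}}
  namespace: {{.Namespace}}
`

// renderManifest fills a manifest template with the given data.
// It fails if the template refers to a key that is missing from data.
func renderManifest(tmpl string, data map[string]string) (string, error) {
	t, err := template.New("manifest").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Values that a bootstrap step might have gathered (illustrative only).
	data := map[string]string{
		"Name":      "example-plugin",
		"Namespace": "example-namespace",
	}
	out, err := renderManifest(daemonSetTmpl, data)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```

Failing on a missing key is a design choice in this sketch: it surfaces a gap between gathered state and templates at render time rather than in a half-filled object.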


## CNO as SLO

CNO is a so-called second-level-operator (SLO), which means it is installed by the Cluster Version Operator (CVO). Owing to it's critical position in the installation flow, it is installed quite early. However, no other operators wait for the CNO -- their pods just have to wait for the network to come up.
Contributor

"its"
(also, line-wrap?)

Contributor

> it is installed quite early

IIRC, the CVO originally installed things in a particular order at install time, but it no longer does. (It does still do ordering during upgrades.) It's just that most operators don't tolerate node.kubernetes.io/not-ready and so won't be scheduled until after the SDN has come up.

Contributor

This is perhaps worth clarifying some more:

  • CNO tolerates node-role.kubernetes.io/master, node.kubernetes.io/not-ready, and node.kubernetes.io/network-unavailable, and is hostNetwork: true, to ensure that it can be started before the workers are created and before there is any SDN.
  • CNO deploys the network plugin, which has similar tolerations
    • CNO also deploys some other operands (link to operands.md) which will not be able to come up yet, because they don't have the same tolerations, or because they depend on other operators that haven't started yet.
  • Once the network plugin starts up on each node, the node untaints itself and other less-tolerant/non-host-network second-level operators become able to run there.
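
The scheduling behaviour described in the bullets above can be sketched roughly as follows. This is a simplified model, not the Kubernetes scheduler: the `Taint`, `Toleration`, and `Pod` types are hypothetical stand-ins that match on taint key only, and the set of taints on an early node is an assumption for illustration.

```go
package main

import "fmt"

// Taint and Toleration are simplified stand-ins for the Kubernetes API types.
// Only the key is modelled (no operator, value, or effect).
type Taint struct{ Key string }
type Toleration struct{ Key string }

// Pod is a hypothetical, minimal description of a workload.
type Pod struct {
	Name        string
	HostNetwork bool
	Tolerations []Toleration
}

// schedulable reports whether the pod tolerates every taint on the node.
func schedulable(p Pod, taints []Taint) bool {
	for _, t := range taints {
		tolerated := false
		for _, tol := range p.Tolerations {
			if tol.Key == t.Key {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	// Taints a control-plane node might carry before any network plugin runs (assumed).
	earlyNode := []Taint{
		{Key: "node-role.kubernetes.io/master"},
		{Key: "node.kubernetes.io/not-ready"},
		{Key: "node.kubernetes.io/network-unavailable"},
	}
	// The same node after the network plugin has started and the node has untainted itself.
	readyNode := []Taint{{Key: "node-role.kubernetes.io/master"}}

	cno := Pod{
		Name:        "cno",
		HostNetwork: true,
		Tolerations: []Toleration{
			{Key: "node-role.kubernetes.io/master"},
			{Key: "node.kubernetes.io/not-ready"},
			{Key: "node.kubernetes.io/network-unavailable"},
		},
	}
	// A less tolerant operator (hypothetical).
	other := Pod{
		Name:        "some-other-operator",
		Tolerations: []Toleration{{Key: "node-role.kubernetes.io/master"}},
	}

	for _, p := range []Pod{cno, other} {
		fmt.Printf("%s: early=%v ready=%v\n",
			p.Name, schedulable(p, earlyNode), schedulable(p, readyNode))
	}
}
```

In this model the less tolerant pod only becomes schedulable once the not-ready and network-unavailable taints are gone, which mirrors the ordering described in the last bullet.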

@squeed
Contributor Author

squeed commented May 10, 2021

Updated based on feedback - thanks @danwinship and @aojea. All that's left is @aojea's suggestion to talk about upgrade & migration logic.

Contributor

@danwinship danwinship left a comment

/lgtm
/hold
feel free to fix or merge

The CVO has a notion of
[run levels](https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#how-do-i-get-added-as-a-special-run-level),
which dictate the order in which components are **upgraded**. Presently, the CNO
(and thus its operands) are runlevel 07, which is comparatively early. At
Contributor

Perhaps worth clarifying that MCO updates very late, and thus during an upgrade the new networking components will initially be running against an N-1 RHCOS and in particular an N-1 OVS.

Contributor Author

added something for here.
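
To illustrate the ordering point in this thread, here is a small sketch that sorts components by run level. The `Component` type and `upgradeOrder` function are hypothetical, and the run level given to the MCO is an assumed placeholder for "very late"; only the CNO's run level 07 comes from the text quoted above.

```go
package main

import (
	"fmt"
	"sort"
)

// Component pairs an operator name with its run level.
type Component struct {
	Name     string
	RunLevel int
}

// upgradeOrder returns component names sorted by ascending run level,
// keeping the input order for components that share a level.
func upgradeOrder(components []Component) []string {
	sorted := make([]Component, len(components))
	copy(sorted, components)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].RunLevel < sorted[j].RunLevel
	})
	names := make([]string, 0, len(sorted))
	for _, c := range sorted {
		names = append(names, c.Name)
	}
	return names
}

func main() {
	components := []Component{
		// Assumed placeholder value standing in for "updates very late".
		{Name: "machine-config-operator", RunLevel: 80},
		// Run level 07, as stated in the quoted documentation text.
		{Name: "cluster-network-operator", RunLevel: 7},
	}
	for i, name := range upgradeOrder(components) {
		fmt.Printf("%d. %s\n", i+1, name)
	}
}
```

Because the network operator sorts ahead of the MCO here, its new operands would start while nodes still run the previous RHCOS and OVS, which is the situation described in the comment above.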

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Jul 28, 2021
@squeed
Contributor Author

squeed commented Jul 28, 2021

@danwinship updated based on your suggestions (I assume I can't self-lgtm).

@openshift-ci
Contributor

openshift-ci bot commented Jul 28, 2021

@squeed: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-ovn-hybrid-step-registry | 9a685e2 | link | `/test e2e-ovn-hybrid-step-registry` |
| ci/prow/e2e-ovn-ipsec-step-registry | 9a685e2 | link | `/test e2e-ovn-ipsec-step-registry` |
| ci/prow/e2e-azure-ovn | 9a685e2 | link | `/test e2e-azure-ovn` |
| ci/prow/e2e-openstack-ovn | 9a685e2 | link | `/test e2e-openstack-ovn` |
| ci/prow/e2e-gcp-ovn-upgrade | 9a685e2 | link | `/test e2e-gcp-ovn-upgrade` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@danwinship
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 30, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, squeed

@squeed
Contributor Author

squeed commented Aug 3, 2021

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 3, 2021
@squeed squeed added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Aug 3, 2021
@squeed
Contributor Author

squeed commented Aug 3, 2021

Manually adding valid-bug, since this is a doc change.

@openshift-ci
Contributor

openshift-ci bot commented Aug 3, 2021

@squeed: /override requires a failed status context or a job name to operate on.
The following unknown contexts were given:

  • ci/prow/

Only the following contexts were expected:

  • ci/prow/e2e-agnostic-upgrade
  • ci/prow/e2e-aws-ovn-windows
  • ci/prow/e2e-aws-sdn-multi
  • ci/prow/e2e-azure-ovn
  • ci/prow/e2e-gcp
  • ci/prow/e2e-gcp-ovn
  • ci/prow/e2e-gcp-ovn-upgrade
  • ci/prow/e2e-metal-ipi-ovn-ipv6
  • ci/prow/e2e-metal-ipi-ovn-ipv6-ipsec
  • ci/prow/e2e-openstack-ovn
  • ci/prow/e2e-ovn-hybrid-step-registry
  • ci/prow/e2e-ovn-ipsec-step-registry
  • ci/prow/e2e-ovn-step-registry
  • ci/prow/e2e-vsphere-ovn
  • ci/prow/e2e-vsphere-windows
  • ci/prow/images
  • ci/prow/unit
  • ci/prow/verify
  • pull-ci-openshift-cluster-network-operator-release-4.1-images
  • pull-ci-openshift-cluster-network-operator-release-4.1-unit
  • pull-ci-openshift-cluster-network-operator-release-4.1-verify
  • pull-ci-openshift-cluster-network-operator-release-4.3-e2e-agnostic-upgrade
  • pull-ci-openshift-cluster-network-operator-release-4.3-e2e-gcp
  • pull-ci-openshift-cluster-network-operator-release-4.4-e2e-gcp-ovn
  • pull-ci-openshift-cluster-network-operator-release-4.5-e2e-aws-sdn-multi
  • pull-ci-openshift-cluster-network-operator-release-4.5-e2e-gcp-ovn-upgrade
  • pull-ci-openshift-cluster-network-operator-release-4.5-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-cluster-network-operator-release-4.5-e2e-ovn-hybrid-step-registry
  • pull-ci-openshift-cluster-network-operator-release-4.5-e2e-ovn-step-registry
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-aws-ovn-windows
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-azure-ovn
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-openstack-ovn
  • pull-ci-openshift-cluster-network-operator-release-4.7-e2e-ovn-ipsec-step-registry
  • pull-ci-openshift-cluster-network-operator-release-4.7-e2e-vsphere-ovn
  • pull-ci-openshift-cluster-network-operator-release-4.7-e2e-vsphere-windows
  • pull-ci-openshift-cluster-network-operator-release-4.8-e2e-metal-ipi-ovn-ipv6-ipsec
  • tide

In response to this:

/override ci/prow/e2e-aws-ovn-windows
/override ci/prow/
/override ci/prow/
/override ci/prow/
/override ci/prow/

@squeed
Contributor Author

squeed commented Aug 3, 2021

/override ci/prow/e2e-aws-ovn-windows
/override ci/prow/e2e-gcp
/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-metal-ipi-ovn-ipv6

@openshift-ci
Contributor

openshift-ci bot commented Aug 3, 2021

@squeed: Overrode contexts on behalf of squeed: ci/prow/e2e-aws-ovn-windows, ci/prow/e2e-gcp, ci/prow/e2e-gcp-ovn, ci/prow/e2e-metal-ipi-ovn-ipv6

In response to this:

/override ci/prow/e2e-aws-ovn-windows
/override ci/prow/e2e-gcp
/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-metal-ipi-ovn-ipv6

@squeed
Contributor Author

squeed commented Aug 3, 2021

/override ci/prow/verify
/override ci/prow/unit
/override ci/prow/images
/override ci/prow/e2e-aws-sdn-multi
/override ci/prow/e2e-agnostic-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Aug 3, 2021

@squeed: Overrode contexts on behalf of squeed: ci/prow/e2e-agnostic-upgrade, ci/prow/e2e-aws-sdn-multi, ci/prow/images, ci/prow/unit, ci/prow/verify

In response to this:

/override ci/prow/verify
/override ci/prow/unit
/override ci/prow/images
/override ci/prow/e2e-aws-sdn-multi
/override ci/prow/e2e-agnostic-upgrade

@openshift-ci openshift-ci bot merged commit fcdac34 into openshift:master Aug 3, 2021
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.

7 participants