
use KIND multinode cluster, add IPv6 support, fix multus webhook problem #663

Merged
merged 1 commit into openshift:master on Jul 17, 2020

Conversation

@aojea (Contributor) commented Jun 7, 2020

For networking development and testing, single-node
environments mask a big part of the problems.

We should use multinode by default if possible.

Other improvements:

  • the apiserver autodiscovers the API address, so we don't
    need to pass the kubeconfig

  • fix multus webhook problem

  • bump KIND version to 0.8.1

  • allow configuring IPv6 environments

  • use kubectl wait instead of bash loops (see the sketch below)
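
A minimal sketch of that last substitution (the namespace and timeout are illustrative assumptions, not values from this PR):

```sh
# Before: hand-rolled polling
# until kubectl -n openshift-ovn-kubernetes get pods | grep -q Running; do sleep 5; done

# After: kubectl blocks until every pod reports Ready (or the timeout expires)
kubectl -n openshift-ovn-kubernetes wait pods --all \
  --for=condition=Ready --timeout=300s
```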

@aojea aojea force-pushed the kind branch 2 times, most recently from a08d8be to 0a06493 on June 7, 2020 22:43
@aojea (Contributor Author) commented Jun 8, 2020

/assign @trozet @dcbw @danwinship

Please bear in mind that my knowledge of the cluster-network-operator and openshift is minimal, so maybe some of the changes I added to the doc/script do not make sense in that context.
I'm especially confused about the apiserver IP: the apiserver "autodiscovers" it, but the code mentions a bootstrap problem that does not seem to happen with KIND. Is this an openshift-specific issue?

@aojea (Contributor Author) commented Jun 8, 2020

/retest

@squeed (Contributor) commented Jun 8, 2020

Ah, so you're deploying a vanilla kubernetes / KIND cluster and running the CNO on it? Neat. But tricky.

It mostly works, because the CNO needs to deploy the core networking functionality before the openshift-specific components of the cluster are running. This means we need to have a two-phase rollout in place - first the deployable set of components, then wait, then deploy the rest. In reality we're not that clever - instead, we just step over failures and set our status to Deploying until it all resolves.

So you're seeing the consequences of that decision: CNO + kind works enough to get the network up, but can't ever run to completion. It wants things like monitoring and the service-serving-cert.

So, is it worth implementing enough of the openshift functionality so that we can continue? I'm not sure; it's a lot of work. Probably better to just say "it will give you basic functionality, but don't expect everything".
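
In pseudo-shell terms, the behavior described above looks roughly like this (a hypothetical sketch, not the CNO's actual Go reconcile loop):

```sh
# Keep re-applying everything, tolerating failures from components whose
# openshift-specific dependencies (monitoring, service-serving-cert, ...)
# don't exist yet; report Deploying until a full pass finally succeeds.
until kubectl apply -f manifests/; do
  echo "status: Deploying"
  sleep 10
done
echo "status: Available"
```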

@danwinship (Contributor)

> Ah, so you're deploying a vanilla kubernetes / KIND cluster and running the CNO on it? Neat. But tricky.

Note that this already exists; @trozet did the initial implementation a while back. This is just an update to what's already there.

@danwinship (Contributor)

> I'm especially confused about the apiserver IP: the apiserver "autodiscovers" it, but the code mentions a bootstrap problem that does not seem to happen with KIND. Is this an openshift-specific issue?

The issue is that when you start CNO in a pod, kubelet will tell it KUBERNETES_SERVICE_HOST=172.30.0.1 (or whatever the clusterIP of kubernetes.default is), but that doesn't actually work because there's no kube-proxy yet. We need to tell CNO what the actual direct apiserver IP is. In OCP there is a manually-maintained cloud loadbalancer IP that points back to the 3 masters, but here since there's only a single master you can just pass the IP of that node.
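
Concretely, something along these lines (a sketch; `kind-control-plane` is KIND's default control-plane container name, and overriding the env vars this way is an assumption, not code from this PR):

```sh
# Resolve the direct IP of the single KIND control-plane node
APISERVER_IP=$(docker inspect -f \
  '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' kind-control-plane)

# Point the CNO at the apiserver directly, bypassing the kubernetes.default
# clusterIP that is unreachable until kube-proxy (or its replacement) runs
export KUBERNETES_SERVICE_HOST="${APISERVER_IP}"
export KUBERNETES_SERVICE_PORT=6443
```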

@aojea (Contributor Author) commented Jun 8, 2020

/hold

need to fix this #663 (comment)

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2020
@aojea aojea force-pushed the kind branch 2 times, most recently from aafd68a to dde38c3 on June 8, 2020 15:55
@aojea aojea changed the title from "use KIND multinode cluster and other improvements" to "use KIND multinode cluster and other KIND improvements" on Jun 8, 2020
@aojea (Contributor Author) commented Jun 8, 2020

/hold cancel

It works for IPv4; tested by running e2e tests against the KIND cluster.
PTAL
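
For reference, running the upstream e2e suite against a KIND cluster looks roughly like this (a hedged sketch using standard kubernetes e2e.test flags; the exact invocation is not part of this PR):

```sh
# kind >= 0.8 merges the cluster credentials into the default kubeconfig
export KUBECONFIG="${HOME}/.kube/config"

# run the networking-focused subset of the upstream tests
./e2e.test --provider=skeleton --ginkgo.focus='\[sig-network\]'
```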

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2020
@aojea (Contributor Author) commented Jun 8, 2020

and it works with IPv6-only clusters:
kubectl get services -A
NAMESPACE                  NAME                          TYPE        CLUSTER-IP         EXTERNAL-IP   PORT(S)                  AGE
default                    kubernetes                    ClusterIP   fd00:10:96::1      <none>        443/TCP                  27m
kube-system                kube-dns                      ClusterIP   fd00:10:96::a      <none>        53/UDP,53/TCP,9153/TCP   26m
openshift-multus           multus-admission-controller   ClusterIP   fd00:10:96::c166   <none>        443/TCP,8443/TCP         18m
openshift-multus           network-metrics-service       ClusterIP   None               <none>        8443/TCP                 18m
openshift-ovn-kubernetes   ovn-kubernetes-master         ClusterIP   None               <none>        9102/TCP                 18m
openshift-ovn-kubernetes   ovn-kubernetes-node           ClusterIP   None               <none>        9103/TCP                 18m
openshift-ovn-kubernetes   ovnkube-db                    ClusterIP   None               <none>        9641/TCP,9642/TCP        18m
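
For anyone reproducing this, an IPv6-only KIND cluster can be created with a config along these lines (standard KIND v1alpha4 fields; whether the PR's script uses exactly this shape is an assumption):

```sh
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: ipv6
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
```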

@aojea (Contributor Author) commented Jun 8, 2020

/retest

@aojea (Contributor Author) commented Jun 9, 2020

/retest

@aojea aojea changed the title from "use KIND multinode cluster and other KIND improvements" to "use KIND multinode cluster, add IPv6 support, fix multus webhook problem" on Jun 9, 2020
@aojea (Contributor Author) commented Jun 9, 2020

/retest

@nerdalert (Contributor) left a comment


I was able to get things up and running with BUILD_OVN=true BUILD_CNO=true ./ovn-kind-cno.sh.

I ran into issues when not building CNO with the options BUILD_OVN=true ./ovn-kind-cno.sh (also had to add the $CNO_POD back in to get this option to run):

Creating "cluster-config-v1" configMap with 1 master nodes
configmap/cluster-config-v1 created
Creating OVN CNO config
network.config.openshift.io/cluster created
Sym-linking cni dirs for node 9ef04a9e4197
Sym-linking cni dirs for node 24402e252cbf
Sym-linking cni dirs for node 93a200121f79

pod/ovs-node-dgsbq condition met
pod/ovs-node-jgdrx condition met
pod/ovs-node-nvnks condition met
timed out waiting for the condition on pods/ovnkube-master-8khjb
timed out waiting for the condition on pods/ovnkube-node-h89h5
timed out waiting for the condition on pods/ovnkube-node-krhdv
timed out waiting for the condition on pods/ovnkube-node-xtblf
OVN-k8s pods are not running

Logs at https://gist.github.com/nerdalert/451896250569e0852e28a907a35198d7
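
For anyone hitting the same timeouts, the standard next steps would be along these lines (generic kubectl commands, using the pod names from the log above; the namespace is inferred from the services listed earlier in this PR):

```sh
kubectl -n openshift-ovn-kubernetes get pods -o wide
kubectl -n openshift-ovn-kubernetes describe pod ovnkube-master-8khjb
kubectl -n openshift-ovn-kubernetes logs ovnkube-master-8khjb --all-containers --tail=50
```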

Thanks for this. I'm going to poke around on getting hybrid overlay enabled. Ty!

@aojea (Contributor Author) commented Jun 9, 2020

@nerdalert if you are working on this, feel free to reuse the parts of this PR that are still valid and send everything consolidated in one PR.

for networking development and testing, single-node
environments hide a big part of the problems.

We should use multinode by default if possible.

Other improvements:

* the apiserver autodiscovers the API address, so we don't
  need to pass the kubeconfig

* fix multus webhook problem

* bump KIND version to 0.8.1

* allow running IPv6-only clusters

* use kubectl wait instead of bash loops

Signed-off-by: Antonio Ojea <aojea@redhat.com>
@nerdalert (Contributor)


Still stumped on why BUILD_CNO=true BUILD_OVN=true ./ovn-kind-cno.sh builds OK but BUILD_OVN=true ./ovn-kind-cno.sh does not. I went through and ran the same build with master and was able to reproduce the error there as well, so it's not related to this PR. Everything looks like it's working with regard to ovn-kubernetes when I pass the build ENVs.

LGTM, with the caveat that I don't know enough about Multus to say whether it's good to go there or not. Ty!

@aojea (Contributor Author) commented Jun 11, 2020

/retest
PTAL

@aojea (Contributor Author) commented Jun 14, 2020

> Still stumped on why BUILD_CNO=true BUILD_OVN=true ./ovn-kind-cno.sh builds OK but BUILD_OVN=true ./ovn-kind-cno.sh does not. I went through and ran the same build with master and was able to reproduce the error there as well, so it's not related to this PR. Everything looks like it's working with regard to ovn-kubernetes when I pass the build ENVs.

Same here, but if the images are different, maybe the one that is used when the image is not built locally is outdated:

    Image:         origin-ovn-kubernetes:dev
    Image ID:      sha256:6e08e12de795a32d90a0a4ddcd78ce45383232ff28e0e7762d89e8d87f02884f
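
If a stale image is indeed the culprit, loading the locally built image into the KIND nodes would rule that out (the tag is the one shown above; whether the script already does this is an assumption):

```sh
kind load docker-image origin-ovn-kubernetes:dev
```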

> LGTM, with the caveat that I don't know enough about Multus to say whether it's good to go there or not. Ty!

great
/retest

@aojea (Contributor Author) commented Jun 15, 2020

/retest

@aojea (Contributor Author) commented Jun 15, 2020

/test e2e-vsphere

@aojea (Contributor Author) commented Jun 16, 2020

this job ci/prow/e2e-vsphere — Job failed. It has NEVER succeeded 🙃
does it make sense to run it?

@juanluisvaladas (Contributor)

/lgtm
I've been using this branch for a while; it works great. e2e-vsphere is broken anyway and not required.
@danwinship maybe we can override it?

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 17, 2020
@danwinship (Contributor)

> e2e-vsphere is broken anyway and not required.
> @danwinship maybe we can override it?

if it's not required it doesn't need to be overridden

/lgtm

@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, danwinship, juanluisvaladas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 17, 2020
@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

8 similar comments from @openshift-bot followed.

@openshift-ci-robot (Contributor) commented Jul 17, 2020

@aojea: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-vsphere | 77b03a1 | link | /test e2e-vsphere |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit a70be1e into openshift:master Jul 17, 2020