
use KIND multinode cluster, add IPv6 support, fix multus webhook problem #663

Merged
merged 1 commit into openshift:master on Jul 17, 2020

Conversation

@aojea (Contributor) commented Jun 7, 2020

For networking development and testing, single-node
environments mask a big part of the problems.

We should use multinode by default if possible.

Other improvements:

  • the apiserver autodiscovers the API address, so we don't
    need to pass the kubeconfig

  • fix multus webhook problem

  • bump KIND version to 0.8.1

  • allow configuring IPv6 environments

  • use kubectl wait instead of bash loops (see the sketch below)
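
A minimal sketch of that last substitution (the namespace and timeout are illustrative assumptions, not values from this PR):

```sh
# Before: hand-rolled polling
# until kubectl -n openshift-ovn-kubernetes get pods | grep -q Running; do sleep 5; done

# After: kubectl blocks until every pod reports Ready (or the timeout expires)
kubectl -n openshift-ovn-kubernetes wait pods --all \
  --for=condition=Ready --timeout=300s
```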

@aojea aojea force-pushed the kind branch 2 times, most recently from a08d8be to 0a06493 on June 7, 2020 22:43
@aojea (Contributor Author) commented Jun 8, 2020

/assign @trozet @dcbw @danwinship

Please bear in mind that my knowledge of the cluster-network-operator and openshift is minimal, so maybe some of the changes I added to the doc/script do not make sense in that context.
I'm especially confused about the apiserver IP: the apiserver "autodiscovers" it, but the code mentions a bootstrap problem that does not seem to happen with KIND. Is this an openshift-specific issue?

@aojea (Contributor Author) commented Jun 8, 2020

/retest

@squeed (Contributor) commented Jun 8, 2020

Ah, so you're deploying a vanilla kubernetes / KIND cluster and running the CNO on it? Neat. But tricky.

It mostly works, because the CNO needs to deploy the core networking functionality before the openshift-specific components of the cluster are running. This means we need to have a two-phase rollout in place - first the deployable set of components, then wait, then deploy the rest. In reality we're not that clever - instead, we just step over failures and set our status to Deploying until it all resolves.

So you're seeing the consequences of that decision: CNO + kind works enough to get the network up, but can't ever run to completion. It wants things like monitoring and the service-serving-cert.

So, is it worth implementing enough of the openshift functionality so that we can continue? I'm not sure; it's a lot of work. Probably better to just say "it will give you basic functionality, but don't expect everything".
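
In pseudo-shell terms, the behavior described above looks roughly like this (a hypothetical sketch, not the CNO's actual Go reconcile loop):

```sh
# Keep re-applying everything, tolerating failures from components whose
# openshift-specific dependencies (monitoring, service-serving-cert, ...)
# don't exist yet; report Deploying until a full pass finally succeeds.
until kubectl apply -f manifests/; do
  echo "status: Deploying"
  sleep 10
done
echo "status: Available"
```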

@danwinship (Contributor)

> Ah, so you're deploying a vanilla kubernetes / KIND cluster and running the CNO on it? Neat. But tricky.

Note that this already exists; @trozet did the initial implementation a while back. This is just an update to what's already there.

@danwinship (Contributor)

> I'm especially confused about the apiserver IP: the apiserver "autodiscovers" it, but the code mentions a bootstrap problem that does not seem to happen with KIND. Is this an openshift-specific issue?

The issue is that when you start CNO in a pod, kubelet will tell it KUBERNETES_SERVICE_HOST=172.30.0.1 (or whatever the clusterIP of kubernetes.default is), but that doesn't actually work because there's no kube-proxy yet. We need to tell CNO what the actual direct apiserver IP is. In OCP there is a manually-maintained cloud loadbalancer IP that points back to the 3 masters, but here since there's only a single master you can just pass the IP of that node.
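
Concretely, something along these lines (a sketch; `kind-control-plane` is KIND's default control-plane container name, and overriding the env vars this way is an assumption, not code from this PR):

```sh
# Resolve the direct IP of the single KIND control-plane node
APISERVER_IP=$(docker inspect -f \
  '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' kind-control-plane)

# Point the CNO at the apiserver directly, bypassing the kubernetes.default
# clusterIP that is unreachable until kube-proxy (or its replacement) runs
export KUBERNETES_SERVICE_HOST="${APISERVER_IP}"
export KUBERNETES_SERVICE_PORT=6443
```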

@aojea (Contributor Author) commented Jun 8, 2020

/hold

need to fix this #663 (comment)

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2020
@aojea aojea force-pushed the kind branch 2 times, most recently from aafd68a to dde38c3 on June 8, 2020 15:55
@aojea aojea changed the title from "use KIND multinode cluster and other improvements" to "use KIND multinode cluster and other KIND improvements" on Jun 8, 2020
@aojea (Contributor Author) commented Jun 8, 2020

/hold cancel

It works for IPv4; tested by running e2e tests against the KIND cluster.
PTAL
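
For reference, running the upstream e2e suite against a KIND cluster looks roughly like this (a hedged sketch using standard kubernetes e2e.test flags; the exact invocation is not part of this PR):

```sh
# kind >= 0.8 merges the cluster credentials into the default kubeconfig
export KUBECONFIG="${HOME}/.kube/config"

# run the networking-focused subset of the upstream tests
./e2e.test --provider=skeleton --ginkgo.focus='\[sig-network\]'
```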

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2020
@aojea (Contributor Author) commented Jun 8, 2020

and it works with IPv6-only clusters:
kubectl get services -A
NAMESPACE                  NAME                          TYPE        CLUSTER-IP         EXTERNAL-IP   PORT(S)                  AGE
default                    kubernetes                    ClusterIP   fd00:10:96::1      <none>        443/TCP                  27m
kube-system                kube-dns                      ClusterIP   fd00:10:96::a      <none>        53/UDP,53/TCP,9153/TCP   26m
openshift-multus           multus-admission-controller   ClusterIP   fd00:10:96::c166   <none>        443/TCP,8443/TCP         18m
openshift-multus           network-metrics-service       ClusterIP   None               <none>        8443/TCP                 18m
openshift-ovn-kubernetes   ovn-kubernetes-master         ClusterIP   None               <none>        9102/TCP                 18m
openshift-ovn-kubernetes   ovn-kubernetes-node           ClusterIP   None               <none>        9103/TCP                 18m
openshift-ovn-kubernetes   ovnkube-db                    ClusterIP   None               <none>        9641/TCP,9642/TCP        18m
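
For anyone reproducing this, an IPv6-only KIND cluster can be created with a config along these lines (standard KIND v1alpha4 fields; whether the PR's script uses exactly this shape is an assumption):

```sh
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: ipv6
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
```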

@aojea (Contributor Author) commented Jun 8, 2020

/retest

@aojea (Contributor Author) commented Jun 9, 2020

/retest

@aojea aojea changed the title from "use KIND multinode cluster and other KIND improvements" to "use KIND multinode cluster, add IPv6 support, fix multus webhook problem" on Jun 9, 2020
@aojea (Contributor Author) commented Jun 9, 2020

/retest

@nerdalert (Contributor) left a comment


I was able to get things up and running with BUILD_OVN=true BUILD_CNO=true ./ovn-kind-cno.sh.

I ran into issues when not building CNO with the options BUILD_OVN=true ./ovn-kind-cno.sh (also had to add the $CNO_POD back in to get this option to run):

Creating "cluster-config-v1" configMap with 1 master nodes
configmap/cluster-config-v1 created
Creating OVN CNO config
network.config.openshift.io/cluster created
Sym-linking cni dirs for node 9ef04a9e4197
Sym-linking cni dirs for node 24402e252cbf
Sym-linking cni dirs for node 93a200121f79

pod/ovs-node-dgsbq condition met
pod/ovs-node-jgdrx condition met
pod/ovs-node-nvnks condition met
timed out waiting for the condition on pods/ovnkube-master-8khjb
timed out waiting for the condition on pods/ovnkube-node-h89h5
timed out waiting for the condition on pods/ovnkube-node-krhdv
timed out waiting for the condition on pods/ovnkube-node-xtblf
OVN-k8s pods are not running

Logs at https://gist.github.com/nerdalert/451896250569e0852e28a907a35198d7
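
For anyone hitting the same timeouts, the standard next steps would be along these lines (generic kubectl commands, using the pod names from the log above; the namespace is inferred from the services listed earlier in this PR):

```sh
kubectl -n openshift-ovn-kubernetes get pods -o wide
kubectl -n openshift-ovn-kubernetes describe pod ovnkube-master-8khjb
kubectl -n openshift-ovn-kubernetes logs ovnkube-master-8khjb --all-containers --tail=50
```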

Thanks for this. I'm going to poke around on getting hybrid overlay enabled. Ty!

@aojea (Contributor Author) commented Jun 9, 2020

@nerdalert if you are working on this, feel free to reuse the parts of this PR that are still valid and send everything consolidated in one PR.

for networking development and testing, single-node
environments hide a big part of the problems.

We should use multinode by default if possible.

Other improvements:

* the apiserver autodiscovers the API address, so we don't
  need to pass the kubeconfig

* fix multus webhook problem

* bump KIND version to 0.8.1

* allow running IPv6-only clusters

* use kubectl wait instead of bash loops

Signed-off-by: Antonio Ojea <aojea@redhat.com>
@nerdalert (Contributor)


Still stumped on why BUILD_CNO=true BUILD_OVN=true ./ovn-kind-cno.sh builds OK but BUILD_OVN=true ./ovn-kind-cno.sh does not. I went through and ran the same build with master and was able to reproduce the error there as well, so it's not related to this PR. Everything looks like it's working with regard to ovn-kubernetes when I pass the build ENVs.

LGTM, with the caveat that I don't know enough about Multus to say whether it's good to go there or not. Ty!

@aojea (Contributor Author) commented Jun 11, 2020

/retest
PTAL

@aojea (Contributor Author) commented Jun 14, 2020

> Still stumped on why BUILD_CNO=true BUILD_OVN=true ./ovn-kind-cno.sh builds OK but BUILD_OVN=true ./ovn-kind-cno.sh does not. I went through and ran the same build with master and was able to reproduce the error there as well, so it's not related to this PR. Everything looks like it's working with regard to ovn-kubernetes when I pass the build ENVs.

Same here, but if the images are different, maybe the one that is used when the image is not built locally is outdated:

    Image:         origin-ovn-kubernetes:dev
    Image ID:      sha256:6e08e12de795a32d90a0a4ddcd78ce45383232ff28e0e7762d89e8d87f02884f
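
If a stale image is indeed the culprit, loading the locally built image into the KIND nodes would rule that out (the tag is the one shown above; whether the script already does this is an assumption):

```sh
kind load docker-image origin-ovn-kubernetes:dev
```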

> LGTM, with the caveat that I don't know enough about Multus to say whether it's good to go there or not. Ty!

great
/retest

@aojea (Contributor Author) commented Jun 15, 2020

/retest

@aojea (Contributor Author) commented Jun 15, 2020

/test e2e-vsphere

@aojea (Contributor Author) commented Jun 16, 2020

this job ci/prow/e2e-vsphere — Job failed. It has NEVER succeeded 🙃
does it make sense to run it?

@juanluisvaladas (Contributor)

/lgtm
I've been using this branch for a while; it works great. e2e-vsphere is broken anyway and not required.
@danwinship maybe we can override it?

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 17, 2020
@danwinship (Contributor)

> e2e-vsphere is broken anyway and not required.
> @danwinship maybe we can override it?

if it's not required it doesn't need to be overridden

/lgtm

@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, danwinship, juanluisvaladas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 17, 2020
@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

8 similar comments from @openshift-bot followed.

@openshift-ci-robot (Contributor) commented Jul 17, 2020

@aojea: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-vsphere | 77b03a1 | link | /test e2e-vsphere |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit a70be1e into openshift:master Jul 17, 2020