
golang based watcher for ovn-kubernetes #110

Merged
5 commits merged into ovn-org:master on Apr 24, 2017

Conversation

rajatchopra

Code for a golang-based controller that uses the client-go libraries from upstream Kubernetes to perform watch/get/post functions for ovn-kubernetes. This code has two main enhancements:

  • Replaces the watcher daemon for better reliability
  • Adds an IPAM for nodes as they are born, so that node setup/initialization is a one-command job

To be done:

  • For services, only ClusterIP is supported in this code; NodePort/ExternalIP support remains
  • Better documentation of usage
  • Unit tests

@rajatchopra
Author

Attn: @shettyg @salv-orlando
/cc @dcbw

Big fat PR, I know, but it's non-intrusive. Please take a look, as I am looking to add callbacks for the Network Policy stuff.
Will integrate this with the vagrant setup if the code looks broadly okay to all.

@shettyg
Collaborator

shettyg commented Apr 10, 2017

Rajat,
Would you mind adding a Signed-off-by to the commit? The author shown in 'git log' and the Signed-off-by should be the same.

Signed-off-by: Rajat Chopra <rchopra@redhat.com>
@rajatchopra
Author

@shettyg just did. Realised as soon as I pushed.

@shettyg left a comment (Collaborator)

I just cloned the repo. Before I go deep into the review, should we have a coding style defined? Since I do not know much about golang, do you have a recommendation here? Is golint acceptable? In Python we use flake8; since that takes care of all the basic violations, it is easy to just look at the important parts of the code.

exit 1
fi

install() {
Collaborator

This is very RHEL-specific. Can we call this only when the platform supports yum?


### Build

Ensure go version >= 1.8
Collaborator

The hack/build-go.sh seems satisfied with 1.6. I suppose that needs a change.

Author

There was a reason, but now I cannot recall it.
Will fix if I cannot recollect soon.

@shettyg left a comment (Collaborator)

I went through a few golang tutorials, so here is a first set of comments. I am still looking through the pod and endpoint watchers.

var subnet *net.IPNet

for count > 0 {
if count != 30 {
Collaborator

This looks a little suspect. count = 30 initially. So count will never be decremented.

Author

Good catch. Thank you. Fixed.
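
For reference, a minimal sketch of the bounded retry pattern the fix presumably converges on; waitForSubnet and getSubnet are hypothetical names, not code from this PR, and the 30-try/one-second budget is just an example:

package ovn

import (
	"fmt"
	"time"
)

// waitForSubnet polls getSubnet (a hypothetical fetch function) up to 30 times,
// one second apart. The counter is decremented on every failed attempt, so the
// loop is bounded, which is the point raised in the review comment above.
func waitForSubnet(getSubnet func() (string, error)) (string, error) {
	count := 30
	for count > 0 {
		subnet, err := getSubnet()
		if err == nil && subnet != "" {
			return subnet, nil
		}
		count--
		time.Sleep(1 * time.Second)
	}
	return "", fmt.Errorf("timed out waiting for the subnet")
}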

}

func calculateMasterSwitchNetwork(clusterNetwork string, hostSubnetLength uint32) (string, error) {
subAllocator, err := netutils.NewSubnetAllocator(clusterNetwork, hostSubnetLength, make([]string, 0))
Collaborator

Since the functions being called do not have comments on what they do, can we add a comment here describing what both of the called functions do and what they return?
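
To illustrate the kind of comment being requested, a hedged sketch of how the function and its doc comment might read. The description of NewSubnetAllocator, the GetNetwork call, and the vendored import path are assumptions drawn from this thread, not verbatim from the PR:

package ovn

// The import path below is an assumption about how netutils is vendored here.
import "github.com/openshift/origin/pkg/util/netutils"

// calculateMasterSwitchNetwork reserves one host-sized subnet out of
// clusterNetwork for the master's logical switch and returns it as a string
// such as "10.128.0.0/24".
//
// netutils.NewSubnetAllocator splits clusterNetwork into subnets whose size is
// derived from hostSubnetLength and returns an allocator; the empty slice
// passed as the third argument means no subnets are treated as already in use.
// GetNetwork (assumed API) then hands back the first free subnet.
func calculateMasterSwitchNetwork(clusterNetwork string, hostSubnetLength uint32) (string, error) {
	subAllocator, err := netutils.NewSubnetAllocator(clusterNetwork, hostSubnetLength, make([]string, 0))
	if err != nil {
		return "", err
	}
	sn, err := subAllocator.GetNetwork()
	if err != nil {
		return "", err
	}
	return sn.String(), nil
}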

subrange = append(subrange, hostsubnet)
}
}
masterSwitchNetwork, err := calculateMasterSwitchNetwork(clusterNetwork.String(), hostSubnetLength)
Collaborator

It is not clear to me how this will work. Above, we look at all the nodes that have already been allocated a subnet. We call a subnet allocator for master. How do we know that they will not overlap?

Author

That is what the line 'subrange = append(subrange, masterSwitchNetwork)' ensures.
The SubnetAllocator will give out subnets, but treats the 'subrange' argument as the set of subnets that are already taken.

Collaborator

I guess I understand now. Since master.go always runs before any node gets a subnet allocated, it takes the first subnet. Right?

Author

Right. Subnet allocation happens only in this code anyway, which is only called in master mode.
It's not intended that we run two copies of this code in parallel, but we could keep locks in place to check for that. Right now that feels more like paranoia than perfection.
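
A toy end-to-end illustration of that ordering; nextFreeSubnet is a deliberately simplified stand-in for netutils.NewSubnetAllocator, and the 10.128.0.0/16 cluster network is just an example value:

package main

import "fmt"

// nextFreeSubnet walks the /24 pieces of 10.128.0.0/16 and returns the first
// one not already listed in taken. The real allocator in the PR is seeded the
// same way, via the subrange slice.
func nextFreeSubnet(taken []string) (string, error) {
	inUse := map[string]bool{}
	for _, t := range taken {
		inUse[t] = true
	}
	for i := 0; i < 256; i++ {
		candidate := fmt.Sprintf("10.128.%d.0/24", i)
		if !inUse[candidate] {
			return candidate, nil
		}
	}
	return "", fmt.Errorf("cluster network exhausted")
}

func main() {
	subrange := []string{}

	// master-init runs first, so the master switch gets the first free subnet...
	masterSwitchNetwork, _ := nextFreeSubnet(subrange)
	// ...and is immediately recorded as taken, which is what
	// `subrange = append(subrange, masterSwitchNetwork)` does in the PR.
	subrange = append(subrange, masterSwitchNetwork)

	// Every later node allocation sees the master's subnet in subrange,
	// so the two can never overlap.
	nodeSubnet, _ := nextFreeSubnet(subrange)
	fmt.Println(masterSwitchNetwork, nodeSubnet) // 10.128.0.0/24 10.128.1.0/24
}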

return err
}
subrange = append(subrange, masterSwitchNetwork)
cluster.masterSubnetAllocator, err = netutils.NewSubnetAllocator(clusterNetwork.String(), hostSubnetLength, subrange)
Collaborator

A comment here about what NewSubnetAllocator does would be nice (as the function itself does not have any documentation).

Author

Yes. Done the needful.

}
}

cluster.SetupMaster(masterNodeName, masterSwitchNetwork)
Collaborator

If "master-init" is called twice, we will call the setup with 2 different subnets?

Author

That is correct. This might break the cluster. So we need to lock it such that only one subnet is alive at any one point. Added a TODO here.

Author

I mean I added the TODO at the top of this function.
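
A sketch of what that TODO might turn into, assuming the already-allocated subnet is recorded on the master node; the annotation key and helper names here are hypothetical, not from this PR:

package main

import "fmt"

// node is a tiny stand-in for the Kubernetes node object; in the real code the
// existing subnet would live in an annotation on the master node.
type node struct {
	annotations map[string]string
}

// ensureMasterSwitchNetwork reuses a subnet that was already recorded instead
// of allocating a second one when master-init is run twice. allocate is
// whatever hands out a fresh subnet.
func ensureMasterSwitchNetwork(master *node, allocate func() (string, error)) (string, error) {
	if existing, ok := master.annotations["ovn_master_subnet"]; ok {
		return existing, nil // second master-init run: keep the original subnet
	}
	subnet, err := allocate()
	if err != nil {
		return "", err
	}
	master.annotations["ovn_master_subnet"] = subnet
	return subnet, nil
}

func main() {
	m := &node{annotations: map[string]string{}}
	first, _ := ensureMasterSwitchNetwork(m, func() (string, error) { return "10.128.0.0/24", nil })
	second, _ := ensureMasterSwitchNetwork(m, func() (string, error) { return "10.128.1.0/24", nil })
	fmt.Println(first, second) // same subnet both times
}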


cluster.SetupMaster(masterNodeName, masterSwitchNetwork)

go utilwait.Forever(cluster.watchNodes, 0)
Collaborator

Can we add a comment here about what the goroutine does? If I understand right, this waits forever watching nodes and calls the addNode and deleteNode functions. So master-init will actually run forever? If so, won't we have two watchers on the master node? Would it make sense to add the node watcher functionality when we add the other watchers?
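
For readers following the thread, a hedged sketch of a node watcher of this shape built on client-go informers. Import paths follow current client-go releases rather than the version vendored in this PR, the 30-second resync period is arbitrary, and the handler bodies merely stand in for cluster.addNode/deleteNode:

package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

// watchNodes blocks forever: the informer lists all nodes once, then keeps a
// watch open and invokes the handlers as nodes are added or deleted. This is
// why master-init keeps running after the initial setup completes.
func watchNodes(clientset kubernetes.Interface, stop <-chan struct{}) {
	lw := cache.NewListWatchFromClient(
		clientset.CoreV1().RESTClient(), "nodes", metav1.NamespaceAll, fields.Everything())

	_, controller := cache.NewInformer(lw, &corev1.Node{}, 30*time.Second,
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
				node := obj.(*corev1.Node)
				fmt.Printf("addNode: allocate a subnet for %s\n", node.Name) // stand-in for cluster.addNode
			},
			DeleteFunc: func(obj interface{}) {
				node := obj.(*corev1.Node)
				fmt.Printf("deleteNode: release the subnet of %s\n", node.Name) // stand-in for cluster.deleteNode
			},
		})
	controller.Run(stop) // blocks until stop is closed
}

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	watchNodes(kubernetes.NewForConfigOrDie(config), make(chan struct{}))
}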

@shettyg left a comment (Collaborator)

I am done. I am happy with it. I really cannot give any meaningful golang feedback though.

if logical_switch != "" {
break
}
if count != 30 {
Collaborator

The check for count looks suspect.

Author

Fixed :)

p, err := oc.Kube.GetPod(pod.Namespace, pod.Name)
if err != nil {
glog.Errorf("Could not get pod %s/%s for obtaining the logical switch it belongs to", pod.Namespace, pod.Name)
return
Collaborator

Did you want a continue instead of a return? I guess I don't understand why we have the loop here.

Author

Fair point. Added a continue here.
The loop is primarily there because we are racing with the scheduler. If this controller catches the birth of a pod sooner than the scheduler does, we want to allow enough retries for the scheduler to assign the NodeName field for us (which decides the logical switch). This will matter in large clusters.
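
A minimal sketch of that race-with-the-scheduler loop; waitForNodeName, getPod, and the 30-try budget are hypothetical, but it shows why a failed GET should continue rather than return:

package ovn

import (
	"fmt"
	"time"
)

// pod is a minimal stand-in for the Kubernetes pod object; only the field the
// loop cares about is modelled here.
type pod struct {
	Namespace, Name, NodeName string
}

// waitForNodeName re-fetches the pod a bounded number of times because the
// watcher may see the pod before the scheduler has filled in NodeName. A
// transient GET failure uses continue (not return), so one bad call does not
// abandon the pod.
func waitForNodeName(getPod func(namespace, name string) (*pod, error), namespace, name string) (string, error) {
	for count := 30; count > 0; count-- {
		p, err := getPod(namespace, name)
		if err != nil {
			fmt.Printf("could not get pod %s/%s, retrying\n", namespace, name)
			time.Sleep(time.Second)
			continue
		}
		if p.NodeName != "" {
			return p.NodeName, nil // scheduler has placed the pod; the logical switch is now known
		}
		time.Sleep(time.Second)
	}
	return "", fmt.Errorf("pod %s/%s was never scheduled", namespace, name)
}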

break
}
glog.V(4).Infof("Error while obtaining addresses for %s - %v", portName, err)
time.Sleep(time.Second)
Collaborator

I do not see a counter decrement. Though, because we use --wait=sb in the ovn-nbctl command, we should not need the retry logic here. But it does not hurt.

err = oc.Kube.SetAnnotationOnPod(pod, "ovn", annotation)
if err != nil {
glog.Errorf("Failed to set annotation on pod %s - %v", pod.Name, err)
}
Collaborator

We are currently not handling named ports here. The python version does handle them; see "k8s_l4_port_name_cache" in overlay.py. The support need not necessarily come with the first version.

kapi "k8s.io/client-go/pkg/api/v1"
)

func (ovn *OvnController) getLoadBalancer(protocol kapi.Protocol) string {
Collaborator

Since the load balancer is created before the watcher gets any events, and it remains constant, a cache here would save ovn-nbctl calls.

Author

Added a TODO here. Will catch it in the next sweep.
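
A sketch of what that cache could look like; loadBalancerCache and the lookup callback are hypothetical names, and the kapi import path is the current one rather than the client-go path used in this PR:

package ovn

import (
	"sync"

	kapi "k8s.io/api/core/v1"
)

// loadBalancerCache stores the per-protocol load balancer UUID after the first
// lookup. The TCP/UDP load balancers are created before the watcher sees any
// events and never change, so later calls can skip the ovn-nbctl round trip.
type loadBalancerCache struct {
	mu     sync.Mutex
	byProt map[kapi.Protocol]string
	lookup func(kapi.Protocol) (string, error) // wraps the ovn-nbctl call
}

func newLoadBalancerCache(lookup func(kapi.Protocol) (string, error)) *loadBalancerCache {
	return &loadBalancerCache{byProt: map[kapi.Protocol]string{}, lookup: lookup}
}

func (c *loadBalancerCache) get(protocol kapi.Protocol) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if uuid, ok := c.byProt[protocol]; ok {
		return uuid, nil // cache hit: no ovn-nbctl call
	}
	uuid, err := c.lookup(protocol)
	if err != nil {
		return "", err
	}
	c.byProt[protocol] = uuid
	return uuid, nil
}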

// key is of the form "IP:port" (with quotes around)
key := fmt.Sprintf("\"%s:%d\"", serviceIP, port)

if len(ips) == 0 {
Collaborator

In the python watcher, we had a watcher for services too. So when a service got deleted, we would get a service delete event, and that in turn would cause len(ips) to be zero. When I tested it, I would not get endpoint events when the service itself got deleted. Has that changed now? Do you get an endpoint event with zero ips when a service gets deleted (without the pods getting deleted)?

Author

In my tests, the endpoints get deleted when the service is deleted. I will keep this in mind, though, if I see any peculiarities.

@shettyg
Collaborator

shettyg commented Apr 20, 2017

On another note, I do not see any code for syncing. In the python version we would do it when the watcher started. Is the syncing taken care of by the kubernetes library here?

@rajatchopra
Author

On another note, I do not see any code for syncing. In the python version we would do it when the watcher started. Is the syncing taken care of by the kubernetes library here?

If you mean the re-sync on http disconnect, then yes, that is what the golang client library automatically gives us. Also, upon a re-sync it gives back any events that we may have missed, by listing and comparing with the cache.
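
A schematic of that list-and-compare behaviour, purely for illustration; this is not client-go code, just a stand-in showing how missed events can be recovered from a fresh list plus a local cache:

package ovn

// object is a minimal stand-in for a Kubernetes resource: just a name and a
// resourceVersion.
type object struct {
	Name            string
	ResourceVersion string
}

// resyncSketch models what happens after a disconnect: the library lists
// everything again, compares the result with its local cache, and fires
// callbacks for whatever was missed while the watch was down.
func resyncSketch(cache map[string]object, relisted []object, onAdd, onUpdate, onDelete func(object)) {
	seen := map[string]bool{}
	for _, obj := range relisted {
		seen[obj.Name] = true
		old, known := cache[obj.Name]
		switch {
		case !known:
			onAdd(obj) // created while the watch was down
		case old.ResourceVersion != obj.ResourceVersion:
			onUpdate(obj) // modified while the watch was down
		}
		cache[obj.Name] = obj
	}
	for name, obj := range cache {
		if !seen[name] {
			onDelete(obj) // deleted while the watch was down
			delete(cache, name)
		}
	}
}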

@shettyg
Collaborator

shettyg commented Apr 21, 2017

@rajatchopra

Any opinion on coding style? I am personally not a fan of unlimited line lengths.

Should we be using golint?

@rajatchopra
Author

@shettyg
golint is definitely desirable. I am lousy at following any style we pick, so the police is needed; let me put up a PR to add that as part of the checks within travis or otherwise.

@rajatchopra
Author

@shettyg I have lint'ed this code (the last commit). Will add the travis part in the next PR as soon as this gets in.

@shettyg merged commit 0ed86ae into ovn-org:master on Apr 24, 2017