Bug 1822296: Expose raft (nb-db/sb-db) election-timer and ovn-controller inactivity-probe #615
Conversation
First thought is that we shouldn't duplicate code if we can help it. @squeed do we move back to having a separate script that we can source and call functions from? |
I was thinking the same. Frankly speaking, it looked pretty ugly to me, but I didn't want to disrupt the deployment yaml too much. I had another thought to align this part of the code with upstream ovn-kubernetes. Even if we move this logic to a separate script (a cleaner approach), I would like to eventually get rid of it, because in my opinion it's unnecessary from the consumer's point of view. I opened the following RFE against the internal openvswitch project to support setting the election-timer at startup of the raft cluster; that would make our life pretty simple. https://bugzilla.redhat.com/show_bug.cgi?id=1831755 I am opening another bugzilla against OVN to wrap this feature in ovn-ctl (which we use to start the raft clusters) so the user can pass the election-timer through it. |
OVN Bugzilla : https://bugzilla.redhat.com/show_bug.cgi?id=1831778 |
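(For illustration: if the ovn-ctl RFE lands, starting the raft members could look roughly like the following. The flag names are assumptions based on the RFE, not flags that existed at the time of this discussion.)

```bash
# Hypothetical: pass the election timer to ovn-ctl at cluster startup instead
# of changing it after the fact with ovs-appctl. Flag names are assumed.
/usr/share/ovn/scripts/ovn-ctl --db-nb-election-timer=10000 start_nb_ovsdb
/usr/share/ovn/scripts/ovn-ctl --db-sb-election-timer=16000 start_sb_ovsdb
```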
@vishnoianil @dcbw I thought we avoided control knobs like this. @smarterclayton made the point a while ago that we should figure out the correct setting without bothering the admin. |
@pecameron I believe that's the idea as of now. These knobs are exposed so that we can set them to fixed values based on our scale testing, and the cluster can be deployed with those initial raft timer values. Currently there is no way to set these values at deployment time: the admin has to find the nb/sb db leader and fire an ovs-appctl command to change the timer, and also has to do the same across all the worker nodes in the deployment to change the inactivity-probe (roughly the commands sketched below). |
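(For reference, the manual procedure described above looks roughly like this; the socket paths vary by OVN version and the values are illustrative.)

```bash
# On each master, find the raft leader of the NB DB (repeat with
# ovnsb_db.ctl / OVN_Southbound for the southbound DB):
ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound

# On the leader, raise the election timer (in ms). OVSDB only accepts
# roughly a doubling per call, so large increases take several calls:
ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/change-election-timer OVN_Northbound 2000

# On every worker node, raise the ovn-controller inactivity probe towards
# the SB DB (in ms):
ovs-vsctl set Open_vSwitch . external_ids:ovn-remote-probe-interval=30000
```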
As long as this is not in a public API for customers, that is fine. |
/retest |
1 similar comment
/retest |
@@ -161,6 +163,59 @@ spec:
          sleep 2
        done
      fi

      election_timer="${OVN_NB_RAFT_ELECTION_TIMER}"
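(The hunk is truncated at the review anchor. A minimal sketch of what this kind of startup gating does, assuming the `cluster/status` and `cluster/change-election-timer` appctl commands and an assumed socket path; this is not the PR's exact code:)

```bash
election_timer="${OVN_NB_RAFT_ELECTION_TIMER}"
ctl="/var/run/ovn/ovnnb_db.ctl"   # assumed socket path; varies by version

# Only the raft leader may change the election timer.
if ovs-appctl -t "$ctl" cluster/status OVN_Northbound | grep -q "Role: leader"; then
  current=$(ovs-appctl -t "$ctl" cluster/status OVN_Northbound \
            | awk '/Election timer:/ {print $3}')
  # OVSDB rejects increases beyond ~2x per call, so step up gradually.
  while [ "$current" -lt "$election_timer" ]; do
    next=$((current * 2))
    [ "$next" -gt "$election_timer" ] && next="$election_timer"
    ovs-appctl -t "$ctl" cluster/change-election-timer OVN_Northbound "$next"
    current="$next"
  done
fi
```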
Should this all block the ovn-kube master process? Would it make sense for it to be a separate container? It could just apply changes as necessary.
I think setting this timer while the master pods start up will give us a bit more deterministic behavior. If the user starts a cluster with a high number of worker nodes, we can get into a death loop of raft partitions.
Also, I believe the general guideline is that we don't want to allow the user to change this behavior dynamically.
Right, what I'm saying is that this should be a separate container in the master pod, rather than gating the master container.
If we are thinking of allowing the user to change this value dynamically, then having a separate container makes sense to me. But if it's a one-time startup config, there is not much for this container to do: it will come up, set the value, and die, and I am not sure CNO will be happy with that.
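(A rough illustration of the separate-container idea; hypothetical, not part of the PR. As an init container it would come up, set the value, and exit cleanly rather than crash-loop. The container name and script path are placeholders.)

```yaml
initContainers:
- name: set-raft-election-timer   # hypothetical name
  image: "quay.io/openshift/origin-ovn-kubernetes:4.3"
  command: ["/bin/bash", "-c", "/root/set-election-timer.sh"]  # hypothetical script
  env:
  - name: OVN_NB_RAFT_ELECTION_TIMER
    value: "10000"
  volumeMounts:
  - name: run-ovn                 # needs access to the DB control socket
    mountPath: /var/run/ovn
```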
@@ -45,6 +45,12 @@ spec:
          value: "quay.io/openshift/origin-multus-route-override-cni:4.4"
        - name: OVN_IMAGE
          value: "quay.io/openshift/origin-ovn-kubernetes:4.3"
        - name: OVN_NB_RAFT_ELECTION_TIMER
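(The hunk above is truncated at the review anchor; given the PR title, the full set of knobs presumably looks something like the following. The SB and inactivity-probe variable names are assumptions, not confirmed by the visible diff.)

```yaml
- name: OVN_NB_RAFT_ELECTION_TIMER
  value: "1000"
- name: OVN_SB_RAFT_ELECTION_TIMER        # assumed SB counterpart
  value: "1000"
- name: OVN_CONTROLLER_INACTIVITY_PROBE   # assumed worker-side knob (ms)
  value: "30000"
```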
This should be added to the operator configuration CRD, which is defined here: https://github.com/openshift/api/blob/master/operator/v1/types_network.go#L37
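(For illustration, such a knob in the operator API might have looked like the following; the field is hypothetical and, per the follow-up below, was never added.)

```yaml
apiVersion: operator.openshift.io/v1
kind: Network
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      nbRaftElectionTimer: 10000   # hypothetical field, in milliseconds
```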
However, that is a long process and won't be in place in time for 4.5. The problem is that these fields will be instantly overwritten by the CVO.
Personally, I don't like this PR either, mainly because the method OVN provides us to set the election-timer is not very operations-friendly. I already raised an RFE for the OVN team to improve this and accept the election-timer as a parameter to ovn-ctl. That will make the entire logic pretty simple, and we can write more stable logic to set the timer. So I am hoping that we will get rid of this logic once OVN provides us a better way to set this timer.
That makes sense. I'm arguing that this value should either be hard-coded in the code, or available as a configuration knob.
I am more in favor of having it as a configuration knob, because for a raft-based implementation, choosing an approximately right election-timer depends on the workload.
I agree with @squeed that it should be hard-coded for now, until we have a better idea whether it actually does need to be changed. It should not be user-visible configuration at any rate, especially not in the API.
@dcbw In general we want to keep these numbers as small as possible. Currently they are set for 200 nodes (with a 50-pods-per-node workload), so the moment the number of nodes increases, you will have to increase them; there is no question about it. We can hardcode it for 500 nodes as of now (60 seconds, which frankly is very high) and live with that unless something else comes up that requires changing it.
For the election-timer I don't have a strong opinion against hardcoding, because you can change this value directly on the pod anyway if needed.
So for hardcoding, what approach do we need to take? Using the operator CRD? Is it okay for the operator CRD to be aware of the SB DB deployment model (raft or ha)? A sketch of one reading of "hardcoded" follows.
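(One way to read "hard-coded with an escape hatch" in the deployment script, illustrative only: default to the scale-tested value, overridable via the non-API env knob.)

```bash
# Scale-tested default; the env knob overrides it without touching the API.
election_timer="${OVN_NB_RAFT_ELECTION_TIMER:-10000}"
```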
I think this PR is basically fine as written, albeit a bit awkward (as everyone has mentioned). I'd just like to see the configuration knob somewhere else. |
@vishnoianil: This pull request references Bugzilla bug 1822296, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/bugzilla refresh |
@vishnoianil: This pull request references Bugzilla bug 1822296, which is valid. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest |
1 similar comment
/retest |
@vishnoianil I'm confused as to why this PR has 35 commits... Can you rebase it to get down to the single commit(s) that make the change? |
@dcbw yes, I am working on it; multiple rebases and merges caused this mess. |
@dcbw done |
/retest |
/retest Please review the full test history for this PR and help us cut down flakes. |
19 similar comments
@vishnoianil: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@vishnoianil: All pull requests linked via external trackers have merged: openshift/cluster-network-operator#615. Bugzilla bug 1822296 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-4.5 |
@dcbw: #615 failed to apply on top of branch "release-4.5":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
…de job The TargetDown alert was always getting fired for all the ovnkube-nodes because the metrics-bind-address flag was not getting passed in the exec command for the ovnkube-node container. This regression was introduced in PR openshift#615. Prometheus was hence not able to scrape the up metrics. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Expose raft (nb-db/sb-db) election-timer and ovn-controller inactivity-probe.
These timers are currently set to fixed values based on my observations
from scale tests. We might have to increase these values in the future based on
the scale we will be supporting in upcoming releases. Currently the election-timer
values are also limited by the raft jsonrpc inactivity-probe time of 5 seconds.
To increase the election-timer value further, we need to disable the jsonrpc
inactivity-probe.
Signed-off-by: Anil Vishnoi avishnoi@redhat.com