Bug 1822296: Expose raft (nb-db/sb-db) election-timer and ovn-controller inactivity-probe #615
Conversation
First thought is that we shouldn't duplicate code if we can help it. @squeed do we move back to having a separate script that we can source and call functions from? |
I was thinking the same. Frankly speaking, it looked pretty ugly to me, but I didn't want to disrupt the deployment yaml too much. I had another thought to align this part of the code with upstream ovn-kubernetes. Even if we move this logic to a separate script (a cleaner approach), I would like to eventually get rid of it, because in my opinion it's unnecessary from the consumer's point of view. I opened the following RFE against the internal openvswitch project to support setting the election-timer at startup of the raft cluster; that would make our life pretty simple. https://bugzilla.redhat.com/show_bug.cgi?id=1831755 I am opening another bugzilla against OVN to wrap this feature in ovn-ctl (which we use to start the raft clusters) so the user can pass the election-timer through it. |
OVN Bugzilla : https://bugzilla.redhat.com/show_bug.cgi?id=1831778 |
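(For illustration: if the ovn-ctl RFE lands, starting the raft members could look roughly like the following. The flag names are assumptions based on the RFE, not flags that existed at the time of this discussion.)

```bash
# Hypothetical: pass the election timer to ovn-ctl at cluster startup instead
# of changing it after the fact with ovs-appctl. Flag names are assumed.
/usr/share/ovn/scripts/ovn-ctl --db-nb-election-timer=10000 start_nb_ovsdb
/usr/share/ovn/scripts/ovn-ctl --db-sb-election-timer=16000 start_sb_ovsdb
```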
@vishnoianil @dcbw I thought we avoided control knobs like this. @smarterclayton made the point a while ago that we should figure out the correct setting without bothering the admin. |
@pecameron I believe that's the idea as of now. These knobs are exposed so that we can set them to fixed values based on our scale testing, and the cluster can be deployed with those initial raft timer values. Currently there is no way to set these values at deployment time: the admin has to find the nb/sb db leader and fire an ovs-appctl command to change the timer, and also has to do the same across all the worker nodes in the deployment to change the inactivity-probe (roughly the commands sketched below). |
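(For reference, the manual procedure described above looks roughly like this; the socket paths vary by OVN version and the values are illustrative.)

```bash
# On each master, find the raft leader of the NB DB (repeat with
# ovnsb_db.ctl / OVN_Southbound for the southbound DB):
ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound

# On the leader, raise the election timer (in ms). OVSDB only accepts
# roughly a doubling per call, so large increases take several calls:
ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/change-election-timer OVN_Northbound 2000

# On every worker node, raise the ovn-controller inactivity probe towards
# the SB DB (in ms):
ovs-vsctl set Open_vSwitch . external_ids:ovn-remote-probe-interval=30000
```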
As long as this is not in a public API for customers, that is fine. |
/retest |
1 similar comment
/retest |
@@ -161,6 +163,59 @@ spec:
          sleep 2
        done
      fi

      election_timer="${OVN_NB_RAFT_ELECTION_TIMER}"
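(The hunk is truncated at the review anchor. A minimal sketch of what this kind of startup gating does, assuming the `cluster/status` and `cluster/change-election-timer` appctl commands and an assumed socket path; this is not the PR's exact code:)

```bash
election_timer="${OVN_NB_RAFT_ELECTION_TIMER}"
ctl="/var/run/ovn/ovnnb_db.ctl"   # assumed socket path; varies by version

# Only the raft leader may change the election timer.
if ovs-appctl -t "$ctl" cluster/status OVN_Northbound | grep -q "Role: leader"; then
  current=$(ovs-appctl -t "$ctl" cluster/status OVN_Northbound \
            | awk '/Election timer:/ {print $3}')
  # OVSDB rejects increases beyond ~2x per call, so step up gradually.
  while [ "$current" -lt "$election_timer" ]; do
    next=$((current * 2))
    [ "$next" -gt "$election_timer" ] && next="$election_timer"
    ovs-appctl -t "$ctl" cluster/change-election-timer OVN_Northbound "$next"
    current="$next"
  done
fi
```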
Should this all block the ovn-kube master process? Would it make sense for it to be a separate container? It could just apply changes as necessary.
I think setting this timer while the master pods start up will give us a bit more deterministic behavior. If the user starts a cluster with a high number of worker nodes, we can get into a death loop of raft partitions.
Also, I believe the general guideline is that we don't want to allow the user to change this behavior dynamically.
Right, what I'm saying is that this should be a separate container in the master pod, rather than gating the master container.
If we are thinking of allowing the user to change this value dynamically, then having a separate container makes sense to me. But if it's a one-time startup config, there is not much for this container to do: it will come up, set the value, and die, and I am not sure CNO will be happy with that.
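(A rough illustration of the separate-container idea; hypothetical, not part of the PR. As an init container it would come up, set the value, and exit cleanly rather than crash-loop. The container name and script path are placeholders.)

```yaml
initContainers:
- name: set-raft-election-timer   # hypothetical name
  image: "quay.io/openshift/origin-ovn-kubernetes:4.3"
  command: ["/bin/bash", "-c", "/root/set-election-timer.sh"]  # hypothetical script
  env:
  - name: OVN_NB_RAFT_ELECTION_TIMER
    value: "10000"
  volumeMounts:
  - name: run-ovn                 # needs access to the DB control socket
    mountPath: /var/run/ovn
```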
@@ -45,6 +45,12 @@ spec:
          value: "quay.io/openshift/origin-multus-route-override-cni:4.4"
        - name: OVN_IMAGE
          value: "quay.io/openshift/origin-ovn-kubernetes:4.3"
        - name: OVN_NB_RAFT_ELECTION_TIMER
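(The hunk above is truncated at the review anchor; given the PR title, the full set of knobs presumably looks something like the following. The SB and inactivity-probe variable names are assumptions, not confirmed by the visible diff.)

```yaml
- name: OVN_NB_RAFT_ELECTION_TIMER
  value: "1000"
- name: OVN_SB_RAFT_ELECTION_TIMER        # assumed SB counterpart
  value: "1000"
- name: OVN_CONTROLLER_INACTIVITY_PROBE   # assumed worker-side knob (ms)
  value: "30000"
```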
This should be added to the operator configuration CRD, which is defined here: https://github.com/openshift/api/blob/master/operator/v1/types_network.go#L37
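(For illustration, such a knob in the operator API might have looked like the following; the field is hypothetical and, per the follow-up below, was never added.)

```yaml
apiVersion: operator.openshift.io/v1
kind: Network
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      nbRaftElectionTimer: 10000   # hypothetical field, in milliseconds
```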
However, that is a long process and won't be in place in time for 4.5. The problem is that these fields will be instantly overwritten by the CVO.
Personally, I don't like this PR either, mainly because the method OVN provides us to set the election-timer is not very operations-friendly. I already raised an RFE for the OVN team to improve this and accept the election-timer as a parameter to ovn-ctl. That will make the entire logic pretty simple, and we can write more stable logic to set the timer. So I am hoping that we will get rid of this logic once OVN provides us a better way to set this timer.
That makes sense. I'm arguing that this value should either be hard-coded in the code, or available as a configuration knob.
I am more in favor of having it as a configuration knob, because for a raft-based implementation, choosing an approximately right election-timer depends on the workload.
I agree with @squeed that it should be hard-coded for now, until we have a better idea whether it actually does need to be changed. It should not be user-visible configuration at any rate, especially not in the API.
@dcbw In general we want to keep these numbers as small as possible. Currently they are set for 200 nodes (with a 50-pods-per-node workload), so the moment the number of nodes increases, you will have to increase them; there is no question about it. We can hardcode it for 500 nodes as of now (60 seconds, which frankly is very high) and live with that unless something else comes up that requires changing it.
For the election-timer I don't have a strong opinion against hardcoding, because you can change this value directly on the pod anyway if needed.
So for hardcoding, what approach do we need to take? Using the operator CRD? Is it okay for the operator CRD to be aware of the SB DB deployment model (raft or ha)? A sketch of one reading of "hardcoded" follows.
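(One way to read "hard-coded with an escape hatch" in the deployment script, illustrative only: default to the scale-tested value, overridable via the non-API env knob.)

```bash
# Scale-tested default; the env knob overrides it without touching the API.
election_timer="${OVN_NB_RAFT_ELECTION_TIMER:-10000}"
```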
I think this PR is basically fine as written, albeit a bit awkward (as everyone has mentioned). I'd just like to see the configuration knob somewhere else. |
@vishnoianil: This pull request references Bugzilla bug 1822296, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/bugzilla refresh |
@vishnoianil: This pull request references Bugzilla bug 1822296, which is valid. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest |
1 similar comment
/retest |
@vishnoianil I'm confused as to why this PR has 35 commits... Can you rebase it to get down to the single commit(s) that make the change? |
@dcbw yes, I am working on it; multiple rebases and merges caused this mess. |
@dcbw done |
/retest |
/retest Please review the full test history for this PR and help us cut down flakes. |
19 similar comments
@vishnoianil: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@vishnoianil: All pull requests linked via external trackers have merged: openshift/cluster-network-operator#615. Bugzilla bug 1822296 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-4.5 |
@dcbw: #615 failed to apply on top of branch "release-4.5":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
…de job The TargetDown alert was always getting fired for all the ovnkube-nodes because the metrics-bind-address flag was not getting passed in the exec command for the ovnkube-node container. This regression was introduced in PR openshift#615. Prometheus was hence not able to scrape the up metrics. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Expose raft (nb-db/sb-db) election-timer and ovn-controller inactivity-probe.
These timers are currently set to fixed values based on my observations
from scale tests. We might have to increase these values in the future based on
the scale we will be supporting in upcoming releases. Currently the election-timer
values are also limited by the raft jsonrpc inactivity-probe time of 5 seconds.
To increase the election-timer value further, we need to disable the jsonrpc
inactivity-probe.
Signed-off-by: Anil Vishnoi avishnoi@redhat.com