Bug 2100536: Update API to config EgressIP timeout #1210
Conversation
Hello @msherif1234! Some important instructions when contributing to openshift/api: For merging purposes, this repository follows the no-Feature-Freeze process which means that in addition to the standard
OR
Who should apply these qe/docs/px labels?
@msherif1234: This pull request references Bugzilla bug 2100536, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
(force-pushed from 3fb93a5 to 47f349e)
/lgtm
/assign @knobunc
/lgtm
Is there an enhancement that pairs with this feature change?
There was an RFE: https://issues.redhat.com/browse/RFE-2889
Wording-wise and design-wise I'm pretty happy now.
There's just one final thing. I think we should impose an upper and lower limit using +kubebuilder:validation:Minimum and +kubebuilder:validation:Maximum. The minimum is obviously 0, but we should choose a sensible maximum to prevent users from setting arbitrarily high values.
Do you have any thoughts on what an appropriate maximum timeout might be within the context of ovnk running on OpenShift?
operator/v1/types_network.go (Outdated)

// The current default is 1 second.
// A value of 0 disables the EgressIP node's reachability check.
// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=10
So we are setting a maximum of 10s. Can you quickly explain why you chose that value before we merge? I just want to make sure it's going to be enough (this is smaller than I thought it would be).
I picked this value given that the osdn platform was shipping with 5s, and given the impact if it's too large, I went with 10s. I am checking with @MichaelWasher to confirm.
Well... I mean it all depends on the retry count within this time. The concern is not that a single packet takes more than 5s/10s to complete; it's more about how many times the packets are sent. By default with TCP, the first SYN will retry after 5.8s, so you're only getting 1 retry.
However, reviewing the new dialer code, it looks like you're retrying every second or so, so that'll be ~9 retries, which should be a tonne. IMO it LGTM.
My concern here is that 10s is quite short, and while it makes sense, will it always make sense? Would an average person expect to set such a short timeout, given that other timeouts in kube are normally much higher? For example, kubelet doesn't time out for 5 minutes of inactivity.
It's not meant to be long like kubelet's, because of the side effects mentioned. Since the default is 1s, the upper bound shouldn't be that far off IMO, hence the 10s.
The side effect for this and kubelet, as I understand it, isn't particularly different. We have something that's being healthchecked, and we take it down/offline at some point if the healthchecks are failing. The same applies to pods with readyz checks, which are rarely as short as 10s in my experience. I think having such a low limit will confuse users.
Would 60s be a good upper limit?
@JoelSpeed, the timeout was originally 1s with no retries. 60s is a massive jump from this; however, as it's the upper bound, it doesn't really matter.
TL;DR: the criteria should be: how did the feature behave before we made this a knob. And that should be the value we pick (or something close to that). Not: how does any unrelated feature in the K8s API normally select timeouts. + this is a networking feature, and downtime is usually detected in ms to seconds, not in minutes.
We use this as a detection mechanism for EgressIP failover. Imagine EgressIP as an HSRP or VRRP VIP, just the other way around. All traffic to the outside world from EgressIP-marked pods is funneled through that node, so with a 60-second timeout by default we could potentially cause those pods' traffic to be blackholed for 60 seconds in case of a node issue. So the shorter the timeout (within a reasonable delay) the better.
IMO, the criteria should be: how did the feature behave before we made this a knob? That should be the value we pick (or something close to it), not how any unrelated feature in the K8s API normally selects timeouts. If your ready probe on a pod takes a minute to time out, then your one pod is affected. If the reachability check for EgressIP takes 60 seconds (or 4 minutes) to time out, then potentially dozens of pods can't reach the outside world for that time. Add to this that customers expected this to happen within a second, and now from one version of the product to another we bump this to a value 60 times higher... seems a lot to me. And add to this again that node reboots are actually really common in OCP due to the machine config operator taking them offline and online.
"Would an average person be expecting to set such a short timeout"
I'd argue they would. And that on the contrary, nobody would expect a VIP (or an inverted VIP like egressIP) to take minutes to timeout. Seconds is already a lot for a networking feature.
Just to clarify: the dialer isn't actually establishing TCP connections, right? Instead, it's calling into tcp/9 with no listener on the other side; it gets a port-unreachable message (or equivalent) back, and thus it knows that the host on the other side is up. (There would probably have been better ways of detecting that the node is up, but this is how it was done in SDN and OVNK way before this change.) Regardless of whether the current detection mechanism is good or not, on the API side I'd push for a shorter delay rather than a longer one, for the aforementioned reasons.
is 1 second. A value of 0 disables the EgressIP node's reachability check.
format: int32
maximum: 10
After reading the discussion here + email and looking at the code, I too am convinced that 60 is a reasonable value as the max.
Once the code is updated to allow up to 60s, this lgtm.
Signed-off-by: Mohamed Mahmoud <mmahmoud@redhat.com>
@msherif1234: all tests passed! Full PR test history. Your PR dashboard.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreaskaris, flavio-fernandes, jcaamano, JoelSpeed, knobunc, msherif1234 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@JoelSpeed: in that case...
/hold cancel
@msherif1234: All pull requests linked via external trackers have merged: Bugzilla bug 2100536 has been moved to the MODIFIED state.
Need to extend the APIs to allow configuring the EgressIP node reachability timeout.
Signed-off-by: Mohamed Mahmoud mmahmoud@redhat.com