Be more aggressive acquiring the iptables lock #85771
Conversation
/test pull-kubernetes-e2e-kind-ipv6

/test pull-kubernetes-e2e-kind-ipv6

/test pull-kubernetes-e2e-kind-ipv6
In the original PR you noted:

> So what changed? As I said before, I don't think the kind problem is actually a timeout getting the lock, particularly given that kubelet reliably always acquires the lock successfully and kube-proxy reliably hits the timeout. Something else is going on there (e.g. like with #82587, where pkg/util/iptables was grabbing the lock itself and then calling the iptables binary, which also tried to grab the lock but was guaranteed to fail since we had already grabbed it).

I was investigating the kind issue #85727 further and, as you say, it has to be something different ... However, I could observe that the values used in this PR are similar to the previous values used in kube-proxy when it didn't have the …
That looks more palatable to me - @danwinship has the final LGTM
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, thockin

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/test pull-kubernetes-integration

/test pull-kubernetes-e2e-gce
iptables has two options that modify its behaviour when trying to acquire the lock:

- --wait / -w [seconds]: maximum time to wait to acquire the xtables lock before giving up
- --wait-interval / -W [usecs]: interval between attempts to acquire the xtables lock; the default is 1 second

Kubernetes uses -w 5, which means iptables waits up to 5 seconds to acquire the lock. If kube-proxy is not able to acquire it, it fails and retries in 30 seconds, which is an important penalty for sensitive applications. We can be a bit more aggressive and try to acquire the lock every 100 msec: at that interval, iptables has to fail 50 times before giving up.
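As a minimal sketch of how the two flags combine on the command line (the rule shown here is illustrative, not taken from kube-proxy):

```sh
# Default interval: wait up to 5 s, polling for the xtables lock
# roughly once per second, so about 5 attempts in total
iptables -w 5 -A INPUT -p tcp --dport 10250 -j ACCEPT

# With -W 100000 (100000 usec = 100 ms), the same 5 s budget
# allows about 50 attempts before giving up
iptables -w 5 -W 100000 -A INPUT -p tcp --dport 10250 -j ACCEPT
```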
/test pull-kubernetes-e2e-kind-ipv6

/retest

/test pull-kubernetes-e2e-kind-ipv6

@danwinship this is green - you hold the LGTM

/lgtm
What type of PR is this?
/kind flake
What this PR does / why we need it:
If kube-proxy is not able to get the lock, it pays a maximum penalty of 35 seconds: 5 seconds waiting to get the lock and 30 seconds until the retry.

Currently, Kubernetes uses the iptables -w 5 option, waiting 5 seconds to acquire the lock.

We can be more aggressive trying to acquire the lock by using a smaller retry interval.
We can reproduce this situation using flock to hold the lock (a sketch of the full session follows this list):

- acquire the xtables lock that iptables uses
- observe the iptables behaviour with -W = 100000
- observe the iptables behaviour with -W = 10000
- remove the lock with exec 3>&-
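A minimal sketch of that reproduction, assuming the default lock file path /run/xtables.lock and using a plain listing command just to force iptables to acquire the lock:

```sh
# hold the xtables lock on file descriptor 3, as iptables itself would
exec 3>/run/xtables.lock
flock -x 3

# -W 100000 (100 ms interval): roughly 50 attempts before the 5 s budget expires
time iptables -w 5 -W 100000 -L -n

# -W 10000 (10 ms interval): roughly 500 attempts in the same window
time iptables -w 5 -W 10000 -L -n

# release the lock by closing file descriptor 3
exec 3>&-
```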
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
This seems to have been the previous behaviour, based on kubernetes/pkg/util/iptables/iptables_linux.go (line 74 in 650c797), and it is the one implemented for flushing the chains in kubernetes/pkg/util/iptables/iptables.go (lines 528 to 533 in 650c797).
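For context, the loop shape those references describe (grabbing the lock file ourselves and polling on a short interval) can be sketched in shell as follows; the values are illustrative, not the actual kubernetes code:

```sh
# hypothetical sketch: try to take the xtables lock every 100 ms,
# giving up after 50 attempts (about 5 seconds in total)
exec 3>/run/xtables.lock
for attempt in $(seq 1 50); do
    if flock -n -x 3; then
        echo "lock acquired after $attempt attempt(s)"
        break
    fi
    sleep 0.1
done
# release the lock by closing file descriptor 3
exec 3>&-
```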
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: