Create pod latency increase #54651
Interesting, taking a look.
@squeed Thanks for your help.
Ah, I think I know the issue. The new CNI plugins wait for DAD to finish before proceeding, which takes 3 seconds on Linux. Still need to do some testing (helpfully, my laptop drive died yesterday).
The address-settle adds about 1 second of delay. Is that likely the complete cause, or is something else contributing as well? (I'm not familiar with this test.)
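For context, here is a minimal sketch of what the address-settle wait looks like on the plugin side. This is an illustration, not the actual CNI plugin code, and it assumes the vishvananda/netlink library: the plugin polls the interface until no IPv6 address still carries the IFA_F_TENTATIVE flag that the kernel sets while DAD is in progress.

```go
// Illustrative only: poll until DAD completes, i.e. until no address
// on the interface is still flagged "tentative". Helper names and the
// polling interval are assumptions for this sketch.
package dadwait

import (
	"fmt"
	"time"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

// WaitForAddressSettle blocks until every IPv6 address on ifName has
// left the tentative state, or the timeout expires. With default
// sysctls (dad_transmits=1, retrans_time=1s) this takes about 1s.
func WaitForAddressSettle(ifName string, timeout time.Duration) error {
	link, err := netlink.LinkByName(ifName)
	if err != nil {
		return err
	}
	deadline := time.Now().Add(timeout)
	for {
		addrs, err := netlink.AddrList(link, netlink.FAMILY_V6)
		if err != nil {
			return err
		}
		settled := true
		for _, a := range addrs {
			if a.Flags&unix.IFA_F_TENTATIVE != 0 {
				settled = false
				break
			}
		}
		if settled {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("addresses on %s still tentative after %v", ifName, timeout)
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```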
Ah, network setup is also serialized behind a mutex; the increase from 20ms to 1020ms for network setup in the critical path is almost certainly the culprit. The CNI plugins could avoid the DAD delay for now (since we're not doing IPv6), though this will eventually become a problem when IPv6 is supported. Since the CNI plugin execution itself doesn't need to be synchronized, I think I can move it outside the mutex.
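To make the proposed change concrete, here is a hedged sketch of the locking pattern described above. The types and helpers are assumptions, not the literal kubenet code; only the lock discipline mirrors the fix: the slow CNI ADD runs without holding the plugin mutex, while shared state is still read and written under it.

```go
// Sketch of yielding the kubenet lock around CNI execution. All names
// here are illustrative.
package kubenetsketch

import "sync"

type cniResult struct{ ip string }

// execCNIAdd stands in for invoking the CNI binary. It can block for
// seconds (e.g. waiting for DAD) and does its own internal locking,
// so it is safe to run concurrently for different pods.
func execCNIAdd(podNS string) (cniResult, error) {
	return cniResult{ip: "10.0.0.2"}, nil // placeholder
}

type plugin struct {
	mu     sync.Mutex
	podIPs map[string]string
}

func (p *plugin) setUpPod(podNS string) error {
	// Do NOT hold p.mu across the CNI call: doing so serializes every
	// pod's network setup behind the slowest plugin invocation.
	result, err := execCNIAdd(podNS)
	if err != nil {
		return err
	}

	// Re-acquire the lock only to update shared bookkeeping.
	p.mu.Lock()
	defer p.mu.Unlock()
	p.podIPs[podNS] = result.ip
	return nil
}
```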
This seems to be a v1.9-blocking issue that needs to be fixed/tracked, right?
Good catch... This is why we need to bump the CNI version early in a release cycle.
@squeed sig-network is planning to add IPv6 alpha support for the 1.9 release.
The CNI plugin can take up to 3 seconds to execute. CNI plugins can safely be executed in parallel, so yield the lock to speed up pod creation. Fixes: #54651
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

kubenet: yield lock while executing CNI plugin.

The CNI plugin can take up to 3 seconds to execute. CNI plugins can safely be executed in parallel, so yield the lock to speed up pod creation. This caused problems with the pod latency tests: previously, CNI plugins executed in under 20ms; now they must wait for DAD to finish and addresses to leave the tentative state. Fixes: #54651

**What this PR does / why we need it**: After upgrading CNI plugins to v0.6 in #51250, the pod latency tests began failing. This is because the plugins, in order to support IPv6, need to wait for DAD to finish. Because this delay occurs while the kubenet lock is held, it significantly slows down the pod creation rate.

**Special notes for your reviewer**: The CNI plugins also do locking for their critical paths, so it is safe to run them concurrently.

**Release note**:
```release-note
NONE
```
/reopen
@porridge: you can't re-open an issue/PR unless you authored it or you are assigned to it. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/open
To be honest it does not seem like #54800 has fixed this. We are still seeing high pod startup latency in large and medium scale tests: #55060 (comment)
Crazy idea: can we do things conditionally for IPv4 and IPv6?
Since my knee-jerk reaction to discovering what the DAD acronym stands for was "why do we need this anyway", I think option (3) sounds most compelling. OTOH I'm clearly missing the big picture, since I didn't know there was an option to use anything other than kubenet.
I thought kubenet would be deprecated sooner or later. If that's still true, does that mean we'd need yet another solution for a non-kubenet setup?
Another question from a layman: do we know why it's so slow on Linux? Is it talking to other nodes (something like gratuitous ARP)? Does it have a chance to become faster in the future? Does the delay depend on the size of the network, the number of nodes, etc.?
One way to make it faster on Linux is to enable "optimistic DAD" (RFC 4429), which allows usage of tentative addresses. This is still considered an "experimental" feature and is not enabled in all kernels; IIRC, the latest Ubuntu still has it disabled, while CoreOS Container Linux and Debian do have it. This is a sysctl that would need to be set in the container's network namespace, since it's a namespaced sysctl (and new network namespaces take the default configuration rather than inheriting from the host).
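As a rough illustration of that suggestion, the sketch below enables optimistic DAD inside a container's network namespace. It assumes a kernel built with the feature (CONFIG_IPV6_OPTIMISTIC_DAD) and uses the containernetworking ns helper; the namespace path, interface name, and the exact pair of sysctls are assumptions for the example.

```go
// Hedged sketch: enable optimistic DAD (RFC 4429) for an interface in
// a container's netns. Only works on kernels with the feature built in.
package optimisticdad

import (
	"fmt"
	"os"

	"github.com/containernetworking/plugins/pkg/ns"
)

// EnableOptimisticDAD enters the network namespace at netnsPath and
// flips the per-interface optimistic-DAD sysctls. These are namespaced
// sysctls: new namespaces start from kernel defaults, so setting them
// on the host has no effect inside the container.
func EnableOptimisticDAD(netnsPath, ifName string) error {
	netns, err := ns.GetNS(netnsPath)
	if err != nil {
		return err
	}
	defer netns.Close()

	return netns.Do(func(_ ns.NetNS) error {
		for _, key := range []string{"optimistic_dad", "use_optimistic"} {
			p := fmt.Sprintf("/proc/sys/net/ipv6/conf/%s/%s", ifName, key)
			if err := os.WriteFile(p, []byte("1"), 0o644); err != nil {
				return err
			}
		}
		return nil
	})
}
```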
@yujuhong yeah, you're right. Hopefully by then optimistic_dad will be enabled on all kernels...
it will
yes
probably, like @squeed said above. Pinging @kubernetes/kubernetes-release-managers already -- this MIGHT be one of the big things we need to notify people about if this doesn't get solved.
@spiffxp Could you approve this for the milestone? This issue needs to be resolved urgently, as it's causing the 5k-node performance job (which is release-blocking) to fail.
I'm working on disabling DAD in kubenet (option 3). Should be ready soon.
OK, PR filed.
/status approved-for-milestone
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

kubenet: disable DAD in the container.

Since kubenet externally guarantees that IP addresses will not conflict, we can short-circuit the kernel's normal wait. This lets us avoid the 1-second network wait.

**What this PR does / why we need it**: Fixes the pod startup latency identified in #54651 and #55060

**Release note**:
```release-note
NONE
```
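For illustration, here is a minimal sketch of the approach the PR describes: because kubenet guarantees address uniqueness itself, DAD can be switched off in the container's namespace before addresses are assigned, so they never enter the tentative state. The sysctl choices and interface handling are assumptions, not the exact code from #55247.

```go
// Hedged sketch: turn off DAD for an interface so newly assigned
// addresses are usable immediately. Must run inside the container's
// network namespace, before the interface is configured.
package disabledad

import (
	"os"
	"path/filepath"
)

func disableDAD(ifName string) error {
	// accept_dad=0 disables DAD processing; dad_transmits=0 stops the
	// kernel from sending DAD probes at all.
	for _, key := range []string{"accept_dad", "dad_transmits"} {
		p := filepath.Join("/proc/sys/net/ipv6/conf", ifName, key)
		if err := os.WriteFile(p, []byte("0"), 0o644); err != nil {
			return err
		}
	}
	return nil
}
```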
PR #55247 seems to have done the trick; the latency has decreased (before/after latency figures not shown). Closing the issue. Thanks everyone for working on this!
@shyamjvs Do we have any coverage of what happens with these numbers if you're using a third-party CNI provider (Weave, Calico, Flannel, whatever)? If this ONLY fixes kubenet but imposes a perf penalty on everyone non-GCE, I think we should reopen this issue.
sig-scalability does not run tests with other CNI providers, AFAIK.
The delay was on the CNI executable side, so it depends on the plugin being used. That said, it would be good to have that codepath under test in general.
We're not scale-testing with other CNI providers. One, because kubenet seems to be the default option in use at head. Two, because it's simply not possible to scale-test different configurations exhaustively given the resource and time constraints of running large cluster tests, so we choose to validate against (mostly) the default configs. That said, opening an issue to study those options sounds good. However, it would be hard to get this under sig-scale's purview at the moment.
I'd suggest opening a separate issue to track this, as it's unlikely to fit in 1.9.
Ok, thanks.
+1. Is this tracked anywhere?
Not AFAIK. @yujuhong, please open one.
The node e2e density tests have been failing consistently: https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-serial-gce-e2e&include-filter-by-regex=create%20a%20batch%20of%20pods
The only PRs in the diff that touch the kubelet seem related to CNI: 88975e9...03cb11f, with the likely candidate being #51250
In the run prior to the failure, the latency ranged from 3s to 4s. In the first failing run, the 50th-percentile latency ranged from ~16s to ~19s, which is a major increase.
cc @kubernetes/sig-node-bugs @kubernetes/sig-network-bugs
cc @dixudx