Create pod latency increase #54651

Closed
dashpole opened this issue Oct 26, 2017 · 41 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node.
Milestone

Comments

@dashpole (Contributor) commented Oct 26, 2017

The node e2e density tests have been failing consistently: https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-serial-gce-e2e&include-filter-by-regex=create%20a%20batch%20of%20pods
The only PRs in the diff that touch the kubelet seem related to CNI: 88975e9...03cb11f, with the likely candidate being #51250
In the run prior to the failure, latency ranged from 3s to 4s. In the first failing run, 50th-percentile latency ranged from ~16s to ~19s, which is a major increase.

cc @kubernetes/sig-node-bugs @kubernetes/sig-network-bugs
cc @dixudx

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. labels Oct 26, 2017
@dixudx (Member) commented Oct 27, 2017

Wow, this is a huge increase.
@squeed Any comments on the performance? I wonder whether this has any impact on the kubelet and GC.
/cc @luxas

@squeed (Contributor) commented Oct 27, 2017

Interesting, taking a look.

@dixudx (Member) commented Oct 27, 2017

@squeed Thanks for your help.

@squeed (Contributor) commented Oct 27, 2017

Ah, I think I know the issue. The new CNI plugins wait for DAD (duplicate address detection) to finish before proceeding, which takes 3 seconds on Linux. I still need to do some testing (helpfully, my laptop drive died yesterday).

@squeed (Contributor) commented Oct 27, 2017

The address-settle wait adds about 1 second of delay. Is that likely the complete cause, or is something else also involved? (I'm not familiar with this test.)

@squeed (Contributor) commented Oct 27, 2017

Ah, network setup is also serialized behind a mutex; the increase from 20ms to 1020ms for network setup in the critical path is almost certainly the culprit.

The CNI plugins could avoid the DAD delay for now (since we're not doing IPv6), but this will eventually become a problem once IPv6 is supported.

Since the CNI plugin execution itself doesn't need to be synchronized, I think I can move that outside the mutex.

@luxas (Member) commented Oct 27, 2017

This seems to be a v1.9-blocking issue that needs to be fixed/tracked, right?

@bowei (Member) commented Oct 30, 2017

@dnardo @freehan

@freehan (Contributor) commented Oct 30, 2017

Good catch... This is why we need to bump the CNI version early in a release cycle.

@danehans

@squeed sig-network is planning to add IPv6 alpha support for the 1.9 release.

k8s-github-robot pushed a commit that referenced this issue Nov 2, 2017
The CNI plugin can take up to 3 seconds to execute. CNI plugins can safely be
executed in parallel, so yield the lock to speed up pod creation.

Fixes: #54651
k8s-github-robot pushed a commit that referenced this issue Nov 2, 2017
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

kubenet: yield lock while executing CNI plugin.

The CNI plugin can take up to 3 seconds to execute. CNI plugins can safely be
executed in parallel, so yield the lock to speed up pod creation.

This caused problems with the pod latency tests - previously, CNI plugins executed
in under 20ms. Now they must wait for DAD to finish and addresses to leave
tentative state.

Fixes: #54651

**What this PR does / why we need it**:
After upgrading the CNI plugins to v0.6 in #51250, the pod latency tests began failing. This is because the plugins, in order to support IPv6, need to wait for DAD to finish. Because this delay occurs while the kubenet lock is held, it significantly slows down the pod creation rate.

**Special notes for your reviewer**:
The CNI plugins also do locking for their critical paths, so it is safe to run them concurrently.

**Release note**:
```release-note
NONE
```
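
(For context, a minimal sketch of the locking pattern this commit describes, written against a hypothetical plugin type; it is not the actual kubenet code, and the binary path and environment variables shown are illustrative assumptions. The point is that bookkeeping stays under the mutex while the slow CNI plugin execution happens outside it.)

```go
// Illustrative sketch only, not the kubenet implementation; names are hypothetical.
package netsetup

import (
	"os"
	"os/exec"
	"sync"
)

type plugin struct {
	mu     sync.Mutex
	podIPs map[string]string // per-pod state that must stay consistent under mu
}

// setUpPod yields the lock while the CNI binary runs, since plugin executions
// are safe to run concurrently and may block for seconds (e.g. waiting for DAD).
func (p *plugin) setUpPod(podID, netnsPath string) error {
	p.mu.Lock()
	// ...bookkeeping that needs the lock...
	p.mu.Unlock()

	// Slow part, run without holding the lock. Plugin path and env are assumptions.
	cmd := exec.Command("/opt/cni/bin/bridge")
	cmd.Env = append(os.Environ(), "CNI_COMMAND=ADD", "CNI_NETNS="+netnsPath)
	err := cmd.Run()

	p.mu.Lock()
	defer p.mu.Unlock()
	if err != nil {
		return err
	}
	// ...record the result (e.g. the assigned pod IP) in p.podIPs under the lock...
	return nil
}
```

The trade-off is that any state read before unlocking can go stale while the plugin runs, so results have to be re-validated once the lock is re-taken.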
@porridge (Member) commented Nov 6, 2017

/reopen

@k8s-ci-robot (Contributor)

@porridge: you can't re-open an issue/PR unless you authored it or you are assigned to it.

In response to this:

> /reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@porridge (Member) commented Nov 6, 2017

/open

@porridge (Member) commented Nov 6, 2017

To be honest, it does not seem like #54800 has fixed this. We are still seeing high pod startup latency in large and medium scale tests: #55060 (comment)

@luxas (Member) commented Nov 6, 2017

Crazy idea: can we do things conditionally for IPv4 and IPv6?
Does this affect only kubenet, or any CNI plugin (like Weave, Flannel, etc.)?

@porridge (Member) commented Nov 6, 2017 via email

@yujuhong (Contributor) commented Nov 6, 2017

> We disable DAD when using kubenet. Since Kubenet has external guarantees that addresses won't collide, this is safe.

I thought kubenet would be deprecated sooner or later. If that's still true, does that mean we'd need yet another solution for a non-kubenet setup?

@porridge (Member) commented Nov 6, 2017

Another question from a layman: do we know why it's so slow on Linux? Is it talking to other nodes (something like gratuitous ARP)? Does it have a chance to become faster in the future? Does the delay depend on the size of the network, the number of nodes, etc.?

@squeed (Contributor) commented Nov 6, 2017

One way to make it faster on Linux is to enable "optimistic DAD" (RFC 4429), which allows use of tentative addresses. This is still considered an "experimental" feature and is not enabled in all kernels; IIRC, the latest Ubuntu still has it disabled, while CoreOS Container Linux and Debian do have it. It is a namespaced sysctl that would need to be set in the container's network namespace, since new network namespaces take the default configuration rather than inheriting from the host.
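
(A minimal sketch of setting those sysctls, assuming the process is already running inside the container's network namespace, as a CNI plugin would be; the interface name "eth0" and the helper name are assumptions, and kernel support is required.)

```go
// Sketch only; assumes this runs inside the container's network namespace.
package netsetup

import (
	"fmt"
	"os"
)

// enableOptimisticDAD sets the namespaced IPv6 sysctls that allow tentative
// addresses to be used immediately (RFC 4429). Not all kernels/distros enable
// support for this.
func enableOptimisticDAD(iface string) error {
	for _, key := range []string{"optimistic_dad", "use_optimistic"} {
		path := fmt.Sprintf("/proc/sys/net/ipv6/conf/%s/%s", iface, key)
		if err := os.WriteFile(path, []byte("1"), 0o644); err != nil {
			return fmt.Errorf("setting %s: %w", path, err)
		}
	}
	return nil
}
```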

@squeed (Contributor) commented Nov 6, 2017

@yujuhong yeah, you're right. Hopefully by then optimistic_dad will be enabled on all kernels...

@luxas (Member) commented Nov 6, 2017

> I thought kubenet would be deprecated sooner or later.

it will

> If that's still true

yes

> does that mean we'd need yet another solution for a non-kubenet setup

probably, like @squeed said above.

pinging @kubernetes/kubernetes-release-managers already -- this MIGHT be one of the big things we need to notify people about if this doesn't get solved.

@shyamjvs shyamjvs added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Nov 7, 2017
@k8s-github-robot

[MILESTONENOTIFIER] Milestone Issue Current

@dashpole

Issue Labels
  • sig/network sig/node: Issue will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.

@shyamjvs shyamjvs added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Nov 7, 2017
@shyamjvs (Member) commented Nov 7, 2017

@spiffxp Could you approve this for the milestone?

This issue needs to be resolved urgently, as it's causing the 5k-node performance job (which is release-blocking) to fail.

@squeed (Contributor) commented Nov 7, 2017

I'm working on disabling DAD in kubenet (option 3).

Should be ready soon.

@squeed (Contributor) commented Nov 7, 2017

OK, PR filed.

@spiffxp (Member) commented Nov 7, 2017

/status approved-for-milestone
FYI @shyamjvs it looks like @luxas already added this to the v1.9 milestone, and @dchen1107 already added the label manually.

k8s-github-robot pushed a commit that referenced this issue Nov 9, 2017
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

kubenet: disable DAD in the container.

Since kubenet externally guarantees that IP addresses will not conflict, we can short-circuit the kernel's normal wait. This lets us avoid the 1-second network wait.

**What this PR does / why we need it**:
Fixes the pod startup latency identified in #54651 and #55060

**Release note**:
```release-note
NONE
```
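
(A minimal sketch of the kind of change this commit describes, not the actual PR: with kubenet guaranteeing address uniqueness, DAD can be switched off via namespaced sysctls so addresses skip the tentative state. The interface name and the helper are assumptions.)

```go
// Sketch only; assumes this runs inside the pod's network namespace.
package netsetup

import (
	"fmt"
	"os"
)

// disableDAD turns off duplicate address detection for an interface:
// dad_transmits=0 sends no DAD probes and accept_dad=0 skips the check,
// so new IPv6 addresses become usable without the settle delay.
func disableDAD(iface string) error {
	settings := map[string]string{
		"accept_dad":    "0",
		"dad_transmits": "0",
	}
	for key, value := range settings {
		path := fmt.Sprintf("/proc/sys/net/ipv6/conf/%s/%s", iface, key)
		if err := os.WriteFile(path, []byte(value), 0o644); err != nil {
			return fmt.Errorf("setting %s: %w", path, err)
		}
	}
	return nil
}
```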
@shyamjvs (Member) commented Nov 9, 2017

PR #55247 seems to have done the trick. The latency (values below in milliseconds) has decreased

from:

    {
      "data": {
        "Perc100": 4762.418815,
        "Perc50": 3308.810795,
        "Perc90": 3955.747742,
        "Perc99": 4756.553711
      },

to:

    {
      "data": {
        "Perc100": 3178.085005,
        "Perc50": 2186.109036,
        "Perc90": 2768.180504,
        "Perc99": 3126.759942
      },

Closing the issue. Thanks everyone for working on this!

@shyamjvs shyamjvs closed this as completed Nov 9, 2017
@luxas (Member) commented Nov 9, 2017

@shyamjvs Do we have any coverage of what happens with these numbers if you're using a 3rd-party CNI provider (Weave, Calico, Flannel, whatever)? If this ONLY fixes kubenet but imposes a performance penalty on everyone not on GCE, I think we should reopen this issue.

@porridge (Member) commented Nov 9, 2017 via email

@squeed (Contributor) commented Nov 9, 2017

The delay was on the CNI executable side, so it depends on the plugin being used. That said, it would be good to have that code path under test in general.

@shyamjvs (Member) commented Nov 9, 2017

We're not scale-testing with other CNI providers. One, because kubenet seems to be the default option used at head. Two, because it's simply not possible to exhaustively scale-test different configurations, given the resource and time constraints of running large cluster tests. So we choose to validate against (mostly) the default configs.

That said, opening an issue to study those options sounds good. However, it would be hard to get this under sig-scale's purview at the moment.

@porridge (Member) commented Nov 9, 2017 via email

@luxas (Member) commented Nov 9, 2017

Ok, thanks.
cc @timothysc FYI

@yujuhong (Contributor)

> I'd suggest opening a separate issue to track this, as it's unlikely to fit in 1.9.

+1. Is this tracked anywhere?

@luxas (Member) commented Nov 13, 2017 via email
