
Kubelet OOM killing in 'g1-small' node during huge-cluster perf test #47865

Closed
shyamjvs opened this issue Jun 21, 2017 · 37 comments
Assignees
Labels
kind/bug, priority/critical-urgent, sig/network, sig/node, sig/scalability
Milestone

Comments

@shyamjvs (Member)

While running scalability tests today (as part of #47344) on a 4000-node GCE cluster, this happened during density test termination; the load test is currently running.
The density test failed because one of its pods' conditions was not being updated, and digging in a bit showed that a couple of kubelets (including the one where that pod was running) had crashed:

I0621 08:08:26.374] Jun 21 08:08:26.372: INFO: Waiting up to 3m0s for all (but 50) nodes to be ready
I0621 08:08:27.435] Jun 21 08:08:27.435: INFO: Condition Ready of node e2e-enormous-cluster-minion-group-1-xdwx is false instead of true. Reason: NodeStatusUnknown, message: Kubelet stopped posting node status.
I0621 08:08:27.437] Jun 21 08:08:27.437: INFO: Condition Ready of node e2e-enormous-cluster-minion-group-nxl2 is false instead of true. Reason: NodeStatusUnknown, message: Kubelet stopped posting node status.
..... repeats

From the kernel logs:

Jun 21 14:52:07.298991 e2e-enormous-cluster-minion-group-nxl2 kernel: Out of memory: Kill process 13774 (event-exporter) score 1684 or sacrifice child
Jun 21 14:52:07.312821 e2e-enormous-cluster-minion-group-nxl2 kernel: Killed process 13774 (event-exporter) total-vm:1268972kB, anon-rss:1193588kB, file-rss:0kB
Jun 21 15:09:02.204129 e2e-enormous-cluster-minion-group-nxl2 kernel: fluentd invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=883
Jun 21 15:09:02.298774 e2e-enormous-cluster-minion-group-nxl2 kernel: fluentd cpuset=1e88c29d9ecdec0d6d2e380aa6cf9c7b11db5a60b0bda35ac2b3694a58232b47 mems_allowed=0
..
..
Jun 21 17:06:42.055581 e2e-enormous-cluster-minion-group-nxl2 kernel: Memory cgroup out of memory: Kill process 16497 (ip-masq-agent) score 1463 or sacrifice child
Jun 21 17:06:42.055604 e2e-enormous-cluster-minion-group-nxl2 kernel: Killed process 22398 (iptables-restor) total-vm:25744kB, anon-rss:3660kB, file-rss:0kB
Jun 21 17:07:46.960055 e2e-enormous-cluster-minion-group-nxl2 kernel: iptables-restor invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=996
Jun 21 17:07:46.960183 e2e-enormous-cluster-minion-group-nxl2 kernel: iptables-restor cpuset=8f72983ec1d83e25928f29a8b1ad953265489b4bf721e922db68bd70b11f2f31 mems_allowed=0
..
..
Jun 21 17:08:39.596296 e2e-enormous-cluster-minion-group-nxl2 kernel: Memory cgroup out of memory: Kill process 23412 (iptables) score 1866 or sacrifice child
Jun 21 17:08:39.596318 e2e-enormous-cluster-minion-group-nxl2 kernel: Killed process 23412 (iptables) total-vm:23460kB, anon-rss:7460kB, file-rss:4kB
Jun 21 17:08:43.074466 e2e-enormous-cluster-minion-group-nxl2 kernel: iptables invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=996
Jun 21 17:08:43.074553 e2e-enormous-cluster-minion-group-nxl2 kernel: iptables cpuset=db1fd653a6e6cd30ed40ddf44828ad5159f83e75a02bee69bf8c648335b75e7e mems_allowed=0

The cluster is still running; to reach the node:
gcloud compute ssh e2e-enormous-cluster-minion-group-nxl2 --project kubernetes-scale --zone us-east1-a
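
(For reference: once on the node, the OOM kills above can be pulled straight out of the kernel log. A minimal sketch, assuming a systemd-based node image; fall back to the dmesg form otherwise:)

```bash
# List recent OOM-killer activity on the node (sketch; assumes journalctl is
# available on the node image -- use the dmesg form otherwise).
journalctl -k --no-pager | grep -E 'invoked oom-killer|Out of memory|Killed process' | tail -n 50

# Equivalent without journalctl:
dmesg -T | grep -E 'invoked oom-killer|Out of memory|Killed process' | tail -n 50
```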

cc @kubernetes/sig-node-bugs @kubernetes/sig-scalability-misc @dchen1107 @yujuhong @gmarek

@k8s-ci-robot added sig/node, kind/bug, and sig/scalability labels Jun 21, 2017
@shyamjvs added this to the v1.7 milestone Jun 21, 2017
@shyamjvs (Member Author)

/assign @dchen1107
Feel free to reassign as appropriate.

@shyamjvs (Member Author)

If you can confirm that the reason is due to not having large-enough nodes, we can rerun with larger ones.

@gmarek (Contributor) commented Jun 21, 2017

This basically means that between 1.6 and 1.7 resource usage on Nodes grew enough to cause widespread OOMs on 1.7GB machines, when they're running ~30 pause Pods.

@shyamjvs (Member Author)

Seems like nodes are crashing from time to time (even a bit more during the load test, I guess):

NAME                                       STATUS                     AGE       VERSION
e2e-enormous-cluster-minion-group-1-3g2q   NotReady                   6h        v1.8.0-alpha.1.73+a3501fb9948f6a
e2e-enormous-cluster-minion-group-2-fhh2   NotReady                   6h        v1.8.0-alpha.1.73+a3501fb9948f6a
e2e-enormous-cluster-minion-group-2-mt54   NotReady                   6h        v1.8.0-alpha.1.73+a3501fb9948f6a
e2e-enormous-cluster-minion-group-2-nhtw   NotReady                   6h        v1.8.0-alpha.1.73+a3501fb9948f6a
e2e-enormous-cluster-minion-group-2-zh8l   NotReady                   6h        v1.8.0-alpha.1.73+a3501fb9948f6a
e2e-enormous-cluster-minion-group-3-23h6   NotReady                   6h        v1.8.0-alpha.1.73+a3501fb9948f6a
e2e-enormous-cluster-minion-group-3-gtgh   NotReady                   6h        v1.8.0-alpha.1.73+a3501fb9948f6a
e2e-enormous-cluster-minion-group-6000     NotReady                   6h        v1.8.0-alpha.1.73+a3501fb9948f6a

Most of them are OOMs. Let's try with bigger machines tomorrow and see if the problem persists.

@yujuhong (Contributor) commented Jun 21, 2017

This basically means that between 1.6 and 1.7 resource usage on Nodes grew enough to cause widespread OOMs on 1.7GB machines, when they're running ~30 pause Pods.

Yep. There are many add-on pods, so it's hard to guess which one uses more resources in 1.7 without a side-by-side comparison. A new daemonset (ip-masq-agent) was added too, so some increase in resource usage may be expected.

kube-proxy was using quite a lot of memory, but I assumed this was by design since the test created ~13k services.

$ kubectl get services --all-namespaces | wc -l
13125

The only thing that caught my attention is that ip-masq-agent got OOM killed because it exceeded its own memory limit. I think the limit might be too small for the load? /cc @dnardo
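
(For context, the configured limit can be read off the daemonset spec with something like the following. A sketch, assuming the addon is deployed as an `ip-masq-agent` daemonset in `kube-system`:)

```bash
# Print the ip-masq-agent container's resource requests and limits
# (daemonset name and namespace assumed; adjust if the addon differs).
kubectl -n kube-system get daemonset ip-masq-agent \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
```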

@yujuhong (Contributor)

Oops, I cc'd the wrong person. It should be @dnardo because of #46782.

@dnardo (Contributor) commented Jun 21, 2017

The limits for ip-masq-agent are pretty small, so I doubt it's taking too many resources. If it was OOM-killed then maybe the limit was too small; that said, I set it to double the observed 24-hour max, so I'm a bit surprised.

@dchen1107 (Member)

@dnardo, what is the current limit? I asked this in another issue/PR, but no one answered my question. @matchstick Are we sure we want to enable this by default for the 1.7 release? I raised my concern about this before at #46651 (comment)

@davidopp This is the concern I was talking to you about yesterday regarding the 1.7 release: the newly added daemonsets on every node. This one alone can make a node useless. We need to make sure nodes are large enough to accommodate all those default daemons and daemonsets. Your spreadsheet can help answer this question. @kubernetes/kubernetes-release-managers We should include this information in our release notes.

@yujuhong (Contributor)

@dnardo, what is the current limit? I asked this in another issue/PR, but no one answered my question. @matchstick Are we sure we want to enable this by default for the 1.7 release? I raised my concern about this before at #46651 (comment)

The memory limit is only 8MB, which is pretty small. I didn't mean to say that the new daemon is the culprit. Any existing daemon on the node could've had a significant increase in resource usage, or all of them could have collectively pushed memory over the limit. It's hard to pinpoint the exact cause without a baseline (1.6) to compare against.

@dnardo (Contributor) commented Jun 22, 2017

I'm less concerned about ip-masq-agent than I am about this:

Jun 21 17:07:46.960055 e2e-enormous-cluster-minion-group-nxl2 kernel: iptables-restor invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=996
Jun 21 17:07:46.960183 e2e-enormous-cluster-minion-group-nxl2 kernel: iptables-restor cpuset=8f72983ec1d83e25928f29a8b1ad953265489b4bf721e922db68bd70b11f2f31 mems_allowed=0

Why is iptables-restore being killed? Wouldn't that be kube-proxy calling iptables-restore?

ip-masq-agent doesn't call that.

Lastly, even if the ip-masq-agent was killed, it wouldn't have caused any issues. It would have run at least once, and that would have set up the ip-masq rules. They would never have needed to change after that.
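
(That is easy to double-check on a node. A sketch, where the `IP-MASQ-AGENT` chain name is my assumption of what the agent installs in the nat table:)

```bash
# The masquerade rules should still be present even after the agent process
# was OOM-killed (chain name assumed; dump the whole nat table if it differs).
iptables -t nat -L IP-MASQ-AGENT -n -v 2>/dev/null || iptables -t nat -S | grep -i masq
```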

@dchen1107 (Member) commented Jun 22, 2017

Why is iptables-restore's oom_score_adj so high (996)? If it is a child process of kube-proxy, it should inherit kube-proxy's oom_score_adj, which I set to a much lower value back in the 1.4 release as a temporary workaround until we have the full story for #22212.

Is there a regression in this release? Did we change kube-proxy's oom_score_adj when making it a critical static pod? cc/ @vishh
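
(A quick way to check this on an affected node. A sketch, with illustrative pgrep patterns; the iptables-restore children are short-lived, so they may not always be caught:)

```bash
# Compare kube-proxy's oom_score_adj with that of any iptables-restore child
# it has spawned at this moment (run on the node).
for pid in $(pgrep -x kube-proxy) $(pgrep -f iptables-restore); do
  printf '%s\t%s\t%s\n' "$pid" "$(cat /proc/"$pid"/comm)" "$(cat /proc/"$pid"/oom_score_adj)"
done
```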

k8s-github-robot pushed a commit that referenced this issue Jun 22, 2017
Automatic merge from submit-queue (batch tested with PRs 42252, 42251, 42249, 47512, 47887)

Bump the memory request/limit for ip-masq-daemon.

issue #47865
@gmarek (Contributor) commented Jun 22, 2017

We'll run a test using n1-standard-1s to see if they have enough memory.

@gmarek (Contributor) commented Jun 22, 2017

This also happens on n1-standard-1 nodes, which seems bad. Ref. #47899

@yujuhong (Contributor)

Jun 22 15:04:47 e2e-enormous-cluster-minion-group-06pw kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jun 22 15:04:47 e2e-enormous-cluster-minion-group-06pw kernel: [ 2003]     0  2003      257        1       4       2        0          -998 pause
Jun 22 15:04:47 e2e-enormous-cluster-minion-group-06pw kernel: [ 2038]     0  2038     2439     1040       9       5        0           996 ip-masq-agent
Jun 22 15:04:47 e2e-enormous-cluster-minion-group-06pw kernel: [12226]     0 12226     9707     4210      24       3        0           996 iptables-restor
Jun 22 15:04:47 e2e-enormous-cluster-minion-group-06pw kernel: Memory cgroup out of memory: Kill process 12226 (iptables-restor) score 1976 or sacrifice child
Jun 22 15:04:47 e2e-enormous-cluster-minion-group-06pw kernel: Killed process 12226 (iptables-restor) total-vm:38828kB, anon-rss:15132kB, file-rss:1708kB
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel: iptables-restor invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=996
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel: iptables-restor cpuset=15f0dda763e8e64e102fb993f0dd5047554c1cf6c266f74974a093a2c564f712 mems_allowed=0
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel: CPU: 0 PID: 12302 Comm: iptables-restor Not tainted 4.4.52+ #1
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel: Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  0000000000000000 ffff8800b0483ca8 ffffffff8130b7d4 ffff8800b0483d88
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  ffff8801281ab500 ffff8800b0483d18 ffffffff811a0523 ffff8800b0483ce0
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  ffffffff8113d330 ffff8801281ab500 0000000000000206 ffff8800b0483cf0
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel: Call Trace:
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff8130b7d4>] dump_stack+0x63/0x8f
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff811a0523>] dump_header+0x65/0x1d4
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff8113d330>] ? find_lock_task_mm+0x20/0xb0
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff8113dacd>] oom_kill_process+0x28d/0x430
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff8119bbdb>] ? mem_cgroup_iter+0x1db/0x390
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff8119e0e4>] mem_cgroup_out_of_memory+0x284/0x2d0
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff8119eb59>] mem_cgroup_oom_synchronize+0x2f9/0x310
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff81199820>] ? memory_high_write+0xc0/0xc0
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff8113e1a8>] pagefault_out_of_memory+0x38/0xa0
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff81045c17>] mm_fault_error+0x77/0x150
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff810460e4>] __do_page_fault+0x3f4/0x400
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff81046112>] do_page_fault+0x22/0x30
Jun 22 15:04:52 e2e-enormous-cluster-minion-group-06pw kernel:  [<ffffffff815a94d8>] page_fault+0x28/0x30

@dnardo iptables-restore was running in ip-masq-agent's memory cgroup. That's what caused ip-masq-agent to be OOM-killed. From the numbers above, the new limit would not be enough.
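
(For anyone reproducing this: the cgroup membership can be confirmed from /proc while the process is alive. A sketch, assuming the usual cgroup v1 layout on these nodes:)

```bash
# Show which memory cgroup a live iptables-restore process is charged to,
# plus that cgroup's limit and current usage (cgroup v1 paths assumed; the
# process is short-lived, so catching it may need a retry).
PID=$(pgrep -f iptables-restore | head -n 1)
CG=$(awk -F: '$2 == "memory" { print $3 }' /proc/"$PID"/cgroup)
echo "memory cgroup: $CG"
cat "/sys/fs/cgroup/memory${CG}/memory.limit_in_bytes"
cat "/sys/fs/cgroup/memory${CG}/memory.usage_in_bytes"
```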

@dnardo (Contributor) commented Jun 22, 2017

I think what might be happening is that when ip-masq-agent writes out its rules, it may be reading all the iptables rules that are currently configured. That may explain the usage here. Let me take a look and see.
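
(That is easy to sanity-check on a node from the affected cluster. A sketch: the size of an `iptables-save` dump is a rough proxy for how much rule data the agent has to buffer when it reads back the existing rules.)

```bash
# Rough proxy for how much rule data ip-masq-agent would read back:
# NAT rule count and total dump size, both of which scale with services.
iptables-save -t nat | wc -l   # number of rules/lines
iptables-save -t nat | wc -c   # bytes
```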

@gmarek (Contributor) commented Jun 22, 2017

Thanks @dnardo

@shyamjvs (Member Author)

@gmarek Do we have the apiserver logs available somewhere for some 5k-node run on 1.6? I can't find them anywhere, and they'd be useful for my debugging work. Also, is there any way to verify whether the 1.6 scale tests ran with or without services?

@dchen1107 (Member) commented Jun 23, 2017

We had several discussions offline related to this. Here is a summary of the decisions and the action items I have: cc/ @kubernetes/kubernetes-release-managers

  1. @dnardo's comment at #47865 (comment) was the last straw that made us decide to disable ip-masq-agent by default. On a large cluster with many services, the memory usage of ip-masq-agent can jump to 100M, 200M, ... just like kube-proxy, since both scale with the number of nodes, services, etc. On a large cluster with many small nodes (4GB memory), that overhead, incurred before the user schedules any workloads, is unacceptable.

The decision is to disable ip-masq-agent by default for the OSS k8s 1.7 release. @dnardo has a pending PR for this.

  2. On the other hand, we understand there are users waiting for this feature (RFC 1918 non-masquerade handling and network policy). We are going to document in detail how to enable it by
    • deploying the daemonset and
    • turning on the kubelet flag,
      and how much extra overhead the user might incur, so that users can make an informed decision (see the sketch after this list). @dnardo is going to write the doc for this.

Also, @dnardo and the network team are working on how to reduce the overhead. They have several proposals already.

  3. @shyamjvs and @gmarek are going to re-run the scalability tests without the services tests, so that we can compare the results with the last release. @gmarek, thanks for running the test with services; I did have concerns about ip-masq-agent's overhead, and raised them a couple of times, but we didn't have the data to make the final decision. Thanks.

  4. The node team has a node perf dashboard. @yguo0905 is collecting data to compare the memory usage of kubelet and docker on the 1.6 and 1.7 releases. We should make sure there is no regression.
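
(For item 2, the enablement would look roughly like the sketch below. The manifest filename and CIDR value are placeholders, and the exact addon wiring is what @dnardo's doc will pin down.)

```bash
# 1) Deploy the ip-masq-agent daemonset (placeholder manifest path).
kubectl apply -f ip-masq-agent-daemonset.yaml

# 2) Run the kubelet with the non-masquerade CIDR flag (illustrative value):
#    kubelet ... --non-masquerade-cidr=10.0.0.0/8
```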

@shyamjvs (Member Author)

@dchen1107 Thanks a lot for the detailed update!

k8s-github-robot pushed a commit that referenced this issue Jun 23, 2017
Automatic merge from submit-queue

Remove limits from ip-masq-agent for now and disable ip-masq-agent in GCE

ip-masq-agent, when issuing an iptables-save, will read all iptables rules configured on the node. This means that ip-masq-agent's memory requirements grow with the number of iptables rules (i.e. with the number of services) on the node.



#47865
@shyamjvs (Member Author)

FYI, I've uploaded the logs from the current run of gce-enormous-cluster to GCS (available here) and brought down the cluster. Re-kicked a new job, with both services and ip-masq-agent disabled this time. Let's see how much this helps.

@gmarek (Contributor) commented Jun 23, 2017

@shyamjvs started the test at ~10 PM PDT (thanks a lot!). The load test should finish in ~12h, i.e. Friday 10 AM PDT.

@gmarek (Contributor) commented Jun 23, 2017

@dchen1107 - The load test passed. It's highly likely that the density test will pass as well, which means we're golden.

We'll try running those tests with services enabled, but that's not a blocker for the release.

@yguo0905 (Contributor)

Here are the resource usage stats for both 1.6.6 and 1.7.0.
https://docs.google.com/spreadsheets/d/1HO3okawImtgbTbvC5SKKl-5a5Y-Bb1u-FIFxO6yvfK4

@shyamjvs (Member Author)

Yup, both the load and density tests passed, and with no high-latency requests. The 99th percentile latency for listing pods fell all the way from 6s to ~1.5s. I'll verify this weekend whether just ip-masq-agent created this mischief, or services too.

@dchen1107 (Member)

@shyamjvs and @gmarek, thanks for the test results. Please share the results with services enabled later.

From looking at @yguo0905's data, there is not much change in the memory footprint of either kubelet or docker (same 1.11.2 anyway) between 1.6.6 and 1.7.0-beta3.

I am closing the issue. Thanks, everyone!

k8s-github-robot pushed a commit that referenced this issue Jun 24, 2017
Automatic merge from submit-queue (batch tested with PRs 47993, 47892, 47591, 47469, 47845)

Use a different env var to enable the ip-masq-agent addon.

We shouldn't mix setting the non-masq-cidr with enabling the addon.



#47865
@shyamjvs (Member Author) commented Jun 24, 2017

And... the load test failed with services enabled. We are seeing high QPS, like before (similar to #47899 (comment)). Disabling ip-masq-agent did help, by removing some OOMs and the pod-status/event update requests they triggered from the kubelet, but now fluentd seems to be doing something similar (it was also there before, IIRC, but ip-masq-agent dominated).

Out of the 9800 QPS of 429s, 7k are from the kubelet and the rest from NPD. Half of those 7k requests are due to fluentd being OOM-killed (which kubelets respond to by sending PUT pod-status and PATCH events requests). The other half are PATCH node-status calls (same for NPD), but that's just a consequence, IIUC.

From the kernel logs on the nodes, fluentd and event-exporter seem to be oom-killed frequently. From fluentd logs it seems like it's not able to handle the log volume:

  2017-06-24 11:04:38 +0000 [warn]: suppressed same stacktrace
2017-06-24 11:05:07 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-06-24 11:05:36 +0000 error_class="Google::Apis::RateLimitError" error="rateLimitExceeded: Insufficient tokens for quota 'WriteGroup' and limit 'CLIENT_PROJECT-100s' of service 'logging.googleapis.com' for consumer 'project_number:51872839970'." plugin_id="object:3fc8e6c24bd0"
  2017-06-24 11:05:48 +0000 [warn]: suppressed same stacktrace
2017-06-24 11:05:48 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-06-24 11:05:48 +0000 error_class="Google::Apis::RateLimitError" error="rateLimitExceeded: Insufficient tokens for quota 'WriteGroup' and limit 'CLIENT_PROJECT-100s' of service 'logging.googleapis.com' for consumer 'project_number:51872839970'." plugin_id="object:3fc8e6c24bd0"
..
..

cc @crassirostris

@shyamjvs (Member Author)

We can either try running fluentd with higher memory limits, or find and reduce the source of this high log traffic. The only difference between this run and the last one (which passed) is that services are enabled, so kube-proxy is most likely the one doing the mischief. We have the logging verbosity set to v1 (https://github.com/kubernetes/test-infra/blob/master/jobs/ci-kubernetes-e2e-gce-enormous-cluster.env#L23) and the kube-proxy logs are still huge.
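
(As a sanity check that the verbosity setting actually reached kube-proxy, something like this can be run against a node. A sketch, reusing the project/zone from the ssh command earlier in this thread:)

```bash
# Confirm the --v flag kube-proxy is actually running with on a node.
gcloud compute ssh e2e-enormous-cluster-minion-group-nxl2 \
  --project kubernetes-scale --zone us-east1-a \
  -- "ps -o args= -C kube-proxy | tr ' ' '\n' | grep -- '--v='"
```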

@shyamjvs (Member Author)

kube-proxy.log was 920 B without services and ~6-7 GB (rotated logs included) with services. That's mainly because kube-proxy prints out the iptables rules, which is far too much to log on large clusters with many services.
cc @kubernetes/sig-network-misc
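
(For the record, the numbers above come from simply looking at the files on a node. A sketch of the measurement, assuming the usual log path on these GCE nodes:)

```bash
# Total size of kube-proxy logs on a node, rotated files included.
ls -lh /var/log/kube-proxy.log*
du -ch /var/log/kube-proxy.log* | tail -n 1
```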

@crassirostris

@shyamjvs

From the kernel logs on the nodes, fluentd and event-exporter seem to be oom-killed frequently. From fluentd logs it seems like it's not able to handle the log volume:

Those are different problems. The quota issues are expected to go away in the coming week; OOM issues are expected under the high load (more than 200KB/sec).

@gmarek (Contributor) commented Jun 26, 2017

@bowei, @kubernetes/sig-network-bugs - kube-proxy shouldn't log this much at the v1 level. @shyamjvs is going to file an issue for that.

@shyamjvs (Member Author) commented Jun 26, 2017

Filed an issue.
@crassirostris Thanks for the lead.

OOM issues are expected under the high load (more than 200KB/sec)

If that's the case, we are sure to thrash fluentd even on moderately big clusters with a fair number of service endpoints, since just this one line in kube-proxy can produce a log line of multiple MBs.
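
(To put a number on that, the longest line in a node's kube-proxy log can be measured directly. A sketch, assuming the same log path as above:)

```bash
# Length in bytes of the longest single line in kube-proxy.log; a full
# iptables ruleset logged in one line is what shows up here.
awk 'length > max { max = length } END { print max " bytes" }' /var/log/kube-proxy.log
```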

@spiffxp (Member) commented Jun 26, 2017

/remove-priority P0
/priority critical-urgent

@k8s-ci-robot added the priority/critical-urgent label and removed the priority/P0 label Jun 26, 2017
@shyamjvs (Member Author)

FYI, we are now running the test with fluentd disabled but services still enabled to check if there's any problem with kube-proxy.
