
kube-proxy constantly syncing/restoring iptables rules, consuming CPU resources #1158

Closed
atombender opened this issue Feb 20, 2017 · 46 comments
Labels
area/performance (Performance related issues), co/xhyve, kind/bug (Categorizes issue or PR as related to a bug)

Comments

@atombender

atombender commented Feb 20, 2017

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Minikube version (use minikube version): 0.16.0

Environment:

  • OS (e.g. from /etc/os-release): macOS 10.12.3
  • VM Driver (e.g. cat ~/.minikube/machines/minikube/config.json | grep DriverName): xhyve
  • ISO version (e.g. cat ~/.minikube/machines/minikube/config.json | grep ISO): minikube-v1.0.6.iso
  • Install tools: -
  • Others: ingress addon enabled.

What happened:

localkube is consuming about 12% CPU constantly, even though no container is actively running. On host, docker-machine-driver-xhyve is consuming ~30% CPU.

Nothing much is actually running:

$ kubectl get --all-namespaces pods            
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE
kube-system   default-http-backend-27qh8       1/1       Running   0          5h
kube-system   kube-addon-manager-minikube      1/1       Running   0          5h
kube-system   kube-dns-v20-k1vv5               3/3       Running   0          5h
kube-system   kubernetes-dashboard-scg1j       1/1       Running   0          5h
kube-system   nginx-ingress-controller-mzn9l   1/1       Running   0          5h

Verified with ps that no container is using that CPU: It's all localkube. According to ps, localkube has also allocated 10GB of virtual memory.

Here is a gist with ps output, plus output from running journalctl -f for half a minute or so. It mostly shows it asking about some containers that no longer exist.

Nothing is being emitted to any container logs.

Problem re-occurs if I kill the process.

Here's an strace log (-fF -s10000 -tt). Looks like it's constantly spawning iptables, iptables-save and iptables-restore.
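For anyone else digging in: a quick way to quantify the churn is to count iptables-family execve calls in an strace log. A sketch only; the heredoc sample below stands in for a real log, which you would capture with something like strace -f -e trace=execve -p followed by the localkube pid:

```shell
# Sample strace lines standing in for a real capture of localkube
cat > /tmp/localkube.strace <<'EOF'
[pid 19500] execve("/sbin/iptables", ["iptables", "-w2", "-N", "KUBE-SERVICES", "-t", "filter"], ...)
[pid 19508] execve("/sbin/iptables-save", ["iptables-save", "-t", "filter"], ...)
[pid 19510] execve("/sbin/iptables-restore", ["iptables-restore", "--noflush", "--counters"], ...)
EOF
# Count how often iptables/iptables-save/iptables-restore get spawned
grep -c 'execve("/sbin/iptables' /tmp/localkube.strace
```

On a real trace over a fixed interval, this count divided by the interval gives a rough forks-per-second figure to compare across versions.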

What you expected to happen:

I expected localkube not to use much CPU when the system is idle.

How to reproduce it (as minimally and precisely as possible):

No idea. I did install Helm (the tiller controller), but I uninstalled it.

r2d4 added the kind/bug label on Feb 21, 2017
@eden
Contributor

eden commented Mar 3, 2017

This is most likely due to this issue with kube-proxy. The fix hasn't yet been released in the latest non-beta version of kubernetes.

Unfortunately, trying to force localkube to use the userspace proxy by passing --extra-config=proxy.Mode=userspace to minikube start does not work. That would have been a temporary workaround until kube-proxy is fixed.

@dimpavloff

@eden kubernetes/kubernetes#26637 seems to be an issue only with multiple schedulers and controller-managers so could it be something else?

@eden
Contributor

eden commented Mar 3, 2017

@dimpavloff Initially, I thought the same thing, but after increasing the verbosity of minikube's logs to 10, I see this in localkube's logs nearly continuously:

Mar 03 10:22:33 minikube localkube[3315]: I0303 10:22:33.052801    3315 proxier.go:805] Syncing iptables rules
Mar 03 10:22:33 minikube localkube[3315]: I0303 10:22:33.095894    3315 proxier.go:1103] Port "nodePort for kube-system/kubernetes-dashboard:" (:30000/tcp) was open before and is still needed
Mar 03 10:22:33 minikube localkube[3315]: I0303 10:22:33.096150    3315 proxier.go:1311] Restoring iptables rules: *filter
Mar 03 10:22:33 minikube localkube[3315]: I0303 10:22:33.103277    3315 proxier.go:798] syncProxyRules took 50.469613ms
Mar 03 10:22:33 minikube localkube[3315]: I0303 10:22:33.103304    3315 proxier.go:567] OnEndpointsUpdate took 50.53347ms for 5 endpoints

There's a lot going on in kubernetes/kubernetes#26637, but this fix in particular looks like it might help: kubernetes/kubernetes#41223

@keimoon

keimoon commented Mar 24, 2017

@eden kubernetes fixed this issue in v1.5.4, so when will minikube support this version?

@eden
Contributor

eden commented Mar 24, 2017

You should be able to get minikube to start using 1.5.4 by starting it with the --kubernetes-version option, but that version must first be built and uploaded. You can see what's available here. Right now 1.5.4 doesn't look like it's in the list.

You can use 1.6, too, if you're ok with a beta, and when I did, I noticed the iptables issue is gone. However, I noticed something new that seems to cause a small amount of continuous CPU use (although anecdotally not as much). I had to switch back because I wasn't ready for some other changes that 1.6 introduced.

@r2d4
Contributor

r2d4 commented Mar 24, 2017

Two points

  • The leader election polling is because of the external hostpath provisioner we've implemented. This is going to be in all future versions, although leader election is a bit overkill for a single-node cluster.

  • We can and should ship a 1.5.4 and 1.5.5. I can do that today. I'm a little reluctant to make those the default since it's going to lead to a bunch of rebasing once we merge the 1.6 branch, but we will make them available.

@mezis

mezis commented Apr 4, 2017

Confirming this issue still exists with Minikube running kubernetes 1.6.0, on a fresh install:

$ minikube version
minikube version: v0.17.1

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T23:37:53Z", GoVersion:"go1.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"dirty", BuildDate:"1970-01-01T00:00:00Z", GoVersion:"go1.7", Compiler:"gc", Platform:"linux/amd64"}

top reports localkube using an average of 4%, with occasional docker usage spikes.

@lefeverd

lefeverd commented Apr 4, 2017

Same CPU issue here, though I'm not sure whether it's related to the original one.
The VM is always using around 20% CPU, even with no containers running other than the system ones.

VM Driver: xhyve
Minikube:

$ minikube version
minikube version: v0.17.1

Kubernetes:

$ kubectl version  
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T19:15:41Z", GoVersion:"go1.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"dirty", BuildDate:"1970-01-01T00:00:00Z", GoVersion:"go1.7", Compiler:"gc", Platform:"linux/amd64"}

Logs:
It seems kube-addon-manager-minikube is consuming a lot; I see the same error happening again and again in the logs:

2017-04-04T19:26:46.261306567Z WRN: == Failed to execute /usr/local/bin/kubectl  apply --namespace=kube-system -f /etc/kubernetes/addons     --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-04-04T19:26:46+0000. 0 tries remaining. ==
2017-04-04T19:26:51.264005597Z WRN: == Kubernetes addon reconcile completed with errors at 2017-04-04T19:26:51+0000 ==
2017-04-04T19:26:51.766514723Z error: no objects passed to create
2017-04-04T19:26:51.769507638Z INFO: == Kubernetes addon ensure completed at 2017-04-04T19:26:51+0000 ==
2017-04-04T19:27:35.408150389Z error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource

EDIT
Apparently the issue above is due to the addon manager version not being compatible with Kubernetes 1.6.0 (see kubernetes/kubernetes#43755).
I tried again with the default Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T19:15:41Z", GoVersion:"go1.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"1970-01-01T00:00:00Z", GoVersion:"go1.7.3", Compiler:"gc", Platform:"linux/amd64"}

No more errors in the logs, but CPU usage is still high.

@racerpeter

For what it's worth, I was able to confirm that the iptables churn is no longer an issue with minikube 0.18.0, which includes kubernetes/kubernetes#41223.

I can also confirm that the fix only cut CPU usage by about half (I was hovering around 25-30% on the osx host, now down to 10-15%), but it's not iptables, as evidenced by the lack of iptables-related output from strace-ing localkube.

@VolCh

VolCh commented May 14, 2017

Freshly installed minikube & kubectl on Ubuntu 17.04 with VirtualBox 5.1.18_Ubuntur114002:
minikube version: v0.19.0

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.3", GitCommit:"0480917b552be33e2dba47386e51decb1a211df6", GitTreeState:"clean", BuildDate:"2017-05-10T15:48:59Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-05-09T23:19:49Z", GoVersion:"go1.7.1", Compiler:"gc", Platform:"linux/amd64"}

Nothing started, nothing configured, just minikube start, and it's consuming ~30% CPU on a Core i7.

@francisu

Happens with these versions:

minikube version: v0.19.1

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.6", GitCommit:"7fa1c1756d8bc963f1a389f4a6937dc71f08ada2", GitTreeState:"clean", BuildDate:"2017-06-16T18:34:20Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"darwin/amd64"}

Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-30T22:03:41Z", GoVersion:"go1.7.3", Compiler:"gc", Platform:"linux/amd64"}

Just did a minikube start and it's taking about 15-18% of my CPU.

@stela

stela commented Jul 3, 2017

It got much worse with minikube v0.20.0 and kubernetes 1.7.0:

$ minikube start --memory=6144 --vm-driver=xhyve --kubernetes-version v1.7.0
...
$ minikube version
minikube version: v0.20.0

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-30T09:51:01Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-30T10:17:58Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

$ minikube ssh
$ top
  PID USER      PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND                                                                                                                           
 3208 root      20   0 10.829g 404.7m  50.7  6.8   2:30.70 S localkube

My mac's Activity Monitor claims docker-machine-driver-xhyve is using about 110% CPU; the 50% reported by top is relative to two CPUs. docker ps on minikube reports only kubernetes system processes, and "kubectl get all" also indicates I didn't run any pods of my own.

@francisu

francisu commented Jul 3, 2017

With 0.19.1 here is a snippet from my logs:

Jul 03 17:35:03 minikube localkube[3600]: E0703 17:35:03.698261    3600 kuberuntime_image.go:106] ListImages failed: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:03 minikube localkube[3600]: W0703 17:35:03.698266    3600 image_gc_manager.go:176] [imageGCManager] Failed to update image list: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:03 minikube localkube[3600]: E0703 17:35:03.919380    3600 remote_runtime.go:163] ListPodSandbox with filter "nil" from runtime service failed: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:03 minikube localkube[3600]: E0703 17:35:03.919406    3600 kuberuntime_sandbox.go:185] ListPodSandbox failed: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:03 minikube localkube[3600]: E0703 17:35:03.919413    3600 generic.go:198] GenericPLEG: Unable to retrieve pods: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:04 minikube localkube[3600]: E0703 17:35:04.920018    3600 remote_runtime.go:163] ListPodSandbox with filter "nil" from runtime service failed: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:04 minikube localkube[3600]: E0703 17:35:04.920046    3600 kuberuntime_sandbox.go:185] ListPodSandbox failed: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:04 minikube localkube[3600]: E0703 17:35:04.920054    3600 generic.go:198] GenericPLEG: Unable to retrieve pods: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:05 minikube localkube[3600]: E0703 17:35:05.920799    3600 remote_runtime.go:163] ListPodSandbox with filter "nil" from runtime service failed: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:05 minikube localkube[3600]: E0703 17:35:05.920826    3600 kuberuntime_sandbox.go:185] ListPodSandbox failed: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:05 minikube localkube[3600]: E0703 17:35:05.920834    3600 generic.go:198] GenericPLEG: Unable to retrieve pods: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:05 minikube localkube[3600]: E0703 17:35:05.928964    3600 kubelet.go:2079] Container runtime not ready: RuntimeReady=false reason:DockerDaemonNotReady message:docker: failed to get docker version: Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:06 minikube localkube[3600]: I0703 17:35:06.265710    3600 kubelet.go:1752] skipping pod synchronization - [container runtime is down]
Jul 03 17:35:06 minikube localkube[3600]: E0703 17:35:06.922085    3600 remote_runtime.go:163] ListPodSandbox with filter "nil" from runtime service failed: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:06 minikube localkube[3600]: E0703 17:35:06.922114    3600 kuberuntime_sandbox.go:185] ListPodSandbox failed: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:06 minikube localkube[3600]: E0703 17:35:06.922123    3600 generic.go:198] GenericPLEG: Unable to retrieve pods: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:07 minikube localkube[3600]: E0703 17:35:07.922878    3600 remote_runtime.go:163] ListPodSandbox with filter "nil" from runtime service failed: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:07 minikube localkube[3600]: E0703 17:35:07.922908    3600 kuberuntime_sandbox.go:185] ListPodSandbox failed: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Jul 03 17:35:07 minikube localkube[3600]: E0703 17:35:07.922917    3600 generic.go:198] GenericPLEG: Unable to retrieve pods: rpc error: code = 2 desc = Cannot connect to the Docker daemon. Is the docker daemon running on this host?

@wstrange

wstrange commented Jul 5, 2017

Also seeing very high system CPU time on 1.7.0 / MacOS / VirtualBox. With 2 CPUs allocated, each is at 50% System CPU. VBox on the mac is chewing up 150% CPU.

@zacheryph

MacOS / VBox / Kube 1.6.4. minikube start -v 3. Kube 1.7 seems to bring the iptables overhead back (and a LOT more CPU usage). 1.6.4 only shows the below repeatedly; I also see eviction manager output every once in a while between these.

Jul 05 19:14:02 minikube localkube[3498]: I0705 19:14:02.157508    3498 config.go:95] Calling handler.OnEndpointsUpdate()
Jul 05 19:14:02 minikube localkube[3498]: I0705 19:14:02.157562    3498 healthcheck.go:223] Not saving endpoints for unknown healthcheck "kube-system/kubernetes-dashboard"
Jul 05 19:14:02 minikube localkube[3498]: I0705 19:14:02.157576    3498 healthcheck.go:223] Not saving endpoints for unknown healthcheck "kube-system/kube-dns"
Jul 05 19:14:03 minikube localkube[3498]: I0705 19:14:03.083730    3498 wrap.go:75] GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (988.922µs) 200 [[localkube/v1.6.4 (linux/amd64) kubernetes/$Format/leader-election] 127.0.0.1:32964]
Jul 05 19:14:03 minikube localkube[3498]: I0705 19:14:03.087432    3498 wrap.go:75] PUT /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (2.64199ms) 200 [[localkube/v1.6.4 (linux/amd64) kubernetes/$Format/leader-election] 127.0.0.1:32964]
Jul 05 19:14:03 minikube localkube[3498]: I0705 19:14:03.088801    3498 config.go:95] Calling handler.OnEndpointsUpdate()
Jul 05 19:14:03 minikube localkube[3498]: I0705 19:14:03.089178    3498 healthcheck.go:223] Not saving endpoints for unknown healthcheck "kube-system/kube-dns"
Jul 05 19:14:03 minikube localkube[3498]: I0705 19:14:03.089329    3498 healthcheck.go:223] Not saving endpoints for unknown healthcheck "kube-system/kubernetes-dashboard"
Jul 05 19:14:04 minikube localkube[3498]: I0705 19:14:04.161904    3498 wrap.go:75] GET /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (1.45693ms) 200 [[localkube/v1.6.4 (linux/amd64) kubernetes/$Format/leader-election] 127.0.0.1:32964]
Jul 05 19:14:04 minikube localkube[3498]: I0705 19:14:04.168189    3498 wrap.go:75] PUT /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (5.547657ms) 200 [[localkube/v1.6.4 (linux/amd64) kubernetes/$Format/leader-election] 127.0.0.1:32964]
Jul 05 19:14:04 minikube localkube[3498]: I0705 19:14:04.168948    3498 config.go:95] Calling handler.OnEndpointsUpdate()

@zacheryph

MacOS / VBox / Kube 1.7.0. minikube --kubernetes-version v1.7.0 -v 3. The part that caught my eye in v1.7.0 is the following output: it seems the runner has a [min] sync period of 0. Maybe this got fixed earlier and is a regression in 1.7? I tried tracking down how this gets set. It appears to be settable via --extra-config, but I couldn't figure out how to set it.

Jul 05 19:26:56 minikube localkube[3483]: I0705 19:26:56.190243    3483 proxier.go:991] Syncing iptables rules
Jul 05 19:26:56 minikube localkube[3483]: I0705 19:26:56.417815    3483 bounded_frequency_runner.go:221] sync-runner: ran, next possible in 0s, periodic in 0s

@zacheryph

I don't think this makes much of a difference either but I see the same results with qemu on Debian 9 as well. Just wanted to give an idea that this appears to be minikube/localkube related and not MacOS related.

@stela

stela commented Jul 6, 2017

@zacheryph great research :) There is a KubeProxyIPTablesConfiguration.MinSyncPeriod setting, see https://godoc.org/k8s.io/kubernetes/pkg/apis/componentconfig#KubeProxyIPTablesConfiguration

// minSyncPeriod is the minimum period that iptables rules are refreshed (e.g. '5s', '1m',
// '2h22m').
MinSyncPeriod metav1.Duration

When I typed dmesg, the kernel logs showed a lot of repeated networking-related entries, so I guess it makes sense. I didn't find the location of any other logs, though.

The KubeProxyIPTablesConfiguration struct is used in the KubeProxyConfiguration struct, as a field named IPTables. To set KubeProxyConfiguration, the key should start with proxy.
As far as I can understand, the way to set it would then be:

--extra-config=proxy.IPTables.SyncPeriod=1m --extra-config=proxy.IPTables.MinSyncPeriod=1m

in order to run the sync every minute instead of continuously. However, the CPU usage stays high for me even with those extra options :( I do see the options being passed on to localkube, though.
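To make the key construction described above concrete: the flattened --extra-config key is just the component name joined with the dot-separated struct field path. An illustrative sketch only; the actual resolution happens inside minikube:

```shell
# component + dot-joined field path + value = extra-config key (illustration)
component="proxy"
field_path="IPTables.MinSyncPeriod"
value="1m"
echo "--extra-config=${component}.${field_path}=${value}"
```

This prints --extra-config=proxy.IPTables.MinSyncPeriod=1m, matching the flag shape used above.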

I copied an strace binary into the minikube VM; running it against the localkube process with the -f flag shows it's continuously forking off iptables. Examples:

[pid 19500] execve("/sbin/iptables", ["iptables", "-w2", "-N", "KUBE-SERVICES", "-t", "filter"], [/* 12 vars */] <unfinished ...>
[pid 19501] execve("/sbin/iptables", ["iptables", "-w2", "-N", "KUBE-SERVICES", "-t", "nat"], [/* 12 vars */] <unfinished ...>
[pid 19502] execve("/sbin/iptables", ["iptables", "-w2", "-C", "INPUT", "-t", "filter", "-m", "comment", "--comment", "kubernetes service portals", "-j", "KUBE-SERVICES"], [/* 12 vars */] <unfinished ...>
[pid 19503] execve("/sbin/iptables", ["iptables", "-w2", "-C", "OUTPUT", "-t", "filter", "-m", "comment", "--comment", "kubernetes service portals", "-j", "KUBE-SERVICES"], [/* 12 vars */] <unfinished ...>
[pid 19504] execve("/sbin/iptables", ["iptables", "-w2", "-C", "OUTPUT", "-t", "nat", "-m", "comment", "--comment", "kubernetes service portals", "-j", "KUBE-SERVICES"], [/* 12 vars */] <unfinished ...>
...
[pid 19508] execve("/sbin/iptables-save", ["iptables-save", "-t", "filter"], [/* 12 vars */] <unfinished ...>
...
[pid 19510] execve("/sbin/iptables-restore", ["iptables-restore", "--noflush", "--counters"], [/* 12 vars */] <unfinished ...>

It also performs quite a few HTTP GET requests against various APIs:

[pid 26767] write(112, "GET /v2/keys/registry/services/e"..., 206 <unfinished ...>
[pid 26828] write(168, "GET /v1.23/containers/json?all=1"..., 228) = 228
[pid 26834] write(168, "GET /v1.23/containers/json?all=1"..., 185 <unfinished ...>
...
[pid 21278] write(262, "GET /v2/keys/registry/services/e"..., 168 <unfinished ...>

@stela

stela commented Jul 6, 2017

I tried setting most of the duration settings described at https://godoc.org/k8s.io/kubernetes/pkg/apis/componentconfig to 1 minute, but had no success in reducing CPU usage.

@oliverbestmann

oliverbestmann commented Jul 14, 2017

--extra-config=proxy.IPTables.SyncPeriod=1m --extra-config=proxy.IPTables.MinSyncPeriod=1m
This did not work for me either; localkube complained that it was not able to parse 1m as an integer.

Setting the options to 5 seconds by specifying it in nanoseconds helped:

  --extra-config=proxy.IPTables.SyncPeriod.Duration=5000000000 \
  --extra-config=proxy.IPTables.MinSyncPeriod.Duration=3000000000
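Since the parser here takes integer nanoseconds rather than Go-style duration strings like 1m, the values are just seconds times 10^9. A quick shell check of the literals above (seconds_to_ns is a throwaway helper, not a minikube command):

```shell
# Convert whole seconds to the integer-nanosecond form these flags expect
seconds_to_ns() { echo $(($1 * 1000000000)); }
seconds_to_ns 5   # SyncPeriod value used above    -> 5000000000
seconds_to_ns 3   # MinSyncPeriod value used above -> 3000000000
```

The same arithmetic gives 30000000000 for a 30-second period.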

@stela

stela commented Jul 14, 2017

@oliverbestmann huge thanks! That worked for me too; the laptop fan is now quiet again. According to https://kubernetes.io/docs/admin/kube-proxy/ the default SyncPeriod is meant to be 30 seconds, so I set it to that in nanoseconds. localkube now consumes "just" 14% according to "ps uaxS" (S includes the child processes), similar to the original bug description.

@r2d4
Contributor

r2d4 commented Jul 14, 2017

I'll set it to the upstream defaults. This regression probably happened because of the new way of configuring kube-proxy; there is no longer a default config struct that we can init. I'll make sure the other defaults are properly set as well.

r2d4 self-assigned this on Jul 14, 2017
r2d4 added a commit to r2d4/minikube that referenced this issue Jul 14, 2017
Set some kube-proxy defaults that got unset through the new way of
configuring kube-proxy. Adding more delay to the iptables syncing reduces
idle CPU load a lot.

See
kubernetes#1158 (comment)
@r2d4
Contributor

r2d4 commented Jul 14, 2017

I've sent #1699 to make those kube-proxy options the default for minikube.

Once that is merged, I would like to set some benchmarks that we would like to hit to make this issue more concrete.

Something like

With default options:

  • idling, localkube consumes less than 15% of CPU and Memory on average

I'm not sure exactly how we could set limits on the driver binary or minikube, since those will probably fluctuate a lot more running on different platforms.

@stela

stela commented Jul 18, 2017

@r2d4 Nice that #1699 fixes the defaults, but since values like "1m" didn't work for overriding the Durations while raw nanoseconds did, is there a time-unit parsing or documentation issue as well?

@r2d4
Contributor

r2d4 commented Jul 18, 2017

@stela I've added an issue here #1712. It should be a relatively simple fix

@stela

stela commented Jul 21, 2017

@r2d4 Thanks!

@daveoconnor

This workaround doesn't seem to work for me:

minikube start --kubernetes-version v1.7.0 --vm-driver xhyve --extra-config=proxy.IPTables.SyncPeriod.Duration=5000000000 --extra-config=proxy.IPTables.MinSyncPeriod.Duration=3000000000

Does this need to be done with more recent code than v0.20.0?

I'm still seeing ~60-80%, ~30%, ~30% CPU usage on the 3 VBoxHeadless processes I have.

@alanbrent

alanbrent commented Jul 25, 2017

@daveoconnor I'm not sure, but perhaps this requires a newer ISO than the current release uses by default. This workaround works for me, and I'm using a newer ISO:

$ minikube config view | grep iso-url
- iso-url: https://storage.googleapis.com/minikube/iso/minikube-v0.22.0.iso

@daveoconnor

daveoconnor commented Jul 25, 2017

@alanbrent Thanks for the response.

$ minikube config view gives me nothing back.

I tried using

$ minikube start --kubernetes-version v1.7.0 \
--iso-url=https://storage.googleapis.com/minikube/iso/minikube-v0.22.0.iso \
--vm-driver xhyve \
--extra-config=proxy.IPTables.SyncPeriod.Duration=5000000000 \
--extra-config=proxy.IPTables.MinSyncPeriod.Duration=3000000000

As per the documentation at https://kubernetes.io/docs/getting-started-guides/minikube/#using-rkt-container-engine. The output I got was:

Starting local Kubernetes v1.7.0 cluster...
Starting VM...
Moving files into cluster...
Setting up certs...
Starting cluster components...
Connecting to cluster...
Setting up kubeconfig...
Kubectl is now configured to use the cluster.

No message of downloading a new ISO so I'm guessing it's already using that.

After trying

$ minikube config set iso-url https://storage.googleapis.com/minikube/iso/minikube-v0.22.0.iso
$ minikube start --kubernetes-version v1.7.0 \
--vm-driver xhyve \
--extra-config=proxy.IPTables.SyncPeriod.Duration=5000000000 \
--extra-config=proxy.IPTables.MinSyncPeriod.Duration=3000000000

The startup output was the same, and I can't be sure the CPU load is any different. It's not spiking quite as high, but I'm not sure that's not a coincidence.

@stela

stela commented Jul 25, 2017

@daveoconnor I think I had to run minikube delete before any new settings from minikube start would take effect (beware: this will of course wipe your kubernetes installation, containers and all).

@daveoconnor

daveoconnor commented Jul 25, 2017

@stela Thanks, good idea. I'm not seeing much change. Here's how that went:

EDIT: removed xhyve driver section because I realised I shouldn't be using it. Sorry if that caused confusion.

$ minikube delete
$ minikube start --kubernetes-version v1.7.0 \
 --extra-config=proxy.IPTables.SyncPeriod.Duration=5000000000 \
 --extra-config=proxy.IPTables.MinSyncPeriod.Duration=3000000000
Starting local Kubernetes v1.7.0 cluster...
Starting VM...
Moving files into cluster...
Setting up certs...
Starting cluster components...
Connecting to cluster...
Setting up kubeconfig...
Kubectl is now configured to use the cluster.

CPU processes still at 40-70%, 20-30%, 20-30%.

@philipn

philipn commented Aug 14, 2017

FWIW, I am seeing significantly lower CPU usage (from ~100% utilization down to ~20%) with minikube v0.21.0 (and no custom settings). Killing all of the localkube and k8s services brings VirtualBox down to 8% or so.

@atombender
Author

It's been 8 months, anything happening with this?

I'm not seeing any difference with the above flags (I've confirmed that localkube is using them) on Minikube 0.22.2, Kubernetes 1.7.5. localkube is still using 6-12% CPU on the VM, and docker-machine-driver-xhyve is using about 20% on the host. As a result, the fan on my machine runs constantly, and it's just not a very pleasant developer experience.

@oliverbestmann

CPU usage of around 15 to 20 percent is after those flags were applied as defaults. A little tracing of the minikube binary reveals that most of the time is spent in cgroup/cadvisor stats, getting the CPU usage of the containers, and in the various APIs queried internally by localkube or one of its components.

@alanbrent

alanbrent commented Oct 5, 2017

I'm not sure if this is a workable solution for everyone, but I've solved this problem by switching to the kubeadm bootstrapper. You can do so either by passing the CLI flag when you start (minikube start --bootstrapper=kubeadm) or by setting the configuration parameter globally (minikube config set bootstrapper kubeadm).

Please note that in order for this to be effective (I believe) you'll need to minikube delete first.

@eden
Contributor

eden commented Oct 19, 2017

@alanbrent just tried this with minikube 0.22.3, and the VM is still consuming around 18%-20% on the host. There's no localkube to blame anymore, but I see various kubernetes components collectively consuming around 5-20% CPU on the VM itself.

It does not appear that kubeadm is helping things here.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 17, 2018
@eden
Contributor

eden commented Jan 17, 2018

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Jan 17, 2018
@koniiiik

koniiiik commented Apr 10, 2018

With minikube v0.25.2, running kubernetes v1.9.4, I still saw localkube being the top CPU hog, oscillating between about 40 and 90% CPU usage inside the VM (this is on Linux with VirtualBox, on an i7-8550U). Never mind, please disregard that: the machine was somehow stuck in the lowest-power state at just 400 MHz CPU clock speed; with proper CPU frequency scaling it remains at much more reasonable levels, around 2-5%. Nevertheless, it maintains CPU usage at levels just high enough to keep the CPU from entering low-power states, which drains laptop batteries without doing anything. OTOH, I get that k8s is designed for DC environments rather than laptops…

Attaching a strace of localkube (localkube.strace.gz); looks like it's constantly polling stuff inside /sys/fs/cgroup/; and a fragment of the journal from the VM (minikube-journal.gz) where I don't really see anything interesting…

@cmbernard333

I am affected by this issue as well. Running minikube with the virtualbox driver consistently drives the CPU well over 100%.

@hsyed

hsyed commented Jun 26, 2018

So I'm looking at hyperkit thrashing at 30% CPU in the idle state on a Mac Pro.

When I minikube ssh into the VM and use top, the control plane components all stay below 2%. I don't know how accurate the CPU time-slicing information is inside a hyperkit VM, but it makes me think the hyperkit process should remain below 10%; gut feeling says 4%.

I've been using the mac "Activity Monitor" when judging the behaviour of the hyperkit process. Using htop paints a different picture: it gives what I expect the timing reports to be, with the CPU idling at 2% on average.

When I was doing systems programming, "Activity Monitor" was off about the virtual memory metrics, so I can completely believe it is just flat-out wrong. Could one of the systems programmers here shed some light on the discrepancies between "Activity Monitor" and htop? Which one do we trust?

A simple guide on correctly profiling and interpreting the vm process and the processes inside the vm would help a lot in diagnosing and providing feedback on such issues.

@hsyed

hsyed commented Jun 26, 2018

Disregard the last one; sudo htop is in line with Activity Monitor :(

@rafalrusin

Same here. Using Ubuntu, minikube v0.25.2, kubernetes 1.7.5, VirtualBox. It's idling at 40% CPU with nothing installed.

tstromberg changed the title from "localkube consumes CPU when system is 'idle'" to "kube-proxy constantly syncing/restoring iptables rules, consuming CPU resources" on Sep 20, 2018
@tstromberg
Contributor

The iptables issues were fixed long ago. Please re-open other performance issues as new bugs.
