
istio-proxy - high CPU usage #15883

Closed
omerlh opened this issue Jul 28, 2019 · 8 comments

@omerlh omerlh commented Jul 28, 2019

Bug description
I enabled Istio injection on a service with a high traffic load. Once the pod restarted with istio-proxy enabled, we noticed huge CPU consumption:
[screenshot: CPU usage graph]
The query used is avg(rate(container_cpu_usage_seconds_total)).
You can see that Envoy used twice the resources of the main container. This API sends a lot of outgoing requests to other services and to storage, if that's relevant.
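For a clearer per-container comparison, a query along these lines may help (a sketch; on older Kubernetes/cAdvisor versions such as 1.12 the labels are `pod_name`/`container_name`, on newer ones `pod`/`container`, and the namespace selector is a placeholder):

```promql
# Per-container CPU rate over a 5m window
sum by (pod_name, container_name) (
  rate(container_cpu_usage_seconds_total{namespace="<your-namespace>"}[5m])
)
```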

While on this topic: inspecting the istio-proxy container's request and limit, the default values are pretty high:

    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:      100m
      memory:   128Mi

Why is the limit so high? 2 CPU cores is a lot...
We also enabled injection on some other services, but didn't experience the same issue there.

**Edit**
Looking at Istio Performance and Scalability, it looks like Istio should take 0.6 vCPU per 1000 r/s. We had 600 RPS at peak (according to Istio's own metrics - istio_requests_total), and we had ~20 pods - so does it make sense - ~0.5 vCPU per pod?
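The arithmetic here depends on how much traffic each proxy actually handles. A rough back-of-envelope of the guidance figure (assuming it applies per proxy, and noting that outbound requests also traverse the sidecar, so the true per-proxy rate can be well above the inbound rate):

```python
# Back-of-envelope using Istio's published guidance of
# ~0.6 vCPU per 1000 requests/sec through a sidecar proxy.
GUIDANCE_VCPU_PER_1000_RPS = 0.6

def sidecar_vcpu(rps_through_proxy):
    """Estimated vCPU for one sidecar handling the given request rate."""
    return rps_through_proxy / 1000.0 * GUIDANCE_VCPU_PER_1000_RPS

total_rps = 600.0   # peak, from istio_requests_total
pods = 20

# If the 600 r/s were spread evenly and only inbound traffic counted,
# each proxy would need far less than 0.5 vCPU:
per_pod_inbound = sidecar_vcpu(total_rps / pods)   # 0.018 vCPU

# Outbound calls to other services and to storage also go through the
# sidecar, so the effective per-proxy rate (and thus CPU) can be many
# times the inbound rate reported by istio_requests_total.
print(per_pod_inbound)
```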

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[ ] Networking
[X] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Expected behavior
Low CPU usage

Steps to reproduce the bug

Version (include the output of istioctl version --remote and kubectl version)
istioctl version --remote:

client version: 1.2.2
citadel version: 1.2.2
galley version: 1.2.2
pilot version: 1.2.2
pilot version: 1.2.2
policy version: 1.2.2
policy version: 1.2.2
sidecar-injector version: 1.2.2
telemetry version: 1.2.2
telemetry version: 1.2.2

kubectl version:

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-03-01T23:34:27Z", GoVersion:"go1.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.7", GitCommit:"6f482974b76db3f1e0f5d24605a9d1d38fad9a2b", GitTreeState:"clean", BuildDate:"2019-03-25T02:41:57Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

How was Istio installed? Helm

Environment where bug was observed (cloud vendor, OS, etc)
Azure, Kubernetes

Additionally, please consider attaching a cluster state archive (dump file) to this issue.

@huang195 huang195 commented Aug 1, 2019

From what I have seen, Istio is pretty CPU-intensive. By design, it uses iptables to bounce packets from your application to another user-level app (Envoy), and on their way back. This involves a lot of packet copying, user-to-kernel and kernel-to-user, which uses CPU. Moreover, by default it collects telemetry data for every request and sends it to the telemetry server, which also requires CPU. Additionally, if your application operates at one of the higher-level protocols, e.g., HTTP, Envoy's plugin that handles HTTP traffic (encoding/decoding) has been known to be not very CPU-efficient. I don't know if tracing is enabled in your environment, but that could also contribute toward more CPU usage.

You could turn off some of these features, e.g., telemetry and tracing, if you don't really need them. That could help. But even if you turn off most of the features, the fact that there's a user-level proxy intercepting all packets means the CPU overhead of using Istio will be significant. Just to give a data point: when I was using fortio to test Istio latency, with the fortio server acting as a simple echo server, the sidecar (i.e., istio-proxy) was using more CPU than my app (fortio).
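For a Helm-based install, one way to turn those features off is via values overrides (a sketch against the Istio 1.2 chart; verify the keys against your chart version before applying):

```yaml
# values override (sketch) passed to helm upgrade/template
mixer:
  telemetry:
    enabled: false
tracing:
  enabled: false
```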

@omerlh omerlh commented Aug 1, 2019

I'm using Istio for those features :) Do you know if I can fine-tune tracing, e.g. trace only a small percentage of the requests?

@huang195 huang195 commented Aug 1, 2019

I believe the default tracing percentage is already set to 1% (https://github.com/istio/istio/blob/release-1.2/install/kubernetes/helm/istio/charts/pilot/values.yaml#L12), unless you modified the default value
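For reference, that sampling percentage maps to the pilot.traceSampling Helm value (a sketch; in the 1.2 chart the value is a percentage, so 1.0 means 1% of requests traced):

```yaml
pilot:
  traceSampling: 1.0   # percent of requests traced; lower to reduce tracing load
```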

@omerlh omerlh commented Aug 5, 2019

We're also thinking it might be because the request is too small (100m). I just discovered you can request more by using the sidecar.istio.io/proxyCPU annotation. Going to try that and see how it behaves.
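For anyone else trying this, the annotation goes on the pod template, along these lines (a sketch; the "500m" value is an example, and sidecar.istio.io/proxyMemory is the analogous memory knob):

```yaml
# Deployment snippet (sketch)
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/proxyCPU: "500m"
        sidecar.istio.io/proxyMemory: "256Mi"
```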

@duderino duderino added this to the 1.3 milestone Aug 27, 2019
@duderino duderino commented Aug 27, 2019

@mandarjog @howardjohn can you take a look at this and offer some advice? Thanks

@mandarjog mandarjog commented Aug 27, 2019

As far as "high" CPU is concerned, @omerlh, it looks like your CPU usage is in line with our guidance.

You should estimate your load and set an appropriate CPU request, or use the sidecar.istio.io/proxyCPU annotation. With a 100m request, your pods may be packed too tightly and may not be able to get more CPU.

@duderino duderino commented Aug 27, 2019

Performance and efficiency improvements are coming. For now, the question on this bug has been answered, so I'm closing it. Please re-open if this didn't address your question.

@duderino duderino closed this Aug 27, 2019

@omerlh omerlh commented Sep 1, 2019

Is there a way to track the performance improvement features?
Also, in case this wasn't clear: we have services with similar traffic where istio-proxy behaves significantly differently - we see services where the proxy consumes ~0.5 CPU and cases where it takes ~0.003. We'd be happy to better understand why this happens.
I'm aware of the annotation and have already used it.
