I enabled Istio injection on a service with a high traffic load. Once the pod restarted with the istio-proxy sidecar injected, we noticed huge CPU consumption:
The query used is avg(rate(container_cpu_usage_seconds_total)).
You can see that Envoy used twice the CPU of the main container. In case it's relevant, this API sends a lot of outgoing requests to other services and to storage.
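For anyone trying to reproduce the comparison, a query along these lines breaks the usage down per container. This is only a sketch: the exact label names (container, pod) depend on your cAdvisor/Prometheus version, and the pod name pattern is a placeholder.

```promql
# Per-container CPU usage for the pods of the service
# (label names assumed from recent cAdvisor defaults)
avg by (container) (
  rate(container_cpu_usage_seconds_total{
    pod=~"my-service-.*",   # hypothetical pod name pattern
    container!="",          # drop the cgroup aggregate series
    container!="POD"        # drop the pause container
  }[5m])
)
```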
While on this topic - inspecting the istio-proxy container's resource requests and limits, the default values are pretty high:
Why is the limit so high? A 2-core CPU limit is a lot...
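For reference, this is roughly what the injected sidecar's resources section looked like in our pods. The defaults vary between Istio versions and injection templates, so treat the exact numbers as what we observed rather than a universal default:

```yaml
# Resources of the injected istio-proxy container (values as observed in our cluster;
# they are configurable per Istio version and per injection template)
containers:
  - name: istio-proxy
    resources:
      requests:
        cpu: 100m        # matches the "request is too small (100m)" point further down
        memory: 128Mi
      limits:
        cpu: 2000m       # the 2-core limit questioned above
        memory: 1024Mi
```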
We also enabled it on some other services, but didn't experience the same issue there.
**Edit**
Looking at Istio Performance and Scalability, it looks like Istio should take about 0.6 vCPU per 1000 requests/s through the proxy. We had 600 RPS at peak (according to Istio's own metrics - istio_requests_total) across ~20 pods - so, given that each proxy also handles the service's many outgoing requests, does ~0.5 vCPU per pod make sense?
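The 600 RPS figure came from a query along these lines (the reporter and workload label values are assumptions about how our dashboards filter, not the only way to slice this metric):

```promql
# Peak inbound request rate for the service, as reported by the destination-side proxies
sum(
  rate(istio_requests_total{
    reporter="destination",            # counted at the receiving sidecar
    destination_workload="my-service"  # hypothetical workload name
  }[1m])
)
```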
Affected product area (please put an X in all that apply)
[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[ ] Networking
[X] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Expected behavior: Low CPU usage
Steps to reproduce the bug
Version (include the output of istioctl version --remote and kubectl version) istioctl version --remote:
From what I have seen, Istio is pretty CPU-intensive. By design, it uses iptables to redirect packets from your application to a separate user-level process (Envoy), and back again on the return path. This involves a lot of packet copying between user space and kernel space, which costs CPU. Moreover, by default it collects telemetry data for every request and sends it to the telemetry server, which requires additional CPU. Additionally, if your application speaks one of the higher-level protocols, e.g. HTTP, Envoy's plugin that handles HTTP traffic (encoding/decoding) has been known to be not very CPU-efficient. I don't know whether tracing is enabled in your environment, but that could also contribute to CPU usage.
You could turn off some of these features, e.g. telemetry and tracing, if you don't really need them; that could help. But even if you disable most of the features, the fact that a user-level proxy intercepts all of your traffic means the CPU overhead of using Istio will be significant. Just to give a data point: when I was using fortio to test Istio latency, with the fortio server acting as a simple echo server, the sidecar (i.e. istio-proxy) was using more CPU than my app (fortio).
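As a sketch of what "turning off" these features can look like - the exact knobs depend heavily on the Istio version (Mixer-based telemetry vs. the newer telemetry v2), so take this IstioOperator fragment as an illustration rather than a drop-in config:

```yaml
# Illustrative IstioOperator overlay: disable tracing and reduce telemetry work.
# Field availability differs between Istio releases; check the docs for your version.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    enableTracing: false     # stop the proxies from generating trace spans
  values:
    pilot:
      traceSampling: 0.0     # belt-and-braces: sample 0% of requests
    mixer:
      telemetry:
        enabled: false       # only relevant on Mixer-based (pre-1.5) telemetry
```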
We're also thinking it might be because the CPU request is too small (100m). I just discovered that you can request more via the sidecar.istio.io/proxyCPU annotation. We're going to try that and see how it behaves.
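For anyone else looking for it, the annotation goes on the pod template of the workload, roughly like this (the limit annotation and the values shown are our guesses at sensible numbers, not recommendations):

```yaml
# Deployment pod template with per-workload sidecar resource overrides
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/proxyCPU: "500m"        # raise the istio-proxy CPU request
        sidecar.istio.io/proxyCPULimit: "2000m"  # optional: also override the limit
```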
Is there a way to track performance-improvement features?
Also, in case this wasn't clear - we have services with similar traffic where the istio-proxy behaves significantly differently: we see services where the proxy consumes ~0.5 CPU and others where it takes ~0.003. We would be happy to better understand why this happens.
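One thing that may help narrow this down is correlating each proxy's CPU with the request rate it actually handles, since the docs' sizing figure is per request through the proxy. Something like the following pair of queries (label names are assumptions about a standard Prometheus + Istio setup) puts the two side by side:

```promql
# Requests per second handled by each source-side proxy (outbound traffic included)
sum by (source_workload) (rate(istio_requests_total{reporter="source"}[5m]))

# CPU used by the istio-proxy containers, grouped by pod
sum by (pod) (rate(container_cpu_usage_seconds_total{container="istio-proxy"}[5m]))
```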
I'm aware of the annotation and have already used it.