This repository has been archived by the owner on Aug 25, 2021. It is now read-only.

lifecycle-sidecar at 100% of CPU limit #515

Closed
kpurdon opened this issue Jun 26, 2020 · 8 comments
Labels
area/connect (Related to Connect, e.g. injection), bug (Something isn't working)

Comments

kpurdon commented Jun 26, 2020

I'm seeing CPU usage for the lifecycle sidecar sitting right at 100% of its CPU limit. Would it be possible to increase the resource requests and limits for this container, or to make them configurable?

I think

resources:
  requests:
    memory: "25Mi"
    cpu: "10m"
  limits:
    memory: "25Mi"
    cpu: "10m"

is the spot that would need to allow for configuration.

Member

lkysow commented Jun 26, 2020 via email

Author

kpurdon commented Jun 26, 2020

This is on the connect-enabled pods; there is no mesh gateway.

Here is a chart showing the CPU limit utilization for all lifecycle containers for the last hour:

[Screenshot: CPU limit utilization for all lifecycle containers, last hour]

Here is a single top for one of the pods:

POD                      NAME                               CPU(cores)   MEMORY(bytes)
ceweb-7848c69b66-smkxb   consul-connect-lifecycle-sidecar   5m           22Mi
ceweb-7848c69b66-smkxb   consul-connect-envoy-sidecar       3m           18Mi
ceweb-7848c69b66-smkxb   ceweb                              2m           493Mi

Happy to provide any additional metrics that may be useful.
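
As a rough aid for reading a sample like the one above against the 10m CPU limit quoted earlier, the sketch below converts millicore strings into a percentage of the limit. The function names are hypothetical helpers, not part of kubectl or consul-k8s, and only the integer "Nm" and whole-core forms are handled.

```shell
# Hypothetical helper: convert a CPU quantity such as "5m" to integer millicores.
# Plain integers ("1", "2") are treated as whole cores; fractional cores
# such as "0.5" are not handled by this sketch.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo "$(( $1 * 1000 ))" ;;
  esac
}

# Hypothetical helper: print usage as an integer percentage of the limit.
cpu_percent_of_limit() {
  usage=$(to_millicores "$1")
  limit=$(to_millicores "$2")
  echo "$(( usage * 100 / limit ))"
}

# The lifecycle sidecar sample above (5m) against the 10m limit.
cpu_percent_of_limit 5m 10m   # prints 50
```

A single sample may differ from a chart that aggregates over a time window, so the two readings need not agree.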

Author

kpurdon commented Jun 26, 2020

Here is the same graph for the last day. There is no real difference between the pods in the top group and those in the bottom group, and both groups include pods from each of the services I have connect enabled on.

[Screenshot: CPU limit utilization for all lifecycle containers, last day]

Member

lkysow commented Jun 26, 2020

Okay, thank you for this information. We're going to look at bumping this up, and we're also looking at another issue related to resource settings and OOM. If you need a workaround right now, you'd need to bump these up yourself:

https://github.com/hashicorp/consul-k8s/blob/9a3a22edbabd4f935e8831f32afd55e816ebdd0b/connect-inject/lifecycle_sidecar.go#L10-L15

and build a custom consul-k8s image. To be clear, this is a high priority for us and we're working on a fix as we speak.

lkysow added the area/connect and bug labels on Jun 26, 2020
lkysow changed the title from "Configure lifecycle-sidecar resource requests" to "lifecycle-sidecar at 100% of CPU limit" on Jun 26, 2020
Member

lkysow commented Jul 9, 2020

Will be addressed by #533

Member

lkysow commented Jul 9, 2020

This bugfix is available in 0.23.0.

lkysow closed this as completed on Jul 9, 2020
Author

kpurdon commented Jul 10, 2020

Awesome @lkysow ... quick question. Is the same multi-step upgrade process required for using a newer helm chart version if the underlying consul version has not changed?

Member

lkysow commented Jul 10, 2020

Hey Kyle, it depends on whether the consul client daemonset pods will end up being restarted by the helm upgrade. The release (you should actually use 0.23.1; there was a TLS bug we just patched) changes the default version of consul-k8s to 0.17.0 in order to get this bugfix.

That shouldn't affect the client daemonset unless you have ACLs enabled. If you do, then consul-k8s is actually used as an init container in the client daemonset, so bumping the Docker image version will trigger a client daemonset restart. In that case, for a no-downtime upgrade you would need to follow the multi-step upgrade process.

There's no built-in way in helm to see what will be updated, but there is a helm diff plugin: https://github.com/databus23/helm-diff

helm repo update
helm diff upgrade <your release name> hashicorp/consul -f scratch/tls.yaml --version 0.23.1

If the output contains a line about the daemonset like

default, consul-consul, DaemonSet (apps) has changed:

then the client daemonset will be updated.
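
The check described above can be scripted. This is a minimal sketch, assuming the diff output contains a line in the format quoted above. `daemonset_changed` is a hypothetical helper name, and the sample line stands in for real `helm diff upgrade` output.

```shell
# Hypothetical helper: exit status 0 if the helm-diff output on stdin
# reports a changed DaemonSet, non-zero otherwise.
daemonset_changed() {
  grep -qF 'DaemonSet (apps) has changed:'
}

# Sample line in the format quoted above. In practice, pipe the output of
# the `helm diff upgrade ...` command into daemonset_changed instead.
sample='default, consul-consul, DaemonSet (apps) has changed:'

if printf '%s\n' "$sample" | daemonset_changed; then
  echo "client daemonset will be updated"
else
  echo "client daemonset unchanged"
fi
```

If the check succeeds, the multi-step upgrade process applies for a no-downtime upgrade.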
