[BUG] OOMKilled on fresh installation #2128

Closed
ccom-dev-platform opened this issue Jul 12, 2021 · 6 comments
Labels
bug Something isn't working

Comments

@ccom-dev-platform

ccom-dev-platform commented Jul 12, 2021

Software version numbers

  • Kubernetes version: 1.18 (EKS)
  • Kyverno version: 1.4.1, Chart version 1.8.2

Describe the bug
Just installed Kyverno through the Helm Chart and the pod keeps getting killed due to OOM.

Can't even adjust the requests or limits of the deployment, since those are not exposed in the Chart.

To Reproduce
Steps to reproduce the behavior:

  1. Install Kyverno through the Helm Chart (see the install sketch below)
  2. The pod starts restarting and ends up in an OOMKilled state
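For reference, a fresh install along these lines matches the setup described; the repo URL is the standard Kyverno chart repo, but the exact commands used are an assumption, since the report only says "installed through the Helm Chart":

  helm repo add kyverno https://kyverno.github.io/kyverno/
  helm repo update
  helm install kyverno kyverno/kyverno --namespace policy-guard --create-namespace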

Additional context
Pod description:

kubectl describe pod kyverno-c96cf9cd7-tctxg -n policy-guard
Name:         kyverno-c96cf9cd7-tctxg
Namespace:    policy-guard
Priority:     0
Node:         ip-172-22-24-171.ec2.internal/172.22.24.171
Start Time:   Mon, 12 Jul 2021 18:02:50 -0400
Labels:       app=kyverno
              app.kubernetes.io/component=kyverno
              app.kubernetes.io/instance=kyverno
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=kyverno
              app.kubernetes.io/part-of=kyverno
              app.kubernetes.io/version=v1.4.1
              helm.sh/chart=kyverno-v1.4.1
              pod-template-hash=c96cf9cd7
Annotations:  kubernetes.io/limit-ranger: LimitRanger plugin set: cpu limit for container kyverno
              kubernetes.io/psp: eks.privileged
Status:       Running
IP:           172.22.24.171
IPs:
  IP:           172.22.24.171
Controlled By:  ReplicaSet/kyverno-c96cf9cd7
Init Containers:
  kyverno-pre:
    Container ID:   docker://2554eff215d25836d6d08cd4f1e428c4489593eed098acf6dfed0dd323dcd1a4
    Image:          ghcr.io/kyverno/kyvernopre:v1.4.1
    Image ID:       docker-pullable://ghcr.io/kyverno/kyvernopre@sha256:d81c7caee2c0f7b0f8a10f57d4041d51602003b658ce75b5815018a44d746e04
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 12 Jul 2021 18:02:52 -0400
      Finished:     Mon, 12 Jul 2021 18:03:27 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  256Mi
    Requests:
      cpu:     10m
      memory:  64Mi
    Environment:
      KYVERNO_NAMESPACE:  policy-guard (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kyverno-token-dxwqr (ro)
Containers:
  kyverno:
    Container ID:   docker://a79718df9ca255d3ced40511dab3f803db35a7625b4acb0782ccdf8f66515f9d
    Image:          ghcr.io/kyverno/kyverno:v1.4.1
    Image ID:       docker-pullable://ghcr.io/kyverno/kyverno@sha256:24107c0eb18d43ee137b30306d35c160be613dd9ee4126dd59ef6c6ebe581b37
    Ports:          9443/TCP, 8000/TCP
    Host Ports:     9443/TCP, 8000/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 12 Jul 2021 18:19:38 -0400
      Finished:     Mon, 12 Jul 2021 18:20:11 -0400
    Ready:          False
    Restart Count:  7
    Limits:
      cpu:     500m
      memory:  256Mi
    Requests:
      cpu:      100m
      memory:   50Mi
    Liveness:   http-get https://:9443/health/liveness delay=10s timeout=5s period=30s #success=1 #failure=2
    Readiness:  http-get https://:9443/health/readiness delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      INIT_CONFIG:        kyverno
      KYVERNO_NAMESPACE:  policy-guard (v1:metadata.namespace)
      KYVERNO_SVC:        kyverno-svc
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kyverno-token-dxwqr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kyverno-token-dxwqr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kyverno-token-dxwqr
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node.kubernetes.io/role=system
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  19m                  default-scheduler  Successfully assigned policy-guard/kyverno-c96cf9cd7-tctxg to ip-172-22-24-171.ec2.internal
  Normal   Pulling    19m                  kubelet            Pulling image "ghcr.io/kyverno/kyvernopre:v1.4.1"
  Normal   Pulled     19m                  kubelet            Successfully pulled image "ghcr.io/kyverno/kyvernopre:v1.4.1"
  Normal   Created    19m                  kubelet            Created container kyverno-pre
  Normal   Started    19m                  kubelet            Started container kyverno-pre
  Normal   Pulling    18m                  kubelet            Pulling image "ghcr.io/kyverno/kyverno:v1.4.1"
  Normal   Pulled     18m                  kubelet            Successfully pulled image "ghcr.io/kyverno/kyverno:v1.4.1"
  Normal   Created    15m (x4 over 18m)    kubelet            Created container kyverno
  Normal   Started    15m (x4 over 18m)    kubelet            Started container kyverno
  Normal   Pulled     13m (x4 over 18m)    kubelet            Container image "ghcr.io/kyverno/kyverno:v1.4.1" already present on machine
  Warning  BackOff    4m1s (x45 over 17m)  kubelet            Back-off restarting failed container
@ccom-dev-platform ccom-dev-platform added the bug Something isn't working label Jul 12, 2021
@realshuting
Member

Hi @ccom-dev-platform - what's the size of your cluster?

I've seen a similar issue today (#2127) where the Kyverno pods get OOM killed on a large-scale cluster. Can you bump up the memory limits and see if that helps?
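For the cluster-size question, a quick node count works on EKS or any cluster:

  kubectl get nodes --no-headers | wc -l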

@ccom-dev-platform
Author

ccom-dev-platform commented Jul 12, 2021

The cluster is around 40 EC2 nodes with 16 CPUs and 32 GB of RAM each; it varies quite a lot in node count and size since we autoscale constantly.

I could modify the deployment manually, but I would really love it if the Chart let me do it! Will report back once I do.

@realshuting
Member

Yes, you can configure it via Helm:

resources:
  limits:
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 50Mi
initResources:
  limits:
    cpu: 100m
    memory: 256Mi
  requests:
    cpu: 10m
    memory: 64Mi
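These can be overridden at install or upgrade time. A minimal sketch, assuming the release is named kyverno in the policy-guard namespace (the 1Gi figure is only an illustration, not a value suggested in this thread):

  helm upgrade kyverno kyverno/kyverno -n policy-guard \
    --reuse-values --set resources.limits.memory=1Gi

The same overrides can also be placed in a values file and applied with -f values.yaml.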

@realshuting realshuting self-assigned this Jul 13, 2021
@ccom-dev-platform
Author

Thanks @realshuting, didn't catch that part before.

I updated the deployment to the following values and everything is ok for now:

resources = {
  limits = {
    memory = "4Gi"
    cpu    = "2"
  }
  requests = {
    memory = "500Mi"
    cpu    = "500m"
  }
}
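With those values in place, a quick way to confirm the new limits landed and that the pod stays Running (deployment and label names taken from the pod description above):

  kubectl -n policy-guard get pods -l app.kubernetes.io/name=kyverno
  kubectl -n policy-guard describe deployment kyverno | grep -A 2 'Limits'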

@ccom-dev-platform
Author

Feel free to close the issue if you don't have anything else to add.

@realshuting
Member

Great! Closing this issue.
