[BUG] OOMKilled on fresh installation #2128

Closed
ccom-dev-platform opened this issue Jul 12, 2021 · 6 comments
Labels
bug Something isn't working

Comments

@ccom-dev-platform

ccom-dev-platform commented Jul 12, 2021

Software version numbers

  • Kubernetes version: 1.18 (EKS)
  • Kyverno version: 1.4.1, Chart version 1.8.2

Describe the bug
Just installed Kyverno through the Helm Chart and the pod keeps getting killed due to OOM.

Can't even adjust the requests or limits of the deployment, since those are not exposed in the Chart.

To Reproduce
Steps to reproduce the behavior:

  1. Install Kyverno through the Helm Chart (see the install sketch below)
  2. The pod starts restarting and ends up in an OOMKilled state
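For reference, a fresh install along these lines matches the setup described; the repo URL is the standard Kyverno chart repo, but the exact commands used are an assumption, since the report only says "installed through the Helm Chart":

  helm repo add kyverno https://kyverno.github.io/kyverno/
  helm repo update
  helm install kyverno kyverno/kyverno --namespace policy-guard --create-namespace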

Additional context
Pod description:

kubectl describe pod kyverno-c96cf9cd7-tctxg -n policy-guard
Name:         kyverno-c96cf9cd7-tctxg
Namespace:    policy-guard
Priority:     0
Node:         ip-172-22-24-171.ec2.internal/172.22.24.171
Start Time:   Mon, 12 Jul 2021 18:02:50 -0400
Labels:       app=kyverno
              app.kubernetes.io/component=kyverno
              app.kubernetes.io/instance=kyverno
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=kyverno
              app.kubernetes.io/part-of=kyverno
              app.kubernetes.io/version=v1.4.1
              helm.sh/chart=kyverno-v1.4.1
              pod-template-hash=c96cf9cd7
Annotations:  kubernetes.io/limit-ranger: LimitRanger plugin set: cpu limit for container kyverno
              kubernetes.io/psp: eks.privileged
Status:       Running
IP:           172.22.24.171
IPs:
  IP:           172.22.24.171
Controlled By:  ReplicaSet/kyverno-c96cf9cd7
Init Containers:
  kyverno-pre:
    Container ID:   docker://2554eff215d25836d6d08cd4f1e428c4489593eed098acf6dfed0dd323dcd1a4
    Image:          ghcr.io/kyverno/kyvernopre:v1.4.1
    Image ID:       docker-pullable://ghcr.io/kyverno/kyvernopre@sha256:d81c7caee2c0f7b0f8a10f57d4041d51602003b658ce75b5815018a44d746e04
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 12 Jul 2021 18:02:52 -0400
      Finished:     Mon, 12 Jul 2021 18:03:27 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  256Mi
    Requests:
      cpu:     10m
      memory:  64Mi
    Environment:
      KYVERNO_NAMESPACE:  policy-guard (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kyverno-token-dxwqr (ro)
Containers:
  kyverno:
    Container ID:   docker://a79718df9ca255d3ced40511dab3f803db35a7625b4acb0782ccdf8f66515f9d
    Image:          ghcr.io/kyverno/kyverno:v1.4.1
    Image ID:       docker-pullable://ghcr.io/kyverno/kyverno@sha256:24107c0eb18d43ee137b30306d35c160be613dd9ee4126dd59ef6c6ebe581b37
    Ports:          9443/TCP, 8000/TCP
    Host Ports:     9443/TCP, 8000/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 12 Jul 2021 18:19:38 -0400
      Finished:     Mon, 12 Jul 2021 18:20:11 -0400
    Ready:          False
    Restart Count:  7
    Limits:
      cpu:     500m
      memory:  256Mi
    Requests:
      cpu:      100m
      memory:   50Mi
    Liveness:   http-get https://:9443/health/liveness delay=10s timeout=5s period=30s #success=1 #failure=2
    Readiness:  http-get https://:9443/health/readiness delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      INIT_CONFIG:        kyverno
      KYVERNO_NAMESPACE:  policy-guard (v1:metadata.namespace)
      KYVERNO_SVC:        kyverno-svc
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kyverno-token-dxwqr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kyverno-token-dxwqr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kyverno-token-dxwqr
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node.kubernetes.io/role=system
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  19m                  default-scheduler  Successfully assigned policy-guard/kyverno-c96cf9cd7-tctxg to ip-172-22-24-171.ec2.internal
  Normal   Pulling    19m                  kubelet            Pulling image "ghcr.io/kyverno/kyvernopre:v1.4.1"
  Normal   Pulled     19m                  kubelet            Successfully pulled image "ghcr.io/kyverno/kyvernopre:v1.4.1"
  Normal   Created    19m                  kubelet            Created container kyverno-pre
  Normal   Started    19m                  kubelet            Started container kyverno-pre
  Normal   Pulling    18m                  kubelet            Pulling image "ghcr.io/kyverno/kyverno:v1.4.1"
  Normal   Pulled     18m                  kubelet            Successfully pulled image "ghcr.io/kyverno/kyverno:v1.4.1"
  Normal   Created    15m (x4 over 18m)    kubelet            Created container kyverno
  Normal   Started    15m (x4 over 18m)    kubelet            Started container kyverno
  Normal   Pulled     13m (x4 over 18m)    kubelet            Container image "ghcr.io/kyverno/kyverno:v1.4.1" already present on machine
  Warning  BackOff    4m1s (x45 over 17m)  kubelet            Back-off restarting failed container
@ccom-dev-platform ccom-dev-platform added the bug Something isn't working label Jul 12, 2021
@realshuting
Member

Hi @ccom-dev-platform - what's the size of your cluster?

I've seen a similar issue today (#2127) where the Kyverno pods get OOM killed on a large-scale cluster. Can you bump up the memory limits and see if that helps?
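For the cluster-size question, a quick node count works on EKS or any cluster:

  kubectl get nodes --no-headers | wc -l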

@ccom-dev-platform
Author

ccom-dev-platform commented Jul 12, 2021

The cluster is around 40 EC2 nodes with 16 CPUs and 32 GB of RAM each; it varies quite a lot in node count and size since we autoscale constantly.

I could modify the deployment manually, but I would really love it if the Chart let me do it! Will report back once I do.

@realshuting
Member

Yes, you can configure it via Helm:

resources:
  limits:
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 50Mi
initResources:
  limits:
    cpu: 100m
    memory: 256Mi
  requests:
    cpu: 10m
    memory: 64Mi
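These can be overridden at install or upgrade time. A minimal sketch, assuming the release is named kyverno in the policy-guard namespace (the 1Gi figure is only an illustration, not a value suggested in this thread):

  helm upgrade kyverno kyverno/kyverno -n policy-guard \
    --reuse-values --set resources.limits.memory=1Gi

The same overrides can also be placed in a values file and applied with -f values.yaml.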

@realshuting realshuting self-assigned this Jul 13, 2021
@ccom-dev-platform
Author

Thanks @realshuting, didn't catch that part before.

I updated the deployment to the following values and everything is ok for now:

resources = {
  limits = {
    memory = "4Gi"
    cpu    = "2"
  }
  requests = {
    memory = "500Mi"
    cpu    = "500m"
  }
}
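With those values in place, a quick way to confirm the new limits landed and that the pod stays Running (deployment and label names taken from the pod description above):

  kubectl -n policy-guard get pods -l app.kubernetes.io/name=kyverno
  kubectl -n policy-guard describe deployment kyverno | grep -A 2 'Limits'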

@ccom-dev-platform
Author

Feel free to close the issue if you don't have anything else to add.

@realshuting
Member

Great! Closing this issue.
