[VPA] Pod scheduled with memory limit above limitrange #3319

Closed · rhysemmas opened this issue Jul 14, 2020 · 9 comments · Fixed by #3463

@rhysemmas commented Jul 14, 2020

Hi there,

We're seeing an issue with pods that have more than one container for which VPA recommends and updates memory requests/limits. The pods are being scheduled with memory limits above the pod memory limit configured in the namespace's LimitRange.

We don't see this issue when a pod has only one container - VPA updates the memory limit to stay within the LimitRange. The issue only seems to occur when a pod has multiple containers whose requests/limits VPA updates.

I should also mention that we're not configuring any minAllowed or maxAllowed bounds via container policies in the VPA resource policy, so we don't expect any conflicts that would cause VPA to set limits above the LimitRange. In case it makes a difference, we set a request/limit ratio of 1:1 when initially deploying the pod.
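
For background: per the VPA docs, when VPA applies a recommendation it scales the container's limit proportionally so that the original request-to-limit ratio is preserved, so with a 1:1 ratio the applied limit simply tracks the recommended request. A minimal sketch of that proportional scaling (hypothetical helper, not VPA's actual code):

    package main

    import "fmt"

    // scaleLimit sketches how a limit is derived from a new request while
    // preserving the container's original limit-to-request ratio.
    // Hypothetical helper for illustration only.
    func scaleLimit(origRequest, origLimit, newRequest int64) int64 {
        return newRequest * origLimit / origRequest
    }

    func main() {
        // With a 1:1 ratio, the applied limit equals the recommended request.
        fmt.Println(scaleLimit(512, 512, 2048)) // 2048
    }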

Looking at the ReplicaSet's events when the pod fails to schedule, we can see the pod is being created with a total memory limit that is 1 byte over the maximum imposed by the LimitRange.

For example, the namespace LimitRange has a pod memory limit of 115Gi (== 123480309760 bytes):

Type        Resource  Min  Max    Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---  ---    ---------------  -------------  -----------------------
Pod         memory    5Mi  115Gi  -                -              -
Container   memory    5Mi  -      256Mi            512Mi          -

VPA recommends memory requests for the two containers in the pod which together total more than the LimitRange maximum (as expected):

    Container Recommendations:
      Container Name:  oomps
      Lower Bound:
        Cpu:     407m
        Memory:  54651563825
      Target:
        Cpu:     813m
        Memory:  130488397494
      Uncapped Target:
        Cpu:     813m
        Memory:  130488397494
      Upper Bound:
        Cpu:           195933m
        Memory:        31447703796054
      Container Name:  oomps-2
      Lower Bound:
        Cpu:     838m
        Memory:  1823741057
      Target:
        Cpu:     1168m
        Memory:  2539377048
      Uncapped Target:
        Cpu:     1168m
        Memory:  2539377048
      Upper Bound:
        Cpu:     211408m
        Memory:  459627245688
Events:          <none>

The pod's requests/limits are then updated in an attempt to fall in line with the LimitRange, but it fails to schedule because the total pod memory limit is 1 byte above the maximum:

Events:
  Type     Reason            Age                From                   Message
  ----     ------            ----               ----                   -------
<truncated>
  Warning  FailedCreate      66s                replicaset-controller  Error creating: pods "rhys-oomps-6676478b6-vj89d" is forbidden: maximum memory usage per Pod is 115Gi, but limit is 123480309761
  Warning  FailedCreate      26s (x9 over 65s)  replicaset-controller  (combined from similar events): Error creating: pods "rhys-oomps-6676478b6-zb8fv" is forbidden: maximum memory usage per Pod is 115Gi, but limit is 123480309761

Since the pods are consistently created with a total limit exactly 1 byte over the LimitRange maximum, it makes me think there may be an error in the calculation that updates requests/limits for multiple containers in a pod while trying to keep the pod total within the LimitRange. I'm not sure, though, so if I'm missing something I'd be really grateful for any help!
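
A rough sanity check of this hypothesis: capping the two target recommendations above proportionally to the 115Gi pod maximum lands on fractional (milli-byte) values that sum to just under the maximum, so any upward rounding would tip the total over. A sketch of that arithmetic (the proportional capping rule is an assumption here, not taken from VPA's code):

    package main

    import (
        "fmt"
        "math/big"
    )

    func main() {
        // Target recommendations from the VPA status above, in bytes.
        recs := []int64{130488397494, 2539377048} // oomps, oomps-2
        podMax := big.NewInt(123480309760)        // 115Gi LimitRange max

        sum := new(big.Int)
        for _, r := range recs {
            sum.Add(sum, big.NewInt(r))
        }

        // Assumed proportional cap, computed in milli-bytes (1m = 1/1000 B),
        // the precision resource.Quantity tracks: capped = rec * podMax * 1000 / sum.
        // Truncating division makes the two results sum to 123480309759999m,
        // exactly 1m under the 115Gi maximum.
        for _, r := range recs {
            capped := new(big.Int).Mul(big.NewInt(r), podMax)
            capped.Mul(capped, big.NewInt(1000))
            capped.Div(capped, sum)
            fmt.Printf("%sm\n", capped)
        }
    }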

@bskiba (Member) commented Jul 15, 2020

Thanks for reporting!
/assign @jbartosik

@jbartosik could you take a look? I think you have the most context.

@rhysemmas Can you let us know which version of VPA you are using?

@rhysemmas (Author)

Hey @bskiba @jbartosik, thanks for taking a look at this! We're using version 0.8.0.

@jbartosik (Collaborator)

I took a look at this. I added a test (set up with limit ranges and pods as described in this issue) and got the following capped memory recommendations for the pod:

  • 121123184974863m
  • 2357124785136m

Those add up to 123480309759999m, just 1m below the pod maximum. Since fractional (milli-byte) memory doesn't make sense, the values get rounded up to 121123184975000m and 2357124786000m, which add up to just a bit above the maximum.

The solution would be to round memory recommendations down to the nearest byte.
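
For illustration, the rounding can be reproduced with apimachinery's resource.Quantity, whose Value() method rounds up to the nearest whole unit (a minimal sketch using the two capped values above):

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/api/resource"
    )

    func main() {
        a := resource.MustParse("121123184974863m")
        b := resource.MustParse("2357124785136m")

        // The exact milli-byte sum is still 1m under the 115Gi maximum.
        sum := a.DeepCopy()
        sum.Add(b)
        fmt.Println(sum.String()) // 123480309759999m

        // But Value() rounds up to whole bytes, so each container gains a
        // fraction of a byte...
        fmt.Println(a.Value(), b.Value()) // 121123184975 2357124786

        // ...and the whole-byte limits sum to 123480309761 B, one byte above
        // the 115Gi (123480309760 B) pod maximum from the error message.
        fmt.Println(a.Value() + b.Value())
    }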

@jbartosik (Collaborator)

With minimums we should round up.
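
Sketched out, the rule could look like this (hypothetical helpers, not necessarily the shape of the eventual fix in #3463): truncate down when capping to a maximum, round up when raising to a minimum, so the rounded value never re-crosses the bound it was clamped to.

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/api/resource"
    )

    // roundDownToBytes truncates a non-negative milli-precision quantity to
    // whole bytes, so a value capped to a LimitRange maximum cannot round
    // back above it.
    func roundDownToBytes(q resource.Quantity) *resource.Quantity {
        return resource.NewQuantity(q.MilliValue()/1000, q.Format)
    }

    // roundUpToBytes rounds to the next whole byte, so a value raised to a
    // LimitRange minimum cannot round back below it.
    func roundUpToBytes(q resource.Quantity) *resource.Quantity {
        return resource.NewQuantity((q.MilliValue()+999)/1000, q.Format)
    }

    func main() {
        capped := resource.MustParse("121123184974863m")
        fmt.Println(roundDownToBytes(capped).String()) // 121123184974
        fmt.Println(roundUpToBytes(capped).String())   // 121123184975
    }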

@surajnarwade commented Nov 27, 2020

Hello @jbartosik @bskiba,

I am facing the same issue with the latest VPA version, 0.9.0.

Pods are failing to schedule because of the following error:

  Warning  FailedCreate      8m10s (x14 over 14h)  statefulset-controller  create Pod prometheus-kube-system-0 in StatefulSet prometheus-kube-system failed error: pods "prometheus-kube-system-0" is forbidden: maximum memory usage per Pod is 115Gi, but limit is 237420533722

In my VPA configuration, I've set:

  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        memory: 50Mi

and my limit range looks like this:

Type        Resource  Min  Max    Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---  ---    ---------------  -------------  -----------------------
Pod         memory    5Mi  115Gi  -                -              -
Container   memory    5Mi  -      256Mi            512Mi          -

VPA recommendations are:

  recommendation:
    containerRecommendations:
    - containerName: thanos-sidecar
      lowerBound:
        cpu: 77m
        memory: "1303507665"
      target:
        cpu: 126m
        memory: "1312092764"
      uncappedTarget:
        cpu: 126m
        memory: "1312092764"
      upperBound:
        cpu: 780m
        memory: "5625822399"
    - containerName: prometheus
      lowerBound:
        cpu: 180m
        memory: "4037093361"
      target:
        cpu: 410m
        memory: "6434836068"
      uncappedTarget:
        cpu: 410m
        memory: "6434836068"
      upperBound:
        cpu: 2700m
        memory: "29600245912"
    - containerName: prometheus-config-reloader
      lowerBound:
        cpu: 10m
        memory: 50Mi
      target:
        cpu: 11m
        memory: 50Mi
      uncappedTarget:
        cpu: 11m
        memory: "23574998"
      upperBound:
        cpu: 47m
        memory: "101081840"
    - containerName: rules-configmap-reloader
      lowerBound:
        cpu: 10m
        memory: 50Mi
      target:
        cpu: 11m
        memory: 50Mi
      uncappedTarget:
        cpu: 11m
        memory: 11500k
      upperBound:
        cpu: 47m
        memory: 50Mi

It's probably the same issue @rhysemmas pointed out, but it doesn't seem to be fixed.

@bskiba (Member) commented Nov 27, 2020

@jbartosik Can you take a look?

@jbartosik (Collaborator)

  1. This doesn't look like a problem with rounding. 237420533722 B is more than 221 GiB, way more than the limit of 115Gi. With rounding we could be only ~1 B per pod above the limit.
  2. This looks really weird. The sum of the target recommendations is ~7.3 GiB (1312092764 B + 6434836068 B + 50 Mi + 50 Mi = 7851786432 B), which is much, much lower than the limit (see the quick arithmetic check below).

@surajnarwade can you give me some more details? For example, what limits did VPA set for the pod?
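
For reference, a quick check of the arithmetic above:

    package main

    import "fmt"

    func main() {
        const gib = 1 << 30 // bytes per GiB

        // The rejected pod's total memory limit vs. the LimitRange maximum.
        fmt.Printf("%.1f GiB\n", 237420533722.0/gib) // ~221.1 GiB
        fmt.Printf("%.1f GiB\n", 123480309760.0/gib) // 115.0 GiB (115Gi)

        // Sum of the target recommendations quoted above (50Mi = 52428800 B).
        sum := int64(1312092764 + 6434836068 + 2*52428800)
        fmt.Printf("%d B (~%.1f GiB)\n", sum, float64(sum)/gib) // ~7.3 GiB
    }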

@surajnarwade

@jbartosik updateMode is set to Initial, so the pod was not scheduled at first. After some time, it was scheduled with the following requests/limits:

container1:

    resources:
      limits:
        memory: "19229497008"
      requests:
        cpu: 410m
        memory: "1201843563"

container2:

    resources:
      limits:
        cpu: 11m
        memory: 50Mi
      requests:
        cpu: 11m
        memory: 50Mi

container3:

    resources:
      limits:
        cpu: 11m
        memory: 50Mi
      requests:
        cpu: 11m
        memory: 50Mi

container4:

    resources:
      limits:
        memory: 25094292172800m
      requests:
        cpu: 126m
        memory: "245061447"

@surajnarwade

Hello @jbartosik, I am seeing this issue again. I thought it should respect the LimitRange limits and assign resources accordingly, right?

here's my error:

  Warning  FailedCreate      7m22s (x2 over 12m)  statefulset-controller  create Pod prometheus-monitoring-1 in StatefulSet prometheus-monitoring failed error: pods "prometheus-monitoring-1" is forbidden: maximum memory usage per Pod is 115Gi, but limit is 125652498096

here's the limitrange:

Type        Resource  Min  Max    Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---  ---    ---------------  -------------  -----------------------
Pod         memory    5Mi  115Gi  -                -              -
Container   memory    5Mi  -      256Mi            512Mi          -

here's VPA definition:

spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: prometheus-monitoring
  updatePolicy:
    updateMode: Initial

here's the VPA recommendation:

Recommendation:
   Container Recommendations:
     Container Name:  prometheus-config-reloader
     Lower Bound:
       Cpu:     10m
       Memory:  11478785
     Target:
       Cpu:     11m
       Memory:  11500k
     Uncapped Target:
       Cpu:     11m
       Memory:  11500k
     Upper Bound:
       Cpu:           21m
       Memory:        22122193
     Container Name:  rules-configmap-reloader
     Lower Bound:
       Cpu:     10m
       Memory:  11478785
     Target:
       Cpu:     11m
       Memory:  11500k
     Uncapped Target:
       Cpu:     11m
       Memory:  11500k
     Upper Bound:
       Cpu:           21m
       Memory:        22122193
     Container Name:  config-reloader
     Lower Bound:
       Cpu:     5m
       Memory:  12745998
     Target:
       Cpu:     11m
       Memory:  23574998
     Uncapped Target:
       Cpu:     11m
       Memory:  23574998
     Upper Bound:
       Cpu:           3971m
       Memory:        8510574278
     Container Name:  prometheus
     Lower Bound:
       Cpu:     378m
       Memory:  4273146021
     Target:
       Cpu:     476m
       Memory:  4281023392
     Uncapped Target:
       Cpu:     476m
       Memory:  4281023392
     Upper Bound:
       Cpu:           981m
       Memory:        8225152428
     Container Name:  thanos-sidecar
     Lower Bound:
       Cpu:     108m
       Memory:  976469948
     Target:
       Cpu:     182m
       Memory:  1038683533
     Uncapped Target:
       Cpu:     182m
       Memory:  1038683533
     Upper Bound:
       Cpu:     390m
       Memory:  1995628054
Events:          <none>
