
vertical-pod-autoscaler 0.3.0 on AWS EKS - admission controller doesn't kick in #1547

Closed
piontec opened this issue Jan 2, 2019 · 13 comments
Labels: area/vertical-pod-autoscaler, lifecycle/stale

@piontec
Contributor

piontec commented Jan 2, 2019

Hi!
I'm running VPA on an EKS cluster in AWS. EKS supports mutating webhooks, as claimed by AWS. Now, I have the following configuration (to test the "hamster" deployment in "Initial" mode):

$ ksdev get deploy,pod | grep vpa
deployment.extensions/vpa-admission-controller             1         1         1            1           1d
deployment.extensions/vpa-recommender                      1         1         1            1           1d
deployment.extensions/vpa-updater                          1         1         1            1           1d

pod/vpa-admission-controller-58977d995f-knwrr             1/1       Running     0          29m
pod/vpa-recommender-6bf9f87f85-6zz86                      1/1       Running     0          29m
pod/vpa-updater-6df84c89dd-pfb29                          1/1       Running     0          28m

The webhook is registered and seems to be in place:

$ ksdev get mutatingwebhookconfiguration.v1beta1.admissionregistration.k8s.io -o yaml       
apiVersion: v1
items:
- apiVersion: admissionregistration.k8s.io/v1beta1
  kind: MutatingWebhookConfiguration
  metadata:
    creationTimestamp: 2019-01-02T09:39:29Z
    generation: 1
    name: vpa-webhook-config
    namespace: ""
    resourceVersion: "39148745"
    selfLink: /apis/admissionregistration.k8s.io/v1beta1/mutatingwebhookconfigurations/vpa-webhook-config
    uid: 4cec7d97-0e72-11e9-889f-127fc02963b2
  webhooks:
  - clientConfig:
      caBundle: [CUT]
      service:
        name: vpa-webhook
        namespace: kube-system
    failurePolicy: Ignore
    name: vpa.k8s.io
    namespaceSelector: {}
    rules:
    - apiGroups:
      - ""
      apiVersions:
      - v1
      operations:
      - CREATE
      resources:
      - pods
    - apiGroups:
      - autoscaling.k8s.io
      apiVersions:
      - v1beta1
      operations:
      - CREATE
      - UPDATE
      resources:
      - verticalpodautoscalers
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

The hamster pods are running, and the VPA object is created and successfully updated by the recommender:

$ kdev get verticalpodautoscalers.autoscaling.k8s.io -o yaml
apiVersion: v1
items:
- apiVersion: autoscaling.k8s.io/v1beta1
  kind: VerticalPodAutoscaler
  metadata:   
    clusterName: ""
    creationTimestamp: 2019-01-02T09:50:32Z
    generation: 1
    name: hamster-vpa
    namespace: default
    resourceVersion: "39155158"
    selfLink: /apis/autoscaling.k8s.io/v1beta1/namespaces/default/verticalpodautoscalers/hamster-vpa
    uid: d8733e00-0e73-11e9-9516-0a01d9a5380e
  spec:
    selector:
      matchLabels:
        app: hamster
    updatePolicy:
      updateMode: Initial
  status:
    conditions:
    - lastTransitionTime: 2019-01-02T09:51:05Z
      status: "True"
      type: RecommendationProvided
    recommendation:
      containerRecommendations:
      - containerName: hamster
        lowerBound:
          cpu: 560m
          memory: 262144k
        target:
          cpu: 587m
          memory: 262144k
        uncappedTarget:
          cpu: 587m
          memory: 262144k
        upperBound:
          cpu: 15428m
          memory: "282975409"

But the admission controller itself seems to do nothing: the only logs I get (repeated over and over) are:

I0102 10:05:21.020578       1 reflector.go:357] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:89: Watch close - *v1beta1.VerticalPodAutoscaler total 8 items received
I0102 10:05:21.020906       1 round_trippers.go:383] GET https://172.20.0.1:443/apis/autoscaling.k8s.io/v1beta1/verticalpodautoscalers?resourceVersion=39153926&timeoutSeconds=431&watch=true
I0102 10:05:21.021195       1 round_trippers.go:390] Request Headers:
I0102 10:05:21.021284       1 round_trippers.go:393]     Accept: application/json, */*
I0102 10:05:21.021460       1 round_trippers.go:393]     User-Agent: admission-controller/v0.0.0 (linux/amd64) kubernetes/$Format
I0102 10:05:21.021566       1 round_trippers.go:393]     Authorization: Bearer [XXX]
I0102 10:05:21.029808       1 round_trippers.go:408] Response Status: 200 OK in 8 milliseconds
I0102 10:05:21.029968       1 round_trippers.go:411] Response Headers:
I0102 10:05:21.030170       1 round_trippers.go:414]     Audit-Id: a812bbcd-1c26-49a2-9f7e-da07a60b7d51
I0102 10:05:21.030291       1 round_trippers.go:414]     Content-Type: application/json
I0102 10:05:21.030468       1 round_trippers.go:414]     Date: Wed, 02 Jan 2019 10:05:21 GMT

When new pods matching the selector are created, their default resources are not changed, and nothing shows up in the logs. How can I investigate this problem?

@bskiba
Member

bskiba commented Jan 2, 2019

Which Kubernetes version?
You can try curling the VPA admission webhook service from within the cluster and see if any requests appear in the admission-controller logs.
If you have access to the master, you can also take a look at the API server logs - they should note any errors when calling the webhook. I think the log to look at is kube-controller-manager.log.
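For reference, a minimal sketch of that curl check (not from the thread): the service name and namespace come from the webhook configuration dumped above, the curl image choice is arbitrary, and -k is needed because the webhook serves a self-signed certificate.

$ kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
    curl -k -v https://vpa-webhook.kube-system.svc:443/

Any HTTP response here, even an error, only proves the service is reachable from inside the cluster; the interesting part is whether the same call from the API server ever shows up in the admission-controller logs.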

@piontec
Contributor Author

piontec commented Jan 2, 2019

Kubernetes version is "v1.10.11-eks". I did just that in the meantime. I'm pretty sure it's a wrong EKS config - the in-cluster service URL works fine, but I get no calls from the API server when pods are created (checked with tcpdump - nothing, so it's not just a lack of log entries). I'm in contact with AWS support and will update this issue once I learn more. Currently, there's no way to get control plane logs in EKS :|
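For reference, a hedged sketch of such a tcpdump check (not the exact commands from the thread), run on the worker node hosting the admission-controller pod; the pod label and the target port 8000 are assumptions based on the default VPA manifests:

$ POD_IP=$(kubectl -n kube-system get pod -l app=vpa-admission-controller \
    -o jsonpath='{.items[0].status.podIP}')
# then, on the node hosting that pod:
$ sudo tcpdump -i any "host ${POD_IP} and tcp port 8000"

If the API server were calling the webhook, inbound TLS connections to the pod would show up here.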

@bskiba
Member

bskiba commented Jan 2, 2019

I see, that's a bummer :( One thing you can also try in the meantime is to change the failurePolicy of the VPA webhook to Fail (instead of Ignore). This should cause pod creation to fail if the API server fails to call the webhook, and it might surface some cause for that failure (though I wouldn't expect anything too verbose).
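A minimal sketch of that change (not from the thread); the object name and the single-webhook index come from the configuration dumped above. Note that the admission controller registers this configuration itself at startup, so a manual patch may not survive a restart.

$ kubectl patch mutatingwebhookconfiguration vpa-webhook-config --type=json \
    -p='[{"op": "replace", "path": "/webhooks/0/failurePolicy", "value": "Fail"}]'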

@d-nishi

d-nishi commented Jan 7, 2019

/sig aws

@safanaj
Contributor

safanaj commented Jan 30, 2019

Apparently, on AWS EKS the admission controller pod has to listen on 443 (no matter whether the service forwards to some other port). It looks like they resolve the endpoint in a non-standard way (maybe they are not using this https://github.com/kubernetes/kubernetes/blob/release-1.11/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/config/serviceresolver.go).

Applying this https://github.com/kubernetes/autoscaler/pull/1613/files#diff-741c9c09f72b481cf3cb277a6a2ee929 and passing --port=443, it works.
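A hedged sketch of that workaround (on top of the linked PR), assuming the default VPA manifests in kube-system; the JSON-patch paths (first container, first port) are assumptions about the deployment layout, and the first patch replaces any existing container args:

$ kubectl -n kube-system patch deployment vpa-admission-controller --type=json -p='[
    {"op": "add", "path": "/spec/template/spec/containers/0/args", "value": ["--port=443"]},
    {"op": "replace", "path": "/spec/template/spec/containers/0/ports/0/containerPort", "value": 443}
  ]'
$ kubectl -n kube-system patch service vpa-webhook \
    -p '{"spec": {"ports": [{"port": 443, "targetPort": 443}]}}'

With the container listening on 443 and the service targetPort pointing at it, the API server's call to https://vpa-webhook.kube-system.svc:443/ lands on the same port either way.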

@bskiba
Member

bskiba commented Jan 30, 2019

@safanaj Thanks for the update! I'll take a look at your PR today hopefully.

@brycecarman

Verify the rules on the security groups you use for the cluster control plane and for the worker nodes. In particular, verify that the control plane security group allows egress to the worker node security group on port 8000 and the worker nodes allow ingress on 8000 from the control plane.

The default node group template allows port 8000 by default.
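For reference, a hedged sketch of adding such a rule with the AWS CLI (not from the thread); both security group IDs are placeholders, and the matching egress rule on the control-plane group is analogous:

$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0workernodes0000000 \
    --protocol tcp --port 8000 \
    --source-group sg-0controlplane000000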

@piontec
Contributor Author

piontec commented Feb 28, 2019

Yes, we have checked our security groups; it seems you have to use port 443, as @safanaj mentioned above.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on May 29, 2019.
@bskiba
Member

bskiba commented May 29, 2019

I think this is fixed already, since the change by @safanaj has been released.
/close

@k8s-ci-robot
Contributor

@bskiba: Closing this issue.

In response to this:

I think this is fixed already, since the change by @safanaj has been released.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@5cat

5cat commented Mar 12, 2023

I faced this issue as well and was seeing the following in the kube-apiserver logs:

failed calling webhook "vpa.k8s.io": failed to call webhook: Post "https://vpa-webhook.kube-system.svc:443/?timeout=30s": context deadline exceeded

Changing the port from 8000 to 10250 fixed the issue in my EKS cluster:
https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-0.13.0/vertical-pod-autoscaler/deploy/admission-controller-deployment.yaml#L58,L42

This also requires adding --port=10250, since the default port in the admission-controller container is 8000.

@5cat

5cat commented Mar 12, 2023

Verify the rules on the security groups you use for the cluster control plane and for the worker nodes. In particular, verify that the control plane security group allows egress to the worker node security group on port 8000 and the worker nodes allow ingress on 8000 from the control plane.

The default node group template allows port 8000 by default.

@brycecarman Actually you were right: instead of using port 10250 I can use the default 8000, but I needed to add a security group rule to allow the traffic, which wasn't there by default. I used the EKS Terraform module, and 8000 isn't among its default security group rules.

I couldn't use --port=443; for some reason it told me it couldn't bind to that port.
