This repository has been archived by the owner on Jan 13, 2023. It is now read-only.

HPA is unable to get metrics for resource cpu after Kubernetes 1.13 upgrade #128

Closed
geerlingguy opened this issue Dec 16, 2018 · 9 comments


See:

# kubectl get hpa drupal8
NAME      REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
drupal8   Deployment/drupal8   <unknown>/50%   1         10        1          4h30m

And then describing it:

# kubectl describe hpa drupal8
...
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
  Type     Reason                   Age                  From                       Message
  ----     ------                   ----                 ----                       -------
  Warning  FailedGetResourceMetric  73s (x341 over 86m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
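
For reference, the HPA here roughly corresponds to a manifest like the following. This is a sketch reconstructed from the numbers in the output above (50% CPU target, 1 to 10 replicas, target Deployment/drupal8); the actual definition in this repo may differ, e.g. in API version or namespace:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: drupal8
  namespace: drupal8          # assumed; adjust to wherever the drupal8 Deployment lives
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: drupal8
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50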

geerlingguy commented Dec 27, 2018

So at least on Vagrant/VirtualBox, I was getting the above until I inspected the metrics-server pod logs, and then found this issue: kubernetes-sigs/metrics-server#131

From one of the comments on that issue, I found that adding explicit flags to ignore kubelet TLS errors and to prefer InternalIP addresses instead of the default DNS-based hostname resolution seemed to fix the problem, since hostnames like kube4 were not resolving.

So I did a quick:

kubectl -n kube-system edit deploy metrics-server

And added the following command to the metrics-server container under spec.template.spec.containers:

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
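
(Equivalently, the same change can be applied without opening an editor via a JSON patch; a sketch, assuming metrics-server is the first container in the pod spec:)

kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["/metrics-server","--kubelet-insecure-tls","--kubelet-preferred-address-types=InternalIP"]}]'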

After a minute or so...

# kubectl get hpa -n drupal8
NAME      REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
drupal8   Deployment/drupal8   0%/50%    1         10        1          22h

geerlingguy commented:

Testing override of the metrics-server-deployment.yaml file for all platforms (not just ARM)...

geerlingguy commented:

Pushed the above commit, which fixes things in Vagrant. Need to test it in Docker (Travis build running now), and on the Pi Cluster (doing that tomorrow).

geerlingguy commented:

Travis worked, yay!

geerlingguy commented:

Hmm... on the Pis themselves I'm getting:

# kubectl get hpa -n drupal8
NAME      REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
drupal8   Deployment/drupal8   <unknown>/50%   1         10        1          2d15h

and the metrics-server pod logs:

E1227 22:11:05.003861       1 summary.go:97] error while getting metrics summary from Kubelet kube4(10.0.100.64:10255): Get http://10.0.100.64:10255/stats/summary/: dial tcp 10.0.100.64:10255: getsockopt: connection refused
E1227 22:11:05.013041       1 summary.go:97] error while getting metrics summary from Kubelet kube1(10.0.100.61:10255): Get http://10.0.100.61:10255/stats/summary/: dial tcp 10.0.100.61:10255: getsockopt: connection refused
E1227 22:11:05.018198       1 summary.go:97] error while getting metrics summary from Kubelet kube3(10.0.100.63:10255): Get http://10.0.100.63:10255/stats/summary/: dial tcp 10.0.100.63:10255: getsockopt: connection refused
E1227 22:11:05.035226       1 summary.go:97] error while getting metrics summary from Kubelet kube2(10.0.100.62:10255): Get http://10.0.100.62:10255/stats/summary/: dial tcp 10.0.100.62:10255: getsockopt: connection refused
E1227 22:11:05.036363       1 summary.go:97] error while getting metrics summary from Kubelet kube5(10.0.100.65:10255): Get http://10.0.100.65:10255/stats/summary/: dial tcp 10.0.100.65:10255: getsockopt: connection refused
I1227 22:11:15.849119       1 reststorage.go:140] No metrics for container drupal8 in pod drupal8/drupal8-856c9fcc77-s7hdc
I1227 22:11:15.849232       1 reststorage.go:93] No metrics for pod drupal8/drupal8-856c9fcc77-s7hdc
I1227 22:11:30.912339       1 reststorage.go:140] No metrics for container drupal8 in pod drupal8/drupal8-856c9fcc77-s7hdc
I1227 22:11:30.912436       1 reststorage.go:93] No metrics for pod drupal8/drupal8-856c9fcc77-s7hdc

geerlingguy commented:

Found [this comment on kubernetes-sigs/metrics-server#77](https://github.com/kubernetes-sigs/metrics-server/issues/77#issuecomment-402909289). For some reason this is necessary for the ARM container image but not the AMD64 one... Grr.

Basically, edit the metrics-server deployment spec:

kubectl edit -n kube-system deployment metrics-server

Change the container's command under spec.template.spec.containers so it looks like this:

...
      - command:
        - /metrics-server
        - --source=kubernetes.summary_api:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true
...

This might not be the best long-term fix, but I'm guessing the ARM image will work the same way as the AMD64 image once it's updated. I should hope, at least.

geerlingguy commented:

Actually scratch that. I was still on v0.2.1 of the metrics-server-arm image. Testing on v0.3.1 now, and getting:

E1227 22:26:34.154388       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:kube3: unable to fetch metrics from Kubelet kube3 (kube3): Get https://kube3:10250/stats/summary/: dial tcp: lookup kube3 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:kube5: unable to fetch metrics from Kubelet kube5 (kube5): Get https://kube5:10250/stats/summary/: dial tcp: lookup kube5 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:kube1: unable to fetch metrics from Kubelet kube1 (kube1): Get https://kube1:10250/stats/summary/: dial tcp: lookup kube1 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:kube2: unable to fetch metrics from Kubelet kube2 (kube2): Get https://kube2:10250/stats/summary/: dial tcp: lookup kube2 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:kube4: unable to fetch metrics from Kubelet kube4 (kube4): Get https://kube4:10250/stats/summary/: dial tcp: lookup kube4 on 10.96.0.10:53: no such host]

So now I'll try using the InternalIP address type, like I do with Vagrant.

geerlingguy commented:

Just updating the command in the template itself now:

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

Seems to work fine with v0.3.1.
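
A quick way to confirm metrics are actually flowing after the metrics-server pod restarts (standard kubectl commands, nothing specific to this setup):

kubectl top nodes
kubectl top pods -n drupal8

# or query the resource metrics API directly:
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes

Once these return numbers instead of errors, the HPA's TARGETS column should show a real percentage within a minute or so.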

mabushey commented Jan 29, 2019

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

With 0.3.1, this still gives me:

E0129 01:13:39.980774 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-10-132-11-127.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-11-127.us-west-2.compute.internal (10.132.11.127): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-132-10-233.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-10-233.us-west-2.compute.internal (10.132.10.233): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-132-10-63.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-10-63.us-west-2.compute.internal (10.132.10.63): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-132-9-84.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-9-84.us-west-2.compute.internal (10.132.9.84): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-132-11-28.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-11-28.us-west-2.compute.internal (10.132.11.28): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-132-9-104.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-9-104.us-west-2.compute.internal (10.132.9.104): request failed - "401 Unauthorized", response: "Unauthorized"]

It does seem to fix some errors, though.

If I try adding --source, I get an error that --source is not a valid option. This is on AMD64, not ARM.
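
(Possibly relevant: a 401 from the kubelet usually points at kubelet-side authentication rather than at the metrics-server flags; the kubelets need webhook token authentication enabled. A sketch of the relevant KubeletConfiguration settings, assuming a file-based kubelet config such as /var/lib/kubelet/config.yaml, followed by a kubelet restart:)

authentication:
  webhook:
    enabled: true       # let metrics-server authenticate with its service account token
authorization:
  mode: Webhook         # delegate authorization of the request to the API server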
