This repository has been archived by the owner on Jan 13, 2023. It is now read-only.

HPA is unable to get metrics for resource cpu after Kubernetes 1.13 upgrade #128

Closed
geerlingguy opened this issue Dec 16, 2018 · 9 comments


See:

# kubectl get hpa drupal8
NAME      REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
drupal8   Deployment/drupal8   <unknown>/50%   1         10        1          4h30m

And then describing it:

# kubectl describe hpa drupal8
...
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
  Type     Reason                   Age                  From                       Message
  ----     ------                   ----                 ----                       -------
  Warning  FailedGetResourceMetric  73s (x341 over 86m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
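
For reference, the HPA here roughly corresponds to a manifest like the following. This is a sketch reconstructed from the numbers in the output above (50% CPU target, 1 to 10 replicas, target Deployment/drupal8); the actual definition in this repo may differ, e.g. in API version or namespace:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: drupal8
  namespace: drupal8          # assumed; adjust to wherever the drupal8 Deployment lives
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: drupal8
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50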

geerlingguy commented Dec 27, 2018

So at least on Vagrant/VirtualBox, I was getting the above until I inspected the metrics-server pod logs, and then found this issue: kubernetes-sigs/metrics-server#131

From one of the comments on that issue, I found that adding explicit flags to ignore kubelet TLS errors and to prefer InternalIP addresses instead of the default DNS-based hostname resolution seemed to fix the problem, since hostnames like kube4 were not resolving.

So I did a quick:

kubectl -n kube-system edit deploy metrics-server

And added the following command to the metrics-server container under spec.template.spec.containers:

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
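
(Equivalently, the same change can be applied without opening an editor via a JSON patch; a sketch, assuming metrics-server is the first container in the pod spec:)

kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["/metrics-server","--kubelet-insecure-tls","--kubelet-preferred-address-types=InternalIP"]}]'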

After a minute or so...

# kubectl get hpa -n drupal8
NAME      REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
drupal8   Deployment/drupal8   0%/50%    1         10        1          22h

geerlingguy commented:

Testing override of the metrics-server-deployment.yaml file for all platforms (not just ARM)...

geerlingguy commented:

Pushed the above commit, which fixes things in Vagrant. Need to test it in Docker (Travis build running now), and on the Pi Cluster (doing that tomorrow).

geerlingguy commented:

Travis worked, yay!

geerlingguy commented:

Hmm... on the Pis themselves I'm getting:

# kubectl get hpa -n drupal8
NAME      REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
drupal8   Deployment/drupal8   <unknown>/50%   1         10        1          2d15h

and the metrics-server pod logs:

E1227 22:11:05.003861       1 summary.go:97] error while getting metrics summary from Kubelet kube4(10.0.100.64:10255): Get http://10.0.100.64:10255/stats/summary/: dial tcp 10.0.100.64:10255: getsockopt: connection refused
E1227 22:11:05.013041       1 summary.go:97] error while getting metrics summary from Kubelet kube1(10.0.100.61:10255): Get http://10.0.100.61:10255/stats/summary/: dial tcp 10.0.100.61:10255: getsockopt: connection refused
E1227 22:11:05.018198       1 summary.go:97] error while getting metrics summary from Kubelet kube3(10.0.100.63:10255): Get http://10.0.100.63:10255/stats/summary/: dial tcp 10.0.100.63:10255: getsockopt: connection refused
E1227 22:11:05.035226       1 summary.go:97] error while getting metrics summary from Kubelet kube2(10.0.100.62:10255): Get http://10.0.100.62:10255/stats/summary/: dial tcp 10.0.100.62:10255: getsockopt: connection refused
E1227 22:11:05.036363       1 summary.go:97] error while getting metrics summary from Kubelet kube5(10.0.100.65:10255): Get http://10.0.100.65:10255/stats/summary/: dial tcp 10.0.100.65:10255: getsockopt: connection refused
I1227 22:11:15.849119       1 reststorage.go:140] No metrics for container drupal8 in pod drupal8/drupal8-856c9fcc77-s7hdc
I1227 22:11:15.849232       1 reststorage.go:93] No metrics for pod drupal8/drupal8-856c9fcc77-s7hdc
I1227 22:11:30.912339       1 reststorage.go:140] No metrics for container drupal8 in pod drupal8/drupal8-856c9fcc77-s7hdc
I1227 22:11:30.912436       1 reststorage.go:93] No metrics for pod drupal8/drupal8-856c9fcc77-s7hdc

geerlingguy commented:

Found [this comment on kubernetes-sigs/metrics-server#77](https://github.com/kubernetes-sigs/metrics-server/issues/77#issuecomment-402909289). For some reason this is necessary for the ARM container image but not the AMD64 one... Grr.

Basically, edit the metrics-server deployment spec:

kubectl edit -n kube-system deployment metrics-server

Change the container's command under spec.template.spec.containers so it looks like this:

...
      - command:
        - /metrics-server
        - --source=kubernetes.summary_api:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true
...

This might not be the best long-term fix, but I'm guessing the ARM image will work the same way as the AMD64 image once it's updated. I should hope, at least.

geerlingguy commented:

Actually scratch that. I was still on v0.2.1 of the metrics-server-arm image. Testing on v0.3.1 now, and getting:

E1227 22:26:34.154388       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:kube3: unable to fetch metrics from Kubelet kube3 (kube3): Get https://kube3:10250/stats/summary/: dial tcp: lookup kube3 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:kube5: unable to fetch metrics from Kubelet kube5 (kube5): Get https://kube5:10250/stats/summary/: dial tcp: lookup kube5 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:kube1: unable to fetch metrics from Kubelet kube1 (kube1): Get https://kube1:10250/stats/summary/: dial tcp: lookup kube1 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:kube2: unable to fetch metrics from Kubelet kube2 (kube2): Get https://kube2:10250/stats/summary/: dial tcp: lookup kube2 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:kube4: unable to fetch metrics from Kubelet kube4 (kube4): Get https://kube4:10250/stats/summary/: dial tcp: lookup kube4 on 10.96.0.10:53: no such host]

So now I'll try using the InternalIP address type, like I do with Vagrant.

geerlingguy commented:

Just updating the command in the template itself now:

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

Seems to work fine with v0.3.1.
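
A quick way to confirm metrics are actually flowing after the metrics-server pod restarts (standard kubectl commands, nothing specific to this setup):

kubectl top nodes
kubectl top pods -n drupal8

# or query the resource metrics API directly:
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes

Once these return numbers instead of errors, the HPA's TARGETS column should show a real percentage within a minute or so.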

mabushey commented Jan 29, 2019

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

With 0.3.1, this still gives me:

E0129 01:13:39.980774 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-10-132-11-127.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-11-127.us-west-2.compute.internal (10.132.11.127): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-132-10-233.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-10-233.us-west-2.compute.internal (10.132.10.233): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-132-10-63.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-10-63.us-west-2.compute.internal (10.132.10.63): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-132-9-84.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-9-84.us-west-2.compute.internal (10.132.9.84): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-132-11-28.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-11-28.us-west-2.compute.internal (10.132.11.28): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-10-132-9-104.us-west-2.compute.internal: unable to fetch metrics from Kubelet ip-10-132-9-104.us-west-2.compute.internal (10.132.9.104): request failed - "401 Unauthorized", response: "Unauthorized"]

It does seem to fix some errors, though.

If I try adding --source, I get an error that --source is not a valid option. This is on AMD64, not ARM.
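
(Possibly relevant: a 401 from the kubelet usually points at kubelet-side authentication rather than at the metrics-server flags; the kubelets need webhook token authentication enabled. A sketch of the relevant KubeletConfiguration settings, assuming a file-based kubelet config such as /var/lib/kubelet/config.yaml, followed by a kubelet restart:)

authentication:
  webhook:
    enabled: true       # let metrics-server authenticate with its service account token
authorization:
  mode: Webhook         # delegate authorization of the request to the API server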
