3.0 won't work - reststorage.go:101] unable to fetch node metrics for node "k8sdev01": no metrics known for node "k8sdev01" #143

Closed
gabrielfsousa opened this issue Sep 25, 2018 · 25 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@gabrielfsousa commented Sep 25, 2018

I updated from v0.2.1 to v0.3.0. It was working on v0.2.1.

On v0.3.0 it gives this error:

 E0925 16:47:40.840589       1 reststorage.go:101] unable to fetch node metrics for node "k8sdev01": no metrics known for node "k8sdev01"
E0925 16:47:40.840616       1 reststorage.go:101] unable to fetch node metrics for node "k8sdev02": no metrics known for node "k8sdev02"
E0925 16:47:40.840621       1 reststorage.go:101] unable to fetch node metrics for node "k8sdev03": no metrics known for node "k8sdev03"
E0925 16:47:40.874031       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")] 
E0925 16:46:55.894491       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8sdev02: unable to fetch metrics from Kubelet k8sdev02 (k8sdev02): Get https://k8sdev02:10250/stats/summary/: dial tcp: lookup k8sdev02 on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:k8sdev03: unable to fetch metrics from Kubelet k8sdev03 (k8sdev03): Get https://k8sdev03:10250/stats/summary/: dial tcp: lookup k8sdev03 on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:k8sdev01: unable to fetch metrics from Kubelet k8sdev01 (k8sdev01): Get https://k8sdev01:10250/stats/summary/: dial tcp: lookup k8sdev01 on 10.96.0.10:53: server misbehaving]
@gabrielfsousa changed the title twice on Sep 25, 2018: from the original "3.0 wont work - Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority ...]" to a truncated version of it, and then to the current title.
@DirectXMan12 (Contributor)

metrics-server doesn't know how to trust the serving certificates on your kubelets. First, double-check what happens with --insecure-kubelet. If that works, it probably means you're using self-signed kubelet certs, or something similar.

@gabrielfsousa (Author)

Error: unknown flag: --insecure-kubelet

I tried --kubelet-insecure-tls instead, and I get the same error:

 E0925 17:27:26.732588       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]
E0925 17:27:28.408445       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev01": no metrics known for node
E0925 17:27:28.408467       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev02": no metrics known for node
E0925 17:27:28.408471       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev03": no metrics known for node
E0925 17:27:29.060940       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev01": no metrics known for node
E0925 17:27:29.060960       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev02": no metrics known for node
E0925 17:27:29.060964       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev03": no metrics known for node
E0925 17:27:29.459832       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev01": no metrics known for node
E0925 17:27:29.459850       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev02": no metrics known for node
E0925 17:27:29.459854       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev03": no metrics known for node
E0925 17:27:33.030281       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")] 

@gabrielfsousa (Author)

Version 0.2.1 works, even with this error:

 E0925 17:27:26.732588       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]

On v0.2.1 I do get the certificate error, but it still works:

[root@k8sdev03 ~]$ kubectl top nodes
NAME       CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
k8sdev01   433m         8%        6977Mi          51%
k8sdev02   362m         7%        7028Mi          51%
k8sdev03   374m         7%        7237Mi          53%

@gabrielfsousa (Author)

E0925 17:50:02.785406       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8sdev02: unable to fetch metrics from Kubelet k8sdev02 (k8sdev02): Get https://k8sdev02:10250/stats/summary/: dial tcp: lookup k8sdev02 on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:k8sdev01: unable to fetch metrics from Kubelet k8sdev01 (k8sdev01): Get https://k8sdev01:10250/stats/summary/: dial tcp: lookup k8sdev01 on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:k8sdev03: unable to fetch metrics from Kubelet k8sdev03 (k8sdev03): Get https://k8sdev03:10250/stats/summary/: dial tcp: lookup k8sdev03 on 10.96.0.10:53: server misbehaving]

@DirectXMan12 (Contributor)

For the last message: something's wrong with your DNS setup. By default, metrics-server looks up node DNS names to connect to the kubelet, but this can be changed with one of the metrics-server flags.
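One way to confirm the DNS problem is to resolve the node names from inside a pod, as metrics-server has to. Below is a minimal Go sketch, assuming you build it into an image and run it in the cluster; lookupNodeNames is a hypothetical helper, and the node names are the ones from this issue's logs. It uses the pod's own resolver configuration (/etc/resolv.conf points at the cluster DNS service, 10.96.0.10 in the logs above), which is roughly the lookup metrics-server performs.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// lookupNodeNames is a hypothetical helper: it resolves each name with the
// process's configured resolver and records the error (nil on success).
func lookupNodeNames(names []string) map[string]error {
	results := make(map[string]error, len(names))
	for _, name := range names {
		_, err := net.LookupHost(name)
		results[name] = err
	}
	return results
}

func main() {
	names := os.Args[1:] // e.g. k8sdev01 k8sdev02 k8sdev03
	if len(names) == 0 {
		fmt.Println("usage: lookup-nodes NODE_NAME...")
		return
	}
	results := lookupNodeNames(names)
	for _, name := range names {
		if err := results[name]; err != nil {
			fmt.Printf("%s: FAILED: %v\n", name, err)
		} else {
			fmt.Printf("%s: ok\n", name)
		}
	}
}
```

A "server misbehaving" or "no such host" failure here points at cluster DNS rather than at metrics-server itself.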

For the "authentication.go:62] Unable to authenticate the request due to an error" message: some controller is trying to connect without a proper cert set up. Make sure you turn on per-controller service accounts.

@gabrielfsousa (Author) commented Sep 25, 2018

What is the parameter to switch to IP addresses?

It's strange that it works with version 0.2.1.

I have deployed all the files in deploy/1.8+/.

@gabrielfsousa (Author)

I tried --kubelet-preferred-address-types=ExternalIP, but I get the same problem.

@gabrielfsousa (Author) commented Sep 25, 2018

I did this:

lifecycle:
  postStart:
    exec:
      command:
        - "/bin/sh"
        - "-ec"
        - |
          echo "10.7.68.20 k8sdev01" >> /etc/hosts
          echo "10.7.68.21 k8sdev02" >> /etc/hosts
          echo "10.7.68.22 k8sdev03" >> /etc/hosts

I tested from the pod whether it can resolve the name:

/ # nc -zv k8sdev01 10250
k8sdev01 (10.7.68.20:10250) open

but I still get the same error.

@DirectXMan12 (Contributor)

Which error? The lookup error?

@gabrielfsousa (Author) commented Sep 25, 2018

No, sorry. This one:

 E0925 19:49:16.320673       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8sdev03: unable to fetch metrics from Kubelet k8sdev03 (k8sdev03): Get https://k8sdev03:10250/stats/summary/: x509: certificate signed by unknown authority, unable to fully scrape metrics from source kubelet_summary:k8sdev01: unable to fetch metrics from Kubelet k8sdev01 (k8sdev01): Get https://k8sdev01:10250/stats/summary/: x509: certificate signed by unknown authority, unable to fully scrape metrics from source kubelet_summary:k8sdev02: unable to fetch metrics from Kubelet k8sdev02 (k8sdev02): Get https://k8sdev02:10250/stats/summary/: x509: certificate signed by unknown authority]
E0925 19:49:16.454403       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev02": no metrics known for node
E0925 19:49:16.454429       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev03": no metrics known for node
E0925 19:49:16.454433       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev01": no metrics known for node 

@glitch-k8s commented Oct 9, 2018

I am facing the same issue with metrics-server.
I tried:

command:
- /metrics-server
- --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
- --kubelet-insecure-tls

But metrics-server is looking up by hostname only:
E1009 05:37:26.888923 1 reststorage.go:129] unable to fetch node metrics for node "k8s-master.cluster.k8.local": no metrics known for node
E1009 05:37:26.889005 1 reststorage.go:129] unable to fetch node metrics for node "node03.cluster.k8.local": no metrics known for node
E1009 05:37:26.889022 1 reststorage.go:129] unable to fetch node metrics for node "node02.cluster.k8.local": no metrics known for node

@prodanlabs

I also encountered this problem.
When the node name is changed to its IP, it works normally.

When the node name is the hostname, metrics-server cannot resolve the host:

E1017 08:56:31.136817       1 manager.go:102] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:kube-master: unable to fetch metrics from Kubelet kube-master (kube-master): Get http://kube-master:10255/stats/summary/: dial tcp: lookup kube-master on 10.254.0.2:53: no such host
E1017 08:57:01.047583       1 manager.go:102] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:kube-master: unable to fetch metrics from Kubelet kube-master (kube-master): Get http://kube-master:10255/stats/summary/: dial tcp: lookup kube-master on 10.254.0.2:53: no such host
E1017 08:57:31.049286       1 manager.go:102] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:kube-master: unable to fetch metrics from Kubelet kube-master (kube-master): Get http://kube-master:10255/stats/summary/: dial tcp: lookup kube-master on 10.254.0.2:53: no such host

@cdyue commented Oct 17, 2018

Same problem.

@DirectXMan12 (Contributor)

If you want to prioritize certain address types, please use the --kubelet-preferred-address-types flag.
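The flag takes an ordered list of node address types. How such a priority list gets applied can be sketched as follows; this is a simplified, hypothetical illustration (NodeAddress and pickAddress are made up here, loosely mirroring the type/address pairs in a Node's status.addresses), not metrics-server's implementation. The example node reuses the k8sdev01 name and IP from earlier in this thread.

```go
package main

import "fmt"

// NodeAddress is a local stand-in for one entry of a Node's status.addresses
// (an address type plus the address), so the sketch needs no client library.
type NodeAddress struct {
	Type    string
	Address string
}

// pickAddress is a hypothetical helper: it walks the preference list in order
// and returns the first of the node's addresses whose type matches.
func pickAddress(addrs []NodeAddress, preferred []string) (string, bool) {
	for _, want := range preferred {
		for _, a := range addrs {
			if a.Type == want {
				return a.Address, true
			}
		}
	}
	return "", false
}

func main() {
	node := []NodeAddress{
		{Type: "Hostname", Address: "k8sdev01"},
		{Type: "InternalIP", Address: "10.7.68.20"},
	}
	for _, prefs := range [][]string{
		{"InternalIP", "Hostname"},
		{"Hostname", "InternalIP"},
		{"ExternalIP"},
	} {
		addr, ok := pickAddress(node, prefs)
		fmt.Printf("%v -> %q (found: %v)\n", prefs, addr, ok)
	}
}
```

In this sketch, a list containing only a type the node doesn't report (for example ExternalIP on a node with no external address) yields no address at all, so it is worth checking which address types your nodes actually have, e.g. with kubectl get nodes -o wide.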

@DirectXMan12 (Contributor)

As for the certificate errors, they're probably due to the kubelets using a different CA from the main cluster CA. We've got an issue open to add a flag supporting setups where trusting the main cluster CA doesn't automatically mean trusting the kubelet CA.

@fentas commented Feb 20, 2019

I have exactly the situation @d-peng describes.
Is there currently a workaround, or do I need to rename all my nodes?

Edit: never mind, I recreated the cluster.

@alfonmga commented Mar 5, 2019

I solved this by adding --kubelet-insecure-tls and --kubelet-preferred-address-types=InternalIP flags.

If you installed metrics-server using Helm, you can apply this change by running kubectl edit deployment metrics-server -n <your-namespace>.

It should look like this:

[Screenshot of the edited metrics-server Deployment (2019-03-05); image not available]

@qianliusi

I solved this by adding the node IPs to CoreDNS:

kubectl edit configmap coredns -n kube-system

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        hosts {
            192.168.199.100 master.qls.com
            192.168.199.220 node01.qls.com
            192.168.199.215 node02.qls.com
            fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
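With this workaround the hosts block has to be kept in sync by hand as nodes change. A minimal sketch of generating it from a name-to-IP map is below; renderHostsBlock is a hypothetical helper (not part of CoreDNS or kubectl), and its output would still need to be pasted into the Corefile. The fallthrough line keeps CoreDNS passing names that are not listed on to the next plugin, as in the config above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderHostsBlock is a hypothetical helper that renders a CoreDNS "hosts"
// block from a node-name -> IP map. Names are sorted so regenerated output
// is stable and easy to diff.
func renderHostsBlock(nodes map[string]string) string {
	names := make([]string, 0, len(nodes))
	for name := range nodes {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString("hosts {\n")
	for _, name := range names {
		fmt.Fprintf(&b, "    %s %s\n", nodes[name], name)
	}
	// Unlisted names fall through to the next plugin (e.g. kubernetes, proxy).
	b.WriteString("    fallthrough\n}\n")
	return b.String()
}

func main() {
	fmt.Print(renderHostsBlock(map[string]string{
		"master.qls.com": "192.168.199.100",
		"node01.qls.com": "192.168.199.220",
		"node02.qls.com": "192.168.199.215",
	}))
}
```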

@cainzhong

Replying to @glitch-k8s's comment above, this works for me:

command:
- /metrics-server
- --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
- --kubelet-insecure-tls

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 8, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 7, 2019
@wajdi-datalvo

Same problem here. Pod log error message:

unable to fetch node metrics for node "k8s-master-1": no metrics known for node

@benileo commented Oct 1, 2019

I have the same issue. However, I'm able to get metrics for one node. I can confirm that the only node reporting metrics is the one the metrics-server pod is running on.

$ kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ip-10-50-10-149.ca-central-1.compute.internal   Ready    master   14h   v1.15.4
ip-10-50-10-203.ca-central-1.compute.internal   Ready    master   14h   v1.15.4
ip-10-50-10-243.ca-central-1.compute.internal   Ready    <none>   14h   v1.15.4
ip-10-50-20-157.ca-central-1.compute.internal   Ready    master   14h   v1.15.4
ip-10-50-20-170.ca-central-1.compute.internal   Ready    <none>   14h   v1.15.4

$ kubectl top nodes
NAME                                            CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
ip-10-50-20-170.ca-central-1.compute.internal   141m         7%          1606Mi          86%
ip-10-50-20-157.ca-central-1.compute.internal   <unknown>    <unknown>   <unknown>       <unknown>
ip-10-50-10-149.ca-central-1.compute.internal   <unknown>    <unknown>   <unknown>       <unknown>
ip-10-50-10-203.ca-central-1.compute.internal   <unknown>    <unknown>   <unknown>       <unknown>
ip-10-50-10-243.ca-central-1.compute.internal   <unknown>    <unknown>   <unknown>       <unknown>

$ ssh 10.50.20.170 'docker ps | grep k8s_metrics-server'
2fd50cb6148e        k8s.gcr.io/metrics-server-amd64                                "/metrics-server --m…"   8 minutes ago       Up 8 minutes                            k8s_metrics-server_metrics-server-67c6d6566d-8bzfx_kube-system_652f2ab3-139f-427f-8f98-e772d6e97a54_0

From the logs:
unable to fetch node metrics for node "ip-10-50-20-157.ca-central-1.compute.internal": no metrics known for node
unable to fetch node metrics for node "ip-10-50-10-243.ca-central-1.compute.internal": no metrics known for node
unable to fetch node metrics for node "ip-10-50-10-149.ca-central-1.compute.internal": no metrics known for node
unable to fetch node metrics for node "ip-10-50-10-203.ca-central-1.compute.internal": no metrics known for node

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
