3.0 wont work - reststorage.go:101] unable to fetch node metrics for node "k8sdev01": no metrics known for node "k8sdev01" #143

Closed
gabrielfsousa opened this issue Sep 25, 2018 · 25 comments


@gabrielfsousa gabrielfsousa commented Sep 25, 2018

I updated from v0.2.1 to v0.3.0.
It was working on v0.2.1.

On v0.3.0 it gives these errors:

 E0925 16:47:40.840589       1 reststorage.go:101] unable to fetch node metrics for node "k8sdev01": no metrics known for node "k8sdev01"
E0925 16:47:40.840616       1 reststorage.go:101] unable to fetch node metrics for node "k8sdev02": no metrics known for node "k8sdev02"
E0925 16:47:40.840621       1 reststorage.go:101] unable to fetch node metrics for node "k8sdev03": no metrics known for node "k8sdev03"
E0925 16:47:40.874031       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")] 
E0925 16:46:55.894491       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8sdev02: unable to fetch metrics from Kubelet k8sdev02 (k8sdev02): Get https://k8sdev02:10250/stats/summary/: dial tcp: lookup k8sdev02 on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:k8sdev03: unable to fetch metrics from Kubelet k8sdev03 (k8sdev03): Get https://k8sdev03:10250/stats/summary/: dial tcp: lookup k8sdev03 on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:k8sdev01: unable to fetch metrics from Kubelet k8sdev01 (k8sdev01): Get https://k8sdev01:10250/stats/summary/: dial tcp: lookup k8sdev01 on 10.96.0.10:53: server misbehaving]
@gabrielfsousa gabrielfsousa changed the title on Sep 25, 2018
from: 3.0 wont work - Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]
to: 3.0 wont work - Unable to authenticate the request due to an error: [x509:
@gabrielfsousa gabrielfsousa changed the title on Sep 25, 2018
from: 3.0 wont work - Unable to authenticate the request due to an error: [x509:
to: 3.0 wont work - reststorage.go:101] unable to fetch node metrics for node "k8sdev01": no metrics known for node "k8sdev01"
Contributor

@DirectXMan12 DirectXMan12 commented Sep 25, 2018

metrics-server doesn't know how to trust the serving certificates on your kubelets. First, double check what happens with the --insecure-kubelet flag. If that works, it probably means you're using self-signed kubelet certs, or something similar.

Author

@gabrielfsousa gabrielfsousa commented Sep 25, 2018

Error: unknown flag: --insecure-kubelet

I tried with --kubelet-insecure-tls instead, and I get the same error:

 E0925 17:27:26.732588       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]
E0925 17:27:28.408445       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev01": no metrics known for node
E0925 17:27:28.408467       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev02": no metrics known for node
E0925 17:27:28.408471       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev03": no metrics known for node
E0925 17:27:29.060940       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev01": no metrics known for node
E0925 17:27:29.060960       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev02": no metrics known for node
E0925 17:27:29.060964       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev03": no metrics known for node
E0925 17:27:29.459832       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev01": no metrics known for node
E0925 17:27:29.459850       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev02": no metrics known for node
E0925 17:27:29.459854       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev03": no metrics known for node
E0925 17:27:33.030281       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")] 
Author

@gabrielfsousa gabrielfsousa commented Sep 25, 2018

Version 0.2.1 works, even with this error:

 E0925 17:27:26.732588       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]

On version 0.2.1 I get the certificate error, but it still works:

[root@k8sdev03 ~]$ kubectl top nodes
NAME       CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
k8sdev01   433m         8%        6977Mi          51%
k8sdev02   362m         7%        7028Mi          51%
k8sdev03   374m         7%        7237Mi          53%
Author

@gabrielfsousa gabrielfsousa commented Sep 25, 2018

E0925 17:50:02.785406       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8sdev02: unable to fetch metrics from Kubelet k8sdev02 (k8sdev02): Get https://k8sdev02:10250/stats/summary/: dial tcp: lookup k8sdev02 on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:k8sdev01: unable to fetch metrics from Kubelet k8sdev01 (k8sdev01): Get https://k8sdev01:10250/stats/summary/: dial tcp: lookup k8sdev01 on 10.96.0.10:53: server misbehaving, unable to fully scrape metrics from source kubelet_summary:k8sdev03: unable to fetch metrics from Kubelet k8sdev03 (k8sdev03): Get https://k8sdev03:10250/stats/summary/: dial tcp: lookup k8sdev03 on 10.96.0.10:53: server misbehaving]
Contributor

@DirectXMan12 DirectXMan12 commented Sep 25, 2018

For the last message, something's wrong with your DNS setup. By default, metrics-server looks up node DNS names to connect to the kubelet, but this can be changed with one of the metrics-server flags.

For the "authentication.go:62] Unable to authenticate the request due to an error" message, some controller is trying to connect without a proper cert set up. Make sure you turn on the per-controller service accounts.
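
Several different failures are mixed together in the logs posted in this thread, so it can help to sort the lines before deciding what to fix. The following is a minimal sketch, assuming the message formats quoted in this thread; the category names and the classify function are hypothetical, and other metrics-server versions may word these messages differently.

```python
import re

# Hypothetical category names. The patterns are taken from log lines quoted
# in this thread and may not cover every metrics-server version.
PATTERNS = [
    # The kubelet hostname could not be resolved by cluster DNS.
    ("dns", re.compile(r"dial tcp: lookup \S+ on \S+: (server misbehaving|no such host)")),
    # The kubelet was reached, but its serving certificate was not trusted.
    ("kubelet-tls", re.compile(r"stats/summary/?: x509: certificate signed by unknown authority")),
    # A client request to metrics-server itself failed authentication.
    ("request-auth", re.compile(r"authentication\.go:\d+\] Unable to authenticate the request")),
    # No scrape has succeeded for the node, so there is nothing to serve.
    ("no-metrics", re.compile(r"no metrics known for node")),
]


def classify(line: str) -> str:
    """Return the first matching category name, or "other"."""
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return "other"


sample = ('E0925 19:49:16.454403       1 reststorage.go:129] unable to fetch '
          'node metrics for node "k8sdev02": no metrics known for node')
print(classify(sample))  # prints "no-metrics"
```

The "no-metrics" lines are usually a symptom rather than a cause: they appear whenever scraping has failed for some other reason, so the "dns" and "kubelet-tls" lines are the ones worth reading first.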

Author

@gabrielfsousa gabrielfsousa commented Sep 25, 2018

What is the parameter to switch to IP addresses?

It's strange that it works with version 0.2.1.

I have deployed all the files in deploy/1.8+/.

Author

@gabrielfsousa gabrielfsousa commented Sep 25, 2018

I tried --kubelet-preferred-address-types=ExternalIP, but I get the same problem.

Author

@gabrielfsousa gabrielfsousa commented Sep 25, 2018

I did this:

lifecycle:
  postStart:
    exec:
      command:
        - "/bin/sh"
        - "-ec"
        - |
          echo "10.7.68.20 k8sdev01" >> /etc/hosts
          echo "10.7.68.21 k8sdev02" >> /etc/hosts
          echo "10.7.68.22 k8sdev03" >> /etc/hosts

Then I tested from inside the pod that the name resolves:

/ # nc -zv k8sdev01 10250
k8sdev01 (10.7.68.20:10250) open

But I still get the same error.

Contributor

@DirectXMan12 DirectXMan12 commented Sep 25, 2018

which error? The lookup error?

Author

@gabrielfsousa gabrielfsousa commented Sep 25, 2018

no, sorry

 E0925 19:49:16.320673       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8sdev03: unable to fetch metrics from Kubelet k8sdev03 (k8sdev03): Get https://k8sdev03:10250/stats/summary/: x509: certificate signed by unknown authority, unable to fully scrape metrics from source kubelet_summary:k8sdev01: unable to fetch metrics from Kubelet k8sdev01 (k8sdev01): Get https://k8sdev01:10250/stats/summary/: x509: certificate signed by unknown authority, unable to fully scrape metrics from source kubelet_summary:k8sdev02: unable to fetch metrics from Kubelet k8sdev02 (k8sdev02): Get https://k8sdev02:10250/stats/summary/: x509: certificate signed by unknown authority]
E0925 19:49:16.454403       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev02": no metrics known for node
E0925 19:49:16.454429       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev03": no metrics known for node
E0925 19:49:16.454433       1 reststorage.go:129] unable to fetch node metrics for node "k8sdev01": no metrics known for node 

@nishantsh77 nishantsh77 commented Oct 9, 2018

I am facing the same issue with metrics-server.
I tried:
command:
- /metrics-server
- --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
- --kubelet-insecure-tls

But metrics-server is still looking up nodes by hostname only:
E1009 05:37:26.888923 1 reststorage.go:129] unable to fetch node metrics for node "k8s-master.cluster.k8.local": no metrics known for node
E1009 05:37:26.889005 1 reststorage.go:129] unable to fetch node metrics for node "node03.cluster.k8.local": no metrics known for node
E1009 05:37:26.889022 1 reststorage.go:129] unable to fetch node metrics for node "node02.cluster.k8.local": no metrics known for node


@ProdanLabs ProdanLabs commented Oct 17, 2018

I also encountered this problem.
When the node name is changed to an IP address, it works normally.

When the node name is a hostname, metrics-server cannot resolve the host.

E1017 08:56:31.136817       1 manager.go:102] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:kube-master: unable to fetch metrics from Kubelet kube-master (kube-master): Get http://kube-master:10255/stats/summary/: dial tcp: lookup kube-master on 10.254.0.2:53: no such host
E1017 08:57:01.047583       1 manager.go:102] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:kube-master: unable to fetch metrics from Kubelet kube-master (kube-master): Get http://kube-master:10255/stats/summary/: dial tcp: lookup kube-master on 10.254.0.2:53: no such host
E1017 08:57:31.049286       1 manager.go:102] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:kube-master: unable to fetch metrics from Kubelet kube-master (kube-master): Get http://kube-master:10255/stats/summary/: dial tcp: lookup kube-master on 10.254.0.2:53: no such host

@cdyue cdyue commented Oct 17, 2018

same problem

Contributor

@DirectXMan12 DirectXMan12 commented Nov 8, 2018

If you want to prioritize certain address types, please use the --kubelet-preferred-address-types flag
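
A minimal sketch of how a priority list like this can be applied to the addresses a node reports. This is an illustration only, not metrics-server's actual code: the function name pick_address and the sample data are hypothetical, and pairing k8sdev01 with the InternalIP 10.7.68.20 is an assumption based on the /etc/hosts entries quoted earlier in the thread.

```python
from typing import Dict, List, Optional


def pick_address(addresses: List[Dict[str, str]],
                 preferred_types: List[str]) -> Optional[str]:
    """Return the address whose type comes first in preferred_types.

    `addresses` mirrors the shape of a node's status.addresses list:
    each entry has a "type" and an "address" key.
    """
    for wanted in preferred_types:
        for entry in addresses:
            if entry["type"] == wanted:
                return entry["address"]
    # None of the preferred types is reported by this node.
    return None


# Assumed sample data for one node from this thread.
node_addresses = [
    {"type": "Hostname", "address": "k8sdev01"},
    {"type": "InternalIP", "address": "10.7.68.20"},
]

print(pick_address(node_addresses, ["InternalIP", "Hostname"]))  # prints 10.7.68.20
print(pick_address(node_addresses, ["ExternalIP"]))              # prints None
```

In this sketch, listing only a type the node does not report (ExternalIP here) leaves nothing to connect to, which is one possible reason a single-type setting appears to change nothing. As far as the logs in this thread show, the reststorage.go message names the node itself, so it would read the same whichever address type is selected.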

Contributor

@DirectXMan12 DirectXMan12 commented Nov 8, 2018

As for the certificate errors, it's probably due to a different kubelet CA from the main cluster CA. We've got an issue open to add a flag for supporting a setup where trusting the main cluster CA doesn't automatically trust the kubelet CA.


@fentas fentas commented Feb 20, 2019

I have exactly the same situation as @D-PENG describes.
Is there currently a workaround, or do I need to rename all my nodes?

Edit: never mind, I recreated the cluster.


@alfonmga alfonmga commented Mar 5, 2019

I solved this by adding --kubelet-insecure-tls and --kubelet-preferred-address-types=InternalIP flags.

If you installed metrics-server using Helm then you can apply this change by running kubectl edit deployment metrics-server -n <your-namespace>.

It should look like this:

[Image: screen shot 2019-03-05 at 1 10 03 am]


@qianliusi qianliusi commented Mar 28, 2019

I solved this by adding the node IPs to the CoreDNS config:
kubectl edit configmap coredns -n kube-system

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        hosts {
           192.168.199.100 master.qls.com
           192.168.199.220 node01.qls.com
           192.168.199.215 node02.qls.com
           fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
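
With more than a few nodes, the hosts stanza in this Corefile can be generated from a node list rather than typed by hand. A minimal sketch, assuming a plain mapping of IP address to hostname; the function name render_hosts_stanza is hypothetical, and the addresses are the ones from the ConfigMap above.

```python
from typing import Dict


def render_hosts_stanza(node_ips: Dict[str, str], indent: str = "    ") -> str:
    """Build a CoreDNS `hosts` stanza with one "IP hostname" line per node."""
    lines = ["hosts {"]
    for ip, hostname in node_ips.items():
        lines.append(f"{indent}{ip} {hostname}")
    # fallthrough hands names that are not listed here to the next plugin,
    # so ordinary cluster and upstream lookups keep working.
    lines.append(f"{indent}fallthrough")
    lines.append("}")
    return "\n".join(lines)


stanza = render_hosts_stanza({
    "192.168.199.100": "master.qls.com",
    "192.168.199.220": "node01.qls.com",
    "192.168.199.215": "node02.qls.com",
})
print(stanza)
```

The output still has to be pasted into the Corefile at the right indentation; the sketch only produces the stanza text.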


@cainzhong cainzhong commented May 10, 2019

I am facing the same issue with metrics server.
I tried --
command:

  • /metrics-server
  • --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
  • --kubelet-insecure-tls

But metrics-server is looking by hostname only:
E1009 05:37:26.888923 1 reststorage.go:129] unable to fetch node metrics for node "k8s-master.cluster.k8.local": no metrics known for node
E1009 05:37:26.889005 1 reststorage.go:129] unable to fetch node metrics for node "node03.cluster.k8.local": no metrics known for node
E1009 05:37:26.889022 1 reststorage.go:129] unable to fetch node metrics for node "node02.cluster.k8.local": no metrics known for node

command:
- /metrics-server
- --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
- --kubelet-insecure-tls

works for me.


@fejta-bot fejta-bot commented Aug 8, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale


@fejta-bot fejta-bot commented Sep 7, 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten


@wajditcb wajditcb commented Sep 25, 2019

Same problem here. Pod log error message:

unable to fetch node metrics for node "k8s-master-1": no metrics known for node


@benileo benileo commented Oct 1, 2019

I have the same issue. However, I'm able to get metrics for 1 node. I can confirm that the only node reporting metrics has the metrics-server running on it.

$ kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ip-10-50-10-149.ca-central-1.compute.internal   Ready    master   14h   v1.15.4
ip-10-50-10-203.ca-central-1.compute.internal   Ready    master   14h   v1.15.4
ip-10-50-10-243.ca-central-1.compute.internal   Ready    <none>   14h   v1.15.4
ip-10-50-20-157.ca-central-1.compute.internal   Ready    master   14h   v1.15.4
ip-10-50-20-170.ca-central-1.compute.internal   Ready    <none>   14h   v1.15.4

$ kubectl top nodes
NAME                                            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%     
ip-10-50-20-170.ca-central-1.compute.internal   141m         7%     1606Mi          86%         
ip-10-50-20-157.ca-central-1.compute.internal   <unknown>                           <unknown>               <unknown>               <unknown>               
ip-10-50-10-149.ca-central-1.compute.internal   <unknown>                           <unknown>               <unknown>               <unknown>               
ip-10-50-10-203.ca-central-1.compute.internal   <unknown>                           <unknown>               <unknown>               <unknown>               
ip-10-50-10-243.ca-central-1.compute.internal   <unknown>                           <unknown>               <unknown>               <unknown>               

$ ssh 10.50.20.170 'docker ps | grep k8s_metrics-server'
2fd50cb6148e        k8s.gcr.io/metrics-server-amd64                                "/metrics-server --m…"   8 minutes ago       Up 8 minutes                            k8s_metrics-server_metrics-server-67c6d6566d-8bzfx_kube-system_652f2ab3-139f-427f-8f98-e772d6e97a54_0

From logs:
unable to fetch node metrics for node "ip-10-50-20-157.ca-central-1.compute.internal": no metrics known for node
unable to fetch node metrics for node "ip-10-50-10-243.ca-central-1.compute.internal": no metrics known for node
unable to fetch node metrics for node "ip-10-50-10-149.ca-central-1.compute.internal": no metrics known for node
unable to fetch node metrics for node "ip-10-50-10-203.ca-central-1.compute.internal": no metrics known for node


@fejta-bot fejta-bot commented Oct 31, 2019

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Contributor

@k8s-ci-robot k8s-ci-robot commented Oct 31, 2019

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mritd added a commit to mritd/kubeadm-config that referenced this issue Feb 15, 2020
fix "no metrics known for node" error

refs kubernetes-sigs/metrics-server#143

Signed-off-by: mritd <mritd@linux.com>