Metrics server fails to pull metrics from Windows workers #91575
Labels: kind/bug, sig/windows

/sig windows
What happened:
I have a Kubernetes cluster with 2 Windows worker nodes. When I run
kubectl top nodes
the Windows nodes report as unknown. I did some investigating and I am seeing errors in the logs.

What you expected to happen:
When I do
kubectl top nodes
or
kubectl top pods
the Windows workers are included.

How to reproduce it (as minimally and precisely as possible):
No idea. One Windows node was reporting metrics yesterday, but today neither is reporting metrics.
Anything else we need to know?:
Environment:
Kubernetes version (kubectl version): 1.17.5
Cloud provider or hardware configuration: Rancher-managed cluster of VMs on vSphere. 2 CentOS 7 managers, 2 CentOS 7 workers, 2 Windows Server 2019 1809 workers
OS (cat /etc/os-release): CentOS Linux 7 (Core)
Kernel (uname -a): 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Logs from metrics server:
I saw an error trying to fetch the metrics, so from the node running the metrics-server pod (
qa-k8sm-02
) I ran
curl -v -k https://10.4.111.68:10250/stats/summary?only_cpu_and_memory=true
The connection to the server worked, so I looked closer at the error: it was a 500. That made me think there was an issue with the pod on the Windows server, so I viewed the logs of
cattle-node-agent-windows-gkrf9
I didn't see any errors there, so I logged into the Windows server and started looking at container logs. When I viewed the kubelet's logs I saw a bunch of errors.
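For reference, the stats/summary probe above can be sketched as a scripted status check. The URL and the 500 status are the ones from this report; the live curl call is shown commented out and replaced with a stub so the logic is self-contained, and TOKEN stands for a kubelet-authorized bearer token (not something from this report).

```shell
# Live probe (hypothetical token; -k skips certificate verification):
# status=$(curl -sk -o /dev/null -w '%{http_code}' \
#   "https://10.4.111.68:10250/stats/summary?only_cpu_and_memory=true" \
#   -H "Authorization: Bearer $TOKEN")
status=500   # stubbed with the status observed in this report

# A 5xx here means the kubelet itself failed to build the summary,
# which points at the node rather than at metrics-server.
if [ "$status" -ge 500 ]; then
  echo "stats/summary returned HTTP $status - check the node's kubelet logs"
else
  echo "stats/summary returned HTTP $status"
fi
```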
So the underlying cause appears to be
A virtual machine or container with the specified identifier does not exist
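To check whether another node is hitting the same error, grepping its kubelet logs for the quoted string is enough. A minimal sketch, assuming the kubelet log has been exported to a file; the sample log line below is an illustrative stand-in containing the error quoted in this report, not verbatim kubelet output.

```shell
# Illustrative stand-in for an exported kubelet log (only the quoted
# error string is from this report; the rest of the line is hypothetical).
cat > kubelet-sample.log <<'EOF'
E0528 failed to get container stats: A virtual machine or container with the specified identifier does not exist
EOF

# A non-zero count means this node is affected.
grep -c "specified identifier does not exist" kubelet-sample.log
```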