
couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request #157

Closed
abizake opened this issue Oct 12, 2018 · 30 comments
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@abizake

abizake commented Oct 12, 2018

API Server Logs :-

1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1012 08:23:25.282353 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I1012 08:23:25.282377 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E1012 08:23:25.396126 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:23:25.991550 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:23:46.469237 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:23:55.440941 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:23:55.789103 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:24:25.477704 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:24:25.705399 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:24:55.516394 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:24:55.719712 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1012 08:25:13.395961 1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.105.54.184:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I1012 08:25:25.282682 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1012 08:25:25.282944 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[X-Content-Type-Options:[nosniff] Content-Type:[text/plain; charset=utf-8]]
I1012 08:25:25.282969 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E1012 08:25:25.563266 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request

Controller Logs :-
E1012 08:26:57.910695 1 memcache.go:134] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1012 08:27:13.214427 1 resource_quota_controller.go:430] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W1012 08:27:17.126343 1 garbagecollector.go:647] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]

Metric Server Logs :-

I1012 08:22:11.248135 1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2018/10/12 08:22:12 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2018/10/12 08:22:12 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I1012 08:22:12.537437 1 serve.go:96] Serving securely on [::]:443

Kubernetes Version :- 1.12.1

Metric Server Deployment YAML :-

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        imagePullPolicy: Always
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp

Any help is appreciated.

@cdenneen

cdenneen commented Nov 2, 2018

Check that your ControlPlane can reach your DataPlane on 443 (I had to modify SG for both to allow this and it worked)
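For anyone hitting this on AWS, a rough sketch of how you might check and fix it (the security group IDs below are placeholders, not values from this thread):

# Is the aggregated metrics API reachable from the apiserver?
kubectl get apiservice v1beta1.metrics.k8s.io

# If it reports Available=False, allow 443 from the control-plane security group to the node security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-NODES \
  --protocol tcp --port 443 \
  --source-group sg-CONTROL-PLANE

Once the rule is in place, the APIService should flip to Available=True shortly afterwards.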

@ag237

ag237 commented Dec 6, 2018

We are also seeing this issue on Kubernetes version 1.10.11, metrics-server v0.3.1.

The error doesn't occur all the time, but seemingly randomly.

HPA is also not working:

Warning FailedGetResourceMetric 12s (x200 over 100m) horizontal-pod-autoscaler unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)

Seeing a lot of these errors in the metrics-server logs:

I1206 21:52:20.330969 1 round_trippers.go:386] curl -k -v -XPOST -H "User-Agent: metrics-server/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Accept: application/json, */*" -H "Content-Type: application/json" -H "Authorization: Bearer 8493204" 'https://100.64.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews'
I1206 21:52:20.336659 1 round_trippers.go:405] POST https://100.64.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews 201 Created in 5 milliseconds
I1206 21:52:20.336730 1 round_trippers.go:411] Response Headers:
I1206 21:52:20.336753 1 round_trippers.go:414] Content-Type: application/json
I1206 21:52:20.336823 1 round_trippers.go:414] Content-Length: 260
I1206 21:52:20.336850 1 round_trippers.go:414] Date: Thu, 06 Dec 2018 21:52:20 GMT
I1206 21:52:20.336924 1 request.go:897] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"nonResourceAttributes":{"path":"/","verb":"get"},"user":"system:anonymous","group":["system:unauthenticated"]},"status":{"allowed":false}}
I1206 21:52:20.337051 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.337169 1 wrap.go:42] GET /: (6.608284ms) 403 [[Go-http-client/2.0] 10.150.238.46:34472]
I1206 21:52:20.342685 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.342881 1 wrap.go:42] GET /: (283.678µs) 403 [[Go-http-client/2.0] 100.103.86.128:53412]
I1206 21:52:20.348443 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.348594 1 wrap.go:42] GET /: (211.166µs) 403 [[Go-http-client/2.0] 10.150.238.46:34472]
I1206 21:52:20.353395 1 authorization.go:73] Forbidden: "/", Reason: ""
I1206 21:52:20.353521 1 wrap.go:42] GET /: (225.997µs) 403 [[Go-http-client/2.0] 10.150.238.46:34472]

And around the time the 'unable to handle the request' error gets thrown, we see this in the API server logs:

{"timestamp":1544123089981,"log":"E1206 19:04:49.196449 1 available_controller.go:295] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io \"v1beta1.metrics.k8s.io\": the object has been modified; please apply your changes to the latest version and try again","stream":"stdout","time":"2018-12-06T19:04:49.196573682Z","docker":{"container_id":"193297c980e6dd2380e420d23c023d20c80422ef8fa1ca1d26d23c64c13cbc42"},"kubernetes":{"container_name":"kube-apiserver","namespace_name":"kube-system","pod_name":"kube-apiserver-ip-.ec2.internal","pod_id":"4d0c7c67-f971-11e8-87f5-0edeec0b08fa","labels":{"k8s-app":"kube-apiserver"},"host":"","master_url":"https://100.64.0.1:443/api","namespace_id":"1598f027-deb5-11e8-8c15-020b58ac630a"}}

@ysolis

ysolis commented Dec 9, 2018

I had this problem. In my case I am using Kops 1.10 with a gossip-based cluster; I added two lines to my deploy/1.8+/metrics-server-deployment.yaml file:

      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1
        imagePullPolicy: Always
        # changed to use kubelet unsecure tls and internal ip
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp

and after this, kubectl top... worked after 5 minutes

@ag237

ag237 commented Feb 6, 2019

Coming back around to this, I am still seeing these errors with metrics-server.

Here is my config:

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T19:44:19Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.6", GitCommit:"b1d75deca493a24a2f87eb1efde1a569e52fc8d9", GitTreeState:"clean", BuildDate:"2018-12-16T04:30:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
spec:
  containers:
  - command:
    - /metrics-server
    - --kubelet-insecure-tls
    - --kubelet-preferred-address-types=InternalIP
    image: gcr.io/google_containers/metrics-server-amd64:v0.3.1

If I repeatedly run kubectl top node I will usually get a response; however, it will randomly fail with this error:

I0206 10:58:13.064778   40360 helpers.go:198] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "the server is currently unable to handle the request (get nodes.metrics.k8s.io)",
  "reason": "ServiceUnavailable",
  "details": {
    "group": "metrics.k8s.io",
    "kind": "nodes",
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "service unavailable"
      }
    ]
  },
  "code": 503
}]
F0206 10:58:13.064830   40360 helpers.go:116] Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

I'm seeing this in the apiserver logs:

kube-apiserver-ip-1-1-1-1.ec2.internal:kube-apiserver E0206 15:57:32.697886       1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://1.1.1.1:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
kube-apiserver-ip-1-1-1-1.ec2.internal:kube-apiserver E0206 15:57:32.729065       1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io": the object has been modified; please apply your changes to the latest version and try again

@abizake
Author

abizake commented Apr 17, 2019

The issue still exists in v1.13.3.

@ag237

ag237 commented Apr 17, 2019

As an update to this, all my issues with metrics-server went away after I set

hostNetwork:
  enabled: true

in the stable helm chart

https://github.com/helm/charts/tree/master/stable/metrics-server
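For reference, a hedged sketch of setting that value from the command line with the stable chart (release name and namespace here are just examples):

helm upgrade --install metrics-server stable/metrics-server \
  --namespace kube-system \
  --set hostNetwork.enabled=true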

@abizake
Author

abizake commented Apr 17, 2019

@ag237 Thanks for sharing this. Any idea why this got fixed when you enabled host networking?

@luckymagic7

Did you solve the problem? I have the same issue.

kops: Version 1.11.1 (git-0f2aa8d30)
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.9", GitCommit:"16236ce91790d4c75b79f6ce96841db1c843e7d2", GitTreeState:"clean", BuildDate:"2019-03-25T06:30:48Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

The master node's api-server log says: OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
and: v1beta1.metrics.k8s.io failed with: Get https://$CLUSTER-IP:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers).
My master nodes' SG allows port 443 access from everywhere.

Any ideas?

@ghost

ghost commented Jul 10, 2019

@abizake I think you are also unable to reach pods on the other nodes.
If so, ensure UDP ports 8285 and 8472 ( For Flannel ) are open on all nodes.
Ref: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
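If your nodes happen to run firewalld, a rough sketch of opening those ports (adapt to whatever firewall or security groups you actually use):

# Open the Flannel backend ports (UDP) on every node
sudo firewall-cmd --permanent --add-port=8285/udp --add-port=8472/udp
sudo firewall-cmd --reload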

@abizake
Author

abizake commented Jul 12, 2019

@abizake I think you are also unable to reach pods on the other nodes.
If so, ensure UDP ports 8285 and 8472 ( For Flannel ) are open on all nodes.
Ref: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

Regarding the step mentioned above, I am actually using Calico. The relevant Calico ports are open and pods on other nodes are reachable.

@Constantin07

Having the same issue with Kubernetes 1.15.2 on Ubuntu 18.04 nodes.
Using Calico as SDN.

kubectl logs metrics-server-ddd54b5c5-mxxb7
I0812 20:56:27.161337       1 serving.go:273] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0812 20:56:32.059998       1 manager.go:95] Scraping metrics from 0 sources
I0812 20:56:32.060020       1 manager.go:150] ScrapeMetrics: time: 1.003µs, nodes: 0, pods: 0
[restful] 2019/08/12 20:56:32 log.go:33: [restful/swagger] listing is available at https://:8443/swaggerapi
[restful] 2019/08/12 20:56:32 log.go:33: [restful/swagger] https://:8443/swaggerui/ is mapped to folder /swagger-ui/
I0812 20:56:32.467070       1 serve.go:96] Serving securely on [::]:8443
I0812 20:57:32.060888       1 manager.go:95] Scraping metrics from 4 sources
I0812 20:57:32.063876       1 manager.go:120] Querying source: kubelet_summary:master-node2.internal
I0812 20:57:32.068071       1 manager.go:120] Querying source: kubelet_summary:worker-node1.internal
I0812 20:57:32.068425       1 manager.go:120] Querying source: kubelet_summary:worker-node2.internal
I0812 20:57:32.088240       1 manager.go:120] Querying source: kubelet_summary:master-node1.internal
I0812 20:57:32.256135       1 manager.go:150] ScrapeMetrics: time: 195.03251ms, nodes: 4, pods: 23
kubectl top node
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

@AlexRRR

AlexRRR commented Sep 10, 2019

I can confirm #157 (comment) helps

I'm not using the helm chart; I added hostNetwork: true to the manifest under spec/template/spec and now it is working (see the sketch below).

Also I am using the flags

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP    
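A hedged sketch of making the same change without editing the manifest by hand, assuming the deployment is named metrics-server in kube-system:

# Switch the metrics-server pod to host networking via a strategic merge patch
kubectl -n kube-system patch deployment metrics-server \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'

If the pod needs cluster DNS while on the host network, you may also have to set dnsPolicy: ClusterFirstWithHostNet, but that goes beyond what was reported here.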

@edsonmarquezani

edsonmarquezani commented Sep 25, 2019

I've been having several problems in the cluster because of this, including HPA. #157 (comment) seems to nail it, indeed, but I'm still wondering what's the actual problem. Setting hostNetwork=true shouldn't be necessary at all.

@Lincoln-dac

I added hostNetwork: true, but my problem is not fixed; the apiserver still reports: kube-controller-manager: E1011 13:37:24.015616 33182 resource_quota_controller.go:407] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request

@serathius
Contributor

I don't think metrics-server was meant to run in the host network. I think it's a problem with the particular overlay network, but that's not my area of expertise.

Metrics Server uses https://github.com/kubernetes/kube-aggregator to register with the apiserver; maybe you could find answers there?

Still, it would be useful to document how Metrics Server provides the Metrics API and what requirements it places on the network.

@AshishThakur

The comment in values.yaml (https://github.com/helm/charts/blob/master/stable/metrics-server/values.yaml) mentions that this might be required if you use the Weave network on EKS. We faced a similar problem on EKS using the AWS CNI, and this setting seems to fix it. I believe this is more of a band-aid solution and the root cause is somewhere else.

hostNetwork:
  # Specifies if metrics-server should be started in hostNetwork mode.
  #
  # You would require this enabled if you use alternate overlay networking for pods and
  # API server unable to communicate with metrics-server. As an example, this is required
  # if you use Weave network on EKS
  enabled: false

@ctran

ctran commented Dec 6, 2019

Check that your ControlPlane can reach your DataPlane on 443 (I had to modify SG for both to allow this and it worked)

Thanks, this is gold!!!

@serathius added the kind/bug (Categorizes issue or PR as related to a bug.) label on Dec 12, 2019
@mojiewhy

mojiewhy commented Jan 5, 2020

Check that your ControlPlane can reach your DataPlane on 443 (I had to modify SG for both to allow this and it worked)

Thanks, this is gold!!!

What is SG?

@seh

seh commented Jan 5, 2020

Probably "security group," in the context of AWS EC2.

@Vishal2696

Check that your ControlPlane can reach your DataPlane on 443 (I had to modify SG for both to allow this and it worked)

How to check this? I have my cluster hosted on Azure AKS

@serathius added the kind/support (Categorizes issue or PR as a support question.) label and removed the kind/bug (Categorizes issue or PR as related to a bug.) label on Feb 7, 2020
@serathius
Contributor

Closing per Kubernetes issue triage policy

GitHub is not the right place for support requests.
If you're looking for help, check Stack Overflow and the troubleshooting guide.
You can also post your question on the Kubernetes Slack or the Discuss Kubernetes forum.
If the matter is security related, please disclose it privately via https://kubernetes.io/security/.

@lampnick

As an update to this, all my issues with metrics-server went away after I set

hostNetwork:
  enabled: true

in the stable helm chart

https://github.com/helm/charts/tree/master/stable/metrics-server

It works, thanks!

@philippefutureboy

philippefutureboy commented Jan 31, 2023

Note that if you are using GKE (Google Kubernetes Engine) and your cluster has been without workloads for a long time (multiple days), GKE decommissions the nodes in the cluster (to save you costs). Without nodes, the pods behind this API (such as metrics-server) cannot be scheduled. So if that's your case, all is good!
Just run an image or deploy a deployment and everything should start working as usual :)
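If you need something quick to schedule so GKE brings a node back, a hedged one-liner (the deployment name and image are arbitrary):

# Create a throwaway workload so the autoscaler provisions a node again
kubectl create deployment wakeup --image=nginx
# Once nodes are back and kubectl top works, clean it up
kubectl delete deployment wakeup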

@solomonshorser

@philippefutureboy I'm having this problem in GKE, and yes, my cluster was idle, but I've run two DAGs over the last hour and still it does not work. Is there any other way to revive it?

@philippefutureboy

No, unfortunately the issue has started persisting even after spinning up new pods on my side as well 😕

@solomonshorser

No, unfortunately the issue has started persisting even after spinning up new pods on my side as well 😕

Oh. I'm trying to delete a namespace and it can't be deleted because of

'Discovery failed for some groups, 1 failing: unable to retrieve the
complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently
unable to handle the request'

@solomonshorser

Ah, there's a GKE troubleshooting guide here: https://cloud.google.com/kubernetes-engine/docs/troubleshooting#namespace_stuck_in_terminating_state
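For anyone else stuck there, the gist of that guide is to find the aggregated API the apiserver can no longer reach and remove it if it's no longer needed; a rough sketch:

# Look for entries with Available=False
kubectl get apiservice

# If v1beta1.metrics.k8s.io is the failing one and metrics-server is gone, delete it
kubectl delete apiservice v1beta1.metrics.k8s.io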

@Kipkemoii

As an update to this, all my issues with metrics-server went away after I set

hostNetwork:
  enabled: true

in the stable helm chart

https://github.com/helm/charts/tree/master/stable/metrics-server

Thanks for sharing this. It worked for me

@paolo-depa

Same issue: turning off the firewall worked for me (...yeah, quite overkill, but I have no time for fine-tuning right now...)

@mhemken-vts

mhemken-vts commented Aug 23, 2023

My solution was this:

❯ kubectl delete -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

I did not have the metrics server installed, nor did I need it. At some point somebody installed it and uninstalled it. But the uninstallation was not complete. We had these lingering resources:

clusterrole.rbac.authorization.k8s.io "system:aggregated-metrics-reader" deleted
clusterrole.rbac.authorization.k8s.io "system:metrics-server" deleted
clusterrolebinding.rbac.authorization.k8s.io "metrics-server:system:auth-delegator" deleted
clusterrolebinding.rbac.authorization.k8s.io "system:metrics-server" deleted
apiservice.apiregistration.k8s.io "v1beta1.metrics.k8s.io" deleted
