After GKE 1.6 upgrade kubernetes nodes metrics endpoint returns 401 #2606
Comments
Your kubelet needs client certs for authorization. Not sure where you can get those on GKE, but you need to insert them in your
Also note that since RBAC is on by default in 1.6, you need to make sure that the
Come to think about it, it would probably be a good idea to include canonical
The token and ca.crt are present in the prometheus container, so apparently the default token no longer has access to the node metrics. This person tried various things with ClusterRole without any effect: http://serverfault.com/questions/843751/kubernetes-node-metrics-endpoint-returns-401/ What would it need to look like?
There are two separate problems: scraping the kubelet requires the client certificates the apiserver uses to communicate with it, and an RBAC role so Prometheus is authorized to access the
In CoreOS Tectonic we have a separate secret that contains the certificates the apiserver uses to communicate with the kubelet. We mount that secret into the Prometheus container for Prometheus to be able to authenticate with the kubelet and scrape it. No RBAC is required for this part. The
Thanks, I'll work on setting up the ServiceAccount, ClusterRole and ClusterRoleBinding. For the untrusted kubelet certificate I still resort to 'insecure_skip_verify: true', but will have a look at a solution similar to what you use. I'll post the extra manifests I needed to this ticket once it's up and running and will close it afterwards.
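A minimal sketch of the ServiceAccount and ClusterRoleBinding half of that setup, assuming Prometheus runs in the `default` namespace and that a ClusterRole named `prometheus` is defined separately. The names and the namespace are placeholders for illustration, not taken from this thread.

```yaml
# Hypothetical names: "prometheus" and the "default" namespace are assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
# Binds the ServiceAccount above to a ClusterRole called "prometheus",
# which has to be defined separately.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
```

The Prometheus pod would then reference the account through `serviceAccountName: prometheus` in its pod spec, so that the mounted token carries the bound permissions.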
This is still an unsolved problem for us as well, as the kubelets still create their TLS certs on the fly (therefore self-signed), only in addition requiring client certs. |
I think we can close here as this is mainly an operational issue. If anything else comes up, please take it to the prometheus users mailing list, as it has better visibility than here and other users can better benefit from it. Thanks!
brancz closed this Apr 11, 2017
JorritSalverda referenced this issue Apr 11, 2017: GKE 1.6 kubelet /metrics endpoint unauthorized over https #44330 (Closed)
It turned out that for some reason the kubelet is no longer accessible over https on port 10250. Changing the scrape address to use http and port 10255 provides an acceptable workaround for now:
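A sketch of what such a change could look like in the `kubernetes-nodes` job, assuming node discovery yields addresses on port 10250 and that the kubelet's read-only port 10255 is enabled on the nodes. This is an illustration of the approach, not the snippet originally posted.

```yaml
- job_name: 'kubernetes-nodes'
  # Plain HTTP: the read-only port serves no TLS and takes no credentials.
  scheme: http
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Copy Kubernetes node labels onto the scraped series.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Rewrite the discovered <host>:10250 address to <host>:10255.
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__
```

Since the read-only port is unauthenticated, this only fits clusters where that port is reachable from Prometheus and exposing it is acceptable.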
JorritSalverda referenced this issue Apr 12, 2017: Switch kubelet scraping to http and port 10255 #2613 (Closed)
I'd like to have this reopened, as according to comment kubernetes/kubernetes#44330 (comment) GKE is not going to support RBAC-based authentication for the kubelet while closing off anonymous access. It either needs to be proxied via the apiserver or use a client-side certificate created by the certificates API.
How does "hitting the apiserver proxy" work? What URLs need to be hit?
matthiasr reopened this Apr 14, 2017
I don't think there is anything we can fundamentally do if the kubelet doesn't expose the metrics in a usable form, but if there is a way we should document it.
@JorritSalverda maybe you can use relabeling to bend the discovered nodes into being scraped through the apiserver? Similar to how the blackbox and SNMP exporters are addressed.
mikedanese commented Apr 14, 2017
You can access node metrics by hitting the kubernetes master, e.g.:
Or you can use TLS client auth
Thanks @mikedanese, that actually works. Now I have to figure out how to get the master IP during node relabeling, and then I'm set for now and the future. @matthiasr do you have an idea for this? The documentation only shows the following meta labels to be available for the node role:
mikedanese commented Apr 18, 2017
Do you need the actual IP or can you use the 'kubernetes' service in the default namespace?
yuvaldrori commented Apr 18, 2017
This worked for me:
For my Prometheus server running inside GKE I now have it running with the following relabeling:
And the following ClusterRole bound to the service account used by Prometheus:
Because the GKE cluster still has an ABAC fallback in case RBAC fails, I'm not 100% sure yet this covers all required permissions.
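The proxied setup discussed in this thread can be sketched roughly as follows, assuming Prometheus runs inside the cluster, authenticates with its mounted service account token, and reaches the apiserver at `kubernetes.default.svc:443`. The address and file paths are assumptions for illustration, not the exact configuration that was posted here.

```yaml
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    # CA bundle mounted into the pod; used to verify the apiserver certificate.
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  # Service account token presented to the apiserver.
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Copy Kubernetes node labels onto the scraped series.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Send every node scrape to the apiserver instead of the kubelet.
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # Have the apiserver proxy the request to the node's /metrics endpoint.
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
```

With this shape Prometheus never talks to the kubelet directly, so the kubelet's client certificate requirement no longer applies; authorization is handled by RBAC on the apiserver instead.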
I don't think we have a general way to get the master IP, but configuring the Kubernetes discovery requires it anyway. The example config assumes Prometheus running in the cluster, so in that case the server address can just be
@JorritSalverda @yuvaldrori would you mind opening a PR against the example configuration to add this for the node job, and a comment mentioning the necessary role? Add the role as a separate file in the same directory. (At some point maybe we should reorganise the example directory to group kubernetes stuff, but that will break inbound links, so let's not tie it in with this.)
yuvaldrori commented Apr 19, 2017
@matthiasr some points to consider before I open a PR:
yuvaldrori commented Apr 19, 2017
Sure, I'll work on the PR and leave this ticket open until that's fixed and merged.
PR is at #2641
bviolier commented Apr 22, 2017
I ran into this issue and tried to resolve it by applying the rbac-setup.yaml as IAM user on kubectl (which has all rights, by default, right?), but I got the error:
Am I doing anything wrong?
@bviolier as a safety measure you're only allowed to hand out permissions your own account already has. In Google Cloud IAM you should assign yourself the Container Engine Cluster Admin role, which should allow you to apply this manifest. It seems to take a while before this is effective though. A faster way that worked for me is to add a ClusterRoleBinding for your own user to assign yourself the cluster-admin role using the following manifest:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: additional-cluster-admins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: User
  name: <your email address as used in GCE>
discostur commented Apr 27, 2017
I just deployed a new k8s 1.6 cluster (on premise) with prometheus and the following RBAC rules:
My API-server and custom endpoints get scraped, but prometheus is not able to scrape the node endpoints. It still gets the following error:
Can you tell me what rules I have to add to my RBAC? In the log output I can't see any errors ...
mikedanese commented Apr 27, 2017
You can't hit the nodes directly. You need to use the config here:
mikedanese commented Apr 27, 2017
Also see the PR that updates the documentation:
discostur commented Apr 27, 2017
@mikedanese after changing the relabel config as you said I still get the error:
Did I miss something?
discostur commented Apr 27, 2017
Forgot the
rule in the cluster role, now it is working! Thanks @mikedanese for helping me out!
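For reference, a ClusterRole along these lines is a sketch of what scraping nodes through the apiserver proxy generally needs, assuming the role is named `prometheus`. The `nodes/proxy` resource is what authorizes requests to `/api/v1/nodes/<name>/proxy/metrics`; the other resources are assumptions covering service discovery for the remaining roles, and the name is a placeholder.

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus   # hypothetical name
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy     # needed for scrapes routed through the apiserver proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
# Allows scraping the apiserver's own /metrics endpoint.
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
```

A role like this only takes effect once a ClusterRoleBinding ties it to the service account Prometheus runs under.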
kujenga added a commit to kujenga/charts that referenced this issue May 15, 2017
viglesiasce added a commit to helm/charts that referenced this issue Jul 6, 2017
I believe this is all resolved now.
brian-brazil closed this Jul 14, 2017
yanns pushed a commit to yanns/charts that referenced this issue Jul 28, 2017
zatricky referenced this issue Aug 31, 2017: Updates for 1.7 cadvisor changes and rbac requirements #73 (Merged)
TinySong added a commit to TinySong/kubernete-mainifest that referenced this issue Sep 9, 2017
songbinliu referenced this issue Nov 8, 2017: [feature] support google container engine -- GKE #120 (Closed)
champtar referenced this issue Jan 17, 2018: Query kubelet via proxy / metricRelabelings issue #902 (Closed)
deepti-cloudibility commented Feb 11, 2019
Hi, how did you change the port from 10250 to 10255? For me it's not working on 10255, but when I curl ip:10250 it gives me output.
JorritSalverda commented Apr 11, 2017 (edited)
What did you do?
After upgrading a GKE cluster - both master and nodes - to 1.6.0 the job_name: 'kubernetes-nodes' as specified in the k8s configuration example results in all the node /metrics endpoints returning
What did you expect to see?
The node /metrics endpoints to be scraped as before upgrading to 1.6.0 (previous version was 1.5.6).
What did you see instead? Under which circumstances?
All the endpoints for kubernetes-nodes as down with the server returned HTTP status 401 Unauthorized error.
Environment