Switch kubelet scraping to http and port 10255 #2613

Closed
JorritSalverda opened this Issue Apr 12, 2017 · 15 comments

Contributor

JorritSalverda commented Apr 12, 2017

Since Kubernetes 1.6, the encrypted kubelet metrics endpoint on port 10250 requires authorization, which makes it unusable for unauthenticated scrapes; see #2606 and kubernetes/kubernetes#11816 (comment). This was done to tighten security for communication with the kubelet.

According to the Kubernetes ticket, the metrics endpoint will remain available over HTTP on port 10255.

A viable workaround is to rewrite the endpoint in relabel_configs as shown below:

      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__address__]
        action: replace
        target_label: __address__
        regex: ([^:;]+):(\d+)
        replacement: ${1}:10255
      - source_labels: [__scheme__]
        action: replace
        target_label: __scheme__
        regex: https
        replacement: http

It would be nice to switch to port 10255 in the Prometheus code itself, though. Unfortunately, the nodeSpec currently doesn't expose that port anywhere, so until it does, the workaround works fine. Perhaps add it to your Kubernetes config example?

Member

juliusv commented Apr 12, 2017

Thanks for pointing this out. A new comment, kubernetes/kubernetes#11816 (comment), suggests that we shouldn't count on the unsecured port 10255 endpoint being around in the future.

Prometheus does support auth for scrape requests though. Does this part of the example Kubernetes config not work as advertised? https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L56-L71 (it should set the right CA and bearer token from the automounted /var/run/secrets/kubernetes.io/serviceaccount/... files).

Member

juliusv commented Apr 12, 2017

Note that in this example file, that section is missing from the node (kubelet) discovery section (https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L73-L78), so maybe it just needs to be added there too?
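
As a rough sketch of that suggestion, the node job could carry the same TLS and bearer-token settings as the other jobs. The job name and the relabel rule are assumptions for illustration; only the two service account file paths come from the discussion above. Whether the kubelet accepts the token depends on how kubelet authentication is configured in the cluster.

```yaml
scrape_configs:
# Hypothetical node (kubelet) job; the name is an assumption.
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    # CA automounted into the pod from the service account.
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  # Token automounted into the pod from the service account.
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Copy Kubernetes node labels onto the scraped series.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
```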

hvalle commented Apr 12, 2017

@juliusv The problem I've seen is that, since 1.6, the kubelet no longer allows anonymous authentication by default (--anonymous-auth=false).

That means that even after adding the TLS config and bearer token, it won't authenticate.

Member

juliusv commented Apr 12, 2017

@hvalle I'm not a k8s auth expert, but if you create the right ClusterRole and binding with access to the /metrics resource, and you run Prometheus under a service account with that role, it should work?

Check out these example configs for the cluster role, cluster role binding, and service account: https://github.com/coreos/prometheus-operator/tree/master/example/rbac/prometheus

hvalle commented Apr 12, 2017

Thanks @juliusv, all this RBAC is new to me since 1.6. I will try to get it working with that ;)

Member

juliusv commented Apr 12, 2017

Yeah, I generally recommend reading https://kubernetes.io/docs/admin/authorization/rbac/ in full.

There is also a way you can revert to a fully permissive mode again by doing this:

kubectl create clusterrolebinding permissive-binding \
  --clusterrole=cluster-admin \
  --user=admin \
  --user=kubelet \
  --group=system:serviceaccounts

Then things should just start working again as in 1.5, but be aware that they will be just as unsecured as in 1.5 :)

@JorritSalverda do all the things described above solve your problem as well?

Member

brancz commented Apr 12, 2017

RBAC is not required for scraping kubelet metrics. When scraping the kubelet from the secure port you need to use a client certificate to authenticate against it - no RBAC involved. RBAC is only important for scraping the apiserver, in which case the manifests @juliusv pointed to are helpful. There is also extensive documentation on what every single role is necessary for.

Regarding the kubelet though, you need to specify the client certificate and key in the TLS config of the kubernetes_sd_config. We have done this for CoreOS Tectonic, and I can assure you that it works once properly configured.
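
A minimal sketch of such a client-certificate setup follows, assuming the certificate, key, and CA have been mounted into the Prometheus pod from a secret. The job name and all three file paths are hypothetical. Note that in a Prometheus config the job-level `tls_config` applies to the scrape requests themselves, while a `tls_config` nested under `kubernetes_sd_configs` only applies to the discovery calls to the API server, so the sketch places the client certificate at the job level.

```yaml
scrape_configs:
# Hypothetical kubelet job on the secure port; the name is an assumption.
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    # Hypothetical mount paths for a secret holding the kubelet client credentials.
    ca_file: /etc/prometheus/secrets/kubelet-client/ca.crt
    cert_file: /etc/prometheus/secrets/kubelet-client/client.crt
    key_file: /etc/prometheus/secrets/kubelet-client/client.key
  kubernetes_sd_configs:
  - role: node
```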

Member

juliusv commented Apr 12, 2017

@brancz Thank you for the clarification! So we should probably update the node discovery example in https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml to include a working TLS config with client cert?

Member

brancz commented Apr 12, 2017

As I pointed out in #2606, we should in addition to that also document the required RBAC roles here.

Unfortunately, obtaining the client cert depends on how one brought up the Kubernetes cluster. In a self-hosted cluster, there is definitely already a secret that contains this data, as the apiserver needs the certificate to talk to the kubelets as well. That means we would have to provide documentation for a couple of cluster types, which is fine in general, but certainly time-consuming.

Do you think we should just be documenting this in the prometheus/prometheus repo or put some Kubernetes related docs on prometheus.io @juliusv ?

I would also offer to use the Prometheus Operator documentation which is hosted on coreos.com, but it is somewhat specific to running Prometheus with the Prometheus Operator, so in reality slightly different from running it without the Prometheus Operator.

Member

juliusv commented Apr 12, 2017

Eek, ok!

> Do you think we should just be documenting this in the prometheus/prometheus repo or put some Kubernetes related docs on prometheus.io?

If the optionality doesn't get too crazy, I would recommend extending the example config. If that gets too complex, a section somewhere on prometheus.io (in the k8s SD docs or linked from there) would become necessary...

Contributor Author

JorritSalverda commented Apr 12, 2017

@juliusv with help from @brancz in issue #2606 I already tried the RBAC route, but so far unsuccessfully. The unencrypted port provides a workaround for now. But it seems I have to get RBAC to work one way or the other :)

The ClusterRole as specified in https://github.com/coreos/prometheus-operator/blob/6689f72b6950bb4a697f865e0ee83a161cf55f94/example/rbac/prometheus/prometheus-cluster-role.yaml doesn't work for me, even though I switched to apiVersion: rbac.authorization.k8s.io/v1beta1.

To debug further, I need to figure out how to see the actual permissions associated with my mounted token. Running it through https://jwt.io/ shows me some basic stuff, but not the actual permissions, probably because they're not stored in the token in the first place.

I'll pick it up with Google support, because it's specific to GKE. I'll post my solution once I get it to work - without http workarounds :) - but will close the issue, because if that route works, the http endpoint should definitely not be used.

Contributor Author

JorritSalverda commented Apr 12, 2017

According to the comment kubernetes/kubernetes#11816 (comment), the ClusterRole needs access to nodes/metrics.

Right now I'm having trouble creating that ClusterRole, though, because it attempts to grant more privileges than I have. I'm picking this up with Google support. More when this succeeds :)
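
For illustration, a ClusterRole granting access to nodes/metrics, bound to a service account, might look roughly like the sketch below. The object names and the `monitoring` namespace are hypothetical, and the rule set is an assumption rather than a tested minimum; the apiVersion is the v1beta1 one mentioned earlier in this thread.

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus            # hypothetical name
rules:
# Node discovery plus read access to the kubelet metrics subresource.
- apiGroups: [""]
  resources: ["nodes", "nodes/metrics"]
  verbs: ["get", "list", "watch"]
# Non-resource /metrics endpoint.
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus            # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus            # hypothetical service account
  namespace: monitoring       # hypothetical namespace
```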

Contributor Author

JorritSalverda commented Apr 18, 2017

See #2606 (comment) for how to scrape the kubelet metrics via the API node proxy.
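
The linked comment is not reproduced here. As a hedged sketch of the general approach, scraping through the API node proxy usually means pointing every node target at the API server and rewriting the metrics path per node. The job name and the `kubernetes.default.svc:443` address are assumptions, and the service account would typically also need permission to `get` the `nodes/proxy` resource.

```yaml
scrape_configs:
# Hypothetical job that reaches kubelet metrics via the API server proxy.
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Send every scrape to the API server instead of the kubelet.
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # Route the request to the node's kubelet through the proxy subresource.
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
```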

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
