
Kubernetes SD fails with x509 name mismatch #1822

Closed
atombender opened this Issue Jul 19, 2016 · 12 comments

atombender (Contributor) commented Jul 19, 2016

What did you do?

  • Started Prometheus on Kubernetes with kubernetes_sd_configs, following the official example.
  • The Kubelet runs on AWS and is not started with --hostname-override, so each node gets an unqualified host name such as ip-10-0-4-126.
  • The self-signed cert that the Kubelet generates at /var/run/kubernetes/kubelet.crt is therefore set up like so:
$ openssl x509 -in /var/run/kubernetes/kubelet.crt -text | grep -E "Subject:|Subject Alt|DNS:"
Subject: CN=ip-10-0-4-126@1468894729
            X509v3 Subject Alternative Name: 
                DNS:ip-10-0-4-126

What did you expect to see?

Prometheus should be able to connect to Kubelet.

What did you see instead? Under which circumstances?

Instead, Prometheus gets x509: cannot validate certificate for 10.0.4.126 because it doesn't contain any IP SANs when trying to connect to https://10.0.4.126:10250. Note the use of the IP instead of the host name; since the cert only carries a DNS SAN, this can never validate.

Environment

  • Prometheus version: 1.0.
  • Prometheus configuration file: Used the example pretty much verbatim, except I had to add role: node to scrape the nodes.

Related to #1654 and #1013. However, setting server_name is not a viable workaround here, since every node presents a different host name.

FWIW, running --hostname-override=$fqdn doesn't work either. I also tried --hostname-override=$ip, which K8s didn't like at all, although that may have been a state-management bug triggered by changing the host name after a node has already been registered. Still, it shouldn't be necessary to change the host name, nor to generate a custom cert with the IP as the name on each Kubelet node.

jimmidyson (Member) commented Jul 19, 2016

This is really a deployment issue. If you don't like the idea of creating a certificate with a valid IP SAN, the only option I can think of is to disable certificate verification — see https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L22-L28, which should also cover the case where node certs don't have a valid IP SAN.
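For reference, the workaround linked above boils down to a scrape config along these lines (a sketch based on the linked example; the job name and service-account file paths are assumptions about a typical in-cluster deployment):

```yaml
scrape_configs:
  - job_name: 'kubernetes-nodes'
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      # The kubelet's self-signed cert has no IP SAN, so verification
      # against the discovered node IP will always fail; skip it.
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
```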

atombender (Contributor, Author) commented Jul 19, 2016

Couldn't there be an option to use the host name rather than the IP? The API provides the node data with both the kubernetes.io/hostname annotation and the name.

There are two reasons why I think this is necessary:

  1. Kubelet's default behaviour, as far as I can tell, is to write a cert that contains the unqualified hostname if one is available — not the IP.
  2. Various scripts (kube-up and so on) set --hostname-override on platforms where this is sensible. This host name seems to be used to generate the cert.

For these reasons, the cert cannot be assumed to have the IP as its SAN. In fact, from what I can tell, the only logical default for Prometheus is to use the kubernetes.io/hostname annotation.

I probably have time to cook up a PR if that sounds reasonable to you.

jimmidyson (Member) commented Jul 19, 2016

Using kubernetes.io/hostname does sound reasonable; I just wish it were part of the defined API rather than an annotation, which makes the guarantees a bit loose. Could this be done via relabelling for now? Using a host name also means the name must be resolvable from Prometheus, which I guess is an OK thing to ask for. I'm not 100% sure about making it the default though.
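Such a relabelling could look roughly like this (a sketch: it assumes the kubernetes.io/hostname node label is exposed to relabelling as __meta_kubernetes_node_label_kubernetes_io_hostname, and that the resulting name resolves from Prometheus):

```yaml
relabel_configs:
  # Replace the discovered IP:port with <hostname>:10250 so the TLS
  # handshake presents a name matching the kubelet cert's DNS SAN.
  - source_labels: [__meta_kubernetes_node_label_kubernetes_io_hostname]
    regex: '(.+)'
    replacement: '${1}:10250'
    target_label: __address__
```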

jimmidyson (Member) commented Jul 19, 2016

BTW the Kubernetes SD was originally tracking what Heapster did to retrieve metrics from nodes, hence the current use of IP.

jimmidyson (Member) commented Jul 19, 2016

Can you also verify whether the node certs share a common issuer? If they are self-signed, you're going to have to turn off certificate validation anyway...

atombender (Contributor, Author) commented Jul 19, 2016

Aw dang, Kubelet doesn't use the CA at all. It's self-signed, so you're right, cert validation makes no sense.

In that case, this really needs to be documented properly — unless you're running somewhere like GKE, where I believe each node is set up to sign a cert with the master's CA, then Kubelet's default behavior is to generate a self-signed cert, and insecure_skip_verify has to be enabled.

jimmidyson (Member) commented Jul 19, 2016

Looks like we can probably rely on the kubernetes.io/hostname annotation being present (see http://kubernetes.io/docs/user-guide/node-selection/#built-in-node-labels), but there is no guarantee that it will be resolvable (host names obviously don't have to be in DNS), & I don't think it is used for any comms from the API server to the kubelet (e.g. for logs, proxying, etc). If it's not good enough for the API server...

In that case, this really needs to be documented properly

We discussed this & didn't like the idea of disabling cert validation by default: it felt dirty to have an insecure default. But it is going to affect quite a number of users, so a PR making this clearer would be very welcome, I'm sure.

atombender (Contributor, Author) commented Jul 19, 2016

Indeed. Re DNS, is that a requirement? Prometheus can still talk to the IP, as long as it provides the right host name in the TLS handshake.

My challenge with Kubelet is that I want to be able to automate node creation (autoscaling and so forth). Doing automatic cert creation securely that way is a challenge I haven't yet solved. Unless I use a wildcard cert (*.ec2.internal is stupid, so each node would need a new local domain).

jimmidyson (Member) commented Jul 19, 2016

You're right that it is possible to override the expected server name in cert verification, but unfortunately there is no way to do that with Prometheus discovery, AFAIK.

If I were doing automated node creation & wanted validatable certs, I'd probably look at cfssl or a custom installation of Let's Encrypt's boulder.
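For the cfssl route, the key point is listing the node's IP (and host name) in the CSR's hosts field, which becomes the SAN list in the issued cert. An illustrative csr.json; the CN, addresses, and key parameters here are placeholders, not values from this thread:

```json
{
  "CN": "ip-10-0-4-126",
  "hosts": [
    "ip-10-0-4-126",
    "10.0.4.126"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  }
}
```

A cert issued from a CSR like this would carry an IP SAN, so Prometheus could validate it against the discovered node IP without relabelling.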

matthiasr (Contributor) commented Jul 21, 2016

FWIW, we just scrape the --read-only-port (default 10255) to sidestep this.
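That approach can be expressed as a relabelling that rewrites the discovered port (a sketch; it assumes the kubelet's read-only port is at its default of 10255):

```yaml
scrape_configs:
  - job_name: 'kubernetes-nodes-readonly'
    # Plain HTTP: the read-only port does no TLS, so no cert issues.
    scheme: http
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.+):10250'
        replacement: '${1}:10255'
        target_label: __address__
```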

jimmidyson (Member) commented Jul 21, 2016

That's another option, if you don't mind having an insecure port open on the kubelet. Some people see that port as an opportunity for info leakage, & there have been plans to get rid of it for quite a while, although it's never happened (& may never).

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
