Error polling from Kubernetes API server #1499

Closed
guoshimin opened this Issue Mar 23, 2016 · 16 comments

guoshimin commented Mar 23, 2016

Getting this error polling the api server:

Get https://kubernetes.default.svc:443/metrics: x509: certificate signed by unknown authority

Here's the snippet from my config:

  - job_name: kubernetes
    scrape_interval: 30s
    scrape_timeout: 10s
    kubernetes_sd_configs:
      - api_servers:
        - https://kubernetes.default.svc
        in_cluster: true
        tls_config:
          insecure_skip_verify: true

version: 0.17.0

Certificate served by api-server (removed some clutter):

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            59:8f:55:59:f9:b3:bc:3d:96:90:e1:9e:cd:d1:b3:59
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, O=Databricks, CN=Sub CA
        Validity
            Not Before: Jan 20 01:01:15 2016 GMT
            Not After : Jan 19 01:01:15 2017 GMT
        Subject: C=US, O=Databricks, CN=master.central-a.kube.dev.databricks.com
        X509v3 extensions:

            X509v3 Authority Key Identifier: 
                keyid:A2:F2:F1:0C:48:76:3C:88:98:0F:85:A1:38:42:21:83:B5:49:6E:15

            X509v3 Basic Constraints: critical
                CA:FALSE

            X509v3 Extended Key Usage: 
                TLS Web Client Authentication, TLS Web Server Authentication
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Subject Key Identifier: 
                1F:A9:8A:75:57:78:C9:D0:57:DD:07:F3:5D:99:49:0E:5D:B0:7F:C6
            X509v3 Subject Alternative Name: 
                DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, 

ca.crt in serviceaccount (again, removed some clutter):

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            88:56:59:48:78:d9:2e:0f:ee:4a:b9:ac:8a:86:a7:b6
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, O=Databricks, CN=Root CA
        Validity
            Not Before: Jan 13 23:21:52 2016 GMT
            Not After : Jan 12 23:21:52 2017 GMT
        Subject: C=US, O=Databricks, CN=Sub CA

        X509v3 extensions:
            X509v3 Authority Key Identifier: 
                keyid:39:FA:60:37:05:A0:F8:34:0B:4A:2B:4E:03:8F:B8:C0:B9:26:55:54

            X509v3 Basic Constraints: critical
                CA:TRUE, pathlen:0
            X509v3 Extended Key Usage: 
                TLS Web Client Authentication, TLS Web Server Authentication
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign
            X509v3 Subject Key Identifier: 
                A2:F2:F1:0C:48:76:3C:88:98:0F:85:A1:38:42:21:83:B5:49:6E:15

Note that X509v3 Authority Key Identifier in the server cert matches the X509v3 Subject Key Identifier in the CA cert.
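
To repeat that comparison without eyeballing hex strings, here is a minimal Python sketch. The helper names (`normalize_keyid`, `identifiers_match`) are made up for illustration, and the two values are copied from the openssl output above. Matching identifiers only show which key the issuer claims to have used; they do not verify the signature itself.

```python
def normalize_keyid(text):
    """Drop an optional 'keyid:' prefix, colons and spaces; upper-case the hex."""
    text = text.strip()
    if text.lower().startswith("keyid:"):
        text = text[len("keyid:"):]
    return text.replace(":", "").replace(" ", "").upper()


def identifiers_match(authority_key_id, subject_key_id):
    """True when the leaf's Authority Key Identifier equals the CA's Subject Key Identifier."""
    return normalize_keyid(authority_key_id) == normalize_keyid(subject_key_id)


# Values copied from the two certificate dumps in this issue.
SERVER_AKI = "keyid:A2:F2:F1:0C:48:76:3C:88:98:0F:85:A1:38:42:21:83:B5:49:6E:15"
CA_SKI = "A2:F2:F1:0C:48:76:3C:88:98:0F:85:A1:38:42:21:83:B5:49:6E:15"

if __name__ == "__main__":
    print(identifiers_match(SERVER_AKI, CA_SKI))
```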

Member

juliusv commented Mar 28, 2016

@jimmidyson

Member

jimmidyson commented Mar 28, 2016

Hmm that is interesting. I've seen this in a couple of totally unrelated libraries in other langs (Java & Ruby) - wondering if it's a bug in Kubernetes mounted secrets rather than in Prometheus.

In the Prometheus pod, can you check that the CA file at /var/run/secrets/kubernetes.io/serviceaccount/ca.crt matches the CA file used by the API server?

@brian-brazil brian-brazil added the bug label Mar 30, 2016

Author

guoshimin commented Apr 1, 2016

Yes, the CA files are the same:

The one used by the pod:

$ kubectl exec prometheus-ybpwf cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt | md5
5a7eb6afa9c19ccc4eaac675c00cfc03

The one used by the API server:

$ md5sum /etc/kubernetes/ssl/ca.pem
5a7eb6afa9c19ccc4eaac675c00cfc03  /etc/kubernetes/ssl/ca.pem

@fabxc fabxc added this to the v1.0.0 milestone Apr 25, 2016

@fabxc fabxc added kind/bug and removed bug labels Apr 28, 2016

bharrisau commented May 3, 2016

Just a note - the tls_config is attached in the wrong spot. It is part of the job, not the kubernetes_sd_configs.

And I'm getting the same error, even though all my endpoints are appearing. It looks like the endpoints are being scraped correctly, but the https://kubernetes.default.svc:443/metrics target appears again and gets scraped without authentication. I am missing the API server metrics that would come from that endpoint.

Author

guoshimin commented May 3, 2016

The example config at https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml works:

  - job_name: 'kubernetes-cluster'
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Member

fabxc commented Jun 23, 2016

@guoshimin so was this a misconfiguration, and is it resolved now?

Author

guoshimin commented Jun 23, 2016

According to the documentation, if I set in_cluster to be true, it should automatically use the CA cert and token mounted in the secret volume. Why do I still need to set the CA cert explicitly?

That said, it's working for us now, so not a big deal for us.

Member

fabxc commented Jun 23, 2016

@jimmidyson would be the expert here

Member

jimmidyson commented Jun 23, 2016

"Expert". It should work as you mentioned with in_cluster = true. I'll try to look into why it doesn't soon, unless someone else gets there first (here's hoping :-b))

mindfulmonk commented Jul 1, 2016

I'm hitting the same bug. It looks like a Prometheus problem when using in_cluster=false.

The following curl works:

curl -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer token-here" https://kubernetes.default.svc/api/v1/pods
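
For reproducing the curl check from a script, a rough Python equivalent using only the standard library might look like the sketch below. `build_request` and `fetch` are hypothetical helper names and the token is supplied by the caller; the CA path and URL are the ones from the curl command above.

```python
import ssl
import urllib.request

# Service account mount path as used in the curl command above.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def build_request(url, token):
    """Build a GET request carrying the bearer token, like curl -H above."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Bearer " + token)
    return req


def fetch(url, token, ca_file=SA_DIR + "/ca.crt"):
    """Fetch url, trusting only the certificates in ca_file (like curl --cacert)."""
    # Passing cafile means the system trust store is not loaded.
    ctx = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(build_request(url, token), context=ctx) as resp:
        return resp.status, resp.read()
```

Calling `fetch("https://kubernetes.default.svc/api/v1/pods", token)` from inside the pod should behave like the curl command. If it succeeds while Prometheus still fails, the CA file itself is probably fine and the problem is where the setting is placed in the config.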

The Prometheus config looks like this:

tls_config:
  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token: token-here

and gives the following error:

time="2016-07-01T08:36:33Z" level=error msg="Cannot initialize pods collection: unable to list Kubernetes pods: unable to query any API servers: Get https://kubernetes.default.svc/api/v1/pods: x509: certificate signed by unknown authority" source="discovery.go:153"

I tried with:

tls_config:
  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token: token-here
  insecure_skip_verify: true

but that gives the same error.

Member

fabxc commented Jul 1, 2016

There are two levels of TLS config: one for scraping and one for talking to the API server.
Can you provide the full configuration so we can see how you are using each?

mindfulmonk commented Jul 1, 2016

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'kubernetes-cluster'

    tls_config:
      insecure_skip_verify: true
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token: token-here

    kubernetes_sd_configs:
    - api_servers:
      - 'https://kubernetes.default.svc'
      in_cluster: false

    relabel_configs:
    - source_labels: [__meta_kubernetes_role]
      action: keep
      regex: (?:apiserver|node)
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - source_labels: [__meta_kubernetes_role]
      action: replace
      target_label: kubernetes_role

  - job_name: 'kubernetes-service-endpoints'

    tls_config:
      insecure_skip_verify: true
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token: token-here

    kubernetes_sd_configs:
    - api_servers:
      - 'https://kubernetes.default.svc'
      in_cluster: false

    relabel_configs:
    - source_labels: [__meta_kubernetes_role, __meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: endpoint;true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+)(?::\d+);(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_role]
      action: replace
      target_label: kubernetes_role
    - source_labels: [__meta_kubernetes_service_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name

  - job_name: 'kubernetes-services'

    tls_config:
      insecure_skip_verify: true
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token: token-here

    metrics_path: /probe
    params:
      module: [http_2xx]

    kubernetes_sd_configs:
    - api_servers:
      - 'https://kubernetes.default.svc'
      in_cluster: false

    relabel_configs:
    - source_labels: [__meta_kubernetes_role, __meta_kubernetes_service_annotation_prometheus_io_probe]
      action: keep
      regex: service;true
    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: blackbox
    - source_labels: [__param_target]
      target_label: instance
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_role]
      target_label: kubernetes_role
    - source_labels: [__meta_kubernetes_service_namespace]
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      target_label: kubernetes_name

  - job_name: 'kubernetes-pods'

    tls_config:
      insecure_skip_verify: true
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token: token-here

    kubernetes_sd_configs:
    - api_servers:
      - 'https://kubernetes.default.svc'
      in_cluster: false

    relabel_configs:
    - source_labels: [__meta_kubernetes_role, __meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: pod;true
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: (.+):(?:\d+);(\d+)
      replacement: ${1}:${2}
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_pod_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
Member

jimmidyson commented Jul 1, 2016

You've set in_cluster to false and are using https as the scheme for the API server. Prometheus can't verify the API server certificate unless the CA certificate is in the system trust store on your Prometheus server, which I guess it isn't. You can either copy the CA certificate from the Kubernetes API server to your Prometheus server and point the discovery config at it, or disable certificate verification. Either change belongs in the discovery tls_config: it looks similar to https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L16-L28 but goes in the discovery config (https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L31-L34)

mindfulmonk commented Jul 1, 2016

Thanks @jimmidyson I finally understand the difference between scraping and service discovery. The following config works fine:

scrape_configs:
  - job_name: 'kubernetes-cluster'

    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token: <hidden>

    kubernetes_sd_configs:
    - api_servers:
      - 'https://kubernetes.default.svc'
      in_cluster: false

      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token: <hidden>

The kubernetes_sd_configs section is for discovering targets. In this case it queries the Kubernetes API and returns a list of targets such as https://kubernetes.default.svc:443/metrics, and Prometheus then scrapes each target.

In this case the target and discovery endpoints are the same, but if you had the kubernetes_sd_configs discover other application target nodes, those might have different certs and bearer_tokens.
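
To make the two levels concrete, here is a small Python sketch that checks a job (written as a plain dict mirroring the YAML above) for the mistake discussed in this thread. It is not Prometheus code: `discovery_tls_problems` is a hypothetical helper, and it assumes, as described above, that the job-level tls_config covers scraping only and that in_cluster mode picks up the mounted CA and token by itself.

```python
CA = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"


def discovery_tls_problems(job):
    """List https API servers whose discovery block has no CA and no skip-verify.

    Only the tls_config nested under kubernetes_sd_configs is inspected; the
    job-level tls_config is ignored on purpose because it applies to scraping.
    """
    problems = []
    for sd in job.get("kubernetes_sd_configs", []):
        if sd.get("in_cluster"):
            continue  # assumed to use the mounted service account CA and token
        tls = sd.get("tls_config", {})
        if tls.get("ca_file") or tls.get("insecure_skip_verify"):
            continue
        problems += [s for s in sd.get("api_servers", []) if s.startswith("https://")]
    return problems


# Job-level tls_config only: scraping is covered, discovery is not.
failing = {
    "job_name": "kubernetes-cluster",
    "tls_config": {"ca_file": CA},
    "kubernetes_sd_configs": [
        {"api_servers": ["https://kubernetes.default.svc"], "in_cluster": False},
    ],
}

# Same job with a second tls_config inside the discovery block.
working = {
    "job_name": "kubernetes-cluster",
    "tls_config": {"ca_file": CA},
    "kubernetes_sd_configs": [
        {
            "api_servers": ["https://kubernetes.default.svc"],
            "in_cluster": False,
            "tls_config": {"ca_file": CA},
        },
    ],
}

if __name__ == "__main__":
    print(discovery_tls_problems(failing))
    print(discovery_tls_problems(working))
```

The first dict corresponds to the config that produced the x509 error during discovery, and the second to the config that works.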

Member

fabxc commented Jul 1, 2016

Okay, I'm closing this issue for now as it seems most reports boil down to configuration.

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019
