
Kubernetes client New() function has incorrect logic #2334

Closed
verdverm opened this Issue Jan 10, 2017 · 20 comments

verdverm commented Jan 10, 2017

What did you do?

Tried to use basic-auth with prometheus

What did you expect to see?

Prometheus up and running

What did you see instead? Under which circumstances?

An error from the k8s client code about specifying both bearer and basic auths

I believe the issue is a logical error in the k8s client config creation:

func New(l log.Logger, conf *config.KubernetesSDConfig) (*Kubernetes, error) {

The bearer token can be set in the else clause (without any configuration in the YAML), and then basic auth will also be set, but the bearer token is never unset. This Config struct, with both auth methods filled in, is then passed to the k8s client code, which returns an error.
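
For illustration, here is a simplified, self-contained sketch of that flow; the struct and field names below are stand-ins for the real Prometheus and Kubernetes client types, not the actual source:

package main

import "fmt"

// Illustrative stand-ins for the real SD and client config structs.
type SDConfig struct {
    APIServer string // empty means "use the in-cluster defaults"
    Username  string
    Password  string
}

type ClientConfig struct {
    Host        string
    BearerToken string
    Username    string
    Password    string
}

func buildClientConfig(conf SDConfig) ClientConfig {
    var kcfg ClientConfig
    if conf.APIServer != "" {
        kcfg.Host = conf.APIServer
    } else {
        // In-cluster defaults: the host and the service-account bearer token
        // get filled in even though nothing was set in the YAML.
        kcfg.Host = "https://kubernetes.default.svc"
        kcfg.BearerToken = "<token read from the mounted service account>"
    }
    if conf.Username != "" {
        // Basic auth from the YAML is applied on top, but the bearer token
        // set above is never cleared, so both auth methods end up set.
        kcfg.Username = conf.Username
        kcfg.Password = conf.Password
    }
    return kcfg
}

func main() {
    kcfg := buildClientConfig(SDConfig{Username: "user", Password: "pass"})
    fmt.Println("bearer set:", kcfg.BearerToken != "", "| basic set:", kcfg.Username != "")
    // Prints: bearer set: true | basic set: true -> the k8s client rejects such a config.
}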

Environment

  • System information:

    quay.io docker image running in k8s 1.5.1

  • Prometheus version:

    Starting prometheus (version=1.4.1, branch=master, revision=2a89e8733f240d3cd57a6520b52c36ac4744ce12)

  • Alertmanager version:

    insert output of alertmanager -version here (if relevant to the issue)

  • Prometheus configuration file:

...
    - job_name: 'kubernetes-apiservers'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      basic_auth:
        username: ...
        password: ...
      kubernetes_sd_configs:
      - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        basic_auth:
          username: ...
          password: ...
...
  • Alertmanager configuration file:
insert configuration here (if relevant to the issue)
  • Logs:

Lost these, but the error was coming from the k8s client library when both basic and bearer auth are specified.

verdverm (Author) commented Jan 10, 2017

I think something like kcfg.BearerToken = "" on line 112 would fix this issue, though I don't have an easy way to test
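
In terms of the sketch above (again with stand-in field names, not the real source), the suggested change would go where basic auth is applied:

    if conf.Username != "" {
        kcfg.Username = conf.Username
        kcfg.Password = conf.Password
        kcfg.BearerToken = "" // clear the in-cluster token once basic auth is configured
    }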

alexsomesan (Contributor) commented Jan 16, 2017

Thanks for reporting this @verdverm.

I was able to reproduce your problem.
The error message I'm seeing in the logs is

 Cannot create Kubernetes discovery: username/password or bearer token may be set, but not both  source=discovery.go:84

Does this resemble your results?

Working on a fix now.

verdverm (Author) commented Jan 16, 2017

Yes, that looks the same

alexsomesan (Contributor) commented Jan 17, 2017

@verdverm
Could you please provide a little more context as to why you are trying to use basic HTTP authentication while running Prometheus inside a Kubernetes cluster?

The canonical approach to authenticating API clients running inside a cluster is to use a so-called service account and collect its credentials via the Secret volume mounted into the pod. This is entirely managed by Kubernetes and described here: https://kubernetes.io/docs/admin/service-accounts-admin/
According to that documentation, the current implementation of credentials configuration in Prometheus is slightly wrong. Patching it to remove the token when basic auth is configured, as you suggested, would push it further down that path, since the token we'd be removing is the one configured through the correct Kubernetes service account mechanism.

I'm almost done with a PR to implement the correct service-account behaviour and I would like to know more about your use case so I can test against it.

Thanks a lot for helping out on this one.

fabxc (Member) commented Jan 17, 2017

@verdverm what requires you to set the ca_file and basic auth explicitly here? When not specifying an API server, the SD sets itself up with the in-cluster configuration using service accounts. In the general case, you shouldn't have to specify anything but role.
Is your setup customized to that degree?
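
For illustration, a minimal job of that shape might look like the following (the job name here is just an example); the in-cluster service account supplies the credentials, so no tls_config, bearer_token_file, or basic_auth is spelled out:

    scrape_configs:
    - job_name: 'kubernetes-endpoints'
      kubernetes_sd_configs:
      - role: endpoints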

Didn't refresh – Alex actually asked the same already.

fabxc added this to the v1.5 milestone Jan 17, 2017

verdverm (Author) commented Jan 17, 2017

I was messing around with config scenarios trying to get authz working in Prometheus.

I've converted to the k8s supplied files and have the following error I've been trying to work out.
(ignore node-8, hardware failure)

[screenshot: Prometheus targets status, 2017-01-17]


deployment.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - image: "{{ docker-registry }}/krobots/prometheus:v1.4.1"
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "-config.file=/etc/prometheus/prometheus.yml"
        - "-storage.local.path=/prometheus"
        - "-storage.local.retention=24h"
        - "-alertmanager.url=http://alertmanager:9093/api/v1"
        ports:
        - containerPort: 9090
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
        volumeMounts:
        - name: data-volume
          mountPath: "/prometheus"
        - name: config-volume
          mountPath: "/etc/prometheus"
        - name: rules-volume
          mountPath: "/etc/prometheus-rules"
      volumes:
      - name: data-volume
        hostPath:
          path: /data/prometheus
      - name: config-volume
        configMap:
          name: prometheus-config
      - name: rules-volume
        configMap:
          name: prometheus-rules

configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |

    global:
      scrape_interval: 30s
      scrape_timeout: 30s

    rule_files:
      - "/etc/prometheus-rules/*.rules"

    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-service-endpoints'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+)(?::\d+);(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-services'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_service_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: pod
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
verdverm (Author) commented Jan 17, 2017

@alexsomesan Do I need to create a service account specifically for Prometheus / kube-state-metrics / alertmanager ?

alexsomesan (Contributor) commented Jan 17, 2017

You can start off with the default one, which is always there on every cluster that is version 1.4.x or newer.
You would create a specific one if, say, you want to set more restrictive permissions for Prometheus when accessing the API.

From the looks of your targets status, I'd say some of your API instances are configured inconsistently. I suspect that is the reason why you only see one of them 'UP': only one of them accepts your credentials. Check the logs of the kube-apiserver process on the failed nodes for authentication errors and start from there.

verdverm (Author) commented Jan 17, 2017

I've removed the tls/token fields from the kubernetes_sd_config; same results, which is good and bad.

I believe the reason for the x509 errors is that the CAs are self-signed; is there a way around that?

The systemd configuration, auth.jsonl, and token.csv are all the same on the three k8s-master nodes. Still getting the auth issues; the one node that is up is the elected leader. Also of interest, the TLS certs for the k8s masters are being accepted. These are from the same self-hosted CA. All machines have a unique TLS certificate.

Thoughts?

verdverm (Author) commented Jan 17, 2017

@alexsomesan x509 woes continue: adding insecure_skip_verify to the tls_config causes Prometheus to "Failed to list" everything...

Changing anything in the auth sections seems to make Prometheus unable to talk to the API server, yet it is not authenticating correctly.

verdverm (Author) commented Jan 17, 2017

I have to walk back another statement...

removing the tls_config and bearer_token_file from the kubernetes_sd_config makes Prometheus unable to list anything. Putting the config back in resolves the listing issue.

@alexsomesan It seems the kubernetes_sd_config tls and bearer fields are required after all

verdverm (Author) commented Jan 17, 2017

OK, I have to walk this back once again: both versions of the config map work.

The sequence of create/apply/delete operations on the ConfigMap, the Deployment, and the Prometheus pod determines whether it succeeds or not.

verdverm (Author) commented Jan 17, 2017

Is there any way to output the actual config map Prometheus is seeing?

verdverm (Author) commented Jan 17, 2017

It seems we may also be suffering from stale data in k8s.

After deleting and re-creating the ConfigMap, we kill the Prom pod to pick up the new config.

It seems as though the new pod picks up the old config; after deleting a second time, it goes back to working with a known "working" config (despite our other authz issues).

verdverm (Author) commented Jan 17, 2017

It seems that the secrets/default-token-XXXXX in the namespace becomes stale or bad as well?

verdverm (Author) commented Jan 18, 2017

Oh, and the two apiservers that are reported down are showing a TLS handshake error in their logs.

kube-apiserver[1176]: E0117 05:45:29.007801    1176 handlers.go:58] Unable to authenticate the request due to an error: crypto/rsa: verification error
verdverm (Author) commented Jan 18, 2017

@alexsomesan, I got everything working!!

post-mortem:

  1. It seems the fundamental issue at play was the service account tokens.
  2. Verification was flawed because we used a different SA token pem on each master.
  3. Kubernetes has issues with syncing the tokens, so we had many working configs which at first appeared to be broken.

solution:

  1. Fix the divergent pem files for token verification
  2. To get Prometheus (and any other container) to pick up fresh tokens...
  3. make changes to config (optional)
  4. delete the default service account token
  5. delete the kube-system service account token (only if making changes to the k8s config)
  6. wait one minute
  7. delete the Prom (or other) pod one or more times (average is 2 for me)
  8. See Prom go, go Prom go

Most of the other auth setups worked; our final working ConfigMap:

notes:

  • insecure_skip_verify works in the scrape config, not in the kubernetes_sd_config
    (our original non-working issue was the SA token)
  • we can probably clean this up now that we know the SA tokens were an issue
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |

    global:
      scrape_interval: 30s
      scrape_timeout: 30s

    rule_files:
      - "/etc/prometheus-rules/*.rules"

    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      scheme: https
      tls_config:
        ca_file: /var/lib/kubernetes/ca.pem
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: endpoints
        tls_config:
          ca_file: /var/lib/kubernetes/ca.pem
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/lib/kubernetes/ca.pem
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
        tls_config:
          ca_file: /var/lib/kubernetes/ca.pem
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-service-endpoints'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: endpoints
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+)(?::\d+);(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-services'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_service_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: pod
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
gouthamve (Member) commented Jun 14, 2017

@fabxc @verdverm Can this be closed?

fabxc closed this Jul 3, 2017
