
Prometheus not discovering targets from Kubernetes #2763

Closed
prasenforu opened this Issue May 24, 2017 · 11 comments


prasenforu commented May 24, 2017

Running Kubernetes 1.6.2, set up with kubeadm.

I'm getting the following errors:

time="2017-05-24T11:14:21Z" level=error msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:210: Failed to list *v1.Node: User \"system:serviceaccount:default:default\" cannot list nodes at the cluster scope. (get nodes)" component="kube_client_runtime" source="kubernetes.go:73"
time="2017-05-24T11:14:22Z" level=error msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:165: Failed to list *v1.Pod: User \"system:serviceaccount:default:default\" cannot list pods at the cluster scope. (get pods)" component="kube_client_runtime" source="kubernetes.go:73"
time="2017-05-24T11:14:22Z" level=error msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:163: Failed to list *v1.Endpoints: User \"system:serviceaccount:default:default\" cannot list endpoints at the cluster scope. (get endpoints)" component="kube_client_runtime" source="kubernetes.go:73"
time="2017-05-24T11:14:22Z" level=error msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:164: Failed to list *v1.Service: User \"system:serviceaccount:default:default\" cannot list services at the cluster scope. (get services)" component="kube_client_runtime" source="kubernetes.go:73"
time="2017-05-24T11:14:22Z" level=error msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:184: Failed to list *v1.Pod: User \"system:serviceaccount:default:default\" cannot list pods at the cluster scope. (get pods)" component="kube_client_runtime" source="kubernetes.go:73"

Contents of my prometheus-rbac.yml file:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default

Contents of my ConfigMap file:

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: prometheus
  name: prometheus-configmap
data:
  prometheus.yml: |-
    global:
      scrape_interval:     5s
      evaluation_interval: 5s
      external_labels:
        monitor: 'prometheus.cloudapps.cloud-cafe.in'
        type: 'kubernetes'
    rule_files:
    - '/etc/prometheus-rules/alert.rules'
    scrape_configs:

    - job_name: 'kubernetes-cluster'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # Uncomment the line below to skip TLS certificate verification
        #insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
      kubernetes_sd_configs:
      - role: endpoints

    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # Uncomment the line below to skip TLS certificate verification
        #insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+)(?::\d+);(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_service_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

    - job_name: 'kubernetes-services'
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: []
        target_label: __address__
        replacement: blackbox
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
      - source_labels: [__meta_kubernetes_service_name,__meta_kubernetes_service_namespace]
        target_label: __param_target
        regex: ([^;]+);(.+)
        replacement: $1.$2.svc
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_service_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_pod_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

brancz (Member) commented May 24, 2017

Were you able to successfully create the ClusterRole and ClusterRoleBinding from the rbac.yaml? This sounds like a problem with RBAC.


prasenforu (Author) commented May 24, 2017

Yes, as expected.

I took it from https://github.com/prometheus/prometheus/blob/master/documentation/examples/rbac-setup.yml

Output as follows:

kubectl get ClusterRole prometheus -o yaml

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  creationTimestamp: 2017-05-24T12:34:15Z
  name: prometheus
  resourceVersion: "77254"
  selfLink: /apis/rbac.authorization.k8s.io/v1beta1/clusterroles/prometheus
  uid: 4c1c3442-407d-11e7-b541-025837d8b363
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get

kubectl get ClusterRoleBinding prometheus -o yaml

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2017-05-24T12:34:15Z
  name: prometheus
  resourceVersion: "77256"
  selfLink: /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings/prometheus
  uid: 4c25a2e6-407d-11e7-b541-025837d8b363
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default


brancz (Member) commented May 24, 2017

So when you run kubectl get clusterroles, kubectl get clusterrolebindings, and kubectl get serviceaccounts, do you see the resources you tried to create? I'm trying to find out whether there was a problem creating them, because depending on your cluster you may not have permission yourself to grant those permissions. This particularly happens on GKE clusters.
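
One quick way to verify the binding took effect is impersonation; a minimal sketch, assuming the ServiceAccount from the manifests above (impersonating a service account itself requires cluster-admin-level rights):

kubectl auth can-i list pods --as=system:serviceaccount:default:prometheus
kubectl auth can-i list nodes --as=system:serviceaccount:default:prometheus

Both should answer yes once the ClusterRole and ClusterRoleBinding are applied.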


prasenforu (Author) commented May 24, 2017

I am a cluster admin; I'm not sure what you're trying to find out.


fabxc (Member) commented May 24, 2017

You being a cluster admin doesn't matter. What matters is which service account Prometheus is running with, as it needs permissions to access the k8s API. Your error logs from the initial post indicate that it's running on the default service account rather than prometheus, which you created for that purpose.


prasenforu (Author) commented May 24, 2017

I think the SA I use is prometheus and the NS I use is default:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default

As I said earlier, I took it from https://github.com/prometheus/prometheus/blob/master/documentation/examples/rbac-setup.yml

Nothing was edited.

But in the logs I can see "system:serviceaccount:default:default"


brancz (Member) commented May 24, 2017

Yes, you need to specify in your Deployment which ServiceAccount to use.


prasenforu (Author) commented May 24, 2017

Ya, now I got it :)
I was missing that in my deployment YAML.

Thanks all 👍

fabxc closed this May 24, 2017


marccarre commented Jul 5, 2017

To be explicit, and in case someone else faces the same difficulties and stumbles on this GitHub issue:

[...]
kind: Deployment
spec:
  template:
    spec:
      serviceAccountName: prometheus  # <--- this is the missing bit
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        [...]
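
To confirm the pod actually picked up the account after redeploying, a hedged one-liner (the pod name is a placeholder, not from this thread):

kubectl get pod <prometheus-pod-name> -o jsonpath='{.spec.serviceAccountName}'

This should print prometheus rather than default.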

simon-mo added a commit to simon-mo/clipper that referenced this issue May 4, 2018

doi-t added a commit to doi-t/ex-gcp-k8s-prometheus that referenced this issue Jul 14, 2018

add cluster role permissions
Otherwise I get the following error:
> prometheus-69c87c489c-qk2d9 prometheus level=error ts=2018-07-14T02:13:11.467415065Z caller=main.go:218 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:289: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:default\" cannot list pods at the cluster scope: Unknown user \"system:serviceaccount:monitoring:default\""

Ref. prometheus/prometheus#2763 (comment)

alexellis commented Jan 27, 2019

Hey @brancz, I wondered if you knew whether we could give Prometheus only a Role rather than a ClusterRole, for scraping multiple replicas of a Deployment within a set namespace? Right now scraping with DNS SD doesn't give the expected result on Kubernetes, so it's led us to look at pod-level discovery.


brancz (Member) commented Jan 28, 2019

@alexellis by default the Kubernetes SD performs list/watch across all namespaces, therefore a ClusterRole is required. However, you can also restrict discovery to specific namespaces; then a Role scoped to each of those namespaces is enough. See the namespaces field in the reference docs: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
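
For illustration, a minimal sketch of such a namespace-restricted SD block (the job name and the openfaas namespace are placeholders, not taken from this thread):

- job_name: 'namespaced-pods'
  kubernetes_sd_configs:
  - role: pod
    # Limit discovery to the listed namespaces instead of the whole cluster
    namespaces:
      names:
      - openfaas

With discovery scoped like this, a Role and RoleBinding in that namespace granting get/list/watch on pods should be sufficient; no ClusterRole is needed.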
