
Not able to scrape endpoints in kubernetes using prometheus and blackbox #2175

Closed
Priyanka098 opened this Issue Nov 8, 2016 · 35 comments

Priyanka098 commented Nov 8, 2016

What did you do?
Deployed Prometheus in Kubernetes; it is scraping nodes and showing their status on the dashboard.
What did you expect to see?
Scraping services and endpoints of k8s
What did you see instead? Under which circumstances?
The kubernetes-service-endpoints and kubernetes-services jobs are down. I don't understand whether the problem is the Kubernetes configuration in my cluster or the blackbox configuration.
Environment
Prometheus container v1.3.0

  • System information:
    Linux 3.13.0-92-generic x86_64

  • Prometheus version:
    prometheus, version 1.1.3
    build user: root@3e392b8b8b44
    build date: 20160916-11:36:30
    go version: go1.6.3

  • Prometheus configuration file:

global:
  scrape_interval: 30s
  scrape_timeout: 30s
scrape_configs:
- job_name: 'prometheus'
  static_configs:
    - targets: ['localhost:9090']
- job_name: 'kubernetes-cluster'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: apiserver
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
- job_name: 'kubernetes-service-endpoints'
  scheme: http
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: endpoint
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_service_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
- job_name: 'kubernetes-services'
  metrics_path: /probe
  params:
    module: [http_2xx]
  tls_config:
    #ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: service
  relabel_configs:
  - source_labels: [__meta_kubernetes_role, __meta_kubernetes_service_annotation_prometheus_io_probe]
    action: keep
    regex: service;true
  - source_labels: [__address__]
    target_label: __param_target
  - target_label: __address__
    replacement: blackbox
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    target_label: kubernetes_role
  - source_labels: [__meta_kubernetes_service_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_name
- job_name: 'kubernetes-pods'
  scheme: https
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+):(?:\d+);(\d+)
    replacement: ${1}:${2}
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_pod_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name

[screenshot: k8s-prometheus targets page]


brancz commented Nov 8, 2016

Note that the Kubernetes SD config had a breaking change in the v1.3.0 release. It seems like your config is in the old format. Have a look at the sample config for guidance.


Priyanka098 commented Nov 9, 2016

When I try the sample config, it throws:
time="2016-11-09T06:13:19Z" level=info msg="Starting prometheus (version=1.1.3, branch=master, revision=ac374aa6748e1382dbeb72a00abf47d982ee8fff)" source="main.go:73"
time="2016-11-09T06:13:19Z" level=info msg="Build context (go=go1.6.3, user=root@3e392b8b8b44, date=20160916-11:36:30)" source="main.go:74"
time="2016-11-09T06:13:19Z" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:221"
time="2016-11-09T06:13:19Z" level=error msg="Error loading config: couldn't load configuration (-config.file=/etc/prometheus/prometheus.yml): Kubernetes SD configuration requires at least one Kubernetes API server" source="main.go:126"

Then I added the api_servers config, and targets are created for the nodes but not for the apiserver.
Prometheus config file:

scrape_configs:
- job_name: 'kubernetes-apiservers'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: apiserver
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https

- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)

brancz commented Nov 9, 2016

Are you sure you are using the v1.3.0 container? The log output says otherwise:

version=1.1.3, branch=master, revision=ac374aa6748e1382dbeb72a00abf47d982ee8fff

Priyanka098 commented Nov 9, 2016

My mistake, I changed it to 1.3.0 in the deployment file.
But the endpoints are still not getting scraped.
This is the error: "server returned HTTP status 404 Not Found" for the endpoint http://10.244.2.10:80/metrics
When testing this URL from within k8s:
curl -v http://10.244.2.10:80//xyz/metric works (response status 200)
but http://10.244.2.10:80/metrics does not (response status 404)


brancz commented Nov 9, 2016

If http://10.244.2.10:80//xyz/metric works and http://10.244.2.10:80/metric doesn't, then it sounds like you want your metrics path to be different. You'll probably want to look into relabelling those targets' metrics endpoints. If your metrics path is consistently different, you can simply set the value you need into the __metrics_path__ label using relabelling. If your metrics path varies per target, you'll want to relabel based on two annotations of your Endpoints object from Kubernetes; all annotations of the Endpoints object are converted into Prometheus meta labels. A common practice is to use one annotation to declare that the target should be kept and another that defines the metrics path. Note that you may need to duplicate that section of the config if you want to mix default and custom metrics paths.
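A sketch of that two-annotation approach, using the conventional (not Prometheus-mandated) prometheus.io/* annotation names that also appear elsewhere in this thread:

```yaml
relabel_configs:
# annotation 1: only keep targets that opt in via prometheus.io/scrape=true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
  action: keep
  regex: true
# annotation 2: if prometheus.io/path is set, use its value as the scrape
# path; targets without the annotation keep the default /metrics
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
  action: replace
  target_label: __metrics_path__
  regex: (.+)
```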


Priyanka098 commented Nov 9, 2016

I want the flexibility of creating different metrics paths for services.
So I added prometheus.io/path: "/xyz/metrics" to the service file and

this relabel config to the prometheus.yml file:

  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: keep
    target_label: __metrics_path__

But the endpoint does not change (http://10.244.3.4:80/metrics).

Do you have any example to refer to?
I also tried something like this:
https://movio.co/blog/prometheus-service-discovery-kubernetes/


Priyanka098 commented Nov 9, 2016

Hey, that worked after changing the action to "replace".


brancz commented Nov 9, 2016

Yes, the "keep" action only makes Prometheus keep that particular target (targets that don't match the regex are dropped), whereas "replace" by default takes the content of the source_labels and puts it into the target_label.

Does that answer your questions?
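To illustrate the difference, a minimal contrast of the two actions (the "team" label here is invented for illustration):

```yaml
relabel_configs:
# keep: drops every target whose "team" service label does not match
# "monitoring"; no labels are modified
- source_labels: [__meta_kubernetes_service_label_team]
  action: keep
  regex: monitoring
# replace: copies the source label's value into target_label
- source_labels: [__meta_kubernetes_service_label_team]
  action: replace
  target_label: team
```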


Priyanka098 commented Nov 9, 2016

The kubernetes-apiservers and kubernetes-pods jobs are not getting displayed on the Prometheus dashboard.
I kept all the configuration as-is from the sample yaml file and added the prometheus.io/scrape etc. annotations in the service as well as in the deployment file.
Is it intended to add the annotations in the deployment file as well?


brancz commented Nov 9, 2016

Make sure that the annotations are set on the PodTemplate in the Deployment, as those are the annotations that will show up on the Pods created by the ReplicaSet, which in turn is created by the Deployment. Besides checking that, could you provide your config in a gist or equivalent? (If you can show me the deployment manifest, that would be even better.)


Krylon360 commented Nov 9, 2016

It doesn't look like you have the labelmap action on the api-server job nor on the pod job.
BTW, I probably wouldn't have the pods show on the targets page. (I did it a few versions back, and it was a never-ending page.)

This is what I have for my api-server job. (Note: we use Ansible to set the ${VALUES}, and we are running it out of cluster. Note 2: this is pre-1.3.0; we are on 1.2.2.)

- job_name: kubernetes-apiserver
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /bearer_token
  kubernetes_sd_configs:
  - api_servers:
    - ${API_SERVER}
    in_cluster: false
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /bearer_token
    role: apiserver
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role

Also, the screenshot you posted in the OP has your API server. You had your job named - job_name: 'kubernetes-cluster'; however, the 2nd config you posted renamed the job_name to - job_name: 'kubernetes-apiservers'. Did you hit the /-/reload endpoint to reload the new configuration?


Priyanka098 commented Nov 10, 2016

@brancz
This is my deployment file

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: apm-monitoring
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/probe: "true"
    prometheus.io/port: "80"
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: xyz
    spec:
      containers:
      - name: xyz
        securityContext:
          privileged: true
        image: xyz:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          name: http
        - containerPort: 443
          name: https

Krylon360 commented Nov 10, 2016

Looks like you're setting prometheus.io/port to port 80; however, you have the pod scraping job set to scrape over HTTPS.
Try commenting out this line in the pod scraping job:
scheme: https


Priyanka098 commented Nov 10, 2016

@Krylon360
The api server was there in my older version, but when I updated to 1.3.0 it's not showing,
and role: apiserver is not working with the new version.

I will try updating to http because the default is https.


boj commented Nov 10, 2016

I'm seeing the same thing as @Priyanka098 - I've simply loaded the sample config into a ConfigMap and have a Deployment set to pull the latest Prometheus image.


Krylon360 commented Nov 10, 2016

I'll need to load a local instance of 1.3 and do some testing tomorrow. I know I've been getting the "Kubernetes SD configuration requires at least one Kubernetes API server" error at random times on 1.2.2 as well.
What does your current Prometheus ConfigMap look like after the changes you've made?

Also, what version of k8s? I ask due to breaking changes made when they cut over to kube-dns.
inCluster now uses "name.namespace.type.cluster.local."
Service example: kubernetes.default.svc.cluster.local.
Pod example: xyz.apm-monitoring.pod.cluster.local.
It's a pain.
I'd also check the logs of the kube-dns pods. We found a bug in SkyDNS that was truncating in_cluster DNS queries, causing internal lookups to drop.
We patched it on our cluster, but found today that we need to revert it now that we're on k8s v1.4.4 and it's causing issues. Ugh


jimmidyson commented Nov 10, 2016

Sorry, there have been a few different issues worked through in here & I've lost track of which issue you're hitting at the moment. Could you clarify please?

The last thing I see is the apiserver role not working? That was removed in 1.3.0 in favour of static SD or, preferably, the Kubernetes endpoints role. I just checked the example config & it looks like the endpoints role is missing from the apiservers job. See https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L10 ff
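For reference, a sketch of the apiservers job using the endpoints role, following the linked example config (Prometheus 1.3 syntax; in-cluster service-account paths assumed):

```yaml
- job_name: 'kubernetes-apiservers'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # keep only the endpoint backing the https port of the default/kubernetes
  # service, i.e. the API server itself
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
```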


boj commented Nov 10, 2016

@jimmidyson I'm currently getting the "Kubernetes SD configuration requires at least one Kubernetes API server" error. My cluster is running k8s v1.4.5, prometheus:latest, and the sample config verbatim.


Priyanka098 commented Nov 10, 2016

@jimmidyson
I don't see any error in my deployment, but the apiserver job is not getting created.

"looks like that endpoint role is missing from apiservers job"

Does this mean something like the below in the apiserver job?

  kubernetes_sd_configs:
  - role: endpoints

jimmidyson commented Nov 10, 2016

@Priyanka098 Yes, try that.

@boj That looks like you're using an old version: that error message doesn't appear in 1.3.1 AFAIK & I can't see it in config validation. Please use the tagged version :v1.3.1 to ensure you get a stable version. It looks like your infra hasn't pulled down the :latest tag correctly (possibly you have an IfNotPresent pull policy?).


Priyanka098 commented Nov 10, 2016

I tried that, and then the api-server job appeared on the dashboard with the error
Get http://172.20.0.9:443/metrics: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
for the endpoint http://172.20.0.9:443/metrics


Priyanka098 commented Nov 10, 2016

Hey, I resolved that... I was using http instead of https.


jimmidyson commented Nov 10, 2016

So it's working for you? I'll send in a PR with the updated example config.


Krylon360 commented Nov 10, 2016

Awesome.
One thing I can see coming out of this is to build in support for using
kube.conf with kubectl, which would have all the necessary information for the
cluster. Just a thought.



boj commented Nov 10, 2016

@jimmidyson Ah, thanks for the hint. It turns out I was using quay.io/coreos/prometheus and not quay.io/prometheus/prometheus.


boj commented Nov 10, 2016

I now get the following error:

2016-11-10T09:56:14.805294529Z time="2016-11-10T09:56:14Z" level=error msg="Error loading config: couldn't load configuration (-config.file=/etc/prometheus/prometheus.yml): relabel configuration for replace action requires 'target_label' value" source="main.go:149"


Priyanka098 commented Nov 10, 2016

@jimmidyson

"So it's working for you?"
Yes.

I have added scheme: http in my kubernetes-service-endpoints job.
Some of my services expose port 80 and others 443.
If I add prometheus.io/port: "443" to the service.yml file in k8s, those services show status down.
How do I deal with this?


jimmidyson commented Nov 10, 2016

@Priyanka098 Add another annotation to the service: prometheus.io/scheme: https
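For example, a hypothetical Service manifest combining that annotation with the others used in this thread (names and ports are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: xyz
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/scheme: "https"   # scrape this service over TLS
    prometheus.io/port: "443"
spec:
  selector:
    app: xyz
  ports:
  - port: 443
    name: https
```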


Priyanka098 commented Nov 10, 2016

All jobs are up except kubernetes-pods
Here is my deployment file

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: xyz
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/probe: "true"
    prometheus.io/port: "80"
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: xyz
    spec:
      containers:
      - name: xyz
        securityContext:
          privileged: true
        image: xyz:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          name: http
        - containerPort: 443
          name: https

Priyanka098 commented Nov 10, 2016

@boj
Check your indentation in the config file (last line of your file);
I had a similar error when I pasted the sample config into a .yml file.


jimmidyson commented Nov 10, 2016

@Priyanka098 Those annotations should be in spec.template.metadata.annotations to be applied to the pod, not the deployment itself.
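For reference, the hypothetical xyz Deployment from the comments above with the annotations moved into the pod template, which is where the pod role will see them:

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: xyz
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: xyz
      annotations:                 # pod-level annotations, applied to the Pods
        prometheus.io/scrape: "true"
        prometheus.io/port: "80"
    spec:
      containers:
      - name: xyz
        image: xyz:latest
        ports:
        - containerPort: 80
          name: http
```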


Priyanka098 commented Nov 10, 2016

@jimmidyson
Thanks, I solved my problem by changing the place of the annotations.
All jobs are working now :-)


jimmidyson commented Nov 10, 2016

@Priyanka098 You're welcome. Can you close this now please?


Vishal-Gaur commented May 19, 2018

Hi,
I have set up Prometheus outside the Kubernetes cluster but I'm getting this error:
unable to use specified CA cert /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory


lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019
