K8s apiservers target detection broken after upgrade to v1.5.0 #2375

Closed
dannyk81 opened this Issue Jan 27, 2017 · 12 comments

dannyk81 commented Jan 27, 2017

What did you do?
Upgraded Prometheus from v1.4.1 to v1.5.0 (Configuration file unchanged)

What did you expect to see?

All targets discovered prior to the upgrade to be available.

What did you see instead? Under which circumstances?

The kubernetes-apiservers targets are missing; all other targets (kubernetes-nodes, kubernetes-pods, kubernetes-service-endpoints) were discovered.

After reverting to v1.4.1, the missing targets reappeared.
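
For reference, one way to confirm whether the job has lost its targets (beyond eyeballing the Targets page) is to count the up series for the job through the HTTP API. A minimal sketch, assuming Prometheus is reachable on localhost:9090 (e.g. via kubectl port-forward):

# Count the scrape targets currently registered for the apiservers job;
# an empty result means service discovery has dropped them all
# (note: in 1.x, series of vanished targets only go stale after ~5 minutes).
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=count(up{job="kubernetes-apiservers"})'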

Environment
Prometheus v1.4.1 upgraded to v1.5.0
Running as a service in K8s cluster version 1.5.2

  • System information: official Docker image prom/prometheus

  • Prometheus version: v1.5.0

  • Prometheus configuration file (a note on the apiservers keep rule follows the logs):

global:
  external_labels:
    datacenter: <dc-name>
rule_files:
- "/etc/prometheus-rules/*.rules"
scrape_configs:
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: instance
    replacement: ${1}
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
- job_name: 'kubernetes-services'
  metrics_path: /probe
  params:
    module: [http_2xx]
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    action: keep
    regex: true
  - source_labels: [__address__]
    target_label: __param_target
  - target_label: __address__
    replacement: blackbox
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_name
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+):(?:\d+);(\d+)
    replacement: ${1}:${2}
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_pod_node_name
  - source_labels: [kubernetes_pod_node_name]
    regex: (.+)
    target_label: instance
    replacement: ${1}
- job_name: 'kubernetes-master-nodes'
  static_configs:
  - targets: ['<master-1>:9100', '<master-2>:9100', '<master-3>:9100']
    labels:
      node_type: master
      metrics_source: node_exporter
  relabel_configs:
  - source_labels: [__address__]
    regex: (.+):(.+)
    target_label: instance
    replacement: ${1}
  • Logs:
2017-01-27 19:01:07	prometheus-3701298219-knglc	prometheus	time="2017-01-27T19:01:07Z" level=info msg="See you next time!" source="main.go:237"
2017-01-27 19:01:07	prometheus-3701298219-knglc	prometheus	time="2017-01-27T19:01:07Z" level=warning msg="Received SIGTERM, exiting gracefully..." source="main.go:230"
2017-01-27 19:00:17	prometheus-3701298219-knglc	prometheus	time="2017-01-27T19:00:17Z" level=info msg="Done checkpointing in-memory metrics and chunks in 2.059339237s." source="persistence.go:638"
2017-01-27 19:00:15	prometheus-3701298219-knglc	prometheus	time="2017-01-27T19:00:15Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:611"
2017-01-27 18:50:14	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:14Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:94"
2017-01-27 18:50:14	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:14Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:94"
2017-01-27 18:50:14	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:14Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:94"
2017-01-27 18:50:14	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:14Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:94"
2017-01-27 18:50:14	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:14Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:94"
2017-01-27 18:50:14	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:14Z" level=info msg="Starting target manager..." source="targetmanager.go:61"
2017-01-27 18:50:14	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:14Z" level=info msg="Listening on :9090" source="web.go:259"
2017-01-27 18:50:14	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:14Z" level=info msg="50500 series loaded." source="storage.go:378"
2017-01-27 18:50:13	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:13Z" level=info msg="Loading series map and head chunks..." source="storage.go:373"
2017-01-27 18:50:12	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:12Z" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248"
2017-01-27 18:50:12	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:12Z" level=info msg="Build context (go=go1.7.4, user=root@a04ed5b536e3, date=20170123-13:56:24)" source="main.go:76"
2017-01-27 18:50:12	prometheus-3701298219-knglc	prometheus	time="2017-01-27T18:50:12Z" level=info msg="Starting prometheus (version=1.5.0, branch=master, revision=d840f2c400629a846b210cf58d65b9fbae0f1d5c)" source="main.go:75"
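
A note on the config above: the kubernetes-apiservers job keeps only endpoints whose namespace, service name, and port name concatenate to default;kubernetes;https, so its targets can only exist while the default/kubernetes Endpoints object is visible to the endpoints watch. A quick sketch for checking that object from outside Prometheus, assuming kubectl access to the same cluster:

# The apiservers job keeps only endpoints matching default;kubernetes;https.
# Confirm the Endpoints object exists, lists the master addresses,
# and that its port is actually named "https":
kubectl get endpoints kubernetes -n default -o yaml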
brancz (Member) commented Jan 28, 2017

@alexsomesan do you think this could have anything to do with your changes? At first sight I don't see how, but maybe we can look into it. It's odd that all other targets are discovered though.

alexsomesan (Contributor) commented Jan 28, 2017

That could be a possibility. I will try to reproduce it.

alexsomesan (Contributor) commented Jan 31, 2017

@dannyk81 I tried reproducing this on a 1.5.2 cluster using the exact Prometheus configuration that you provided here, and I got the api servers discovered in both 1.4.1 and 1.5.0 without issues.

Here's a screenshot of my discovered API server with Prometheus 1.5.0 (I only have one master in this test cluster).

[Screenshot: the kubernetes-apiservers target discovered under Prometheus 1.5.0]

Is this what you would expect to see?

alexsomesan (Contributor) commented Jan 31, 2017

That was premature. After leaving the cluster to run for a short while, the API servers were no longer discovered.

[Screenshot: Targets page with the kubernetes-apiservers targets gone]

dannyk81 (Author) commented Jan 31, 2017

I don't quite remember how long it took after the upgrade before I checked the Targets page, but indeed it could have been a few minutes.

alexsomesan (Contributor) commented Jan 31, 2017

Yes, it seems to be intermittent. After a while my API servers got discovered again.

I'll keep looking into it.

dannyk81 (Author) commented Jan 31, 2017

Interesting... I guess I didn't wait long enough for them to reappear.

alexsomesan (Contributor) commented Feb 1, 2017

It looks like this could be the same issue reported in #2377.

alexsomesan (Contributor) commented Feb 7, 2017

@dannyk81 1.5.1 was released containing a patch that might just be the fix for this issue. Can you try it out?
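
For anyone following along, a minimal sketch of rolling the upgrade out in-cluster, assuming the Deployment and its container are both named prometheus (adjust to your own manifests):

# Hypothetical names: deployment "prometheus" with a container "prometheus".
kubectl set image deployment/prometheus prometheus=prom/prometheus:v1.5.1
# Wait for the new pod to come up before re-checking the Targets page.
kubectl rollout status deployment/prometheus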

dannyk81 (Author) commented Feb 7, 2017

Of course, will test and update! 👍

dannyk81 (Author) commented Feb 7, 2017

@alexsomesan Confirmed, v1.5.1 resolves this issue.

Thanks!

brancz closed this Feb 7, 2017

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
