
Prometheus 2.0: spurious errors in consul service discovery #3413

Closed
thesamet opened this Issue Nov 4, 2017 · 10 comments

7 participants
@thesamet

thesamet commented Nov 4, 2017

What did you do?
Start prometheus

What did you expect to see?
No errors in the logs

What did you see instead? Under which circumstances?
From time to time, Prometheus prints a log message indicating that it failed to fetch service information from Consul. The impact appears to be small, since it does connect successfully at startup.

Environment
Prometheus 2.0.0-rc2 on Linux. Service discovery using consul.

  • System information:

      Linux 4.4.0-57-generic x86_64
    
  • Prometheus version:

  build user:       root@a6d2e4a7b8da
  build date:       20171025-18:42:54
  go version:       go1.9.1
  • Alertmanager version:
    Not installed.

  • Prometheus configuration file:

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'


scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: consul
    consul_sd_configs:
    - server: consul:8500
      services:
      - myservice
    relabel_configs:
    - source_labels: ["__meta_consul_address", "__meta_consul_service_port"]
      separator: ":"
      target_label: __address__
      regex: "(.*):9(.*)"
      replacement: "$1:19$2"
    - source_labels: ["__meta_consul_service"]
      target_label: job
    - source_labels: ["__meta_consul_tags"]
      regex: ".*(prod|stage).*"
      target_label: env
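
The port-rewriting relabel rule above can be illustrated with a small standalone sketch (the address and port values below are hypothetical, not taken from the issue): the two source labels are joined with the `:` separator, and a fully anchored regex then rewrites service ports of the form 9xxx to 19xxx.

```python
import re

# Hypothetical label values for illustration only.
meta = {
    "__meta_consul_address": "10.0.0.5",
    "__meta_consul_service_port": "9123",
}

# Join the source labels with the configured ':' separator.
joined = ":".join([meta["__meta_consul_address"],
                   meta["__meta_consul_service_port"]])

# Prometheus fully anchors relabel regexes, so "(.*):9(.*)" with
# replacement "$1:19$2" behaves like this anchored substitution.
address = re.sub(r"^(.*):9(.*)$", r"\g<1>:19\g<2>", joined)
print(address)  # → 10.0.0.5:19123
```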
  • Logs:
level=error ts=2017-11-04T19:20:21.399644298Z caller=consul.go:283 component="target manager" discovery=consul msg="Error refreshing service" service=trends err="Get http://consul:8500/v1/catalog/service/trends?index=9598981&wait=30000ms: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
level=error ts=2017-11-04T19:20:21.399854827Z caller=consul.go:186 component="target manager" discovery=consul msg="Error refreshing service list" err="Get http://consul:8500/v1/catalog/services?index=9598981&wait=30000ms: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
@grobie

Member

grobie commented Nov 4, 2017

Duplicate of #3353.

@grobie grobie closed this Nov 4, 2017

@grobie grobie reopened this Nov 4, 2017

@grobie

Member

grobie commented Nov 4, 2017

Soo, this should have been fixed with 2.0.0-rc.2 actually. Can you confirm that Consul was available for the full time?

@thesamet

Author

thesamet commented Nov 4, 2017

Yes, Consul has been running the entire time. I can't reproduce the problem by curling the URL it prints, so I assume the issue isn't on Consul's side.

How often does it try to connect? Seems like more than once per minute. Is there a way to control this?

@zemek

Contributor

zemek commented Nov 7, 2017

This was fixed in rc.3, not rc.2, as far as I know.

@grobie

Member

grobie commented Nov 7, 2017

Thanks @zemek! That's correct. Please report back if it's not fixed in rc.3 (or the 2.0 release, which will come very soon).

@grobie grobie closed this Nov 7, 2017

@tangyong


tangyong commented Feb 3, 2018

The issue happened again. The following error occurred while using Prometheus to fetch services registered in Consul. I have confirmed that Consul has been running the entire time, and I can obtain the services by curling the URL.

level=error ts=2018-02-03T10:45:31.677093539Z caller=consul.go:283 component="target manager" discovery=consul msg="Error refreshing service" service=promether-exporter err="Get http://10.27.136.227:9996/v1/catalog/service/promether-exporter?index=116543&wait=30000ms: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"

Because of this error, the Grafana dashboard did not display any data.

My prometheus version is:

Version 2.0.0
Revision 0a74f98
Branch HEAD
BuildUser root@615b82cb36b6
BuildDate 20171108-07:11:59

@tangyong


tangyong commented Feb 4, 2018

@grobie could you please look at the problem again? Thanks!

@nhuray


nhuray commented Apr 1, 2018

@grobie I'm hitting the same issue with Prometheus 2.2.0:

level=error ts=2018-04-01T13:18:20.417694994Z caller=consul.go:327 component="discovery manager scrape" discovery=consul msg="Error refreshing service" service=log-agent err="Get http://consul:8500/v1/catalog/service/log-agent?index=9845271&wait=30000ms: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"

Consul is running, and when I query it with curl it works.

@vishksaj


vishksaj commented Sep 24, 2018

Facing the same issue in 2.2.1:
level=error ts=2018-09-24T15:16:31.126666572Z caller=consul.go:326 component="discovery manager scrape" discovery=consul msg="Error refreshing service" err="Get http://consul:8500/v1/catalog/service/######?index=55173847&wait=30000ms: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"

@hbagdi


hbagdi commented Sep 26, 2018

Facing the same issue as posted by @vishksaj with 2.2.1, with no apparent pattern.

@lock lock bot locked and limited conversation to collaborators Mar 25, 2019
