Consul log spam in 2.0rc2 #3353

Closed
zemek opened this Issue Oct 25, 2017 · 8 comments

zemek commented Oct 25, 2017

The timeout added in https://github.com/prometheus/prometheus/pull/3303/files is the same value as the watch timeout: https://github.com/cstyan/prometheus/blob/ceb01dcc427864d79d6a8333a024246617d4558d/discovery/consul/consul.go#L37

That means that if there are no changes in the Consul environment, the request takes as long as the watch timeout to return. Since the client timeout and the watch timeout are the same value, you get a bunch of (Client.Timeout exceeded while awaiting headers) errors.
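
If it helps to see the race in isolation, here is a minimal standalone Go sketch (not the actual Prometheus or Consul code) with a fake endpoint that holds the response for the full wait period, the way an unchanged Consul blocking query does. With the client timeout set to the same value as the wait, the request gets cancelled while still awaiting headers:

package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"
    "time"
)

func main() {
    // Stand-in for the Consul watch timeout (the "wait" query parameter).
    wait := 2 * time.Second

    // Fake endpoint: an unchanged catalog blocks for the full wait period
    // before answering, like a real Consul blocking query.
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        time.Sleep(wait)
        fmt.Fprintln(w, "{}")
    }))
    defer srv.Close()

    // Client timeout equal to the wait: the client deadline also covers
    // connection setup, so it fires just before the server responds.
    client := &http.Client{Timeout: wait}
    _, err := client.Get(srv.URL + "/v1/catalog/services?wait=2000ms")
    fmt.Println(err) // net/http: request canceled (Client.Timeout exceeded while awaiting headers)
}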

fabxc commented Oct 26, 2017

@cstyan any suggestions? Can we just catch and mute that known error or does picking a longer timeout help?

cstyan commented Oct 26, 2017

@fabxc I don't entirely understand why having the same value for both timeouts is causing an issue, but I don't see any problem with increasing the Consul client's timeout. Let me know if there's any investigation you want me to do here.

fabxc commented Oct 26, 2017

I suppose the request timeout is happening shortly before the Consul watch timeout, which makes the Consul client in Prometheus think there was a legitimate error.

I suppose making the client timeout notably longer could help, but I'm not sure.
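
For what it's worth, a rough sketch of that idea using github.com/hashicorp/consul/api directly (this is not the Prometheus SD code, and the 15s margin is just a number picked for illustration): give the HTTP client a deadline comfortably larger than the blocking-query wait, so an idle watch returns normally instead of being cancelled while awaiting headers.

package main

import (
    "log"
    "net/http"
    "time"

    consul "github.com/hashicorp/consul/api"
)

const watchTimeout = 30 * time.Second

func main() {
    conf := consul.DefaultConfig()
    conf.Address = "127.0.0.1:8500"
    // Server-side blocking-query wait.
    conf.WaitTime = watchTimeout
    // Client-side deadline strictly larger than the wait (15s margin is an
    // arbitrary choice for this sketch).
    conf.HttpClient = &http.Client{Timeout: watchTimeout + 15*time.Second}

    client, err := consul.NewClient(conf)
    if err != nil {
        log.Fatal(err)
    }
    catalog := client.Catalog()

    // First call returns immediately and gives us the current index.
    _, meta, err := catalog.Services(nil)
    if err != nil {
        log.Fatal(err)
    }

    // Second call blocks for up to watchTimeout if nothing changes; with the
    // larger client timeout it returns cleanly instead of erroring out.
    services, _, err := catalog.Services(&consul.QueryOptions{WaitIndex: meta.LastIndex})
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("%d services after watch", len(services))
}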

cstyan commented Oct 26, 2017

Hmm okay, that makes sense.

@zemek if you can provide an example config or setup where you saw this happening, I can play with the timeouts to get a better sense of what's going on and decide whether the Consul client timeout should change.

zemek commented Oct 26, 2017

@cstyan here is a minimal config:
Prometheus config.yml:

scrape_configs:
  - job_name: 'self'
    consul_sd_configs:
      - server: '127.0.0.1:8500'
        services: ['prometheus']

Consul service config:

{
  "service": {
    "name": "prometheus",
    "port": 9090
  }
}

You should start seeing the errors almost immediately, recurring every 30s:

level=info ts=2017-10-26T21:10:33.737933501Z caller=main.go:371 msg="Server is ready to receive requests."
level=error ts=2017-10-26T21:11:03.741840614Z caller=consul.go:186 component="target manager" discovery=consul msg="Error refreshing service list" err="Get http://127.0.0.1:8500/v1/catalog/services?index=472699&wait=30000ms: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
level=error ts=2017-10-26T21:11:03.743824816Z caller=consul.go:283 component="target manager" discovery=consul msg="Error refreshing service" service=prometheus err="Get http://127.0.0.1:8500/v1/catalog/service/prometheus?index=472699&wait=30000ms: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
level=error ts=2017-10-26T21:11:48.74239267Z caller=consul.go:186 component="target manager" discovery=consul msg="Error refreshing service list" err="Get http://127.0.0.1:8500/v1/catalog/services?index=472699&wait=30000ms: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"

krushmike commented Oct 27, 2017

Thought I would chime in as well. Seeing precisely the same issue...

Rolling back to prometheus-1.8.1.windows-amd64, no issues.

grobie commented Nov 4, 2017

Fixed in #3368.
