operation was canceled during DNS lookup with remote_write #3510

Closed
errordeveloper opened this Issue Nov 24, 2017 · 5 comments


errordeveloper commented Nov 24, 2017

What did you do?

With Prometheus v2.0.0 on Kubernetes, I set global: { scrape_interval: 5s } and enabled remote_write and remote_read, reusing the scrape_configs previously used with v1.7.1.

What did you expect to see?

What did you see instead? Under which circumstances?

Environment

  • System information: GKE

  • Prometheus version: prom/prometheus:v2.0.0

  • Prometheus configuration file:

        global:
          scrape_interval: 5s
        remote_write:
          - url: 'https://cloud.weave.works/api/prom/push'
            basic_auth:
              password: <secret>
        remote_read:
          - url: 'https://cloud.weave.works/api/prom/read'
            basic_auth:
              password: <secret>
        scrape_configs: <omitted>
  • Logs:

I see the following line repeated very frequently.

level=warn ts=2017-11-24T12:41:28.49545111Z caller=queue_manager.go:485 component=remote msg="Error sending samples to remote storage" count=100 err="Post https://cloud.weave.works/api/prom/push: dial tcp: lookup cloud.weave.works on 10.59.240.10:53: dial udp 10.59.240.10:53: operation was canceled"

Contributor

bboreham commented Nov 24, 2017

I see that Prometheus 2.0 sets MaxIdleConns. However, Go behaves rather unexpectedly when you open several sockets to the same address, as may well happen with scraping: it runs around closing sockets, and I believe this behaviour also results in the "operation was canceled" message.

So I would recommend adding a line that sets MaxIdleConnsPerHost to something higher, maybe 100?
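
To illustrate the suggestion above, here is a minimal sketch that uses only the standard net/http package. It is not the Prometheus source, nor the patch referenced later in this thread: the helper name newRemoteTransport, the overall MaxIdleConns value, and the timeouts are assumptions chosen for the example. Only the per-host value of 100 comes from the comment above.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newRemoteTransport is a hypothetical helper (not Prometheus code).
// It builds a transport that keeps up to perHost idle connections open
// for each destination host. If MaxIdleConnsPerHost is left at zero,
// net/http falls back to http.DefaultMaxIdleConnsPerHost.
func newRemoteTransport(perHost int) *http.Transport {
	return &http.Transport{
		MaxIdleConns:        1000,            // overall idle limit; illustrative value
		MaxIdleConnsPerHost: perHost,         // the setting recommended above
		IdleConnTimeout:     5 * time.Minute, // assumed value for the sketch
	}
}

func main() {
	tr := newRemoteTransport(100)
	client := &http.Client{Transport: tr, Timeout: 30 * time.Second}

	fmt.Println("default per-host idle limit:", http.DefaultMaxIdleConnsPerHost)
	fmt.Println("configured per-host idle limit:", tr.MaxIdleConnsPerHost)
	fmt.Println("client timeout:", client.Timeout)
}
```

With a low per-host limit, connections beyond that limit are closed as soon as they go idle, so a client that sends many concurrent requests to one endpoint keeps dialing (and resolving) again. Raising the limit lets those connections be reused instead.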

Author

errordeveloper commented Nov 24, 2017

@bboreham as per our discussion offline, I've made this change and tested it, and I can confirm that it solves the issue (see errordeveloper@1f74477). I'm happy to open a PR and we can take it from there.

Contributor

bboreham commented Apr 3, 2018

FWIW I now believe the underlying issue is golang/go#22724, fixed in Go 1.10.1, but it was valid to make the problem go away by raising the number of open connections.

Member

brian-brazil commented Apr 3, 2018

We'll pick that up in the next release then, as that'll be with that version of Go.


lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019
