Too many warnings logged about remote storage queue being full #2177

Closed
jml opened this Issue Nov 8, 2016 · 2 comments

jml commented Nov 8, 2016

What did you do?

Ran a Prometheus instance configured to use remote storage on a system with a broken DNS configuration.

I don't have the exact command-line on me. I can construct a likely equivalent if necessary.

What did you expect to see?

Error messages in the logs informing me that the remote host couldn't be discovered due to DNS lookup failures.

What did you see instead? Under which circumstances?

$ cat logs | cut -f 1 -d ' ' --complement | sort | uniq -c | sort -n | grep -v -e '^\s*1\b'
      5 level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:549"
    198 level=warning msg="error sending 100 samples to remote storage: Post http://hostname:80/api/prom/push: dial tcp: lookup hostname on 10.96.0.10:53: dial udp 10.96.0.10:53: i/o timeout" source="queue_manager.go:246"
    369 level=warning msg="error sending 100 samples to remote storage: context deadline exceeded" source="queue_manager.go:246"
6660847 level=warning msg="Remote storage queue full, discarding sample." source="queue_manager.go:169"

That is, there were more than 30,000 times as many "Remote storage queue full, discarding sample." messages as there were "dial tcp: lookup hostname on 10.96.0.10:53: dial udp 10.96.0.10:53: i/o timeout" failures.

This led to the problem being misreported to me ("it says the queue is full") and made it harder than necessary to diagnose the root cause: DNS was broken.
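
For anyone wanting to reproduce this kind of summary, here is a rough sketch of the pipeline used above, written with portable options only (`--complement`, `\s` and `\b` are GNU extensions). It assumes each log line starts with a single space-delimited timestamp field. The function name `summarize_logs` and the sample lines are made up for illustration and are not taken from the original logs.

```shell
# Summarize repeated log messages, ignoring the leading timestamp field.
# Reads log lines on stdin; prints "count message" for every message
# that occurs more than once, least frequent first.
summarize_logs() {
  cut -d ' ' -f 2- |   # drop the first space-delimited field (assumed timestamp)
    sort |             # group identical messages together
    uniq -c |          # prefix each distinct message with its count
    sort -n |          # order by count, ascending
    awk '$1 > 1'       # hide messages that were seen only once
}

# Hypothetical sample input, for illustration only.
printf '%s\n' \
  'time="t1" level=warning msg="queue full"' \
  'time="t2" level=warning msg="queue full"' \
  'time="t3" level=warning msg="dns timeout"' \
  'time="t4" level=warning msg="queue full"' |
  summarize_logs
```

With the sample input, only the "queue full" message is printed, because the single "dns timeout" line is filtered out. That is the same effect that made the rare DNS error so easy to miss next to millions of queue-full warnings.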

Environment

Ubuntu 16.04 LTS images running on EC2. Prometheus itself was running as a container in a Kubernetes cluster.

  • System information:

Alas, the system has since been wiped.

  • Prometheus version:

Alas, I didn't capture exact output, but it was reported to me as 1.3.1.

  • Prometheus configuration file:

Can provide equivalent on request.

  • Logs:

All I have left is the summary above.

tomwilkie commented Feb 23, 2017

I'm just putting together a PR for this.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
