
No more metrics when remote storage queue full #2486

Closed
goettl79 opened this Issue Mar 9, 2017 · 5 comments

goettl79 commented Mar 9, 2017

What did you do?

When the remote_write endpoint runs into problems, Prometheus no longer delivers any query results.

What did you expect to see?

I'd have expected Prometheus to behave as usual even if remote_write fails, i.e. to still see a normal Prometheus Grafana dashboard.

What did you see instead? Under which circumstances?

No data could be queried from Prometheus while the remote storage queue was full, and the scrape duration went haywire (see the screenshot below).

[Screenshot: prom_foo_scrape - scrape duration graph]

Environment

Official Docker container for Prometheus v1.5.2, running on Docker on CoreOS

  • System information:

    Linux 4.7.3-coreos-r3 x86_64

  • Prometheus version:

prometheus, version 1.5.2 (branch: master, revision: bd1182d29f462c39544f94cc822830e1c64cf55b)
  build user:       root@1a01c5f68840
  build date:       20170210-16:23:28
  go version:       go1.7.5
  • Config:
remote_write:
  url: "http://XXXXXXX/prometheus"
  • Logs:
Mar 09 08:38:00 grzosprom01 docker[32481]: time="2017-03-09T08:32:06+01:00" level=warning msg="Remote storage queue full, discarding sample." source="queue_manager.go:169"
Mar 09 08:38:00 grzosprom01 docker[32481]: time="2017-03-09T08:32:06+01:00" level=warning msg="Remote storage queue full, discarding sample." source="queue_manager.go:169"
Mar 09 08:38:00 grzosprom01 docker[32481]: time="2017-03-09T08:32:06+01:00" level=warning msg="Remote storage queue full, discarding sample." source="queue_manager.go:169"
Mar 09 08:43:10 grzosprom01 docker[32481]: time="2017-03-09T08:43:10+01:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612"

goettl79 changed the title from "No more metrics collected wehen Remote storage queue full" to "No more metrics when remote storage queue full" on Mar 9, 2017

tomwilkie (Member) commented Mar 9, 2017

Hi @goettl79 - a few questions:

  • Are those dashboards from the Prometheus doing the scraping (as opposed to the system you're sending the samples to)?
  • Can we have the full logs of the Prometheus process please?
  • What is your scrape interval? What does the rest of your config look like?

The remote storage client is very decoupled from Prometheus' own storage - it doesn't ever give feedback to the scrape loop - see https://github.com/prometheus/prometheus/blob/master/storage/remote/remote.go#L83. I'd be surprised if this is causing Prometheus to stop scraping. I suspect the system is getting overwhelmed by something else, and one of the symptoms is that it can't flush samples to remote storage quickly enough.
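
For illustration only, here is a minimal Go sketch of that drop-on-full pattern (the queueManager, Append and sample names are made up for this example, not taken from the Prometheus source): samples are pushed into a bounded channel with a non-blocking select, so a slow or unreachable remote endpoint can only cause dropped samples, never a stalled scrape path.

package main

import "log"

// sample is a stand-in for a scraped metric sample.
type sample struct {
    name  string
    value float64
    ts    int64
}

// queueManager holds a bounded queue of samples waiting to be sent remotely.
type queueManager struct {
    queue   chan sample
    dropped int
}

func newQueueManager(capacity int) *queueManager {
    return &queueManager{queue: make(chan sample, capacity)}
}

// Append enqueues a sample without ever blocking the caller. If the queue is
// full, the sample is discarded and a counter is bumped, mirroring the
// "Remote storage queue full, discarding sample." warning seen in the logs.
func (q *queueManager) Append(s sample) {
    select {
    case q.queue <- s:
    default:
        q.dropped++
        log.Println("remote storage queue full, discarding sample")
    }
}

func main() {
    qm := newQueueManager(2)
    for i := 0; i < 5; i++ {
        qm.Append(sample{name: "up", value: 1, ts: int64(i)})
    }
    log.Printf("queued=%d dropped=%d", len(qm.queue), qm.dropped)
}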

juliusv (Member) commented Mar 9, 2017

Indeed, I verified locally that just having a full storage queue does not cause ingestion to get throttled by itself. Is the machine generally overloaded (or even swapping) perhaps?
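
As a rough way to reproduce that kind of local check (a standalone sketch, not Prometheus' actual test code): fill a bounded channel to capacity, then time a large number of non-blocking sends against it; every send falls through to the default branch immediately instead of blocking the caller.

package main

import (
    "fmt"
    "time"
)

func main() {
    queue := make(chan float64, 1000)
    for i := 0; i < cap(queue); i++ {
        queue <- float64(i) // fill the queue to capacity
    }

    start := time.Now()
    dropped := 0
    for i := 0; i < 1000000; i++ {
        select {
        case queue <- float64(i):
        default:
            dropped++ // the queue is full, so every send falls through here
        }
    }
    fmt.Printf("dropped %d samples in %v - the sender was never blocked\n", dropped, time.Since(start))
}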

goettl79 (Author) commented Mar 13, 2017

Regrettably, we have deleted the Prometheus logs by now. The machine didn't have a high load, and we generally do not use a swapfile, so it's either working or the OOM killer.

However, the remote end wasn't fully down: TCP sockets could be opened, as far as I remember, but no data could be sent (Nginx reverse proxy).

brian-brazil (Member) commented Mar 27, 2017

If you have further debugging information, please reopen.

