Prometheus 1.2.0 suddenly stops scraping targets #2068
Comments
Anything in the log files?
Can you grab a goroutine dump when this happens?
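For readers unsure how to capture such a dump: a minimal sketch, assuming the server listens on `localhost:9090` and exposes Go's standard `net/http/pprof` handlers under `/debug/pprof`. The host, port, and function names are illustrative assumptions, not something stated in this thread.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen


def goroutine_dump_url(host="localhost:9090"):
    # Assumption: Go's pprof handlers are reachable under /debug/pprof.
    # debug=2 requests full, human-readable stack traces for all goroutines.
    return urlunsplit(("http", host, "/debug/pprof/goroutine", "debug=2", ""))


def fetch_goroutine_dump(host="localhost:9090", timeout=10):
    # Performs the HTTP request; run this while the instance is stuck,
    # since a dump taken after a restart no longer shows the blocked state.
    with urlopen(goroutine_dump_url(host), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Saving the returned text to a file and attaching it (for example as a gist) keeps the issue thread readable.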
Hi, the Prometheus instance just stopped scraping all targets again. Here are the requested resources: full log: https://gist.github.com/svenmueller/f96bece4f7852e6d5e20be87d858b552 --Sven
juliusv added the kind/bug and kind/regression labels Oct 9, 2016
First of all, I see multiple problems in your log file:
Most relevant, however, it looks like your storage is sometimes throttled: https://gist.github.com/svenmueller/f96bece4f7852e6d5e20be87d858b552#file-gistfile1-txt-L11
This means that targets will not get their samples stored anymore (at least intermittently), as the storage applies backpressure. If it's a new target, that will also cause it to stay in
Looking at this part of the goroutine dump, the fact that it just says
https://gist.github.com/svenmueller/223e903ac7703354364d9cef824f6541#file-gistfile1-txt-L293-L297
You can also see that scrapers for your targets are in principle running: https://gist.github.com/svenmueller/223e903ac7703354364d9cef824f6541#file-gistfile1-txt-L407-L429
But because the storage cannot keep up (probably disk IO problems?), it would hit this branch and not count the targets as scraped at all: https://github.com/prometheus/prometheus/blob/master/retrieval/scrape.go#L430
Are you seeing the
I'm not sure if 1.2.0 changed relevant storage stuff, or whether it just so happens that something caused your storage to get overloaded around the same time...
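The backpressure behaviour described above can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyStorage`, `scrape_once`, and the one-chunk-per-sample accounting are hypothetical and do not reproduce the actual Prometheus code.

```python
class ToyStorage:
    """Hypothetical stand-in for a storage layer that applies backpressure."""

    def __init__(self, max_chunks_to_persist):
        self.max_chunks_to_persist = max_chunks_to_persist
        self.chunks_to_persist = 0

    def needs_throttling(self):
        # Throttle once the backlog waiting for disk exceeds the limit.
        return self.chunks_to_persist > self.max_chunks_to_persist

    def append(self, samples):
        # Toy accounting: every sample adds one chunk to the backlog.
        self.chunks_to_persist += len(samples)

    def persist(self, n):
        # Simulates the disk catching up on n chunks.
        self.chunks_to_persist = max(0, self.chunks_to_persist - n)


def scrape_once(storage, samples):
    """Return True if the samples were stored, False if the scrape was skipped."""
    if storage.needs_throttling():
        # Skipped scrapes are not recorded, so the target never looks scraped.
        return False
    storage.append(samples)
    return True
```

In this model a target keeps being skipped for as long as the backlog stays above the limit, and scraping resumes by itself once `persist` has drained enough of it.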
marcbradshaw commented Oct 10, 2016
I am seeing similar issues. I see a bunch of these errors in the log:
@4000000057fb08421a02beac time="2016-10-09T23:17:12-04:00" level=error msg="Error refreshing service list: Unexpected response code: 500 (rpc error: rpc error: fail
followed by:
@4000000057fb087b3a033fec time="2016-10-09T23:18:09-04:00" level=warning msg="Storage has entered rushed mode." chunksToPersist=402607 maxChunksToPersist=524288 max
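To put the numbers in the warning into perspective, here is a small sketch that parses the `key=value` fields of that log line and computes how full the persist queue was. The line is copied from the comment above, minus the truncated tail; the `parse_fields` helper is an illustrative assumption.

```python
import re

LINE = ('time="2016-10-09T23:18:09-04:00" level=warning '
        'msg="Storage has entered rushed mode." '
        'chunksToPersist=402607 maxChunksToPersist=524288')


def parse_fields(line):
    # Matches key=value pairs whose value is either double-quoted or a bare token.
    pairs = re.findall(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)', line)
    return {key: value.strip('"') for key, value in pairs}


fields = parse_fields(LINE)
ratio = int(fields["chunksToPersist"]) / int(fields["maxChunksToPersist"])
print(f"{ratio:.1%}")  # prints 76.8%
```

So the persist queue was roughly three quarters full when this line was logged. Which other inputs contribute to entering rushed mode cannot be read from the truncated line.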
Hi, after switching back to version 1.1.3, Prometheus keeps scraping the targets properly again.
I think I found the issue. PR imminent.
beorn7 self-assigned this Oct 10, 2016
beorn7 added the priority/P0 label Oct 10, 2016
beorn7 referenced this issue Oct 10, 2016 (Merged): Re-add counting of evict chunk ops and decrementing NumMemChunks #2071
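The title of #2071 hints at a counter that was no longer decremented when chunks were evicted. The toy simulation below shows why that class of bug can stall ingestion for good. It is a guess at the failure mode, not the actual Prometheus code: `ChunkAccounting`, `simulate`, and the assumption that the counter feeds the throttling decision are all hypothetical.

```python
class ChunkAccounting:
    """Hypothetical bookkeeping of chunks held in memory."""

    def __init__(self, max_mem_chunks, decrement_on_evict):
        self.max_mem_chunks = max_mem_chunks
        self.decrement_on_evict = decrement_on_evict
        self.num_mem_chunks = 0    # counter consulted for throttling (assumption)
        self.actual_in_memory = 0  # ground truth, for comparison only

    def create_chunk(self):
        self.num_mem_chunks += 1
        self.actual_in_memory += 1

    def evict_chunk(self):
        self.actual_in_memory -= 1
        if self.decrement_on_evict:
            self.num_mem_chunks -= 1  # the step a buggy version would skip

    def throttled(self):
        return self.num_mem_chunks > self.max_mem_chunks


def simulate(decrement_on_evict, rounds=100):
    # Each round creates one chunk and evicts it again, so real usage stays at 0.
    acct = ChunkAccounting(max_mem_chunks=10, decrement_on_evict=decrement_on_evict)
    for _ in range(rounds):
        acct.create_chunk()
        acct.evict_chunk()
    return acct
```

With the decrement in place the counter tracks real memory usage. Without it, the counter only grows, so the model ends up throttled even though nothing is held in memory. That would match the reports in this thread that a restart helps only for a while.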
klausenbusk commented Oct 10, 2016
I think I have the same issue.
Metrics from Prometheus: http://sprunge.us/ZVLI
Prometheus is started as:
Please let me know if you need more info.
klausenbusk commented Oct 10, 2016
Goroutine dump: http://sprunge.us/YGbY
I'm working on releasing 1.2.1.
The release is out: https://github.com/prometheus/prometheus/releases/tag/v1.2.1 Binaries are being built as I speak…
beorn7 closed this Oct 10, 2016
commarla commented Oct 10, 2016
Thanks a lot, I encountered the same issue twice this weekend.
shamil referenced this issue Oct 13, 2016 (Closed): Multiple K8s SD instances causing scrape issues. #2020
raypettersen commented Oct 13, 2016
Same here. Our dev-prometheus stopped scraping twice in a short time. Thanks for the fix!
metral commented Oct 29, 2016
I'm still seeing this issue on v1.2.1. I'm using quay.io/prometheus/prometheus:v1.2.1 on k8s v1.4.0.
@metral That must be a different issue then. Could you file a new issue using the template and provide your diagnostics so that we have a chance to find out what's going on?
sirhopcount commented Nov 18, 2016
We also see this issue on version 1.3.0. Prometheus stops scraping several (but not all) endpoints. The endpoints that are no longer being scraped all stop at the same time. We verified that Prometheus can reach the endpoints. We couldn't find any relevant errors in the logs. Version info:
What service discovery are you using? Any configuration reloads in between?
strzelecki-maciek commented Nov 28, 2016
Sorry, my bad! Removed!
@strzelecki-maciek You are describing a different issue. Posting it as a follow-up to a different and long-fixed issue will not attract any attention. Please file a fresh issue using the template to make sure you are providing the information we need to investigate. Thank you.
gvenka008c commented Jun 8, 2017
Has anyone here seen the error below? We are seeing a similar issue where Prometheus stops scraping after a certain period of time (say, 24 hrs).
brian-brazil added kind/bug and removed kind/bug labels Jul 14, 2017
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
svenmueller commented Oct 9, 2016
What did you do?
Since upgrading from version 1.1.3 to version 1.2.0, Prometheus stops scraping all targets (node, prometheus) after some time. After restarting prometheus, it works properly for some time, but then it stops scraping all targets again.
What did you expect to see?
All targets should be in state "UP" and the last scrape time should be less than 5 seconds (for node targets).
What did you see instead? Under which circumstances?
Targets show "UNKNOWN" instead of "UP".
Environment
Linux 3.13.0-95-generic x86_64
prometheus, version 1.2.0 (branch: master, revision: 522c933)
build user: root@c8088ddaf2a8
build date: 20161007-12:53:55
go version: go1.6.3