One scrape target is still marked as UP even though it's not been scraped for over a week #1776
Comments
brian-brazil added the kind/bug label Jun 30, 2016
Can you confirm the contents of the file SD configs are identical with respect to that host?
We've also had a few fixes to code in this area recently. Could you see if 0.20.0 fixes this?
@brian-brazil Thanks for the prompt response.
Yep, they are generated by the same script from the same Chef data. To be sure, I copied them both to one host:
Yeah, I'll wait a week or so and see if we get another case before I upgrade, as that might make the result more conclusive!
fabxc modified the milestone: v1.0.0 Jul 3, 2016
@banks Any news here? Do you still get this with 1.0.1?
@juliusv Thanks for following up. I was intentionally waiting a little while to be more confident that an improvement in a newer version isn't just chance. In other words, I've not upgraded yet. That said, I still do see live examples of this happening:
So I will upgrade as soon as I get a chance and will be somewhat confident that if it doesn't recur within a few weeks it is "solved".
@banks Thanks!
FYI, I just upgraded. Everything good so far; I'll report back in a week and again in 2 weeks. If no more stalls are found, let's call this fixed.
@banks Did the problem reappear?
Thanks for the reminder. I even set myself a reminder to report back a week ago and then got sidetracked before I did. But I've not observed any more cases of this in 2 weeks, so I'm going to assume it's fixed. Thanks for the help. Great job fixing bugs before they are even reported :)
banks closed this Aug 8, 2016
lock bot commented Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.


banks commented Jun 30, 2016
Note restarting the host "fixed" this issue, but I thought I'd report it in case it's useful for debugging as there is clearly a bug somewhere to get into this state.
What did you do?
We have 2 Prometheus servers with the same scrape config, scraping several hundred nodes, each with a Telegraf instance exposing metrics.
What did you expect to see?
The same number of "UP" scrape targets on each prometheus server.
What did you see instead? Under which circumstances?
One server had one fewer target for a week.
After diffing the `up` query responses, I found the target in question and then looked at its status on the `/status` page: it had not been scraped for over a week but is still "UP".
Indeed, if I look at any of its metrics on a graph, they just stopped being collected 10 days ago.
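For anyone wanting to repeat the comparison, here is a minimal sketch of diffing the `up` results from two servers. It assumes both servers expose the `/api/v1/query` HTTP endpoint; the function names and the hostnames in the usage note are hypothetical and not taken from this report.

```python
import json
import urllib.parse
import urllib.request


def fetch_up(base_url):
    """Run the instant query `up` against one Prometheus server (base_url is an assumption)."""
    query = urllib.parse.urlencode({"query": "up"})
    url = base_url.rstrip("/") + "/api/v1/query?" + query
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def up_targets(payload):
    """Return the set of (job, instance) pairs whose `up` sample equals 1."""
    targets = set()
    for series in payload["data"]["result"]:
        labels = series["metric"]
        _timestamp, value = series["value"]  # the sample value arrives as a string
        if float(value) == 1.0:
            targets.add((labels.get("job", ""), labels.get("instance", "")))
    return targets


def diff_targets(primary, secondary):
    """List targets that only one of the two servers reports as up."""
    a, b = up_targets(primary), up_targets(secondary)
    return {"only_primary": sorted(a - b), "only_secondary": sorted(b - a)}
```

Usage would look something like `diff_targets(fetch_up("http://prom-a:9090"), fetch_up("http://prom-b:9090"))`. A target that one server has silently stopped scraping should then appear under the other server's key.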

Note that the target in question is UP and has been correctly scraped by the secondary Prometheus host the whole time.
I can also curl the scrape endpoint from the Prometheus host just fine:
So the Prometheus instance seems to just be stuck on this scrape and not trying at all. To confirm, restarting the Prometheus process in question "fixed" it.
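The "UP but not scraped recently" state could be flagged automatically with something like the sketch below. It assumes target data shaped like the `activeTargets` entries of the `/api/v1/targets` endpoint in later Prometheus releases (fields `health`, `lastScrape`, `scrapeUrl`). That endpoint is not something the version in this report is known to provide, and the 10-minute threshold and function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone


def _six_digit_fraction(match):
    # Truncate or pad fractional seconds to exactly microsecond precision.
    return "." + match.group(1)[:6].ljust(6, "0")


def parse_rfc3339(text):
    """Parse an RFC 3339 timestamp, tolerating nanoseconds and a trailing Z."""
    text = re.sub(r"\.(\d+)", _six_digit_fraction, text, count=1)
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def stuck_targets(active_targets, now, max_age=timedelta(minutes=10)):
    """Return (scrapeUrl, age) for targets marked up whose last scrape is older than max_age.

    `now` must be timezone-aware; `max_age` is an assumed threshold.
    """
    stuck = []
    for target in active_targets:
        age = now - parse_rfc3339(target["lastScrape"])
        if target["health"] == "up" and age > max_age:
            stuck.append((target["scrapeUrl"], age))
    return stuck
```

Calling `stuck_targets(targets, datetime.now(timezone.utc))` on a periodic basis would surface a target like the one described here without needing a second server to compare against.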
Environment
System information:
Linux 3.13.0-74-generic x86_64
Prometheus version:
files in /opt/prometheus/conf.d just list hosts updated from a cron script that queries the Chef server.
I didn't see anything obviously relevant; there were a bunch of messages