Throttled Ingestion sets up metric to 0 #2117
Comments
brian-brazil added this to the 2.x milestone Oct 25, 2016
This would arguably be a breaking change, so it goes with 2.0.
The 2.x milestone implies that there are cases where folks depend on a job's up metric transitioning to zero to notify them of performance problems with the Prometheus instance itself. Is that the case?
It's a semantic change, so it's breaking even if no one is depending on it. I would hope no one is monitoring Prometheus that way; meta-monitoring is the way to go here.
For completeness, I've done the following to attempt to work around this issue in my alerting:
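The rule text from this comment is not preserved in the thread. As a rough illustration only, a workaround of this kind in Prometheus 1.x rule syntax could suppress the instance-down alert while the server reports skipped scrapes; the alert name, the job label, and the assumption that prometheus_target_skipped_scrapes_total is available are hypothetical here, not taken from the original comment.

# Hypothetical workaround sketch (Prometheus 1.x rule syntax), not the
# reporter's actual rules. Assumes this server exports and self-scrapes
# prometheus_target_skipped_scrapes_total; alert and job names are placeholders.
# The "unless" clause drops all candidate alerts while scrapes are being skipped.
ALERT InstanceDown
  IF up{job="node"} == 0 unless on() (increase(prometheus_target_skipped_scrapes_total[5m]) > 0)
  FOR 10m
  LABELS { severity = "page" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} appears down",
    description = "up has been 0 for 10 minutes while no scrapes were being skipped."
  }

The obvious caveat with a rule of this shape is that it relies on the overloaded server still successfully scraping its own metrics.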
Just as a side note: If your Prometheus server regularly throttles ingestion, you have a big problem. Throttling ingestion is a last resort of the server to keep itself alive. It severely impedes your ability to monitor. It must be rare, and some alert should definitely wake somebody up if it ever happens.
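For illustration (not from this thread), a meta-monitoring alert along those lines might look like the following in 1.x rule syntax, again assuming the prometheus_target_skipped_scrapes_total metric and ideally evaluated on a second Prometheus server that watches this one.

# Hypothetical meta-monitoring sketch (Prometheus 1.x rule syntax).
# Assumes prometheus_target_skipped_scrapes_total is exported by the monitored
# server and scraped by a separate Prometheus, so the alert can still fire
# when the primary server is struggling.
ALERT PrometheusSkippingScrapes
  IF increase(prometheus_target_skipped_scrapes_total[5m]) > 0
  LABELS { severity = "page" }
  ANNOTATIONS {
    summary = "Prometheus {{ $labels.instance }} is skipping scrapes",
    description = "Ingestion was throttled within the last 5 minutes; the server is overloaded."
  }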
brian-brazil added the kind/enhancement label Oct 26, 2016
This is no longer possible in 2.0, as there's no throttling.
brian-brazil closed this May 15, 2017
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
jjneely commented Oct 25, 2016
What did you do?
For various reasons one of the Prometheus instances that scrapes the node exporter for a region hit throttled ingestion mode. When a scrape is skipped, the resulting up metric appears to be set to 0.
What did you expect to see?
I expected that the up metric would not be updated and that Prometheus would not fire false alerts. Brian appears to agree:
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/prometheus-users/hXlvClKp_8w/m9xxcK-8AAAJ
What did you see instead? Under which circumstances?
What happened was that this Prometheus instance began issuing alerts that hundreds of machines were down.
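The alerting rules in use are not included in the report, but a plain instance-down rule of roughly the following shape (1.x syntax, hypothetical names) would misfire whenever skipped scrapes drive up to 0:

# Hypothetical example of the kind of rule that would fire falsely here
# (Prometheus 1.x rule syntax); alert and job names are placeholders.
ALERT NodeDown
  IF up{job="node"} == 0
  FOR 5m
  LABELS { severity = "page" }
  ANNOTATIONS {
    summary = "{{ $labels.instance }} is down"
  }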
Environment
System information:
Linux 3.13.0-85-generic x86_64
Prometheus version:
prometheus, version 1.2.1 (branch: XXX, revision: 1.2.1-1+JENKINS~trusty.20161012205023)
build user: jjneely@42lines.net
build date: 2016-10-12T20:50:55Z
go version: go1.7.1