Persist alert 'for' state across restarts #422
Comments
brian-brazil added the feature-request label on Jan 6, 2015
brian-brazil referenced this issue on Oct 18, 2015 (Closed): promql: Remove extrapolation from rate/increase/delta. #1161
fabxc added the kind/enhancement label and removed the feature request label on Apr 28, 2016
beorn7 added the priority/P2 and component/notify labels on Jun 24, 2016
How would this work? Do we evaluate from the point we terminated on startup again as a way of catching up? For example take an alert A1 with Are we going to have a threshold? Something like
No, and the alert probably wouldn't be firing during that period anyway, as there'll be no data for the time Prometheus was down. What we want is to remember that 6h of the FOR has already been satisfied, so that once the condition has held for another 6h we alert. We need to be a little careful here: for safety, if the remaining time is under 10m we should raise it to min(10m, for). To implement this I was thinking of an analogue to ALERTS that would track when an alert started; when Prometheus restarts it would check that metric (adjusting if needed for the 10m safety) and initialise internal state accordingly.
One other thing is that it can take a while for a Prometheus to warm up and have done enough scrapes for alerts to start firing properly. Thus we should delay rule and alert evaluation for 2 scrape_intervals at startup to mitigate this.
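To make the arithmetic in this comment concrete, here is a minimal sketch in Python (chosen for brevity; Prometheus itself is written in Go). The function and parameter names are hypothetical and do not correspond to Prometheus internals. It assumes the moment the alert went pending and the last moment rules were evaluated are both recoverable after the restart.

```python
from datetime import datetime, timedelta

# The 10m safety margin suggested in the comment above.
SAFETY_WINDOW = timedelta(minutes=10)


def remaining_for(active_at, down_at, hold_for):
    """Return how long the condition must still hold after a restart.

    active_at: when the alert became pending (hypothetical persisted value)
    down_at:   last moment the server was still evaluating rules (assumed known)
    hold_for:  the rule's 'for' duration
    """
    # Time already satisfied before the server went down; downtime is not counted.
    satisfied = max(down_at - active_at, timedelta(0))
    remaining = max(hold_for - satisfied, timedelta(0))
    # Safety: never fire sooner than min(10m, for) after the restart.
    return max(remaining, min(SAFETY_WINDOW, hold_for))


def restored_active_at(restart_at, remaining, hold_for):
    """Back-date the pending start so the alert fires at restart_at + remaining."""
    return restart_at + remaining - hold_for
```

With a 12h 'for' clause, an alert that went pending at 00:00, a server that went down at 06:00 and came back at 07:00, `remaining_for` gives 6h and `restored_active_at` gives 01:00, so the alert would fire at 13:00 and the hour of downtime counts neither for nor against it.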
Okay, biggest facepalm moment ever! How could I not see that the server is not running! Sorry about that! Coming to when to start evaluation, will 2
The global scrape interval is all that makes sense. Range queries will usually use rate(), and two scrapes plus some slack should be enough for that to work.
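As a rough illustration of the delay discussed here (rate() needs at least two samples in its range before it returns anything), the sketch below computes a startup delay from the global scrape interval. The names and the 5s default slack are assumptions made for the example, not values taken from Prometheus.

```python
from datetime import datetime, timedelta


def startup_delay(global_scrape_interval, slack=timedelta(seconds=5)):
    """Two scrapes plus slack; the slack default is an assumed placeholder."""
    return 2 * global_scrape_interval + slack


def can_evaluate(now, started_at, global_scrape_interval):
    """True once enough time has passed since startup to begin rule evaluation."""
    return now - started_at >= startup_delay(global_scrape_interval)
```

With a 15s global scrape interval this would hold back rule and alert evaluation for 35s after startup.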
gouthamve referenced this issue on Apr 6, 2017 (Closed): Option to disable alert sending for x seconds after startup. #2592
brian-brazil referenced this issue on Jun 27, 2017 (Open): Consider adding a range variant of absent() #2882
brian-brazil added the not-as-easy-as-it-looks label on Jul 14, 2017
simonpasquier pushed a commit to simonpasquier/prometheus that referenced this issue on Oct 12, 2017
leth referenced this issue on Apr 3, 2018 (Open): Config updates cause alerting rules to forget firing state #493
Do we want to persist state? If state is persisted, then alerting requires a local time series store. This is fine for Prometheus itself (although it is storing data we might not care about), but it poses a problem for things that use the alerting libraries to evaluate PromQL alerts against remote storage systems. I had hacked up an approach for backfilling on startup, but that doesn't cover the case you mention here, where you don't want to count "prom downtime" against the alert.
That's kind of the entire point of this PR. If you're running some non-standard system then you won't get the benefit of this feature, but alerting will continue to function as it does today.
I assume then that this functionality would be optional (through some config)?
I see no reason why this functionality would not be always on in the Prometheus server. The Prometheus server always has a local tsdb.
Quoting from myself:
TL;DR: What about others using the library?
It's an internal API so we offer no guarantees there, but you can pass in whatever storage objects you like.
Closed by #4061
gouthamve closed this on Aug 3, 2018
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
brian-brazil commented Dec 10, 2014
Currently, if Prometheus restarts, we lose the 'for' state for firing alerts. While this isn't an issue for short 'for' clauses, it presents a problem for clauses that are in the hours to days range.
It'd be good to persist this state in some way, so that alerts don't have to start again from scratch. We probably don't want to count the time the Prometheus server is down against the 'for' clause.