Make alerts stateless #1117
Comments
This is a dupe of #422. I think we'll need something more nuanced to cover both the time when the Prometheus server is down (we don't want an alert before 2 scrapes have been done so we have our bearings), and the fact that doing the query historically may not give exactly the same results as doing it at each point in time.
brian-brazil
closed this
Sep 26, 2015
Also, doing the query always over the entire |
That's something I strongly doubt without measurements indicating otherwise. As alerting rule evaluation has never been anything but white noise so far, If on the other hand, it would be an issue. This would be a clear sign to
On Mon, Sep 28, 2015 at 12:54 PM Julius Volz notifications@github.com
As |
I'm expecting to see FOR expressions up to a month in length in common usage, so I agree with Julius: I don't see this approach working out.
At which point evaluating at 1m intervals would be a highly questionable approach.
On Mon, Sep 28, 2015 at 1:14 PM Brian Brazil notifications@github.com
Even if you increase the interval by a bit, that'll be very confusing to a user, why suddenly |
Also, most rules should be answerable from memory alone. Once we need to go to disk, we are already dead for many of them (you can already see that when some beefy servers start up: they take a while to function properly after a restart because the rules load so much from disk).
So, the original reason we thought about doing it this way was to get sensible evaluation of FOR over restarts / reloads, without having to introduce yet another bit of storage that needs to be updated and cared for. We don't need to make the evaluation "really" stateless. What about using the time series data if the rule hasn't been evaluated since a restart / reload, but caching the result of that (the time when it started firing) in memory?
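The hybrid idea in this comment could be sketched roughly as follows. This is only an illustration in Python, not Prometheus code: `HoldTracker` and the `lookup_active_since` callback are hypothetical names, and the callback stands in for whatever historical query would recover the time the condition started matching.

```python
from typing import Callable, Dict, Optional, Set


class HoldTracker:
    """Tracks when each alert became active, consulting history only once.

    `lookup_active_since` is a hypothetical callback that inspects stored
    time series and returns the timestamp since which the alert condition
    has matched continuously, or None if that cannot be determined.
    """

    def __init__(self, lookup_active_since: Callable[[str, float], Optional[float]]):
        self._lookup = lookup_active_since
        self._active_since: Dict[str, float] = {}  # in-memory cache
        self._seen: Set[str] = set()  # alerts evaluated since restart

    def evaluate(self, alert: str, matched: bool, now: float, hold: float) -> str:
        first_eval = alert not in self._seen
        self._seen.add(alert)
        if not matched:
            self._active_since.pop(alert, None)
            return "inactive"
        if alert not in self._active_since:
            # Only the first evaluation after a restart goes to history;
            # every later evaluation relies on the cached start time.
            since = self._lookup(alert, now) if first_eval else None
            self._active_since[alert] = since if since is not None else now
        elapsed = now - self._active_since[alert]
        return "firing" if elapsed >= hold else "pending"
```

With this shape, the potentially expensive historical lookup happens at most once per alert after a restart, which is the trade-off the comment proposes.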
Such as via |
barkerd427
commented
Sep 28, 2015
We run batch jobs weekly or monthly, so my FOR time goes back that far.
jinhang
commented
Nov 7, 2016
|
@brian-brazil why I write |
@jinhang Because alerts need at least one rule evaluation cycle to go from pending to firing. I would guess that your rule evaluation cycle is much longer. You can just omit |
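To illustrate why the evaluation cycle matters, here is a small Python simulation of the behavior described in this reply. It is a sketch under stated assumptions, not Prometheus code: `first_firing_time` is a hypothetical helper, the condition is assumed to match from time 0 onward, and evaluations are assumed to happen exactly every `interval` seconds.

```python
from typing import Optional


def first_firing_time(hold: float, interval: float, cycles: int = 100) -> Optional[float]:
    """Return the evaluation time at which a continuously matching alert fires.

    Assumes the first matching evaluation only marks the alert as pending,
    and that a later evaluation promotes it to firing once `hold` seconds
    have elapsed since it became pending.
    """
    active_since: Optional[float] = None
    for i in range(cycles):
        now = i * interval
        if active_since is None:
            active_since = now  # first matching evaluation: pending
        elif now - active_since >= hold:
            return now  # a later evaluation: firing
    return None
```

Under these assumptions, a short hold such as 5 seconds with a 60-second evaluation interval still cannot fire before the second evaluation, so the effective delay is a full interval.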
lock
bot
commented
Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
fabxc commented Sep 26, 2015
Alerting rules should always be evaluated as a range query over the entire hold duration so pending alerts are not lost on restart.
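A minimal Python sketch of what such a stateless check might look like, assuming the rule's history is available as a list of `(timestamp, matched)` samples. The function name and sample layout are hypothetical and only illustrate the proposal; they are not how Prometheus represents rule results.

```python
from typing import Sequence, Tuple

# (unix timestamp, whether the alert expression matched at that point)
Sample = Tuple[float, bool]


def firing_statelessly(samples: Sequence[Sample], now: float, hold: float) -> bool:
    """Decide firing purely from history, with no stored pending state.

    The alert fires only if the expression matched at every sample inside
    the window [now - hold, now] and the history reaches back at least to
    the start of that window.
    """
    start = now - hold
    window = [matched for ts, matched in samples if start <= ts <= now]
    # Without data at or before the window start we cannot claim the
    # condition held for the full hold duration.
    covers_start = any(ts <= start for ts, _ in samples)
    return covers_start and bool(window) and all(window)
```

Because the decision depends only on stored samples, a restart between evaluations would not reset the hold timer; the cost is that every evaluation scans the whole hold window, which is the concern raised in the replies.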