promtool test rules: off-by-one bug for functions applied to range vectors #4874
Comments
I don't see a problem here. Why do you think this is wrong?
@brian-brazil Same expression
Are you running exactly the same query with exactly the same time offsets relative to the data?
Yep, identical: same data, same query (and same range), no offsets or anything. I was investigating why alerts that passed tests didn't work in production, and it looks like they do work in production now after I tuned them according to this
Time ranges are inclusive in PromQL, so it sounds like your alerts aren't resilient to jitter and/or failed scrapes.
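To make the point about inclusive ranges and jitter concrete, here is a minimal Python sketch. It is not Prometheus code: it assumes the closed-window semantics described in the comment above, and the function name, the 1m spacing, and the 0.3 s scrape delay are hypothetical values chosen for illustration.

```python
def samples_in_window(sample_times, eval_time, range_seconds):
    """Return timestamps inside the closed window [eval_time - range_seconds, eval_time].

    Assumption: both ends of the window are inclusive, as stated in the thread.
    """
    start = eval_time - range_seconds
    return [t for t in sample_times if start <= t <= eval_time]


# Idealised test data: samples land exactly on the minute.
aligned = [0, 60, 120, 180]

# Hypothetical production data: every scrape lands 0.3 s after the minute.
jittered = [t + 0.3 for t in aligned]

# Evaluate a 1m range at t = 120 s against both series.
print(len(samples_in_window(aligned, 120, 60)))
print(len(samples_in_window(jittered, 120, 60)))
```

With perfectly aligned timestamps both window edges hit a sample, so a 1m range over 1m-spaced data holds two samples. With any jitter, the window holds only one, which would account for a consistent off-by-one between idealised tests and a live server.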
Not sure I got the point. For example, how can I test some really simple expression (just for the purpose of testing, it could be more complex and useful, but the problem will stay) like It's true that it is not resilient to So is this Thanks.
Depending on races and what not, you might get anywhere between 0 and 11 points for the sum.
Wow, OK. So my tests basically test nothing. From a UX perspective I don't see any value in such tests, though, because I'm not sure I can write alerts that will work reliably in both testing and production environments with such inconsistent data ranges. It doesn't work even with the simplest expressions and inputs, on a Prometheus instance with almost zero load and 1m scrape/eval intervals. You can close it, I suppose?
The tests are for PromQL in an idealised environment; if your expressions are fragile, you may get unexpected results.
Checked again on a clean install of Prometheus and it still seems weird. No exporters besides Prometheus itself on :9090, so very low load. Eval/scrape intervals are 1min for both Prometheus and the promtool unit tests. From prometheus At the same time timestamps (and values) for
If these timestamps are real, if I understand inclusiveness correctly, Is this really by design? If these -1's are so consistent, maybe it would make sense from a UX perspective to make the unit tests match the out-of-the-box (basically idealised, but still real-world) experience of Prometheus? At the moment I have found no way to make Prometheus match the unit test results 1:1 even once when working with time ranges. EDIT: I was wrong about inclusiveness; it should actually return 1 result, as it does. Still, the inconsistency between Prometheus and promtool persists.
Hi, for the values you posted above:
If Having said that, Prometheus can return
Maybe you are right and nothing should be changed. In the end I went for at least
so if it's not there, it would be nice to have a few words (in best practices, maybe?) about not using things like Anyways, thanks for the explanations!
bititanb commented Nov 16, 2018
Bug Report
What did you do?
promtool test rules test.yml
What did you expect to see?
What did you see instead?
Under which circumstances?
Environment
Modified example configs from docs:
https://gist.github.com/bititanb/1164058e966f1faf5b2aa4bdc3454249/revisions
More:
Same thing happens at least with deriv() function. deriv(up[1m]) works in tests but in prometheus expression browser it is kind of not a valid expression at all, as it always returns "no datapoints found". Looks like deriv(up[1m]) just should be deriv(up[2m]) and sum_over_time(up[1m]) should be sum_over_time(up[2m]) and so forth in tests.
Thanks. /CC @codesome
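The deriv() observation can be sketched in Python as well. This is an illustration only, not the Prometheus implementation: it assumes a closed range window, a plain least-squares slope that needs at least two samples, and a hypothetical series scraped 0.3 s after each minute. The helper names `window` and `slope` are made up for this example.

```python
def window(samples, eval_time, range_seconds):
    """(timestamp, value) pairs inside the closed window [eval_time - range_seconds, eval_time]."""
    start = eval_time - range_seconds
    return [(t, v) for t, v in samples if start <= t <= eval_time]


def slope(points):
    """Least-squares slope per second, or None when fewer than two points are available."""
    if len(points) < 2:
        return None
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den


# Hypothetical production series: up == 1, scraped 0.3 s after each minute.
jittered = [(60 * i + 0.3, 1.0) for i in range(5)]

# 1m range at t = 240 s: only one sample falls inside, so no slope can be computed.
print(slope(window(jittered, 240, 60)))

# 2m range at t = 240 s: two samples fall inside, so a slope exists.
print(slope(window(jittered, 240, 120)))
```

Under these assumptions a 1m range over jittered 1m-spaced scrapes never holds the two samples a slope needs, while a 2m range does, which fits the report that deriv(up[1m]) returns nothing in the expression browser.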