Recording rule and adhoc query produce different (floating point) result #2951
Comments
Queries run on the second; recording rules can happen at any millisecond within the second.
brian-brazil added the kind/question label Jul 14, 2017
The data has been scraped regularly, but all values and dimensions have been constant over the last hour. All data comes from a single target. Can it still cause trouble if the rule evaluation happens concurrently with the scraping/ingesting? Or what are you implying with the second/millisecond precision?
Hmm, you haven't demonstrated that the value hasn't changed. Try "changes" rather than "increase".
Same result for changes. There are no changes in the data source
but several changes in the aggregate
Is the count() consistent?
Yes, the count is consistent. I managed to track this down further. In the following examples I am in the query explorer on the UI and hit enter a couple of times in very short succession. As expected, the aggregated result does not change as it all happens within one rule evaluation interval:
After the time for a rule evaluation has passed, I get a slightly different result even though the underlying data has no changes:
If I do this for the non-aggregated ones I get different results on each query evaluation:
Sorting the result set beforehand gives a consistent result though, as we then make the same floating point error every time:
In total, this would explain why we get a different rule evaluation result every time, and thus lots of changes in the resulting time series. I believe it is fine that Prometheus makes these slight mistakes. I will have to correct my incoming data instead.
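The effect described above can be reproduced outside Prometheus. The following is a minimal sketch in Python, not Prometheus code: the three sample values and the helper `naive_sum` are made up for illustration, and an explicit loop is used so that the additions really happen one at a time, left to right.

```python
import math


def naive_sum(values):
    """Add floats strictly left to right, one addition at a time."""
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical constant samples; the same three values in two orders.
samples = [0.1, 0.2, 0.3]
reordered = [0.3, 0.2, 0.1]

# Floating point addition is not associative, so the order matters.
print(naive_sum(samples), naive_sum(reordered))

# Sorting first fixes the order, so the same rounding error is made every time.
print(naive_sum(sorted(samples)) == naive_sum(sorted(reordered)))

# math.fsum tracks the lost low-order bits and is independent of input order.
print(math.fsum(samples) == math.fsum(reordered))
```

Sorting does not make the sum more accurate; it only makes the rounding error reproducible, which is what keeps the aggregate from appearing to change.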
This looks like normal floating point inaccuracy.
StephanErb referenced this issue Jul 16, 2017: Constant costs might decrease due to numerical instabilities #12 (Open)
Thanks for your help! (I will opt for the mailing list next time. It looked like a bug to me at first, which is why I jumped to the tracker.)
StephanErb closed this Jul 16, 2017
@StephanErb In case you were still wondering why the result was stable for ad-hoc queries, but not for recording rules: within a single range query, all the individual time resolution steps share the same ordering for the underlying time series. The series get attached to the AST of the expression in a particular order in the query preparation phase and are then used in that order at every time step. Rules are individual instant queries that get executed at every rule evaluation cycle, so multiple rule evaluations don't share the same underlying series order.
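A toy model of this difference, written in Python rather than taken from the Prometheus source, might look like the sketch below. The names `range_query` and `rule_evaluations` are hypothetical, and cycling through permutations is an assumption that stands in for whatever series order each independent instant query happens to receive.

```python
from itertools import permutations


def naive_sum(values):
    """Add floats strictly left to right, one addition at a time."""
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical constant samples from three series.
series_values = [0.1, 0.2, 0.3]


def range_query(values, steps):
    """One query: the series order is fixed once, then reused at every step."""
    order = list(values)  # stands in for the query preparation phase
    return [naive_sum(order) for _ in range(steps)]


def rule_evaluations(values, cycles):
    """Independent instant queries: each cycle may see a different order."""
    orders = list(permutations(values))
    return [naive_sum(orders[i % len(orders)]) for i in range(cycles)]


range_results = range_query(series_values, 6)
rule_results = rule_evaluations(series_values, 6)

print(len(set(range_results)))  # distinct values across the range query steps
print(len(set(rule_results)))   # distinct values across the rule evaluations
```

In this model the range query yields a single repeated value, which plots as a flat line, while the rule evaluations yield more than one distinct value even though the inputs never change.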
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
StephanErb commented Jul 14, 2017
What did you do?
Define an aggregation rule for an expensive query.
The underlying data looks something like this (simplified label set). Please note the large number of digits.
The metric has lots of dimensions:
The metric has not changed within the last hour:
What did you expect to see?
The underlying metric consists of several very slow moving counters. I therefore expect that the aggregation rule and the adhoc query produce the same results.
What did you see instead? Under which circumstances?
Plotting the adhoc query produces a flat line, whereas plotting the aggregated time series produces a non-linear one.
The actual difference is small but still noticeable.

Looking at the data this probably boils down to rounding errors in floating point math. But why does it differ for the recording rule and the adhoc query?
Environment
System information:
Linux 3.16.0-4-amd64 x86_64
Prometheus version:
prometheus, version 1.7.1 (branch: master, revision: 3afb3ff)
build user: root@0aa1b7fc430d
build date: 20170612-11:44:05
go version: go1.8.3
Prometheus configuration file: