delta, rate, irate over-report by up to 100% #3086

Closed
vgough opened this Issue Aug 17, 2017 · 4 comments

vgough commented Aug 17, 2017

What did you do?
delta(x[2m]) on a series scraped at 1m intervals returns 2x the expected value. The over-reporting appears to depend on the ratio between the range (aggregation) window and the scrape interval: with a 5m window the over-reporting drops to 25%, and with a 10m window it drops to about 11%.
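
The three figures above are consistent with a factor of window / (window - scrape interval). The sketch below only checks that arithmetic; the formula is inferred from the reported numbers and is an assumption, not something stated in the issue, and `overreport_factor` is a hypothetical helper name.

```python
def overreport_factor(window_s: float, scrape_interval_s: float) -> float:
    """Assumed inflation factor: the full window divided by the part of it
    that the samples inside the window actually span."""
    if window_s <= scrape_interval_s:
        raise ValueError("window must be longer than the scrape interval")
    return window_s / (window_s - scrape_interval_s)


# 1m scrape interval, as in the report above.
for window_s in (120, 300, 600):
    factor = overreport_factor(window_s, 60)
    print(f"{window_s // 60}m window: {factor:.2f}x ({(factor - 1) * 100:.0f}% over)")
```

For 2m, 5m and 10m windows this gives factors of 2.00, 1.25 and about 1.11, matching the reported 100%, 25% and 11%.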

Can also be seen using this test:

--- a/promql/testdata/functions.test
+++ b/promql/testdata/functions.test
@@ -117,6 +117,15 @@ eval instant at 20m delta(http_requests[20m])
 
 clear
 
+# Tests for delta().
+load 1m
+       http_requests{path="/foo"}      0 0 0 0 1 1 1 1
+
+eval instant at 290s delta(http_requests[2m])
+       {path="/foo"} 1
+
+clear
+
 # Tests for idelta().
 load 5m
        http_requests{path="/foo"}      0 50 100 150

Running the tests with this change fails:

--- FAIL: TestEvaluations (2.13s)
        promql_test.go:33: error running test testdata/functions.test: error in eval delta(http_requests[2m]): expected 1 for {path="/foo"} but got 2
FAIL

This seems to be a problem with how the sample bounds affect the delta computation: https://github.com/prometheus/prometheus/blob/master/promql/functions.go#L117
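
To make the failing case concrete, here is a minimal sketch of a boundary-extrapolating delta. It is an assumption about how the linked code behaves, not a copy of it: the function name, the 1.1 threshold and the half-interval fallback are written from memory of the Prometheus source of that period, and counter-reset handling is left out because it does not apply to this input.

```python
def extrapolated_delta(samples, range_start, range_end):
    """samples: list of (timestamp_seconds, value) pairs sorted by time."""
    points = [(t, v) for t, v in samples if range_start <= t <= range_end]
    if len(points) < 2:
        return None  # not enough samples in the window

    (first_t, first_v), (last_t, last_v) = points[0], points[-1]
    raw_delta = last_v - first_v
    sampled_interval = last_t - first_t
    avg_gap = sampled_interval / (len(points) - 1)

    # Extend the covered interval towards each window boundary. If a boundary
    # is further away than ~1.1 average gaps, extend by half a gap instead
    # (assumed rule, see the note above).
    threshold = avg_gap * 1.1
    extrapolate_to = sampled_interval
    for gap in (first_t - range_start, range_end - last_t):
        extrapolate_to += gap if gap < threshold else avg_gap / 2

    return raw_delta * extrapolate_to / sampled_interval


# Same data as the test: one sample per minute, values 0 0 0 0 1 1 1 1.
samples = [(60 * i, v) for i, v in enumerate([0, 0, 0, 0, 1, 1, 1, 1])]

print(extrapolated_delta(samples, 290 - 120, 290))  # window not aligned to samples
print(extrapolated_delta(samples, 240 - 120, 240))  # window aligned to samples
```

With the window ending at 290s, only the samples at 180s and 240s fall inside it. They span 60s, and that span is stretched to the full 120s window, so the raw delta of 1 becomes 2. With the window ending exactly on a sample (240s), three samples span the whole window and the result stays 1.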

Environment
Linux, multiple versions, distributions, kernels.

  • Prometheus version:
    Production versions of Prometheus, and GitHub master.

brian-brazil commented Aug 17, 2017

This behaviour is correct, see https://www.youtube.com/watch?v=67Ulrq6DxwA

You should also not use delta; you want rate, irate or increase here.

vgough commented Aug 17, 2017

If you look at the test diff I sent, the difference from the test directly above it is that I'm asking for a delta over a range that doesn't end exactly on a sample time. Almost all the other tests evaluate exactly on a sample point, which doesn't exercise the extrapolation code. In real-world usage, I've never seen PromQL evaluation times synchronized with sample times.

Also, as the issue title states, rate and irate similarly report incorrect results. Here's what this looks like from a user's point of view. We have a gauge holding the number of open file descriptors. It isn't easy to see in this snapshot, but the value is almost always 123, occasionally dropping to 122 or rising to 124.

[Image: screenshot_20170817_161944]

Now take a delta:

[Image: screenshot_20170817_162007]

The delta always shows increments and decrements of 2. A change of that size never occurs in the gauge graph.
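
The effect can be reproduced with synthetic data. The sketch below uses made-up values that mirror the description (a gauge stepping from 123 to 124, scraped every 60s), not the data behind the screenshots, and `delta_2m` is a hypothetical simplified model of an extrapolating delta, not the Prometheus implementation.

```python
def delta_2m(samples, eval_t, window=120):
    """Simplified extrapolating delta over [eval_t - window, eval_t]."""
    start = eval_t - window
    pts = [(t, v) for t, v in samples if start <= t <= eval_t]
    if len(pts) < 2:
        return None
    span = pts[-1][0] - pts[0][0]
    avg_gap = span / (len(pts) - 1)
    covered = span
    # Stretch the sampled span towards both window boundaries (assumed rule).
    for gap in (pts[0][0] - start, eval_t - pts[-1][0]):
        covered += gap if gap < 1.1 * avg_gap else avg_gap / 2
    return (pts[-1][1] - pts[0][1]) * covered / span


# Synthetic gauge: 123 for the first four scrapes, 124 afterwards.
gauge = [(60 * i, 123 if i < 4 else 124) for i in range(8)]

for t in range(210, 331, 30):
    print(t, delta_2m(gauge, t))
```

In this model the single step of 1 shows up as 2 whenever the evaluation time falls between scrapes (270s here), as 1 only when it lands exactly on a scrape (240s, 300s), and as 0 otherwise.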

irate also shows increments inflated by a factor of 2. These graphs were taken from the graph interface of Prometheus 1.6.1.

The video you linked is 40 minutes long. If you'd like to reference a particular part, I'm happy to take a look. However, the results we're seeing produce misleading graphs, and clearly impossible graphs are how I discovered the issue. In the short term, we've started applying correction factors to the graphs, e.g. dividing by 2 when using a 2m aggregation window, dividing by 1.25 when using a 5m window, etc.

brian-brazil commented Aug 18, 2017

The entire talk is relevant, as it is far more likely that you are misunderstanding the behaviour rather than there being a bug.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
