delta function doing full vector scan #5218

Open
bla-bu opened this Issue Feb 14, 2019 · 5 comments

@bla-bu
bla-bu commented Feb 14, 2019

Bug Report

What did you do?
Ran PromQL queries over millions of data points using the delta function.

What did you expect to see?
According to the documentation, “delta(v range-vector) calculates the difference between the first and last value of each time series element in a range vector v, returning an instant vector with the given deltas and equivalent labels.[…] delta should only be used with gauges.” However, looking at the code (functions.go), delta appears to scan every sample anyway:

for _, sample := range samples.Points {
	if isCounter && sample.V < lastValue {
		counterCorrection += lastValue
	}
	lastValue = sample.V
}

instead of simply taking the last value from the slice.

I would expect something like this (taken from issue https://github.com/prometheus/prometheus/issues/3746):

if isCounter {
	for i := firstPoint; i < len(points); i++ {
		sample := points[i]
		if sample.V < lastValue {
			counterCorrection += lastValue
		}
		lastValue = sample.V
	}
}
resultValue := points[len(points)-1].V - points[firstPoint].V + counterCorrection
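
To make the proposal concrete, here is a minimal, self-contained sketch of the idea: walk the samples only when counter-reset correction is needed, and otherwise use just the first and last points. The `Point` type and the `rawChange` function are hypothetical names made up for illustration, not Prometheus identifiers, and the sketch deliberately leaves out extrapolation.

```go
package main

import "fmt"

// Point is a hypothetical stand-in for a stored sample:
// a timestamp in milliseconds and a value.
type Point struct {
	T int64
	V float64
}

// rawChange returns the un-extrapolated change across points.
// It iterates over the slice only when isCounter is true, because only
// counters need reset correction. For gauges, the first and last points
// are sufficient.
func rawChange(points []Point, isCounter bool) float64 {
	if len(points) < 2 {
		return 0
	}
	var counterCorrection float64
	if isCounter {
		lastValue := points[0].V
		for _, p := range points[1:] {
			// A drop in a counter is treated as a reset.
			if p.V < lastValue {
				counterCorrection += lastValue
			}
			lastValue = p.V
		}
	}
	return points[len(points)-1].V - points[0].V + counterCorrection
}

func main() {
	gauge := []Point{{0, 100}, {15000, 80}, {30000, 90}, {45000, 60}}
	fmt.Println(rawChange(gauge, false)) // last minus first: 60 - 100

	counter := []Point{{0, 5}, {15000, 9}, {30000, 2}, {45000, 4}}
	fmt.Println(rawChange(counter, true)) // 4 - 5, plus 9 for the reset
}
```

Note that this only avoids the in-memory loop. Whether the samples in between still have to be read from storage is a separate question that the sketch does not address.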

Environment

  • System information:
    Linux 4.15.0-1026-gcp x86_64

  • Prometheus version:
    prometheus, version 2.6.0 (branch: HEAD, revision: dbd1d58)
    build user: root@bf5760470f13
    build date: 20181217-15:14:46
    go version: go1.11.3

@beorn7

Member

beorn7 commented Feb 15, 2019

IIRC we need the full scan for the extrapolation algorithm, i.e. to find out if a time series starts or stops within the range.

@brian-brazil

Member

brian-brazil commented Feb 15, 2019

The length of the slice, plus the start/stop timestamps should be enough for what we need to do.
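
As a rough illustration of that point, the sketch below extrapolates a raw first-to-last change to the full query range using only the number of samples, the first and last sample timestamps, and the range boundaries. The function name `extrapolate`, its parameters, the 1.1 threshold factor, and the half-interval fallback are all assumptions chosen for illustration. This is not a copy of the Prometheus implementation, which may differ.

```go
package main

import "fmt"

// extrapolate scales rawChange (last value minus first value) to the full
// query range [rangeStart, rangeEnd]. It needs only the slice length and
// the boundary timestamps (all in milliseconds), not the samples in between.
func extrapolate(rawChange float64, numPoints int, firstT, lastT, rangeStart, rangeEnd int64) float64 {
	if numPoints < 2 || lastT == firstT {
		return 0
	}
	durationToStart := float64(firstT-rangeStart) / 1000
	durationToEnd := float64(rangeEnd-lastT) / 1000
	sampledInterval := float64(lastT-firstT) / 1000
	averageDurationBetweenSamples := sampledInterval / float64(numPoints-1)

	// Assumption: a gap larger than ~1.1 average intervals means the series
	// starts or stops inside the range, so extend by only half an interval.
	threshold := averageDurationBetweenSamples * 1.1
	extrapolateToInterval := sampledInterval
	if durationToStart < threshold {
		extrapolateToInterval += durationToStart
	} else {
		extrapolateToInterval += averageDurationBetweenSamples / 2
	}
	if durationToEnd < threshold {
		extrapolateToInterval += durationToEnd
	} else {
		extrapolateToInterval += averageDurationBetweenSamples / 2
	}
	return rawChange * extrapolateToInterval / sampledInterval
}

func main() {
	// Six samples from t=5s to t=55s inside a 60s range, raw change of -50.
	fmt.Println(extrapolate(-50, 6, 5000, 55000, 0, 60000))
}
```

In this form, `averageDurationBetweenSamples` is derived from the sample count and the two boundary timestamps alone.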

What are you using delta for?

@bla-bu

Author

bla-bu commented Feb 18, 2019

We have metrics representing "budget"; they are gauges. Our users are often interested in how much budget they spent over, e.g., the last 15 days (a single value). A query using the delta function looks very natural, but it is rather slow in our setup. We are pretty sure we can switch to a query that simply subtracts two values using offset; the question here is rather whether delta should really do a full scan, especially when the documentation says "calculates the difference between the first and last value".

Cheers,

@brian-brazil

Member

brian-brazil commented Feb 18, 2019

@bla-bu

Author

bla-bu commented Feb 18, 2019

We're drifting away from the issue here. As you said, "the length of the slice, plus the start/stop timestamps should be enough for what we need to do". Yet, unless I'm terribly misreading the code, we read all the points from disk, load them into memory, and iterate over them (where isCounter is always false) just to use the first point, the last point, and the length of the slice. If this is simply how things work, and e.g. we cannot calculate averageDurationBetweenSamples without having the entire slice, then we can close the issue.

Cheers,
