
Downsampling on query_range #2031

Closed
tcolgate opened this Issue Sep 26, 2016 · 16 comments

@tcolgate (Contributor) commented Sep 26, 2016

Would people be interested in an implementation of downsampling for query_range? I've been considering the following:

  • Add a downsample parameter to query_range; it defaults to none, and everything works as at present if it's not set.
  • downsample=METHOD would select a downsampling method.
  • If a downsampling method is requested, a samples parameter would select how many samples to return to the caller.
  • Add an internal Downsampler interface of roughly:
    func (ds sample.Matrix) Downsample(samples int) sample.Matrix

I'd then add LTTB downsampling (I have a tested implementation; a rough sketch is below), and probably max/min/mean (any aggregator?) methods.

The primary motivation is to support graphing with minimal visual feature loss (that is what LTTB is really meant for).

If people are interested, I'd very much like to have a go at contributing this.
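
For concreteness, here is a minimal sketch of the LTTB method over a simplified Point type (illustrative only: this is not the tested implementation mentioned above, and the real patch would operate on the engine's matrix type rather than these made-up types):

    package downsample

    // Point is a simplified sample: a millisecond timestamp and a value.
    type Point struct {
        T int64
        V float64
    }

    // LTTB reduces data to threshold points using Largest-Triangle-Three-Buckets
    // (Steinarsson, 2013), keeping the points that contribute most to the
    // visual shape of the series.
    func LTTB(data []Point, threshold int) []Point {
        if threshold < 3 || threshold >= len(data) {
            return data // nothing to do
        }
        sampled := make([]Point, 0, threshold)
        sampled = append(sampled, data[0]) // always keep the first point

        // Bucket size over the interior points (first and last excluded).
        every := float64(len(data)-2) / float64(threshold-2)
        a := 0 // index of the most recently selected point

        for i := 0; i < threshold-2; i++ {
            // Average of the next bucket; the third vertex of the triangle.
            avgStart := int(float64(i+1)*every) + 1
            avgEnd := int(float64(i+2)*every) + 1
            if avgEnd > len(data) {
                avgEnd = len(data)
            }
            var avgT, avgV float64
            for _, p := range data[avgStart:avgEnd] {
                avgT += float64(p.T)
                avgV += p.V
            }
            n := float64(avgEnd - avgStart)
            avgT, avgV = avgT/n, avgV/n

            // In the current bucket, pick the point that forms the largest
            // triangle with the previous pick and the next bucket's average.
            curStart := int(float64(i)*every) + 1
            curEnd := int(float64(i+1)*every) + 1
            maxArea, maxIdx := -1.0, curStart
            for j := curStart; j < curEnd; j++ {
                // Twice the triangle area; the factor is irrelevant
                // for comparison purposes.
                area := (float64(data[a].T)-avgT)*(data[j].V-data[a].V) -
                    (float64(data[a].T)-float64(data[j].T))*(avgV-data[a].V)
                if area < 0 {
                    area = -area
                }
                if area > maxArea {
                    maxArea, maxIdx = area, j
                }
            }
            sampled = append(sampled, data[maxIdx])
            a = maxIdx
        }
        return append(sampled, data[len(data)-1]) // always keep the last point
    }

The max/min/mean methods would be simpler still: split the series into samples buckets and reduce each bucket with the corresponding aggregator.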

@brian-brazil (Member) commented Sep 26, 2016

How would this work semantically? query_range already downsamples.

@tcolgate (Author) commented Sep 26, 2016

Does it? That isn't clear from the documentation. The only thing that sounds like downsampling is step:
step=<duration>: Query resolution step width.
That sounds more like it is skipping samples than aggregating them; if it is aggregating them in some way, the documentation should say how.
I've been assuming that step is skipping samples, giving me the ones that fall on each step between the start and end. If it's doing some downsampling of the data, then I'll have to think about this a bit harder.

  • If it's not actually aggregating, then step and samples would simply both apply: take the data that would currently be returned by step=... and downsample that to samples=X samples.
  • If it's doing some aggregation, I'll have to rethink the interface. Possibly just adding LTTB as an option and defaulting to the existing method, then calculating the resolution from (end - start) / duration.
@brian-brazil (Member) commented Sep 26, 2016

I think you misunderstand how query_range works. It evaluates the expression every step seconds and returns the results. If you want to downsample further, increase step. query_range has no notion of samples.
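
For example (illustrative values, with my_expression as a placeholder and the query string left unencoded for readability):

    GET /api/v1/query_range?query=my_expression
        &start=2016-09-19T00:00:00Z&end=2016-09-19T01:00:00Z&step=60s

evaluates my_expression at 00:00:00, 00:01:00, and so on up to 01:00:00, returning those 61 points per series; the output size is controlled entirely by the range and the step.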

@tcolgate (Author) commented Sep 26, 2016

I think that's pretty much what I assumed was happening. I'm trying to cater for the case where step results in missing spikes on a graph when it's auto-calculated based on the graph resolution. You can set step manually, but then you are potentially returning more data to the client. E.g., I have a task:thing:rate1m rule; if I graph that for a week on an 800-pixel-wide graph, that's a step of something like 12m, one step per pixel, plenty of space for me to lose a spike.
If I use step=1m, then I get 10080 data points to squeeze onto my 800-pixel graph, and my graphing library gets to decide what I should see (and I don't necessarily know what decision it made). I'm also transferring a lot more data.
So step=1m, downsample=max, samples=800 would get me the max of each of 800 buckets of the 10080 points.
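
Concretely, the request might look like this (downsample and samples are the hypothetical new parameters from this proposal; everything else is the existing API, with made-up timestamps):

    GET /api/v1/query_range?query=task:thing:rate1m
        &start=2016-09-19T00:00:00Z&end=2016-09-26T00:00:00Z
        &step=60s&downsample=max&samples=800

The engine would still evaluate all 10080 steps, but the response would carry only 800 points: the max of each bucket of 12 or 13 consecutive step results.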

@beorn7 (Member) commented Sep 26, 2016

As a workaround, you could emulate downsampling strategies by using avg_over_time, max_over_time, and such.
They only work on time series, though, not on arbitrary expressions, so you might need a recording rule as a middleman if you want to apply the approach to a complex expression.
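
For example, with the task:thing:rate1m rule from earlier in the thread, a week-long graph at 800 pixels could keep the roughly pixel-sized step but take the max over each step-wide window, so spikes between evaluation instants still show up (illustrative):

    max_over_time(task:thing:rate1m[12m])

queried with step=12m: each returned point is then the maximum of the underlying samples in the preceding 12 minutes rather than a single instant value.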

@tcolgate (Author) commented Sep 26, 2016

Yes, though you'd need to know the time range you are typically displaying, and the resolution of the graph you are displaying it on. Adding support seems like it should be reasonably easy and backward compatible, and it would give people a bit more control. I think the Grafana integration could make good use of this.


@brian-brazil (Member) commented Sep 26, 2016

I'm not sure this would be as useful as you think; we'd still want the limit on the number of steps evaluated, in order to avoid queries of death/sloth.

@tcolgate (Author) commented Sep 26, 2016

This is pretty much what I'm thinking about... master...tcolgate:downsampling

@tcolgate (Author) commented Sep 26, 2016

I'll avoid any further work on it until someone decides either way whether this is wanted. I do think it has some value: OpenTSDB provides for it, though I'm not sure if Influx or Graphite do. I think it gives a more complete set of features for querying.

@juliusv (Member) commented Sep 26, 2016

I don't think downsampling the final result like that in the web API will be very useful: most of the effort happens before that point (loading the data from disk and evaluating the expression). In your case, you'd either have to start with a super-high query resolution and then downsample that at the end (but then you still do most of the work before that final step), or you'd start at a normal query resolution, but then you've already skipped over a lot of raw sample data and there's not much useful downsampling left to do.

The theoretically "right" way to do it would be to downsample at the raw sample layer, i.e. have the storage layer already provide downsampled data that your expression can then work on faster. You'd still need to go through all the data on disk to downsample it correctly, but at least the query evaluation would have less to process.
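
To make that direction concrete, one could imagine the engine passing a hint down to the storage layer; a hypothetical sketch (none of these types or names exist in Prometheus, they are made up for illustration):

    package sketch

    // SeriesSet stands in for an iterator over the matching series.
    type SeriesSet interface {
        Next() bool
    }

    // SelectHint is a hypothetical hint passed from the query engine down
    // to a (possibly remote) storage implementation, so that it can return
    // data already downsampled to the resolution the query will use.
    type SelectHint struct {
        Start, End int64  // query range, Unix milliseconds
        Step       int64  // desired output resolution in milliseconds
        Func       string // aggregation the engine will apply, e.g. "rate"
    }

    // HintedQuerier is a hypothetical storage interface: an implementation
    // may inspect the hint and serve one suitably chosen sample per Step
    // instead of every raw sample it holds.
    type HintedQuerier interface {
        SelectWithHint(hint SelectHint, matchers ...string) (SeriesSet, error)
    }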

@brian-brazil (Member) commented Sep 26, 2016

That sounds like we're getting into long-term storage downsampling, then.

@tcolgate (Author) commented Sep 27, 2016

I wasn't really thinking of this as an optimization on the server side; it's more a convenience for the user (explicit control over downsampling, and minimizing data transfer to the client).
Downsampling when reading from storage seems more problematic. Yes, it minimizes effort in the query engine, but it will also happen before, for instance, a rate()/increase(), which could produce very different query results and be even more confusing to users (LTTB applied pre-calculation would be very dubious: the result is unlikely to represent the LTTB of the original final result).
If there is some long-term desire to downsample at storage read time, then that probably needs to be included in the remote storage discussions?

@brian-brazil (Member) commented Sep 27, 2016

If there is some long-term desire to downsample at storage read time, then that probably needs to be included in the remote storage discussions?

That's an implementation detail for the remote storage, as downsampling in storage is explicitly out of scope for Prometheus itself.

@tcolgate (Author) commented Sep 27, 2016

@brian-brazil Surely if the user is requesting downsampling, there has to be a means for that request to be passed to the storage layer (if the idea is for the query engine to process less data)? Any RPC would need to take that into account.
I'll close this ticket now; thanks for your time.

@tcolgate tcolgate closed this Sep 27, 2016

@brian-brazil (Member) commented Sep 27, 2016

Surely if the user is requesting downsampling, there has to be a means for that request to be passed to the storage layer (if the idea is for the query engine to process less data)?

I never said the user requested downsampling for long-term storage. The issue is that long-term storage must downsample in its response for bandwidth reasons. How to do that transparently remains an open question, but it seems like it should be possible.

@lock (bot) commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019
