Downsampling on query_range #2031
Comments
How would this work semantically? query_range already downsamples.
Does it? That isn't clear from the documentation. The only thing that sounds like downsampling is step.
I think you misunderstand how query_range works. It evaluates the expression at every step.
I think that's pretty much what I assumed was happening. I'm trying to cater for the case where step results in missing spikes on a graph when it's auto-calculated from the graph resolution. You can set step manually, but then you potentially return more data to the client. E.g., I have a task:thing:rate1m rule; if I graph that for a week on an 800-pixel-wide graph, that's a step of something like 12m per pixel (10,080 minutes / 800 pixels ≈ 12.6 minutes), plenty of space for me to lose a spike.
As a workaround, you could emulate downsampling strategies by using …
Yes, you'd need to know the time range you are typically graphing.
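(The workaround comment above is truncated, so the exact suggestion is lost; one plausible reading is the `*_over_time` family of PromQL functions. As a hedged sketch only — the endpoint `http://localhost:9090` and the 13m window are assumptions — wrapping the rule in `max_over_time` with a window at least as wide as the step guarantees that no spike falls between two evaluation points:)

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

func main() {
	// Query the last week at a 13m step, but wrap the rule in
	// max_over_time over a window >= the step, so no spike between
	// two adjacent steps can be skipped entirely.
	end := time.Now()
	start := end.Add(-7 * 24 * time.Hour)

	params := url.Values{}
	params.Set("query", `max_over_time(task:thing:rate1m[13m])`)
	params.Set("start", fmt.Sprintf("%d", start.Unix()))
	params.Set("end", fmt.Sprintf("%d", end.Unix()))
	params.Set("step", "780") // 13m, expressed in seconds

	resp, err := http.Get("http://localhost:9090/api/v1/query_range?" + params.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // JSON matrix result
}
```

(The trade-off is that every returned point is now a windowed maximum rather than a raw sample, so the graph's shape changes even where nothing would have been lost.)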
I'm not sure this would be as useful as you think; we'd still want the limit on the number of steps evaluated in order to avoid queries of death/sloth.
This is pretty much what I'm thinking about: master...tcolgate:downsampling
I'll hold off on any further work on it until someone decides either way whether this is wanted. I do think it has some value: OpenTSDB provides for it (I'm not sure whether Influx or Graphite do), and I think it gives a more complete set of features for querying.
I don't think downsampling the final result like that in the web API will be very useful: most of the effort has already happened before that point (loading the data from disk and evaluating the expression). In your case, you'd have to either start with a super high query resolution and then downsample that at the end (but then you still do most of the work before that final step), or start at a normal query resolution, but then you've already skipped over a lot of raw sample data and there's not much more useful downsampling you can do.

The theoretically "right" way to do it would be to downsample at the raw sample layer, i.e. have the storage layer already provide downsampled data that your expression can then work on faster. You'd still need to go through all the data on disk to downsample it correctly, but at least the query evaluation would have to process less.
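(To make "downsample at the raw sample layer" concrete, here is a minimal sketch under assumed, simplified types — these are not Prometheus's storage internals — of a bucketed mean reduction applied to raw samples before expression evaluation:)

```go
package downsample

// Sample is a simplified stand-in for a raw (timestamp, value) pair;
// the real Prometheus storage types differ.
type Sample struct {
	TimestampMs int64
	Value       float64
}

// MeanByBucket reduces raw samples to one mean sample per fixed-width
// time bucket, so the query engine has fewer points to evaluate.
func MeanByBucket(raw []Sample, bucketMs int64) []Sample {
	if len(raw) == 0 || bucketMs <= 0 {
		return nil
	}
	var out []Sample
	cur := raw[0].TimestampMs / bucketMs
	sum, n := 0.0, 0
	flush := func(bucket int64) {
		if n > 0 {
			out = append(out, Sample{
				TimestampMs: bucket*bucketMs + bucketMs/2, // bucket midpoint
				Value:       sum / float64(n),
			})
		}
	}
	for _, s := range raw {
		b := s.TimestampMs / bucketMs
		if b != cur {
			flush(cur)
			cur, sum, n = b, 0, 0
		}
		sum += s.Value
		n++
	}
	flush(cur)
	return out
}
```

(Note that this still reads every raw sample once, which matches the caveat above: the saving is in expression evaluation, not in disk I/O.)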
That sounds like we're getting into long-term storage downsampling, then.
I wasn't really thinking of this as an optimization on the server side, more as a convenience for the user (explicit control over downsampling, and minimizing data transfer to the client).
That's an implementation detail for the remote storage, as downsampling in storage is explicitly out of scope for Prometheus itself. |
@brian-brazil surely if the user is requesting downsampling, there has to be a means for that request to be passed to the storage layer (if the idea is for the query engine to process less data)? Any RPC would need to take that into account.
tcolgate closed this Sep 27, 2016
I never said the user requested downsampling for long-term storage. The issue is that long-term storage must downsample in its response for bandwidth reasons. How to do that transparently remains an open question, but it seems like it should be possible.
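(One way to picture the "transparent" variant is a read interface where the engine's evaluation resolution travels with the request as an advisory hint. This is purely a hypothetical sketch — none of the names below are Prometheus's actual remote-read API:)

```go
package remote

// ReadHint describes the resolution the query engine will evaluate
// at, so a remote storage may return data downsampled to roughly that
// resolution and save bandwidth. The engine must still cope with
// whatever resolution actually comes back, which is what makes
// "transparent" downsampling an open question.
type ReadHint struct {
	StartMs, EndMs int64
	StepMs         int64  // evaluation interval the engine will use
	Func           string // e.g. "max_over_time"; lets storage pick an aggregation
}

// Sample is a simplified (timestamp, value) pair.
type Sample struct {
	TimestampMs int64
	Value       float64
}

// Series is one labelled time series in a read response.
type Series struct {
	Labels  map[string]string
	Samples []Sample
}

// Reader would be implemented by a long-term storage backend.
type Reader interface {
	// Read returns series matching the label matchers; the hint is
	// advisory, so implementations may ignore it and return raw data.
	Read(matchers map[string]string, hint ReadHint) ([]Series, error)
}
```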
tcolgate commented Sep 26, 2016
Would people be interested in an implementation of downsampling for query_range? I've been considering the following:

func (ds sample.Matrix) Downsample(samples int) sample.Matrix

I'd then add LTTB downsampling (I have a tested implementation), and probably max/min/mean (any aggregator?) methods.
The primary motivation is to support graphing with minimal visual feature loss (that is what LTTB is really meant for).
If people are interested, I'd very much like to have a go at contributing this.
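(For reference, a sketch of what the LTTB step could look like. The sample.Matrix type from the branch above isn't reproduced here, so the Point type and function shape below are assumptions for illustration, not the branch's actual code:)

```go
package lttb

// Point is a simplified (timestamp, value) pair standing in for a
// Prometheus sample; a Matrix type would wrap per-series slices of these.
type Point struct {
	T float64 // timestamp
	V float64 // value
}

// Downsample reduces data to at most threshold points using
// Largest-Triangle-Three-Buckets: keep the first and last points,
// split the interior into threshold-2 buckets, and from each bucket
// keep the point forming the largest triangle with the previously
// kept point and the average of the next bucket. This preserves
// visual features (spikes) far better than naive striding.
func Downsample(data []Point, threshold int) []Point {
	if threshold >= len(data) || threshold < 3 {
		return data
	}
	sampled := make([]Point, 0, threshold)
	sampled = append(sampled, data[0]) // always keep the first point

	// Bucket width over the interior points (excluding first and last).
	every := float64(len(data)-2) / float64(threshold-2)
	a := 0 // index of the last point added to sampled

	for i := 0; i < threshold-2; i++ {
		// Average point of the *next* bucket.
		nextStart := int(float64(i+1)*every) + 1
		nextEnd := int(float64(i+2)*every) + 1
		if nextEnd > len(data)-1 {
			nextEnd = len(data) - 1
		}
		n := nextEnd - nextStart
		if n < 1 {
			n = 1
		}
		var avgT, avgV float64
		for j := nextStart; j < nextStart+n && j < len(data); j++ {
			avgT += data[j].T
			avgV += data[j].V
		}
		avgT /= float64(n)
		avgV /= float64(n)

		// Pick the point in the current bucket with the largest
		// triangle area relative to data[a] and the average point.
		curStart := int(float64(i)*every) + 1
		curEnd := int(float64(i+1)*every) + 1
		maxArea, maxIdx := -1.0, curStart
		for j := curStart; j < curEnd; j++ {
			area := (data[a].T-avgT)*(data[j].V-data[a].V) -
				(data[a].T-data[j].T)*(avgV-data[a].V)
			if area < 0 {
				area = -area
			}
			if area > maxArea {
				maxArea, maxIdx = area, j
			}
		}
		sampled = append(sampled, data[maxIdx])
		a = maxIdx
	}

	return append(sampled, data[len(data)-1]) // always keep the last point
}
```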