Query performance for large time series. #4391
Comments
Can you clarify exactly what you are reporting here?
My point is that when users run a large-scale TSDB and try to work with the Prometheus UI graph, they will probably hit a timeout due to the default resolution, which is 3 seconds.
There is no default setting for this; it's based on the resolution, so it will generally have roughly the same number of steps.
As (snarkily) explained in the prometheus-users thread, the time it takes to evaluate a range query (as opposed to an instant query) is directly proportional to the resolution/number of steps. If your step is 3 seconds, that is going to take ~5x the amount of computation of a 15-second step. While optimizing the query engine would help improve things overall, your query is still going to take 5x longer at 3-second resolution than at 15-second resolution. I have proposed a handful of PRs that attempt to improve query performance, but they will only do so incrementally (e.g. even if a query turns out to be twice as fast with these changes, all it takes to undo that is twice the number of series).

As Brian pointed out, you need to optimize your queries. What I'm doing for the equivalent queries on my dashboards (which would otherwise take on the order of 30 seconds to load) is using recording rules to precompute e.g. rates and pre-aggregate e.g. by handler or status code. It takes away a bit of flexibility, so for a couple of important dashboards I have the fast version (which displays the output of said recording rules) for general consumption and the debug version (which does everything on the fly and thus allows for a few more filters and knobs) for actual debugging. But I know that I can only use the latter for ranges of up to 1 hour, and I have to wait 30 seconds after every change to the filters.
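A minimal sketch of what such a recording rule could look like; the metric name, labels, and rule name here are hypothetical placeholders, not taken from this issue:

```yaml
groups:
  - name: precomputed_rates
    interval: 15s
    rules:
      # Precompute the per-handler request rate so dashboards can query
      # the cheap precomputed series instead of running rate() on the fly.
      - record: job_handler:http_requests:rate5m
        expr: sum by (job, handler) (rate(http_requests_total[5m]))
```

A dashboard panel would then query `job_handler:http_requests:rate5m` directly, trading away the ability to filter by labels that were aggregated out.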
This is stale, and there doesn't appear to be an issue here.
brian-brazil closed this Dec 7, 2018
mrsiano changed the title from "Query execution time performance" to "Query performance for large time series." Dec 17, 2018
valyala commented Mar 6, 2019
@mrsiano, Prometheus can write data to remote storage, so you can offload heavy queries to the remote storage instead of Prometheus. While many remote storage solutions do not understand PromQL (so you need to use another query language such as SQL, InfluxQL, or Flux), there are a few remote storage solutions with native PromQL support (so you can just update the Prometheus datasource URL in Grafana to query the remote storage):
mrsiano commented Jul 16, 2018
In continuation of https://groups.google.com/forum/#!search/promethues/prometheus-users/kZ2DvZnYHUA/YDd81RhtBAAJ
we found that the default resolution (step size in seconds) significantly affects query execution time, especially on large-scale setups.
We can change the default graph resolution to 15 or even 30 seconds to make sure large result sets will not hit the gateway timeout error.
Increasing the step size speeds up the results; most of the impact is around innerEvalTime, see the following:
With this issue we are trying to address and support reliable performance at high resolution (in seconds), especially at large scale.
Currently, changing the resolution is a workaround rather than an optimal solution.
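To make the step/cost relationship concrete, here is a small illustrative calculation; the 1-hour range is an assumed example, not a measurement from this report:

```python
def evaluation_steps(range_seconds: int, step_seconds: int) -> int:
    """Number of points a range query must evaluate for a given step.

    A range query is evaluated once per step across the selected window,
    so halving the step roughly doubles the work the engine must do.
    """
    return range_seconds // step_seconds + 1

one_hour = 3600

# A 3s step forces ~5x more evaluations than a 15s step over the same
# range, which is why the query takes roughly 5x longer to execute.
print(evaluation_steps(one_hour, 3))   # 1201 evaluations
print(evaluation_steps(one_hour, 15))  # 241 evaluations
```

This is why raising the graph resolution from 3s to 15s or 30s shrinks the work proportionally, independent of any query-engine optimizations.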
Environment
Running Prometheus via Docker on a fairly large host with 40 cores and no resource limits.
System information:
Linux 3.10.0-907.el7.x86_64 x86_64
Prometheus version:
v2.3.2