Sample stats: total and peak #10369
Conversation
/prombench edfe657

Incorrect prombench syntax, please find the correct syntax here.

/prombench main
Thanks, in general, this LGTM.
Do we need to compute the peak and total? Could we leave that to the caller instead?
type Stats struct {
	TimerStats  *QueryTimers
	SampleStats *QuerySamples
If we are concerned about impact, we could later make that an interface, so that when it is disabled we have a NoopSampleStats with less impact. I see @beorn7 has launched prombench already, so it might not be needed.
promql/engine.go (Outdated)

@@ -379,12 +394,12 @@ func (ng *Engine) SetQueryLogger(l QueryLogger) {
}

 // NewInstantQuery returns an evaluation query for the given expression at the given time.
-func (ng *Engine) NewInstantQuery(q storage.Queryable, qs string, ts time.Time) (Query, error) {
+func (ng *Engine) NewInstantQuery(q storage.Queryable, opts QueryOpts, qs string, ts time.Time) (Query, error) {
I really like this, because we want to have a per-query lookback delta later on; it could be a query opt as well.
Can we have QueryOpts as a pointer? We are passing it downstream multiple times, creating unneeded allocs. It would also be nicer in tests.
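To illustrate the pointer suggestion, here is a minimal standalone sketch. The fields on `QueryOpts` here are assumptions for illustration, not the PR's actual definition; the point is that a nil `*QueryOpts` can stand in for defaults, and one value can be shared across calls without copying:

```go
package main

import (
	"fmt"
	"time"
)

// QueryOpts is a hypothetical stand-in for the struct discussed in this PR.
type QueryOpts struct {
	EnablePerStepStats bool
	LookbackDelta      time.Duration // illustrative future per-query option
}

// perStepStatsEnabled shows the nil-means-defaults idiom: callers that
// don't care pass nil, and no struct copy happens on each call.
func perStepStatsEnabled(opts *QueryOpts) bool {
	if opts == nil {
		return false // default: per-step stats disabled
	}
	return opts.EnablePerStepStats
}

func main() {
	fmt.Println(perStepStatsEnabled(nil))                                // false
	fmt.Println(perStepStatsEnabled(&QueryOpts{EnablePerStepStats: true})) // true
}
```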
Benchmark tests have been running for 3 days! If this is intended, ignore this message; otherwise you can cancel it by commenting:

/prombench cancel

I assume you have seen enough benchmark data.

Benchmark cancel is in progress.
Hi @roidelapluie, thanks a lot for looking into this.
I don't think it's possible to compute one stat from the other, as they're tracking different things (Peak is not max(totalQueryableSamples); it's the maximum number of samples kept in memory during query execution). The Peak stat shows how close the query was to reaching the MaxSamples limit.
I mean, can we track them only per step, and not compute the overall maximum?
We're currently not recording peak samples per step, just the total samples queried. I think it's doable, so if you'd like to see it in this PR, I can take a shot at it. That said, because per-step stats are disabled by default, while overall stats are always on, I'd prefer to always compute overall stats.
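The relationship between the two stats can be sketched as a running high-water mark: the total counts every sample read, while the peak tracks the most samples held in memory at once. This is an illustrative standalone sketch, not the PR's actual implementation:

```go
package main

import "fmt"

// samplesTracker illustrates tracking a running total alongside a peak
// of in-memory samples (names are assumptions for this sketch).
type samplesTracker struct {
	current int // samples currently held in memory
	total   int // totalQueryableSamples: everything ever read
	peak    int // peakSamples: high-water mark of current
}

func (t *samplesTracker) add(n int) {
	t.current += n
	t.total += n
	if t.current > t.peak {
		t.peak = t.current
	}
}

// release models buffered samples being dropped between range steps;
// it lowers current but never total or peak.
func (t *samplesTracker) release(n int) {
	t.current -= n
}

func main() {
	var t samplesTracker
	t.add(10)
	t.add(5)
	t.release(8) // buffered samples dropped between steps
	t.add(4)
	fmt.Println(t.total, t.peak) // prints: 19 15
}
```

This is why Peak is not max(totalQueryableSamples): releasing buffered samples lets the total keep growing while the peak stays put.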
One thing I've been thinking about doing before landing this is changing the stats suffix from […]

👍 for PerStep
Close/reopen to trigger CircleCI.
Thanks! I've had the time to review & test the pull request.
Somehow,

    "samples": {
      "peakSamples": 153,
      "totalQueryableSamples": 126
    },

are always reported. I think it's fine; we could even later add those fields to the UI.
In my review, I noted a wrong change in tsdb, and comments that do not end with a full stop. The feature flag description is also incorrect.
Thanks!
Windows tests should be fixed by rebasing on top of main.
We always track total samples queried and add those to the standard set of stats queries can report. We also allow optionally tracking per-step samples queried. This must be enabled both at the engine and query level to be tracked and rendered. The engine flag is exposed via a Prometheus feature flag, while the query flag is set when stats=all. Co-authored-by: Alan Protasio <approtas@amazon.com> Co-authored-by: Andrew Bloomgarden <blmgrdn@amazon.com> Co-authored-by: Harkishen Singh <harkishensingh@hotmail.com> Signed-off-by: Andrew Bloomgarden <blmgrdn@amazon.com>
This allows other implementations to inject their own statistics that they're gathering in data linked from the context.Context. For example, Cortex can inject its stats.Stats value under the `cortex` key. Signed-off-by: Andrew Bloomgarden <blmgrdn@amazon.com>
This exactly corresponds to the statistic compared against MaxSamples during the course of query execution, so users can see how close their queries are to a limit. Co-authored-by: Harkishen Singh <harkishensingh@hotmail.com> Co-authored-by: Andrew Bloomgarden <blmgrdn@amazon.com> Signed-off-by: Andrew Bloomgarden <blmgrdn@amazon.com>
Triggering CI.

Thanks!
Per Julien's feedback on prometheus#10369, we're choosing to be consistent with data types inside the stats structure (ints) rather than with the points format that is part of the normal query responses (strings). We have this option because this data cannot be NaN/Inf. Signed-off-by: Andrew Bloomgarden <blmgrdn@amazon.com>
/prombench v2.34.0

/prombench cancel

Benchmark cancel is in progress.

This pull request has memory issues, but its head has been rebased on top of #10450, which I believe to be the actual issue.
prometheus/prometheus#10369 was part of v2.35.0 release and it exposes more stats. There's currently no way to access those stats since Query() and RangeQuery() returns only a subset of Prometheus API response. To avoid bloating Query() return values I'm adding this as a new Option where you can pass a pointer to a QueryStats struct that will hold all recorded stats, instead of adding another return value. Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
This PR covers most of what we proposed in #10181. It's a combination of my work, @alanprot's work, and @Harkishen-Singh's work.
It adds new stats:
- `totalQueryableSamples`: The total number of samples read out of the underlying Queryable instance, inclusive of any samples that are buffered between range steps.
- `totalQueryableSamplesPerStep`: The above statistic tracked per step in a range query. These will always add up to the value in `totalQueryableSamples`. The engine does not compute or report these by default. We only do the work and report the data if an engine flag is set (available via `--enable-feature=per-step-stats` from the CLI) and a per-query flag is set (`stats=all` in the request). We introduce a new `QueryOpts` struct in the Engine's API to make having these kinds of per-query options easier in the future.
- `peakSamples`: The high-water mark of the value compared against MaxSamples during the course of query execution.

Per @bboreham's feedback, this PR also changes the query stats response from a struct to an interface and adds a way to override that response. Implementations like Cortex can use this to read stats from the Context and insert them into the response that's sent to the user and/or remove information from the stats response.
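As a usage sketch, the two opt-ins described above would combine roughly like this (the port and endpoint are the Prometheus defaults; treat the exact invocation as illustrative rather than authoritative):

```shell
# Engine-level opt-in: start Prometheus with the feature flag from this PR.
prometheus --enable-feature=per-step-stats

# Per-query opt-in: stats=all requests the per-step sample stats too.
curl 'http://localhost:9090/api/v1/query_range?query=up&start=1645813076&end=1645813376&step=30s&stats=all'
```

Without the engine flag, `stats=all` still returns the always-on totals and timings, just not the per-step breakdown.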
This does not include the series stats we proposed because we have some more thinking to do about how best to represent them. More to come on that front.
The performance impact of the new statistics that are always on should be negligible per our local benchmarking. Even the per-step statistics don't have much impact outside the combination of very simple queries with many steps. I'll upload these benchmark runs in a bit.
An example JSON response with all these stats on, for `max_over_time(up[5m])` with 30-second steps over 5 minutes. The underlying data here started only a minute before running the query.

    {
      "status": "success",
      "data": {
        "resultType": "matrix",
        "result": [
          {
            "metric": {
              "instance": "localhost:9090",
              "job": "prometheus"
            },
            "values": [
              [1645813076, "1"], [1645813106, "1"], [1645813136, "1"],
              [1645813166, "1"], [1645813196, "1"], [1645813226, "1"],
              [1645813256, "1"], [1645813286, "1"], [1645813316, "1"],
              [1645813346, "1"], [1645813376, "1"]
            ]
          }
        ],
        "stats": {
          "timings": {
            "evalTotalTime": 0.006922879,
            "resultSortTime": 2.677e-06,
            "queryPreparationTime": 0.003402187,
            "innerEvalTime": 0.002431834,
            "execQueueTime": 0.000819559,
            "execTotalTime": 0.008257353
          },
          "samples": {
            "totalQueryableSamplesPerStep": [
              [1645813076, "4"], [1645813106, "6"], [1645813136, "8"],
              [1645813166, "10"], [1645813196, "12"], [1645813226, "14"],
              [1645813256, "16"], [1645813286, "18"], [1645813316, "20"],
              [1645813346, "20"], [1645813376, "20"]
            ],
            "totalQueryableSamples": 148,
            "peakSamples": 31
          }
        }
      }
    }