fix: correct duration dashboard panels to use proper PromQL queries #8528
Merged
zirain merged 5 commits into envoyproxy:main on Apr 23, 2026
The duration panels for status_update, resource_apply, and resource_delete metrics were showing cumulative _sum counter values instead of actual durations.

- Avg panels: use rate(sum)/rate(count) for true average duration
- Rename Max panels to p99: use histogram_quantile(0.99, rate(bucket))
- Rename Min panels to p50: use histogram_quantile(0.50, rate(bucket))
- All panels: use sum by(kind) for consistent grouping and lastNotNull calc

Fixes envoyproxy#8439

Signed-off-by: Felipe Sabadini Facina <fsabadini@hotmail.com>
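To illustrate why plotting the raw _sum series grows without bound while rate(sum)/rate(count) stays at the actual average, here is a minimal Python sketch. The sample values and the helper name `avg_duration` are hypothetical and not taken from the dashboard; the sketch also ignores the counter-reset handling and extrapolation that Prometheus `rate()` performs.

```python
# Hypothetical scrapes of one histogram's _sum and _count series:
# (timestamp_seconds, cumulative_sum_seconds, cumulative_count)
samples = [
    (0, 10.0, 100),
    (60, 16.0, 160),
    (120, 22.0, 220),
    (300, 40.0, 400),
]


def avg_duration(samples, window_seconds):
    """Approximate rate(_sum[w]) / rate(_count[w]) at the last sample.

    Simplification: only the first and last samples inside the window are
    used; counter resets and extrapolation are not handled.
    """
    end_ts = samples[-1][0]
    in_window = [s for s in samples if s[0] >= end_ts - window_seconds]
    if len(in_window) < 2:
        return None  # not enough data to compute a rate
    t0, sum0, count0 = in_window[0]
    t1, sum1, count1 = in_window[-1]
    elapsed = t1 - t0
    sum_rate = (sum1 - sum0) / elapsed        # seconds spent per second
    count_rate = (count1 - count0) / elapsed  # operations per second
    if count_rate == 0:
        return None  # no operations in the window
    return sum_rate / count_rate              # average seconds per operation


raw_sum = samples[-1][1]               # what the old panels plotted
average = avg_duration(samples, 300)   # what the fixed Avg panels plot

print(f"raw _sum: {raw_sum} s, average per operation: {average} s")
```

In this made-up data every operation takes about 0.1 s, yet the raw counter has already reached 40 s and keeps climbing with each scrape, which matches the symptom described above.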
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #8528      +/-   ##
==========================================
- Coverage   74.41%   74.40%   -0.01%
==========================================
  Files         246      246
  Lines       39194    39194
==========================================
- Hits        29167    29164       -3
- Misses       8008     8010       +2
- Partials     2019     2020       +1
Contributor
@felipesabadini can you run
Signed-off-by: Isaac Wilson <isaac.wilson514@gmail.com>
Signed-off-by: Isaac Wilson <isaac.wilson514@gmail.com>
jukie approved these changes on Apr 23, 2026
Contributor
/retest
zirain approved these changes on Apr 23, 2026
skos-ninja pushed a commit to skos-ninja/envoy-gateway that referenced this pull request on May 1, 2026
…nvoyproxy#8528)

* fix: correct duration dashboard panels to use proper PromQL queries

The duration panels for status_update, resource_apply, and resource_delete metrics were showing cumulative _sum counter values instead of actual durations.

- Avg panels: use rate(sum)/rate(count) for true average duration
- Rename Max panels to p99: use histogram_quantile(0.99, rate(bucket))
- Rename Min panels to p50: use histogram_quantile(0.50, rate(bucket))
- All panels: use sum by(kind) for consistent grouping and lastNotNull calc

Fixes envoyproxy#8439

Signed-off-by: Felipe Sabadini Facina <fsabadini@hotmail.com>

* Update default.out.yaml

Signed-off-by: Isaac Wilson <isaac.wilson514@gmail.com>

* Update e2e.out.yaml

Signed-off-by: Isaac Wilson <isaac.wilson514@gmail.com>

---------

Signed-off-by: Felipe Sabadini Facina <fsabadini@hotmail.com>
Signed-off-by: Isaac Wilson <isaac.wilson514@gmail.com>
Co-authored-by: Isaac Wilson <isaac.wilson514@gmail.com>
Signed-off-by: Jake Oliver <jake@truelayer.com>
What type of PR is this?
/kind bug

What this PR does / why we need it:
Fixes the Envoy Gateway Global dashboard duration panels that were displaying cumulative `_sum` counter values instead of actual durations. Values were showing in minutes and growing over time instead of reflecting actual operation latency.

Changes:
- Avg panels: changed from raw `_sum` to `sum by(kind) (rate(_sum[5m])) / sum by(kind) (rate(_count[5m]))` for true average duration per operation
- Max panels (renamed to p99): changed to `histogram_quantile(0.99, sum by(le, kind) (rate(_bucket[5m])))` for 99th percentile latency
- Min panels (renamed to p50): changed to `histogram_quantile(0.50, sum by(le, kind) (rate(_bucket[5m])))` for 50th percentile (median) latency
- All panels: set `calc` reducers to `lastNotNull` and `editorMode` to `code`

Affected metrics: `status_update_duration_seconds`, `resource_apply_duration_seconds`, `resource_delete_duration_seconds`.

Which issue(s) this PR fixes:
Fixes #8439

Release Notes: No
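For readers unfamiliar with how the p99 and p50 panels derive a latency from bucket counters, the following Python sketch approximates the linear interpolation that `histogram_quantile` applies to cumulative `le` buckets. It is a simplified illustration for a single `kind`, not the Prometheus implementation; the function name and the bucket values are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, value) buckets.

    `buckets` mirrors the `le` series of one histogram: each value counts
    observations at or below the bound, and the last bound is +Inf.
    Assumes 0 < q <= 1 and that the lowest bucket starts at 0.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1] for this sketch")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_value = 0.0, 0.0
    for bound, value in buckets:
        if value >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the
                # highest finite bound instead of interpolating.
                return prev_bound
            fraction = (rank - prev_value) / (value - prev_value)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_value = bound, value
    return math.nan


# Hypothetical per-bucket rates for one kind, as rate(_bucket[5m]) would yield.
example = [(0.1, 20.0), (0.5, 80.0), (1.0, 98.0), (math.inf, 100.0)]

p50 = histogram_quantile(0.50, example)  # interpolated inside the 0.1-0.5 bucket
p99 = histogram_quantile(0.99, example)  # lands in +Inf, so capped at 1.0

print(f"p50 ~ {p50:.3f} s, p99 = {p99} s")
```

The estimate is only as precise as the bucket boundaries allow, which is one reason the panels are labeled p50 and p99 rather than Min and Max: a histogram cannot recover the true minimum or maximum observation.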