fix: correct duration dashboard panels to use proper PromQL queries #8528

Merged: zirain merged 5 commits into envoyproxy:main from felipesabadini:fix/dashboard-duration-promql on Apr 23, 2026

@felipesabadini (Contributor)

What type of PR is this?

/kind bug

What this PR does / why we need it:

Fixes the duration panels on the Envoy Gateway Global dashboard, which were
displaying cumulative _sum counter values instead of actual durations. Values ran
into the minutes and kept growing over time instead of reflecting actual operation latency.
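
To illustrate why a raw _sum reading keeps climbing while a rate-based ratio does not, here is a minimal sketch in Python. The sample values are invented for illustration, and `per_second_rate` is a simplified stand-in for PromQL's `rate()` (no counter-reset handling or extrapolation), not the real implementation.

```python
# Hypothetical samples of a histogram's cumulative counters, taken
# 5 minutes (300 s) apart. All numbers are made up for illustration.
WINDOW_SECONDS = 300.0

sum_start, sum_end = 118.0, 119.2        # *_sum: total seconds spent since start
count_start, count_end = 4000.0, 4040.0  # *_count: total operations since start


def per_second_rate(start: float, end: float, window: float) -> float:
    """Simplified stand-in for rate(): increase per second over the window."""
    return (end - start) / window


# What a panel shows when it plots the raw counter: a lifetime total.
raw_sum_reading = sum_end

# What rate(sum) / rate(count) yields: mean seconds per operation in the window.
avg_duration = (
    per_second_rate(sum_start, sum_end, WINDOW_SECONDS)
    / per_second_rate(count_start, count_end, WINDOW_SECONDS)
)

print(f"raw _sum reading: {raw_sum_reading:.1f} s")
print(f"average duration over window: {avg_duration * 1000:.1f} ms")
```

With these assumed numbers the raw counter reads about two minutes, while the operations in the window averaged a few tens of milliseconds each.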

Changes:

  • Avg panels (82, 220, 224): Changed queries from the raw `_sum` to `sum by(kind) (rate(_sum[5m])) / sum by(kind) (rate(_count[5m]))`, which gives the average duration per operation over the trailing 5-minute window
  • Max → p99 panels (83, 221, 225): Renamed and changed queries to `histogram_quantile(0.99, sum by(le, kind) (rate(_bucket[5m])))` for 99th percentile latency
  • Min → p50 panels (84, 222, 226): Renamed and changed queries to `histogram_quantile(0.50, sum by(le, kind) (rate(_bucket[5m])))` for 50th percentile (median) latency
  • All 9 panels: Updated Grafana calc reducers to `lastNotNull` and `editorMode` to `code`
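
For readers unfamiliar with how a quantile is estimated from `_bucket` series, the following Python sketch approximates the idea: locate the bucket containing the target rank, then interpolate linearly inside it. It is a simplified illustration, not Prometheus's actual code. It assumes cumulative buckets sorted by upper bound, positive bounds, at least one finite bucket followed by +Inf, and 0 <= q <= 1. The bucket values are invented.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    for i, (upper, cumulative) in enumerate(buckets):
        if cumulative >= rank:
            break
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    below = buckets[i - 1][1] if i > 0 else 0.0
    in_bucket = cumulative - below
    if in_bucket == 0:
        return lower
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - below) / in_bucket


# Hypothetical cumulative bucket values (le in seconds, observations <= le).
example = [(0.1, 10.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
p50 = histogram_quantile(0.50, example)
p99 = histogram_quantile(0.99, example)
print(f"p50 ~ {p50:.2f} s, p99 ~ {p99:.2f} s")
```

Because the result is interpolated within a bucket, its accuracy depends on how finely the histogram's bucket boundaries cover the latency range of interest.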

Affected metrics: status_update_duration_seconds, resource_apply_duration_seconds, resource_delete_duration_seconds.
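
As a sketch of how the abbreviated queries above expand for each affected metric, the Python helper below assembles the three expressions by appending the `_sum`, `_count`, and `_bucket` suffixes to the metric name. The helper name `duration_queries` is hypothetical, and the exact series names or label matchers in the dashboard JSON may differ from what is assumed here.

```python
AFFECTED_METRICS = (
    "status_update_duration_seconds",
    "resource_apply_duration_seconds",
    "resource_delete_duration_seconds",
)


def duration_queries(metric: str, window: str = "5m") -> dict[str, str]:
    """Build the avg, p99, and p50 expressions for one histogram metric."""

    def rate_by(suffix: str, labels: str) -> str:
        # e.g. sum by(kind) (rate(<metric>_sum[5m]))
        return f"sum by({labels}) (rate({metric}_{suffix}[{window}]))"

    return {
        "avg": f"{rate_by('sum', 'kind')} / {rate_by('count', 'kind')}",
        "p99": f"histogram_quantile(0.99, {rate_by('bucket', 'le, kind')})",
        "p50": f"histogram_quantile(0.50, {rate_by('bucket', 'le, kind')})",
    }


for name in AFFECTED_METRICS:
    for panel, expr in duration_queries(name).items():
        print(f"{name} [{panel}]: {expr}")
```

Note that the quantile queries keep `le` in the `sum by(...)` clause; `histogram_quantile` needs the bucket boundaries to survive aggregation.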

Which issue(s) this PR fixes:

Fixes #8439

Release Notes: No

  The duration panels for status_update, resource_apply, and resource_delete
  metrics were showing cumulative _sum counter values instead of actual durations.

  - Avg panels: use rate(sum)/rate(count) for true average duration
  - Rename Max panels to p99: use histogram_quantile(0.99, rate(bucket))
  - Rename Min panels to p50: use histogram_quantile(0.50, rate(bucket))
  - All panels: use sum by(kind) for consistent grouping and lastNotNull calc

  Fixes envoyproxy#8439

Signed-off-by: Felipe Sabadini Facina <fsabadini@hotmail.com>
@felipesabadini requested a review from a team as a code owner on March 15, 2026 16:58

netlify Bot commented Mar 15, 2026

Deploy Preview for cerulean-figolla-1f9435 ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | c311d86 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cerulean-figolla-1f9435/deploys/69e9a4a542fa800008ad0d66 |
| 😎 Deploy Preview | https://deploy-preview-8528--cerulean-figolla-1f9435.netlify.app |

codecov Bot commented Mar 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.40%. Comparing base (669d714) to head (c311d86).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8528      +/-   ##
==========================================
- Coverage   74.41%   74.40%   -0.01%     
==========================================
  Files         246      246              
  Lines       39194    39194              
==========================================
- Hits        29167    29164       -3     
- Misses       8008     8010       +2     
- Partials     2019     2020       +1     

@arkodg added this to the v1.8.0-rc.1 Release milestone on Mar 23, 2026
jukie (Contributor) commented Apr 8, 2026

@felipesabadini can you run `make gen-check` and commit the changes? This LGTM but CI is failing.

jukie added 3 commits April 22, 2026 21:45
Signed-off-by: Isaac Wilson <isaac.wilson514@gmail.com>
Signed-off-by: Isaac Wilson <isaac.wilson514@gmail.com>
@jukie requested review from a team on April 23, 2026 04:49
jukie (Contributor) commented Apr 23, 2026

/retest

@zirain merged commit 9d919e5 into envoyproxy:main on Apr 23, 2026
58 of 61 checks passed
skos-ninja pushed a commit to skos-ninja/envoy-gateway that referenced this pull request May 1, 2026
…nvoyproxy#8528)

* fix: correct duration dashboard panels to use proper PromQL queries
  The duration panels for status_update, resource_apply, and resource_delete
  metrics were showing cumulative _sum counter values instead of actual durations.

  - Avg panels: use rate(sum)/rate(count) for true average duration
  - Rename Max panels to p99: use histogram_quantile(0.99, rate(bucket))
  - Rename Min panels to p50: use histogram_quantile(0.50, rate(bucket))
  - All panels: use sum by(kind) for consistent grouping and lastNotNull calc

  Fixes envoyproxy#8439

Signed-off-by: Felipe Sabadini Facina <fsabadini@hotmail.com>

* Update default.out.yaml

Signed-off-by: Isaac Wilson <isaac.wilson514@gmail.com>

* Update e2e.out.yaml

Signed-off-by: Isaac Wilson <isaac.wilson514@gmail.com>

---------

Signed-off-by: Felipe Sabadini Facina <fsabadini@hotmail.com>
Signed-off-by: Isaac Wilson <isaac.wilson514@gmail.com>
Co-authored-by: Isaac Wilson <isaac.wilson514@gmail.com>
Signed-off-by: Jake Oliver <jake@truelayer.com>

Development

Successfully merging this pull request may close these issues.

Dashboard queries for duration panels show cumulative totals instead of actual durations

4 participants