[grafana] Dashboard quantile panels return NaN/empty due to broken histogram buckets (PR #50 #217 #218 unresolved) #275

@obchain

Refs #54

Location

deploy/grafana/charon.json — Panel 2 (block_duration_seconds p50/p95), Panel 8 (profit_usd_cents heatmap)

Problem

PR #50 shipped charon_pipeline_block_duration_seconds and charon_executor_profit_usd_cents as Prometheus histograms using default Prometheus buckets (0.005s, 0.01s, ..., 10.0s).

Issues #217 and #218 flagged these buckets as incorrect for their domains:

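  • block_duration_seconds on BSC has a natural value of ~3s. The default upper bounds are concentrated below 1s; near 3s the only boundaries are 2.5s, 5s and 10s, so almost all observations pile into the single 2.5s to 5s bucket and anything slower than 10s goes to +Inf. p50/p95 are then interpolated inside that one wide bucket and say little about the real distribution.
  • profit_usd_cents observations for real Venus liquidations range from ~5 (0.05 USD) to 5,000,000+ (50,000 USD). The default buckets (0.005 to 10.0) top out more than five orders of magnitude below the upper end of that range, so every observation above 10 cents lands in +Inf, and any quantile located there is reported as 10, the highest finite bound.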
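
To check where a given observation lands, the bucket lookup can be reproduced offline. The sketch below is only an illustration: it assumes the exporter uses the standard client-library default bounds listed in the Problem section, and `bucket_for` is a hypothetical helper, not part of charon or any Prometheus client.

```python
import bisect
import math

# Default upper bounds shipped by the official Prometheus client libraries.
DEFAULT_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]


def bucket_for(value, bounds=DEFAULT_BUCKETS):
    """Return the smallest upper bound `le` with value <= le, or +Inf."""
    i = bisect.bisect_left(bounds, value)
    return bounds[i] if i < len(bounds) else math.inf


print(bucket_for(3.0))        # 5.0  (a ~3s block duration)
print(bucket_for(5))          # 5.0  (a 5-cent profit)
print(bucket_for(5_000_000))  # inf  (a 50,000 USD profit)
```

Running the same lookup over a sample of real observations is a quick way to see how many distinct buckets a metric actually uses.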

This PR ships a dashboard that bakes in both broken panels as official tooling. On first import, Panel 2 will not show usable quantiles (NaN, or values pinned to a default bucket boundary) and Panel 8 (heatmap) will render empty.
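
What a quantile panel reports over such buckets can be approximated offline. The following is a simplified sketch of the documented `histogram_quantile` behavior for classic histograms (linear interpolation within a bucket, the highest finite bound when the quantile falls in +Inf, NaN when there are no observations). It is not the Prometheus implementation, and the helper names and sample data are made up for illustration.

```python
import math

DEFAULT_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]


def cumulative_counts(observations, bounds=DEFAULT_BUCKETS):
    """Build cumulative (le, count) pairs, ending with the +Inf bucket."""
    pairs = [(le, sum(1 for v in observations if v <= le)) for le in bounds]
    pairs.append((math.inf, len(observations)))
    return pairs


def estimate_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative (le, count) pairs."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations at all
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, c) in enumerate(buckets) if c >= rank)
    if idx == len(buckets) - 1:
        return buckets[-2][0]  # quantile sits in +Inf: highest finite bound
    lower = buckets[idx - 1][0] if idx > 0 else 0.0
    below = buckets[idx - 1][1] if idx > 0 else 0
    upper, count = buckets[idx]
    return lower + (upper - lower) * (rank - below) / (count - below)


blocks = cumulative_counts([3.0] * 100)         # synthetic ~3s durations
profits = cumulative_counts([5_000_000] * 100)  # synthetic profits in cents

print(estimate_quantile(0.5, blocks))                 # 3.75
print(estimate_quantile(0.5, profits))                # 10.0
print(estimate_quantile(0.5, cumulative_counts([])))  # nan
```

In this sketch the estimate depends only on the bucket layout, not on where the observations sit inside a bucket, which is why a single wide bucket yields an uninformative quantile.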

Impact

An operator importing this dashboard will see broken panels immediately. There is no in-panel indication that the backing histograms have misconfigured buckets. This is a silent correctness failure on the primary observability output.

Suggested Fix

This dashboard must not merge before either:

  1. The exporter bucket fix for both histograms is merged and ships in the same release (tracking issues: #217, "[metrics] profit_usd_cents histogram uses default Prometheus buckets, useless for Venus profit range", and #218, "[metrics] block_duration_seconds histogram uses default buckets, mismatched for BSC 3s block time"), OR
  2. Panels 2 and 8 are removed from this PR with explicit tracking issues referencing this dashboard panel as a follow-up.
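
For option 1, domain-specific bounds could be generated along the following lines. This is a sketch only: the start, width, factor and count values are placeholders chosen to bracket the ranges quoted in this issue, not the values proposed in #217 or #218, and the two helpers are hypothetical stand-ins modeled on the `LinearBuckets` / `ExponentialBuckets` helpers in the Go client. The exporter's own client library may offer equivalents.

```python
def linear_buckets(start, width, count):
    """Return `count` upper bounds: start, start + width, ..."""
    return [start + i * width for i in range(count)]


def exponential_buckets(start, factor, count):
    """Return `count` upper bounds: start, start * factor, ..."""
    return [start * factor**i for i in range(count)]


# Placeholder: 0.5s steps from 0.5s to 6.0s, bracketing the ~3s block time.
BLOCK_DURATION_BUCKETS = linear_buckets(0.5, 0.5, 12)

# Placeholder: one bound per decade from 1 cent to 10,000,000 cents.
PROFIT_USD_CENTS_BUCKETS = exponential_buckets(1, 10, 8)

print(BLOCK_DURATION_BUCKETS)
print(PROFIT_USD_CENTS_BUCKETS)
```

Linear spacing suits a metric clustered around one value, while exponential spacing suits one spanning several orders of magnitude; finer steps cost one extra time series per bound.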

Shipping the dashboard before the exporter is fixed hardcodes incorrect visualization expectations into operator runbooks.

Labels

  • bug: Something isn't working
  • layer:devops: CI / deploy / infra / telemetry
  • priority:p0-blocker: Blocks the critical path
  • status:ready: Scoped and ready to pick up
  • type:feature: New capability or deliverable
