feat(grafana): importable dashboard JSON for Charon metrics by obchain · Pull Request #54 · obchain/Charon

obchain · 2026-04-22T11:39:25Z

Summary

- deploy/grafana/charon.json: schema-v39 dashboard importable into Grafana 10.4+ or the Grafana Cloud free tier
- 9 panels covering every metric the charon-metrics exporter actually emits: scanner, pipeline latency, position buckets, queue depth, cumulative profit, simulations, opportunity funnel, per-tx profit heatmap, build info
- Templating: $datasource (Prometheus picker), $chain and $instance (auto-populated from label_values)
- README gets a 3-step import section

Panels intentionally omitted (for now)

The issue body lists Mempool, Gas, and RPC-latency panels. The exporter doesn't emit the backing series yet, so the panels are NOT included — a blank panel is worse than a missing one. Follow-up PRs will ship the metrics and the panels together:

- Mempool (txs/min, impacted positions): requires wiring into charon-scanner::mempool
- Gas (base fee, priority fee, tx cost cents): requires wiring into charon-executor::gas
- RPC health (latency, error rate): requires instrumentation in the provider layer

Dashboard UID

charon-v0, tagged charon, liquidation, defi. Re-importing replaces the existing copy rather than duplicating.

Test plan

- python3 -m json.tool parses cleanly
- Schema version 39 (Grafana 10.4+)
- Cargo workspace sweep green (no code changes that would break anything)
- Manual smoke: import into Grafana Cloud with a Prometheus data source pointing at a running charon listen (next session, requires a Grafana Cloud account)
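
The first two checks in the test plan can be scripted together. The following is a minimal sketch, assuming the dashboard file carries the usual top-level `schemaVersion` and `uid` keys; the helper name `check_dashboard` is hypothetical and is not part of this PR.

```python
import json


def check_dashboard(text: str, min_schema: int = 39,
                    expected_uid: str = "charon-v0") -> list[str]:
    """Return a list of problems found in a dashboard JSON document."""
    try:
        dash = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(dash, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    # A missing schemaVersion is treated as 0, so it is reported as too old.
    if dash.get("schemaVersion", 0) < min_schema:
        problems.append(f"schemaVersion below {min_schema}")
    # A stable UID is what lets a re-import replace the existing copy.
    if dash.get("uid") != expected_uid:
        problems.append(f"uid is not {expected_uid!r}")
    return problems


if __name__ == "__main__":
    # Stand-in document; in practice read deploy/grafana/charon.json instead.
    sample = json.dumps({"uid": "charon-v0", "schemaVersion": 39, "panels": []})
    print(check_dashboard(sample))
```

This only covers syntax and two metadata fields; it says nothing about whether the panel queries are valid PromQL.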

Stacked PR

Base is feat/25-foundry-fork-tests (PR #53). Merge order: #46 → #50 → #51 → #52 → #53 → this.

Closes #49.

New `deploy/grafana/charon.json`: one-click-importable dashboard covering every metric the `charon-metrics` exporter actually emits.

Panels:
- Scanner: blocks / sec per chain
- Pipeline: block-latency p50 and p95 (from the histogram)
- Scanner: live position counts stacked by bucket (green/yellow/red for healthy/near_liq/liquidatable)
- Executor: queue depth stat + cumulative profit stat (USD)
- Executor: simulations / min stacked by result (ok/revert/error)
- Executor: opportunities queued vs dropped (per drop stage)
- Executor: per-opportunity profit distribution as a heatmap
- Build info: version + git_sha in a table

Templating:
- `$datasource`: Prometheus data source picker
- `$chain`: auto-populated from `charon_scanner_blocks_total` labels
- `$instance`: auto-populated from `charon_build_info` labels

Aspirational panels from #49 body that are NOT included yet because the exporter doesn't emit the underlying series:
- Mempool txs / min, impacted positions flagged
- Gas (base fee, priority fee, tx cost in cents)
- RPC latency p50/p95, error rate per endpoint

These will be added as follow-up PRs alongside the metrics that feed them; shipping the panels blank would only clutter the dashboard.

Dashboard UID `charon-v0` and stable tags mean re-importing replaces rather than duplicates the dashboard in Grafana.

README gains a three-step import section pointing at the new file.

Closes #49.

The p50/p95 block-duration quantiles already use a [5m] range, giving ~100 observations at BSC's 3s cadence — enough to keep the estimate stable between scrapes. Extend the panel description so the rationale is visible to operators reading the dashboard. Closes #279
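
The "~100 observations" figure follows directly from the window length and the block cadence. A small sketch of that arithmetic, assuming one histogram observation per block (the function name is illustrative only):

```python
def observations_in_window(window_seconds: int, block_interval_seconds: int) -> int:
    """Number of per-block observations that fall inside a range window."""
    return window_seconds // block_interval_seconds


# A [5m] range at a 3 s block cadence.
print(observations_in_window(5 * 60, 3))  # 100
# A [1m] range would leave far fewer samples per quantile estimate.
print(observations_in_window(60, 3))      # 20
```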

The cumulative-profit panel previously queried the histogram _sum accumulator directly and divided by 100. _sum resets on process restart, so a mid-window redeploy rendered as a sharp step-down indistinguishable from a real loss. Switch to increase(..._sum[$__range]) / 100 so the counter-reset semantics of increase() absorb restarts and the window tracks the dashboard time picker. Title and description updated to match. Closes #276
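
To illustrate why the raw `_sum` steps down on restart while `increase()` does not, here is a simplified model of reset-aware counter growth. It omits the extrapolation Prometheus applies at the window edges, and the sample values are invented for the example; the division by 100 assumes the histogram records cents, as the commit message implies.

```python
def increase(samples: list[float]) -> float:
    """Reset-aware growth over time-ordered counter samples (no extrapolation)."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter began again
        # from zero, so the whole current value counts as new growth.
        total += cur if cur < prev else cur - prev
    return total


# Hypothetical _sum samples in cents, with a redeploy after the second scrape.
sum_samples = [1000.0, 1500.0, 200.0, 600.0]

print(sum_samples[-1] - sum_samples[0])  # -400.0: naive delta looks like a loss
print(increase(sum_samples) / 100)       # 11.0: USD earned across the restart
```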

The opportunity funnel panel aggregated queued as a single total while dropped was broken out by stage. The two series rendered on incomparable axes: one line vs four, no way to read stage-level loss rate against intake. Wrap the queued query in label_replace(..., "stage", "queued") so it joins the dropped series under one stage-partitioned legend. Colour overrides stay inert — Grafana picks from the classic palette for the five stage labels. Closes #280
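
The commit message elides some arguments: PromQL's full signature is `label_replace(v, dst_label, replacement, src_label, regex)`, and passing an empty `src_label` with an empty `regex` is a common way to attach a constant label. The sketch below models that behaviour for a single series in Python. It handles literal replacements only (no `$1` capture expansion), and the four drop-stage names are placeholders, not the exporter's real label values.

```python
import re


def label_replace(labels: dict[str, str], dst: str, replacement: str,
                  src: str, regex: str) -> dict[str, str]:
    """Simplified single-series model of PromQL label_replace."""
    # PromQL anchors the regex against the whole source label value;
    # a missing label is treated as the empty string.
    if re.fullmatch(regex, labels.get(src, "")) is None:
        return dict(labels)
    out = dict(labels)
    out[dst] = replacement
    return out


# The aggregated "queued" series has no stage label before the rewrite.
queued = label_replace({}, "stage", "queued", "", "")

# Placeholder stage names standing in for the four dropped series.
dropped = [{"stage": name} for name in ("drop_a", "drop_b", "drop_c", "drop_d")]

legend = sorted(series["stage"] for series in [queued, *dropped])
print(legend)  # ['drop_a', 'drop_b', 'drop_c', 'drop_d', 'queued']
```

With every series carrying a `stage` label, the panel can render all five under one legend.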

Template variables $chain and $instance resolve via label_values, which returns empty until the bot is scraping. Without an explicit default, fresh imports rendered every panel as No Data. Set allValue='.*' on both, mark the current selection as All, and add a short description on each variable plus the dashboard so operators see that the All-default exists by design. Panels now resolve on first import and auto-refine once labels populate. Closes #282
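
As a rough illustration of the variable shape this describes, the sketch below builds one query variable as a Python dict and serialises it. The field names follow Grafana's dashboard JSON model as commonly exported, but the committed file may differ in detail; the label names `chain` and `instance` are assumptions based on the variable names.

```python
import json


def query_variable(name: str, metric: str, label: str) -> dict:
    """Build a Prometheus-backed template variable that defaults to All."""
    return {
        "name": name,
        "type": "query",
        "datasource": {"type": "prometheus", "uid": "${datasource}"},
        "query": f"label_values({metric}, {label})",
        "includeAll": True,
        # With allValue set, selecting All interpolates ".*" into matchers
        # such as chain=~"$chain", so panels resolve before labels exist.
        "allValue": ".*",
        "current": {"selected": True, "text": "All", "value": "$__all"},
    }


variables = [
    query_variable("chain", "charon_scanner_blocks_total", "chain"),
    query_variable("instance", "charon_build_info", "instance"),
]
print(json.dumps({"templating": {"list": variables}}, indent=2))
```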

Dashboard JSON is schema v39, which requires Grafana 10.4.x or newer (or any Grafana Cloud org). Earlier 9.x installs reject the import or silently drop panels. Call out the version requirement in the Grafana section so self-hosted operators running a stale Grafana do not hit a cryptic import error. Closes #278

The exporter currently binds 0.0.0.0:9091 without auth (tracked in #213 and #214). Directing operators to point a remote Prometheus at that URL bakes the exposure into the quickstart, leaking profit histograms, build SHA, queue depth, and sim results to anyone with network access — on a Hetzner VPS that is the public internet. Add a callout above the import steps: bind to 127.0.0.1 and tunnel, or put an authenticated reverse proxy in front of :9091, before configuring an external scrape. Step 1 echoes the same guidance so someone skim-reading the numbered list does not miss it. Closes #277
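
An operator could sanity-check a bind address before wiring up an external scrape. This is a hypothetical pre-flight helper using only the standard library, not something shipped in this PR; it classifies a `host:port` string by how widely it exposes the listener.

```python
import ipaddress


def exposure(bind: str) -> str:
    """Classify a host:port bind string by how widely it exposes the port."""
    host, _, _port = bind.rpartition(":")
    # Strip the brackets used around IPv6 literals, e.g. "[::1]:9091".
    addr = ipaddress.ip_address(host.strip("[]"))
    if addr.is_loopback:
        return "loopback"
    if addr.is_unspecified:
        return "all-interfaces"
    return "specific-interface"


print(exposure("0.0.0.0:9091"))    # all-interfaces
print(exposure("127.0.0.1:9091"))  # loopback
```

Anything other than `loopback` would need the tunnel or authenticated proxy described above before an external Prometheus is pointed at it.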

- Replace non-existent charon_scanner_positions_by_bucket reference in CharonNoLiquidationAttemptsOneHour annotation with canonical charon_scanner_positions{bucket="near_liq"|"liquidatable"} series. - Fix CharonSimulationFailureRateHigh numerator: the sim_result label values declared in charon-metrics are "ok" / "revert" / "error" — there is no "failure". Use result=~"revert|error" so the ratio is non-zero when simulations actually fail. Names and labels now match crates/charon-metrics/src/lib.rs::names and ::sim_result on main. No Rust changes.
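
The effect of the numerator fix can be reproduced outside Prometheus. The sketch below applies a fully anchored regex to the `result` label, the way a PromQL `=~` matcher does, and computes the failure ratio; the counts are invented for illustration.

```python
import re


def failure_ratio(counts: dict[str, float], matcher: str) -> float:
    """Share of simulations whose result label fully matches the regex."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    failed = sum(v for result, v in counts.items() if re.fullmatch(matcher, result))
    return failed / total


# Hypothetical per-result simulation counts over the alert window.
counts = {"ok": 90.0, "revert": 8.0, "error": 2.0}

print(failure_ratio(counts, "failure"))       # 0.0: no such label value, alert never fires
print(failure_ratio(counts, "revert|error"))  # 0.1
```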

obchain added 6 commits April 23, 2026 15:58

This was referenced Apr 23, 2026

[metrics] Add mempool monitoring series to charon-metrics exporter #300

Closed

[metrics] Add gas fee telemetry series to charon-metrics exporter #301

Closed

[metrics] Add RPC-latency series to charon-metrics exporter #302

Closed

obchain added 3 commits April 23, 2026 19:00

fix(grafana): hide git_sha in build-info until /metrics has auth

83d74c5

exclude git_sha from the build-info table transform and document the deferred mempool/gas/rpc-latency panels in the dashboard description (tracked in #300, #301, #302).

feat(grafana): alerting rules for scanner, queue, sim-fail, drop-rate

f61f72e

five prometheus rules covering bot-down, 1h-zero-liquidations, queue depth spike, simulation failure rate, and opportunity drop rate. load via prometheus rule_files or grafana unified alerting.

ci(grafana): lint dashboard json, schema, promql, alert rules

7b3259a

adds grafana-lint workflow: json.tool parse, dashboard-linter schema and promql check, promtool check rules for alerts.yaml. replaces the pr-test-plan claim that json.tool alone validated the dashboard.

obchain changed the base branch from feat/25-foundry-fork-tests to main April 24, 2026 14:46

obchain added 2 commits April 24, 2026 22:50

Merge remote-tracking branch 'origin/main' into feat/26-grafana-dashb…

d596aad

…oard # Conflicts: # README.md

obchain merged commit ae4dac5 into main Apr 24, 2026
0 of 2 checks passed

obchain mentioned this pull request Apr 25, 2026

ci: grafana-lint --strict fails on main (charon.json) — multiple pre-existing violations #315

Closed

obchain merged 12 commits into main from feat/26-grafana-dashboard
