feat(grafana): importable dashboard JSON for Charon metrics#54
Merged
New `deploy/grafana/charon.json` — one-click-importable dashboard covering every metric the `charon-metrics` exporter actually emits.

Panels:
- Scanner: blocks / sec per chain
- Pipeline: block-latency p50 and p95 (from the histogram)
- Scanner: live position counts stacked by bucket (green/yellow/red for healthy/near_liq/liquidatable)
- Executor: queue depth stat + cumulative profit stat (USD)
- Executor: simulations / min stacked by result (ok/revert/error)
- Executor: opportunities queued vs dropped (per drop stage)
- Executor: per-opportunity profit distribution as a heatmap
- Build info: version + git_sha in a table

Templating:
- `$datasource` — Prometheus data source picker
- `$chain` — auto-populated from `charon_scanner_blocks_total` labels
- `$instance` — auto-populated from `charon_build_info` labels

Aspirational panels from the #49 body that are NOT included yet, because the exporter doesn't emit the underlying series:
- Mempool txs / min, impacted positions flagged
- Gas (base fee, priority fee, tx cost in cents)
- RPC latency p50/p95, error rate per endpoint

These will be added as follow-up PRs alongside the metrics that feed them; shipping the panels blank would only clutter the dashboard.

Dashboard UID `charon-v0` and stable tags mean re-importing replaces rather than duplicates the dashboard in Grafana. README gains a three-step import section pointing at the new file.

Closes #49.
This was referenced Apr 22, 2026
The p50/p95 block-duration quantiles already use a [5m] range, giving ~100 observations at BSC's 3s cadence — enough to keep the estimate stable between scrapes. Extend the panel description so the rationale is visible to operators reading the dashboard. Closes #279
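For reference, the p95 query has this general shape (a sketch only; the histogram metric name used here is an assumption — check the exporter for the real identifier):

```promql
# p95 block duration over a 5m window (~100 samples at BSC's 3s cadence)
histogram_quantile(
  0.95,
  sum by (le, chain) (
    rate(charon_pipeline_block_duration_seconds_bucket{chain=~"$chain"}[5m])
  )
)
```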
The cumulative-profit panel previously queried the histogram _sum accumulator directly and divided by 100. _sum resets on process restart, so a mid-window redeploy rendered as a sharp step-down indistinguishable from a real loss. Switch to increase(..._sum[$__range]) / 100 so the counter-reset semantics of increase() absorb restarts and the window tracks the dashboard time picker. Title and description updated to match. Closes #276
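Side by side, the change looks roughly like this (the metric name is illustrative, not the exporter's exact identifier):

```promql
# Before: raw _sum accumulator; steps down whenever the process restarts
charon_executor_profit_cents_sum / 100

# After: increase() absorbs counter resets, and $__range follows the
# dashboard time picker instead of showing an all-time total
increase(charon_executor_profit_cents_sum[$__range]) / 100
```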
The opportunity funnel panel aggregated queued as a single total while dropped was broken out by stage. The two series rendered on incomparable axes: one line vs four, no way to read stage-level loss rate against intake. Wrap the queued query in label_replace(..., "stage", "queued") so it joins the dropped series under one stage-partitioned legend. Colour overrides stay inert — Grafana picks from the classic palette for the five stage labels. Closes #280
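A sketch of the wrapped query (metric name assumed for illustration):

```promql
# Attach stage="queued" to the intake total so it shares a legend with
# the per-stage dropped series. label_replace with an empty source label
# and empty regex always matches, so the label is added unconditionally.
label_replace(
  sum(rate(charon_executor_opportunities_queued_total[5m])),
  "stage", "queued", "", ""
)
```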
Template variables $chain and $instance resolve via label_values, which returns empty until the bot is scraping. Without an explicit default, fresh imports rendered every panel as No Data. Set allValue='.*' on both, mark the current selection as All, and add a short description on each variable plus the dashboard so operators see that the All-default exists by design. Panels now resolve on first import and auto-refine once labels populate. Closes #282
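In the dashboard JSON, the `$chain` variable ends up looking roughly like this (field subset only, trimmed for illustration):

```json
{
  "name": "chain",
  "type": "query",
  "query": "label_values(charon_scanner_blocks_total, chain)",
  "includeAll": true,
  "allValue": ".*",
  "current": { "selected": true, "text": "All", "value": "$__all" },
  "description": "Defaults to All (.*) so panels render before the bot is scraped."
}
```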
Dashboard JSON is schema v39, which requires Grafana 10.4.x or newer (or any Grafana Cloud org). Earlier 9.x installs reject the import or silently drop panels. Call out the version requirement in the Grafana section so self-hosted operators running a stale Grafana do not hit a cryptic import error. Closes #278
The exporter currently binds 0.0.0.0:9091 without auth (tracked in #213 and #214). Directing operators to point a remote Prometheus at that URL bakes the exposure into the quickstart, leaking profit histograms, build SHA, queue depth, and sim results to anyone with network access — on a Hetzner VPS that is the public internet. Add a callout above the import steps: bind to 127.0.0.1 and tunnel, or put an authenticated reverse proxy in front of :9091, before configuring an external scrape. Step 1 echoes the same guidance so someone skim-reading the numbered list does not miss it. Closes #277
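One safe shape is to keep the exporter loopback-only and scrape through an SSH tunnel. A minimal `prometheus.yml` fragment, with the tunnel command shown as a comment (host names are placeholders):

```yaml
# On the Prometheus host, forward a local port to the VPS loopback first:
#   ssh -N -L 9091:127.0.0.1:9091 ops@<vps-host>
scrape_configs:
  - job_name: charon
    static_configs:
      - targets: ["localhost:9091"]   # reaches the VPS only via the tunnel
```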
This was referenced Apr 23, 2026
Five Prometheus rules covering bot-down, one hour of zero liquidations, queue-depth spike, simulation failure rate, and opportunity drop rate. Load via Prometheus `rule_files` or Grafana unified alerting.
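As a shape reference, one of the five rules might look like this (threshold and exact names here are illustrative; the alerts file in the repo is authoritative):

```yaml
groups:
  - name: charon
    rules:
      - alert: CharonBotDown
        # fires when the charon-metrics scrape target disappears
        expr: up{job="charon"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "charon-metrics has been unreachable for 2 minutes"
```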
Adds a grafana-lint workflow: `json.tool` parse, dashboard-linter schema and PromQL checks, and `promtool check rules` for alerts.yaml. Replaces the PR test-plan claim that `json.tool` alone validated the dashboard.
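A sketch of what such a workflow can look like (action versions, install steps, and the alerts file path are assumptions, not the shipped workflow):

```yaml
name: grafana-lint
on: [pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Dashboard JSON parses
        run: python3 -m json.tool deploy/grafana/charon.json > /dev/null
      - name: Schema and PromQL lint
        run: |
          go install github.com/grafana/dashboard-linter@latest
          "$(go env GOPATH)/bin/dashboard-linter" lint deploy/grafana/charon.json
      - name: Alert rules
        run: promtool check rules deploy/grafana/alerts.yaml
```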
- Replace the non-existent charon_scanner_positions_by_bucket reference in the CharonNoLiquidationAttemptsOneHour annotation with the canonical charon_scanner_positions{bucket="near_liq"|"liquidatable"} series.
- Fix the CharonSimulationFailureRateHigh numerator: the sim_result label values declared in charon-metrics are "ok" / "revert" / "error" — there is no "failure". Use result=~"revert|error" so the ratio is non-zero when simulations actually fail.

Names and labels now match crates/charon-metrics/src/lib.rs::names and ::sim_result on main. No Rust changes.
Summary

- `deploy/grafana/charon.json`: schema-v39 dashboard importable into Grafana 10.x or the Grafana Cloud free tier
- Panels cover every metric the `charon-metrics` exporter actually emits — scanner, pipeline latency, position buckets, queue depth, cumulative profit, simulations, opportunity funnel, per-tx profit heatmap, build info
- Templating: `$datasource` (Prometheus picker), `$chain` and `$instance` (auto-populated from `label_values`)

Panels intentionally omitted (for now)

The issue body lists Mempool, Gas, and RPC-latency panels. The exporter doesn't emit the backing series yet, so the panels are NOT included — a blank panel is worse than a missing one. Follow-up PRs will ship the metrics and the panels together:

- `charon-scanner::mempool`
- `charon-executor::gas`

Dashboard UID

`charon-v0`, tagged `charon`, `liquidation`, `defi`. Re-importing replaces the existing copy rather than duplicating.

Test plan

- `python3 -m json.tool` parses cleanly
- `charon listen` (next session — requires a Grafana Cloud account)

Stacked PR

Base is `feat/25-foundry-fork-tests` (PR #53). Merge order: #46 → #50 → #51 → #52 → #53 → this.

Closes #49.