replay: metrics endpoint exits too fast for Prometheus to scrape #422

@obchain

Symptom

replay --block N --borrower-file … emits 4 JSON records, logs replay: complete emitted=4, and exits in ~0.5s. Prometheus is typically scraping every 15s (the interval in its sample config; the built-in default is 1m), so the panels (Queue depth, Profit predicted, simulations/min) never see replay data. The dashboard shows "No data" for every replay run, even when the bot identified $39k+ of liquidatable opportunities. Operators have to read raw stdout to verify the demo, which defeats the dashboard's purpose.
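To put a number on the symptom, here is a rough back-of-the-envelope sketch. It assumes the exporter is reachable for the whole process lifetime and that the scrape phase is uniformly random relative to process start; the function name is illustrative and not part of the bot.

```python
def scrape_hit_probability(lifetime_secs: float, scrape_interval_secs: float) -> float:
    """Chance that at least one scrape falls inside the process lifetime,
    assuming the scrape phase is uniformly random relative to process start."""
    if scrape_interval_secs <= 0:
        raise ValueError("scrape interval must be positive")
    # A window of length t contains a tick of period T with probability t/T, capped at 1.
    return min(1.0, max(0.0, lifetime_secs) / scrape_interval_secs)


print(round(scrape_hit_probability(0.5, 15), 3))  # 0.033
print(scrape_hit_probability(30.5, 15))           # 1.0
```

Under those assumptions a 0.5s run is scraped roughly 1 time in 30, while any lifetime of at least one full interval is always scraped.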

Proposed fix

Add a --hold-secs <N> flag to the replay subcommand. After the one-shot pipeline completes, keep the metrics exporter alive for N seconds so that at least one Prometheus scrape lands; N needs to be at least one scrape interval for that to be guaranteed. Default to 0 (current behaviour, CI / smoke-test friendly). Operators following the Notion cheat sheet pass --hold-secs 30 for a panel-friendly demo.

Also start the metrics exporter at the top of the replay path (today it only runs under listen), so the held-open window actually serves /metrics.
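The two changes above (start the exporter first, then hold it open after the pipeline finishes) can be sketched roughly as follows. This is a minimal stdlib-only illustration, not the bot's actual code: the `METRICS` store, the `queue_depth` metric name, the `--metrics-port` flag, and `run_replay_pipeline` are all hypothetical stand-ins.

```python
import argparse
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-process gauge store; a real bot would use its metrics library.
METRICS = {"queue_depth": 0}
METRICS_LOCK = threading.Lock()


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        with METRICS_LOCK:
            body = "".join(f"{k} {v}\n" for k, v in sorted(METRICS.items())).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep stdout clean for the JSON records


def start_exporter(port: int) -> ThreadingHTTPServer:
    """Serve /metrics on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def run_replay_pipeline() -> int:
    """Stand-in for the one-shot replay; pretends 4 records were emitted."""
    emitted = 4
    with METRICS_LOCK:
        METRICS["queue_depth"] = emitted
    return emitted


def replay(hold_secs: float, port: int = 0) -> ThreadingHTTPServer:
    server = start_exporter(port)  # exporter starts before the pipeline runs
    emitted = run_replay_pipeline()
    print(f"replay: complete emitted={emitted}")
    if hold_secs > 0:
        time.sleep(hold_secs)  # keep /metrics up so a scrape can land
    return server


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="replay")
    parser.add_argument("--hold-secs", type=float, default=0.0)
    parser.add_argument("--metrics-port", type=int, default=9100)  # hypothetical flag
    args = parser.parse_args(argv)
    server = replay(args.hold_secs, args.metrics_port)
    server.shutdown()
    server.server_close()
    return 0
```

With the default of 0 the process still exits immediately after the pipeline, so CI timing is unchanged; calling `main(["--hold-secs", "30"])` would keep `/metrics` reachable for about 30 seconds after the last record is emitted.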

Impact

Pure UX fix. The $39k of profit identified by replay against block 91323624 is invisible on Grafana today; the dashboard shows zeros or "No data". With --hold-secs 30 the operator sees Queue depth = 4 and Profit predicted = sum(predicted_net_profit_usd_cents), and the Executor simulations/min panel lights up, validating that the bot would actually broadcast on a real chain.
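For cross-checking the panels against stdout, the expected values can be recomputed from the emitted records. The sketch below assumes replay writes one JSON object per line and that each record carries a `predicted_net_profit_usd_cents` field; the sample values are placeholders, not real replay output.

```python
import json


def summarize(lines):
    """Return (record_count, total_predicted_profit_cents) from JSON-lines output."""
    records = [json.loads(line) for line in lines if line.strip()]
    total_cents = sum(r["predicted_net_profit_usd_cents"] for r in records)
    return len(records), total_cents


# Placeholder records for illustration only.
sample = [
    '{"predicted_net_profit_usd_cents": 100}',
    '{"predicted_net_profit_usd_cents": 200}',
    '{"predicted_net_profit_usd_cents": 300}',
]
count, total = summarize(sample)
print(count, total)  # 3 600
```

The count should match the Queue depth panel and the total should match Profit predicted, provided the exporter reports the same units (cents).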


Labels: enhancement (New feature or request)
