feat: add rollup_blocks_seen counter and last_rollup_block_seen_timestamp gauge #275
Merged
… gauge Records two metrics on the ingress side of the EnvTask rollup-block subscription, labeled by host_chain_id. They advance even when the builder skips block construction, so operators can detect a silently-dead WS subscription that previously looked identical to a builder choosing not to build. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
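A minimal sketch of why recording on the ingress side matters, assuming plain atomics as a stand-in for the real metrics backend. `IngressMetrics`, `on_rollup_block`, and the `should_build` flag are hypothetical names used only for illustration, and the chain-id label is left out for brevity.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the two metrics. The real builder records them
/// through its metrics backend, not through raw atomics.
#[derive(Default)]
pub struct IngressMetrics {
    blocks_seen: AtomicU64,
    last_seen_unix_secs: AtomicU64,
}

impl IngressMetrics {
    /// Record one observed rollup-block notification.
    pub fn record_block_seen(&self, now_unix_secs: u64) {
        self.blocks_seen.fetch_add(1, Ordering::Relaxed);
        self.last_seen_unix_secs.store(now_unix_secs, Ordering::Relaxed);
    }

    pub fn blocks_seen(&self) -> u64 {
        self.blocks_seen.load(Ordering::Relaxed)
    }

    pub fn last_seen_unix_secs(&self) -> u64 {
        self.last_seen_unix_secs.load(Ordering::Relaxed)
    }
}

/// Handle one notification (hypothetical). Returns whether a block was built.
pub fn on_rollup_block(metrics: &IngressMetrics, now_unix_secs: u64, should_build: bool) -> bool {
    // Record first, so the metrics advance even when building is skipped.
    metrics.record_block_seen(now_unix_secs);
    if !should_build {
        return false;
    }
    // Block construction would happen here.
    true
}
```

Because the recording happens before the skip decision, a flat counter means no notifications are arriving at all, not merely that the builder declined to build.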
The chain-following subscription is to rollup blocks (ru_provider.subscribe_blocks), not host blocks. Rename the metric, label, and helper to match what is actually observed; the WS that can silently die is the rollup-block one. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fraser999
approved these changes
May 8, 2026
    match (key.kind(), key.key().name()) {
        (MetricKind::Counter, ROLLUP_BLOCKS_SEEN) => counter_value = Some(value),
        (MetricKind::Gauge, LAST_ROLLUP_BLOCK_SEEN_TIMESTAMP) => gauge_value = Some(value),
        _ => {}
Contributor
nit: we could always panic here rather than no-op
Member
Author
yep i think this is probably preferable to silently doing nothing
Address review nit: the local recorder in record_rollup_block_seen_advances_counter_and_gauge should only ever see the two metrics under test. Silently no-op'ing on anything else hides isolation bugs. Bind the catch-all so the panic names what leaked in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
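The catch-all change described in this commit could look roughly like the sketch below. It is not the code from the PR: `MetricKind`, `Entry`, and `extract` are simplified, hypothetical stand-ins for the recorder snapshot types, kept to the standard library so the snippet is self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MetricKind {
    Counter,
    Gauge,
    Histogram,
}

pub const ROLLUP_BLOCKS_SEEN: &str = "signet.builder.rollup_blocks_seen";
pub const LAST_ROLLUP_BLOCK_SEEN_TIMESTAMP: &str =
    "signet.builder.last_rollup_block_seen_timestamp";

/// Hypothetical simplified snapshot entry: (kind, metric name, value).
pub type Entry = (MetricKind, &'static str, f64);

/// Pull the two metrics under test out of a snapshot.
/// Panics if anything else was recorded.
pub fn extract(snapshot: &[Entry]) -> (Option<f64>, Option<f64>) {
    let mut counter_value = None;
    let mut gauge_value = None;
    for &(kind, name, value) in snapshot {
        match (kind, name) {
            (MetricKind::Counter, ROLLUP_BLOCKS_SEEN) => counter_value = Some(value),
            (MetricKind::Gauge, LAST_ROLLUP_BLOCK_SEEN_TIMESTAMP) => gauge_value = Some(value),
            // Bind the catch-all so the panic message names what leaked in.
            other => panic!("unexpected metric in local recorder: {:?}", other),
        }
    }
    (counter_value, gauge_value)
}
```

Binding the fallthrough arm to a name, instead of using `_`, is what lets the failure message identify the stray metric.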

Description
Adds two metrics on the ingress side of the `EnvTask` rollup-block subscription so operators can detect a silently-dead WS subscription that previously looked identical to a builder choosing not to build:

- `signet.builder.rollup_blocks_seen` (counter): incremented on every observed rollup-block notification, before any Quincey/profitability/timestamp logic
- `signet.builder.last_rollup_block_seen_timestamp` (gauge): wall-clock Unix seconds at the most recent observation

Both labeled with `rollup_chain_id` so operators running multiple builders against different rollups can distinguish them.

The rollup-block stream (`ru_provider.subscribe_blocks`) is the WS-driven heartbeat in this builder (host headers are fetched via HTTP per slot), so this is where a silently-dead WS would manifest as a stalled metric.

Related Issue
Suggested alert rule
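As a rough illustration of the staleness condition an alert on the timestamp gauge expresses, here is a Rust sketch, not alerting-rule syntax. `subscription_looks_stale` and its parameters are hypothetical. With an assumed 12-second slot and three missed slots, the threshold works out to 3 * 12 = 36 seconds.

```rust
/// Hypothetical helper: true when no rollup block has been seen for more
/// than `missed_slots` slot durations.
pub fn subscription_looks_stale(
    now_unix_secs: u64,
    last_seen_unix_secs: u64,
    slot_duration_secs: u64,
    missed_slots: u64,
) -> bool {
    let threshold_secs = missed_slots.saturating_mul(slot_duration_secs);
    // saturating_sub guards against a gauge slightly ahead of `now`
    // because of clock skew between hosts.
    now_unix_secs.saturating_sub(last_seen_unix_secs) > threshold_secs
}
```

For example, with a last observation at Unix second 970, the check stays false through second 1006 and turns true at second 1007, assuming the 12-second slot above.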
Tune `36` to `3 * SLOT_DURATION` for the deployed chain. The `signet_` prefix reflects the dot-to-underscore conversion done by `metrics-exporter-prometheus`.

Testing
- `make fmt` passes
- `make clippy` passes (with `-D warnings`)
- `make test` passes (19 tests, +1 new)
- `record_rollup_block_seen_advances_counter_and_gauge` in `src/metrics.rs` uses `metrics_util::debugging::DebuggingRecorder` to verify the counter advances by N for N calls and the gauge is set to a wall-clock timestamp within the observed window, with the `rollup_chain_id` label attached

[Claude Code]