feat(mempool): observability for the TxMempool rewrite invariants #3519
Adds four mempool metrics that expose the load-bearing invariants of the TxMempool rewrite (#3476) without changing any behavior:

- tendermint_mempool_compact_total{reason}: counts compact() invocations, labeled by the triggering call site: insert_overflow (Insert above hardLimit), update (per-block recompute), or reap. The rate of reason=insert_overflow is the actual capacity-pressure signal; reason=update is per-block and benign.
- tendermint_mempool_compact_duration_seconds: histogram of compact() wall-clock time. compact() is O(m log m) over the full mempool with an index rebuild, so buckets extend to 10s to accommodate 100k-entry mempools under GC pressure.
- tendermint_mempool_promotion_total: counts pending→ready transitions inside the inline EVM promotion loop in txStore.insert. Cosmos txs are auto-ready and not counted.
- tendermint_mempool_utilisation: gauge mirroring the same ratio the CheckTx drop gate evaluates (txmp.utilisation()). Exposed directly so recording rules don't rederive total/(Size+PendingSize) in PromQL, which invites drift from the gate's own definition.

Design captured in platform PR #743 (observability appendix to the mempool rewrite test coverage 1-pager). This PR is the upstream implementation of section B of that appendix; the reactor-side gossip-bytes counter is a separate followup since it requires threading *Metrics through NewReactor (no metrics field today).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
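To make the insert_overflow signal concrete, here is a minimal sketch of the overflow-then-compact pattern. It is a toy model, not the real txStore: the `store` type, `simulate` helper, and the choice to prune back to a soft limit are all assumptions made for illustration, and hardLimit is taken to be twice that soft limit.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a hypothetical stand-in for the mempool's tx store.
type store struct {
	softLimit   int   // assumed size to prune back to
	priorities  []int // one entry per tx; higher is better
	compactions int   // number of overflow-triggered compactions
}

// hardLimit is assumed to be twice the soft limit.
func (s *store) hardLimit() int { return 2 * s.softLimit }

// compact sorts by priority (descending) and keeps the top softLimit entries.
// The sort is what makes each call O(m log m) in the store size m.
func (s *store) compact() {
	sort.Sort(sort.Reverse(sort.IntSlice(s.priorities)))
	if len(s.priorities) > s.softLimit {
		s.priorities = s.priorities[:s.softLimit]
	}
}

// insert appends a tx and compacts only once the store exceeds hardLimit.
func (s *store) insert(priority int) {
	s.priorities = append(s.priorities, priority)
	if len(s.priorities) > s.hardLimit() {
		s.compact()
		s.compactions++ // one insert_overflow tick in the counter
	}
}

// simulate inserts n txs with increasing priority into an empty store.
func simulate(softLimit, n int) *store {
	s := &store{softLimit: softLimit}
	for i := 0; i < n; i++ {
		s.insert(i)
	}
	return s
}

func main() {
	s := simulate(100, 1000)
	fmt.Printf("inserts=1000 compactions=%d size=%d\n", s.compactions, len(s.priorities))
}
```

Under these assumptions a compaction happens roughly once per softLimit inserts rather than on every insert, which is why a rising insert_overflow rate is a meaningful pressure signal.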
Platform-engineer:

- Utilisation gauge was stale after Flush() and ReapTxs(remove=true). Both paths change Size implicitly. Set Size / PendingSize / TotalTxsSizeBytes / Utilisation at end of Flush (to 0) and end of ReapTxs (to current store state, when remove=true). Same staleness existed for the older gauges; fix together.

Observability-platform-engineer:

- Rename compact_total label reason → trigger (Prometheus convention: "reason" connotes failure cause; "trigger" reads as caller dimension). Cheap pre-scrape, expensive after.
- Add tx_type label to promotion_total. Locks EVM scope into a queryable dimension instead of metric-name lore; preserves the door for tx_type="cosmos" counts without renaming.
- Extend compact_duration_seconds buckets to 20s and 30s. The metric's own docstring sets expectation of 5-10s under GC pressure; without higher buckets, p99 saturates at +Inf the moment GC bites.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Coral cross-review applied (commit
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #3519 +/- ##
==========================================
- Coverage 59.03% 58.22% -0.82%
==========================================
Files 2199 2129 -70
Lines 182207 174096 -8111
==========================================
- Hits 107569 101363 -6206
+ Misses 64975 63729 -1246
+ Partials 9663 9004 -659
Keep compact() pure — no string parameter, no metrics emission inside.
The three call sites (Insert overflow, Update recompute, Reap clear)
each time the call and emit compact_total{trigger} + compact_duration_seconds
directly. Matches the original compact() signature and reads as
"instrumentation at the call site" instead of "function knows about metrics".
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Named constants for label keys and values (labelTrigger, labelTxType, triggerInsertOverflow/Update/Reap, txTypeEVM). Cleaner call sites and one canonical place to grep for label values.
- Drop the Utilisation gauge. The same value is trivially derivable as (tendermint_mempool_size + tendermint_mempool_pending_size) / (cfg.Size + cfg.PendingSize) via PromQL; not worth a dedicated metric + 4 emit sites.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Other Metrics fields use 1-2 line descriptors. Match that style. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Cursor bugbot caught: compact() resets account.nextNonce=firstNonce then calls insert() for every retained tx. The inline promotion loop in insert() would then re-promote every already-ready EVM tx and re-count the metric, inflating PromotionTotal by the full ready-EVM population on every block (compact runs at least once per block via the update trigger). Add a countPromotion flag to insert() — true for the public Insert path, false for compact's internal re-insertion. PromotionTotal now reflects only new pending→ready transitions from user-driven inserts, restoring the rate(promotion_total) ≈ rate(inserted_txs) invariant claimed in the PR description. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Per Amir's review note: go-kit/kit/metrics is unmaintained (last commit
on the upstream go-kit repo is years old) and OTel is the strategic
direction. sei-db has been on OTel for some time
(sei-db/db_engine/pebbledb/mvcc/metrics.go, etc).
Pull the 3 new mempool metrics out of the go-kit Metrics struct and emit
them via otel.Meter("tendermint_mempool"). Attribute sets are package-
level vars so no allocation per emit. Emit sites use context.Background()
matching the sei-db convention (the txStore API doesn't carry ctx).
The other 22 mempool metrics remain on go-kit — they'll move in a
follow-up that migrates the whole package consistently.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Both reviewer comments addressed:

@cursor[bot]: fixed the
@amir-deris: converted the 3 new metrics to OTel native (commit `105bfd2`). Following the sei-db convention (

    var (
        mempoolMeter = otel.Meter("tendermint_mempool")
        otelMetrics = struct {
            compactTotal           metric.Int64Counter
            compactDurationSeconds metric.Float64Histogram
            promotionTotal         metric.Int64Counter
        }{ /* ... */ }
        triggerInsertOverflowAttr = metric.WithAttributes(attribute.String("trigger", "insert_overflow"))
        /* ... */
    )
    // Emit site:
    otelMetrics.compactTotal.Add(context.Background(), 1, triggerInsertOverflowAttr)
    otelMetrics.compactDurationSeconds.Record(context.Background(), time.Since(start).Seconds())

Attribute sets are package-level vars so emission is allocation-free on the hot path. The existing 22 go-kit metrics in this package remain untouched; they should move to OTel in a follow-up that handles the whole package consistently rather than mixing per-metric.
It's an indirect indicator (the rewrite's invariant 3 is directly tested
by mempool_pending_promotion_test.js in the release-test suite). The
end-to-end RPC test is the ground truth; a separate metric for the same
thing isn't pulling weight yet, and dropping it removes the
countPromotion flag plumbing from insert() entirely.
Compaction metrics (compact_total{trigger}, compact_duration_seconds)
stay — they expose the rewrite's actual new failure mode (O(m log m)
prune + rebuild) which we don't have a corresponding test for.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
    }
    start := time.Now()
    s.compact(inner, true)
    otelMetrics.compactTotal.Add(context.Background(), 1, triggerUpdateAttr)
Taking a follow-up to properly pass context in
observability-platform-engineer caught: the OTel→Prometheus exporter in sei-chain registers with namespace="sei-chain" (utils/metrics/metrics_util.go:20). The meter name "tendermint_mempool" becomes the otel_scope_name label, NOT a series prefix. With the short metric names "compact_total" etc, the exported series would be sei_chain_compact_total — platform #743's recording rules expecting tendermint_mempool_compact_total would silently return no data. Prefix the metric names so the exported series is sei_chain_tendermint_mempool_compact_total — verbose but matches the existing tendermint_mempool_* discoverability (size, pending_size, etc). Same precedent as app_tx_count_total → sei_chain_app_tx_count_total. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
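The naming rule described in this commit can be modelled in a few lines. This is a simplified sketch of the behaviour as stated above (namespace becomes the prefix, meter name does not), not the exporter's actual code; `sanitize` and `exportedSeries` are hypothetical helpers, and any unit or type suffixes the real exporter may append are ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize replaces every character outside [a-zA-Z0-9_] with '_',
// so "sei-chain" becomes "sei_chain".
func sanitize(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			return r
		}
		return '_'
	}, s)
}

// exportedSeries models the series name as namespace + "_" + metric name.
// The meter name is deliberately unused: per the commit message it ends up
// in the otel_scope_name label rather than in the series name.
func exportedSeries(namespace, meterName, metricName string) string {
	_ = meterName
	return sanitize(namespace) + "_" + sanitize(metricName)
}

func main() {
	// Short metric name: the tendermint_mempool prefix is lost.
	fmt.Println(exportedSeries("sei-chain", "tendermint_mempool", "compact_total"))
	// Prefixed metric name: the series stays discoverable next to tendermint_mempool_*.
	fmt.Println(exportedSeries("sei-chain", "tendermint_mempool", "tendermint_mempool_compact_total"))
}
```

The first call shows why the recording rules would have matched nothing; the second shows the name the fix produces.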
Final Coral review round: one must-fix landed

Both experts re-reviewed at `04eee9659`. Platform-engineer returned "ready"; observability caught a real blocker.

Naming mismatch, fixed in `a36521af`. The OTel→Prom exporter in sei-chain registers with `namespace="sei-chain"` (`utils/metrics/metrics_util.go`), so the meter name ("tendermint_mempool") becomes the `otel_scope_name` label, not a series prefix. With the short names `compact_total` / `compact_duration_seconds`, the exported series would have been `sei_chain_compact_total`, and the platform PR #743's recording rules expecting `tendermint_mempool_compact_total` would have silently returned no data.

Fix: prefixed the metric names with `tendermint_mempool_` so the exported series is `sei_chain_tendermint_mempool_compact_total`. Matches the existing precedent (`sei_chain_app_tx_count_total` from `meter="app"`, `metric="tx_count"`) and sits alongside the existing go-kit `tendermint_mempool_*` metrics in queries.

Platform appendix in sei-protocol/platform#743 updated in parallel (`f39756b`) to reference `sei_chain_tendermint_mempool_*` in PromQL.

Other findings, all green:

Ready for human review.
- Replace local must[V any] helper with libs/utils.OrPanic1
- Switch compact_duration_seconds buckets to stdprometheus.ExponentialBucketsRange(0.001, 30, 14) instead of a hand-tuned list
- Keep timing at the 3 call sites (rather than moving inside compact() to dedupe) so compact() stays pure and the per-trigger metric emission remains explicit at the caller

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
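For a sense of what the bucket switch yields, here is a stdlib-only sketch that recomputes the boundaries. It assumes the helper's contract is "count buckets, geometrically spaced, first at the minimum and last at the maximum"; `exponentialBucketsRange` and `approxEqual` below are local re-derivations for illustration, not the client library's code.

```go
package main

import (
	"fmt"
	"math"
)

// exponentialBucketsRange returns count upper bounds spaced geometrically
// from lo to hi inclusive (assumed contract, see lead-in).
func exponentialBucketsRange(lo, hi float64, count int) []float64 {
	if count < 2 || lo <= 0 || hi <= lo {
		panic("need count >= 2 and 0 < lo < hi")
	}
	// Constant ratio between neighbouring buckets.
	growth := math.Pow(hi/lo, 1/float64(count-1))
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = lo * math.Pow(growth, float64(i))
	}
	return buckets
}

// approxEqual reports whether a and b differ by at most tol.
func approxEqual(a, b, tol float64) bool { return math.Abs(a-b) <= tol }

func main() {
	// 1ms to 30s in 14 steps; each bucket is a bit more than 2x the previous one.
	for _, b := range exponentialBucketsRange(0.001, 30, 14) {
		fmt.Printf("%.4f\n", b)
	}
}
```

Geometric spacing keeps resolution at the millisecond end while still covering the 20s and 30s range requested in the earlier review round.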
@pompon0 — addressed in `bd7f50392`:
        metric.WithDescription("Number of compact() invocations, labeled by call site (insert_overflow, update, reap)."),
    )),
    compactDurationSeconds: utils.OrPanic1(mempoolMeter.Float64Histogram(
        "tendermint_mempool_compact_duration_seconds",
It probably doesn't need _seconds in the name. That will be added by default.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
    start := time.Now()
    s.compact(inner, false)
    otelMetrics.compactTotal.Add(context.Background(), 1, triggerInsertOverflowAttr)
    otelMetrics.compactDurationSeconds.Record(context.Background(), time.Since(start).Seconds())
Duration histogram missing trigger attribute at all call sites
Medium Severity
The compactDurationSeconds.Record() calls omit the trigger attribute (triggerInsertOverflowAttr, triggerUpdateAttr, triggerReapAttr) that is passed to compactTotal.Add(). All three call sites record duration into an unlabeled histogram, making it impossible to break down compact latency by trigger. This defeats the stated justification for keeping timing outside compact() ("The dimension of the metric depends on where this is called").
Additional Locations (2)
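One way to address the finding above is to route both emissions through a single helper so the count and the duration always carry the same trigger. The sketch below uses a hypothetical in-memory `recorder` in place of the OTel instruments so it runs with the standard library alone; with the real instruments the equivalent change would presumably be passing the same attribute option (for example triggerUpdateAttr) to Record as to Add.

```go
package main

import (
	"fmt"
	"time"
)

// recorder is a hypothetical stand-in for the counter and histogram,
// keyed by trigger label.
type recorder struct {
	counts    map[string]int64
	durations map[string][]float64 // seconds, one entry per compact call
}

func newRecorder() *recorder {
	return &recorder{
		counts:    make(map[string]int64),
		durations: make(map[string][]float64),
	}
}

// timeCompact runs compact and records both the invocation count and the
// elapsed time under the same trigger, so latency can be split by call site.
func (r *recorder) timeCompact(trigger string, compact func()) {
	start := time.Now()
	compact()
	elapsed := time.Since(start).Seconds()
	r.counts[trigger]++
	r.durations[trigger] = append(r.durations[trigger], elapsed)
}

func main() {
	r := newRecorder()
	r.timeCompact("update", func() { time.Sleep(time.Millisecond) })
	r.timeCompact("insert_overflow", func() {})
	fmt.Println(r.counts, len(r.durations["update"]))
}
```

compact() itself stays free of metrics code here, which matches the earlier decision to keep instrumentation at the call sites.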


Adds two metrics that expose the TxMempool rewrite's compaction behavior without changing logic: `tendermint_mempool_compact_total{trigger}` and `tendermint_mempool_compact_duration_seconds`. The `trigger` label distinguishes the three call sites of `compact()` (`insert_overflow`, `update`, `reap`); rate-of-`insert_overflow` is the capacity-pressure signal. Emitted via OpenTelemetry (`otel.Meter("tendermint_mempool")`) following the sei-db convention.

Companion to platform sei-protocol/platform#743 (recording rules, dashboard panels, alerts). The reactor-side gossip-bytes counter and any pending→ready promotion signal are separate followups; the latter is already directly tested by `mempool_pending_promotion_test.js` in the release-test suite, so a metric for it isn't yet pulling weight.

Metric rationale
These metrics let us answer four questions about the rewrite that we couldn't ask cleanly before.

What each metric is for, when stress-testing

- `compact_total{trigger="insert_overflow"}` is the capacity-pressure signal. When this counter ticks, the mempool was momentarily over `hardLimit` (= 2× softLimit). A non-zero rate means ingress is outpacing block consumption; the only operational question is whether admission control is supposed to be catching it. If both `check_tx_met_drop_utilisation_threshold` and `insert_overflow` are rising, pressure escaped the gate.
- `compact_total{trigger="update"}` is the block heartbeat. It fires once per Update, so its rate is a block-rate proxy. If it flatlines while seiload is still pushing, the consensus loop is stuck. This is a different signal from `block_height_delta` because it's measured at the mempool, not the indexer.
- `compact_total{trigger="reap"}` is this node's proposer share. Validators that propose blocks Reap; followers don't. Useful for telling "is this node doing block-proposal work?" without consulting Tendermint internals.
- `compact_duration_seconds` is where we'll catch the rewrite's real failure mode. `compact()` is O(m log m) over the full mempool. As load grows, this latency grows. The number we'll watch is `rate(compact_total[5m]) × avg(compact_duration_seconds)`, the wall-time fraction spent inside compact. Once that approaches ~50%, compact is now blocking block production. The histogram lets us see this coming long before p99 actually hits block-interval (~200ms).

The composite questions only these together can answer
- `rate(insert_overflow) / rate(inserted_txs)` should stay sub-linear as throughput rises. If the ratio goes to 1, the 2× hardLimit amortization broke and we're back to per-insert prune.
- `compact_duration_p99` rising independently of `insert_overflow` rate means individual compacts got expensive (likely GC pressure on 100k+ entries); both rising means we're pruning more often AND each prune is slower, which is the failure mode.
- `rate(check_tx_met_drop_utilisation_threshold) / rate(insert_overflow)`: if it's high, the gate is absorbing the spike. If it's near zero while overflow is high, the gate's threshold is mistuned.

What we're still blind to
These metrics don't tell us why a compact was slow (sort vs. map-rebuild vs. GC); that needs pprof or per-phase timing inside `compact()`. Per-peer gossip bandwidth is the next PR. The pending→ready promotion behavior is directly tested by `mempool_pending_promotion_test.js`; a continuous prod signal for it can be added later if operational need surfaces.

Test plan

- `go build ./...` clean
- `go test ./internal/mempool/ -count=1` passes

References: design doc sei-protocol/platform#743, TxMempool rewrite #3476.
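The wall-time-fraction figure from the metric rationale (compact rate times average compact duration) can be worked through with plain arithmetic. The sketch below assumes two scrapes of a histogram's cumulative count and sum; `sample`, `wallTimeFraction`, and the example numbers are illustrative assumptions, not values from a real node.

```go
package main

import (
	"fmt"
	"math"
)

// sample holds the cumulative count and sum (seconds) of a duration
// histogram at one scrape.
type sample struct {
	count float64
	sum   float64
}

// wallTimeFraction returns rate(count) * avg(duration) over the window.
// Algebraically this equals (delta sum) / window: the share of wall-clock
// time spent inside compact().
func wallTimeFraction(prev, cur sample, windowSeconds float64) float64 {
	dCount := cur.count - prev.count
	if dCount <= 0 || windowSeconds <= 0 {
		return 0
	}
	rate := dCount / windowSeconds       // compacts per second
	avg := (cur.sum - prev.sum) / dCount // seconds per compact
	return rate * avg
}

// approxEqual reports whether a and b differ by at most tol.
func approxEqual(a, b, tol float64) bool { return math.Abs(a-b) <= tol }

func main() {
	// Illustrative 5m window: 1500 compacts (5 per second) taking 30s in total.
	f := wallTimeFraction(sample{0, 0}, sample{1500, 30}, 300)
	fmt.Printf("fraction of wall time in compact: %.2f\n", f)
}
```

With these made-up numbers the average compact takes 20ms and the fraction is 0.1, well below the ~50% level the description treats as blocking block production.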
🤖 Generated with Claude Code