
Reduce metrics cardinality, bookie consistency check, antithesis assertions #455

Merged
gorbak25 merged 2 commits into main from gorbak/alerts-assertions
Mar 31, 2026

Contributor

@gorbak25 gorbak25 commented Mar 31, 2026

  • Introduces a new admin command (CheckBookieConsistency) that rebuilds bookie state from DB via BookedVersions::load_all_from_conn and compares it against in-memory bookie state to detect consistency drift.
  • Runs the corrosion health check on Antithesis, which should catch sync performance degradation there.
  • Changes corro.db.gaps.sum to bucket by peer_state instead of actor_id to avoid high-cardinality series.
  • Introduces corro.db.buffered.changes.v2.total and corro.db.buffered.changes.v2.oldest_age_seconds for buffered-change visibility with bounded cardinality. The metrics carry a new name so that the old high-cardinality series no longer slow down Grafana.
  • Removes high-cardinality transport per-address metrics; keeps delta tracking and logs only increases for cwnd, congestion_events, and black_holes_detected.
  • Buckets transport/connect/send metrics by traffic class (sync, broadcast, foca) and wires traffic tagging through open_bi, send_uni, and send_datagram. This makes it possible to tell whether sync or broadcast generates more traffic. The tagging assumes that:
    • open_bi is used primarily by sync
    • send_datagram is used by FOCA/SWIM
    • send_uni is used by broadcast traffic
  • Adds v2 to the names of some metrics to distinguish the old high-cardinality series from the new low-cardinality ones.
  • Introduces corro.transport.rtt.v2.seconds, a histogram of peer RTTs.
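
The traffic-class bucketing described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `TrafficClass`, `TransportOp`, `classify`, and `BytesByClass` are hypothetical names, and the mapping simply encodes the stated assumption that open_bi is mostly sync, send_uni is broadcast, and send_datagram is FOCA/SWIM. The point is that the label set stays at three values no matter how many peer addresses exist.

```rust
use std::collections::HashMap;

/// Hypothetical traffic classes, named after the labels in the PR description.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum TrafficClass {
    Sync,
    Broadcast,
    Foca,
}

impl TrafficClass {
    /// Label value that would be attached to a metric series.
    pub fn as_label(self) -> &'static str {
        match self {
            TrafficClass::Sync => "sync",
            TrafficClass::Broadcast => "broadcast",
            TrafficClass::Foca => "foca",
        }
    }
}

/// Hypothetical stand-in for the three transport entry points.
#[derive(Clone, Copy, Debug)]
pub enum TransportOp {
    OpenBi,
    SendUni,
    SendDatagram,
}

/// Encodes the assumption from the PR description about which
/// entry point carries which kind of traffic.
pub fn classify(op: TransportOp) -> TrafficClass {
    match op {
        TransportOp::OpenBi => TrafficClass::Sync,
        TransportOp::SendUni => TrafficClass::Broadcast,
        TransportOp::SendDatagram => TrafficClass::Foca,
    }
}

/// Byte counters keyed by traffic class instead of by peer address,
/// so at most three series can ever exist.
#[derive(Default)]
pub struct BytesByClass {
    counts: HashMap<TrafficClass, u64>,
}

impl BytesByClass {
    pub fn record(&mut self, op: TransportOp, bytes: u64) {
        *self.counts.entry(classify(op)).or_insert(0) += bytes;
    }

    pub fn get(&self, class: TrafficClass) -> u64 {
        self.counts.get(&class).copied().unwrap_or(0)
    }

    /// Number of distinct series recorded so far.
    pub fn series_count(&self) -> usize {
        self.counts.len()
    }
}
```

If open_bi were ever used outside sync, that traffic would be counted under the sync label, which is the trade-off the assumption above accepts in exchange for bounded cardinality.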

…ed changes. Admin command for bookie consistency check
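
The idea behind the bookie consistency check (rebuild state from the DB, then compare it with the in-memory state) can be sketched as below. This is only an illustration under simplifying assumptions: `Bookie`, `Drift`, and `check_consistency` are hypothetical, and the real bookie state is modeled here as a plain map from actor id to a set of known versions. It does not call BookedVersions::load_all_from_conn; the `from_db` argument stands in for whatever that rebuild returns.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified bookkeeping state: actor id -> known versions.
pub type Bookie = BTreeMap<String, BTreeSet<u64>>;

/// One actor whose in-memory state differs from the state rebuilt from the DB.
#[derive(Debug, PartialEq, Eq)]
pub struct Drift {
    pub actor: String,
    pub only_in_memory: Vec<u64>,
    pub only_in_db: Vec<u64>,
}

/// Compares the in-memory state against the state rebuilt from the DB
/// and reports every actor where the two disagree.
pub fn check_consistency(in_memory: &Bookie, from_db: &Bookie) -> Vec<Drift> {
    let empty: BTreeSet<u64> = BTreeSet::new();
    // Union of actors seen on either side, in sorted order.
    let actors: BTreeSet<&String> = in_memory.keys().chain(from_db.keys()).collect();

    let mut drifts = Vec::new();
    for actor in actors {
        let mem = in_memory.get(actor).unwrap_or(&empty);
        let db = from_db.get(actor).unwrap_or(&empty);
        if mem != db {
            drifts.push(Drift {
                actor: actor.clone(),
                only_in_memory: mem.difference(db).copied().collect(),
                only_in_db: db.difference(mem).copied().collect(),
            });
        }
    }
    drifts
}
```

An empty result means the two views agree; any entry names the actor and the versions present on only one side.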
@gorbak25 gorbak25 requested a review from somtochiama March 31, 2026 17:29
@gorbak25 gorbak25 force-pushed the gorbak/alerts-assertions branch from f16f6c8 to ce660be March 31, 2026 17:34
Contributor

@somtochiama somtochiama left a comment

LGTM @gorbak25 !

@gorbak25 gorbak25 force-pushed the gorbak/alerts-assertions branch from b53f87c to 03451f8 March 31, 2026 18:44
@gorbak25 gorbak25 merged commit 0571768 into main Mar 31, 2026
6 checks passed
@gorbak25 gorbak25 deleted the gorbak/alerts-assertions branch March 31, 2026 18:51