Quarantine stale service endpoints on bootstrap #334

Merged: kacy merged 1 commit into main from phase2-stale-endpoint-audits on Mar 27, 2026
kacy (Owner) commented on Mar 27, 2026

Summary:

  • quarantine durable service endpoints whose node ids no longer exist in the cluster agent table
  • rerun authoritative bootstrap after the cluster DB is attached so cluster-mode startup can drain stale rows before rebuilding DNS and compatibility state
  • expose stale-endpoint quarantine counters in rollout status and Prometheus metrics, with a test covering the boot path
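
The first and third bullets can be illustrated with a minimal sketch. This is not the project's code (the PR text shows no source). The `Endpoint` type, its field names, and both function names are hypothetical, chosen only to show the idea of marking endpoint rows whose node id is missing from the agent table and returning a count that a status or metrics counter could report.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical durable endpoint row; field names are illustrative."""
    service: str
    node_id: str
    quarantined: bool = False


def quarantine_stale_endpoints(endpoints, live_node_ids):
    """Mark endpoints whose node id is absent from the agent table.

    Returns the number of rows newly quarantined by this pass.
    """
    live = set(live_node_ids)
    newly_quarantined = 0
    for ep in endpoints:
        if ep.node_id not in live and not ep.quarantined:
            ep.quarantined = True
            newly_quarantined += 1
    return newly_quarantined


def routable(endpoints):
    """Endpoints still eligible when DNS state is rebuilt."""
    return [ep for ep in endpoints if not ep.quarantined]


endpoints = [
    Endpoint("web", "node-a"),
    Endpoint("web", "node-gone"),  # node id no longer in the agent table
    Endpoint("db", "node-b"),
]
count = quarantine_stale_endpoints(endpoints, ["node-a", "node-b"])
print(count)                                    # 1
print([ep.node_id for ep in routable(endpoints)])  # ['node-a', 'node-b']
```

Marking rows rather than deleting them keeps the pass idempotent in this sketch: a second run over the same data quarantines nothing further and returns 0.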

Verification:

  • zig fmt on touched files
  • timeout 30 env YOQ_SKIP_SLOW_TESTS=1 ZIG_GLOBAL_CACHE_DIR=.zig-global-cache ZIG_LOCAL_CACHE_DIR=.zig-local-cache zig build test -- --test-filter bootstrap quarantines stale endpoint rows for missing nodes (timed out after 30 seconds; no failures were emitted before the timeout)
  • timeout 30 env YOQ_SKIP_SLOW_TESTS=1 ZIG_GLOBAL_CACHE_DIR=.zig-global-cache ZIG_LOCAL_CACHE_DIR=.zig-local-cache zig build test -- --test-filter route handles /v1/status?mode=service_rollout GET (timed out after 30 seconds; no failures were emitted before the timeout)
  • timeout 30 env YOQ_SKIP_SLOW_TESTS=1 ZIG_GLOBAL_CACHE_DIR=.zig-global-cache ZIG_LOCAL_CACHE_DIR=.zig-local-cache zig build test -- --test-filter handleMetricsPrometheus exposes service rollout metrics (timed out after 30 seconds; no failures were emitted before the timeout)
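
A run that hits its timeout neither passes nor fails, so it may help to record that outcome separately. The sketch below is one way to do that in a wrapper script. The `run_with_timeout` helper is hypothetical and not part of this repository; the commands it runs here are stand-ins, not the `zig build test` invocations above. (With GNU `timeout`, the same distinction shows up as exit status 124.)

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd and report 'passed', 'failed', or 'timed_out'."""
    try:
        result = subprocess.run(cmd, timeout=seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        # The child was killed before finishing: outcome is unknown.
        return "timed_out"
    return "passed" if result.returncode == 0 else "failed"


# Stand-in commands: one exits immediately, one outlives its budget.
quick = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]

print(run_with_timeout(quick, 10))
print(run_with_timeout(slow, 0.5))
```

Under these assumptions the first call reports `passed` and the second `timed_out`, which keeps an inconclusive run from being read as a success.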

kacy merged commit c9be94c into main on Mar 27, 2026
6 of 7 checks passed
kacy deleted the phase2-stale-endpoint-audits branch on March 27, 2026, 02:39