
Phase 1: complete canonical service registry #329

Merged: kacy merged 2 commits into main from phase1-complete on Mar 26, 2026

@kacy (Owner) commented on Mar 26, 2026

Summary

  • add the in-memory service registry state machine and runtime wrapper backed by the durable services / service_endpoints tables
  • wire the existing service registry bridge into that runtime so shadow state stays synchronized with persisted service membership and probe transitions
  • add /v1/services routes for listing services, inspecting endpoints, requesting reconcile, draining endpoints, and removing endpoints
  • round out the registry model with node-loss / node-recovery and reconcile success / failure state so Phase 1 is complete before the reconciler lands
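To make the state described above concrete, here is a minimal Python sketch of how node-loss / node-recovery and reconcile success / failure might interact with endpoint eligibility. It is not the project's Zig code: every class, field, and status name (`Endpoint`, `Registry`, `reconcile_status`, `"pending"`, and so on) is a hypothetical stand-in, and the eligibility rule is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Endpoint:
    endpoint_id: str
    node_id: str
    healthy: bool = True      # last observed probe result
    draining: bool = False    # set by an operator drain request
    node_lost: bool = False   # set while the hosting node is considered lost

    def eligible(self) -> bool:
        # Assumed rule: an endpoint is eligible only if every gate is open.
        return self.healthy and not self.draining and not self.node_lost


@dataclass
class Service:
    name: str
    endpoints: Dict[str, Endpoint] = field(default_factory=dict)
    reconcile_status: str = "idle"   # hypothetical values: idle, pending, ok, failed
    last_error: Optional[str] = None


class Registry:
    """In-memory model only; the durable tables are out of scope here."""

    def __init__(self) -> None:
        self.services: Dict[str, Service] = {}

    def upsert_endpoint(self, service: str, ep: Endpoint) -> None:
        svc = self.services.setdefault(service, Service(service))
        svc.endpoints[ep.endpoint_id] = ep

    def _set_node_lost(self, node_id: str, lost: bool) -> None:
        # Node events fan out to every endpoint hosted on that node.
        for svc in self.services.values():
            for ep in svc.endpoints.values():
                if ep.node_id == node_id:
                    ep.node_lost = lost

    def mark_node_lost(self, node_id: str) -> None:
        self._set_node_lost(node_id, True)

    def mark_node_recovered(self, node_id: str) -> None:
        self._set_node_lost(node_id, False)

    def request_reconcile(self, service: str) -> None:
        self.services[service].reconcile_status = "pending"

    def mark_reconcile_failed(self, service: str, error: str) -> None:
        svc = self.services[service]
        svc.reconcile_status = "failed"
        svc.last_error = error

    def mark_reconcile_succeeded(self, service: str) -> None:
        svc = self.services[service]
        svc.reconcile_status = "ok"
        svc.last_error = None

    def eligible_endpoints(self, service: str) -> List[str]:
        svc = self.services[service]
        return sorted(e.endpoint_id for e in svc.endpoints.values() if e.eligible())
```

In this sketch node loss is kept as a separate flag from probe health, so recovery restores eligibility without discarding the last observed health. Whether the real registry makes the same split is not stated in this PR.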

Commits

  • cf48b8e Add service registry runtime state machine
  • 2bab36b Add service registry API routes
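As a rough picture of the route surface added in 2bab36b, the following Python sketch matches a method and path to an action. The `GET` paths and the reconcile path come from the test names in this PR. The drain and remove paths (`.../endpoints/{endpoint_id}/drain` and `DELETE .../endpoints/{endpoint_id}`) are assumptions, as are all action names.

```python
import re
from typing import Dict, Optional, Tuple

NAME = r"(?P<name>[^/]+)"
ENDPOINT = r"(?P<endpoint_id>[^/]+)"

# (method, path pattern, hypothetical action name)
ROUTES = [
    ("GET", r"/v1/services", "list_services"),
    ("GET", rf"/v1/services/{NAME}", "get_service"),
    ("GET", rf"/v1/services/{NAME}/endpoints", "list_endpoints"),
    ("POST", rf"/v1/services/{NAME}/reconcile", "request_reconcile"),
    # The two paths below are assumed; the PR does not spell them out.
    ("POST", rf"/v1/services/{NAME}/endpoints/{ENDPOINT}/drain", "drain_endpoint"),
    ("DELETE", rf"/v1/services/{NAME}/endpoints/{ENDPOINT}", "remove_endpoint"),
]


def dispatch(method: str, path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (action, path parameters) for the first matching route, else None."""
    for route_method, pattern, action in ROUTES:
        if route_method != method:
            continue
        match = re.fullmatch(pattern, path)
        if match:
            return action, match.groupdict()
    return None
```

For example, `dispatch("POST", "/v1/services/web/reconcile")` yields the `request_reconcile` action with `name` set to `web`, and an unsupported method on a known path yields `None`.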

Verification

  • env YOQ_SKIP_SLOW_TESTS=1 ZIG_GLOBAL_CACHE_DIR=.zig-global-cache ZIG_LOCAL_CACHE_DIR=.zig-local-cache timeout 30 zig build test -- --test-filter "replaceServiceEndpoints preserves observed health for matching endpoint ids|runtime bootstraps from persisted services|runtime sync preserves probe health across persisted refresh|container bridge preserves legacy DNS and shadow events|bridge can skip shadow record while preserving legacy apply"
    Result: timed out after 30s with no emitted failures
  • env YOQ_SKIP_SLOW_TESTS=1 ZIG_GLOBAL_CACHE_DIR=.zig-global-cache ZIG_LOCAL_CACHE_DIR=.zig-local-cache timeout 30 zig build test -- --test-filter "route handles GET /v1/services|route handles GET /v1/services/{name}|route handles GET /v1/services/{name}/endpoints|route handles POST drain and DELETE endpoint|route handles POST /v1/services/{name}/reconcile|dispatch services routing|runtime bootstraps from persisted services|runtime sync preserves probe health across persisted refresh"
    Result: timed out after 30s with no emitted failures
  • env YOQ_SKIP_SLOW_TESTS=1 ZIG_GLOBAL_CACHE_DIR=.zig-global-cache ZIG_LOCAL_CACHE_DIR=.zig-local-cache timeout 30 zig build test -- --test-filter "node loss and recovery toggle endpoint eligibility|markReconcileFailed and markReconcileSucceeded update service detail|runtime can mark reconcile failure and recovery|route handles GET /v1/services|route handles POST drain and DELETE endpoint|dispatch services routing"
    Result: timed out after 30s with no emitted failures
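Several of the filtered tests above concern keeping observed probe health when the endpoint set is refreshed from persisted state. The Python sketch below shows one way that behavior could work. It is a guess at the semantics from the test name "replaceServiceEndpoints preserves observed health for matching endpoint ids", not a port of the Zig implementation, and the field names and the "new endpoints start unhealthy" default are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Endpoint:
    endpoint_id: str
    address: str
    healthy: bool = False  # assumed default: unproven until a probe succeeds


def replace_service_endpoints(
    current: Dict[str, Endpoint],
    persisted: Iterable[Endpoint],
) -> Dict[str, Endpoint]:
    """Rebuild the endpoint map from persisted rows, carrying over health by id."""
    refreshed: Dict[str, Endpoint] = {}
    for ep in persisted:
        previous = current.get(ep.endpoint_id)
        # Matching ids keep the health observed at runtime; new ids use the row's value.
        healthy = previous.healthy if previous is not None else ep.healthy
        refreshed[ep.endpoint_id] = Endpoint(ep.endpoint_id, ep.address, healthy)
    # Endpoints missing from the persisted set are dropped.
    return refreshed
```

Matching on endpoint id rather than address means a persisted refresh does not reset health for an endpoint that is still a member, which is the property the test names suggest.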

Follow-up

  • Phase 2 can now move DNS/LB programming behind the deterministic reconciler instead of using the bridge as the shadow event source
  • the bridge still persists :0 endpoint identities until node / port identity is threaded through the runtime callers

@kacy merged commit 36824ed into main on Mar 26, 2026
7 checks passed
@kacy deleted the phase1-complete branch on March 26, 2026 22:38