feat(relay): pyrycode_relay_connected_{binaries,phones} gauges (#61) #67
Pull-based prometheus.Collector that reads Registry.Counts() on every
scrape. No edits to registry.go — the grace-expiry pointer-identity
guard keeps the maps unchanged on a stale fire, so the gauges are
structurally unaffected (no second source of truth to keep in sync).
Label-less by design: a {server="..."} label would carry the
attacker-influenced x-pyrycode-server header value onto the metrics
surface, which threat-model § Log hygiene forbids.
Tests: live-count reflection, grace-stale-fire no-move, and a
-race-driven scraper-vs-mutator interleaving check (16 × 200 ops).
Code Review: #61
Decision: PASS
Findings: None.
Summary: Faithful execution of the architect's spec. The pull-based collector lives in its own file (…
Security (label-gated review). The architect's spec at …
Concurrency. The collector spawns no goroutines and holds no mutable state of its own.
One positive deviation from the spec.
AC coverage.
Ready to advance.
Adds docs/knowledge/features/connection-count-gauges.md (evergreen design + pull-based-collector rationale), docs/knowledge/codebase/61.md (per-ticket implementation summary + lessons), updates INDEX.md, and refreshes the metrics-registry feature doc to note the first collector has landed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
What
Adds the first pair of Prometheus metrics on top of the #59 scaffolding: pyrycode_relay_connected_binaries and pyrycode_relay_connected_phones, scalar gauges of the registry's live connection counts.
Design is pull-based: a prometheus.Collector reads Registry.Counts() on every scrape, in a new file internal/relay/metrics_connections.go. registry.go is untouched. The grace-expiry pointer-identity guard keeps the maps unchanged on a stale fire, so the gauge is structurally unaffected: there is no second source of truth to keep in sync. Rationale matches the architect's spec § Why pull-based (zero registry diff, stale-fire AC satisfied structurally, no new lock-acquisition path).
Gauges are label-less. A {server="…"} label would carry the attacker-influenced x-pyrycode-server header value onto the metrics surface, which threat-model § Log hygiene forbids; the spec's security review § Tokens calls this out and the constructor's doc comment pins it.
Issue
Closes #61.
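The pull-based shape described under What can be sketched without any dependencies. This is a rough illustration only, not the code in internal/relay/metrics_connections.go: the real collector implements prometheus.Collector, whereas this sketch substitutes a plain http.Handler that writes the two gauge lines by hand. The registry type, addBinary, removeBinary, metricsHandler, and scrape are all hypothetical names invented for the example. The point it shows is that the gauge values are derived from Counts() at scrape time, so no separate counter has to be kept in sync with the maps.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// registry is a hypothetical stand-in for the relay's Registry.
type registry struct {
	mu       sync.RWMutex
	binaries map[string]struct{}
	phones   map[string]struct{}
}

func newRegistry() *registry {
	return &registry{
		binaries: make(map[string]struct{}),
		phones:   make(map[string]struct{}),
	}
}

// Counts returns the live map sizes under the read lock.
func (r *registry) Counts() (binaries, phones int) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.binaries), len(r.phones)
}

// addBinary and removeBinary are hypothetical mutators for the demo.
func (r *registry) addBinary(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.binaries[id] = struct{}{}
}

func (r *registry) removeBinary(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.binaries, id)
}

// metricsHandler derives both gauges from Counts() on every request.
// It keeps no counter of its own and emits no labels.
func metricsHandler(r *registry) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		b, p := r.Counts()
		fmt.Fprintf(w, "pyrycode_relay_connected_binaries %d\n", b)
		fmt.Fprintf(w, "pyrycode_relay_connected_phones %d\n", p)
	})
}

// scrape performs one in-memory GET against the handler.
func scrape(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Body.String()
}

func main() {
	reg := newRegistry()
	h := metricsHandler(reg)

	fmt.Print(scrape(h)) // both gauges read 0
	reg.addBinary("s1")
	fmt.Print(scrape(h)) // binaries reads 1
	reg.removeBinary("s1")
	fmt.Print(scrape(h)) // back to 0
}
```

Because the handler only reads through Counts(), any mutation path that leaves the maps unchanged also leaves the scraped values unchanged, which is the property the stale-fire AC relies on.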
Testing
internal/relay/metrics_connections_test.go adds three tests:
- TestConnectionsMetrics_ReflectLiveCounts: AC gate. Scrapes via the existing NewMetricsHandler and substring-matches pyrycode_relay_connected_binaries 0|1 etc. before/after ClaimServer / RegisterPhone / UnregisterPhone / ReleaseServer.
- TestConnectionsMetrics_GraceStaleFireDoesNotMoveGauge: the stale-fire AC. Arms grace, immediately reclaims via ClaimServer, sleeps past the original grace window, and asserts the gauges read (1, 1), the live state, and not (0, 0).
- TestConnectionsMetrics_RaceFreedom: the AC's "race register/unregister/grace-expiry against periodic gauge reads". 16 mutator goroutines × 200 ops + 1 tight-loop scraper goroutine; the -race verdict is the assertion.
make vet, make test (-race), and make build all clean.
Architecture compliance
- Seam shape (docs/knowledge/features/metrics-registry.md § Seam shape): private struct in its own file, constructor takes prometheus.Registerer, registers against the relay's private registry, never DefaultRegisterer (the existing TestMetricsRegistry_NoGlobalRegistrarLeak keeps passing).
- No edits to registry.go, per the spec's design choice. No new lock-acquisition path: Collect calls Counts(), which already holds RLock.
- Tests are in package relay (PROJECT-MEMORY line 28), reusing fakeConn from registry_test.go without duplication.
- Out of scope: the /metrics listener (relay: localhost-only /metrics listener with bind-address validation #60), upgrade-attempt counters (relay: /metrics — upgrade-attempt and register-failure counters #57), frame/grace counters (relay: /metrics — frame-forwarded and grace-expiry counters #58), and the docs/knowledge/codebase/61.md + INDEX.md entries (documentation phase).
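For readers unfamiliar with the stale-fire case exercised by TestConnectionsMetrics_GraceStaleFireDoesNotMoveGauge, here is a minimal sketch of what a pointer-identity guard could look like. It is an assumption about the mechanism, not a copy of registry.go: the conn type, claim, expireGrace, and count are hypothetical names, and the grace timer is simulated by calling expireGrace directly so the example is deterministic. The idea is that an expiry callback removes the entry only if the map still holds the exact connection the timer was armed for.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical connection handle. The non-empty field keeps
// distinct allocations at distinct addresses, so pointer comparison is meaningful.
type conn struct{ remote string }

// registry is a hypothetical stand-in for the relay's Registry.
type registry struct {
	mu       sync.RWMutex
	binaries map[string]*conn
}

func newRegistry() *registry {
	return &registry{binaries: make(map[string]*conn)}
}

// claim stores (or replaces) the connection registered for serverID.
func (r *registry) claim(serverID string, c *conn) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.binaries[serverID] = c
}

// expireGrace is what a grace timer would run when it fires. It deletes
// the entry only if the map still holds the same *conn the timer was
// armed for; otherwise the fire is stale and the map is left untouched.
func (r *registry) expireGrace(serverID string, armedFor *conn) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if cur, ok := r.binaries[serverID]; !ok || cur != armedFor {
		return false // stale fire: a newer connection owns the slot
	}
	delete(r.binaries, serverID)
	return true
}

// count returns the number of live binaries under the read lock.
func (r *registry) count() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.binaries)
}

func main() {
	r := newRegistry()

	old := &conn{remote: "10.0.0.1:1"}
	r.claim("s1", old) // grace would be armed for old when it drops

	fresh := &conn{remote: "10.0.0.1:2"}
	r.claim("s1", fresh) // reclaim before the grace timer fires

	fmt.Println(r.expireGrace("s1", old), r.count())   // false 1 (stale fire)
	fmt.Println(r.expireGrace("s1", fresh), r.count()) // true 0 (genuine expiry)
}
```

Under this assumption a stale fire never mutates the map, so a collector that only reads the map sizes cannot observe a spurious drop to zero.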