Skip to content

feat(relay): pyrycode_relay_connected_{binaries,phones} gauges (#61)#67

Merged
ilmoniemi merged 3 commits into
mainfrom
feature/61
May 12, 2026
Merged

feat(relay): pyrycode_relay_connected_{binaries,phones} gauges (#61)#67
ilmoniemi merged 3 commits into
mainfrom
feature/61

Conversation

@ilmoniemi
Copy link
Copy Markdown
Contributor

What

Adds the first pair of Prometheus metrics on top of the #59 scaffolding: pyrycode_relay_connected_binaries and pyrycode_relay_connected_phones, scalar gauges of the registry's live connection counts.

Design is pull-based: a prometheus.Collector reads Registry.Counts() on every scrape, in a new file internal/relay/metrics_connections.go. registry.go is untouched. The grace-expiry pointer-identity guard keeps the maps unchanged on a stale fire, so the gauge is structurally unaffected — no second source of truth to keep in sync. Rationale matches the architect's spec § Why pull-based (zero registry diff, stale-fire AC satisfied structurally, no new lock-acquisition path).

Gauges are label-less. A {server="…"} label would carry the attacker-influenced x-pyrycode-server header value onto the metrics surface, which threat-model § Log hygiene forbids — the spec's security review § Tokens calls this out and the constructor's doc comment pins it.

Issue

Closes #61.

Testing

internal/relay/metrics_connections_test.go adds three tests:

  • TestConnectionsMetrics_ReflectLiveCounts — AC gate. Scrapes via the existing NewMetricsHandler and substring-matches pyrycode_relay_connected_binaries 0|1 etc. before/after ClaimServer / RegisterPhone / UnregisterPhone / ReleaseServer.
  • TestConnectionsMetrics_GraceStaleFireDoesNotMoveGauge — the stale-fire AC. Arms grace, immediately reclaims via ClaimServer, sleeps past the original grace window, asserts the gauges read (1, 1) — the live state — and not (0, 0).
  • TestConnectionsMetrics_RaceFreedom — the AC's "race register/unregister/grace-expiry against periodic gauge reads". 16 mutator goroutines × 200 ops + 1 tight-loop scraper goroutine; the -race verdict is the assertion.

make vet, make test (-race), and make build all clean.

Architecture compliance

ilmoniemi added 2 commits May 12, 2026 19:40
Pull-based prometheus.Collector that reads Registry.Counts() on every
scrape. No edits to registry.go — the grace-expiry pointer-identity
guard keeps the maps unchanged on a stale fire, so the gauges are
structurally unaffected (no second source of truth to keep in sync).

Label-less by design: a {server="..."} label would carry the
attacker-influenced x-pyrycode-server header value onto the metrics
surface, which threat-model § Log hygiene forbids.

Tests: live-count reflection, grace-stale-fire no-move, and a
-race-driven scraper-vs-mutator interleaving check (16 × 200 ops).
@ilmoniemi
Copy link
Copy Markdown
Contributor Author

Code Review: #61

Decision: PASS

Findings

None.

Summary

Faithful execution of the architect's spec. The pull-based collector lives in its own file (internal/relay/metrics_connections.go), reads Registry.Counts() per scrape, and adds no diff to registry.go — exactly the design the spec picked. make vet, go test -race ./internal/relay/..., and go build ./... are all clean locally.

Security (label-gated review). The architect's spec at docs/specs/architecture/61-connected-gauges.md includes a ## Security review section with a PASS verdict — required-step verified. Diff-side goggles:

  • Gauges are label-less in code (prometheus.NewDesc(..., nil, nil) for both). The {server=\"...\"} shape the threat model forbids (would carry the attacker-influenced x-pyrycode-server header onto the metrics surface) is rejected structurally, and the constructor's doc comment pins the rule so a future refactor can't quietly add a label without first deleting the spec reference.
  • No file I/O, no subprocess, no crypto, no new network surface, no new log lines, no new dependency. MustNewConstMetric inputs are non-negative ints with no labels — the Must form can't panic on these inputs.
  • Registers against the passed prometheus.Registerer; never touches DefaultRegisterer. The existing TestMetricsRegistry_NoGlobalRegistrarLeak continues to pass after this ticket.

Concurrency. The collector spawns no goroutines and holds no mutable state of its own. Collect calls Counts() once per scrape; Counts() already takes the registry's RLock — no new lock-acquisition path. The race test (16 mutator goroutines × 200 ops + 1 tight-loop scraper) passes under -race locally.

One positive deviation from the spec. TestConnectionsMetrics_RaceFreedom adds a scrapeDone := make(chan struct{}) channel with defer close(scrapeDone) in the scraper and <-scrapeDone after close(stopScrape) in the test body. The spec's version closes stopScrape and returns immediately, which would let the scraper's t.Errorf fire after the test function exits if the non-200 branch ever triggered. The developer's tightening of the synchronisation is correct and harmless to the AC. Worth keeping.

AC coverage.

  • Two gauges defined and registered against the metrics registry from relay: adopt prometheus/client_golang and introduce metrics registry scaffolding #59.
  • Pull-based: gauge values come from Counts() at the existing register/unregister/grace-expiry-eviction sites by way of the registry's existing maps — no Inc/Dec wiring needed.
  • Concurrent register/unregister/grace-expiry sequences race-clean against periodic scrapes (TestConnectionsMetrics_RaceFreedom).
  • Grace-expiry-eviction stale-fire no-ops do not move the gauge (TestConnectionsMetrics_GraceStaleFireDoesNotMoveGauge) — structurally guaranteed because the pointer-identity guard already keeps the maps unchanged on stale fire.
  • make vet, go test -race, make build clean.
  • docs/knowledge/codebase/<n>.md + INDEX.md — deliberately deferred to the documentation phase per spec § Open questions relay: routing-envelope wrapper type — marshal, unmarshal, tests #1. Correct.

Ready to advance.

Adds docs/knowledge/features/connection-count-gauges.md (evergreen design
+ pull-based-collector rationale), docs/knowledge/codebase/61.md
(per-ticket implementation summary + lessons), updates INDEX.md, and
refreshes the metrics-registry feature doc to note the first collector
has landed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi ilmoniemi merged commit 37b62d5 into main May 12, 2026
2 checks passed
@ilmoniemi ilmoniemi deleted the feature/61 branch May 12, 2026 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

relay: wire pyrycode_relay_connected_{binaries,phones} gauges into registry events

1 participant