feat(cadence): retire fixed-staleness model and add per-channel observability#3242
Conversation
…s automatically After Unit 2 ships per-channel eligibility (next_survey_eligible_at) and Unit 3 adds owned/contrib discovery, the fixed-30-day staleness model has no callers left in the dispatch path. This unit removes its remnants and lands the one-time migration that backfills the new schema fields on legacy entries. Removes: - SURVEY_STALENESS_MS constant and isSurveyStale function. Last production reference cleared during Unit 2's dispatch logic swap; only test code held references after that. - 5 isSurveyStale boundary tests. Adds: - migrateRepoEntry(entry, _now): pure helper that backfills missing discovery_channel (default 'collab') and missing next_survey_eligible_at (computed from last_survey_at via computeNextEligibleAt, using whichever channel the entry already carries — so an owned-channel entry with a survey date but no eligibility uses the 14-day interval, not 30). Returns the input by reference when both fields are populated, so steady-state runs neither allocate nor bump the migrated counter. - summary.migrated counter on ReconcileSummary, distinct from refreshed so operators can tell one-time migration commits from steady-state field-probe drift. The first post-rollout daily reconcile produces a single commit carrying summary.migrated === N (one-shot for all legacy entries); subsequent runs report 0. - formatCommitMessage now exported, with a conditional `+N migrated` suffix that only appears when summary.migrated > 0. Steady-state commit messages keep the existing format unchanged so log-grepping behavior is preserved. - isNoOp and planHasChanges both consider summary.migrated, so a migration-only run is correctly classified as a real commit (the migration must persist). migrateRepoEntry is wired into classifyTracked first thing — every tracked entry passes through it before access-list classification. The mutator-closure 409-retry pattern is preserved because migrateRepoEntry is pure and idempotent. Verification: `grep -rn "SURVEY_STALENESS_MS|isSurveyStale" scripts/` returns zero matches.
…g line Operators reading the daily reconcile JSON output now get a per-channel breakdown so they can answer "how much did each channel contribute today?" without aggregating manually. Adds: - ChannelStats interface (tracked, dispatched, deferred, lostAccess) and ReconcileSummary.byChannel: Record<DiscoveryChannel, ChannelStats>. Engine populates tracked (post-classification sweep over nextRepos) and lostAccess (at the two flip sites in classifyTracked). Shell populates dispatched and deferred after cap selection so the values reflect what actually ran this run, not just what was eligible. - ReconcileLogger.info field for non-warn structured output. Default impl writes to stdout (warn continues to write to stderr). Extended in baseParams test fixture so existing tests need no changes. - First-survey-of-channel info log: when an owned or contrib entry with last_survey_at: null is selected for dispatch, runDispatches emits `reconcile: first survey for new <channel> entry: <owner>/<repo>`. Collab is intentionally excluded — the milestone matters less for the historical channel that's been around since the original collab-only design. Owned entries are auto-discovered, so seeing them flow through to first survey is meaningful; contrib entries are operator-curated, so first-survey is the moment the trust signal was honored. The byChannel breakdown coexists with the existing top-level counters (added, pendingReview, refreshed, migrated, etc.) — those stay global because their semantics don't break down cleanly per channel and the plan doesn't request it. External JSON consumers see the byChannel field as a strict superset of the prior shape, so the addition is non-breaking.
fro-bot
left a comment
There was a problem hiding this comment.
The structural extraction here is clean. Two focused commits that close out the cadence engine work: retire the fixed-staleness model with an automated migration path, then wire per-channel observability counters. Both changes land inside the existing pure-engine boundary without expanding the I/O surface.
Verdict: PASS
Blocking issues
None
Non-blocking concerns
migrateRepoEntry has an unused _now: Date parameter (reconcile-repos.ts:466). The function body never references it — all date computation derives from entry.last_survey_at and computeNextEligibleAt. The underscore prefix signals intent-to-ignore, but the parameter shouldn't exist at all. Low-risk dead code; clean it up in a follow-up.
Missing tests
None. The 13 net new tests cover the declared scenarios comprehensively:
migrateRepoEntryidempotency (5 tests: both fields missing, one field missing, malformed date, null survey, already populated)- Migration integration through
reconcileRepos(3 tests: first-run migration, post-migration no-op with referential equality, 18-entry batch) formatCommitMessage(3 tests: steady-state, migration-only, mixed)byChannelengine counters (3 tests: tracked per channel, lost-access per channel, newcomers per channel)- Shell observability (4 tests: first-survey for owned, NOT for collab, NOT for re-survey, dispatched/deferred after cap)
The isEligibleForSurvey boundary tests (null, malformed, inclusive boundary, one-day-before, one-day-after) were already present and still pass against the new dispatch gate wiring.
Risk assessment: LOW
The migration path (migrateRepoEntry) is pure, idempotent, and runs inside the existing 409-retry mutator closure. The summary.migrated counter is distinct from summary.refreshed, so operators can grep for the one-time migration commit without confusion. The byChannel shape is additive-only — existing JSON consumers see new fields without anything moving. The isNoOp check correctly includes summary.migrated === 0 so migration-only runs produce a commit. The first-survey log line is scoped to owned+contrib (collab excluded per design doc) and only fires on last_survey_at === null, not on re-surveys. Dead code removal (SURVEY_STALENESS_MS, isSurveyStale) is confirmed clean — grep returns zero matches.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/.github |
| Run ID | 25474006106 |
| Cache | hit |
| Session | ses_1ff92131fffehuY1xPIUbIc04u |
What changes
Closes out the cadence engine work. Two commits:
SURVEY_STALENESS_MSandisSurveyStale(no remaining production callers after feat(cadence): add per-channel survey eligibility with deterministic jitter #3238 swapped dispatch logic toisEligibleForSurvey). AddsmigrateRepoEntrythat backfillsdiscovery_channel(default'collab') andnext_survey_eligible_at(computed fromlast_survey_at) on legacy entries. Wired first thing intoclassifyTracked, idempotent on already-populated entries.ReconcileSummary.byChannel: Record<DiscoveryChannel, ChannelStats>coverstracked,dispatched,deferred,lostAccessper channel. Engine populatestrackedandlostAccess; shell populatesdispatchedanddeferredafter cap selection. NewReconcileLogger.infochannel; first-survey log emits for owned and contrib entries that have never been surveyed.Why
The fixed-30-day staleness model was the dispatch gate before #3238 introduced per-channel
next_survey_eligible_at. After #3238 landed,SURVEY_STALENESS_MSwas dead code with one purpose left: a transitional bridge for legacy entries on thedatabranch. This PR retires the bridge by automating the backfill — the first daily reconcile run after merge produces a single migration commit, every subsequent run reportssummary.migrated === 0.The observability counters answer a question operators couldn't answer before: which channel is doing the work each day? With three discovery channels feeding one repo list, "0 dispatches today" needed disambiguation between "no eligible repos in any channel" and "owned channel exhausted, contrib channel waiting". The breakdown shows up directly in the JSON output.
Migration commit shape
The first reconcile run after merge produces a commit subject like:
where N is the number of legacy entries backfilled (~18 today). The
+N migratedsuffix only appears when the count is non-zero, so steady-state commit messages are unchanged.summary.migratedis distinct fromsummary.refreshed— operators can grep for migration commits and field-probe-drift commits separately.migrateRepoEntryrespects whicheverdiscovery_channelan entry already carries: an owned-channel entry with a survey date but no eligibility uses the 14-day interval, not the 30-day default. This matters because Unit 1's loose-then-tight rollout means some entries already havediscovery_channelset withoutnext_survey_eligible_at.byChannel breakdown contract
{ "byChannel": { "collab": {"tracked": 18, "dispatched": 6, "deferred": 0, "lostAccess": 0}, "owned": {"tracked": 3, "dispatched": 1, "deferred": 0, "lostAccess": 0}, "contrib": {"tracked": 0, "dispatched": 0, "deferred": 0, "lostAccess": 0} } }Field semantics:
tracked: entries currently onmetadata/repos.yamlfor this channel. Includeslost-accessentries — they remain in the file, only theironboarding_statuschanges.dispatched: dispatches actually fired this run, post-cap and post-rotation.deferred: dispatch candidates excluded by the per-run cap. Sums to the existing top-leveldispatchesDeferredacross all channels.lostAccess: entries that flipped tolost-accessthis run.The breakdown is a strict superset of the prior
ReconcileSummaryshape — external JSON consumers see new fields without anything moving or disappearing.First-survey log
When an owned or contrib entry is selected for dispatch with
last_survey_at: null:Collab is excluded — the milestone matters less for the historical channel. Owned and contrib are signal events: owned because the entry was auto-discovered, contrib because the operator's allowlist trust signal was honored.
What's NOT in this PR
discovery_channel/next_survey_eligible_atfrom optional to required (deferred to a follow-up after one daily reconcile cycle confirmsdatais fully migrated).Test scenarios covered
399/399 tests passing (was 386 on main; +13 net new across both commits).
migrateRepoEntryidempotencylast_survey_at, nulllast_survey_at, already populated)reconcileReposformatCommitMessageformatsbyChannelengine populationPlan reference
Implements Units 4 and 5 of
docs/plans/2026-05-05-001-feat-survey-cadence-and-multi-channel-discovery-plan.md. Plan checkboxes flipped in this PR. With this merge, the plan's implementation phase is complete — outstanding work is deferred to follow-up plans (schema tightening, per-day quota model, env-var overrides).