Skip to content

feat(cadence): retire fixed-staleness model and add per-channel observability#3242

Merged
marcusrbrown merged 2 commits into
mainfrom
feat/cadence-staleness-cleanup
May 7, 2026
Merged

feat(cadence): retire fixed-staleness model and add per-channel observability#3242
marcusrbrown merged 2 commits into
mainfrom
feat/cadence-staleness-cleanup

Conversation

@marcusrbrown
Copy link
Copy Markdown
Collaborator

What changes

Closes out the cadence engine work. Two commits:

  1. Retire the fixed-staleness model and migrate legacy entries automatically. Removes SURVEY_STALENESS_MS and isSurveyStale (no remaining production callers after feat(cadence): add per-channel survey eligibility with deterministic jitter #3238 swapped dispatch logic to isEligibleForSurvey). Adds migrateRepoEntry that backfills discovery_channel (default 'collab') and next_survey_eligible_at (computed from last_survey_at) on legacy entries. Wired first thing into classifyTracked, idempotent on already-populated entries.
  2. Per-channel observability counters and first-survey log line. ReconcileSummary.byChannel: Record<DiscoveryChannel, ChannelStats> covers tracked, dispatched, deferred, lostAccess per channel. Engine populates tracked and lostAccess; shell populates dispatched and deferred after cap selection. New ReconcileLogger.info channel; first-survey log emits for owned and contrib entries that have never been surveyed.

Why

The fixed-30-day staleness model was the dispatch gate before #3238 introduced per-channel next_survey_eligible_at. After #3238 landed, SURVEY_STALENESS_MS was dead code with one purpose left: a transitional bridge for legacy entries on the data branch. This PR retires the bridge by automating the backfill — the first daily reconcile run after merge produces a single migration commit, every subsequent run reports summary.migrated === 0.

The observability counters answer a question operators couldn't answer before: which channel is doing the work each day? With three discovery channels feeding one repo list, "0 dispatches today" needed disambiguation between "no eligible repos in any channel" and "owned channel exhausted, contrib channel waiting". The breakdown shows up directly in the JSON output.

Migration commit shape

The first reconcile run after merge produces a commit subject like:

chore(reconcile): +0 new, 0 pending-review, 0 lost-access, 0 refreshes, +N migrated

where N is the number of legacy entries backfilled (~18 today). The +N migrated suffix only appears when the count is non-zero, so steady-state commit messages are unchanged. summary.migrated is distinct from summary.refreshed — operators can grep for migration commits and field-probe-drift commits separately.

migrateRepoEntry respects whichever discovery_channel an entry already carries: an owned-channel entry with a survey date but no eligibility uses the 14-day interval, not the 30-day default. This matters because Unit 1's loose-then-tight rollout means some entries already have discovery_channel set without next_survey_eligible_at.

byChannel breakdown contract

{
  "byChannel": {
    "collab": {"tracked": 18, "dispatched": 6, "deferred": 0, "lostAccess": 0},
    "owned":  {"tracked": 3,  "dispatched": 1, "deferred": 0, "lostAccess": 0},
    "contrib": {"tracked": 0, "dispatched": 0, "deferred": 0, "lostAccess": 0}
  }
}

Field semantics:

  • tracked: entries currently on metadata/repos.yaml for this channel. Includes lost-access entries — they remain in the file, only their onboarding_status changes.
  • dispatched: dispatches actually fired this run, post-cap and post-rotation.
  • deferred: dispatch candidates excluded by the per-run cap. Sums to the existing top-level dispatchesDeferred across all channels.
  • lostAccess: entries that flipped to lost-access this run.

The breakdown is a strict superset of the prior ReconcileSummary shape — external JSON consumers see new fields without anything moving or disappearing.

First-survey log

When an owned or contrib entry is selected for dispatch with last_survey_at: null:

reconcile: first survey for new owned entry: fro-bot/agent
reconcile: first survey for new contrib entry: bfra-me/.github

Collab is excluded — the milestone matters less for the historical channel. Owned and contrib are signal events: owned because the entry was auto-discovered, contrib because the operator's allowlist trust signal was honored.

What's NOT in this PR

  • Schema tightening for discovery_channel/next_survey_eligible_at from optional to required (deferred to a follow-up after one daily reconcile cycle confirms data is fully migrated).
  • Per-day quota model for true continuous wiki growth — the structural answer to "most days are zero-dispatch" at N=17 collab repos. Deferred to a future plan; trigger remains "30 days under the threshold model show >7 consecutive zero-dispatch days".
  • Tunable per-channel intervals via env vars. Constants stay hardcoded for now; promotion is a small follow-up after observing real cadence.

Test scenarios covered

399/399 tests passing (was 386 on main; +13 net new across both commits).

Area Coverage
migrateRepoEntry idempotency 5 tests (both fields missing, only one missing, malformed last_survey_at, null last_survey_at, already populated)
Migration integration through reconcileRepos 3 tests (single legacy entry, post-migration no-op preserving referential equality, 18-entry migration commit)
formatCommitMessage formats 3 tests (steady-state, migration-only, mixed counters)
byChannel engine population 3 tests (tracked counting per channel, lost-access counting, newcomers per channel)
Shell observability 4 tests (first-survey for owned, NOT for collab, NOT for re-survey, dispatched/deferred after cap)

Plan reference

Implements Units 4 and 5 of docs/plans/2026-05-05-001-feat-survey-cadence-and-multi-channel-discovery-plan.md. Plan checkboxes flipped in this PR. With this merge, the plan's implementation phase is complete — outstanding work is deferred to follow-up plans (schema tightening, per-day quota model, env-var overrides).

…s automatically

After Unit 2 ships per-channel eligibility (next_survey_eligible_at) and Unit 3
adds owned/contrib discovery, the fixed-30-day staleness model has no callers
left in the dispatch path. This unit removes its remnants and lands the one-time
migration that backfills the new schema fields on legacy entries.

Removes:
- SURVEY_STALENESS_MS constant and isSurveyStale function. Last production
  reference cleared during Unit 2's dispatch logic swap; only test code held
  references after that.
- 5 isSurveyStale boundary tests.

Adds:
- migrateRepoEntry(entry, _now): pure helper that backfills missing
  discovery_channel (default 'collab') and missing next_survey_eligible_at
  (computed from last_survey_at via computeNextEligibleAt, using whichever
  channel the entry already carries — so an owned-channel entry with a survey
  date but no eligibility uses the 14-day interval, not 30). Returns the input
  by reference when both fields are populated, so steady-state runs neither
  allocate nor bump the migrated counter.
- summary.migrated counter on ReconcileSummary, distinct from refreshed so
  operators can tell one-time migration commits from steady-state field-probe
  drift. The first post-rollout daily reconcile produces a single commit
  carrying summary.migrated === N (one-shot for all legacy entries); subsequent
  runs report 0.
- formatCommitMessage now exported, with a conditional `+N migrated` suffix that
  only appears when summary.migrated > 0. Steady-state commit messages keep the
  existing format unchanged so log-grepping behavior is preserved.
- isNoOp and planHasChanges both consider summary.migrated, so a migration-only
  run is correctly classified as a real commit (the migration must persist).

migrateRepoEntry is wired into classifyTracked first thing — every tracked
entry passes through it before access-list classification. The mutator-closure
409-retry pattern is preserved because migrateRepoEntry is pure and idempotent.

Verification: `grep -rn "SURVEY_STALENESS_MS|isSurveyStale" scripts/` returns
zero matches.
…g line

Operators reading the daily reconcile JSON output now get a per-channel
breakdown so they can answer "how much did each channel contribute today?"
without aggregating manually.

Adds:
- ChannelStats interface (tracked, dispatched, deferred, lostAccess) and
  ReconcileSummary.byChannel: Record<DiscoveryChannel, ChannelStats>. Engine
  populates tracked (post-classification sweep over nextRepos) and lostAccess
  (at the two flip sites in classifyTracked). Shell populates dispatched and
  deferred after cap selection so the values reflect what actually ran this
  run, not just what was eligible.
- ReconcileLogger.info field for non-warn structured output. Default impl
  writes to stdout (warn continues to write to stderr). Extended in baseParams
  test fixture so existing tests need no changes.
- First-survey-of-channel info log: when an owned or contrib entry with
  last_survey_at: null is selected for dispatch, runDispatches emits
  `reconcile: first survey for new <channel> entry: <owner>/<repo>`. Collab is
  intentionally excluded — the milestone matters less for the historical
  channel that's been around since the original collab-only design. Owned
  entries are auto-discovered, so seeing them flow through to first survey is
  meaningful; contrib entries are operator-curated, so first-survey is the
  moment the trust signal was honored.

The byChannel breakdown coexists with the existing top-level counters (added,
pendingReview, refreshed, migrated, etc.) — those stay global because their
semantics don't break down cleanly per channel and the plan doesn't request it.
External JSON consumers see the byChannel field as a strict superset of the
prior shape, so the addition is non-breaking.
Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structural extraction here is clean. Two focused commits that close out the cadence engine work: retire the fixed-staleness model with an automated migration path, then wire per-channel observability counters. Both changes land inside the existing pure-engine boundary without expanding the I/O surface.

Verdict: PASS

Blocking issues

None

Non-blocking concerns

migrateRepoEntry has an unused _now: Date parameter (reconcile-repos.ts:466). The function body never references it — all date computation derives from entry.last_survey_at and computeNextEligibleAt. The underscore prefix signals intent-to-ignore, but the parameter shouldn't exist at all. Low-risk dead code; clean it up in a follow-up.

Missing tests

None. The 13 net new tests cover the declared scenarios comprehensively:

  • migrateRepoEntry idempotency (5 tests: both fields missing, one field missing, malformed date, null survey, already populated)
  • Migration integration through reconcileRepos (3 tests: first-run migration, post-migration no-op with referential equality, 18-entry batch)
  • formatCommitMessage (3 tests: steady-state, migration-only, mixed)
  • byChannel engine counters (3 tests: tracked per channel, lost-access per channel, newcomers per channel)
  • Shell observability (4 tests: first-survey for owned, NOT for collab, NOT for re-survey, dispatched/deferred after cap)

The isEligibleForSurvey boundary tests (null, malformed, inclusive boundary, one-day-before, one-day-after) were already present and still pass against the new dispatch gate wiring.

Risk assessment: LOW

The migration path (migrateRepoEntry) is pure, idempotent, and runs inside the existing 409-retry mutator closure. The summary.migrated counter is distinct from summary.refreshed, so operators can grep for the one-time migration commit without confusion. The byChannel shape is additive-only — existing JSON consumers see new fields without anything moving. The isNoOp check correctly includes summary.migrated === 0 so migration-only runs produce a commit. The first-survey log line is scoped to owned+contrib (collab excluded per design doc) and only fires on last_survey_at === null, not on re-surveys. Dead code removal (SURVEY_STALENESS_MS, isSurveyStale) is confirmed clean — grep returns zero matches.


Run Summary
Field Value
Event pull_request
Repository fro-bot/.github
Run ID 25474006106
Cache hit
Session ses_1ff92131fffehuY1xPIUbIc04u

@marcusrbrown marcusrbrown merged commit 9df0bb0 into main May 7, 2026
12 checks passed
@marcusrbrown marcusrbrown deleted the feat/cadence-staleness-cleanup branch May 7, 2026 03:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants