Skip to content

fix(triggers): revalidate work-item freshness before implementation dispatch (MNG-1053)#1378

Merged
zbigniewsobiecki merged 2 commits into
devfrom
fix/MNG-1053-implementation-freshness-gate
May 18, 2026
Merged

fix(triggers): revalidate work-item freshness before implementation dispatch (MNG-1053)#1378
zbigniewsobiecki merged 2 commits into
devfrom
fix/MNG-1053-implementation-freshness-gate

Conversation

@aaight
Copy link
Copy Markdown
Collaborator

@aaight aaight commented May 18, 2026

Summary

Adds an implementation freshness gate inside the shared worker execution pipeline so stale or duplicate implementation dispatches stop before the agent starts mutating work-item state. PM router adapters intentionally embed a pre-resolved TriggerResult for delayed/coalesced PM jobs, which means the router-level work-item lock cannot see a sibling implementation run that already completed via post-completion review chaining or a manual run, a terminal checklist an operator finished while the job sat in the PM coalesce window, or a PR that already exists for the work item. The gate runs the last-mile check using live PM + DB + GitHub state.

This addresses MNG-1053 and closes the duplicate-agent-run class observed on MNG-1013 (see MNG-1032).

What changed

src/triggers/shared/implementation-freshness-gate.ts (new)

  • Typed outcome model: dispatchable | already_implemented | active_implementation | implementation_pr_exists | needs_human_reconciliation with a human-readable message and structured evidence.
  • Narrow scope: returns dispatchable immediately when agentType !== 'implementation' or no workItemId is resolved, so review / respond-to-review / respond-to-ci / respond-to-pr-comment follow-up agents keep their existing dispatch path.
  • Live PM reload: calls provider.getChecklists(workItemId). Terminal checklists are recognised by exact heading match against "Implementation Steps" and "Acceptance Criteria" — narrow on purpose so unrelated checklists (dependencies, friction, etc.) cannot accidentally block reimplementation. Non-empty fully-complete terminal checklists return already_implemented.
  • Ownership reload: counts active same-agent runs via countActiveRuns({ projectId, workItemId, agentType: 'implementation' }) within the existing DEFAULT_STALE_RUN_THRESHOLD_MS window and inspects recent successful runs via getRunsByWorkItem.
  • PR ownership: merges PR candidates from pr_work_items + agent_runs.prUrl, de-duplicates by PR number, and verifies each through githubClient.getPR() using the existing persona-token scope established by runAgentWithCredentials. Open PRs return implementation_pr_exists; merged PRs return already_implemented; closed-unmerged PRs do not permanently block reimplementation.
  • Fail-closed semantics: PR lookup or checklist read failures combined with other credible ownership evidence return needs_human_reconciliation. A successful implementation run without a prUrl is unexpected → needs_human_reconciliation. When evidence is empty and helpers fail, the gate stays dispatchable so we never block on the gate's own bugs.
  • postFreshnessSkipNotice updates the existing ackCommentId when present, falls back to addComment, and never throws.

src/triggers/shared/agent-execution.ts

  • New runFreshnessGate(context) helper called between validateAgentExecutionLifecycle and checkPreRunBudget / persistAgentWorkItemLinks / prepareAgentExecutionLifecycle. Returns true to short-circuit; the pipeline exits normally so router cleanup releases locks without retrying.
  • The gate evaluator itself is wrapped in try/catch so a bug there falls back to dispatchable rather than blocking a legitimate run.

src/triggers/shared/agent-execution-runtime.ts and agent-execution-types.ts

  • AgentExecutionContext now carries the active PMProvider (already created for PMLifecycleManager) so the gate reuses one provider instance.

Tests

  • tests/unit/triggers/implementation-freshness-gate.test.ts (new, 23 tests): agent-type gating, terminal checklist coverage, partial / empty / unrelated checklist handling, active run blocking, open/merged/closed-unmerged PR semantics, recent-run-derived PR candidates, fail-closed branches for PR lookup failure + checklist read failure + successful run without PR, ack comment update vs fallback addComment, and never-throws guarantee for the comment poster.
  • tests/unit/triggers/agent-execution.test.ts: 8 new pipeline integration tests verifying the gate runs only for implementation+workItemId, blocks before runAgent / prepareForAgent / checkBudgetExceeded, does not invoke handleFailure/cleanupProcessing on normal skips, posts the durable skip notice, and survives evaluator errors without crashing the pipeline.

Documentation

  • docs/architecture/03-trigger-system.md: Mermaid pipeline diagram updated with the freshness gate node and a paragraph describing its scope, evidence, and uncertainty semantics.
  • docs/architecture/10-resilience.md: new "Implementation freshness gate (worker-side)" subsection under Concurrency Controls covering layering with the router-level work-item lock.
  • CLAUDE.md (and AGENTS.md via symlink): one-paragraph operator overview alongside the existing concurrency notes.

Test plan

  • npx vitest run tests/unit/triggers/implementation-freshness-gate.test.ts — 23 tests pass.
  • npx vitest run tests/unit/triggers/agent-execution.test.ts — 39 tests pass (8 new freshness-gate integration tests).
  • npx vitest run tests/unit/pm/webhook-handler.test.ts tests/unit/router/work-item-lock.test.ts tests/unit/router/agent-type-lock.test.ts — existing concurrency guards stay green.
  • npx vitest run tests/unit/api/router.test.ts tests/unit/api/routers/projects.test.ts — appRouter import chain unchanged.
  • npx vitest run tests/unit/architecture-docs.test.ts — doc drift guards stay green after the doc updates.
  • npm test — full unit suite 9712 passed / 28 skipped.
  • npm run typecheck — clean.
  • npm run lint — clean (only preexisting non-fixable warnings remain).

🕵️ claude-code · claude-opus-4-7 · run details

@codecov
Copy link
Copy Markdown

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 87.25490% with 52 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...c/triggers/shared/implementation-freshness-gate.ts 85.75% 50 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Collaborator

@nhopeatall nhopeatall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Requesting changes: the freshness gate is in the right place, but two traced paths still allow incorrect dispatch behavior.

Code Issues

Blocking

  • src/triggers/shared/implementation-freshness-gate.ts:359 - PR verification assumes the runAgentWithCredentials GitHub token scope, but the shared pipeline is also used by dashboard manual/retry jobs through triggerManualRun, which only establishes PM credentials. For an implementation retry with a recent successful run and PR URL, githubClient.getPR() throws No GitHub client in scope; the gate then returns needs_human_reconciliation, blocking manual reimplementation even when the PR is closed-unmerged. If the only signal is pr_work_items, the same missing token leaves haveOtherEvidence false and the existing open PR is ignored.
  • src/triggers/shared/implementation-freshness-gate.ts:481 - Checklist read failures fall open when there is no separate DB/PR evidence. A stale PM job whose live Implementation Steps checklist was completed while queued, but whose getChecklists() call transiently fails before any PR/run evidence exists, returns dispatchable and starts the duplicate implementation. The work item explicitly required checklist read uncertainty to fail closed before starting from stale context.

🕵️ codex · gpt-5.5 · run details

let inspectedPRs: InspectedPR[] = [];
let erroredPRs: CandidatePR[] = [];
if (input.project.repo && candidates.length > 0) {
const verification = await inspectPullRequests(input.project.repo, candidates);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes every caller of the shared pipeline is already inside withGitHubToken, but dashboard manual/retry jobs call runAgentExecutionPipeline through triggerManualRun with only PM credentials. In that path githubClient.getPR() throws No GitHub client in scope; with a recent successful run plus PR URL the gate turns that into needs_human_reconciliation, so an intentional manual reimplementation is blocked even if the PR is closed-unmerged. With only a pr_work_items candidate, haveOtherEvidence is false and an existing open PR can be missed. Please either guarantee GitHub scope for all implementation pipeline callers or avoid using the scoped singleton from paths that cannot satisfy it.

}
}

if (!signals.checklistsResult && otherSignalsBesidesChecklist) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still dispatches when getChecklists() fails and there is no other DB/PR evidence. That misses the core stale-checklist case: a PM coalesced implementation job can sit in the queue, a human completes Implementation Steps, and then a transient PM read failure makes checklistsResult null with no other signals yet, so the duplicate implementation starts. The work item called for checklist read uncertainty to fail closed before dispatching from stale context.

@zbigniewsobiecki
Copy link
Copy Markdown
Member

@aaight address code review concerns

@aaight
Copy link
Copy Markdown
Collaborator Author

aaight commented May 18, 2026

@zbigniewsobiecki addressed the review concerns and pushed 7d6ca161.

  • PR verification now resolves and scopes the implementer GitHub token inside the freshness gate, so manual/retry pipeline callers do not rely on ambient withGitHubToken scope.
  • Checklist read uncertainty now fails closed even without other DB/PR evidence, and unverified linked PR candidates also fail closed.
  • Added regression coverage for closed-unmerged run-derived PRs, pr_work_items verification failures, and checklist read failures without other evidence.

Verification: npm run typecheck, npm run lint, npx vitest run tests/unit/triggers/implementation-freshness-gate.test.ts, npm test, plus the pre-push changed-test hook.

🕵️ codex · gpt-5.5 · run details

Copy link
Copy Markdown
Collaborator

@nhopeatall nhopeatall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - the follow-up commit resolves the prior blockers: PR verification now establishes the implementation GitHub persona scope itself, checklist read uncertainty fails closed, and the pipeline short-circuits before budget checks, work-item persistence, lifecycle mutation, or agent startup.

🕵️ codex · gpt-5.5 · run details

@zbigniewsobiecki zbigniewsobiecki merged commit 0ca616f into dev May 18, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants