audit: flag pending/skipped/blocked/hollow tasks after execute (#20)
Merged
Closes a real audit blind spot: validate_execution_evidence_code only
checked files-claimed-vs-files-in-diff and rubber-stamp executor_notes,
silently treating tasks left at status=pending after execute as fine. A
chain that lost ~70% of its sprint scope to mid-execute quota exhaustion
shipped a "clean" audit artifact this morning.
Now emit a finding for any of:
- Tasks left at `status=pending` after execute (executor never started
them — the most common silent-scope-shrink mode).
- Tasks marked `skipped` or `blocked` with empty executor_notes (no
reason recorded, indistinguishable from dropped-on-floor).
- Tasks marked `done` but with neither files_changed nor commands_run
(suspiciously hollow — likely skipped without flagging).
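
A minimal sketch of the three checks listed above, assuming each task is a dict with `id`, `status`, `executor_notes`, `files_changed`, and `commands_run` keys. The helper name `find_incomplete_tasks`, the dict shape, and the second and third finding strings are hypothetical and only illustrate the rules; this is not the code in this change.

```python
from typing import Any


def find_incomplete_tasks(tasks: list[dict[str, Any]]) -> list[str]:
    """Return one finding per category of incomplete task (hypothetical helper)."""
    pending: list[str] = []
    unexplained: list[str] = []
    hollow: list[str] = []

    for task in tasks:
        status = task.get("status")
        if status == "pending":
            # The executor never started this task.
            pending.append(task["id"])
        elif status in ("skipped", "blocked"):
            # A skip or block is acceptable only if a reason was recorded.
            if not (task.get("executor_notes") or "").strip():
                unexplained.append(task["id"])
        elif status == "done":
            # "done" with no files and no commands is suspiciously hollow.
            if not task.get("files_changed") and not task.get("commands_run"):
                hollow.append(task["id"])

    findings: list[str] = []
    if pending:
        findings.append("Tasks left pending: " + ", ".join(pending))
    if unexplained:
        findings.append(
            "Tasks skipped/blocked without a reason: " + ", ".join(unexplained)
        )
    if hollow:
        findings.append("Tasks marked done with no evidence: " + ", ".join(hollow))
    return findings
```

A clean run (every task `done` with at least one file or command recorded) yields an empty list, which matches the negative test case described below.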
Findings flow through the existing auto-driver retry path: if the
executor genuinely had more to do, the chain re-dispatches execute. If
nothing can advance, the chain stalls visibly on a known reason
("Tasks left pending: T7, T8, T10, T12") instead of silently producing
an empty audit.
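
The retry-or-stall behaviour can be pictured with a small sketch. It assumes the driver re-runs execute while the audit findings keep changing and stalls once two consecutive attempts report the same findings. The function `drive`, its parameters, and the `"clean"`/`"stalled"` labels are hypothetical; the real auto-driver's retry policy is not shown in this change.

```python
from typing import Callable, Optional


def drive(
    execute: Callable[[], None],
    audit: Callable[[], list[str]],
    max_attempts: int = 3,
) -> tuple[str, list[str]]:
    """Hypothetical driver loop: re-dispatch execute until clean or stuck."""
    previous: Optional[list[str]] = None
    for _ in range(max_attempts):
        execute()
        findings = audit()
        if not findings:
            return ("clean", [])
        if findings == previous:
            # No progress since the last attempt: stop and surface the reason.
            return ("stalled", findings)
        previous = findings
    # Attempts exhausted while findings were still changing.
    return ("stalled", previous or [])
```

With an executor that makes no progress, the second attempt returns the same findings and the loop ends in `"stalled"` with the reason attached, rather than reporting a clean audit.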
Tests: 5 new cases covering pending, skipped-without-reason,
blocked-without-reason, hollow-done, and a clean-run negative case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes the audit blind spot that let Sprint 3 silently ship ~30% of its scope: `_validate_execution_evidence_code` only compared files-claimed-vs-files-in-diff and never noticed that 10 of 14 tasks were still `status=pending` after the executor died on quota.
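After this PR, the audit emits a finding for any of:
- Tasks left at `status=pending` after execute (the executor never started them).
- Tasks marked `skipped` or `blocked` with empty executor_notes (no reason recorded).
- Tasks marked `done` with neither files_changed nor commands_run (suspiciously hollow).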
Findings flow through the existing auto-driver retry path. If the executor genuinely had more to do, the chain re-dispatches execute. If nothing can advance, the chain stalls visibly on a known reason instead of producing an "audit clean" artifact for an obviously-incomplete run.
Test plan
🤖 Generated with Claude Code