audit: flag pending/skipped/blocked/hollow tasks after execute #20

Merged
peteromallet merged 1 commit into main from megaplan-audit-incomplete-tasks
May 5, 2026

@peteromallet
Owner

Summary

Closes the audit blind spot that let Sprint 3 silently ship only ~30% of its scope: `_validate_execution_evidence_code` only compared files-claimed-vs-files-in-diff and never noticed that 10 of 14 tasks were still `status=pending` after the executor died on quota.

After this PR, the audit emits a finding for any of:

  • Tasks left at `status=pending` after execute (executor never started them)
  • Tasks marked `skipped` or `blocked` with empty `executor_notes` (no reason recorded)
  • Tasks marked `done` with neither `files_changed` nor `commands_run` (suspiciously hollow)
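
For illustration, the three checks above could be sketched roughly as follows. This is not the code from this PR: the `find_incomplete_tasks` helper, the dict-shaped task records, the `id` key, and the wording of the second and third findings are assumptions made for the example. Only the field names `status`, `executor_notes`, `files_changed`, `commands_run` and the "Tasks left pending" message come from the description.

```python
from typing import Any


def find_incomplete_tasks(tasks: list[dict[str, Any]]) -> list[str]:
    """Hypothetical helper: return one finding per category of incomplete task."""
    pending: list[str] = []
    unexplained: list[str] = []
    hollow: list[str] = []

    for task in tasks:
        task_id = str(task.get("id", "?"))  # "id" key is an assumption
        status = task.get("status")

        if status == "pending":
            # Executor never started this task.
            pending.append(task_id)
        elif status in ("skipped", "blocked"):
            # Skipped/blocked is acceptable only with a recorded reason.
            if not (task.get("executor_notes") or "").strip():
                unexplained.append(task_id)
        elif status == "done":
            # "done" with no files and no commands looks hollow.
            if not task.get("files_changed") and not task.get("commands_run"):
                hollow.append(task_id)

    findings: list[str] = []
    if pending:
        findings.append("Tasks left pending: " + ", ".join(pending))
    if unexplained:
        # Message wording is illustrative, not taken from the PR.
        findings.append("Tasks skipped/blocked without a reason: " + ", ".join(unexplained))
    if hollow:
        # Message wording is illustrative, not taken from the PR.
        findings.append("Tasks done with no evidence: " + ", ".join(hollow))
    return findings
```

A run where every task is `done` with at least one file or command recorded would yield an empty list under this sketch, which corresponds to the clean-run negative case.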

Findings flow through the existing auto-driver retry path. If the executor genuinely had more to do, the chain re-dispatches execute. If nothing can advance, the chain stalls visibly on a known reason instead of producing an "audit clean" artifact for an obviously-incomplete run.
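
One way to picture the retry-or-stall behaviour described above is the sketch below. It is an assumption-laden illustration, not the auto-driver's actual logic: the `decide_next_action` name, the action strings, and the rule "stall when a retry reproduces the same findings" are all hypothetical, since the description does not say how the chain decides that nothing can advance.

```python
from typing import Optional


def decide_next_action(
    findings: list[str],
    previous_findings: Optional[list[str]] = None,
) -> tuple[str, str]:
    """Hypothetical decision step: return (action, reason) for the chain."""
    if not findings:
        # Audit is clean, so the chain can move on.
        return ("continue", "")

    reason = "; ".join(findings)
    if previous_findings is not None and findings == previous_findings:
        # Assumed stall rule: a retry changed nothing, so stop and surface the reason.
        return ("stall", reason)

    # Otherwise assume the executor may still make progress.
    return ("retry_execute", reason)
```

Under these assumptions, a first audit that reports pending tasks triggers a re-dispatch, and a second audit with identical findings stalls the chain with the reason attached.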

Test plan

  • `pytest tests/test_evaluation.py` — 70 passed (11 `validate_execution_evidence` cases including 5 new ones)
  • No regressions vs main (the 4 pre-existing test failures in `test_finalize`/`test_cloud_chain_status` reproduce on `main` without this PR)

🤖 Generated with Claude Code

Closes a real audit blind spot: validate_execution_evidence_code only
checked files-claimed-vs-files-in-diff and rubber-stamp executor_notes,
silently treating tasks left at status=pending after execute as fine. A
chain that lost ~70% of its sprint scope to mid-execute quota exhaustion
shipped a "clean" audit artifact this morning.

Now emit a finding for any of:

- Tasks left at `status=pending` after execute (executor never started
  them — the most common silent-scope-shrink mode).
- Tasks marked `skipped` or `blocked` with empty executor_notes (no
  reason recorded, indistinguishable from dropped-on-floor).
- Tasks marked `done` but with neither files_changed nor commands_run
  (suspiciously hollow — likely skipped without flagging).

Findings flow through the existing auto-driver retry path: if the
executor genuinely had more to do, the chain re-dispatches execute. If
nothing can advance, the chain stalls visibly on a known reason
("Tasks left pending: T7, T8, T10, T12") instead of silently
producing an empty, "clean" audit.

Tests: 5 new cases covering pending, skipped-without-reason,
blocked-without-reason, hollow-done, and a clean-run negative case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@peteromallet peteromallet merged commit 9303d40 into main May 5, 2026
@peteromallet peteromallet deleted the megaplan-audit-incomplete-tasks branch May 5, 2026 05:15