Skip to content

Pipeline Plan 417

ezigus edited this page Apr 23, 2026 · 1 revision

I'll provide the comprehensive implementation plan directly below:


Implementation Plan: Cross-Stage Drift Detector — plan.md vs git diff

Problem Analysis

Current State

When the build loop completes, the pipeline immediately enters stage_review() function. However, there is no verification that the code changes actually match what was planned. A file listed in the plan could be completely untouched, or a file could be modified that was never planned—but neither is caught before burning review/audit/CQ quota.

Root Cause

  • Missing detection: stage_review() (line 6 in pipeline-stages-review.sh) runs immediately after build with zero consistency checking
  • Silent drift: Plan-to-build divergence goes undetected until full review cycle
  • Class of failure: Repeatable pipeline failure mode: planned work is incomplete, or out-of-scope changes introduce unplanned complexity

Impact

  • Wasted AI review cycles on unplanned changes
  • Missed detection of incomplete builds (planned files untouched)
  • No early warning before heavy processing

Files to Modify

  • scripts/lib/pipeline-stages-review.sh — Add drift detection helper + integration (~38 lines total)

Implementation Steps

Phase 1: Build the Drift Detection Helper (Lines 50–80)

  1. Add detect_plan_drift() function before stage_review()

    • Input: $1=artifacts_dir, $2=project_root
    • Output: String of drift warnings (or empty if no drift)
    • Logic:
      • Read $artifacts_dir/plan.md
      • Extract filenames from ## Files to Modify section (parse markdown bullets)
      • Get git diff --name-only HEAD (actual changed files)
      • Find planned files NOT in actual changes
      • Format as [DRIFT-WARNING] Planned file not modified: <filepath> for each untouched file
    • Error handling: Return empty string (fail-open) if plan.md missing, parse fails, or git fails
  2. Handle edge cases

    • plan.md missing → return "" (no warnings, continue review)
    • ## Files to Modify not found → return "" (no warnings)
    • git diff fails → return "" with warning logged
    • All planned files modified → return "" (no drift)

Phase 2: Integrate Drift Warnings into Review Prompt (Line ~60)

  1. Call detect_plan_drift() from stage_review()

    • Location: Between ruflo review completion (line 79) and native review prompt construction (line 82)
    • Code: local drift_warnings; drift_warnings=$(detect_plan_drift "$ARTIFACTS_DIR" "$PROJECT_ROOT" 2>/dev/null || true)
  2. Inject warnings into review_prompt

    • If [[ -n "$drift_warnings" ]], append section before diff:
      review_prompt+="
      ## Cross-Stage Drift Detected
      The following files were planned but not modified by the build:
      ${drift_warnings}
      
      Reviewer: Verify whether these files were intentionally skipped or represent incomplete implementation.
      "
  3. Add event logging

    • When drift detected: emit_event "review.drift_detected" "issue=${ISSUE_NUMBER:-0}"
    • Allows tracking of plan-build divergence frequency

Phase 3: Testing

  1. Create unit test file scripts/sw-lib-pipeline-stages-review-test.sh

    • Test case 1: plan.md lists 2 files, git diff shows 1 → expect 1 drift warning
    • Test case 2: plan.md lists 2 files, git diff shows 2 → expect no warnings
    • Test case 3: plan.md missing → expect no warnings (fail-open)
    • Test case 4: git diff fails → expect no warnings (fail-open)
    • Test case 5: plan.md has no ## Files to Modify → expect no warnings
  2. Create integration test

    • Setup real git repo with commits
    • Create plan.md with 3 files
    • Create diff showing only 2 files modified
    • Run stage_review()
    • Verify review.md contains [DRIFT-WARNING] markers
  3. Run npm test

    • Verify no regressions in existing tests
    • Ensure new tests discovered and pass

Task Checklist

  • Task 1: Implement detect_plan_drift() helper function with error handling
  • Task 2: Unit test — 5 test cases for detect_plan_drift() covering happy path and edge cases
  • Task 3: Call detect_plan_drift() from stage_review() around line 60
  • Task 4: Inject drift warnings into review_prompt variable
  • Task 5: Add event logging for drift detection (emit_event "review.drift_detected")
  • Task 6: Create integration test with full review stage
  • Task 7: Run npm test and verify all tests pass
  • Task 8: Add inline documentation to helper function
  • Task 9: Verify fail-open behavior with corrupted plan.md (manual test)
  • Task 10: Final validation — test suite passes, no regressions

Testing Approach

Test Pyramid

  • Unit tests (70%): 5 tests for detect_plan_drift() function

    • Correct filename extraction from plan.md
    • Accurate git diff comparison
    • Missing plan.md handling
    • Git command failures
    • Parse error handling
  • Integration tests (20%): 2 tests with full review stage

    • Drift warnings appear in final review prompt
    • No warnings when all planned files changed
  • E2E tests (10%): 1 test for complete pipeline

    • Full pipeline: plan → build with partial changes → review detects drift

Critical Paths to Test

Happy Path:

  • plan.md: src/a.js, src/b.js, src/c.js
  • git diff: src/a.js, src/b.js (changed)
  • Expected: [DRIFT-WARNING] Planned file not modified: src/c.js in review prompt

Error Case 1: Missing plan.md

  • No plan.md in artifacts
  • Expected: Empty drift warnings, review continues unaffected

Error Case 2: Git failure

  • git diff command fails
  • Expected: Fails safely, review stage doesn't break

Edge Case: All planned files modified

  • plan.md: 2 files
  • git diff: Both files + extras modified
  • Expected: No drift warnings (all planned work done)

Definition of Done

detect_plan_drift() function implemented in pipeline-stages-review.sh ✓ Filenames extracted from ## Files to Modify markdown section ✓ Comparison against git diff --name-only HEAD works correctly ✓ [DRIFT-WARNING] injected for each unmodified planned file ✓ Fail-open: returns empty string if plan.md missing or git fails ✓ Integrated into stage_review() before native review prompt ✓ Drift warnings visible in final review output ✓ Event logging: emit_event "review.drift_detected" ✓ Unit tests pass (5 test cases) ✓ Integration tests pass (2 test cases) ✓ npm test passes (no regressions) ✓ Inline documentation added ✓ Manual fail-open verification completed ✓ Issue #417 resolved, Issue #128 superseded

Design Alternatives Considered

Alternative A: Extract from ## Files to Modify (CHOSEN)

Approach: Parse markdown section for filenames, compare against git diff

Pros:

  • Simple: straightforward regex/grep pattern matching
  • Reusable: Works with standardized plan.md format
  • Low risk: ~20 lines of code
  • Already validated: plan.md format is stable

Cons:

  • Fragile to format changes (but low risk)
  • Doesn't detect deleted planned files

Chosen because: Simplicity + low risk + proven plan.md structure


Alternative B: Use git metadata

Approach: Extract from git log/blame to infer planned vs actual

Pros:

  • Format-independent
  • Automatically adapts

Cons:

  • Complex implementation
  • Slow (git operations)
  • Uncertain inference
  • Higher failure modes

Rejected: Too complex without proportional benefit


Alternative C: Embed in pipeline state

Approach: Save planned file list in pipeline-state.md or JSON

Pros:

  • Decoupled from plan.md parsing

Cons:

  • Requires state management changes
  • More infrastructure
  • No simplicity gain

Rejected: More infrastructure without benefit

Risk Analysis

Risk Impact Mitigation
plan.md parsing breaks on unexpected format Drift detection fails silently Fail-open design: return empty string, log warning, review continues
git diff performance on large repos Slow review stage Already called in review (line 37), no additional overhead
False positives: flagging legitimate skips Reviewer confusion [DRIFT-WARNING] provides context, not automatic failure
Review prompt becomes too large Prompt truncation hides warnings guard_prompt_size() already handles prompt inflation
Git failures (network, permissions) Review stage breaks Catch errors, return empty string, don't break review

Success Metrics

  • Plan-to-build drift is now detectable before review/audit/CQ burn time
  • Drift warnings help reviewers understand scope completeness
  • Zero breaking changes to existing review behavior
  • Issue #417 closes, Issue #128 superseded
  • Test suite passes without regressions

Clone this wiki locally