
Pipeline Plan 20

Seth Ford edited this page Feb 12, 2026 · 4 revisions



Plan: End-to-End Integration Test Suite in CI

Overview

Add an integration test suite that goes beyond the existing 17 mock-based test suites by running the real pipeline with enhanced mock binaries that validate stage ordering, state transitions, and artifact generation — plus a CI workflow with budget enforcement and structured reporting.

Two tiers:

  1. Smoke tests (no Claude, no GitHub) — validate pipeline machinery with deterministic mocks
  2. Live integration tests (require CLAUDE_API_KEY + GITHUB_TOKEN) — run a minimal pipeline against real Claude, budget-capped at $1.00

Files to Modify

| File | Action | Purpose |
| --- | --- | --- |
| scripts/sw-integration-test.sh | Create | Integration test suite (~500 lines) |
| templates/pipelines/integration-test.json | Create | Minimal 3-stage pipeline template |
| .github/workflows/integration-test.yml | Create | CI workflow (smoke + live jobs) |
| .github/workflows/test.yml | Modify | Add smoke test step after unit tests |
| package.json | Modify | Add test:smoke and test:integration scripts |

Implementation Steps

Step 1: Create templates/pipelines/integration-test.json

Minimal template: intake → build → test only, all auto-gated, model sonnet, max 3 iterations.

Step 2: Create scripts/sw-integration-test.sh

Same harness pattern as sw-pipeline-test.sh: set -euo pipefail, ERR trap, PASS/FAIL counters, temp dir isolation, mock binaries on PATH, assertion functions.
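
The harness pattern described above could look roughly like the following. This is a minimal sketch, not the contents of sw-pipeline-test.sh: the names `TEST_TMP` and `assert_eq`, the mock's output, and the choice of `claude` as the mocked binary name are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical harness sketch; names are illustrative, not taken from the repo.
set -euo pipefail
trap 'echo "ERROR at line $LINENO" >&2' ERR

PASS=0
FAIL=0

# Isolated scratch directory, removed when the script exits.
TEST_TMP="$(mktemp -d)"
trap 'rm -rf "$TEST_TMP"' EXIT

# Place a deterministic mock ahead of any real binary on PATH.
# Assumption: the pipeline invokes a binary called "claude".
mkdir -p "$TEST_TMP/bin"
cat > "$TEST_TMP/bin/claude" <<'EOF'
#!/usr/bin/env bash
echo "mock claude: $*"
EOF
chmod +x "$TEST_TMP/bin/claude"
PATH="$TEST_TMP/bin:$PATH"

# assert_eq <label> <expected> <actual>: record a pass or a fail.
assert_eq() {
  if [ "$2" = "$3" ]; then
    PASS=$((PASS + 1))
    echo "PASS: $1"
  else
    FAIL=$((FAIL + 1))
    echo "FAIL: $1 (expected '$2', got '$3')"
  fi
}

assert_eq "mock is first on PATH" "mock claude: --version" "$(claude --version)"
echo "passed=$PASS failed=$FAIL"
```

The counters use `PASS=$((PASS + 1))` rather than `((PASS++))`, because the latter returns a non-zero status when the counter is 0 and would abort the script under `set -e`.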

Tier 1 — Smoke Tests (10 tests, always run):

  1. test_smoke_full_stage_ordering — Run fast template E2E, parse pipeline-state.md, verify stage_progress order, status: success
  2. test_smoke_state_transitions — Verify state transitions pending → running → complete per stage, current_stage updates correctly
  3. test_smoke_artifact_integrity — Verify all expected artifacts exist: intake.json (valid JSON), plan.md, test-results.log, events.jsonl entries
  4. test_smoke_no_crashes_fast_template — Fast template exits 0, no ERROR lines
  5. test_smoke_no_crashes_standard_template — Standard template exits 0
  6. test_smoke_no_crashes_autonomous_template — Autonomous template exits 0
  7. test_smoke_budget_enforcement — Mock cost at $1.01, verify pipeline fails; mock at $0.99, verify it passes
  8. test_smoke_resume_preserves_artifacts — Fail at test stage, resume, verify prior artifacts preserved
  9. test_smoke_dry_run_no_side_effects — --dry-run creates no artifacts, no branches, no events
  10. test_smoke_stage_timing_recorded — Each completed stage has duration in state file log
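
The budget check in test 7 needs a decimal comparison, which bash 3.2 arithmetic cannot do on its own. One possible approach, sketched below, delegates the comparison to awk. The helper names `within_budget` and `check` are hypothetical, and how the pipeline actually reports its cost is not covered here.

```shell
# within_budget <cost> <cap>: succeed when cost <= cap.
# Bash arithmetic is integer-only, so the comparison is done in awk.
within_budget() {
  awk -v cost="$1" -v cap="$2" 'BEGIN { exit (cost + 0 <= cap + 0) ? 0 : 1 }'
}

# check <cost>: report how a $1.00 cap would treat the given cost.
check() {
  if within_budget "$1" "1.00"; then
    echo "pass"
  else
    echo "fail"
  fi
}

check 0.99   # prints "pass"
check 1.01   # prints "fail"
```

A cost of exactly $1.00 passes in this sketch, matching the "abort if cost > $1.00" wording in the Definition of Done.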

Tier 2 — Live Tests (2 tests, require secrets):

  1. test_live_trivial_readme_change — Real Claude makes a trivial change, verify diff exists, clean git status, cost < $1.00
  2. test_live_pr_creation — Verify PR created with valid URL, cleanup after

Mode selection: --live flag enables Tier 2, --filter <name> for single test.
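
Mode selection could be handled with a plain `case` loop, which stays compatible with bash 3.2. The following is a sketch under assumptions: the variable names `LIVE` and `FILTER` and the helper `should_run` are hypothetical, and the `test_live_*` pattern relies on the naming used in the test lists above.

```shell
LIVE=0
FILTER=""

# parse_args "$@": set LIVE and FILTER from the command line.
parse_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --live)
        LIVE=1
        shift
        ;;
      --filter)
        if [ $# -lt 2 ]; then
          echo "--filter requires a test name" >&2
          return 2
        fi
        FILTER="$2"
        shift 2
        ;;
      *)
        echo "unknown option: $1" >&2
        return 2
        ;;
    esac
  done
}

# should_run <test_name>: apply the --filter gate, then the --live gate.
should_run() {
  if [ -n "$FILTER" ] && [ "$1" != "$FILTER" ]; then
    return 1
  fi
  case "$1" in
    test_live_*) [ "$LIVE" -eq 1 ] ;;
    *) return 0 ;;
  esac
}

parse_args --filter test_smoke_budget_enforcement
if should_run test_smoke_budget_enforcement; then
  echo "would run test_smoke_budget_enforcement"
fi
```

Keeping the gating logic in one function means each test call site only needs `should_run <name> && <name>`, so Tier 2 tests are skipped by default without per-test conditionals.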

Step 3: Create .github/workflows/integration-test.yml

  • smoke job: Matrix (macOS + Ubuntu), runs on every PR and push to main
  • live job: Runs after smoke passes, only on push to main or manual trigger, uses integration-test environment for secret access, 15-minute timeout, uploads artifacts

Step 4: Update package.json

Add "test:smoke" and "test:integration" scripts.

Step 5: Update .github/workflows/test.yml

Add - name: Run integration smoke tests step after existing unit test steps.


Task Checklist

  • Task 1: Create templates/pipelines/integration-test.json
  • Task 2: Create scripts/sw-integration-test.sh scaffold (harness, mocks, assertions)
  • Task 3: Implement smoke tests 1-6 (ordering, transitions, artifacts, crash tests)
  • Task 4: Implement smoke tests 7-10 (budget, resume, dry-run, timing)
  • Task 5: Implement live tests (README change, PR creation)
  • Task 6: Create .github/workflows/integration-test.yml
  • Task 7: Update package.json with new scripts
  • Task 8: Update .github/workflows/test.yml with smoke step
  • Task 9: Add CI summary reporting ($GITHUB_STEP_SUMMARY markdown table)
  • Task 10: Run smoke tests locally and fix failures
  • Task 11: Verify CLAUDE.md AUTO sections update
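
For Task 9, GitHub Actions exposes `GITHUB_STEP_SUMMARY` as a file path, and Markdown appended to that file is rendered on the job summary page. A sketch of the reporting step follows. The function name `write_summary` and the `name:status` argument format are assumptions, and the fallback to standard output is only there so the function also works outside CI.

```shell
# write_summary <name:status>...: append a Markdown results table.
# Falls back to stdout when GITHUB_STEP_SUMMARY is unset (local runs).
write_summary() {
  local out="${GITHUB_STEP_SUMMARY:-/dev/stdout}"
  local entry
  {
    echo "| Test | Result |"
    echo "| --- | --- |"
    for entry in "$@"; do
      # Split "name:status" on the colon.
      echo "| ${entry%%:*} | ${entry##*:} |"
    done
  } >> "$out"
}

write_summary \
  "test_smoke_full_stage_ordering:pass" \
  "test_smoke_budget_enforcement:fail"
```

Appending with `>>` rather than overwriting lets several steps in the same job contribute to one summary.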

Testing Approach

  1. Local: bash scripts/sw-integration-test.sh — all 10 smoke tests pass
  2. Filter: --filter test_smoke_budget_enforcement for debugging individual tests
  3. CI: Push branch → verify workflow triggers and summary reports
  4. Live: Manual dispatch with run_live: true or merge to main
  5. Cross-platform: macOS + Ubuntu matrix

Definition of Done

  • npm run test:smoke passes (exit 0)
  • npm run test:integration runs full suite when secrets available
  • CI runs smoke tests on every PR to main
  • CI runs live tests on push to main (regression)
  • Budget cap: live tests abort if cost > $1.00
  • CI summary: pass/fail per test in $GITHUB_STEP_SUMMARY
  • Existing 17 test suites still pass
  • Follows conventions: set -euo pipefail, bash 3.2, shipwright color theme
