
Pipeline Plan 20

Seth Ford edited this page Feb 11, 2026 · 4 revisions



Plan: End-to-End Integration Test Suite in CI

Overview

Create two new test artifacts: (1) a bash E2E test script that runs the real pipeline with mocked Claude/GitHub (smoke tests) and supports an optional "live" mode against real Claude for integration tests, and (2) a GitHub Actions workflow for integration tests on PRs to main.

The design follows the existing test harness pattern exactly — PASS/FAIL counters, mock binaries in temp dirs, assertion helpers, colored output. No new frameworks or dependencies.
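As an illustration of the "mock binaries in temp dirs" idea, here is a minimal sketch. The helper name make_mock_bin, the MOCK_EXIT_CODE variable, and the calls.log file are hypothetical names chosen for this example; the actual harness in sw-pipeline-test.sh may build its mocks differently.

```shell
#!/usr/bin/env bash
# Sketch: shadow a real CLI with a mock placed first on PATH.
# make_mock_bin, MOCK_EXIT_CODE and calls.log are hypothetical names.

MOCK_DIR="$(mktemp -d)"

make_mock_bin() {
    # $1 = command name to shadow, $2 = canned stdout
    local name="$1" reply="$2"
    cat > "$MOCK_DIR/$name" <<EOF
#!/usr/bin/env bash
# Log each invocation so tests can assert on how the mock was called.
echo "\$0 \$*" >> "$MOCK_DIR/calls.log"
echo "$reply"
# Exit code is controlled by the caller's environment (default 0).
exit "\${MOCK_EXIT_CODE:-0}"
EOF
    chmod +x "$MOCK_DIR/$name"
}

make_mock_bin claude "mock response"
PATH="$MOCK_DIR:$PATH"
export PATH

# The mock now answers instead of the real CLI.
mock_output="$(claude --print "hello")"
mock_status=$?

# Force a failure for one call only, as an error-recovery test might.
failing_status=0
MOCK_EXIT_CODE=1 claude --print "fail" >/dev/null || failing_status=$?
```

A real suite would remove MOCK_DIR in its cleanup step (cleanup_env() in the plan's terminology); that is omitted here to keep the sketch short.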

Files to Modify

| File | Action | Purpose |
| --- | --- | --- |
| scripts/sw-e2e-test.sh | Create | End-to-end integration test suite |
| .github/workflows/integration.yml | Create | CI workflow for integration tests on PRs |
| package.json | Modify | Add test:e2e and test:integration npm scripts |
| .claude/CLAUDE.md | Modify | Add E2E test suite to test suites table |

Implementation Steps

Step 1: Create scripts/sw-e2e-test.sh

The script follows the pattern in sw-pipeline-test.sh: the same boilerplate (header, colors, counters), the same assertion helpers (assert_exit_code, assert_output_contains, assert_file_exists, assert_file_contains, assert_branch_exists, assert_state_contains), the same invoke_pipeline() wrapper that captures output and exit code, the same run_test() runner, and the same setup_env() / reset_test() / cleanup_env() lifecycle.
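To make the counter-and-assertion pattern concrete, here is a minimal sketch of a few of the helpers named above. Only the function names come from the plan; the bodies, the PIPELINE_CMD variable, and the two demo tests are assumptions for illustration, not the contents of sw-pipeline-test.sh.

```shell
#!/usr/bin/env bash
# Sketch of the PASS/FAIL counter and assertion pattern.
# Function bodies are assumed; only the names come from the plan.

PASS=0
FAIL=0
OUTPUT=""
EXIT_CODE=0

invoke_pipeline() {
    # Capture combined output and exit code without aborting the suite.
    # PIPELINE_CMD is a hypothetical variable naming the script under test.
    EXIT_CODE=0
    OUTPUT="$("$PIPELINE_CMD" "$@" 2>&1)" || EXIT_CODE=$?
}

assert_exit_code() {
    # $1 = expected exit code
    if [ "$EXIT_CODE" -eq "$1" ]; then return 0; fi
    echo "  expected exit $1, got $EXIT_CODE"
    return 1
}

assert_output_contains() {
    # $1 = fixed string expected somewhere in the captured output
    if printf '%s\n' "$OUTPUT" | grep -qF -- "$1"; then return 0; fi
    echo "  output missing: $1"
    return 1
}

run_test() {
    # $1 = test function name; counted as a pass only if it returns 0
    if "$1"; then
        PASS=$((PASS + 1)); echo "PASS $1"
    else
        FAIL=$((FAIL + 1)); echo "FAIL $1"
    fi
}

# Demo: "echo" stands in for the pipeline so the sketch runs anywhere.
PIPELINE_CMD="echo"
test_demo_passes() {
    invoke_pipeline "stage: intake"
    assert_exit_code 0 && assert_output_contains "intake"
}
test_demo_fails() {
    invoke_pipeline "stage: intake"
    assert_output_contains "build"
}
run_test test_demo_passes
run_test test_demo_fails
echo "passed=$PASS failed=$FAIL"
```

Keeping the captured state in globals (OUTPUT, EXIT_CODE) lets each assertion stay a one-liner inside a test, at the cost of requiring a reset between tests.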

Two modes: --smoke (default, mocked, fast) and --live (real Claude, budget-capped).
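One way the mode selection could look, as a sketch: parse_mode is a hypothetical helper name, and the real script may handle further options.

```shell
#!/usr/bin/env bash
# Sketch: choose between smoke and live mode, defaulting to smoke.
# parse_mode is a hypothetical helper name.

parse_mode() {
    local mode="smoke" arg
    for arg in "$@"; do
        case "$arg" in
            --smoke) mode="smoke" ;;
            --live)  mode="live" ;;
            *) echo "unknown option: $arg" >&2; return 2 ;;
        esac
    done
    echo "$mode"
}

default_mode="$(parse_mode)"
live_mode="$(parse_mode --live)"
echo "default=$default_mode live=$live_mode"
```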

10 Smoke Tests (mocked — always run in CI):

  1. test_full_pipeline_stage_order — Run fast template (intake→build→test→PR) with mocks. Verify exit 0, state file status: complete, all stage artifacts exist.
  2. test_stage_order_preserved — Parse output for stage markers, verify correct ordering.
  3. test_state_file_updated_per_stage — Verify state file has all required YAML fields: pipeline, goal, status, branch, stage_progress, timestamps.
  4. test_no_unhandled_errors — Run full pipeline, verify no unbound variable or command not found in output.
  5. test_resume_from_interrupted — Run intake-only, manually edit state, resume. Verify continuation.
  6. test_artifacts_have_valid_json — Verify all .json in pipeline-artifacts are valid via jq.
  7. test_branch_created_cleanly — Feature branch exists, has commits, clean working tree.
  8. test_dry_run_produces_no_artifacts — Run with --dry-run, verify no artifacts directory is created.
  9. test_pipeline_with_custom_template — Custom template with only intake+build, verify only those stages run.
  10. test_error_recovery_no_crash — Mock claude returns exit 1 during plan, verify graceful failure.
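Two of the checks in this list (stage ordering and the scan for unhandled shell errors) can be sketched in plain Bash. The marker format "Stage: <name>" and the helper names stages_in_order and has_shell_errors are assumptions made for this example; the pipeline's real output format is not shown in this plan.

```shell
#!/usr/bin/env bash
# Sketch of two smoke-test checks on captured pipeline output.
# Assumes a hypothetical marker line of the form "Stage: <name>".

stages_in_order() {
    # $1 = captured output; remaining args = expected stage names in order.
    local output="$1" stage line last=0
    shift
    for stage in "$@"; do
        # Line number of the first marker for this stage, if any.
        line="$(printf '%s\n' "$output" | grep -n "^Stage: $stage\$" | head -n 1 | cut -d: -f1)"
        if [ -z "$line" ] || [ "$line" -le "$last" ]; then
            echo "stage missing or out of order: $stage" >&2
            return 1
        fi
        last="$line"
    done
}

has_shell_errors() {
    # $1 = captured output; succeeds if a typical unhandled-error message appears.
    printf '%s\n' "$1" | grep -qE 'unbound variable|command not found'
}

good_output="Stage: intake
Stage: build
Stage: test
Stage: pr"

bad_output="Stage: build
Stage: intake"

if stages_in_order "$good_output" intake build test pr; then
    echo "order ok"
fi
if ! stages_in_order "$bad_output" intake build 2>/dev/null; then
    echo "bad order detected"
fi
```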

Step 2: Live Integration Mode

Live mode is the same script run with the --live flag. It runs one test: a trivial goal ("Add a comment to README.md") taken through intake→build with real Claude. Budget protection: --max-turns 5, fast template, 120s timeout, $1.00 cap. The test skips gracefully if the claude CLI is unavailable.
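The skip logic and the timeout can be sketched as below. The helper names require_cli, run_with_timeout, and run_live_test are hypothetical, and the hand-rolled timeout is used only because GNU timeout is not installed by default on macOS. The $1.00 cost cap is not modeled, since the plan does not say how it is enforced.

```shell
#!/usr/bin/env bash
# Sketch: graceful skip plus a portable timeout for the live test.
# All helper names here are hypothetical.

SKIP=0

require_cli() {
    # Return 0 if the command exists; otherwise record a skip.
    if command -v "$1" >/dev/null 2>&1; then return 0; fi
    echo "SKIP: $1 not found on PATH"
    SKIP=$((SKIP + 1))
    return 1
}

run_with_timeout() {
    # $1 = seconds, rest = an external command and its arguments.
    local secs="$1" cmd_pid watcher_pid status=0
    shift
    "$@" &
    cmd_pid=$!
    # Watcher: terminate the command if it is still running after $secs.
    ( sleep "$secs"; kill -TERM "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watcher_pid=$!
    wait "$cmd_pid" || status=$?
    # Command finished (or was killed); stop the watcher quietly.
    kill "$watcher_pid" 2>/dev/null || true
    wait "$watcher_pid" 2>/dev/null || true
    return "$status"
}

run_live_test() {
    # $1 = required CLI, $2 = timeout in seconds, rest = command to run.
    local cli="$1" secs="$2"
    shift 2
    require_cli "$cli" || return 0   # a skip is not a failure
    run_with_timeout "$secs" "$@"
}

# Demo with stand-in commands rather than the real pipeline.
run_live_test sw_no_such_cli_for_demo 1 true
quick_status=0
run_with_timeout 2 true || quick_status=$?
timed_status=0
run_with_timeout 1 sleep 5 || timed_status=$?
echo "skipped=$SKIP quick=$quick_status timed_out=$timed_status"
```

Note that the watcher only signals the direct child process; if the command spawns its own children, a real implementation would likely need to signal the whole process group.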

Step 3: CI Workflow

.github/workflows/integration.yml runs the smoke tests on a macOS + Ubuntu matrix for every PR to main. A conditional live integration job runs only when the CLAUDE_API_KEY secret exists. Test results are uploaded as artifacts.

Step 4: npm Scripts

Add "test:e2e" and "test:integration" to package.json scripts.

Step 5: Update CLAUDE.md

Add new test suite to the test suites table.

Task Checklist

  • Task 1: Create scripts/sw-e2e-test.sh boilerplate (header, colors, counters, assertions, setup/cleanup, main)
  • Task 2: Implement mock environment setup (real pipeline + templates, mock binaries, mock git project)
  • Task 3: Implement 10 smoke tests
  • Task 4: Implement live integration mode with budget protection and skip logic
  • Task 5: Create .github/workflows/integration.yml
  • Task 6: Update package.json with new scripts
  • Task 7: Update .claude/CLAUDE.md test suites table
  • Task 8: Run smoke tests locally to verify they pass
  • Task 9: Run ShellCheck and fix warnings

Definition of Done

  • npm run test:e2e runs the smoke test suite and passes
  • npm run test:integration runs live tests or skips gracefully
  • CI runs smoke tests on every PR to main
  • Live test runs are budget-capped at $1.00 per run
  • Pass/fail output matches existing test suite style
  • Bash 3.2 compatible, no ShellCheck errors
