Skip to content

Pipeline Plan 46

Seth Ford edited this page Feb 13, 2026 · 3 revisions

The plan file is being blocked by the sandbox. The plan content has been fully prepared. Let me present it directly to you instead.


Implementation Plan: Dynamic Team Scaling (Issue #46)

Overview

Add mid-pipeline team scaling to Shipwright — spawn additional agents when a pipeline stage is struggling, dismiss idle agents when work is done. This builds on the existing daemon_auto_scale() infrastructure (which scales pipelines) to now scale agents within a pipeline.

Key distinction: daemon_auto_scale() controls how many concurrent pipelines run. This feature controls how many agents work within a single pipeline's build stage.

Important: Previous Iterations Introduced Regressions

The 12 prior loop iterations on this branch made no progress on the actual feature and introduced regressions by removing safety code:

  • Removed unset CLAUDECODE (needed for spawning Claude CLI from within a session)
  • Removed trap '' HUP (prevents tmux detach from killing pipelines)
  • Reverted printf-based state writes to heredoc (vulnerable to delimiter injection)
  • Changed daemon_log from stderr to stdout (corrupts $() captures)
  • Removed CPU-based agent health detection (premature killing of thinking agents)
  • Removed nudge system, deterministic log rotation, overflow-safe backoff

These must be reverted first before adding new code.


Files to Modify

New Files

  1. scripts/sw-scaling.sh (~400 lines) — Team scaling engine: trigger evaluation, spawn/dismiss orchestration, agent registry, cooldown, budget checks
  2. scripts/sw-scaling-test.sh (~600 lines) — Test suite

Modified Files

  1. scripts/sw-daemon.sh — Source scaling engine, add scaling_evaluate() call in health check loop, add scaling config to load_config()
  2. scripts/sw-loop.sh — Write scaling-signals.json after each iteration with structured progress data
  3. scripts/sw-pipeline.sh — Export scaling-relevant state from stage_build()
  4. templates/pipelines/autonomous.json — Add scaling config section
  5. templates/pipelines/full.json — Add scaling config section
  6. templates/pipelines/enterprise.json — Add scaling config section
  7. dashboard/server.ts — Handle pipeline.agent_spawned / pipeline.agent_dismissed events
  8. package.json — Register sw-scaling-test.sh
  9. .claude/CLAUDE.md — Document the scaling system

Implementation Steps

Step 1: Revert regressions (sw-daemon.sh, sw-pipeline.sh, sw-loop.sh)

Restore: unset CLAUDECODE, trap '' HUP, printf-based state writes, daemon_log stderr routing, CPU health detection, nudge system, DAEMON_LOG_WRITE_COUNT, overflow-safe backoff, exec in daemon_spawn_pipeline.

Step 2: Create sw-scaling.sh core engine

Functions: scaling_load_config(), scaling_evaluate(), scaling_check_triggers(), scaling_cooldown_ok(), scaling_budget_check(), scaling_get_active_agents()

Agent registry at ~/.shipwright/scaling/<issue>/agents.json. Triggers:

Trigger Condition Action
iteration_threshold Build iter > 5, no test pass Spawn debugger
repeated_errors Same error 3+ times Spawn specialist
multi_module 5+ directories touched Spawn builder
idle_dismiss No output for 5+ min Dismiss agent
stage_complete Stage finished Dismiss extras

Step 3: Implement scaling_spawn_agent()

Uses tmux split-window, sets pane title, writes context file, launches Claude, registers in agent registry, emits pipeline.agent_spawned event.

Step 4: Implement scaling_dismiss_agent()

Sends C-c to pane, waits 30s, force-kills if needed, removes from registry, emits pipeline.agent_dismissed, re-tiles.

Step 5: Implement scaling_build_context()

Assembles: pipeline state, recent errors, memory patterns, file assignments (non-overlapping), team task list.

Step 6: Integrate into daemon health check loop

Add scaling_evaluate() call after existing health check logic. Add config vars to load_config().

Step 7: Enhance sw-loop.sh progress reporting

Write scaling-signals.json after each iteration with iteration count, consecutive failures, error signature, modules touched, test pass/fail.

Step 8: Add scaling config to pipeline templates

autonomous.json, full.json, enterprise.json get "scaling": { "enabled": true, "agents_max": 3, "cooldown_s": 120 }.

Step 9: Dashboard event handling

Add pipeline.agent_spawned, pipeline.agent_dismissed, pipeline.team_scaled to event processing in dashboard/server.ts.

Step 10: Write test suite

15 tests following project harness pattern with mock tmux/claude/sw-cost.sh binaries.

Step 11: Register tests and update docs

Step 12: Run full test suite — all 23 suites pass


Task Checklist

  • Task 1: Revert regressions from previous iterations
  • Task 2: Create sw-scaling.sh with core scaling functions
  • Task 3: Implement scaling_spawn_agent() with tmux + context + registry
  • Task 4: Implement scaling_dismiss_agent() with graceful shutdown
  • Task 5: Implement scaling_build_context() for new agent context transfer
  • Task 6: Integrate scaling into daemon health check loop
  • Task 7: Add scaling-signals.json output to sw-loop.sh
  • Task 8: Add scaling config to pipeline templates
  • Task 9: Add dashboard event handling for scaling events
  • Task 10: Create sw-scaling-test.sh with 15 unit tests
  • Task 11: Register test suite in package.json, update CLAUDE.md
  • Task 12: Run full test suite — all 23 suites pass

Testing Approach

Unit tests: Mock tmux/claude/cost binaries; test each trigger, cooldown, budget, registry CRUD, context generation.

Integration: Run existing daemon + pipeline test suites to verify no regressions.

Manual: Start daemon with scaling.enabled: true, run pipeline with failing tests, verify agent spawns/dismisses, check events.jsonl + dashboard.

Definition of Done

  • sw-scaling.sh follows project conventions (pipefail, Bash 3.2, VERSION, emit_event)
  • Pipeline can add/dismiss agents mid-stage
  • New agents receive full context
  • Budget limits constrain scaling
  • 120s cooldown prevents thrashing
  • Scale events in events.jsonl and dashboard
  • All regressions reverted
  • 15+ scaling tests pass
  • Full npm test (23 suites) passes

Clone this wiki locally