Pipeline Design 46

The target file is in the .claude/ directory, which requires explicit write permission, so the ADR is reproduced here for review before it is saved.

Here's the complete Architecture Decision Record:

Design: Dynamic team scaling — spawn and dismiss agents mid-pipeline based on demand

Context

Shipwright currently supports two fixed scaling models: (1) the daemon auto-scaler (sw-daemon.sh:4260-4444) which adjusts concurrent pipeline workers between poll cycles based on CPU/memory/budget/queue-depth, and (2) multi-agent mode in sw-loop.sh which spawns a fixed --agents N count of tmux worker panes at launch and never changes it. Neither model can adjust agent count within a running pipeline — if a build stage starts with 1 agent but discovers 4 independent modules, it runs serially. If a 3-agent build enters a merge-conflict-prone phase, extra agents waste budget.

Constraints from the codebase

- Bash 3.2 compatible: no declare -A, readarray, or ${var,,}
- File-based coordination: pipeline communicates with the loop via .agent-N-complete, progress.md, error-summary.json, failure-reason.txt
- tmux pane lifecycle: panes created via tmux split-window in launch_multi_agent() (sw-loop.sh:1849), killed via pane IDs in cleanup_multi_agent() (sw-loop.sh:1940)
- Event logging: all state changes must call emit_event to events.jsonl
- Budget awareness: sw-cost.sh remaining-budget already integrated in daemon
- Atomic writes: all JSON/state files use tmp + mv pattern
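
The tmp + mv constraint can be illustrated with a minimal sketch. The helper name atomic_write and the agent_count field are hypothetical and are not taken from the Shipwright codebase; the point is only that readers never observe a half-written file, because the rename happens within one directory.

```shell
#!/usr/bin/env bash
# Sketch of the tmp + mv atomic write pattern (Bash 3.2 compatible).
# atomic_write is a hypothetical helper name used for illustration only.
atomic_write() {
    local target="$1" content="$2" tmp
    # Create the temp file next to the target so that mv is a same-filesystem
    # rename instead of a copy.
    tmp="$(mktemp "${target}.tmp.XXXXXX")" || return 1
    if ! printf '%s\n' "$content" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    mv -f "$tmp" "$target"
}

# Demo: write a state file into a scratch directory and read it back.
state_dir="$(mktemp -d)"
atomic_write "$state_dir/scaling-state.json" '{"agent_count":1}'
cat "$state_dir/scaling-state.json"
```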

Decision

Adopt a file-based scaling signal protocol. A new scaling engine (scripts/sw-scaling.sh) is sourced by both sw-pipeline.sh and sw-loop.sh. There are three participants: trigger evaluators write JSON request files, the loop monitor reads and executes them between iterations, and the daemon surfaces the resulting events in metrics.

Signal files:

- scaling-requests.json: pipeline writes spawn/dismiss requests atomically; loop consumes them
- scaling-state.json: current agent count, active pane IDs, cooldown timestamps, history
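
As a rough sketch of the write/consume handshake on scaling-requests.json: the function names, the request fields (action, trigger, ts), and the claim-by-rename step are all assumptions made for illustration, since the ADR does not specify the schema or how the loop consumes a request.

```shell
#!/usr/bin/env bash
# Hypothetical producer/consumer pair for scaling-requests.json.
# Schema and function names are illustrative assumptions, not the real API.

# Producer (pipeline side): write one request atomically via tmp + mv.
scaling_write_request() {
    local dir="$1" action="$2" trigger="$3" tmp
    tmp="$(mktemp "$dir/scaling-requests.json.tmp.XXXXXX")" || return 1
    printf '{"action":"%s","trigger":"%s","ts":%s}\n' \
        "$action" "$trigger" "$(date +%s)" > "$tmp" || { rm -f "$tmp"; return 1; }
    mv -f "$tmp" "$dir/scaling-requests.json"
}

# Consumer (loop side): claim the file by renaming it, then read the claim.
# Returns 1 when there is no pending request.
scaling_take_request() {
    local dir="$1"
    local claimed="$dir/scaling-requests.json.claimed.$$"
    mv "$dir/scaling-requests.json" "$claimed" 2>/dev/null || return 1
    cat "$claimed"
    rm -f "$claimed"
}

# Demo in a scratch directory.
demo_dir="$(mktemp -d)"
scaling_write_request "$demo_dir" spawn multi_module_split
scaling_take_request "$demo_dir"
scaling_take_request "$demo_dir" || echo "no pending request"
```

Claiming by rename means a request written while the loop is reading lands in a fresh file instead of being overwritten mid-read, which is one way to narrow the race noted under risk areas.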

12 functions in sw-scaling.sh: 6 trigger evaluators + scaling_monitor_tick() + scaling_spawn() + scaling_dismiss() + scaling_check_cooldown() + scaling_check_budget() + scaling_prepare_context().

6 trigger types:
- iteration_threshold: spawn if stuck at iteration 8 or later
- coverage_gap: spawn a reviewer if coverage drops more than 10% below target
- security_critical: spawn a security agent
- idle_agent: dismiss if 0 commits in 3 iterations
- multi_module_split: spawn to match the independent module count
- consecutive_failures: dismiss after 3 or more low-progress iterations
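
Two of these triggers can be sketched as pure functions. The function names, argument layout, and the convention of printing spawn, dismiss, or none are hypothetical; the sketch also assumes the caller only evaluates iteration_threshold while the goal is still unmet, which is how "stuck" is interpreted here.

```shell
#!/usr/bin/env bash
# Hypothetical trigger evaluators (Bash 3.2 compatible).
# Each prints one of: spawn, dismiss, none.

# iteration_threshold: $1 = current iteration, $2 = threshold (default 8).
trigger_iteration_threshold() {
    local iteration="$1" threshold="${2:-8}"
    if [ "$iteration" -ge "$threshold" ]; then
        echo spawn
    else
        echo none
    fi
}

# idle_agent: arguments are the agent's commit counts per iteration,
# oldest first. Dismiss only if the last 3 iterations all had 0 commits.
trigger_idle_agent() {
    local window=3 c
    [ "$#" -ge "$window" ] || { echo none; return 0; }
    shift $(( $# - window ))      # keep only the most recent iterations
    for c in "$@"; do
        [ "$c" -eq 0 ] || { echo none; return 0; }
    done
    echo dismiss
}

trigger_iteration_threshold 8     # prints "spawn"
trigger_idle_agent 2 0 0 0        # prints "dismiss"
trigger_idle_agent 0 0 1          # prints "none"
```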

Guards: 120s cooldown between actions, 20% budget reserve blocks spawns, max_agents ceiling (default 4), min_agents floor (always 1). One request processed per tick to serialize operations.
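
The guards could be combined along the following lines. This is a sketch under stated assumptions: scaling_guard is a hypothetical composite (the ADR lists scaling_check_cooldown() and scaling_check_budget() as separate functions), budgets are passed as integers such as cents, and the reserve is treated as blocking spawns once the remaining budget is at or below 20% of the total.

```shell
#!/usr/bin/env bash
# Hypothetical composite guard (Bash 3.2 compatible, integer math only).
# Args: action now_ts last_action_ts agent_count remaining_budget total_budget
# Prints "allow" and returns 0, or prints "deny:<reason>" and returns 1.
scaling_guard() {
    local action="$1" now="$2" last="$3" agents="$4" remaining="$5" total="$6"
    local cooldown=120 max_agents=4 min_agents=1 reserve_pct=20

    # Cooldown applies to every action, spawn or dismiss.
    if [ $(( now - last )) -lt "$cooldown" ]; then
        echo "deny:cooldown"; return 1
    fi

    case "$action" in
        spawn)
            if [ "$agents" -ge "$max_agents" ]; then
                echo "deny:max_agents"; return 1
            fi
            # remaining/total <= reserve_pct/100, rearranged to avoid floats.
            if [ $(( remaining * 100 )) -le $(( total * reserve_pct )) ]; then
                echo "deny:budget_reserve"; return 1
            fi
            ;;
        dismiss)
            if [ "$agents" -le "$min_agents" ]; then
                echo "deny:min_agents"; return 1
            fi
            ;;
        *)
            echo "deny:unknown_action"; return 1
            ;;
    esac
    echo "allow"
}

scaling_guard spawn 1000 800 2 5000 10000 || true   # prints "allow"
scaling_guard spawn 1000 950 2 5000 10000 || true   # prints "deny:cooldown"
```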

Error handling: Spawn failures emit scale.spawn_failed and respect cooldown. Dismiss preserves uncommitted work via git stash before killing pane. Corrupt signal files are moved to .bad and replaced. Single-to-multi transition creates tmux infrastructure before spawning additional agents.
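
The corrupt-file handling might look roughly like this. The function names and the choice of an empty object as the replacement content are assumptions; the validity check relies on jq, which the ADR lists as an existing dependency, and treats anything that is not a JSON object as corrupt.

```shell
#!/usr/bin/env bash
# Hypothetical quarantine helper for signal files (Bash 3.2 compatible).

# Valid means: jq can parse the file and the top-level value is an object.
# jq -e exits non-zero on parse errors and on empty input.
json_is_valid() {
    jq -e 'type == "object"' "$1" >/dev/null 2>&1
}

# Move a corrupt file to <file>.bad and put a fresh default in its place.
# $1 = file path, $2 = replacement content (assumed default: {}).
scaling_quarantine_if_corrupt() {
    local file="$1" default_content='{}' tmp
    if [ "$#" -ge 2 ]; then default_content="$2"; fi

    [ -e "$file" ] || return 0              # nothing to check
    if json_is_valid "$file"; then return 0; fi

    mv -f "$file" "$file.bad" || return 1   # keep the evidence for debugging
    tmp="$(mktemp "$file.tmp.XXXXXX")" || return 1
    printf '%s\n' "$default_content" > "$tmp" || { rm -f "$tmp"; return 1; }
    mv -f "$tmp" "$file"
}

# Demo: a truncated write is quarantined and replaced.
demo="$(mktemp -d)"
printf '{"agent_count": ' > "$demo/scaling-state.json"
scaling_quarantine_if_corrupt "$demo/scaling-state.json"
ls "$demo"
```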

Alternatives Considered

IPC-Based Scaling (Named Pipes / Unix Sockets)
- Pros: lower latency, real-time events, no file corruption risk
- Cons: breaks the file-based coordination pattern used throughout Shipwright, Bash 3.2 has no socket support, harder to debug, requires a background listener process

Daemon-Driven Scaling (Scale From Outside)
- Pros: centralizes logic in the daemon, which already has an auto-scaler
- Cons: the daemon operates at the pipeline-worker level, not the agent-within-pipeline level; its 5-minute poll interval is too slow; it lacks visibility into iteration progress and module structure; and it violates encapsulation

Pre-Computed Scaling Plan (Static at Pipeline Start)
- Pros: simple, no runtime complexity
- Cons: cannot react to runtime conditions (failures, coverage drops, security findings); equivalent to a smarter --agents N default

Implementation Plan

Files to create:
- scripts/sw-scaling.sh — core scaling engine (~400 lines)
- scripts/sw-scaling-test.sh — 18 test cases (~600 lines)
Files to modify:
- scripts/sw-loop.sh — scaling monitor in wait_for_multi_completion() polling loop; single-to-multi transition in run_single_agent_loop()
- scripts/sw-pipeline.sh — source scaling engine; scaling_evaluate_triggers() at stage boundaries; scaling state in write_state()
- templates/pipelines/full.json — add scaling config block (all 6 triggers, max_agents: 4)
- templates/pipelines/autonomous.json — add scaling config block (all 6 triggers, max_agents: 4)
- templates/pipelines/standard.json — add scaling config block (4 triggers, max_agents: 3)
- scripts/sw-daemon.sh — scale.spawn/scale.dismiss event counting in daemon_metrics(); scaling health in daemon_health_check()
- .claude/CLAUDE.md — documentation updates
- package.json — register test suite #23
Dependencies: none (uses existing jq, tmux, emit_event, sw-cost.sh)
Risk areas:
- Single-to-multi transition in sw-loop.sh — must create tmux window mid-run without losing original agent state
- Worktree creation under pipefail — git worktree add failures must not kill the loop
- Race on scaling-requests.json — mitigated by rare writes + atomic tmp+mv + polling interval
- tmux pane ID stability — panes are stable per-server but must handle external kills gracefully
- Budget estimation accuracy — should use daemon's adaptive cost estimation when available
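
For the worktree risk, one common way to keep a failing command from killing a script that runs under errexit and pipefail is to evaluate it as an if condition. The sketch below uses a hypothetical run_guarded wrapper and a stand-in failing command; in the scaling engine the wrapped command would presumably be the git worktree add call.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_guarded is a hypothetical wrapper. A command evaluated as an `if`
# condition is exempt from errexit, so its failure is reported to the
# caller instead of terminating the script.
run_guarded() {
    local label="$1"
    shift
    if "$@"; then
        return 0
    else
        local rc=$?
        echo "warn: $label failed (exit $rc)" >&2
        return "$rc"
    fi
}

# Stand-in for something like: run_guarded "worktree add" git worktree add "$path" "$branch"
if ! run_guarded "worktree add" false; then
    echo "spawn skipped; loop continues"
fi
echo "loop still running"
```

The caller decides what a failure means (for example, emitting scale.spawn_failed and starting the cooldown), while the loop itself keeps running.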

Validation Criteria

I attempted to write this to .claude/pipeline-artifacts/design.md but the file is in a protected directory. Please approve the write permission so I can save it, or I can write it to an alternative location.
