Pipeline Design 154


Design: Build loop context exhaustion prevention with proactive summarization

Context

The shipwright loop harness runs Claude Code in repeated iterations until a goal is achieved. Each iteration invokes the Claude CLI, which receives the full accumulated conversation context. As iterations accumulate, Claude's internal context window fills. Once the window is exhausted, the session degrades or fails silently, and there is currently no proactive mechanism to detect and prevent this.

Existing infrastructure we build on:

accumulate_loop_tokens() in sw-loop.sh:469 already tracks cumulative LOOP_INPUT_TOKENS and LOOP_OUTPUT_TOKENS across iterations
run_loop_with_restarts() in sw-loop.sh:2389 already handles session restarts (used by stuckness detection at line 2370)
manage_context_window() in loop-iteration.sh:8 trims the injected prompt but not the Claude conversation context
write_progress() in loop-progress.sh:8 and error-summary.json already capture iteration state
emit_event() provides telemetry
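The following sketch shows how per-iteration token counts could reach LOOP_INPUT_TOKENS and LOOP_OUTPUT_TOKENS. It prefers jq and falls back to a regex when jq is unavailable, which is the behavior listed under Error Boundaries. It is not the code at sw-loop.sh:469. The names extract_usage_field and sketch_accumulate_tokens are hypothetical. The `.usage.input_tokens` / `.usage.output_tokens` field paths come from the Data Flow section.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-iteration token accumulation; NOT the real
# accumulate_loop_tokens() in sw-loop.sh. Assumes the Claude CLI JSON has
# .usage.input_tokens and .usage.output_tokens.

LOOP_INPUT_TOKENS=${LOOP_INPUT_TOKENS:-0}
LOOP_OUTPUT_TOKENS=${LOOP_OUTPUT_TOKENS:-0}

# Print one numeric usage field; prefer jq, fall back to a regex.
extract_usage_field() {
    local json="$1" field="$2" value=""
    if command -v jq >/dev/null 2>&1; then
        value=$(printf '%s' "$json" | jq -r ".usage.${field} // 0" 2>/dev/null)
    else
        # Regex fallback: first "field": <digits> occurrence in the output.
        value=$(printf '%s' "$json" \
            | grep -oE "\"${field}\"[[:space:]]*:[[:space:]]*[0-9]+" \
            | head -n 1 | grep -oE '[0-9]+$')
    fi
    # Anything non-numeric (empty, "null", parse errors) counts as 0.
    [[ "$value" =~ ^[0-9]+$ ]] || value=0
    printf '%s\n' "$value"
}

# Add this iteration's usage to the cumulative loop totals.
sketch_accumulate_tokens() {
    local json="$1" in out
    in=$(extract_usage_field "$json" input_tokens)
    out=$(extract_usage_field "$json" output_tokens)
    LOOP_INPUT_TOKENS=$(( LOOP_INPUT_TOKENS + in ))
    LOOP_OUTPUT_TOKENS=$(( LOOP_OUTPUT_TOKENS + out ))
}
```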

The gap: No mechanism monitors cumulative token usage against the model's context window or triggers a preemptive restart before context exhaustion occurs.

Decision

Add a new module loop-context-monitor.sh that monitors cumulative token usage as a percentage of the model context window. When usage crosses a configurable threshold (default 70%), generate a compressed state summary and break out of the main loop with a context_exhaustion status, triggering the existing restart mechanism with the summary injected.

Key design choices:

Cumulative tokens as proxy — LOOP_INPUT_TOKENS grows monotonically and correlates with conversation context growth. It underestimates true context usage (doesn't account for Claude's internal reasoning), so the conservative 70% threshold compensates.
Proactive, not reactive — We detect and act before hitting limits, rather than parsing CLI error output after failure.
Reuse existing restart path — The run_loop_with_restarts() wrapper already handles session resets, artifact archival, and state preservation. We add context_exhaustion as a new restartable status alongside stuck_restart.
Structured summary injection — On restart, inject context-summary.md into the goal so the fresh session has compressed context (goal, files changed, error patterns, test status) without full conversation history.

Alternatives Considered

Character-based prompt size tracking only — Simpler, but only measures the injected prompt. The real exhaustion happens in Claude's accumulated conversation context across turns, which this approach cannot observe. Rejected as insufficient.
Claude CLI stderr parsing for context limit warnings — Most accurate, but reactive (fires after degradation starts), fragile (depends on CLI output format that can change without notice), and doesn't provide time to summarize state cleanly. Rejected in favor of proactive prevention.

Component Diagram

┌─────────────────────────────────────────────────────────────┐
│                     sw-loop.sh (orchestrator)               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ loop-         │  │ loop-         │  │ loop-context-    │  │
│  │ iteration.sh  │──│ convergence.sh│  │ monitor.sh [NEW] │  │
│  │               │  │               │  │                  │  │
│  │ Runs Claude   │  │ Detects       │  │ Tracks token %   │  │
│  │ CLI, calls    │  │ stuckness     │  │ Generates state  │  │
│  │ accumulate_   │  │               │  │ summary          │  │
│  │ loop_tokens() │  │               │  │                  │  │
│  └──────┬───────┘  └──────────────┘  └────────┬─────────┘  │
│         │                                      │            │
│         │  tokens accumulate                   │ checks     │
│         ▼                                      │ threshold  │
│  LOOP_INPUT_TOKENS ◄───────────────────────────┘            │
│  LOOP_OUTPUT_TOKENS                                         │
│         │                                                   │
│  ┌──────┴───────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ loop-         │  │ loop-         │  │ events.jsonl     │  │
│  │ restart.sh    │◄─│ progress.sh   │  │ (telemetry)      │  │
│  │               │  │               │  │                  │  │
│  │ Manages state │  │ Writes        │  │ context_usage    │  │
│  │ reset, archive│  │ progress.md   │  │ context_exhaust  │  │
│  │ summary inject│  │ context-      │  │ _warning/restart │  │
│  │               │  │ summary.md    │  │                  │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Data flow:
  CLI JSON output → accumulate_loop_tokens() → LOOP_INPUT/OUTPUT_TOKENS
       → check_context_exhaustion() → threshold crossed?
           YES → summarize_loop_state() → context-summary.md
                → STATUS="context_exhaustion" → break
                → run_loop_with_restarts() picks up, injects summary, restarts
           NO  → emit context_usage event → continue loop
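The data flow above maps onto a small hook called right after accumulate_loop_tokens(). The hook returns 0 when the caller should break out of the main loop. This is a sketch only:

- handle_context_check is a hypothetical name.
- The emit_event argument format is assumed.
- The stubs exist only so the snippet runs on its own. In sw-loop.sh the real helpers and the loop-context-monitor.sh functions would be used instead.

```shell
#!/usr/bin/env bash
# Sketch of the per-iteration context hook. Stubs are defined only when the
# real functions are absent, so this runs standalone.

declare -F emit_event >/dev/null || emit_event() { :; }
declare -F write_state >/dev/null || write_state() { :; }
declare -F write_progress >/dev/null || write_progress() { :; }
declare -F summarize_loop_state >/dev/null || summarize_loop_state() { return 0; }
declare -F get_context_usage_pct >/dev/null || get_context_usage_pct() {
    local window=${CONTEXT_WINDOW_TOKENS:-200000}
    (( window > 0 )) || { echo 0; return 0; }
    echo $(( (${LOOP_INPUT_TOKENS:-0} + ${LOOP_OUTPUT_TOKENS:-0}) * 100 / window ))
}
declare -F check_context_exhaustion >/dev/null || check_context_exhaustion() {
    (( $(get_context_usage_pct) >= ${CONTEXT_EXHAUSTION_THRESHOLD:-70} ))
}

# Hypothetical hook: returns 0 when the main loop should break.
handle_context_check() {
    emit_event "loop.context_usage" "pct=$(get_context_usage_pct)"
    if check_context_exhaustion; then
        summarize_loop_state          # writes $LOG_DIR/context-summary.md
        STATUS="context_exhaustion"   # picked up by run_loop_with_restarts()
        write_state
        write_progress
        return 0
    fi
    return 1
}

# Intended call site inside the main loop (illustrative):
#   accumulate_loop_tokens ...
#   handle_context_check && break
```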

Interface Contracts

// --- loop-context-monitor.sh public API ---

// Constants (configurable via environment)
CONTEXT_WINDOW_TOKENS: number    // default: 200000 (Opus/Sonnet context window)
CONTEXT_EXHAUSTION_THRESHOLD: number  // default: 70 (percent)

/**
 * Check if cumulative token usage exceeds the exhaustion threshold.
 * @reads LOOP_INPUT_TOKENS, LOOP_OUTPUT_TOKENS, CONTEXT_WINDOW_TOKENS, CONTEXT_EXHAUSTION_THRESHOLD
 * @sideeffect Emits loop.context_exhaustion_warning event when threshold crossed
 * @returns 0 if threshold crossed (action needed), 1 if safe
 * @errors Never fails — returns 1 (safe) on any arithmetic error
 */
check_context_exhaustion(): ExitCode  // 0 = exhausted, 1 = safe

/**
 * Generate compressed state summary for restart injection.
 * @reads ORIGINAL_GOAL, ITERATION, MAX_ITERATIONS, TEST_PASSED, STATUS,
 *        LOOP_START_COMMIT, LOG_DIR, LOG_ENTRIES
 * @writes $LOG_DIR/context-summary.md
 * @returns 0 always (best-effort; missing data produces partial summary)
 * @output_format Markdown with sections: Goal, Status, Files Modified,
 *                Error Patterns, Recent Log Entries
 * @size_constraint Max 2000 characters (hard truncate with notice)
 */
summarize_loop_state(): ExitCode  // always 0

/**
 * Return current context usage as integer percentage.
 * @reads LOOP_INPUT_TOKENS, LOOP_OUTPUT_TOKENS, CONTEXT_WINDOW_TOKENS
 * @returns 0 always
 * @stdout Integer 0-100+ (can exceed 100 if already over window)
 * @errors Outputs "0" if CONTEXT_WINDOW_TOKENS <= 0 (division-by-zero guard)
 */
get_context_usage_pct(): ExitCode  // always 0; prints percentage to stdout
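A minimal sketch of the two arithmetic functions, following the contract above, is shown below. It fails safe:

- Non-numeric token values, or a leading zero that bash would otherwise read as octal, are normalized to integers.
- A non-positive window prints 0.
- A missing or zero threshold reports "safe".

Summing input and output tokens is an assumption; the contract only says both are read. The `_ctx_int` helper is hypothetical, and emit_event is stubbed when absent.

```shell
#!/usr/bin/env bash
# Sketch of the loop-context-monitor.sh arithmetic per the interface contract.

CONTEXT_WINDOW_TOKENS=${CONTEXT_WINDOW_TOKENS:-200000}
CONTEXT_EXHAUSTION_THRESHOLD=${CONTEXT_EXHAUSTION_THRESHOLD:-70}

declare -F emit_event >/dev/null || emit_event() { :; }

# Hypothetical helper: print $1 as a base-10 integer, or 0 if not numeric.
_ctx_int() {
    if [[ "$1" =~ ^[0-9]+$ ]]; then
        printf '%s\n' "$(( 10#$1 ))"
    else
        printf '0\n'
    fi
}

# Print cumulative usage as an integer percentage (may exceed 100).
get_context_usage_pct() {
    local window input output
    window=$(_ctx_int "$CONTEXT_WINDOW_TOKENS")
    input=$(_ctx_int "${LOOP_INPUT_TOKENS:-0}")
    output=$(_ctx_int "${LOOP_OUTPUT_TOKENS:-0}")
    if (( window <= 0 )); then
        echo 0                      # division-by-zero guard
        return 0
    fi
    echo $(( (input + output) * 100 / window ))
    return 0
}

# Return 0 if the threshold is crossed (action needed), 1 if safe.
check_context_exhaustion() {
    local pct threshold
    pct=$(get_context_usage_pct)
    threshold=$(_ctx_int "$CONTEXT_EXHAUSTION_THRESHOLD")
    if (( threshold > 0 && pct >= threshold )); then
        emit_event "loop.context_exhaustion_warning" "pct=$pct"
        return 0
    fi
    return 1                        # includes misconfigured threshold: fail safe
}
```

Because the percentage uses integer division, it truncates. With the default 200,000-token window, 139,999 cumulative tokens reports 69% and does not trigger, while exactly 140,000 reports 70% and does.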

Data Flow

1. Each iteration:
   run_claude_iteration()
     → Claude CLI produces JSON with .usage.input_tokens / .output_tokens
     → accumulate_loop_tokens() adds to LOOP_INPUT_TOKENS / LOOP_OUTPUT_TOKENS
     → [NEW] emit loop.context_usage event with cumulative percentage
     → [NEW] check_context_exhaustion()
         → If < threshold: continue to next iteration
         → If >= threshold:
              a. Emit loop.context_exhaustion_warning event
              b. summarize_loop_state() → writes $LOG_DIR/context-summary.md
              c. Set STATUS="context_exhaustion"
              d. write_state() + write_progress()
              e. break out of main loop

2. Restart path (run_loop_with_restarts):
   → Detects that STATUS is not "complete" and that restarts remain available
   → [NEW] Copies context-summary.md to restart archive
   → Resets ITERATION, tokens, state variables (existing behavior)
   → [NEW] If context-summary.md exists, prepends to GOAL:
       "## Previous Session Context (Summarized)\n<summary content>"
   → Emits loop.context_exhaustion_restart event
   → Re-enters run_single_agent_loop with fresh context
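The two [NEW] restart-path steps could be combined into one function called before re-entering run_single_agent_loop. This is a sketch under stated assumptions:

- inject_context_summary and RESTART_ARCHIVE_DIR are hypothetical names.
- emit_event is stubbed when absent.
- Rebuilding GOAL from ORIGINAL_GOAL, rather than prepending to the current GOAL, is a design assumption. It keeps summaries from stacking across repeated restarts.

```shell
#!/usr/bin/env bash
# Sketch of the restart-path summary injection (hypothetical names).

declare -F emit_event >/dev/null || emit_event() { :; }

inject_context_summary() {
    local summary_file="${LOG_DIR:-.}/context-summary.md" summary

    # Conditional injection: nothing to do if no summary was written.
    [[ -f "$summary_file" ]] || return 0

    # Keep a copy alongside the other archived restart artifacts.
    if [[ -n "${RESTART_ARCHIVE_DIR:-}" ]]; then
        mkdir -p "$RESTART_ARCHIVE_DIR" 2>/dev/null &&
            cp "$summary_file" "$RESTART_ARCHIVE_DIR/" 2>/dev/null
    fi

    summary=$(<"$summary_file")
    # Rebuild from ORIGINAL_GOAL so repeated restarts do not stack summaries.
    GOAL="## Previous Session Context (Summarized)"$'\n'"${summary}"$'\n\n'"${ORIGINAL_GOAL:-$GOAL}"

    emit_event "loop.context_exhaustion_restart"
    return 0
}
```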

Error Boundaries

Component	Error	Handling
`check_context_exhaustion()`	`CONTEXT_WINDOW_TOKENS=0`	Guard: `[[ "$window" -gt 0 ]]`; return 1 (safe)
`check_context_exhaustion()`	Non-numeric token values	`$(( ... ))` with `${var:-0}` defaults; worst case returns 1 (safe)
`summarize_loop_state()`	Missing git, no commits, no error-summary.json	Each section guarded independently; a failed section is skipped and a partial summary is still written
`summarize_loop_state()`	Summary exceeds 2000 chars	Hard truncate with `${summary:0:2000}` + notice
`get_context_usage_pct()`	Division by zero, missing tokens	Returns "0" on any error
Restart injection	context-summary.md missing	Conditional: only inject if file exists
Token parsing	jq unavailable	Existing fallback in `accumulate_loop_tokens()` uses regex; tokens still accumulate

Error propagation principle: All context monitor functions fail safe — they never cause the loop to crash. False negatives (missing a threshold crossing) are acceptable; false positives (unnecessary restart) are bounded by MAX_RESTARTS.
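A best-effort sketch of summarize_loop_state() that follows the contract and the error boundaries above is shown below. Each section degrades to a placeholder instead of failing, and the function always returns 0. The following details are assumptions, not taken from the codebase:

- error-summary.json sits in $LOG_DIR, and its first 10 lines are copied verbatim.
- LOG_ENTRIES is a newline-separated string.
- The 30-file and 10-entry limits and the notice wording are illustrative.

Truncation here reserves room for the notice, so the file never exceeds 2000 characters in total.

```shell
#!/usr/bin/env bash
# Best-effort sketch of summarize_loop_state(); always returns 0.

SUMMARY_MAX_CHARS=2000

summarize_loop_state() {
    local dir="${LOG_DIR:-.}" summary files errors=""
    local notice=$'\n[summary truncated]\n'

    summary="## Goal"$'\n'"${ORIGINAL_GOAL:-unknown}"$'\n\n'
    summary+="## Status"$'\n'"Iteration ${ITERATION:-?}/${MAX_ITERATIONS:-?}, status=${STATUS:-unknown}, tests_passed=${TEST_PASSED:-unknown}"$'\n\n'

    # Files Modified: missing git or an unknown commit yields a placeholder.
    files=$(git diff --name-only "${LOOP_START_COMMIT:-HEAD}" 2>/dev/null | head -n 30)
    summary+="## Files Modified"$'\n'"${files:-(unavailable)}"$'\n\n'

    # Error Patterns: only if the existing error summary file is present.
    if [[ -f "$dir/error-summary.json" ]]; then
        errors=$(head -n 10 "$dir/error-summary.json" 2>/dev/null)
    fi
    summary+="## Error Patterns"$'\n'"${errors:-(none recorded)}"$'\n\n'

    summary+="## Recent Log Entries"$'\n'"$(printf '%s\n' "${LOG_ENTRIES:-}" | tail -n 10)"$'\n'

    # Hard cap: cut so that summary plus notice fits in SUMMARY_MAX_CHARS.
    if (( ${#summary} > SUMMARY_MAX_CHARS )); then
        summary="${summary:0:$(( SUMMARY_MAX_CHARS - ${#notice} ))}${notice}"
    fi

    { mkdir -p "$dir" && printf '%s' "$summary" > "$dir/context-summary.md"; } 2>/dev/null
    return 0
}
```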

Implementation Plan

Files to create

scripts/lib/loop-context-monitor.sh — New module (~80 lines): constants, check_context_exhaustion(), summarize_loop_state(), get_context_usage_pct()

Files to modify

scripts/sw-loop.sh — Source new module (line ~43), add context_exhaustion to restartable statuses in run_loop_with_restarts() (line ~2411), inject summary on restart
scripts/lib/loop-iteration.sh — After accumulate_loop_tokens call (line 539), add context check and loop.context_usage event emission
scripts/sw-loop-test.sh — Add test cases for threshold boundaries, summarization output, restart triggering

Dependencies

None new. Uses existing jq (optional), git, awk, emit_event.

Risk areas

Token count accuracy: LOOP_INPUT_TOKENS is a proxy for conversation context size. Each Claude CLI invocation starts a fresh conversation, so cumulative tokens track total work done, not actual context window fill. The 70% threshold is deliberately conservative to compensate.
Summary lossy-ness: The 2000-char cap may drop relevant context on complex multi-file changes. Mitigated by preserving error-summary.json and progress.md through restarts (existing behavior).
Restart loop: If the summarized context itself pushes tokens high on the first iteration of a restart, it could trigger another immediate exhaustion. Mitigated by MAX_RESTARTS cap (default 3, hard cap 5 at line 2406).

Validation Criteria

Performance

Baseline Metrics

Token accumulation: already runs per-iteration with negligible overhead (<1ms arithmetic)
Prompt composition: manage_context_window() already does awk-based trimming per iteration

Optimization Targets

Context check overhead: < 1ms per iteration (integer arithmetic only)
Summarization: < 100ms when triggered (one git diff --name-only, one file read, one file write)
No impact on iteration latency in the common case (threshold not crossed)

Profiling Strategy

Not applicable — this is shell arithmetic and one git command. The bottleneck is Claude CLI execution time (30-120s per iteration), making sub-millisecond monitoring overhead irrelevant.

Benchmark Plan

Verify check_context_exhaustion() adds no measurable time by running 100 iterations of the function in a test and confirming < 100ms total
Verify summarize_loop_state() completes in < 1s on a repo with 50+ changed files
