Pipeline Design 154
The shipwright loop harness runs Claude Code in repeated iterations until a goal is achieved. Each iteration invokes the Claude CLI, which receives the full accumulated conversation context. As iterations accumulate, Claude's internal context window fills. When it exhausts, the session degrades or fails silently — there is no proactive mechanism to detect and prevent this.
Existing infrastructure we build on:
- accumulate_loop_tokens() in sw-loop.sh:469 already tracks cumulative LOOP_INPUT_TOKENS and LOOP_OUTPUT_TOKENS across iterations
- run_loop_with_restarts() in sw-loop.sh:2389 already handles session restarts (used by stuckness detection at line 2370)
- manage_context_window() in loop-iteration.sh:8 trims the injected prompt but not the Claude conversation context
- write_progress() in loop-progress.sh:8 and error-summary.json already capture iteration state
- emit_event() provides telemetry
The gap: No mechanism monitors cumulative token usage against the model's context window or triggers a preemptive restart before context exhaustion occurs.
Add a new module loop-context-monitor.sh that monitors cumulative token usage as a percentage of the model context window. When usage crosses a configurable threshold (default 70%), generate a compressed state summary and break out of the main loop with a context_exhaustion status, triggering the existing restart mechanism with the summary injected.
Key design choices:
- Cumulative tokens as proxy: LOOP_INPUT_TOKENS grows monotonically and correlates with conversation context growth. It underestimates true context usage (it does not account for Claude's internal reasoning), so the conservative 70% threshold compensates.
- Proactive, not reactive: We detect and act before hitting limits, rather than parsing CLI error output after failure.
- Reuse existing restart path: The run_loop_with_restarts() wrapper already handles session resets, artifact archival, and state preservation. We add context_exhaustion as a new restartable status alongside stuck_restart.
- Structured summary injection: On restart, inject context-summary.md into the goal so the fresh session has compressed context (goal, files changed, error patterns, test status) without full conversation history.
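The cumulative-token proxy and threshold described above could look roughly like the following. This is a minimal sketch, not the module itself: summing input and output tokens is an assumption based on the variables the check is said to read, the `to_int` helper is hypothetical, and event emission is left out.

```shell
#!/usr/bin/env bash
# Sketch only. Defaults mirror the values stated in this ADR.
CONTEXT_WINDOW_TOKENS="${CONTEXT_WINDOW_TOKENS:-200000}"
CONTEXT_EXHAUSTION_THRESHOLD="${CONTEXT_EXHAUSTION_THRESHOLD:-70}"

# Hypothetical helper: print the value if it is a non-negative integer, else 0.
# The 10# prefix forces base 10 so "08" is not parsed as invalid octal.
to_int() {
    case "${1:-}" in
        ''|*[!0-9]*) echo 0 ;;
        *) echo $(( 10#$1 )) ;;
    esac
}

# Print cumulative usage as an integer percentage of the window (may exceed 100).
get_context_usage_pct() {
    local in_tok out_tok window
    in_tok=$(to_int "${LOOP_INPUT_TOKENS:-0}")
    out_tok=$(to_int "${LOOP_OUTPUT_TOKENS:-0}")
    window=$(to_int "${CONTEXT_WINDOW_TOKENS:-0}")
    if [ "$window" -le 0 ]; then
        echo 0            # division-by-zero guard
        return 0
    fi
    echo $(( (in_tok + out_tok) * 100 / window ))
    return 0
}

# Return 0 when usage has reached the threshold, 1 (safe) otherwise.
# Any unusable configuration value is treated as "safe".
check_context_exhaustion() {
    local window threshold pct
    window=$(to_int "${CONTEXT_WINDOW_TOKENS:-0}")
    threshold=$(to_int "${CONTEXT_EXHAUSTION_THRESHOLD:-0}")
    [ "$window" -gt 0 ] || return 1
    [ "$threshold" -gt 0 ] || return 1
    pct=$(get_context_usage_pct)
    [ "$pct" -ge "$threshold" ]
}
```

Integer division rounds down, so a loop sitting just under the threshold reports the lower percentage and stays on the safe side of the comparison.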
- Character-based prompt size tracking only: Simpler, but only measures the injected prompt. The real exhaustion happens in Claude's accumulated conversation context across turns, which this approach cannot observe. Rejected as insufficient.
- Claude CLI stderr parsing for context limit warnings: Most accurate, but reactive (fires after degradation starts), fragile (depends on CLI output format that can change without notice), and doesn't provide time to summarize state cleanly. Rejected in favor of proactive prevention.
┌─────────────────────────────────────────────────────────────┐
│ sw-loop.sh (orchestrator) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ loop- │ │ loop- │ │ loop-context- │ │
│ │ iteration.sh │──│ convergence.sh│ │ monitor.sh [NEW] │ │
│ │ │ │ │ │ │ │
│ │ Runs Claude │ │ Detects │ │ Tracks token % │ │
│ │ CLI, calls │ │ stuckness │ │ Generates state │ │
│ │ accumulate_ │ │ │ │ summary │ │
│ │ loop_tokens() │ │ │ │ │ │
│ └──────┬───────┘ └──────────────┘ └────────┬─────────┘ │
│ │ │ │
│ │ tokens accumulate │ checks │
│ ▼ │ threshold │
│ LOOP_INPUT_TOKENS ◄───────────────────────────┘ │
│ LOOP_OUTPUT_TOKENS │
│ │ │
│ ┌──────┴───────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ loop- │ │ loop- │ │ events.jsonl │ │
│ │ restart.sh │◄─│ progress.sh │ │ (telemetry) │ │
│ │ │ │ │ │ │ │
│ │ Manages state │ │ Writes │ │ context_usage │ │
│ │ reset, archive│ │ progress.md │ │ context_exhaust │ │
│ │ summary inject│ │ context- │ │ _warning/restart │ │
│ │ │ │ summary.md │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Data flow:
CLI JSON output → accumulate_loop_tokens() → LOOP_INPUT/OUTPUT_TOKENS
→ check_context_exhaustion() → threshold crossed?
YES → summarize_loop_state() → context-summary.md
→ STATUS="context_exhaustion" → break
→ run_loop_with_restarts() picks up, injects summary, restarts
NO → emit context_usage event → continue loop
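The data flow above can be exercised end to end with stand-ins. In this sketch every function body is a hypothetical stub (fixed token counts per iteration, an `emit_event` that just prints, an empty `summarize_loop_state`), and `run_demo_loop` is an illustrative name; only the control flow is meant to match the diagram.

```shell
#!/usr/bin/env bash
# Sketch with stubbed collaborators; the real versions live in the harness.
CONTEXT_WINDOW_TOKENS=200000
CONTEXT_EXHAUSTION_THRESHOLD=70
LOOP_INPUT_TOKENS=0
LOOP_OUTPUT_TOKENS=0
STATUS="running"
ITERATION=0
MAX_ITERATIONS=20

# Stub: pretend each iteration reports 30000 input and 5000 output tokens.
accumulate_loop_tokens() {
    LOOP_INPUT_TOKENS=$(( LOOP_INPUT_TOKENS + 30000 ))
    LOOP_OUTPUT_TOKENS=$(( LOOP_OUTPUT_TOKENS + 5000 ))
}

# Stub: the real emit_event signature is not shown in this ADR.
emit_event() { echo "event: $*"; }

# Stub: the real function writes $LOG_DIR/context-summary.md.
summarize_loop_state() { :; }

check_context_exhaustion() {
    local used=$(( LOOP_INPUT_TOKENS + LOOP_OUTPUT_TOKENS ))
    [ "$CONTEXT_WINDOW_TOKENS" -gt 0 ] || return 1
    [ $(( used * 100 / CONTEXT_WINDOW_TOKENS )) -ge "$CONTEXT_EXHAUSTION_THRESHOLD" ]
}

# Illustrative main loop: accumulate, check, then either break or continue.
run_demo_loop() {
    while [ "$ITERATION" -lt "$MAX_ITERATIONS" ]; do
        ITERATION=$(( ITERATION + 1 ))
        accumulate_loop_tokens
        if check_context_exhaustion; then
            emit_event "loop.context_exhaustion_warning" "iteration=$ITERATION"
            summarize_loop_state
            STATUS="context_exhaustion"
            break
        fi
        emit_event "loop.context_usage" "iteration=$ITERATION"
    done
}
```

With these stub numbers the loop accumulates 35000 tokens per iteration and crosses 70% of 200000 on the fourth iteration, at which point it sets the status and breaks for the restart wrapper to pick up.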
// --- loop-context-monitor.sh public API ---
// Constants (configurable via environment)
CONTEXT_WINDOW_TOKENS: number // default: 200000 (Opus/Sonnet context window)
CONTEXT_EXHAUSTION_THRESHOLD: number // default: 70 (percent)
/**
* Check if cumulative token usage exceeds the exhaustion threshold.
* @reads LOOP_INPUT_TOKENS, LOOP_OUTPUT_TOKENS, CONTEXT_WINDOW_TOKENS, CONTEXT_EXHAUSTION_THRESHOLD
* @sideeffect Emits loop.context_exhaustion_warning event when threshold crossed
* @returns 0 if threshold crossed (action needed), 1 if safe
* @errors Never fails — returns 1 (safe) on any arithmetic error
*/
check_context_exhaustion(): ExitCode // 0 = exhausted, 1 = safe
/**
* Generate compressed state summary for restart injection.
* @reads ORIGINAL_GOAL, ITERATION, MAX_ITERATIONS, TEST_PASSED, STATUS,
* LOOP_START_COMMIT, LOG_DIR, LOG_ENTRIES
* @writes $LOG_DIR/context-summary.md
* @returns 0 always (best-effort; missing data produces partial summary)
* @output_format Markdown with sections: Goal, Status, Files Modified,
* Error Patterns, Recent Log Entries
* @size_constraint Max 2000 characters (hard truncate with notice)
*/
summarize_loop_state(): ExitCode // always 0
/**
* Return current context usage as integer percentage.
* @reads LOOP_INPUT_TOKENS, LOOP_OUTPUT_TOKENS, CONTEXT_WINDOW_TOKENS
* @returns 0 always
* @stdout Integer 0-100+ (can exceed 100 if already over window)
* @errors Outputs "0" if CONTEXT_WINDOW_TOKENS <= 0 (division-by-zero guard)
*/
get_context_usage_pct(): ExitCode // always 0; prints percentage to stdout
1. Each iteration:
run_claude_iteration()
→ Claude CLI produces JSON with .usage.input_tokens / .output_tokens
→ accumulate_loop_tokens() adds to LOOP_INPUT_TOKENS / LOOP_OUTPUT_TOKENS
→ [NEW] emit loop.context_usage event with cumulative percentage
→ [NEW] check_context_exhaustion()
→ If < threshold: continue to next iteration
→ If >= threshold:
a. Emit loop.context_exhaustion_warning event
b. summarize_loop_state() → writes $LOG_DIR/context-summary.md
c. Set STATUS="context_exhaustion"
d. write_state() + write_progress()
e. break out of main loop
2. Restart path (run_loop_with_restarts):
→ Detects STATUS is not "complete" and restarts are available
→ [NEW] Copies context-summary.md to restart archive
→ Resets ITERATION, tokens, state variables (existing behavior)
→ [NEW] If context-summary.md exists, prepends to GOAL:
"## Previous Session Context (Summarized)\n<summary content>"
→ Emits loop.context_exhaustion_restart event
→ Re-enters run_single_agent_loop with fresh context
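The summary injection step in the restart path might be as small as the helper below. This is a sketch under assumptions: `inject_context_summary` is a hypothetical name, and it assumes the summary sits at `$LOG_DIR/context-summary.md` and that the goal is held in a `GOAL` shell variable, as the steps above suggest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend the saved summary to GOAL when the file exists.
# Does nothing (and still returns 0) when the file is missing or unreadable.
inject_context_summary() {
    local summary_file="${LOG_DIR:-.}/context-summary.md"
    local summary
    [ -f "$summary_file" ] && [ -r "$summary_file" ] || return 0
    summary=$(cat "$summary_file" 2>/dev/null) || return 0
    GOAL="## Previous Session Context (Summarized)
${summary}

${GOAL:-}"
    return 0
}
```

Calling it just before re-entering the loop keeps the injection conditional, so a restart triggered for another reason (for example stuckness) with no summary on disk leaves the goal untouched.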
| Component | Error | Handling |
|---|---|---|
| `check_context_exhaustion()` | `CONTEXT_WINDOW_TOKENS=0` | Guard: `[[ "$window" -gt 0 ]]`; return 1 (safe) |
| `check_context_exhaustion()` | Non-numeric token values | `$(( ... ))` with `${var:-0}` defaults; worst case returns 1 (safe) |
| `summarize_loop_state()` | Missing git, no commits, no error-summary.json | Each section guarded individually; missing data produces a partial summary |
| `summarize_loop_state()` | Summary exceeds 2000 chars | Hard truncate with `${summary:0:2000}` + notice |
| `get_context_usage_pct()` | Division by zero, missing tokens | Returns "0" on any error |
| Restart injection | context-summary.md missing | Conditional: only inject if file exists |
| Token parsing | jq unavailable | Existing fallback in `accumulate_loop_tokens()` uses regex; tokens still accumulate |
Error propagation principle: All context monitor functions fail safe — they never cause the loop to crash. False negatives (missing a threshold crossing) are acceptable; false positives (unnecessary restart) are bounded by MAX_RESTARTS.
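A best-effort summarizer that follows the fail-safe principle could be sketched as below. Several things here are assumptions: the `truncate_summary` helper and the notice text are hypothetical, the sketch reserves room for the notice so the result stays within the 2000-character cap, and only three of the sections are built (the error-pattern and log sections are omitted for brevity).

```shell
#!/usr/bin/env bash
SUMMARY_MAX_CHARS=2000   # cap stated in this ADR

# Hypothetical helper: cap a string at max characters, notice included.
truncate_summary() {
    local text="$1"
    local max="${2:-$SUMMARY_MAX_CHARS}"
    local notice="
[summary truncated]"
    local keep
    if [ "${#text}" -le "$max" ]; then
        printf '%s' "$text"
        return 0
    fi
    keep=$(( max - ${#notice} ))
    if [ "$keep" -lt 0 ]; then keep=0; fi
    printf '%s%s' "${text:0:$keep}" "$notice"
}

# Best-effort: every data source is guarded and the function always returns 0.
summarize_loop_state() {
    local out="${LOG_DIR:-.}/context-summary.md"
    local files summary
    # Guarded: git may be missing, or the directory may not be a repository.
    files=$(git diff --name-only "${LOOP_START_COMMIT:-HEAD}" 2>/dev/null) || files=""
    summary="## Goal
${ORIGINAL_GOAL:-unknown}

## Status
Iteration ${ITERATION:-0} of ${MAX_ITERATIONS:-0}; tests passed: ${TEST_PASSED:-unknown}

## Files Modified
${files:-none recorded}"
    # Guarded: an unwritable LOG_DIR must not crash the loop.
    { truncate_summary "$summary" > "$out"; } 2>/dev/null || true
    return 0
}
```

Substring expansion (`${text:0:$keep}`) is available in Bash 3.2, so the truncation needs no external tools.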
- scripts/lib/loop-context-monitor.sh: new module (~80 lines) containing the constants, check_context_exhaustion(), summarize_loop_state(), and get_context_usage_pct()
- scripts/sw-loop.sh: source the new module (line ~43), add context_exhaustion to the restartable statuses in run_loop_with_restarts() (line ~2411), and inject the summary on restart
- scripts/lib/loop-iteration.sh: after the accumulate_loop_tokens call (line 539), add the context check and loop.context_usage event emission
- scripts/sw-loop-test.sh: add test cases for threshold boundaries, summarization output, and restart triggering
- None new. Uses existing jq (optional), git, awk, and emit_event.
- Token count accuracy: LOOP_INPUT_TOKENS is a proxy for conversation context size. Each Claude CLI invocation starts a fresh conversation, so cumulative tokens track total work done, not actual context window fill. The 70% threshold is deliberately conservative to compensate.
- Summary lossiness: The 2000-char cap may drop relevant context on complex multi-file changes. Mitigated by preserving error-summary.json and progress.md through restarts (existing behavior).
- Restart loop: If the summarized context itself pushes tokens high on the first iteration of a restart, it could trigger another immediate exhaustion. Mitigated by the MAX_RESTARTS cap (default 3, hard cap 5 at line 2406).
- check_context_exhaustion() returns 0 (exhausted) when cumulative tokens >= 70% of CONTEXT_WINDOW_TOKENS
- check_context_exhaustion() returns 1 (safe) when cumulative tokens < 70%
- check_context_exhaustion() returns 1 (safe) when CONTEXT_WINDOW_TOKENS=0 (no division-by-zero crash)
- check_context_exhaustion() returns 1 (safe) when token counters are 0 (no false positive on fresh loop)
- summarize_loop_state() produces markdown with goal, iteration count, modified files list, error patterns, and test status
- summarize_loop_state() output does not exceed 2000 characters
- Loop breaks with STATUS="context_exhaustion" when threshold is crossed mid-loop
- run_loop_with_restarts() treats context_exhaustion as restartable (session continues)
- Summary is injected into GOAL on restart so the fresh session has context
- loop.context_exhaustion_warning event emitted when threshold crossed
- loop.context_exhaustion_restart event emitted on restart
- loop.context_usage event emitted per iteration with usage_pct
- Existing sw-loop-test.sh tests continue to pass
- All code is Bash 3.2 compatible (no associative arrays, no ${var,,})
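The Bash 3.2 constraint in the last criterion rules out the two most obvious ways to write a status lookup. A sketch of portable equivalents follows; both helper names (`is_restartable_status`, `to_lower`) are hypothetical and do not come from the existing scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether a loop status permits a restart.
# A case statement stands in for an associative array, which needs Bash 4+.
is_restartable_status() {
    case "$1" in
        stuck_restart|context_exhaustion) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: portable lowercase in place of ${var,,} (Bash 4+ only).
to_lower() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}
```

For example, `is_restartable_status "$(to_lower "$STATUS")"` would accept a status regardless of case while still running on the stock macOS shell.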
- Token accumulation: already runs per-iteration with negligible overhead (<1ms arithmetic)
- Prompt composition: manage_context_window() already does awk-based trimming per iteration
- Context check overhead: < 1ms per iteration (integer arithmetic only)
- Summarization: < 100ms when triggered (one git diff --name-only, one file read, one file write)
- No impact on iteration latency in the common case (threshold not crossed)
Not applicable — this is shell arithmetic and one git command. The bottleneck is Claude CLI execution time (30-120s per iteration), making sub-millisecond monitoring overhead irrelevant.
- Verify check_context_exhaustion() adds no measurable time by running 100 iterations of the function in a test and confirming < 100ms total
- Verify summarize_loop_state() completes in < 1s on a repo with 50+ changed files
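The 100-iteration timing check can be done with the shell's own `time` keyword, which avoids needing sub-second `date` output. The sketch below is illustrative: `time_calls_ms` is a hypothetical helper, and the `check_context_exhaustion` body is a stand-in with the same integer-arithmetic shape as the planned function.

```shell
#!/usr/bin/env bash
# Stand-in for the real check (assumed to be integer arithmetic only).
check_context_exhaustion() {
    local used=$(( ${LOOP_INPUT_TOKENS:-0} + ${LOOP_OUTPUT_TOKENS:-0} ))
    local window="${CONTEXT_WINDOW_TOKENS:-200000}"
    [ "$window" -gt 0 ] || return 1
    [ $(( used * 100 / window )) -ge "${CONTEXT_EXHAUSTION_THRESHOLD:-70}" ]
}

# Hypothetical helper: run a command N times, print elapsed wall time in ms.
time_calls_ms() {
    local n="$1"
    shift
    local i=0 secs
    local LC_ALL=C            # keep "." as the decimal separator
    local TIMEFORMAT='%3R'    # real time in seconds, three decimals
    # The loop's own output is discarded; only the timing report is captured.
    secs=$( { time while [ "$i" -lt "$n" ]; do
                  "$@" || true
                  i=$(( i + 1 ))
              done >/dev/null 2>&1; } 2>&1 )
    awk -v s="$secs" 'BEGIN { printf "%d\n", int(s * 1000 + 0.5) }'
}
```

A test could then assert `[ "$(time_calls_ms 100 check_context_exhaustion)" -lt 100 ]`; the same helper could wrap summarize_loop_state() for the 1s check.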