Skip to content

Pipeline Design 242

Seth Ford edited this page Mar 10, 2026 · 5 revisions

Now I have the full picture. Here's the ADR:


Design: Misleading "jq not available" warning when Claude outputs JSON object instead of array

Context

Claude CLI's --output-format json can produce either a JSON array ([{...}]) or a JSON object ({...}). Two functions in sw-loop.sh process this output:

  1. _extract_text_from_json() (lines 561-618) — extracts .result text from Claude's JSON response. This function was already fixed in prior iterations: it now handles both array and object formats (Case 2, lines 579-606), so the misleading "jq not available" warning (Case 3, line 610) is only reachable when jq is genuinely absent. Tests 17-19 and 23 validate this.

  2. accumulate_loop_tokens() (lines 511-556) — parses token counts. This function is still broken: line 516 checks head -c1 | grep -q '\[', so JSON objects (starting with {) fall through to the regex fallback (lines 548-554), silently losing structured token/cost data.

Constraints

  • Bash 3.2 compatibility (no associative arrays, no readarray)
  • jq may not be available — regex fallback must always exist
  • Atomic changes only — this is a bugfix, not a refactor

Decision

Fix accumulate_loop_tokens() to handle JSON objects the same way _extract_text_from_json() already does — check for both [ and { as the first character, then use the appropriate jq expression (.[-1].usage.* for arrays, .usage.* for objects).

This mirrors the pattern already validated in _extract_text_from_json() and keeps the two JSON-parsing functions symmetric.

Component Diagram

Claude CLI (--output-format json)
    │
    │  JSON array: [{result: "...", usage: {...}}]
    │  — OR —
    │  JSON object: {result: "...", usage: {...}}
    │
    ├──→ _extract_text_from_json()   [ALREADY FIXED]
    │      ├─ Case 2a: array + jq → jq '.[-1].result'
    │      ├─ Case 2b: object + jq → jq '.result'
    │      ├─ Case 3: JSON + no jq → raw copy + warning
    │      └─ Case 4: not JSON → raw copy
    │
    └──→ accumulate_loop_tokens()    [NEEDS FIX]
           ├─ Branch A: array + jq → jq '.[-1].usage.*'  ← exists
           ├─ Branch B: object + jq → jq '.usage.*'      ← MISSING
           └─ Fallback: regex parsing                     ← exists

Interface Contracts

// accumulate_loop_tokens(log_file: string): void
// Precondition: log_file is a path to a file (may not exist)
// Postcondition: LOOP_INPUT_TOKENS, LOOP_OUTPUT_TOKENS, LOOP_COST_MILLICENTS
//                are incremented by values parsed from the file
// Error contract: never fails — all parse errors default to 0

// _extract_text_from_json(json_file: string, log_file: string, err_file: string): 0
// Precondition: json_file path provided (file may be empty/missing)
// Postcondition: log_file contains extracted text or placeholder
// Error contract: always returns 0 — never causes pipeline failure

Data Flow

accumulate_loop_tokens(log_file)
  → read first char of file
  → if '{' or '[' AND jq available:
      → if '[': jq '.[-1].usage.input_tokens // 0'   (array path)
      → if '{': jq '.usage.input_tokens // 0'        (object path)  ← NEW
      → accumulate into LOOP_INPUT_TOKENS, LOOP_OUTPUT_TOKENS
      → parse cost_usd or estimate from model rates
  → else:
      → regex fallback (unchanged)

Error Boundaries

Component Error Source Handling
accumulate_loop_tokens jq parse failure `
accumulate_loop_tokens missing file early return 0
accumulate_loop_tokens non-numeric jq output ${var:-0} default
_extract_text_from_json jq failure `

Alternatives Considered

  1. Unified normalization layer — Pre-process all Claude output into a canonical array format before any parsing.

    • Pros: Single parsing path, DRY
    • Cons: Adds a new abstraction for a two-function problem; over-engineering for a bugfix; risk of breaking existing array handling
  2. Remove the jq path entirely, use only regex — Since regex fallback exists, just always use it.

    • Pros: Simpler, no jq dependency
    • Cons: Loses structured parsing (cost_usd, cache tokens); regex is fragile and can't extract nested fields; significant accuracy regression
  3. Wrap Claude CLI to force array output — Add | jq '[.]' or similar wrapper.

    • Pros: Guarantees array format downstream
    • Cons: Brittle coupling to Claude CLI internals; breaks if CLI changes; adds a jq dependency to the wrapper itself

Implementation Plan

  • Files to modify:

    • scripts/sw-loop.sh — Fix accumulate_loop_tokens() (lines 516-546) to handle { first-char alongside [, using .usage.* instead of .[-1].usage.* for objects
    • scripts/sw-loop-test.sh — Add test for accumulate_loop_tokens() with JSON object input (functional test, not just grep-based)
  • Files to create: None

  • Dependencies: None (jq is already optional with regex fallback)

  • Risk areas:

    • Arithmetic evaluation in bash: ensure jq output is always numeric before $(( )) — already handled by ${var:-0} pattern
    • The head -c1 approach assumes no leading whitespace — same assumption as _extract_text_from_json, acceptable since Claude CLI output is deterministic

Validation Criteria

  • accumulate_loop_tokens() correctly parses input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, and total_cost_usd from a JSON object (not just array)
  • New functional test: given {"type":"result","result":"...","usage":{"input_tokens":100,"output_tokens":50},"total_cost_usd":0.003}, token accumulators reflect correct values
  • Existing Test 23 continues to pass — no "jq not available" warning when jq IS present and output is a JSON object
  • All 69 existing sw-loop-test.sh tests continue to pass
  • Regex fallback still works when jq is absent (no regression)
  • No unrelated changes included in the diff

Clone this wiki locally