Skip to content

Fix CLI rendering corruption and split CLI/frontend model defaults#121

Merged
akseljoonas merged 3 commits intomainfrom
codex/fix-cli-rendering-split-default-models-20260425-223601
Apr 25, 2026
Merged

Fix CLI rendering corruption and split CLI/frontend model defaults#121
akseljoonas merged 3 commits intomainfrom
codex/fix-cli-rendering-split-default-models-20260425-223601

Conversation

@akseljoonas
Copy link
Copy Markdown
Collaborator

@akseljoonas akseljoonas commented Apr 25, 2026

Summary

  • stop the CLI research overlay from running a background redraw ticker that races with streamed markdown output
  • keep research sub-agent routing on direct Anthropic when the active model is direct Anthropic, and mute LiteLLM warning spam in the interactive UI
  • split model defaults into explicit CLI and frontend config files, and align the frontend Claude constant with the backend's Bedrock default
  • restore missing recovery helpers and patch dangling tool-result handling, doom-loop polling detection, sandbox reuse messaging, and async test support

Testing

  • python -m compileall agent backend
  • ./frontend/node_modules/.bin/tsc -p frontend/tsconfig.json --noEmit
  • UV_CACHE_DIR=/tmp/uv-cache uv run --with pytest --with pytest-asyncio python -m pytest -q

The interactive CLI was interleaving live sub-agent redraws with streamed
markdown output, which corrupted ANSI rendering and leaked raw control
sequences into the terminal. The CLI and web app also shared one default
model config even though they need different Anthropic routing defaults.

Constraint: CLI default must use direct Anthropic credentials while web sessions must default to Bedrock Anthropic
Constraint: Interactive terminal output must remain readable while sub-agent progress is live
Rejected: Single shared config file with runtime overrides | keeps ownership of defaults implicit across surfaces
Rejected: Keep background redraw ticker | concurrent terminal writers still corrupt streamed output
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep CLI and frontend default models in separate config files unless both surfaces intentionally converge again
Tested: python -m compileall agent backend
Tested: ./frontend/node_modules/.bin/tsc -p frontend/tsconfig.json --noEmit
Tested: UV_CACHE_DIR=/tmp/uv-cache uv run --with pytest python -m pytest -q tests/unit/test_cli_rendering.py
Not-tested: Full pytest suite (blocked by pre-existing tests/unit/test_llm_error_classification.py import error during collection)
@claude
Copy link
Copy Markdown

claude Bot commented Apr 25, 2026

Claude finished @akseljoonas's task in 2m 48s —— View job


PR Review

  • Gather context and read changed files
  • Review CLI rendering fix (terminal_display.py)
  • Review research tool routing changes (research_tool.py, main.py)
  • Review config split (cli_agent_config.json, frontend_agent_config.json)
  • Review frontend changes (ChatInput.tsx, model.ts)
  • Review test coverage (test_cli_rendering.py)
  • Review remaining files (session_manager.py, README.md)
  • Post final review

No blocking issues — 2 P1
Verdict: ready to merge

What I checked

  • CLI rendering fix (terminal_display.py): Removal of _ticker_task and the _tick() coroutine. Verified the display now updates only on events (start, add_call, clear). No tick means no background wake-up racing with the markdown typewriter loop. Correct.
  • Research model routing (research_tool.py:217–224): Old code matched "anthropic" in main_model, routing both direct-Anthropic and Bedrock-Anthropic to bedrock/us.anthropic.claude-sonnet-4-6. New code uses startswith to split them correctly. Direct-Anthropic → anthropic/claude-sonnet-4-6; Bedrock-Anthropic → bedrock/us.anthropic.claude-sonnet-4-6. Fix is correct for the stated goal.
  • Config split (cli_agent_config.json, frontend_agent_config.json): All call sites in agent/main.py and backend/session_manager.py updated. Grepped for lingering main_agent_config references — none found in Python or TypeScript.
  • isClaudePath broadening (model.ts:13): Changed from startsWith('anthropic/') to includes('anthropic'). Necessary because CLAUDE_MODEL_PATH now contains bedrock/us.anthropic.claude-opus-4-6-v1. No non-Claude model in MODEL_OPTIONS contains "anthropic" in its path, so the match set is correct. The predicate is frontend-only UI (quota chip visibility); actual authz is backend _is_anthropic_model at agent.py:72 which already used "anthropic" in model_id.
  • _configure_runtime_logging (main.py:58–65): Suppresses LiteLLM loggers to ERROR. Called in cli() (covers interactive path) and inside headless_main() (covers direct invocation). Double-call in headless mode is harmless since setLevel is idempotent.
  • Test coverage (test_cli_rendering.py): Covers all three routing branches of _get_research_model and the no-background-task regression. Tests are correct.
  • Backend AVAILABLE_MODELS (routes/agent.py:41–68): bedrock/us.anthropic.claude-opus-4-6-v1 already present — frontend constant aligns.

P1: configs/main_agent_config.json left behind

The old config file still exists on disk and is no longer referenced by any code path, but it has "model_name": "anthropic/claude-opus-4-6" — the pre-split direct-Anthropic value. A contributor cloning the repo who reads the old file and hand-edits it would get silent no-ops. Should be removed or at minimum a comment pointing to the two new files.

Fix this →


P1: Bedrock Sonnet research model ID likely missing -v1

research_tool.py:222 returns "bedrock/us.anthropic.claude-sonnet-4-6" (no -v1 suffix) for the Bedrock Anthropic research path. Every other Bedrock Anthropic model ID in the codebase carries the suffix: bedrock/us.anthropic.claude-opus-4-6-v1 in configs/frontend_agent_config.json:3, backend/routes/agent.py:50. If AWS cross-region inference requires the versioned suffix (which the Opus ID suggests), Bedrock users will get a model-not-found error from LiteLLM when a research sub-agent is spawned.

This is pre-existing (the value existed before this PR), but the PR now routes Bedrock users to this path by making it the frontend default, so it becomes load-bearing. Worth verifying against the actual Bedrock model catalog before merging.

The earlier PR fixed the CLI rendering and model-default split, but the
full local suite exposed additional regressions in tool-result patching,
doom-loop polling detection, sandbox reuse messaging, and async test
support. This follow-up commit restores the missing helpers and updates
those production paths so the new regression tests pass for real.

Constraint: Provider message histories must keep tool_use/tool_result pairing valid across interrupted turns
Constraint: Legitimate polling with changing results must not trip doom-loop recovery
Rejected: Only fix the original collection blocker | leaves the full suite red and the PR note stale
Rejected: Silence the failing tests without restoring runtime helpers | would hide real production regressions
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep the local regression tests in sync with the production recovery paths they exercise
Tested: python -m compileall agent/context_manager/manager.py agent/core/agent_loop.py agent/core/doom_loop.py agent/tools/sandbox_tool.py backend/user_quotas.py
Tested: UV_CACHE_DIR=/tmp/uv-cache uv run --with pytest --with pytest-asyncio python -m pytest -q tests/unit/test_dangling_tool_calls.py tests/unit/test_doom_loop_polling.py tests/unit/test_sandbox_already_active_message.py tests/unit/test_user_quotas.py
Tested: UV_CACHE_DIR=/tmp/uv-cache uv run --with pytest --with pytest-asyncio python -m pytest -q
Not-tested: Remote CI environment parity
@claude
Copy link
Copy Markdown

claude Bot commented Apr 25, 2026

Claude finished @akseljoonas's task in 3m 46s —— View job


Claude finished @akseljoonas's task ——


Re-review (new commit: 0140724)

  • Check if prior P1s were addressed
  • Review new _patch_dangling_tool_calls rewrite (manager.py)
  • Review doom_loop.py result-hash fix
  • Review agent_loop.py error classification and retry schedule
  • Review malformed-args recovery injection
  • Review new test suite
  • Post updated review

1 P0
Verdict: changes requested

Prior P1s (not re-flagged per policy)

Both P1s from the first review remain unaddressed: configs/main_agent_config.json is still on disk, and research_tool.py:222 still returns bedrock/us.anthropic.claude-sonnet-4-6 without the -v1 suffix.

What I checked (new commit)

  • _patch_dangling_tool_calls rewrite (manager.py:253–283): Forward scan correctly inserts stubs inline at the right position (immediately before the next non-tool message) for each assistant message whose tool_calls lack a matching result. Idempotent — a second call finds the stubs already present and skips. The multi-turn case (test_orphan_in_earlier_turn_still_gets_patched) is now handled correctly.
  • doom_loop.py result-hash (doom_loop.py:24,59–70): ToolCallSignature.result_hash is populated from the immediately following tool message. Polling with identical args but changing results produces signatures that differ on result_hash, so detect_identical_consecutive won't fire. The "truly stuck" case (same args + same result) still fires correctly.
  • Malformed-args recovery (agent_loop.py:95–148, 269–292): _detect_repeated_malformed walks backward, skipping non-tool messages and breaking on any non-malformed tool result. The recovery prompt is injected before the next LLM call. Logic and tests look correct.
  • Context overflow catch (agent_loop.py:422–423, 249–250): Both streaming and non-streaming paths now wrap non-ContextWindowExceededError context overflow exceptions before they reach the generic retry handler. Correct.
  • Test coverage: All five new test files exercise the production paths they claim to cover.

P0: Rate-limit retry budget falls short of the stated 60 s goal; test validates a dead entry

agent_loop.py:179:

_LLM_RATE_LIMIT_RETRY_DELAYS = [15, 30, 45]  # exceed Bedrock's ~60s TPM bucket window

The retry loop is for _llm_attempt in range(_MAX_LLM_RETRIES): where _MAX_LLM_RETRIES = 3 (agent_loop.py:177). The retry condition is _llm_attempt < _MAX_LLM_RETRIES - 1 (agent_loop.py:433), i.e., _llm_attempt < 2. That means:

  • Attempt 0 → wait 15 s
  • Attempt 1 → wait 30 s
  • Attempt 2 → condition false, raise

_LLM_RATE_LIMIT_RETRY_DELAYS[2] (45 s) is never reached. The actual retry budget is 45 s, which is below the ~60 s Bedrock TPM bucket window the comment promises to exceed.

The test at test_llm_error_classification.py:737 asserts sum(_LLM_RATE_LIMIT_RETRY_DELAYS) > 60 — this passes (90 > 60) but tests the sum of the full list, not the sum of the two entries that are actually consumed.

Minimal fix — change the schedule to two entries that together exceed 60 s:

_LLM_RATE_LIMIT_RETRY_DELAYS = [30, 60]  # exceed Bedrock's ~60s TPM bucket window

30 + 60 = 90 s. No test changes needed: sum([30, 60]) = 90 > 60 still passes, and _retry_delay_for(err, 2) is None still passes since the list has length 2.

Fix this →

The review was right about two follow-up issues: the old shared config file
was still present after the CLI/frontend split, and the Bedrock rate-limit
retry schedule still had a dead third entry because the loop only ever
consumed two retry delays. This commit removes the orphaned config and makes
the rate-limit schedule line up with the actual retry budget.

Constraint: Retry budget for Bedrock token throttling must exceed the provider's ~60s bucket recovery window in the retries that actually run
Rejected: Keep a third delay entry in the schedule | the current retry loop never reaches it
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep retry schedules aligned with the retry loop's real number of sleeps, not the raw retry constant count
Tested: python -m compileall agent/core/agent_loop.py tests/unit/test_llm_error_classification.py
Tested: UV_CACHE_DIR=/tmp/uv-cache uv run --with pytest --with pytest-asyncio python -m pytest -q tests/unit/test_llm_error_classification.py
Tested: UV_CACHE_DIR=/tmp/uv-cache uv run --with pytest --with pytest-asyncio python -m pytest -q
Not-tested: Remote CI environment parity
@akseljoonas
Copy link
Copy Markdown
Collaborator Author

Validated the remaining review points before changing code:

  • Fixed: removed the orphaned configs/main_agent_config.json.
  • Fixed: shortened _LLM_RATE_LIMIT_RETRY_DELAYS to the two entries the retry loop actually consumes, while keeping the effective retry budget above 60s.
  • Not applied: the Bedrock Sonnet 4.6 -v1 suggestion is incorrect. The current AWS Bedrock model card lists anthropic.claude-sonnet-4-6 with geo/global IDs like us.anthropic.claude-sonnet-4-6, so leaving bedrock/us.anthropic.claude-sonnet-4-6 unchanged is intentional.

Local verification after the update: 87 passed with UV_CACHE_DIR=/tmp/uv-cache uv run --with pytest --with pytest-asyncio python -m pytest -q.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 25, 2026

Claude finished @akseljoonas's task in 1m 3s —— View job


Re-review (new commit: 4af77ac)

LGTM
Verdict: ready to merge

Prior findings

  • P0 (rate-limit retry budget) — Fixed. _LLM_RATE_LIMIT_RETRY_DELAYS = [30, 60] now has exactly _MAX_LLM_RETRIES - 1 = 2 entries, matching what the retry loop actually consumes. Budget is 90 s (> 60 s). Test at test_llm_error_classification.py:99 adds a structural guard (len(...) == _MAX_LLM_RETRIES - 1) that will break if a future diff re-introduces a dead third entry.
  • P1 (orphaned config) — Fixed. configs/main_agent_config.json deleted; grepped for main_agent_config across all .py/.ts/.json/.md — zero remaining references.
  • P1 (Bedrock Sonnet -v1 suffix) — Author's pushback accepted. AWS model catalog lists us.anthropic.claude-sonnet-4-6 without the -v1 suffix (unlike Opus), so research_tool.py:222 is intentionally correct.

What I checked

  • agent/core/agent_loop.py:179,215–226: _retry_delay_for uses attempt_index >= len(schedule) as its guard — with a 2-entry list and a loop that only retries when _llm_attempt < 2, indices 0 and 1 are exactly the consumed entries. Logic is consistent end-to-end.
  • tests/unit/test_llm_error_classification.py:99–100: Both assertions (len structural guard and sum > 60) pass. No spurious tests left over from the old 3-entry list.
  • configs/: Only cli_agent_config.json and frontend_agent_config.json remain. No dangling references.

@akseljoonas akseljoonas merged commit 3eec386 into main Apr 25, 2026
1 check passed
@akseljoonas akseljoonas deleted the codex/fix-cli-rendering-split-default-models-20260425-223601 branch April 25, 2026 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant