fix: #515 — rc13 plan-summary over-fire (CLI-style brevity restored)#516

Merged
Nathan Schram (nathanschram) merged 1 commit into dev from fix/v0.35.3rc13-plan-summary-overfire
May 12, 2026

@nathanschram
Member

Summary

Changes

  • src/untether/runner_bridge.py _DEFAULT_PREAMBLE: A1 → "concise 3–5 bullets; not as final deliverable"; A2 → "brief CLI-style summary, ~500–1500 chars, do NOT re-paste full plan content"; A3 → "Path AND a 3–5 bullet headline summary, not a re-paste".
  • src/untether/runners/claude.py _prepend_exitplanmode_plan: substring-skip replaced with len(final_answer) < 600 length gate (substring check kept as cheap secondary skip); plan body capped at 1500 chars + truncation marker when Layer E fires.
  • Version bump 0.35.3rc12 → rc13; CHANGELOG entry; uv lock synced.
  • 7 new/updated tests in tests/test_preamble.py (regression-locks rc11 verbosity-driving phrases out, plus length-gate / body-cap / substring-skip cases); 2 new tests in tests/test_claude_runner.py.

Wider blast radius note

A3's reworded ## Summary ### Plan/Document Created bullet affects all completed Claude runs, not just plan-mode. This is intentional — the user explicitly asked for "summaries like we have here in command line", which is the broader goal. If a future case needs the older verbose shape, the right answer is per-chat /verbose (already exists) rather than reverting A3.

Test plan

  • uv run pytest — 2652 passed, 2 skipped, 82.38% coverage
  • uv run ruff format --check src/ tests/ — clean
  • uv run ruff check src/ tests/ — clean
  • uv lock --check (lockfile synced)
  • Live integration test (primed) on @untether_dev_bot: research-task prompt with "keep it short" → 882 chars, no 📋 Plan (approved): literal, clean CLI summary
  • Live integration test (unprimed) on @untether_dev_bot: default research-task prompt (no brevity hint) → 1019 chars; preamble does its job on its own
  • Fallback path (Claude exits with brief post-approval text → Layer E fires with capped plan body): unit-tested only — not live-verified because the new preamble makes the empty post-approval path almost impossible to repro intentionally; test_prepend_exitplanmode_plan_when_final_answer_short + test_translate_result_caps_long_plan_body_when_prepending cover it

Targets

  • v0.35.3rc13 (TestPyPI on dev merge, then staging via scripts/staging.sh install 0.35.3rc13)
  • Aimed to land alongside the other rc12/rc13 fixes on the dev branch

🤖 Generated with Claude Code

Closes the rc11/rc12 over-correction on #508 that produced 25k–42k char
(~8–12 Telegram messages) finals on staging plan-mode research/audit
runs. User report (Nathan, 2026-05-12): "I had a summary from Claude
Code yesterday which was 11 Telegram messages long!! What I really
want back is to have Claude Code provide summaries like we have here
in command line — summaries of plans (not the entire plan), summaries
of recommendations and/or findings and/or next steps (where relevant)."

Three stacked over-shoots in rc11/rc12:

1. A1 preamble: "expand the bullets into a substantive summary" for
   research/audit → plan body ballooned to 2–5k chars.
2. A2 preamble: "your next assistant message ... MUST repeat the
   substantive findings" → post-approval text ballooned to 0.5–2k
   chars AND was paraphrased rather than literal-copied.
3. Layer E: substring-skip rule (body in final_answer) failed on every
   paraphrased run, so the plan body was unconditionally concatenated
   in front of the post-approval text.
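
Item 3 can be illustrated with a minimal sketch. The function name `old_should_prepend` and the sample strings are hypothetical and are not taken from src/untether/runners/claude.py; only the `body in final_answer` rule comes from the description above.

```python
def old_should_prepend(plan_body: str, final_answer: str) -> bool:
    """Hypothetical stand-in for the rc11/rc12 rule: skip only on a literal copy."""
    return plan_body not in final_answer


# Illustrative strings only, not real run output.
plan_body = "- Audit the retry logic\n- Check timeout defaults"
literal_copy = "Approved plan:\n" + plan_body + "\nDone."
paraphrase = "I audited the retry logic and reviewed the timeout defaults."

print(old_should_prepend(plan_body, literal_copy))  # False: literal copy is detected
print(old_should_prepend(plan_body, paraphrase))    # True: paraphrase slips through
```

Because A2 pushed the model toward paraphrasing, the second case was the norm, so the skip almost never applied.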

Evidence from `journalctl --user -u untether.service` (last 48h on
staging @hetz_lba1_bot v0.35.3rc12): aushistory finals at 14k / 16k /
28k / 35k / 42k chars; scout finals at 26k / 27k chars. The 42k case
matches the 11-message user repro. Telegram MCP `search_messages` for
the literal "📋 Plan (approved):" returned hits on every recent
plan-mode completion in both chats — confirming Layer E was the
load-bearing over-firer.
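
As a rough cross-check of the message counts, the sketch below divides answer length by Telegram's 4096-character per-message limit. The helper `min_messages` is hypothetical and gives only a lower bound; Untether's actual splitter is not shown in this PR and may break earlier (for example on paragraph boundaries), which would produce more messages.

```python
import math

TELEGRAM_MAX_MESSAGE_CHARS = 4096  # Telegram's per-message text limit


def min_messages(answer_len: int, limit: int = TELEGRAM_MAX_MESSAGE_CHARS) -> int:
    """Lower bound on messages needed; real chunking may split earlier."""
    return max(1, math.ceil(answer_len / limit))


# Lengths taken from the journalctl evidence above (rounded to the nearest 1k).
for chars in (14_000, 26_000, 42_000):
    print(chars, min_messages(chars))
```

Under that assumption the 42k final needs at least 11 messages, consistent with the user repro.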

rc13 retuning:

- A1 → "concise 3–5 bullet summary; plan is shown for approval, not
  as the final deliverable" (drops the substantive-expansion license).
- A2 → "brief CLI-style summary, 3–7 bullets or 1–2 short paragraphs,
  ~500–1500 chars, do NOT re-paste the full plan content".
- A3 (## Summary Plan/Document Created bullet) → "Path AND a 3–5
  bullet headline summary, not a re-paste of the full content". Note:
  A3 affects the ## Summary block on ALL completed work, not just
  plan-mode runs — intentional, matches user's stated goal.
- _prepend_exitplanmode_plan: substring check replaced with a length
  gate (`len(final_answer) < 600`). Substring check stays as a cheap
  belt-and-braces second skip. Plan body is capped at 1500 chars +
  truncation marker so a runaway body can't ship 30k chars even when
  Layer E does fire (preserves original #508 UX for genuinely empty
  post-approval results without re-introducing concatenation).
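
The gate-and-cap behaviour described in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the body of `_prepend_exitplanmode_plan`: the function name `prepend_plan`, the wording of the truncation marker, and the exact output layout are hypothetical. The 600 and 1500 thresholds and the header literal are the ones quoted in this PR.

```python
LENGTH_GATE = 600        # from the PR: len(final_answer) < 600
PLAN_BODY_CAP = 1500     # from the PR: plan body capped at 1500 chars
PLAN_HEADER = "📋 Plan (approved):"        # literal quoted in the PR
TRUNCATION_MARKER = "\n[plan truncated]"   # assumed wording, not from the source


def prepend_plan(final_answer: str, plan_body: str) -> str:
    """Hypothetical sketch of the rc13 Layer E decision."""
    if not plan_body:
        return final_answer
    # Primary skip: a substantive answer stands on its own.
    if len(final_answer) >= LENGTH_GATE:
        return final_answer
    # Secondary skip: the answer already contains the plan verbatim.
    if plan_body in final_answer:
        return final_answer
    # Cap the body so a runaway plan cannot dominate the message.
    if len(plan_body) > PLAN_BODY_CAP:
        plan_body = plan_body[:PLAN_BODY_CAP] + TRUNCATION_MARKER
    return f"{PLAN_HEADER}\n{plan_body}\n\n{final_answer}".rstrip()


long_plan = "x" * 5000
capped = prepend_plan("ok", long_plan)
print(capped.startswith(PLAN_HEADER), len(capped) < 1600)
print(prepend_plan("a" * 1019, long_plan) == "a" * 1019)
```

Checking length first means the paraphrase problem no longer matters: any answer at or above the gate is returned untouched regardless of how it relates to the plan text.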

Live verification on @untether_dev_bot (test chat -5284581592):

- Primed test (with "keep it short" instruction): answer_len=882
  chars (~1 Telegram message), no "📋 Plan (approved):" literal.
- Unprimed test (default research-task prompt): answer_len=1019 chars
  — preamble is doing its job without user help. Layer E correctly
  skipped (1019 > 600). Quality verified: 3 substantive bullets +
  ## Summary block with Completed / Next Steps.

The original #508 fallback path (Claude exits with very short post-
approval text → Layer E fires with capped plan body) is unit-tested
only; not live-verified because the new preamble makes it almost
impossible to repro intentionally.

Tests: 7 new/updated in tests/test_preamble.py (regression-locks the
rc11 verbosity-driving phrases out of _DEFAULT_PREAMBLE, plus
length-gate / body-cap / substring-skip cases) and 2 in
tests/test_claude_runner.py
(`test_translate_result_skips_prepend_when_answer_substantive`,
`test_translate_result_caps_long_plan_body_when_prepending`).
Full suite: 2652 passed, 2 skipped, 82.38% coverage. ruff format +
check clean.
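
A regression lock of the kind described might look like the sketch below. It is not copied from tests/test_preamble.py: `FAKE_PREAMBLE` is placeholder text standing in for the real `_DEFAULT_PREAMBLE`, and `find_banned` is a hypothetical helper. The two banned phrases are the rc11 fragments quoted earlier in this message.

```python
# Phrases quoted in this PR as the rc11 verbosity drivers.
BANNED_PHRASES = (
    "expand the bullets into a substantive summary",
    "MUST repeat the substantive findings",
)

# Placeholder text, NOT the real _DEFAULT_PREAMBLE from runner_bridge.py.
FAKE_PREAMBLE = (
    "Give a concise 3-5 bullet summary; the plan is shown for approval, "
    "not as the final deliverable. After approval, reply with a brief "
    "CLI-style summary and do NOT re-paste the full plan content."
)


def find_banned(preamble: str) -> list[str]:
    """Return every banned phrase found in the preamble (case-insensitive)."""
    lowered = preamble.lower()
    return [p for p in BANNED_PHRASES if p.lower() in lowered]


def test_preamble_has_no_rc11_verbosity_phrases() -> None:
    # pytest-style test: fails if a verbose phrase is reintroduced.
    assert find_banned(FAKE_PREAMBLE) == []
```

Locking phrases out rather than locking exact wording in leaves room to keep tuning the preamble without rewriting the test.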

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@nathanschram Nathan Schram (nathanschram) merged commit 1b192f1 into dev May 12, 2026
20 of 21 checks passed
@nathanschram Nathan Schram (nathanschram) deleted the fix/v0.35.3rc13-plan-summary-overfire branch May 12, 2026 04:24