feat(summarize): condense spoken output to a single sentence #2

Merged
rbright merged 11 commits into main from fix/summarize-output-tightening
Feb 23, 2026

@rbright (Owner) commented Feb 23, 2026

Summary

  • tighten summarize prompt instructions so spoken output avoids meta preambles (for example "Summary:" / "Here's a summary...")
  • add summarize post-processing guardrails to strip summary meta-commentary and hard-cap output at 4 sentences
  • add regression tests for prompt contract, meta-prefix stripping, and sentence cap behavior
  • document summarize behavior updates in README

Test Plan

  • just fmt
  • just fmt-nix
  • just lint
  • just test
  • just smoke-e2e /tmp/koko-local-assets /tmp/koko-smoke-summary-tighten.wav "Summarize output tightening smoke test"
  • python3 ~/.agents/skills/update-docs/check-doc-links.py

Summary by CodeRabbit

  • New Features

    • Speech summaries now produce exactly one short spoken sentence (hard max 1); output is post-processed to remove leading list/markdown markers and collapse excess whitespace.
  • Bug Fixes

    • Improved sentence extraction and truncation to reliably keep only the first complete sentence and return empty when nothing remains.
  • Documentation

    • Updated summarization guidance to require one-sentence output and prohibit meta-preambles like "Summary:".
  • Tests

    • Added tests for preamble removal, whitespace collapsing, sentence capping, and preservation of model wording.

coderabbitai bot commented Feb 23, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

Require spoken summaries to be exactly one short sentence; add sentence-splitting, markdown/list marker stripping, whitespace collapsing, and sentence-clamping helpers; update prompt, README, and tests to reflect the single-sentence constraint.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: README.md | Added clarifications: summarization targets a single short spoken sentence (hard max 1) and post-processing is minimal, relying on prompt instructions. |
| Prompt Template: src/koko_cli/prompts/summarize_for_speech.txt | Tightened instructions to require exactly one short sentence; prohibited meta preambles and labels (e.g., "Summary:", "Here's a summary") and any mention of rewriting or of the text being summarized; retained preservation-of-meaning and actionable-step guidance. |
| Core Normalization Logic: src/koko_cli/summarization.py | Added constants (MAX_SUMMARY_SENTENCES, SENTENCE_ENDINGS) and multiple helpers: normalize_line_for_speech, strip_leading_markdown_markers, leading_ordered_list_marker_length, collapse_spaces, clamp_summary_sentences, split_sentences; refactored normalize_summary_output to normalize lines, remove markers, collapse spaces, and clamp output to one sentence. |
| Tests: tests/test_summarization.py | Added tests validating prompt text (hard max 1 sentence), preservation of model wording, whitespace collapsing, and truncation to the first sentence. |
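
A minimal sketch of how the normalization helpers listed for summarization.py might fit together. Only the constant and function names (MAX_SUMMARY_SENTENCES, SENTENCE_ENDINGS, collapse_spaces, split_sentences, normalize_summary_output) come from the change summary; the function bodies, the `_LEADING_MARKER` pattern, and the splitting rule are assumptions made for illustration and are not the merged code.

```python
import re

MAX_SUMMARY_SENTENCES = 1
SENTENCE_ENDINGS = ".!?"

# Hypothetical pattern: leading bullets, heading or quote markers,
# or ordered-list markers such as "1." and "2)".
_LEADING_MARKER = re.compile(r"^\s*(?:[-*+>#]+\s+|\d+[.)]\s+)")


def collapse_spaces(text: str) -> str:
    """Replace every run of whitespace with a single space."""
    return " ".join(text.split())


def split_sentences(text: str) -> list[str]:
    """Split after sentence-ending punctuation that is followed by whitespace
    or the end of the text. Abbreviations such as "e.g." are not special-cased."""
    sentences = []
    start = 0
    for i, ch in enumerate(text):
        at_end = i + 1 == len(text)
        if ch in SENTENCE_ENDINGS and (at_end or text[i + 1].isspace()):
            sentence = text[start : i + 1].strip()
            if sentence:
                sentences.append(sentence)
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def normalize_summary_output(raw: str) -> str:
    """Strip leading markers per line, collapse whitespace, keep one sentence."""
    lines = [_LEADING_MARKER.sub("", line) for line in raw.splitlines()]
    text = collapse_spaces(" ".join(lines))
    return " ".join(split_sentences(text)[:MAX_SUMMARY_SENTENCES])
```

With these assumptions, a two-bullet model reply is reduced to its first sentence, and whitespace-only input yields an empty string.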

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hop through lines and nibble off the fluff,
One tidy sentence—short, not rough.
I strip the markers, squash spaces tight,
A rabbit's voice: concise and light. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 61.11% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(cli): tighten summarize spoken output shaping' accurately and specifically describes the main change: constraining the summarization output to avoid meta-commentary and enforce stricter formatting. |

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 43a15c1618

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/koko_cli/summarization.py`:
- Around line 12-43: Ruff flags literal en‑dash characters in the regex and
string literals; update the patterns and any affected strings (notably
SUMMARY_META_PREFIX_PATTERNS and SUMMARY_META_SENTENCE_EXACT) to replace the
literal en‑dash character (–) with an explicit escape like \u2013 (e.g. change
"[:\-–]" to "[:\-\u2013]") or use the Unicode name escape, or alternatively
append "# noqa: RUF001" to the specific lines; modify
SUMMARY_META_PREFIX_PATTERNS entries and any matching literals around them
accordingly so no literal en‑dash remains.
- Around line 163-174: The function strip_summary_meta_sentences currently
returns the original text when all sentences are detected as meta, allowing
meta-only outputs to pass through; update strip_summary_meta_sentences so that
after computing sentences = split_sentences(text) you return an empty string
when there are no sentences or when filtered_sentences (computed via
is_summary_meta_sentence) is empty instead of returning text or the original
sentences; locate the variables/functions split_sentences,
is_summary_meta_sentence and the filtered_sentences logic and replace the
fallback returns with "" (or raise a clear exception if preferred).
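
The second finding can be sketched as follows. The function name strip_summary_meta_sentences and the helper names split_sentences and is_summary_meta_sentence come from the review comment; the helper bodies here are simplified stand-ins (assumptions), since the real detection rules are not shown in this thread.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Stand-in splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_summary_meta_sentence(sentence: str) -> bool:
    """Stand-in detector: flags only a few obvious meta phrases."""
    lowered = sentence.lower().rstrip(".!?:")
    return lowered in {"summary", "here's a summary", "here is a summary"}


def strip_summary_meta_sentences(text: str) -> str:
    """Drop meta sentences; return "" instead of falling back to the input."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    filtered_sentences = [s for s in sentences if not is_summary_meta_sentence(s)]
    if not filtered_sentences:
        # Every sentence was meta commentary, so nothing is worth speaking.
        return ""
    return " ".join(filtered_sentences)
```

Returning an empty string puts the decision about what to do with a meta-only reply (retry, skip speech, and so on) in the caller's hands rather than letting the meta text through.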

@coderabbitai coderabbitai bot left a comment

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/koko_cli/summarization.py`:
- Around line 15-27: The regex literals contain literal Unicode dash characters
(e.g. the "–" inside the character class like "[:\\-–]") which Ruff flags;
replace those literal Unicode dashes with explicit unicode escapes (e.g. use
"\u2013") or use the escaped form compatible with your string type (in raw
strings use "\\u2013", or convert to a normal string and use "\u2013") for each
regex in summarization.py (look for the patterns containing "[:\\-–]" and the
long multi-line patterns shown and also the occurrences around the other noted
spots). Ensure you update every occurrence (including the ones referenced at
lines ~50-57 and ~197) so no literal en-dash remains.
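
A small sketch of the escape forms discussed in this finding, assuming Python's standard `re` module. The pattern below is hypothetical and only mirrors the `[:\-–]` character class quoted in the comment; it is not one of the actual SUMMARY_META_PREFIX_PATTERNS entries. In a raw string, `\u2013` reaches `re` unchanged and `re` resolves it; in a normal string, Python resolves `\u2013` itself and the other backslashes must be doubled. Either way, no literal en dash appears in the source.

```python
import re

# Raw string: Python leaves the backslash sequences alone and `re`
# interprets \u2013 as the en dash inside the character class.
RAW_PATTERN = re.compile(r"^summary\s*[:\-\u2013]\s*", re.IGNORECASE)

# Normal string: Python turns \u2013 into the en dash before `re` sees it,
# while \\s and \\- are passed through to `re` as \s and \-.
PLAIN_PATTERN = re.compile("^summary\\s*[:\\-\u2013]\\s*", re.IGNORECASE)


def strip_prefix(pattern: re.Pattern[str], text: str) -> str:
    """Remove one leading 'Summary' label followed by ':', '-' or an en dash."""
    return pattern.sub("", text, count=1)
```

Both compiled patterns behave the same on labels written with a colon, a hyphen, or an en dash.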

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/koko_cli/summarization.py`:
- Around line 186-214: The trimming logic in trim_leading_meta_words currently
only returns a trimmed result when saw_meta_core and remove_until >= 2, which
skips cases where a single meta-core word (e.g., "Summary") is directly followed
by content and later gets treated as meta-only by strip_summary_meta_sentences;
change the condition to saw_meta_core and remove_until >= 1 and remove_until <
len(tokens) so a single leading meta-core token is stripped when followed by
content, keeping the existing token-splitting, normalization via
is_meta_or_filler_word and is_meta_core_word, and the lstrip behavior to produce
the trimmed text if non-empty.
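
The suggested condition can be illustrated with a rough sketch. The names trim_leading_meta_words, is_meta_or_filler_word, is_meta_core_word, saw_meta_core, and remove_until come from the comment; the word lists, the `_normalize` helper, and the loop structure are assumptions, since the function body is not shown in this thread.

```python
# Hypothetical word lists, for illustration only.
META_CORE_WORDS = {"summary", "recap", "overview"}
FILLER_WORDS = {"here", "here's", "is", "a", "quick", "short", "brief"}


def _normalize(token: str) -> str:
    """Lowercase a token and drop surrounding punctuation (assumed helper)."""
    return token.lower().strip(".,:;!?-\u2013")


def is_meta_core_word(token: str) -> bool:
    return _normalize(token) in META_CORE_WORDS


def is_meta_or_filler_word(token: str) -> bool:
    return is_meta_core_word(token) or _normalize(token) in FILLER_WORDS


def trim_leading_meta_words(text: str) -> str:
    """Strip a leading run of meta/filler words when real content follows."""
    tokens = text.split()
    remove_until = 0
    saw_meta_core = False
    for token in tokens:
        if not is_meta_or_filler_word(token):
            break
        saw_meta_core = saw_meta_core or is_meta_core_word(token)
        remove_until += 1
    # Relaxed condition: one meta-core token is enough, provided
    # at least one content token remains after it.
    if saw_meta_core and 1 <= remove_until < len(tokens):
        trimmed = " ".join(tokens[remove_until:]).lstrip(" :-\u2013")
        if trimmed:
            return trimmed
    return text
```

Under the earlier `remove_until >= 2` threshold, an input such as "Summary build passed" would be returned unchanged; with the relaxed bound the single leading label is removed. This crude version also swallows genuine leading articles that appear in the filler list, so a real implementation would likely need tighter rules.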

@rbright changed the title from "fix(cli): tighten summarize spoken output shaping" to "feat(summarize): condense spoken output to a single sentence" on Feb 23, 2026
@rbright merged commit d8f747e into main on Feb 23, 2026
2 of 3 checks passed
@rbright deleted the fix/summarize-output-tightening branch on February 23, 2026 04:01