
[Feature]: Channel responses should produce an LLM-generated summary alongside the markdown attachment, not a head-truncation #64

@initializ-mk

Description


Component

forge-plugins (channel plugins, markdown converter)

Scope

Medium (multiple files / new command)

Problem statement

When an agent response is large, channel adapters (Slack, Telegram) already split it into an inline message plus a research-report.md attachment. The mechanism works end-to-end, but the inline message is not a summary; it is just the head of the report sliced at a sentence boundary. For a long research response with one big opening paragraph, the user sees the first 500 characters of the report in the channel followed by "Full report attached as file above.", and has to download the markdown to find out what the report actually concludes.

This breaks the goal of "response brevity": the brevity that exists today is accidental (we ran out of budget after 500 chars), not intentional (we summarised the work).

Current behaviour — Slack

forge-plugins/channels/slack/slack.go:539-605:

| Condition | What gets sent to the channel |
| --- | --- |
| Runtime attached a file part (tool output > 8000 chars, see forge-core/runtime/loop.go:189) | LLM text, truncated by SplitSummaryAndReport to the first paragraph capped at 600 chars, plus the uploaded file as research-report.md (or the runtime-supplied name) |
| No file part, len(text) > 4096 | summary, report := SplitSummaryAndReport(text): report == text verbatim; summary is the first paragraph if ≤ 600 chars, else truncateAtSentence(text, 500) |
| No file part, len(text) ≤ 4096 | Full text, chunked at 4000 chars |

Telegram (forge-plugins/channels/telegram/telegram.go:322-405) mirrors the same logic with 4096 / 600 / 500 char thresholds.
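To make the decision order in the table explicit, here is a minimal Go sketch of how an adapter picks one of the three rows. It is a reconstruction from the table, not the code in slack.go: deliveryPath, choosePath and chunk are hypothetical names, and chunking by runes rather than bytes is an assumption.

```go
package main

// Thresholds hard-coded in the Slack adapter today (Telegram mirrors them).
const (
	inlineMaxChars = 4096 // above this, text goes through SplitSummaryAndReport
	chunkChars     = 4000 // at or below it, text is sent inline in chunks of this size
)

// deliveryPath names the three rows of the table (hypothetical names).
type deliveryPath string

const (
	pathFilePart     deliveryPath = "SplitSummaryAndReport(LLM text) + runtime file part"
	pathSplitReport  deliveryPath = "SplitSummaryAndReport(text) + research-report.md"
	pathInlineChunks deliveryPath = "full text, chunked"
)

// choosePath follows the table's order: a runtime file part wins, then the
// length check on the LLM text (len counts bytes in Go).
func choosePath(text string, hasFilePart bool) deliveryPath {
	switch {
	case hasFilePart:
		return pathFilePart
	case len(text) > inlineMaxChars:
		return pathSplitReport
	default:
		return pathInlineChunks
	}
}

// chunk splits text into pieces of at most size runes.
// ASSUMPTION: the real adapter may count bytes instead of runes.
func chunk(text string, size int) []string {
	var out []string
	runes := []rune(text)
	for len(runes) > size {
		out = append(out, string(runes[:size]))
		runes = runes[size:]
	}
	if len(runes) > 0 {
		out = append(out, string(runes))
	}
	return out
}
```

Note that the file-part check comes first: when the runtime attached a tool output, the LLM text is head-truncated regardless of its own length.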

The actual "summarizer" — forge-plugins/channels/markdown/split.go

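```go
// Excerpt from forge-plugins/channels/markdown/split.go (imports "strings").
func SplitSummaryAndReport(text string) (summary, report string) {
    report = text // the report is always the full, untouched text
    if idx := strings.Index(text, "\n\n"); idx > 0 && idx <= 600 {
        // First paragraph is short enough: use it verbatim as the "summary".
        summary = text[:idx]
    } else if idx > 600 {
        // First paragraph too long: head-cut at a sentence end before char 500.
        summary = truncateAtSentence(text, 500)
    } else {
        // No paragraph break (or text starts with one): the same head-cut.
        summary = truncateAtSentence(text, 500)
    }
    summary = strings.TrimSpace(summary)
    return summary, report
}
```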

truncateAtSentence walks backwards from char 500 looking for ., !, or ?. There is no semantic summarisation step. If the LLM happened to produce a structured # Header\n\n…short lede…\n\n…body… document, the lede becomes the "summary" by luck. If it produced one long unbroken paragraph (very common for research output), the user gets a mid-sentence head-cut.

Symptoms users see

  1. The inline message ends abruptly mid-thought after the first ~500 chars.
  2. The inline message often contains the report's introduction ("I will analyse X by …") but never its conclusion ("…and the answer is Y"). Users have to download the attachment to find the answer.
  3. The thresholds (4096 char trigger, 600 char paragraph cap, 500 char fallback, 8000 char tool-output trigger) are all magic numbers, not configurable. There is no way for an operator to say "my Slack channel prefers short summaries even for 2K-char responses" or "my Telegram channel can take 10K-char text natively, don't attach a file at all".
  4. When the runtime attaches a tool-output file part, the LLM text we use as the "summary" is itself whatever the LLM produced — which may already be 4 KB of prose. We then run that 4 KB through the same head-truncation. The summary is still just the head of the LLM text, which is itself just the head of the actual report.

Proposed solution

Replace the "head-truncation = summary" model with an explicit two-output model:

  1. Below threshold → full inline message, no attachment. Unchanged. ("Brevity is preserved naturally.")

  2. Above threshold → LLM-generated summary inline + the full markdown attached. The summary is a real summary (3–5 bullets / 1–2 paragraphs / configurable), produced by:

    • Preferred: a dedicated summarisation step in the agent loop — when len(finalResponse) > threshold, run one extra LLM call asking the model to produce a short summary of its own output, attach the original full output as the markdown file, return both. The runtime hands the channel adapter a Summary field on a2a.Message (or a designated summary file part) so the adapter does not have to guess.
    • Fallback when the agent runs without a summariser (custom skill, no LLM in scope): keep the existing head-truncation but rename it from summary to preview everywhere so the contract is honest.
  3. Configurable thresholds in forge.yaml under a new channels.response block:

    ```yaml
    channels:
      response:
        inline_max_chars: 4096        # default: full message inline up to this
        summary_target_chars: 800     # target length for LLM-generated summary
        attach_full_report: true      # if false, just trim and skip the attachment
        attachment_filename: research-report.md
    ```

    Per-channel overrides under channels.slack.response / channels.telegram.response.

  4. A2A type extension. Add a Summary string field to a2a.Message (or a typed PartKindSummary). The runtime fills it when it has produced one; channels prefer it over SplitSummaryAndReport(text). This is the contract that lets us evolve the summariser without touching every channel.

  5. Make SplitSummaryAndReport the fallback only. It stays, but is no longer the primary code path. Renaming to headPreview would also be reasonable.

Alternatives considered

  • Just make the thresholds configurable, keep the head-truncation. Easier to ship, doesn't solve the actual quality problem — users still see a head cut, just at a different offset.
  • Always send the full message as an attachment, no inline text. Loses Slack/Telegram's native preview affordance; people scrolling the channel see nothing of substance. Rejected.
  • Use the first N lines instead of first N chars. Marginal improvement, still not a summary; a 12-line introduction is no more useful than a 500-char one.
  • Ask the channel platform to do the previewing. Slack/Telegram render the first ~3 lines of a file attachment as a preview, but this is platform-specific, not styleable, and doesn't help us emit a thoughtful summary message.

Additional context

  • Affected files: forge-plugins/channels/markdown/split.go, forge-plugins/channels/slack/slack.go, forge-plugins/channels/telegram/telegram.go, forge-core/runtime/loop.go (file-part attachment), forge-core/a2a/types.go (if we add Summary), docs/core-concepts/channels.md ("Large Response Handling" section currently says "first paragraph, up to 600 characters" — would need rewriting).
  • Today's behaviour is documented at docs/core-concepts/channels.md:198-207 ("Large Response Handling"). That doc would be the natural place to describe the new contract.
  • Backwards compatibility: the Summary field is additive on a2a.Message; absent it, channel adapters fall back to SplitSummaryAndReport exactly as today. Configurable thresholds default to today's hardcoded values. No breaking change.
  • Related: the file-part path (extractLargestFile in slack.go:935, telegram.go:621) attaches the largest tool output as the channel file when present. That path also benefits from this change — once the runtime can emit a real summary, the file-part case can stop using the LLM's prose text as a "summary" too.
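The backwards-compatibility point (configurable thresholds default to today's hardcoded values) and the per-channel overrides can be sketched as a layered lookup: built-in defaults, then channels.response, then channels.slack.response or channels.telegram.response. The Go types and function names are hypothetical, decoding forge.yaml is left out, and the generic helper needs Go 1.18 or later.

```go
package main

// ResponseConfig models one proposed response block (global or per-channel).
// nil means "not set here", so an explicit 0 or false can still override.
type ResponseConfig struct {
	InlineMaxChars     *int    // inline_max_chars
	SummaryTargetChars *int    // summary_target_chars
	AttachFullReport   *bool   // attach_full_report
	AttachmentFilename *string // attachment_filename
}

// Effective is the resolved setting a channel adapter would use.
type Effective struct {
	InlineMaxChars     int
	SummaryTargetChars int
	AttachFullReport   bool
	AttachmentFilename string
}

// defaults: 4096 and research-report.md are today's hard-coded values;
// 800 is the example summary target from the proposal.
func defaults() Effective {
	return Effective{
		InlineMaxChars:     4096,
		SummaryTargetChars: 800,
		AttachFullReport:   true,
		AttachmentFilename: "research-report.md",
	}
}

// override copies *src into *dst only when src is set.
func override[T any](dst, src *T) {
	if src != nil {
		*dst = *src
	}
}

// resolve layers defaults < channels.response < channels.<name>.response.
func resolve(global, perChannel ResponseConfig) Effective {
	e := defaults()
	for _, layer := range []ResponseConfig{global, perChannel} {
		override(&e.InlineMaxChars, layer.InlineMaxChars)
		override(&e.SummaryTargetChars, layer.SummaryTargetChars)
		override(&e.AttachFullReport, layer.AttachFullReport)
		override(&e.AttachmentFilename, layer.AttachmentFilename)
	}
	return e
}
```

Pointer fields are what make an explicit attach_full_report: false in a per-channel block distinguishable from "not set".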
