Component
forge-plugins (channel plugins, markdown converter)
Scope
Medium (multiple files / new command)
Problem statement
When an agent response is large, channel adapters (Slack, Telegram) already split it into an inline message plus a research-report.md attachment. The mechanism works end-to-end, but the inline message is not a summary: it is just the head of the report sliced at a sentence boundary. For a long research response with one big opening paragraph, the user sees the first 500 characters of the report in the channel followed by "Full report attached as file above.", and has to download the markdown to find out what the report actually concludes.
This breaks the goal of "response brevity": the brevity that exists today is accidental (we ran out of budget after 500 chars), not intentional (we summarised the work).
Current behaviour — Slack
forge-plugins/channels/slack/slack.go:539-605:
| Condition | What gets sent to the channel |
| --- | --- |
| Runtime attached a file part (tool output > 8000 chars, see forge-core/runtime/loop.go:189) | LLM text → truncated by SplitSummaryAndReport to first paragraph capped at 600 chars → uploaded file as research-report.md (or runtime-supplied name) |
| No file part, len(text) > 4096 | summary, report := SplitSummaryAndReport(text); report == text verbatim, summary is the first paragraph if ≤ 600 chars, else truncateAtSentence(text, 500) |
| No file part, len(text) ≤ 4096 | Full text, chunked at 4000 chars |
Telegram (forge-plugins/channels/telegram/telegram.go:322-405) mirrors the same logic with 4096 / 600 / 500 char thresholds.
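The three rows of the table can be read as one decision function. The sketch below is illustrative only: chooseRoute, chunk, and the route constants are hypothetical names, not the identifiers used in slack.go or telegram.go, and it assumes the 4096 check is a byte length (len(text)) while chunking counts runes.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	inlineLimit = 4096 // above this, the text is split into head + attachment
	chunkSize   = 4000 // inline text is sent in pieces of at most this size
)

// route names the three delivery paths from the table (hypothetical type).
type route string

const (
	routeFilePart route = "file-part" // runtime attached a tool-output file
	routeSplit    route = "split"     // head of text inline + research-report.md
	routeInline   route = "inline"    // full text, chunked
)

// chooseRoute mirrors the table: a file part wins, then the length check.
func chooseRoute(text string, hasFilePart bool) route {
	switch {
	case hasFilePart:
		return routeFilePart
	case len(text) > inlineLimit:
		return routeSplit
	default:
		return routeInline
	}
}

// chunk splits text into pieces of at most size runes.
func chunk(text string, size int) []string {
	r := []rune(text)
	var out []string
	for len(r) > size {
		out = append(out, string(r[:size]))
		r = r[size:]
	}
	if len(r) > 0 {
		out = append(out, string(r))
	}
	return out
}

func main() {
	text := strings.Repeat("a", 4050)
	// 4050 is under the 4096 trigger but over the 4000 chunk size: two messages.
	fmt.Println(chooseRoute(text, false), len(chunk(text, chunkSize)))
	fmt.Println(chooseRoute(strings.Repeat("a", 5000), false))
	fmt.Println(chooseRoute("short", true))
}
```

Note that none of the three branches inspects what the text says; the choice depends only on length and on whether a file part exists.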
The actual "summarizer" — forge-plugins/channels/markdown/split.go
    func SplitSummaryAndReport(text string) (summary, report string) {
        report = text
        if idx := strings.Index(text, "\n\n"); idx > 0 && idx <= 600 {
            summary = text[:idx]
        } else if idx > 600 {
            summary = truncateAtSentence(text, 500)
        } else {
            summary = truncateAtSentence(text, 500)
        }
        summary = strings.TrimSpace(summary)
        return summary, report
    }
truncateAtSentence walks backwards from char 500 looking for ., !, or ?. There is no semantic summarisation step. If the LLM happened to produce a structured # Header\n\n…short lede…\n\n…body… document, the lede becomes the "summary" by luck. If it produced one long unbroken paragraph (very common for research output), the user gets a head cut that stops mid-thought.
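To make the failure mode concrete, here is a minimal sketch of the head cut. headPreview is a hypothetical stand-in for truncateAtSentence, whose body is not shown in this issue; the hard-cut fallback when no terminator is found is an assumption, and the sample report text is invented.

```go
package main

import (
	"fmt"
	"strings"
)

// headPreview is a hypothetical stand-in for truncateAtSentence: it keeps the
// longest prefix of at most limit bytes that ends on '.', '!' or '?'.
// Assumption: with no terminator in range it falls back to a hard cut.
// Byte slicing is fine for the ASCII sample below; real code would need to
// respect UTF-8 boundaries.
func headPreview(text string, limit int) string {
	if len(text) <= limit {
		return strings.TrimSpace(text)
	}
	head := text[:limit]
	if i := strings.LastIndexAny(head, ".!?"); i >= 0 {
		head = head[:i+1]
	}
	return strings.TrimSpace(head)
}

func main() {
	report := "I will analyse the outage by reading the logs. " +
		"Then I compare deploys. " +
		"The root cause is a stale cache key."
	// Prints: I will analyse the outage by reading the logs.
	fmt.Println(headPreview(report, 60))
}
```

The preview is grammatically complete, but it carries the introduction and drops the conclusion, which is the symptom described in the next section.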
Symptoms users see
- The inline message ends abruptly mid-thought after the first ~500 chars.
- The inline message often contains the report's introduction ("I will analyse X by …") but never its conclusion ("…and the answer is Y"). Users have to download the attachment to find the answer.
- The thresholds (4096 char trigger, 600 char paragraph cap, 500 char fallback, 8000 char tool-output trigger) are all magic numbers, not configurable. There is no way for an operator to say "my Slack channel prefers short summaries even for 2K-char responses" or "my Telegram channel can take 10K-char text natively, don't attach a file at all".
- When the runtime attaches a tool-output file part, the LLM text we use as the "summary" is itself whatever the LLM produced — which may already be 4 KB of prose. We then run that 4 KB through the same head-truncation. The summary is still just the head of the LLM text, which is itself just the head of the actual report.
Proposed solution
Replace the "head-truncation = summary" model with an explicit two-output model:
- Below threshold → full inline message, no attachment. Unchanged. ("Brevity is preserved naturally.")
- Above threshold → LLM-generated summary inline + the full markdown attached. The summary is a real summary (3–5 bullets / 1–2 paragraphs / configurable), produced by:
  - Preferred: a dedicated summarisation step in the agent loop. When len(finalResponse) > threshold, run one extra LLM call asking the model to produce a short summary of its own output, attach the original full output as the markdown file, and return both. The runtime hands the channel adapter a Summary field on a2a.Message (or a designated summary file part) so the adapter does not have to guess.
  - Fallback when the agent runs without a summariser (custom skill, no LLM in scope): keep the existing head-truncation but rename it from summary to preview everywhere so the contract is honest.
- Configurable thresholds in forge.yaml under a new channels.response block:

      channels:
        response:
          inline_max_chars: 4096        # default; full message inline up to this
          summary_target_chars: 800     # target length for LLM-generated summary
          attach_full_report: true      # if false, just trim and skip the attachment
          attachment_filename: research-report.md

  Per-channel overrides under channels.slack.response / channels.telegram.response.
- A2A type extension. Add Summary string to a2a.Message (or a typed PartKindSummary). The runtime fills it when it produced one; channels prefer it over SplitSummaryAndReport(text). This is the contract that lets us evolve the summariser without touching every channel.
- Make SplitSummaryAndReport the fallback only. It stays, but is no longer the primary code path. Renaming to headPreview would also be reasonable.
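The runtime and adapter halves of the proposal could fit together roughly as follows. This is a sketch under stated assumptions, not a design: Message is a trimmed-down stand-in for a2a.Message, and Summarizer, stubSummarizer, finalize, inlineText, and headCut are all hypothetical names. headCut stands in for the existing SplitSummaryAndReport fallback.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Message is a hypothetical stand-in for a2a.Message with the additive field.
type Message struct {
	Text    string // full agent output
	Summary string // empty when the runtime produced no summary
}

// Summarizer is a hypothetical interface for the one extra LLM call.
type Summarizer interface {
	Summarize(ctx context.Context, text string, targetChars int) (string, error)
}

var errNoModel = errors.New("no LLM in scope")

// stubSummarizer replaces a real model client for this example.
type stubSummarizer struct {
	out string
	err error
}

func (s stubSummarizer) Summarize(context.Context, string, int) (string, error) {
	return s.out, s.err
}

// finalize is the runtime side: summarise only when over the threshold.
// A summariser error leaves Summary empty instead of failing the response.
func finalize(ctx context.Context, s Summarizer, full string, inlineMax, target int) Message {
	msg := Message{Text: full}
	if len(full) <= inlineMax || s == nil {
		return msg
	}
	if sum, err := s.Summarize(ctx, full, target); err == nil {
		msg.Summary = strings.TrimSpace(sum)
	}
	return msg
}

// inlineText is the adapter side: prefer Summary, else an honest preview.
func inlineText(msg Message, inlineMax, previewChars int) (text string, isPreview bool) {
	switch {
	case len(msg.Text) <= inlineMax:
		return msg.Text, false
	case msg.Summary != "":
		return msg.Summary, false
	default:
		return headCut(msg.Text, previewChars), true
	}
}

// headCut keeps the first max runes (stand-in for the existing fallback).
func headCut(text string, max int) string {
	r := []rune(text)
	if len(r) <= max {
		return text
	}
	return string(r[:max])
}

func main() {
	ctx := context.Background()
	full := strings.Repeat("Finding. ", 600) // 5400 bytes, over the 4096 default
	withLLM := finalize(ctx, stubSummarizer{out: "- Root cause: stale cache key"}, full, 4096, 800)
	noLLM := finalize(ctx, stubSummarizer{err: errNoModel}, full, 4096, 800)
	for _, m := range []Message{withLLM, noLLM} {
		text, isPreview := inlineText(m, 4096, 500)
		fmt.Printf("preview=%v len=%d\n", isPreview, len(text))
	}
}
```

Returning isPreview separately lets an adapter label the fallback text as a preview, which keeps the contract honest when no summariser is available.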
Alternatives considered
- Just make the thresholds configurable, keep the head-truncation. Easier to ship, doesn't solve the actual quality problem — users still see a head cut, just at a different offset.
- Always send the full message as an attachment, no inline text. Loses Slack/Telegram's native preview affordance; people scrolling the channel see nothing of substance. Rejected.
- Use the first N lines instead of first N chars. Marginal improvement, still not a summary; a 12-line introduction is no more useful than a 500-char one.
- Ask the channel platform to do the previewing. Slack/Telegram render the first ~3 lines of a file attachment as a preview, but this is platform-specific, not styleable, and doesn't help us emit a thoughtful summary message.
Additional context
- Affected files: forge-plugins/channels/markdown/split.go, forge-plugins/channels/slack/slack.go, forge-plugins/channels/telegram/telegram.go, forge-core/runtime/loop.go (file-part attachment), forge-core/a2a/types.go (if we add Summary), docs/core-concepts/channels.md ("Large Response Handling" section currently says "first paragraph, up to 600 characters" and would need rewriting).
- Today's behaviour is documented at docs/core-concepts/channels.md:198-207 ("Large Response Handling"). That doc would be the natural place to describe the new contract.
- Backwards compatibility: the Summary field is additive on a2a.Message; absent it, channel adapters fall back to SplitSummaryAndReport exactly as today. Configurable thresholds default to today's hardcoded values. No breaking change.
- Related: the file-part path (extractLargestFile in slack.go:935, telegram.go:621) attaches the largest tool output as the channel file when present. That path also benefits from this change: once the runtime can emit a real summary, the file-part case can stop using the LLM's prose text as a "summary" too.
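The backwards-compatibility note and the proposed channels.response block suggest a config type whose zero-override case reproduces the defaults. The sketch below is one possible shape, with all type and function names hypothetical. It leaves YAML decoding out (the struct tags only indicate the intended keys) and uses pointer fields so an unset override differs from an explicit false or 0.

```go
package main

import "fmt"

// ResponseConfig mirrors the proposed channels.response block.
type ResponseConfig struct {
	InlineMaxChars     int    `yaml:"inline_max_chars"`
	SummaryTargetChars int    `yaml:"summary_target_chars"`
	AttachFullReport   bool   `yaml:"attach_full_report"`
	AttachmentFilename string `yaml:"attachment_filename"`
}

// ResponseOverride models channels.slack.response / channels.telegram.response.
// A nil field means "inherit from the global block".
type ResponseOverride struct {
	InlineMaxChars     *int
	SummaryTargetChars *int
	AttachFullReport   *bool
	AttachmentFilename *string
}

// DefaultResponseConfig returns the default values from the proposed block.
func DefaultResponseConfig() ResponseConfig {
	return ResponseConfig{
		InlineMaxChars:     4096,
		SummaryTargetChars: 800,
		AttachFullReport:   true,
		AttachmentFilename: "research-report.md",
	}
}

// With returns a copy of c with every non-nil override field applied.
func (c ResponseConfig) With(o ResponseOverride) ResponseConfig {
	if o.InlineMaxChars != nil {
		c.InlineMaxChars = *o.InlineMaxChars
	}
	if o.SummaryTargetChars != nil {
		c.SummaryTargetChars = *o.SummaryTargetChars
	}
	if o.AttachFullReport != nil {
		c.AttachFullReport = *o.AttachFullReport
	}
	if o.AttachmentFilename != nil {
		c.AttachmentFilename = *o.AttachmentFilename
	}
	return c
}

func main() {
	base := DefaultResponseConfig()
	// Example from the issue: a Telegram channel that takes 10K inline, no file.
	inlineMax, attach := 10000, false
	telegram := base.With(ResponseOverride{InlineMaxChars: &inlineMax, AttachFullReport: &attach})
	fmt.Printf("%+v\n%+v\n", base, telegram)
}
```

With an empty ResponseOverride the result equals the defaults, which is the "no breaking change" property the issue asks for.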