[Feature]: Channel responses should produce an LLM-generated summary alongside the markdown attachment, not a head-truncation #64

### Component

forge-plugins (channel plugins, markdown converter)

### Scope

Medium (multiple files / new command)

### Problem statement

When an agent response is large, channel adapters (Slack, Telegram) already split it into an inline message plus a `research-report.md` attachment. The mechanism works end-to-end, but the inline message is not a *summary* — it is just the **head of the report** sliced at a sentence boundary. For a long research response with one big opening paragraph, the user sees the first 500 characters of the report in the channel followed by `Full report attached as file above.`, and has to download the markdown to find out what the report actually concludes.

This breaks the goal of "response brevity": the brevity that exists today is accidental (we ran out of budget after 500 chars), not intentional (we summarised the work).

#### Current behaviour — Slack

`forge-plugins/channels/slack/slack.go:539-605`:

| Condition | What gets sent to the channel |
|---|---|
| Runtime attached a file part (tool output > 8000 chars, see `forge-core/runtime/loop.go:189`) | LLM text → truncated by `SplitSummaryAndReport` to first paragraph capped at 600 chars → uploaded file as `research-report.md` (or runtime-supplied name) |
| No file part, `len(text) > 4096` | `summary, report := SplitSummaryAndReport(text)` — **`report == text` verbatim**, `summary` is first paragraph if ≤ 600 chars else `truncateAtSentence(text, 500)` |
| No file part, `len(text) ≤ 4096` | Full text, chunked at 4000 chars |

Telegram (`forge-plugins/channels/telegram/telegram.go:322-405`) mirrors the same logic with 4096 / 600 / 500 char thresholds.
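
Read as code, the three rows of the Slack table amount to the decision below. This is a simplified sketch for illustration, not the code in `slack.go`: `planDelivery`, `delivery`, `chunk`, and the constant names are hypothetical, and the threshold values are the ones quoted in the table.

```go
package main

// Thresholds quoted in the table above; the constant names are hypothetical.
const (
	inlineLimit = 4096 // above this, text is split into summary + attachment
	chunkSize   = 4000 // shorter responses are posted in chunks of this size
)

// delivery describes what an adapter would post for one response.
type delivery struct {
	Inline     []string // messages posted in the channel
	Attachment string   // file body; empty means no upload
}

// planDelivery mirrors the three table rows. split stands in for
// SplitSummaryAndReport; fileBody is the runtime-attached file part, if any.
func planDelivery(text, fileBody string, split func(string) (string, string)) delivery {
	switch {
	case fileBody != "":
		// Row 1: the LLM text is head-truncated, the tool output is the file.
		summary, _ := split(text)
		return delivery{Inline: []string{summary}, Attachment: fileBody}
	case len(text) > inlineLimit:
		// Row 2: the same text is both the "summary" source and the report.
		summary, report := split(text)
		return delivery{Inline: []string{summary}, Attachment: report}
	default:
		// Row 3: everything goes inline.
		return delivery{Inline: chunk(text, chunkSize)}
	}
}

// chunk splits s into pieces of at most n bytes. It is byte-based, so a real
// implementation would need to avoid cutting a multi-byte rune in half.
func chunk(s string, n int) []string {
	var out []string
	for len(s) > n {
		out = append(out, s[:n])
		s = s[n:]
	}
	return append(out, s)
}
```

In both attachment rows the inline text comes from the same `split` call, which is why changing the thresholds alone cannot change what kind of text the user sees.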

#### The actual "summarizer" — `forge-plugins/channels/markdown/split.go`

```go
func SplitSummaryAndReport(text string) (summary, report string) {
    report = text
    if idx := strings.Index(text, "\n\n"); idx > 0 && idx <= 600 {
        summary = text[:idx]
    } else if idx > 600 {
        summary = truncateAtSentence(text, 500)
    } else {
        summary = truncateAtSentence(text, 500)
    }
    summary = strings.TrimSpace(summary)
    return summary, report
}
```

`truncateAtSentence` walks backwards from char 500 looking for `.`, `!`, or `?`. There is no semantic summarisation step. If the LLM happened to open with a short lede (`…short lede…\n\n…body…`), the lede becomes the "summary" by luck; with the code above, a leading `# Header\n\n` would yield only the header line. If it produced one long unbroken paragraph (very common for research output), the user gets a head-cut that stops mid-thought.
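
To make the failure mode concrete, here is a minimal sketch of a backwards sentence scan as described above. It is a reconstruction from that description, not the code in `split.go`: the name `truncateAtSentenceSketch` and the hard cut when no terminator is found are assumptions.

```go
package main

import "strings"

// truncateAtSentenceSketch returns the longest prefix of text that fits in
// limit bytes and ends on '.', '!' or '?'. If no terminator is found it
// falls back to a hard cut at limit (an assumption; the real helper may differ).
func truncateAtSentenceSketch(text string, limit int) string {
	if len(text) <= limit {
		return text
	}
	head := text[:limit]
	// Scan backwards for the last sentence terminator inside the budget.
	if i := strings.LastIndexAny(head, ".!?"); i >= 0 {
		return head[:i+1]
	}
	return head
}
```

For a report that opens with an introduction and ends with its conclusion several thousand characters later, the scan can only ever return part of the introduction, because nothing beyond `limit` is inspected.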

#### Symptoms users see

1. The inline message ends abruptly mid-thought after the first ~500 chars.
2. The inline message often contains the report's introduction ("I will analyse X by …") but never its conclusion ("…and the answer is Y"). Users have to download the attachment to find the answer.
3. The thresholds (4096 char trigger, 600 char paragraph cap, 500 char fallback, 8000 char tool-output trigger) are all magic numbers, not configurable. There is no way for an operator to say "my Slack channel prefers short summaries even for 2K-char responses" or "my Telegram channel can take 10K-char text natively, don't attach a file at all".
4. When the runtime attaches a tool-output file part, the LLM text we use as the "summary" is itself whatever the LLM produced, which may already be 4 KB of prose. We then run that 4 KB through the same head-truncation. The summary is still just the head of the LLM text, which is itself just the head of the actual report.

### Proposed solution

Replace the "head-truncation = summary" model with an explicit two-output model:

1. **Below threshold → full inline message, no attachment.** Unchanged. ("Brevity is preserved naturally.")

2. **Above threshold → LLM-generated summary inline + the full markdown attached.** The summary is a real summary (3–5 bullets / 1–2 paragraphs / configurable), produced by:
   - **Preferred**: a dedicated summarisation step in the agent loop — when `len(finalResponse) > threshold`, run one extra LLM call asking the model to produce a short summary of its own output, attach the original full output as the markdown file, return both. The runtime hands the channel adapter a `Summary` field on `a2a.Message` (or a designated summary file part) so the adapter does not have to guess.
   - **Fallback** when the agent runs without a summariser (custom skill, no LLM in scope): keep the existing head-truncation but rename it from `summary` to `preview` everywhere so the contract is honest.

3. **Configurable thresholds in `forge.yaml`** under a new `channels.response` block:

   ```yaml
   channels:
     response:
       inline_max_chars: 4096        # default — full message inline up to this
       summary_target_chars: 800     # target length for LLM-generated summary
       attach_full_report: true      # if false, just trim and skip the attachment
       attachment_filename: research-report.md
   ```

   Per-channel overrides under `channels.slack.response` / `channels.telegram.response`.

4. **A2A type extension.** Add `Summary string` to `a2a.Message` (or a typed `PartKindSummary`). The runtime fills it when it produced one; channels prefer it over `SplitSummaryAndReport(text)`. This is the contract that lets us evolve the summariser without touching every channel.

5. **Make `SplitSummaryAndReport` the fallback only.** It stays, but is no longer the primary code path. Renaming to `headPreview` would also be reasonable.
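
Taken together, points 3 to 5 suggest an adapter-side flow along the following lines. This is a sketch under stated assumptions rather than a proposed implementation: `ResponseConfig`, `Message`, `Rendered`, `render`, and `headPreview` are hypothetical stand-ins (the real types live in `forge-core/a2a` and the channel plugins), the field names simply mirror the YAML keys above, and per-channel overrides are left out.

```go
package main

import "strings"

// ResponseConfig mirrors the proposed channels.response block (hypothetical type).
type ResponseConfig struct {
	InlineMaxChars     int
	SummaryTargetChars int
	AttachFullReport   bool
	AttachmentFilename string
}

// defaultResponseConfig carries the example values from the YAML above.
func defaultResponseConfig() ResponseConfig {
	return ResponseConfig{
		InlineMaxChars:     4096,
		SummaryTargetChars: 800,
		AttachFullReport:   true,
		AttachmentFilename: "research-report.md",
	}
}

// Message is a stand-in for a2a.Message with the proposed additive field.
type Message struct {
	Text    string
	Summary string // empty when the runtime produced no summary
}

// Rendered is what the adapter posts: inline text plus an optional file.
type Rendered struct {
	Inline     string
	Filename   string // empty means no attachment
	Attachment string
	IsPreview  bool // true when Inline is a head-cut, not a real summary
}

// previewChars is today's hardcoded fallback length.
const previewChars = 500

// render prefers the runtime-supplied summary and only falls back to a
// head preview when none is present. Lengths are measured in bytes here.
func render(msg Message, cfg ResponseConfig) Rendered {
	if len(msg.Text) <= cfg.InlineMaxChars {
		return Rendered{Inline: msg.Text}
	}
	out := Rendered{Inline: msg.Summary}
	if out.Inline == "" {
		out.Inline = headPreview(msg.Text, previewChars)
		out.IsPreview = true
	}
	if cfg.AttachFullReport {
		out.Filename = cfg.AttachmentFilename
		out.Attachment = msg.Text
	}
	return out
}

// headPreview is a placeholder for the existing head-truncation fallback.
func headPreview(text string, limit int) string {
	if len(text) > limit {
		text = text[:limit]
	}
	return strings.TrimSpace(text)
}
```

Carrying an explicit `IsPreview` flag is one way to keep the contract honest: an adapter could label a fallback as a preview instead of presenting it as a summary.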

### Alternatives considered

- **Just make the thresholds configurable, keep the head-truncation.** Easier to ship, doesn't solve the actual quality problem — users still see a head cut, just at a different offset.
- **Always send the full message as an attachment, no inline text.** Loses Slack/Telegram's native preview affordance; people scrolling the channel see nothing of substance. Rejected.
- **Use the first N lines instead of first N chars.** Marginal improvement, still not a summary; a 12-line introduction is no more useful than a 500-char one.
- **Ask the channel platform to do the previewing.** Slack/Telegram render the first ~3 lines of a file attachment as a preview, but this is platform-specific, not styleable, and doesn't help us emit a thoughtful summary message.

### Additional context

- Affected files: `forge-plugins/channels/markdown/split.go`, `forge-plugins/channels/slack/slack.go`, `forge-plugins/channels/telegram/telegram.go`, `forge-core/runtime/loop.go` (file-part attachment), `forge-core/a2a/types.go` (if we add `Summary`), `docs/core-concepts/channels.md` (the "Large Response Handling" section currently says "first paragraph, up to 600 characters" and would need rewriting).
- Today's behaviour is documented at `docs/core-concepts/channels.md:198-207` ("Large Response Handling"). That doc would be the natural place to describe the new contract.
- Backwards compatibility: the `Summary` field is additive on `a2a.Message`; absent it, channel adapters fall back to `SplitSummaryAndReport` exactly as today. Configurable thresholds default to today's hardcoded values. No breaking change.
- Related: the file-part path (`extractLargestFile` in `slack.go:935`, `telegram.go:621`) attaches the largest tool output as the channel file when present. That path also benefits from this change: once the runtime can emit a real summary, the file-part case can stop using the LLM's prose text as a "summary" too.
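
On the runtime side, the preferred option reduces to one extra model call guarded by the threshold, with an empty `Summary` as the backwards-compatible default. The sketch below is illustrative only: `SummarizeFunc`, `Response`, `finalize`, and `ErrSummaryUnavailable` are hypothetical names, and the prompt wording is a placeholder, not something taken from `loop.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSummaryUnavailable is a hypothetical sentinel a summariser can return.
var ErrSummaryUnavailable = errors.New("summary unavailable")

// SummarizeFunc abstracts the extra LLM call (hypothetical; a real call
// would most likely also take a context.Context).
type SummarizeFunc func(prompt string) (string, error)

// Response is a stand-in for the message handed to channel adapters.
type Response struct {
	Text    string
	Summary string // left empty when no summary was produced
}

// finalize adds a summary only when the response exceeds threshold and a
// summariser is available. Any failure leaves Summary empty, so adapters
// fall back to the existing head preview exactly as today.
func finalize(summarize SummarizeFunc, text string, threshold, target int) Response {
	resp := Response{Text: text}
	if summarize == nil || len(text) <= threshold {
		return resp
	}
	prompt := fmt.Sprintf(
		"Summarise the following report in at most %d characters, leading with its conclusion:\n\n%s",
		target, text,
	)
	summary, err := summarize(prompt)
	if err != nil {
		return resp
	}
	resp.Summary = summary
	return resp
}
```

Because the summary is optional at every step, an agent without a summariser (the custom-skill case) passes `nil` and behaves as it does today.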
