Vision
Each rendered chat part (thinking, tool call, error, code block, etc.) is a container that holds its full data and renders one of two views:
- Summary view (default): compact one-liner or truncated body
- Detail view (on interaction): full content rendered from the same underlying data
The data is already streamed end-to-end from the backend — we just choose which view to render. Toggling expansion is a pure view-state flip on the part, no fetching or loading state.
Why
Today the chat view is a flat string blob in a viewport. Each `MessagePart` is rendered once into a `string` and concatenated. There's no way to:
- Expand a thinking part to see the full reasoning (currently shows only the first 60 chars)
- Reveal full tool call args + duration + tool_call_id under a tool call header
- Show stack traces under errors
- Expand truncated code/diff blocks
The user has all the data — it's stored on `MessagePart.ToolCall`, `Content`, etc. We just don't surface it.
Conceptual model
```
ThinkingPart (container)
├─ summary: "thinking: "
└─ detail: full reasoning markdown
ToolCallPart (container)
├─ summary: ✓ [read_file] (current header)
└─ detail: + full args JSON, + duration, + tool_call_id, + raw result
ErrorPart (container)
├─ summary: error message
└─ detail: + stack trace, + recoverable, + context
CodePart / DiffPart (containers)
├─ summary: first N lines or hunk count
└─ detail: full file / full diff
```
The component itself owns both views. The renderer reads an `Expanded bool` and dispatches.
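A minimal sketch of that dispatch, assuming a hypothetical `ThinkingPart`; the type and its fields are illustrative stand-ins for the real `MessagePart` data, not existing code:

```go
// Sketch of view dispatch on an Expanded flag; ThinkingPart and its
// fields are hypothetical stand-ins for the real MessagePart data.
type ThinkingPart struct {
	Reasoning string // full reasoning text, already streamed
	Expanded  bool   // pure view state, toggled on interaction
}

// View renders the summary or the detail from the same underlying
// data; expansion never fetches anything.
func (p ThinkingPart) View() string {
	if p.Expanded {
		return "thinking:\n" + p.Reasoning
	}
	summary := p.Reasoning
	if len(summary) > 60 {
		summary = summary[:60] + "…" // byte-based cut; real code would be rune-aware
	}
	return "thinking: " + summary
}
```

The point is that both branches read from the same struct — collapsing and expanding are free.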
The interaction problem
Bubble Tea has zero hit-testing. The view function returns a single `string`; the framework writes it to the terminal. There's no widget tree, no spatial registry. Mouse clicks arrive as raw `(X, Y)` coordinates with no awareness of what was rendered there.
To make parts clickable, we need to build the bridge ourselves: track which content lines belong to which part during render, then translate click coordinates → content line → part identity → toggle action.
Approaches considered
A. Region map (pragmatic, ~100 lines)
During render, accumulate `[]Region{startLine, endLine, msgIdx, partIdx}` as a side effect. On click: convert screen Y → viewport Y → content Y, walk regions, find part, toggle `Expanded`. Render and hit-test live in the same pass. Doesn't restructure the model.
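The region map and its lookup fit in about twenty lines; `Region`'s field names and `hitTest` are assumed names for this sketch, not existing code:

```go
// Sketch of approach A. The real render pass would append one
// Region per part as a side effect of rendering it.
type Region struct {
	StartLine, EndLine int // inclusive content-line span of one part
	MsgIdx, PartIdx    int // identity of the part that rendered it
}

// hitTest does the click translation: screen Y plus the viewport's
// scroll offset gives the content line; a linear walk over the
// regions finds the owning part.
func hitTest(regions []Region, screenY, yOffset int) (msgIdx, partIdx int, ok bool) {
	contentY := screenY + yOffset
	for _, r := range regions {
		if contentY >= r.StartLine && contentY <= r.EndLine {
			return r.MsgIdx, r.PartIdx, true
		}
	}
	return 0, 0, false // click landed on a gap or non-interactive line
}
```

A linear walk is fine here — the region list is proportional to the number of parts on screen, not the number of lines.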
B. Component-per-part (the right shape, ~500+ lines)
Each part type becomes a Bubble Tea sub-model with its own `Init/Update/View/LineCount`. Chat model holds `[]Message{Parts: []Interactive}`. Renderer walks parts accumulating Y offsets. Updates are localized — clicking a thinking part calls its own `Update`, doesn't touch siblings. Still needs a region map for dispatch routing.
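One plausible shape for that `Interactive` contract, with a toy part showing how updates stay localized; the method set is an assumption, echoing Bubble Tea's conventions plus a line count for the Y-offset accumulation:

```go
import "strings"

// Interactive is one possible contract for approach B; the method
// set is an assumption, not existing code.
type Interactive interface {
	View() string   // summary or detail, based on local state
	LineCount() int // rendered height, for Y-offset accumulation
	Toggle()        // flip expansion; never touches siblings
}

// errorPart is a toy implementation: toggling it re-renders only
// this part, so updates stay localized.
type errorPart struct {
	msg, stack string
	expanded   bool
}

func (e *errorPart) Toggle() { e.expanded = !e.expanded }

func (e *errorPart) View() string {
	if e.expanded {
		return e.msg + "\n" + e.stack
	}
	return e.msg
}

func (e *errorPart) LineCount() int {
	return strings.Count(e.View(), "\n") + 1
}
```

`LineCount` is what lets the renderer walk `[]Interactive` accumulating offsets without re-measuring rendered strings.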
C. Renderable type with embedded metadata (~150 lines)
Render functions return `Renderable{Content string, Regions []Region}` instead of `string`. Chat model concatenates them, regions stitch together with running line offsets. Cleaner separation than A — spatial tracking factored into a small library.
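The stitching step might look like this; `Renderable`, `stitch`, and the `Region` fields (redeclared here so the sketch stands alone) are all illustrative names:

```go
import "strings"

// Region is redeclared for this sketch; in real code it would be
// shared with the hit-test path.
type Region struct {
	StartLine, EndLine int
	MsgIdx, PartIdx    int
}

// Renderable pairs rendered content with regions whose line numbers
// are relative to the content's own first line.
type Renderable struct {
	Content string
	Regions []Region
}

// stitch concatenates parts and shifts each part's regions by the
// running line offset, yielding one content string plus one
// absolute region map.
func stitch(parts []Renderable) Renderable {
	var out Renderable
	offset := 0
	for _, p := range parts {
		out.Content += p.Content
		for _, r := range p.Regions {
			r.StartLine += offset
			r.EndLine += offset
			out.Regions = append(out.Regions, r)
		}
		offset += strings.Count(p.Content, "\n")
	}
	return out
}
```

Each render function stays oblivious to where it lands in the final view, which is what makes this cleaner than approach A's single-pass side effect.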
D. Cards pattern
Each addressable thing is a Card with `{ID, Render(expanded), Click()}`. Sharper unit of interactivity than full sub-models, same dispatch problem.
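A Card could be as thin as a struct of closures; the field shapes are illustrative:

```go
// Card per approach D: an ID plus render and click behavior, a
// thinner contract than a full Bubble Tea sub-model. Field names
// are illustrative.
type Card struct {
	ID     string                     // stable identity for dispatch
	Render func(expanded bool) string // both views from one closure
	Click  func()                     // toggle or other action
}
```

The dispatch problem is unchanged: something still has to map a click's coordinates to a Card's `ID`.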
Recommendation
Long-term shape is B (hierarchical sub-components) because the conceptual model maps cleanly: each part owns its data and its render variants, expansion state is local. But this is a real refactor.
Short-term, we could ship A to get expandable parts working with minimal disruption, then graduate to B when we want richer per-part behavior (animations, scrolling within parts, nested interactivity).
Building blocks already available
- `tea.MouseClickMsg` / `tea.MouseReleaseMsg` / `tea.MouseMotionMsg` — Bubble Tea v2 dispatches all of these
- `lipgloss.Height(rendered)` — line count of any rendered chunk
- `viewport.YOffset()` — current scroll position
- Mouse handling already wired — the chat model routes `tea.MouseWheelMsg` for scrolling, so the program is already mouse-enabled
Out of scope
- Hover highlighting (phase 2)
- Visual focus cursor for keyboard navigation (phase 2)
- Animated expand/collapse (terminal animation is limited)
- Nested clickables within an expanded part (deferred)
Acceptance criteria (for the eventual implementation)
References
- Discussion in EP-008 implementation session
- Spec: `docs/specs/2026-03-22-message-parts.md`
- Bubble Tea is on the "no built-in hit-testing" side, like Ratatui — Textual / Flutter / DOM all maintain widget trees with bounding boxes