Streaming pipeline has multiple output failure modes: duplicate content, cut-off replies, blank bubbles, and stale thread context

## Updated scope

This issue was filed as a duplicate-content-at-continuation-boundary bug. Based on observed behavior across multiple sessions, the streaming pipeline has a wider set of failure modes that all trace to the same structural source. Expanding scope here.

---

## Failure modes (all observed in production)

### 1. Duplicate / repeated content at chunk boundary

A long reply (≥2200 chars after tool work) splits into a stream bubble + continuation post. Content near the split boundary appears to repeat or restart because:

- `shouldAutoStartStreaming` fires at `deltaCount >= 2` on provisional pre-tool prose
- `onToolCall` arrives after streaming has already started — cannot retract
- `awaitingPostToolAssistantMessage` is NOT set, so `onAssistantMessageStart` after tool completion does NOT reset `streamedReplyState`
- Final reply appends to the same accumulator as the provisional prefix
- Combined length (provisional + final) misaligns the split point vs. what the user expects to see
- Slack renders the bubble ending mid-content and the continuation starting at what looks like a duplicated section
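
The offset shift described above can be reproduced with a small model. This is only a sketch: `StreamState`, `onDelta`, `resetForPostToolMessage`, and the 20-character budget are hypothetical names and values chosen for illustration, not the real implementation. It shows how a provisional prefix left in the accumulator moves the split point, and how a reset before the post-tool message keeps the split aligned with the final reply. It does not model retraction of text that is already visible.

```typescript
// Hypothetical model of the accumulator behaviour; names are illustrative only.
interface StreamState {
  started: boolean;
  deltaCount: number;
  accumulated: string;
}

function newState(): StreamState {
  return { started: false, deltaCount: 0, accumulated: "" };
}

// Mirrors the gate described in the issue: streaming starts at deltaCount >= 2.
function onDelta(state: StreamState, delta: string): void {
  state.deltaCount += 1;
  state.accumulated += delta;
  if (!state.started && state.deltaCount >= 2) {
    state.started = true;
  }
}

// Assumed fix: clear provisional text before the post-tool assistant message.
function resetForPostToolMessage(state: StreamState): void {
  state.accumulated = "";
  state.deltaCount = 0;
}

const SPLIT_AT = 20; // illustrative budget, far smaller than the real one

function splitAtBudget(text: string): [string, string] {
  return [text.slice(0, SPLIT_AT), text.slice(SPLIT_AT)];
}

const provisional = ["Let me ", "check. "];
const finalReply = "Here is the full answer to your question.";

// Current behaviour: the final reply is appended after the provisional prefix.
const buggy = newState();
provisional.forEach((d) => onDelta(buggy, d));
onDelta(buggy, finalReply);

// With the reset: the split is computed on the final reply only.
const fixed = newState();
provisional.forEach((d) => onDelta(fixed, d));
resetForPostToolMessage(fixed);
onDelta(fixed, finalReply);

const [buggyBubble] = splitAtBudget(buggy.accumulated);
const [fixedBubble] = splitAtBudget(fixed.accumulated);
console.log(JSON.stringify(buggyBubble), JSON.stringify(fixedBubble));
```

In the first case the bubble spends 14 of its 20 characters on provisional prose, so the final reply is cut much earlier than its own length would suggest.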

Related: #200

### 2. Cut-off / truncated replies

Replies that overflow the continuation budget are split via `splitSlackReplyText`. The split is computed on accumulated raw delta text, but `normalizeForSlack` (runs `ensureBlockSpacing`) expands the rendered text after the fact by inserting blank lines between content blocks. The result:

- The rendered content in the stream bubble is longer than the raw character count suggests
- The continuation starts at a raw-offset boundary that doesn't correspond to a clean semantic break
- To the user: the first message appears cut off mid-sentence; the second message re-starts in the middle of content they've already seen
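
One way to avoid the raw-offset mismatch is to normalize first and then split on a block boundary. The sketch below assumes a simplified stand-in for the real normalization (`normalizeSketch` only turns single newlines into blank lines) and a hypothetical `splitAtBlockBoundary` helper. Neither is the actual `normalizeForSlack` or `splitSlackReplyText`.

```typescript
// Simplified stand-in for block-spacing normalization (assumption):
// every single newline becomes a blank line between blocks.
function normalizeSketch(raw: string): string {
  return raw.replace(/([^\n])\n(?!\n)/g, "$1\n\n");
}

// Split the *normalized* text at the last blank line that fits the budget.
// Falls back to a hard cut only when no block boundary is available.
function splitAtBlockBoundary(raw: string, budget: number): [string, string] {
  const text = normalizeSketch(raw);
  if (text.length <= budget) return [text, ""];
  const cut = text.lastIndexOf("\n\n", budget);
  if (cut <= 0) return [text.slice(0, budget), text.slice(budget)];
  return [text.slice(0, cut), text.slice(cut + 2)];
}

const rawReply = "First block.\nSecond block.\nThird block.";

// Naive split on raw text: cuts in the middle of the third block.
const naiveHead = rawReply.slice(0, 30);

// Boundary-aware split on normalized text: cuts between blocks.
const [head, tail] = splitAtBlockBoundary(rawReply, 30);
console.log(JSON.stringify(naiveHead), JSON.stringify(head), JSON.stringify(tail));
```

The raw text is 39 characters while the normalized text is 41, which is the same kind of expansion that makes the raw split offset unreliable.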

Additionally: the continuation post has no guaranteed delivery. If `RetryableTurnError` fires (JUNIOR-1D, still active post-0.23.0) or the post fails, the overflow content is silently lost — no truncation marker, no user-visible signal.
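
A minimal sketch of a delivery guarantee for the continuation post, assuming a hypothetical `PostFn` that wraps whatever call actually posts to Slack. The marker text, the retry count, and the return values are all assumptions made for illustration.

```typescript
// Hypothetical wrapper type around the real posting call.
type PostFn = (text: string) => Promise<void>;

// Illustrative marker text, not taken from the code base.
const TRUNCATION_MARKER = "_(reply truncated: the rest could not be delivered)_";

type ContinuationOutcome = "delivered" | "marker" | "lost";

// Try the continuation a bounded number of times; if every attempt fails,
// post a short marker so the user can see that content is missing.
async function postContinuationWithFallback(
  post: PostFn,
  continuation: string,
  maxAttempts = 2,
): Promise<ContinuationOutcome> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await post(continuation);
      return "delivered";
    } catch (_err) {
      // Swallow and retry; a real implementation would log the error here.
    }
  }
  try {
    await post(TRUNCATION_MARKER);
    return "marker";
  } catch (_err) {
    return "lost"; // Even the marker failed; the caller should record this.
  }
}
```

Returning the outcome, rather than throwing, gives the caller a value it can record in the transcript so the next turn knows what the user actually saw.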

Related: #187

### 3. Blank message / stream stall

The `pendingStreamText` redundancy gate in `reply-executor.ts` holds all deltas while the accumulated text still matches a known ack prefix ("ok", "sure", "let me", "on it", partial emoji). At the same time, `createNormalizingStream` in the streaming path holds content until a newline is seen. The two gates compound:

- The stream message post is created by Slack before any content lands
- The user sees a blank bubble for 15–60+ seconds
- For tool-heavy turns, this window extends further because the LLM emits no text deltas during tool execution
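
The compounding effect can be illustrated with two toy gates chained together. The matching rule in `heldByAckGate` (hold while the text could still turn out to be only an acknowledgement) and the helper names are assumptions. The real gates may match differently, and emoji handling is omitted.

```typescript
// Ack phrases listed in the issue; partial-emoji handling is left out.
const ACK_PREFIXES = ["ok", "sure", "let me", "on it"];

// Gate 1 (assumed shape): hold while the text so far is a prefix of an ack phrase.
function heldByAckGate(text: string): boolean {
  const t = text.trim().toLowerCase();
  if (t.length === 0) return true;
  return ACK_PREFIXES.some((phrase) => phrase.startsWith(t));
}

// Gate 2 (assumed shape): release only complete lines, hold the trailing partial line.
function releasedByNewlineGate(text: string): string {
  const lastNewline = text.lastIndexOf("\n");
  return lastNewline === -1 ? "" : text.slice(0, lastNewline + 1);
}

// Text only becomes visible once it has passed both gates.
function visibleText(accumulated: string): string {
  if (heldByAckGate(accumulated)) return "";
  return releasedByNewlineGate(accumulated);
}

console.log(JSON.stringify(visibleText("Let me")));
console.log(JSON.stringify(visibleText("Let me check the deployment logs for you")));
console.log(JSON.stringify(visibleText("Checking logs.\nFound it")));
```

The second call is the interesting one: the text has cleared the ack gate, yet nothing is shown because no newline has arrived, so the bubble stays blank.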

Related: #97

### 4. Missing or stale thread context

In multi-turn threads, the runtime injects thread transcript context at the start of each turn. Several confirmed failure modes:

- **The injected transcript lags the live thread.** If a turn fails (`RetryableTurnError`, delivery error), the transcript still reflects the last successful state. The bot "doesn't know" about its own failed or partial reply in the previous turn and cannot correct for it.
- **Provisional pre-tool text in a failed stream doesn't get cleaned up.** If a stream starts (bubble created), tool execution fails, and the turn errors out, the partial bubble stays in Slack. On the next turn, the transcript doesn't include that orphaned bubble, creating a permanent disconnect between what the user sees and what the bot knows about the thread.
- **Split replies are seen as one message by the runtime but two by the user.** The bot tracks that it sent "the answer" but doesn't know the user saw it split mid-sentence. This makes the bot confidently describe a complete reply when the user experienced a truncated one.
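
One possible shape for closing this gap is a transcript entry that records what the user actually saw, not only what the runtime intended to send. The types and the `contextNote` helper below are hypothetical. They sketch the idea and are not the existing transcript format.

```typescript
// Hypothetical delivery outcome attached to each bot turn in the transcript.
type Delivery =
  | { status: "delivered"; parts: number }
  | { status: "partial"; visibleText: string }
  | { status: "failed" };

interface TranscriptEntry {
  turnId: string;
  intendedText: string;
  delivery: Delivery;
}

// Produce a short note to inject alongside the transcript so the next turn
// knows how the previous reply appeared to the user. Empty string means
// the reply arrived as a single complete message.
function contextNote(entry: TranscriptEntry): string {
  const d = entry.delivery;
  switch (d.status) {
    case "delivered":
      return d.parts > 1
        ? `[previous reply was shown as ${d.parts} separate messages]`
        : "";
    case "partial":
      return `[only part of the previous reply reached the user: "${d.visibleText}"]`;
    case "failed":
      return "[the previous reply was not delivered]";
  }
}

const splitEntry: TranscriptEntry = {
  turnId: "t1",
  intendedText: "full answer",
  delivery: { status: "delivered", parts: 2 },
};
console.log(contextNote(splitEntry));
```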

### 5. Output/streaming disconnect (general)

The structural problem across all four failure modes: the pipeline commits a visible Slack artifact (stream bubble) before it knows whether the content is final, complete, or correctly budgeted. There is no retraction path and no delivery confirmation signal fed back into the runtime's thread model. Concrete gaps:

- No span/log when `streamOverflowed` triggers — no way to correlate "overflow happened" with "user saw duplicate"
- No TTL or orphan cleanup for stream bubbles that were opened but whose turns failed
- No signal from Slack delivery back to the transcript/context layer — the bot's view of the thread is permanently optimistic
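
The first two gaps could be narrowed with a small registry of open stream bubbles plus a structured overflow event. The sketch below is illustrative only: `StreamBubbleRegistry`, `OverflowEvent`, and the field names are assumptions, and timestamps are passed in explicitly so the logic stays deterministic.

```typescript
interface OpenBubble {
  bubbleId: string;
  turnId: string;
  openedAtMs: number;
}

// Hypothetical structured event to emit when `streamOverflowed` triggers.
interface OverflowEvent {
  kind: "stream_overflow";
  bubbleId: string;
  turnId: string;
  accumulatedChars: number;
}

function overflowEvent(bubble: OpenBubble, accumulatedChars: number): OverflowEvent {
  return {
    kind: "stream_overflow",
    bubbleId: bubble.bubbleId,
    turnId: bubble.turnId,
    accumulatedChars,
  };
}

// Tracks bubbles that were opened but not yet finalized.
class StreamBubbleRegistry {
  private readonly open = new Map<string, OpenBubble>();

  markOpened(bubbleId: string, turnId: string, nowMs: number): OpenBubble {
    const bubble = { bubbleId, turnId, openedAtMs: nowMs };
    this.open.set(bubbleId, bubble);
    return bubble;
  }

  markFinalized(bubbleId: string): void {
    this.open.delete(bubbleId);
  }

  // Bubbles that have stayed open longer than the TTL are treated as orphans.
  findOrphans(nowMs: number, ttlMs: number): OpenBubble[] {
    return Array.from(this.open.values()).filter(
      (b) => nowMs - b.openedAtMs > ttlMs,
    );
  }
}

const registry = new StreamBubbleRegistry();
registry.markOpened("b1", "t1", 0);
const second = registry.markOpened("b2", "t2", 1_000);
registry.markFinalized("b1"); // turn t1 completed normally
const orphans = registry.findOrphans(70_000, 60_000); // t2 never finalized
console.log(orphans.map((b) => b.bubbleId), overflowEvent(second, 2300));
```

A periodic sweep over `findOrphans` would give cleanup code a list of bubble IDs to edit or delete, and the overflow event gives a key for correlating user reports with specific turns.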

---

## Root cause summary

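The failure modes above share one structural weakness: **the pipeline opens a visible stream bubble before the content is known to be final, and there is no retraction path once the bubble is open.** The most direct trigger is that `shouldAutoStartStreaming` fires too early (at `deltaCount >= 2`), which drives failure modes 1 and 4. Failure modes 2 and 3 come from split-offset and buffering behavior layered on top of the same early commit.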

Fix hierarchy:
1. **Fix #200** — tighten the stream-start gate so provisional prose cannot start the stream before tool intent is known. This is the root condition for failure modes 1 and 4.
2. **Fix #97** — remove `createNormalizingStream` from the streaming delivery path (normalization at finalize time only). This eliminates the compound buffering that causes the blank-bubble stall.
3. **Add overflow observability** — emit a span when `streamOverflowed` triggers, log stream bubble IDs so orphaned bubbles can be detected and cleaned up.
4. **Add a truncation/delivery guarantee** — if the continuation post fails, post a short fallback marker rather than silently dropping content.
5. **Thread context sync** — investigate whether orphaned stream bubbles and failed turns can be tracked so the transcript reflects actual user-visible state, not just what was successfully posted.
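
As an illustration of item 1, a stricter gate might refuse to start the stream until tool intent is resolved. This is a sketch under assumptions: `GateInput` and its fields are hypothetical, and `deltaCount` is assumed to count only deltas of the current assistant message.

```typescript
// Hypothetical inputs to a stricter stream-start gate.
interface GateInput {
  deltaCount: number;         // deltas seen in the current assistant message
  toolCallSeen: boolean;      // a tool call has been observed this turn
  toolPhaseComplete: boolean; // all observed tool calls have finished
  messageComplete: boolean;   // the assistant message ended without more tools
}

// Start streaming only when the text can no longer be provisional:
// either tools have run and finished, or the message ended with no tool call.
function shouldStartStreamingStrict(input: GateInput): boolean {
  if (input.deltaCount === 0) return false;
  if (input.toolCallSeen) return input.toolPhaseComplete;
  return input.messageComplete;
}

// Pre-tool prose: several deltas, but tool intent is still unknown.
const preTool = shouldStartStreamingStrict({
  deltaCount: 5,
  toolCallSeen: false,
  toolPhaseComplete: false,
  messageComplete: false,
});

// Post-tool reply: tools finished, text is now the final answer.
const postTool = shouldStartStreamingStrict({
  deltaCount: 2,
  toolCallSeen: true,
  toolPhaseComplete: true,
  messageComplete: false,
});
console.log(preTool, postTool);
```

The trade-off is that a reply with no tool calls would not stream until it completes under this rule, so a real gate would likely need an additional signal to keep latency acceptable for direct answers.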

---

## Related

- #200 — provisional pre-tool text leak (root of failure modes 1, 4)
- #97 — blank message / stream stall (failure mode 3)
- #187 — long reply truncation (failure mode 2)
- Sentry: `JUNIOR-1D` (`RetryableTurnError`, active post-0.23.0)
- Sentry: `JUNIOR-1G` (message_not_in_streaming_state, pre-0.23.0, monitoring for recurrence)

Action taken on behalf of David Cramer.
