Streaming pipeline has multiple output failure modes: duplicate content, cut-off replies, blank bubbles, and stale thread context #202

@sentry-junior

Updated scope

This issue was filed as a duplicate-content-at-continuation-boundary bug. Based on observed behavior across multiple sessions, the streaming pipeline has a wider set of failure modes that all trace to the same structural source. Expanding scope here.


Failure modes (all observed in production)

1. Duplicate / repeated content at chunk boundary

A long reply (≥2200 chars after tool work) splits into a stream bubble + continuation post. Content near the split boundary appears to repeat or restart because:

  • shouldAutoStartStreaming fires at deltaCount >= 2 on provisional pre-tool prose
  • onToolCall arrives after streaming has already started — cannot retract
  • awaitingPostToolAssistantMessage is NOT set, so onAssistantMessageStart after tool completion does NOT reset streamedReplyState
  • Final reply appends to the same accumulator as the provisional prefix
  • Combined length (provisional + final) misaligns the split point vs. what the user expects to see
  • Slack renders the bubble ending mid-content and the continuation starting at what looks like a duplicated section
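The sequence above can be modeled with a small state object. This is a hypothetical reconstruction, not the actual reply-executor code: `StreamedReplyState` as shaped here, `simulateTurn`, and the sample strings are assumptions. Only the identifier `awaitingPostToolAssistantMessage` and the event names come from this issue. It models the accumulator only; text already visible in the bubble is not retracted by a reset.

```typescript
// Hypothetical model of the accumulator described above; not the real reply-executor code.
interface StreamedReplyState {
  accumulated: string; // text that feeds the stream bubble and the split computation
  awaitingPostToolAssistantMessage: boolean;
}

function simulateTurn(setFlagOnToolCall: boolean): string {
  const state: StreamedReplyState = {
    accumulated: "",
    awaitingPostToolAssistantMessage: false,
  };

  // Provisional pre-tool prose: two deltas are enough to start the stream.
  state.accumulated += "Let me ";
  state.accumulated += "check. ";

  // onToolCall arrives after streaming has already started.
  if (setFlagOnToolCall) {
    state.awaitingPostToolAssistantMessage = true; // assumed change, not current behavior
  }

  // onAssistantMessageStart after tool completion: reset only if the flag is set.
  if (state.awaitingPostToolAssistantMessage) {
    state.accumulated = "";
    state.awaitingPostToolAssistantMessage = false;
  }

  // Final reply deltas land in the same accumulator.
  state.accumulated += "The answer is 42.";
  return state.accumulated;
}
```

With the flag unset (the behavior reported here), `simulateTurn(false)` returns `"Let me check. The answer is 42."`, so the split point is computed over provisional plus final text. With the flag set, `simulateTurn(true)` returns only `"The answer is 42."`.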

Related: #200

2. Cut-off / truncated replies

Replies that overflow the continuation budget are split via splitSlackReplyText. The split is computed on accumulated raw delta text, but normalizeForSlack (runs ensureBlockSpacing) expands the rendered text after the fact by inserting blank lines between content blocks. The result:

  • The rendered content in the stream bubble is longer than the raw character count suggests
  • The continuation starts at a raw-offset boundary that doesn't correspond to a clean semantic break
  • To the user: the first message appears cut off mid-sentence; the second message re-starts in the middle of content they've already seen
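One possible way to avoid the raw-offset mismatch is to split the already-normalized text and prefer a semantic break inside the budget. The sketch below is an assumption-laden illustration: `splitAtCleanBoundary` and `findBreak` are hypothetical helpers, not the existing `splitSlackReplyText`, and the caller is assumed to have run `normalizeForSlack` before calling it.

```typescript
// Hypothetical helper: pick the best break inside the budgeted head.
function findBreak(head: string): number {
  const paragraph = head.lastIndexOf("\n\n");
  if (paragraph > 0) return paragraph; // prefer a paragraph break
  const sentence = head.lastIndexOf(". ");
  if (sentence > 0) return sentence + 1; // keep the period in the first part
  const space = head.lastIndexOf(" ");
  if (space > 0) return space; // fall back to a word boundary
  return head.length; // last resort: hard cut at the budget
}

// Splits text that has ALREADY been normalized, so offsets match what the user sees.
function splitAtCleanBoundary(normalized: string, budget: number): [string, string] {
  if (normalized.length <= budget) return [normalized, ""];
  const cut = findBreak(normalized.slice(0, budget));
  const first = normalized.slice(0, cut).replace(/\s+$/, "");
  const rest = normalized.slice(cut).replace(/^\s+/, "");
  return [first, rest];
}
```

For example, `splitAtCleanBoundary("First paragraph.\n\nSecond paragraph is longer.", 30)` yields `["First paragraph.", "Second paragraph is longer."]` rather than cutting inside the second paragraph.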

Additionally: the continuation post has no guaranteed delivery. If RetryableTurnError fires (JUNIOR-1D, still active post-0.23.0) or the post fails, the overflow content is silently lost — no truncation marker, no user-visible signal.
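A minimal sketch of a delivery fallback for the continuation post, under stated assumptions: `PostFn`, `deliverContinuation`, the result values, and the marker wording are all hypothetical, and how this interacts with `RetryableTurnError` retries is not modeled.

```typescript
// Hypothetical posting function; the real Slack call is not modeled here.
type PostFn = (text: string) => Promise<void>;

type ContinuationResult = "delivered" | "marker_posted" | "lost";

async function deliverContinuation(
  post: PostFn,
  overflow: string,
  marker: string = "_(reply truncated: the rest of this message could not be delivered)_",
): Promise<ContinuationResult> {
  try {
    await post(overflow);
    return "delivered";
  } catch (_err) {
    // Continuation failed; fall through and try the short marker instead.
  }
  try {
    await post(marker);
    return "marker_posted";
  } catch (_err) {
    // Nothing reached the user; the caller should at least log this outcome.
    return "lost";
  }
}
```

Returning an explicit outcome instead of swallowing the error gives the caller something to log or feed back into the thread model.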

Related: #187

3. Blank message / stream stall

The pendingStreamText redundancy gate in reply-executor.ts holds all deltas until the accumulated text no longer matches any known ack prefix ("ok", "sure", "let me", "on it", partial emoji). At the same time, createNormalizingStream in the streaming path holds content until a newline is seen. The two gates compound:

  • The stream message post is created by Slack before any content lands
  • The user sees a blank bubble for 15–60+ seconds
  • For tool-heavy turns, this window extends further because the LLM emits no text deltas during tool execution
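To illustrate the newline hold in isolation, here is a simplified model. It is not the real `createNormalizingStream`: `newlineBuffered` and `passThrough` are hypothetical functions that only mimic the behavior described above, and the ack-prefix gate is left out.

```typescript
// Assumed model of a newline-buffered stream: only complete lines become visible.
function newlineBuffered(deltas: string[]): string[] {
  const emitted: string[] = [];
  let pending = "";
  for (const delta of deltas) {
    pending += delta;
    const lastNewline = pending.lastIndexOf("\n");
    if (lastNewline >= 0) {
      emitted.push(pending.slice(0, lastNewline + 1));
      pending = pending.slice(lastNewline + 1);
    }
  }
  // Whatever is left in `pending` stays invisible until a newline or finalize.
  return emitted;
}

// Pass-through alternative: every delta is visible immediately,
// and normalization runs once over the full text at finalize time.
function passThrough(
  deltas: string[],
  normalize: (text: string) => string,
): { emitted: string[]; finalText: string } {
  return { emitted: deltas.slice(), finalText: normalize(deltas.join("")) };
}
```

With the deltas `["Checking ", "the logs ", "now"]`, the buffered model emits nothing because no newline ever arrives, while the pass-through model emits all three deltas as they come in.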

Related: #97

4. Missing or stale thread context

In multi-turn threads, the runtime injects thread transcript context at the start of each turn. Several confirmed failure modes:

  • The injected transcript lags the live thread. If a turn fails (RetryableTurnError, delivery error), the transcript still reflects the last successful state — the bot "doesn't know" about its own failed or partial reply in the previous turn and cannot correct for it.
  • Provisional pre-tool text in a failed stream doesn't get cleaned up. If a stream starts (bubble created), tool execution fails, and the turn errors out, the partial bubble stays in Slack. On the next turn, the transcript doesn't include that orphaned bubble, creating a permanent disconnect between what the user sees and what the bot knows about the thread.
  • Split replies are seen as one message by the runtime but two by the user. The bot tracks that it sent "the answer" but doesn't know the user saw it split mid-sentence. This makes the bot confidently describe a complete reply when the user experienced a truncated one.

5. Output/streaming disconnect (general)

The structural problem across all four failure modes: the pipeline commits a visible Slack artifact (stream bubble) before it knows whether the content is final, complete, or correctly budgeted. There is no retraction path and no delivery confirmation signal fed back into the runtime's thread model. Concrete gaps:

  • No span/log when streamOverflowed triggers — no way to correlate "overflow happened" with "user saw duplicate"
  • No TTL or orphan cleanup for stream bubbles that were opened but whose turns failed
  • No signal from Slack delivery back to the transcript/context layer — the bot's view of the thread is permanently optimistic
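The first two gaps could be covered by a small registry of open stream bubbles. The following is a sketch under assumptions: `StreamBubbleRegistry`, its method names, the event shape, and the TTL parameter are all hypothetical, and wiring the returned event into an actual span or logger is left to the caller.

```typescript
interface OpenBubble {
  bubbleId: string;
  threadId: string;
  openedAtMs: number;
  overflowed: boolean;
}

interface OverflowEvent {
  event: "stream_overflowed";
  bubbleId: string;
  threadId: string;
}

// Hypothetical in-memory registry of stream bubbles that have been opened but not finalized.
class StreamBubbleRegistry {
  private readonly openBubbles = new Map<string, OpenBubble>();

  opened(bubbleId: string, threadId: string, nowMs: number): void {
    this.openBubbles.set(bubbleId, { bubbleId, threadId, openedAtMs: nowMs, overflowed: false });
  }

  // Returns a structured event the caller can attach to a span or log line.
  markOverflowed(bubbleId: string): OverflowEvent | undefined {
    const bubble = this.openBubbles.get(bubbleId);
    if (!bubble) return undefined;
    bubble.overflowed = true;
    return { event: "stream_overflowed", bubbleId, threadId: bubble.threadId };
  }

  finalized(bubbleId: string): void {
    this.openBubbles.delete(bubbleId);
  }

  // Bubbles still open after `ttlMs` are treated as orphans of failed turns.
  orphans(nowMs: number, ttlMs: number): OpenBubble[] {
    const result: OpenBubble[] = [];
    this.openBubbles.forEach((bubble) => {
      if (nowMs - bubble.openedAtMs > ttlMs) result.push(bubble);
    });
    return result;
  }
}
```

A periodic sweep calling `orphans(Date.now(), ttlMs)` would then list the bubble IDs that need cleanup or a transcript correction.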

Root cause summary

All four failure modes share one upstream source: the pipeline opens a stream bubble too early (shouldAutoStartStreaming fires at deltaCount >= 2), and there is no retraction path once the bubble is open.

Fix hierarchy:

  1. Fix #200 (Streaming leaks provisional pre-tool text into Slack replies, causing partial sentence then pause): tighten the stream-start gate so provisional prose cannot start the stream before tool intent is known. This is the root condition for failure modes 1 and 4.
  2. Fix #97 (Blank message shown for ~1min before streaming text appears): remove createNormalizingStream from the streaming delivery path (normalization at finalize time only). This eliminates the compound buffering that causes the blank-bubble stall.
  3. Add overflow observability — emit a span when streamOverflowed triggers, log stream bubble IDs so orphaned bubbles can be detected and cleaned up.
  4. Add a truncation/delivery guarantee — if the continuation post fails, post a short fallback marker rather than silently dropping content.
  5. Thread context sync — investigate whether orphaned stream bubbles and failed turns can be tracked so the transcript reflects actual user-visible state, not just what was successfully posted.
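As an illustration of item 1, a tightened stream-start gate might look like the sketch below. The input shape is an assumption: `toolIntentResolved` and `toolCallPending` are hypothetical signals the runtime would need to supply, and the function name is chosen here to avoid implying it matches the existing `shouldAutoStartStreaming` signature.

```typescript
// Hypothetical gate input; only deltaCount is known to exist in the current gate.
interface StreamStartInput {
  deltaCount: number;
  toolIntentResolved: boolean; // assumed: true once the model has committed to tool use or no tool use
  toolCallPending: boolean; // assumed: true while a tool call is announced or running
}

function shouldAutoStartStreamingTightened(input: StreamStartInput): boolean {
  // Never start on prose that may still turn out to be a pre-tool preamble.
  if (!input.toolIntentResolved) return false;
  // Hold the stream while tool work is outstanding.
  if (input.toolCallPending) return false;
  // Keep the existing delta threshold once the text is known to be final-reply prose.
  return input.deltaCount >= 2;
}
```

The trade-off is a later first token for tool-free replies, which would need to be weighed against the blank-bubble stall in failure mode 3.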

Action taken on behalf of David Cramer.

Labels: bug