Skip to content

fix(client): network drops reconnecting behaviour#3775

Merged
icecrasher321 merged 8 commits intostagingfrom
fix/client-disconnect-mothership
Mar 26, 2026
Merged

fix(client): network drops reconnecting behaviour#3775
icecrasher321 merged 8 commits intostagingfrom
fix/client-disconnect-mothership

Conversation

@icecrasher321
Copy link
Collaborator

Summary

Client side disconnect resilience

Type of Change

  • Bug fix

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@cursor
Copy link

cursor bot commented Mar 25, 2026

PR Summary

Medium Risk
Moderate risk: refactors client SSE stream processing/reconnect flow and message queue behavior, which can affect chat delivery/order and stop/retry semantics during network drops.

Overview
Improves chat resilience to network drops by introducing a more robust SSE stream reattach/replay flow that tracks eventIds, detects terminal stream statuses, and can reopen the live tail after non-terminal disconnects.

Updates useChat to handle active-stream conflicts by restoring prior optimistic state, rehydrating chat history, and queueing a pending recovery message that is auto-sent once the stream is recovered; UI (MothershipChat, Home, workflow panel) now treats isReconnecting as stream-active for autoscroll/streaming indicators.

Adds extra server-side SSE logging around client disconnect/cancel in chat-streaming.ts, and exports fetchChatHistory for recovery flows.

Written by Cursor Bugbot for commit 03d6cab. Configure here.

@vercel
Copy link

vercel bot commented Mar 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Mar 26, 2026 0:25am

Request Review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 25, 2026

Greptile Summary

This PR adds client-side disconnect resilience for SSE chat streams. When a live stream is interrupted by a network drop, the client now re-attaches by replaying buffered events from a batch endpoint and then opening a fresh SSE tail — looping until the server signals a terminal status. It also handles the race condition where a new message is sent while the server still has an active stream for the same chat (conflict error), rolling back the optimistic UI update and queuing the user's message as a pendingRecoveryMessage to be retried automatically once the in-flight stream finishes.

Key changes:

  • use-chat.ts: Extracts attachToExistingStream, fetchStreamBatch, and applyChatHistorySnapshot callbacks; upgrades processSSEStream to track eventId and support replaying into existing state (preserveExistingState); adds pendingRecoveryMessage state for conflict-retry queuing.
  • mothership-chat.tsx / home.tsx / panel.tsx: Propagates new isReconnecting prop so the UI correctly disables input and shows streaming indicators during reconnection.
  • tasks.ts: Exports fetchChatHistory for direct use during recovery.
  • chat-streaming.ts: Extracts markClientDisconnected helper and adds structured logging on stream close/cancel.

One concrete bug found: stopGeneration clears pendingRecoveryMessageRef.current (the ref) but never calls setPendingRecoveryMessage(null) (the React state). This leaves the recovery message permanently stuck in visibleMessageQueue — the "Send Now" and remove buttons both operate on the already-nulled ref and silently do nothing, making the item impossible to dismiss without a page reload.

Confidence Score: 3/5

  • Not yet safe to merge — the stopGeneration / pendingRecoveryMessage state mismatch will permanently wedge the queue UI in specific but reachable user flows.
  • The architectural approach is sound and the happy-path reconnect logic is well-structured. Five of the six changed files are clean. The single concrete P1 bug — stopGeneration clearing only the ref and not the React state for pendingRecoveryMessage — produces a permanently stuck UI element that cannot be dismissed, which is a visible regression in the conflict-recovery path. A one-line fix (setPendingRecoveryMessage(null)) resolves it.
  • apps/sim/app/workspace/[workspaceId]/home/hooks/use-chat.ts — specifically the stopGeneration function and the two navigation reset effects.

Important Files Changed

Filename Overview
apps/sim/app/workspace/[workspaceId]/home/hooks/use-chat.ts Major refactor extracting stream-recovery logic into attachToExistingStream, fetchStreamBatch, and applyChatHistorySnapshot callbacks; introduces pendingRecoveryMessage for conflict-aware retry — contains a P1 bug where stopGeneration clears the ref but not the React state, permanently wedging the recovery message in the queue UI.
apps/sim/app/workspace/[workspaceId]/home/components/mothership-chat/mothership-chat.tsx Adds optional isReconnecting prop and derives `isStreamActive = isSending
apps/sim/app/workspace/[workspaceId]/home/home.tsx Threads isReconnecting from useChat through to MothershipChat — minimal, correct wiring change.
apps/sim/app/workspace/[workspaceId]/w/[workflowId]/components/panel/panel.tsx Passes copilotIsReconnecting to MothershipChat in the panel copilot view — correct prop threading.
apps/sim/hooks/queries/tasks.ts Exports fetchChatHistory so it can be called directly from use-chat.ts during recovery — safe, minimal change.
apps/sim/lib/copilot/chat-streaming.ts Extracts markClientDisconnected helper to deduplicate the flag-setting and add structured logging; adds a completion-after-disconnect log and a cancel() log — observability improvements, no logic change.

Sequence Diagram

sequenceDiagram
    participant U as User
    participant SC as sendMessage
    participant ATES as attachToExistingStream
    participant FSB as fetchStreamBatch
    participant PSS as processSSEStream
    participant SG as stopGeneration
    participant F as finalize

    U->>SC: send message
    SC->>SC: POST /api/copilot/chat
    alt Primary stream ends cleanly
        SC->>PSS: processSSEStream(reader, assistantId)
        PSS-->>SC: { sawStreamError, sawDoneEvent, lastEventId }
        SC->>FSB: fetchStreamBatch(streamId, lastEventId, signal)
        FSB-->>SC: { status, events }
        alt Terminal status
            SC->>F: finalize()
        else Non-terminal (network drop)
            SC->>ATES: attachToExistingStream(...)
            loop Until terminal status or abort
                ATES->>PSS: processSSEStream(replayStream)
                PSS-->>ATES: termination result
                ATES->>ATES: open live SSE tail
                PSS-->>ATES: live termination result
                ATES->>FSB: fetchStreamBatch(next batch)
                FSB-->>ATES: { status, events }
            end
            ATES-->>SC: { aborted, error }
            SC->>F: finalize()
        end
    else Conflict error (stream already active)
        SC->>SC: rollback optimistic state
        SC->>SC: setPendingRecoveryMessage(queuedMessage)
        SC->>SC: preparePendingStreamRecovery(chatId)
        SC->>ATES: attachToExistingStream(existing stream)
        ATES-->>SC: { aborted, error }
        SC->>F: finalize()
        F->>F: detect pendingRecoveryMessage
        F->>SC: sendMessage(recoveryMessage) [via queueMicrotask]
    end

    U-->>SG: stop generation (optional)
    SG->>SG: abort controller, clear refs
    Note over SG: ⚠️ pendingRecoveryMessage state NOT cleared
Loading

Comments Outside Diff (1)

  1. apps/sim/app/workspace/[workspaceId]/home/hooks/use-chat.ts, line 1852-1860 (link)

    P1 stopGeneration leaves pendingRecoveryMessage state permanently stuck

    stopGeneration correctly sets pendingRecoveryMessageRef.current = null (the ref), but never calls setPendingRecoveryMessage(null) (the React state). This creates a permanently broken UI state:

    1. The visibleMessageQueue memo derives from pendingRecoveryMessage (state), so the recovery message stays visible in the queue.
    2. When the user tries Send Now on that stuck item, sendNow reads from pendingRecoveryMessageRef.current — which is now null — so it can't find the message and returns early silently.
    3. When the user tries to Remove the item, removeFromQueue also checks the ref first (pendingRecoveryMessageRef.current?.id === idfalse), then searches messageQueueRef, which doesn't have it either — so setPendingRecoveryMessage(null) is never reached.

    The recovery message is permanently wedged in the queue until the component unmounts.

    The same omission exists in the navigatedToDifferentChat cleanup path and the isHomePage effect reset, where setPendingRecoveryMessage(null) is also never called.

Reviews (2): Last reviewed commit: "address queued message conflicts during ..." | Re-trigger Greptile

@icecrasher321
Copy link
Collaborator Author

bugbot run

@icecrasher321
Copy link
Collaborator Author

bugbot run

@icecrasher321
Copy link
Collaborator Author

@greptile

@icecrasher321
Copy link
Collaborator Author

bugbot run

@icecrasher321
Copy link
Collaborator Author

bugbot run

@icecrasher321
Copy link
Collaborator Author

bugbot run

@icecrasher321
Copy link
Collaborator Author

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@icecrasher321 icecrasher321 merged commit 5a5c33d into staging Mar 26, 2026
12 checks passed
waleedlatif1 pushed a commit that referenced this pull request Mar 26, 2026
* fix(client): network drops reconnecting behaviour

* address bugbot comments

* address comments

* address queued message conflicts during retries

* fix more review comments

* fix branch

* fix non-clear bug

* fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant