fix(gateway/chat): emit terminal chat final via fallback when registry entry drifts between peek and shift #74678
…y entry drifts between peek and shift

When chatRunState.registry.peek() succeeds but the subsequent shift() returns undefined (a race between concurrent terminal lifecycle events), the handler returned early and silently suppressed emitChatFinal. This left Control UI and TUI stuck in the streaming/running state even though the assistant text had already arrived.

Fix: invert the if/else so that, when finished is undefined, the handler falls through to the same sessionKey/eventRunId fallback path used when chatLink is absent, rather than returning early.

Added a regression test for the peek-succeeds, shift-misses race.

Fixes openclaw#74614

Closing this PR because the author has more than 10 active PRs in this repo. Please reduce the active PR queue and reopen or resubmit once it is back under the limit. You can close your own PRs to get back under the limit.

Closing this PR because the author has more than 20 active PRs in this repo. Please reduce the active PR queue and reopen or resubmit once it is back under the limit. You can close your own PRs to get back under the limit.

@clawsweeper re-review

🦞🧹 Reason: re-review requires an open issue or PR.

@hclsys I think this is a good fix, but it can't stay open with more than 20 PRs. Would you rather I cherry-pick your commits into a new PR, or would you prefer to close some other PRs to stay under the 20 limit?
EDIT: found an override label.

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

Summary

Reproducibility: Do we have a high-confidence way to reproduce the issue? Not yet as a real runtime race; source inspection confirms current main returns before …

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Review details

Best possible solution: Keep the fallback behavior, but make the regression test force …

Do we have a high-confidence way to reproduce the issue? Not yet as a real runtime race; source inspection confirms current main returns before …

Is this the best way to solve the issue? Partly: the production fallback is narrow and maintainable, but the test should be corrected to hit the claimed branch before treating the solution as proven.

Overall correctness: patch is incorrect

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6f18decb7a2c.

Problem
Fixes #74614.
When chatRunState.registry.peek(evt.runId) succeeds but the subsequent registry.shift(evt.runId) returns undefined (a race between concurrent terminal lifecycle events for the same run), the gateway lifecycle handler returned early before calling emitChatFinal. This left Control UI and TUI stuck in the streaming/running state even though the assistant text had already arrived.

Fix
Invert the condition so that when shift() misses, the code falls through to the same sessionKey/eventRunId fallback path used when chatLink is absent, so the terminal event is never silently suppressed.

Changes
src/gateway/server-chat.ts: invert the shift-miss branch to fall through to the fallback emitChatFinal
src/gateway/server-chat.agent-events.test.ts: add regression test for the peek-succeeds-shift-misses race
CHANGELOG.md: Unreleased entry

Tests

53/53 server-chat.agent-events tests pass locally, including the new regression test.
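
For readers without the diff at hand, the control-flow change described under Fix can be sketched roughly as below. This is a minimal illustration, not the code in src/gateway/server-chat.ts: the type shapes, the two handler function names, the clientRunId field, and the emitChatFinal signature are all assumptions made for the example. Only the names peek, shift, chatLink, finished, sessionKey, eventRunId, and emitChatFinal come from the PR text. The stub registry at the end shows one way a test might force the peek-succeeds, shift-misses branch deterministically.

```typescript
// Hypothetical shapes: the real types are not shown in this PR conversation.
type ChatLink = { sessionKey: string; clientRunId: string };
type LifecycleEvent = { runId: string; sessionKey: string };
type Registry = {
  peek(runId: string): ChatLink | undefined;
  shift(runId: string): ChatLink | undefined;
};
type EmitChatFinal = (sessionKey: string, runId: string) => void;
type Handler = (registry: Registry, evt: LifecycleEvent, emitChatFinal: EmitChatFinal) => void;

// Before (as described): a shift() miss after a successful peek() returns
// early, so emitChatFinal is never called for the terminal event.
const handleTerminalBefore: Handler = (registry, evt, emitChatFinal) => {
  const chatLink = registry.peek(evt.runId);
  if (chatLink) {
    const finished = registry.shift(evt.runId);
    if (!finished) {
      return; // terminal event silently suppressed
    }
    emitChatFinal(finished.sessionKey, finished.clientRunId);
    return;
  }
  const eventRunId = evt.runId;
  emitChatFinal(evt.sessionKey, eventRunId);
};

// After (as described): the condition is inverted, so a shift() miss falls
// through to the same fallback used when chatLink is absent.
const handleTerminalAfter: Handler = (registry, evt, emitChatFinal) => {
  const chatLink = registry.peek(evt.runId);
  if (chatLink) {
    const finished = registry.shift(evt.runId);
    if (finished) {
      emitChatFinal(finished.sessionKey, finished.clientRunId);
      return;
    }
    // shift() missed: fall through to the fallback below.
  }
  const eventRunId = evt.runId;
  emitChatFinal(evt.sessionKey, eventRunId);
};

// Stub registry (hypothetical) whose peek() always succeeds and whose
// shift() always misses, which pins the handler to the drift branch.
function driftingRegistry(link: ChatLink): Registry {
  return { peek: () => link, shift: () => undefined };
}

// Runs a handler against the drifting registry and records each final emitted.
function collectFinals(handler: Handler): string[] {
  const finals: string[] = [];
  handler(
    driftingRegistry({ sessionKey: "s1", clientRunId: "client-1" }),
    { runId: "run-1", sessionKey: "s1" },
    (sessionKey, runId) => {
      finals.push(`${sessionKey}:${runId}`);
    },
  );
  return finals;
}
```

Under these assumptions, collectFinals(handleTerminalBefore) records nothing, while collectFinals(handleTerminalAfter) records a single final keyed by the event's own session key and run id. A stub that makes shift() miss unconditionally avoids depending on real timing to reach the branch.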