fix: stop infinite reconnect storm on multi-session (close code 4001) #57
Merged
Fix the infinite reconnect loop caused by multiple Claude sessions
When a second Claude Code session connects to the daemon, the old session
is kicked with close code 4001. Previously, the kicked client treated this
as a generic disconnect and auto-reconnected, displacing the new session
and creating an infinite loop.
Now the client distinguishes close code 4001 ("replaced") from other close
codes. A replaced session enters a dormant disabled state instead of
reconnecting. The existing disabledRecoveryPoller handles recovery if the
replacing session later disconnects.
Changes:
- control-protocol.ts: export CLOSE_CODE_REPLACED = 4001
- daemon-client.ts: emit "replaced" (not "disconnect") on code 4001
- bridge.ts: handle "replaced" event via enterDisabledState()
- daemon.ts: use CLOSE_CODE_REPLACED constant instead of magic number
- daemon-client.test.ts: 3 new tests for 4001 vs non-4001 behavior
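The close-code handling described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only `CLOSE_CODE_REPLACED = 4001`, the event names `"replaced"` and `"disconnect"`, and `enterDisabledState` come from the description; the function names and the `scheduleReconnect` action are hypothetical.

```typescript
// Constant value and event names are taken from the PR description.
const CLOSE_CODE_REPLACED = 4001;

type ClientEvent = "replaced" | "disconnect";
// "scheduleReconnect" is a hypothetical name for the generic reconnect path.
type BridgeAction = "enterDisabledState" | "scheduleReconnect";

/** Map a WebSocket close code to the event the client emits (hypothetical helper). */
function eventForCloseCode(code: number): ClientEvent {
  return code === CLOSE_CODE_REPLACED ? "replaced" : "disconnect";
}

/** Decide how the bridge reacts to a client event (hypothetical helper). */
function actionForEvent(event: ClientEvent): BridgeAction {
  // A replaced session must not reconnect, or it would displace the new
  // session and restart the loop.
  return event === "replaced" ? "enterDisabledState" : "scheduleReconnect";
}
```

Keeping the classification in one place means the daemon and the client agree on a single constant rather than each hard-coding 4001.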
Closes #55 (Phase 1)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
added 5 commits
April 2, 2026 15:25
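Improve approval request lifecycle reliability, following the protocol handling patterns of codex-plugin-cc:
- Requeue in-flight server requests after a TUI disconnect and replay them on reconnect
- Buffer approval responses while the app-server is disconnected and flush them after reconnect
- Remove the TTL timer in favor of the requeue strategy, so valid requests are never dropped
- Split the bridge disabled state to distinguish the killed and replaced reasons
- A replaced session stays dormant permanently and no longer starts the recovery poller, avoiding ping-pong
- Add a verify:plugin-sync script; CI now runs bun run check as a single unified check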
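As an illustration of the requeue-and-replay strategy from this commit, here is a minimal sketch. It assumes a simple in-memory tracker; the class, method, and field names are all hypothetical and are not taken from the adapter's actual code.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not from the PR.
interface ServerRequest {
  id: number;
  method: string;
}

class InFlightServerRequests {
  // Requests already forwarded to the TUI and still awaiting a response.
  private inFlight: ServerRequest[] = [];
  // Requests waiting for a TUI connection to come back.
  private queued: ServerRequest[] = [];

  /** Record a request that was just forwarded to the TUI. */
  track(request: ServerRequest): void {
    this.inFlight.push(request);
  }

  /** Drop a request once the TUI has answered it. */
  resolve(id: number): void {
    this.inFlight = this.inFlight.filter((r) => r.id !== id);
  }

  /** On TUI disconnect, requeue unanswered requests instead of expiring them. */
  onTuiDisconnect(): void {
    this.queued = this.queued.concat(this.inFlight);
    this.inFlight = [];
  }

  /** On TUI reconnect, replay queued requests in their original order. */
  onTuiReconnect(send: (request: ServerRequest) => void): void {
    const toReplay = this.queued;
    this.queued = [];
    toReplay.forEach((request) => {
      this.track(request);
      send(request);
    });
  }
}
```

Because nothing expires on a timer, a request is only ever removed when it is answered, which is the property the TTL removal is meant to guarantee.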
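Fix a correctness bug found in Codex review: after an app-server disconnect, approval request/response state was wrongly retained, so a response carrying an old server ID could be flushed to the new connection.
- Extract a handleAppServerClose() method that calls clearResponseTrackingState() to clear all approval state (serverRequestToProxy, pendingServerRequests, pendingServerResponses)
- Add a regression test covering the app-server close cleanup path
- The requeue/replay logic for TUI reconnects is unaffected
Per the constraint in the issue-37 design document: approval IDs are session-scoped, so old IDs are invalid after an app-server reconnect and the approval state should be discarded.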
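The cleanup path might look roughly like the sketch below. The method names `handleAppServerClose()` and `clearResponseTrackingState()` and the three field names come from the commit message; the field types, the enclosing class, and `flushPendingResponses()` are assumptions made for illustration.

```typescript
// Field and method names follow the commit message; their shapes are assumed.
class ApprovalTracking {
  serverRequestToProxy: { [serverId: string]: string } = {};
  pendingServerRequests: string[] = [];
  pendingServerResponses: string[] = [];

  /** Discard all approval state; server IDs are only valid within one session. */
  clearResponseTrackingState(): void {
    this.serverRequestToProxy = {};
    this.pendingServerRequests = [];
    this.pendingServerResponses = [];
  }

  /** Called when the app-server connection closes. */
  handleAppServerClose(): void {
    this.clearResponseTrackingState();
  }

  /** Hypothetical flush helper: sends buffered responses, returns how many. */
  flushPendingResponses(send: (response: string) => void): number {
    const toSend = this.pendingServerResponses;
    this.pendingServerResponses = [];
    toSend.forEach((response) => send(response));
    return toSend.length;
  }
}
```

With this ordering, a flush that runs after a reconnect finds an empty buffer, so a response tagged with an old server ID cannot reach the new connection.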
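Move all unit test files from the src/ root into src/unit-test/ to reduce clutter in the source directory. Add a src/unit-test/e2e/ directory that records manual E2E test steps per PR.
- Migrate 13 .test.ts files and update their relative import paths
- Add the pr-57-close-code-4001.md E2E test document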
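Change the multi-session design direction: a new Claude connection is rejected and the old session is unaffected.
- daemon.ts: attachClaude() checks readyState !== CLOSED (which includes CLOSING) and rejects the newcomer instead of kicking the old connection
- Rename replaced → rejected across the whole chain: types, event names, error messages, tests
- Update the E2E document to the new "reject new connection" semantics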
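A minimal sketch of the "reject the newcomer" check follows. It relies on the standard WebSocket readyState values (CONNECTING = 0, OPEN = 1, CLOSING = 2, CLOSED = 3). The `SocketLike` interface, the `Daemon` class shape, the `CLOSE_CODE_REJECTED` name, and the reuse of 4001 for it are assumptions; the commit message only names `attachClaude()` and the `readyState !== CLOSED` condition.

```typescript
// Standard WebSocket readyState value for a fully closed socket.
const WS_CLOSED = 3;
// Assumption: the rejected newcomer is closed with the same 4001 code.
const CLOSE_CODE_REJECTED = 4001;

// Hypothetical minimal socket shape, enough for this sketch.
interface SocketLike {
  readyState: number;
  close(code: number, reason: string): void;
}

class Daemon {
  private current: SocketLike | null = null;

  /** Returns true if the socket was attached, false if it was rejected. */
  attachClaude(socket: SocketLike): boolean {
    // Anything other than CLOSED (including CLOSING) counts as "still attached",
    // so the existing session always keeps its slot.
    if (this.current !== null && this.current.readyState !== WS_CLOSED) {
      socket.close(CLOSE_CODE_REJECTED, "another session is already attached");
      return false;
    }
    this.current = socket;
    return true;
  }
}
```

Treating CLOSING as occupied is the conservative choice: a newcomer that arrives mid-shutdown is turned away and can retry, rather than racing the old socket's close handler.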
abg dev now runs bun run build:plugin automatically first, so the plugin artifacts stay in sync with the source and new code is never tested against an old build.
d5f82ef to
efde6df
fix: support secondary WebSocket connections during Codex TUI resume
Codex TUI uses two parallel WebSocket connections during thread resume: a picker connection (secondary) and the main session connection (primary). The proxy's "latest connection wins" model was dropping thread/resume messages from the primary connection, causing the resume flow to freeze.
Changes:
- Add secondary connection support with dedicated app-server WS per secondary connection (raw passthrough, no id remapping)
- Add app-server generation counter to prevent stale close handlers
- Add fresh-session reconnect: buffer TUI messages during app-server reconnect on initialize, replay after reconnection
- Fix orphaned app-server WS if picker disconnects before onopen
- Fix zombie secondary if app-server WS closes first
- Clean up verbose diagnostic logging (app-server → proxy, stale message content preview, [track] per-message logging)
- Update "replaced" → "rejected" semantics for multi-session handling
- Sync compiled plugin files (bridge-server.js, daemon.js)
- Add codex-plugin-cc/ to .gitignore
- Update verify-plugin-sync script
Tests: 171 pass, 0 fail
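The generation counter mentioned in the changes above is a common guard against stale callbacks. The sketch below shows the general technique under assumed names (`AppServerLink`, `connect`, `closesHandled`); it is not the proxy's actual implementation.

```typescript
// Hypothetical sketch of a generation counter guarding close handlers.
class AppServerLink {
  private generation = 0;
  closesHandled = 0;

  /**
   * Start a new connection and return the close handler bound to it.
   * The handler returns true if it acted, false if it was stale.
   */
  connect(): () => boolean {
    // Each new connection bumps the counter and remembers its own value.
    const myGeneration = ++this.generation;
    return () => {
      if (myGeneration !== this.generation) {
        // A newer connection exists; this close belongs to an old socket.
        return false;
      }
      this.closesHandled++;
      return true;
    };
  }
}
```

Without the guard, a late close event from a superseded socket could tear down state that now belongs to the replacement connection.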
This was referenced Apr 14, 2026
Summary
Fixes the infinite reconnect loop caused by multiple Claude Code sessions and improves approval request lifecycle reliability.
Part 1: Close Code 4001 (stopgap fix)
When a second Claude Code session connects to the daemon, the old session is kicked with close code 4001. Previously the kicked client auto-reconnected, creating an infinite reconnect storm.
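Root Cause:
The onclose handler in daemon-client.ts does not distinguish close code 4001 (replaced) from other close codes (daemon crash); it emits "disconnect" for all of them, which triggers a reconnect.
Fix:
- control-protocol.ts: export the CLOSE_CODE_REPLACED = 4001 constant
- daemon-client.ts: onclose checks event.code === 4001 and emits "replaced" instead of "disconnect"
- bridge.ts: listen for the "replaced" event and enter the dormant state permanently (no recovery poller, to avoid ping-pong)
- daemon.ts: use the CLOSE_CODE_REPLACED constant instead of a magic number
Part 2: Approval Lifecycle Reliability (aligning with the codex-plugin-cc protocol)
Following the connection lifecycle management pattern of codex-plugin-cc, improve the reliability of AgentBridge's approval request passthrough:
- clearTransientResponseTrackingState() is separated from clearResponseTrackingState(); an app-server disconnect does not clear buffered responses
Part 3: Bridge Disabled State improvements
- bridge-disabled-state.ts: extract a BridgeDisabledReason type ("killed" | "replaced") and return a different error message depending on the reason
Part 4: CI / Infra
- scripts/verify-plugin-sync.cjs: add a plugin sync verification script that ensures build artifacts match the source
- .github/workflows/ci.yml: CI now runs bun run check as a single unified check (typecheck + test + plugin sync + version check)
- package.json: add the verify:plugin-sync script
Test plan
- bun run typecheck: passes
- bun test src/: all 166 tests pass
New / updated tests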
Close code 4001 (daemon-client):
1. emits replaced (not disconnect) when server closes with code 4001
2. emits disconnect (not replaced) for non-4001 close codes
3. pending replies rejected on replaced close (code 4001)
Approval lifecycle (codex-adapter):
4. approval response buffered when app-server disconnected, flushed on reconnect
5. approval response send failure is buffered for retry
6. requeues in-flight server requests on TUI disconnect and replays them on reconnect
7. new TUI connection replays in-flight server requests before the old socket closes
Bridge disabled state:
8. bridge-disabled-state.test.ts: disabled reason type tests
Closes #55 (Phase 1)
Relates to #39, #58
🤖 Generated with Claude Code