[codex] Add Codex WebSocket transport proxy#29
Conversation
lis186
left a comment
There was a problem hiding this comment.
繁中摘要
- Issue #28 兩個子 bug 都已關掉:
detectOpenAISession不再在 body 為 null 時提早 return;restore.js透過openai_prompt_meta_*.jsonsidecar 保留worker/exploreragent type。 - Helper 抽到
server/ws-proxy.js+server/openai-session.js是正確的擺放方式,後續 frame parsing 有地方接。ws已在dependencies(^8.19.0),不必補裝。 - 驗證
store.detectSession(store.js:168–179):顯式session_id走第一條分支、不設inferred,因此 PR 合成的{metadata:{session_id}}路徑會收斂為inferred: false,UI 不會誤掛 inferred badge。 - 合併前需補三項都跟
/v1/realtime有關的修正;另建議補一項 idle timeout hardening。Frame parsing / cost / WS intercept 依 PR 描述切到後續 draft 合理,不擋此 PR。
Summary
- Both sub-bugs called out in issue #28 are closed:
detectOpenAISessionno longer early-returns on null body, andrestore.jskeepsworker/exploreragent type via theopenai_prompt_meta_*.jsonsidecar. - Helper extraction into
server/ws-proxy.js+server/openai-session.jsis the right shape for layering frame parsing later.wsis already independencies(^8.19.0), no install change needed. - Verified
store.detectSession(store.js:168–179) takes the explicit-session_idbranch and does not setinferred, so the synthesized{metadata:{session_id}}path resolves toinferred: false. No "inferred" UI badge regression. - Three required fixes before merge — all
/v1/realtimerelated — plus one optional idle-timeout hardening. Frame parsing / cost / WS intercept are correctly deferred per the PR description.
Required before merge
server/config.js::getProviderForRequest— add/v1/realtime→'openai'. Without this, realtime upgrades fall back to the Anthropic upstream.server/ws-proxy.js::isOpenAIResponsesWebSocket— path gate is hardcoded to/v1/responses, so/v1/realtimeupgrades hitwriteSocketResponse(socket, 404, 'Not Found'). Widen the gate (or add a siblingisOpenAIRealtimeWebSocket) so both paths are accepted.test/websocket-proxy.test.js— add a/v1/realtimeupgrade fixture so the gate can't be silently re-tightened.
Optional, recommended
- Socket idle timeout in
ws-proxy.js. Ping/pong forwarding is wired, but a hung peer is never detected — a ~60s idle deadline + 1011 close would clean up leaked sockets.
Out of scope, fine as-is
Frame parsing, cost calculation from WS usage, and intercept-on-WS are correctly deferred to a follow-up draft per the PR description. Transport-only dashboard entries are a UX tradeoff, not a blocker.
|
Thanks for the review, I should've turn on notifications for this one, let me address it today |
|
Addressed the review feedback in
Validation:
|
- Arm idle timer in wss.handleUpgrade callback so a stalled upstream (accepts TCP, never sends 101) is bounded by IDLE_TIMEOUT_MS instead of hanging forever. - Cap client→upstream send buffer at CCXRAY_WS_MAX_QUEUE_BYTES (default 4 MiB) and close 1009 on overflow; the previous queue was unbounded. - Destroy the upstream HTTP request/response on unexpected-response so the underlying socket doesn't leak. ws library hands ownership to the user once a listener is attached. - Drop the unreachable clientQueue path: clientWs is OPEN inside the handleUpgrade callback, so only client→upstream ever needs buffering. - Clamp WS close reasons to 120 bytes (spec cap is 123); the ws library throws RangeError on overflow. - Cover the new behavior with tests: pre-handshake stall timeout, auth token gating, accepted bearer, non-OpenAI 404, subprotocol forwarding. - Document ws-proxy.js / openai-session.js modules and the CCXRAY_WS_IDLE_TIMEOUT_MS / CCXRAY_WS_MAX_QUEUE_BYTES tunables. - Comment detectOpenAISession's intentional behavior: header session_id is honored even when parsedBody is null (covers WS upgrades and body-less HTTP retries). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code reviewFound 1 issue:
Lines 434 to 449 in f695539 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
|
感謝你的四頁漫畫!太用心了。close-code 的部分你判斷完全正確,proxy 確實會 exit,你 PR #6 的 normalize 已經完整解決這個問題,所以就跟著 #6 一起處理,#29 我這邊準備合了。 First — thank you for the comic. Four pages of Traditional Chinese to explain a proxy lifecycle bug is not something I expected to receive this week, and it genuinely made my day. On the close-code finding above: confirmed by hand, your PR #6 normalize handles it cleanly. So the plan from my side is simple: I'll get #29 merged so you're unblocked, and the close-code normalization rides along with PR #6 since it's already part of that stack. No need to split anything out or rework the diff. Whenever you're ready to send #6 upstream, just open it — no checklist, no ceremony. You've clearly thought this through more carefully than most drive-by PRs I see, and I trust your judgment on the timing. Thanks again for the care you put into this. The comic is going on my wall. — Justin |
|
FYI both PRs are ready |
…ader forwarding Three integration tests that lock in behaviors verified during PR #33 sign-off: - test/auth-token-strip.e2e.test.js — spawns ccxray with AUTH_TOKEN set against a fake Anthropic upstream, sends `?token=...&trace=keepme`, and asserts the secret never reaches the upstream URL, SSE broadcasts, disk entry logs, or console output, while non-auth params are preserved (covers a5d28f0). - test/socket-error-survival.e2e.test.js — exercises both the client-abort mid-SSE path and the upstream `socket.destroy()` path against a slow fake upstream, asserting the proxy stays alive (follow-up probe returns 200) and stderr contains no uncaughtException trace (covers efd4a70). - test/websocket-headers-forward.e2e.test.js — opens a real WebSocket through ccxray to a fake WS upstream with `chatgpt-account-id` set, and asserts the custom and openai-beta headers reach upstream intact, host is rewritten, and ChatGPT routing transforms `/v1/realtime` to `/backend-api/codex/realtime` (covers PR #29 + 0ff5507). npm test: 480 → 483 pass, 0 fail.
|
Merged via #33 — thanks @shhtheonlyperson. Two follow-up fixes I added on top, in case you want to glance: |




繁中摘要
/v1/responses的Upgrade: websocket可以經過 ccxray,而不是 handshake 失敗或完全沒有 dashboard entry。openai-beta、session_id、x-codex-turn-metadata等 headers。worker/explorer,避免只靠 instruction text 猜回default。Summary
Scope
This is intentionally transport-first. It does not parse conversation frames, reconstruct full request/response payloads, calculate cost from WS usage events, or support intercept/editing for WebSocket traffic yet. Those should be separate draft PRs after transport behavior is stable.
Screenshots
Captured from isolated local fixtures. The before state represents the pre-#29 behavior for WebSocket-only Codex traffic: no usable dashboard turn is recorded for the WS upgrade path, and restored Codex prompt agent type falls back to instruction-text inference.
Session grouping
Transport request metadata
Prompt agent restore
This full-page fixture uses two neutral Codex prompts whose text does not contain
workerorexplorer. Before, both restore intoCodex Default; after, sidecar metadata restores separateCodex ExplorerandCodex Workerbuckets.Follow-up Fix / 後續修正
/goal say hello worldrun.openai_base_urlandchatgpt_base_urlthrough ccxray, keeps ccxray alive when Codex closes the WebSocket abnormally with code1006, and classifies Codex side-channel REST traffic ascodex/openaiinstead ofclaude/anthropic.codex/openai/101WebSocket transport entry.Validation
node --test test/config.test.js test/websocket-proxy.test.jsnode --test test/startup.test.js test/websocket-proxy.test.jsnpm test— 462 passingRefs #28