[codex] Add Codex WebSocket transport proxy#29

Merged
lis186 merged 4 commits into lis186:main from shhtheonlyperson:codex/ws-transport-v1
May 22, 2026

@shhtheonlyperson
Contributor

@shhtheonlyperson shhtheonlyperson commented May 9, 2026

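Summary (translated from Traditional Chinese)

  • This PR first closes the WebSocket transport gap for Codex/codex-rs, so that a /v1/responses request with Upgrade: websocket can pass through ccxray instead of failing the handshake or producing no dashboard entry at all.
  • WebSocket frames are currently proxied at the transport level only: text and binary frames are forwarded in both directions, and headers such as OpenAI auth, openai-beta, session_id, and x-codex-turn-metadata are preserved.
  • The dashboard records a transport-only entry for now, containing session id, agent type, cwd, frame/byte counts, and close/error metadata; full conversation frame parsing will be handled in a follow-up draft PR.
  • It also fixes agent classification drift in Codex prompt restore: after a restart, a metadata sidecar preserves worker / explorer, instead of guessing from instruction text alone and falling back to default.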

Summary

  • Adds an OpenAI Responses WebSocket upgrade path for Codex/codex-rs traffic.
  • Forwards text and binary frames bidirectionally while preserving OpenAI auth, beta, session, and Codex metadata headers.
  • Records a transport-only dashboard/log entry with session id, agent type, cwd, frame counts, byte counts, close/error metadata.
  • Extracts shared OpenAI/Codex session header helpers and preserves prompt agent type across restore via a small shared metadata sidecar.
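The header-preservation bullet above can be illustrated with a minimal sketch. The allow-list and the function name `buildUpstreamHeaders` are assumptions made for illustration; the real proxy may forward more headers, or all of them, and may structure this differently.

```javascript
// Hypothetical allow-list based on the headers named in the PR description.
// The actual ccxray implementation may preserve a different set.
const PRESERVED_HEADERS = [
  'authorization',
  'openai-beta',
  'session_id',
  'x-codex-turn-metadata',
];

// Build the header set for the upstream WebSocket handshake:
// keep only the allow-listed headers and point `host` at the upstream.
function buildUpstreamHeaders(clientHeaders, upstreamHost) {
  const out = {};
  for (const [name, value] of Object.entries(clientHeaders)) {
    const key = name.toLowerCase();
    if (PRESERVED_HEADERS.includes(key)) out[key] = value;
  }
  // The client's Host header names the proxy, so it is replaced, not copied.
  out.host = upstreamHost;
  return out;
}
```

Lower-casing the names keeps the lookup independent of how the client capitalized them; Node's `http` module already lower-cases incoming header names, so this mainly matters for hand-built fixtures.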

Scope

This is intentionally transport-first. It does not parse conversation frames, reconstruct full request/response payloads, calculate cost from WS usage events, or support intercept/editing for WebSocket traffic yet. Those should be separate draft PRs after transport behavior is stable.
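As a rough illustration of what "transport-only" can mean in practice, the sketch below counts frames and bytes per direction without inspecting payloads. The name `createTransportStats` and the shape of the snapshot are hypothetical, not taken from the PR.

```javascript
// Transport-level accounting: count frames and bytes per direction,
// never look inside the payload. Names and shape are illustrative only.
function createTransportStats() {
  const stats = {
    clientToUpstream: { frames: 0, bytes: 0 },
    upstreamToClient: { frames: 0, bytes: 0 },
  };
  return {
    // `data` may be a string (text frame) or a Buffer (binary frame).
    record(direction, data) {
      const bucket = stats[direction];
      if (!bucket) throw new Error(`unknown direction: ${direction}`);
      bucket.frames += 1;
      bucket.bytes += Buffer.byteLength(data);
    },
    // Return a deep copy so callers cannot mutate the live counters.
    snapshot() {
      return JSON.parse(JSON.stringify(stats));
    },
  };
}
```

A proxy would call `record` once per forwarded frame and attach `snapshot()` to the dashboard entry when the connection closes.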

Screenshots

Captured from isolated local fixtures. The before state represents the pre-#29 behavior for WebSocket-only Codex traffic: no usable dashboard turn is recorded for the WS upgrade path, and restored Codex prompt agent type falls back to instruction-text inference.

Session grouping

Before: no WebSocket session captured
After: WebSocket transport entry grouped by session_id

Transport request metadata

Before: no request metadata available for the failed WebSocket path
After: transport-only request metadata with session, agent, and cwd

Prompt agent restore

This full-page fixture uses two neutral Codex prompts whose text does not contain worker or explorer. Before, both restore into Codex Default; after, sidecar metadata restores separate Codex Explorer and Codex Worker buckets.

Before: neutral restored prompts collapse into Codex Default
After: metadata sidecars restore Codex Explorer and Codex Worker buckets

Follow-up Fix / 後續修正

  • Follow-up PR shhtheonlyperson/ccxray#6 (Route ChatGPT-auth Codex traffic through ccxray) validates this transport work against a real ChatGPT-auth Codex /goal say hello world run.
  • It fixes the launcher/config path by routing both openai_base_url and chatgpt_base_url through ccxray, keeps ccxray alive when Codex closes the WebSocket abnormally with code 1006, and classifies Codex side-channel REST traffic as codex/openai instead of claude/anthropic.
  • Net result: Codex no longer produces 0 ccxray entries on that runtime path; the main turn is captured as a codex/openai/101 WebSocket transport entry.

Validation

  • node --test test/config.test.js test/websocket-proxy.test.js
  • node --test test/startup.test.js test/websocket-proxy.test.js
  • npm test — 462 passing

Refs #28

Owner

@lis186 lis186 left a comment

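Summary (translated from Traditional Chinese)

  • Both sub-bugs from issue #28 are closed: detectOpenAISession no longer returns early when the body is null, and restore.js preserves the worker / explorer agent type via the openai_prompt_meta_*.json sidecar.
  • Extracting the helpers into server/ws-proxy.js + server/openai-session.js is the right placement and gives later frame parsing a place to hook in. ws is already in dependencies (^8.19.0), so no extra install is needed.
  • Verified store.detectSession (store.js:168–179): an explicit session_id takes the first branch and does not set inferred, so the {metadata:{session_id}} path synthesized by the PR resolves to inferred: false and the UI will not wrongly show the inferred badge.
  • Three fixes, all related to /v1/realtime, are required before merge; one idle timeout hardening is also recommended. Deferring frame parsing / cost / WS intercept to a follow-up draft, as the PR description proposes, is reasonable and does not block this PR.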

Summary

  • Both sub-bugs called out in issue #28 are closed: detectOpenAISession no longer early-returns on null body, and restore.js keeps worker / explorer agent type via the openai_prompt_meta_*.json sidecar.
  • Helper extraction into server/ws-proxy.js + server/openai-session.js is the right shape for layering frame parsing later. ws is already in dependencies (^8.19.0), no install change needed.
  • Verified store.detectSession (store.js:168–179) takes the explicit-session_id branch and does not set inferred, so the synthesized {metadata:{session_id}} path resolves to inferred: false. No "inferred" UI badge regression.
  • Three required fixes before merge — all /v1/realtime related — plus one optional idle-timeout hardening. Frame parsing / cost / WS intercept are correctly deferred per the PR description.
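The session-detection behavior described in the bullets above might look roughly like the following. Both function names are hypothetical stand-ins; in particular, the fallback branch is a placeholder for whatever inference the real store.detectSession performs, which the review does not describe.

```javascript
// Sketch: when a request has no parsed body (e.g. a WebSocket upgrade),
// synthesize a minimal body from the session_id header so the explicit
// branch of session detection still applies. Names are illustrative.
function sessionBodyFor(headers, parsedBody) {
  const explicit =
    parsedBody && parsedBody.metadata && parsedBody.metadata.session_id;
  if (explicit) return parsedBody;
  const headerId = headers['session_id'];
  return headerId ? { metadata: { session_id: String(headerId) } } : parsedBody;
}

// Sketch of the explicit-id branch: an explicit session_id is never
// marked as inferred. The fallback is a placeholder, not the real logic.
function sketchDetectSession(body) {
  const id = body && body.metadata && body.metadata.session_id;
  if (id) return { sessionId: id, inferred: false };
  return { sessionId: null, inferred: true };
}
```

The point of the synthesized body is that the header path and the body path converge on one branch, so the UI badge logic needs no special case for WebSocket traffic.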

Required before merge

  1. server/config.js::getProviderForRequest: add /v1/realtime → 'openai'. Without this, realtime upgrades fall back to the Anthropic upstream.
  2. server/ws-proxy.js::isOpenAIResponsesWebSocket — path gate is hardcoded to /v1/responses, so /v1/realtime upgrades hit writeSocketResponse(socket, 404, 'Not Found'). Widen the gate (or add a sibling isOpenAIRealtimeWebSocket) so both paths are accepted.
  3. test/websocket-proxy.test.js — add a /v1/realtime upgrade fixture so the gate can't be silently re-tightened.
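A minimal sketch of the widened gate and the provider mapping requested above, assuming a simple path allow-list. The names `isOpenAIWebSocketUpgrade` and `providerForPath` are hypothetical; the real getProviderForRequest and isOpenAIResponsesWebSocket almost certainly consider more than the path.

```javascript
// Paths accepted for OpenAI WebSocket upgrades in this sketch.
const OPENAI_WS_PATHS = ['/v1/responses', '/v1/realtime'];

// Extract the pathname from a request URL such as "/v1/realtime?model=x".
// The base is a placeholder; only the path is used.
function pathnameOf(requestUrl) {
  return new URL(requestUrl, 'http://placeholder.invalid').pathname;
}

// Accept the upgrade only for a websocket Upgrade header on a known path.
function isOpenAIWebSocketUpgrade(req) {
  const upgrade = String(req.headers.upgrade || '').toLowerCase();
  return upgrade === 'websocket' && OPENAI_WS_PATHS.includes(pathnameOf(req.url));
}

// Simplified provider lookup: known OpenAI paths go to 'openai',
// everything else falls back to 'anthropic' (the fallback the review mentions).
function providerForPath(requestUrl) {
  return OPENAI_WS_PATHS.includes(pathnameOf(requestUrl)) ? 'openai' : 'anthropic';
}
```

Sharing one path list between the gate and the provider lookup keeps the two from drifting apart, which is the failure mode the third required item guards against with a test fixture.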

Optional, recommended

  1. Socket idle timeout in ws-proxy.js. Ping/pong forwarding is wired, but a hung peer is never detected — a ~60s idle deadline + 1011 close would clean up leaked sockets.
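One way to sketch the idle deadline suggested above is a small watchdog that is re-armed on every frame and fires once when nothing arrives in time. `createIdleWatchdog` and its injectable `timers` argument are assumptions for illustration, not the PR's implementation.

```javascript
// Idle watchdog sketch: call touch() whenever a frame, ping, or pong is
// seen; onIdle fires once if no touch() arrives within timeoutMs.
// `timers` is injectable so tests can drive it without real delays.
function createIdleWatchdog(onIdle, timeoutMs = 60_000, timers = { setTimeout, clearTimeout }) {
  let handle = null;
  let stopped = false;

  function arm() {
    if (stopped) return;
    if (handle !== null) timers.clearTimeout(handle);
    handle = timers.setTimeout(() => {
      handle = null;
      stopped = true; // fire at most once
      onIdle();
    }, timeoutMs);
  }

  function stop() {
    stopped = true;
    if (handle !== null) timers.clearTimeout(handle);
    handle = null;
  }

  arm(); // start counting immediately, before the first frame
  return { touch: arm, stop };
}
```

In a proxy, `onIdle` would close both sockets with code 1011 as the review suggests, and `stop()` would run from the normal close path so the timer does not outlive the connection.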

Out of scope, fine as-is

Frame parsing, cost calculation from WS usage, and intercept-on-WS are correctly deferred to a follow-up draft per the PR description. Transport-only dashboard entries are a UX tradeoff, not a blocker.

@shhtheonlyperson
Contributor Author

Thanks for the review. I should've turned on notifications for this one; let me address it today.

@shhtheonlyperson
Contributor Author

shhtheonlyperson commented May 12, 2026

Addressed the review feedback in 50331d8:

  • Added /v1/realtime to getProviderForRequest so realtime traffic resolves to the OpenAI upstream.
  • Widened the WebSocket upgrade gate to accept both /v1/responses and /v1/realtime.
  • Added a /v1/realtime WebSocket proxy fixture.
  • Added idle timeout hardening with CCXRAY_WS_IDLE_TIMEOUT_MS override for tests, defaulting to 60s.

Validation:

  • node --test test/config.test.js test/websocket-proxy.test.js
  • npm test — 462 passing

@shhtheonlyperson shhtheonlyperson marked this pull request as ready for review May 13, 2026 06:21
@shhtheonlyperson shhtheonlyperson marked this pull request as draft May 13, 2026 06:23
@shhtheonlyperson shhtheonlyperson marked this pull request as ready for review May 13, 2026 16:24
- Arm idle timer in wss.handleUpgrade callback so a stalled upstream
  (accepts TCP, never sends 101) is bounded by IDLE_TIMEOUT_MS instead
  of hanging forever.
- Cap client→upstream send buffer at CCXRAY_WS_MAX_QUEUE_BYTES (default
  4 MiB) and close 1009 on overflow; the previous queue was unbounded.
- Destroy the upstream HTTP request/response on unexpected-response so
  the underlying socket doesn't leak. ws library hands ownership to the
  user once a listener is attached.
- Drop the unreachable clientQueue path: clientWs is OPEN inside the
  handleUpgrade callback, so only client→upstream ever needs buffering.
- Clamp WS close reasons to 120 bytes (spec cap is 123); the ws library
  throws RangeError on overflow.
- Cover the new behavior with tests: pre-handshake stall timeout, auth
  token gating, accepted bearer, non-OpenAI 404, subprotocol forwarding.
- Document ws-proxy.js / openai-session.js modules and the
  CCXRAY_WS_IDLE_TIMEOUT_MS / CCXRAY_WS_MAX_QUEUE_BYTES tunables.
- Comment detectOpenAISession's intentional behavior: header session_id
  is honored even when parsedBody is null (covers WS upgrades and
  body-less HTTP retries).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
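Two items from the commit message above, the byte-capped send queue and the close-reason clamp, can be sketched as follows. This is an illustration under assumptions: `createBoundedQueue` and `clampCloseReason` are hypothetical names, and the repo's own clampWsReason may be implemented differently.

```javascript
// Byte-capped FIFO for frames that arrive before the upstream socket is open.
// push() returns false on overflow; the caller would then close with 1009.
function createBoundedQueue(maxBytes = 4 * 1024 * 1024) {
  const items = [];
  let bytes = 0;
  return {
    push(chunk) {
      const size = Buffer.byteLength(chunk);
      if (bytes + size > maxBytes) return false;
      items.push(chunk);
      bytes += size;
      return true;
    },
    // Flush queued frames in arrival order through `send`.
    drain(send) {
      while (items.length > 0) {
        const chunk = items.shift();
        bytes -= Buffer.byteLength(chunk);
        send(chunk);
      }
    },
    get bytes() {
      return bytes;
    },
  };
}

// Truncate a close reason to at most maxBytes of UTF-8 without cutting
// a multi-byte character in half.
function clampCloseReason(reason, maxBytes = 120) {
  const buf = Buffer.from(String(reason), 'utf8');
  if (buf.length <= maxBytes) return buf.toString('utf8');
  let end = maxBytes;
  // If the cut would land on a continuation byte (10xxxxxx), back up
  // to the start of that character and drop it entirely.
  while (end > 0 && (buf[end] & 0xc0) === 0x80) end--;
  return buf.subarray(0, end).toString('utf8');
}
```

Measuring the cap in bytes rather than characters matters because the WebSocket limit applies to the encoded payload, and a reason containing CJK text reaches it roughly three times sooner than ASCII.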
@shhtheonlyperson
Contributor Author

Visual reference only: four-page Traditional Chinese comic summary of this PR.

PR #29 comic page 1

PR #29 comic page 2

PR #29 comic page 3

PR #29 comic page 4

just for FUN!

@lis186
Owner

lis186 commented May 16, 2026

Code review

Found 1 issue:

  1. Forwarding the peer's WebSocket close code verbatim can throw and skip finalize(). The ws library rejects reserved local-only codes (1005 "no status received", 1006 "abnormal closure") when passed to .close(), and those codes are routinely emitted by the 'close' event when the peer closes without a status frame or the TCP socket drops. When the throw escapes the handler, finalize() on the next line is never called — the entry is not written, activeRequests[sessionId] is not decremented, and the idle timer leaks; on the upstream-close branch it can take down the proxy. Map reserved codes to 1000/1011 before forwarding (or call .close() with no arguments when the code is reserved).

ccxray/server/ws-proxy.js

Lines 434 to 449 in f695539

});
clientWs.on('close', (code, reason) => {
  const reasonStr = reason.toString();
  if (upstreamWs.readyState === WebSocket.OPEN || upstreamWs.readyState === WebSocket.CONNECTING) {
    upstreamWs.close(code, clampWsReason(reasonStr));
  }
  finalize({ status: 101, close: { side: 'client', code, reason: reasonStr } });
});
upstreamWs.on('close', (code, reason) => {
  const reasonStr = reason.toString();
  if (clientWs.readyState === WebSocket.OPEN || clientWs.readyState === WebSocket.CONNECTING) {
    clientWs.close(code, clampWsReason(reasonStr));
  }
  finalize({ status: 101, close: { side: 'upstream', code, reason: reasonStr } });
});
clientWs.on('error', err => {
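A possible shape for the fix described in the finding, shown as a hedged sketch rather than the normalization that PR #6 actually ships. `isSendableCloseCode` is assumed to mirror the ranges the ws library accepts for .close(); the 1005 → 1000 and everything-else → 1011 mapping, and the names `normalizeCloseCode` and `forwardClose`, are illustrative choices.

```javascript
// Codes assumed to be accepted when sending a close frame: 1000-1014
// except the reserved 1004/1005/1006, plus the 3000-4999 range.
function isSendableCloseCode(code) {
  return (
    (code >= 1000 && code <= 1014 && code !== 1004 && code !== 1005 && code !== 1006) ||
    (code >= 3000 && code <= 4999)
  );
}

// Map codes that cannot be sent on the wire to ones that can:
// 1005 (no status received) becomes a normal closure, anything else
// unsendable (including 1006) becomes 1011.
function normalizeCloseCode(code) {
  if (isSendableCloseCode(code)) return code;
  return code === 1005 ? 1000 : 1011;
}

// Forward a close to the other peer, and run finalize() even if
// peer.close() throws, so bookkeeping is never skipped.
function forwardClose(peer, code, reason, finalize) {
  try {
    peer.close(normalizeCloseCode(code), reason);
  } catch (err) {
    // Swallowed on purpose in this sketch; a real proxy would log it.
  }
  finalize();
}
```

Normalizing the code removes the known throw, while the try/catch around close() is a second line of defense so that the entry is still written and counters are still decremented if some other argument is rejected.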

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@lis186
Owner

lis186 commented May 17, 2026

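Thank you for the four-page comic! That was so thoughtful. Your read on the close-code issue is entirely correct: the proxy really does exit. The normalize in your PR #6 already resolves it completely, so it will be handled together with #6, and I'm getting ready to merge #29 on my side.
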
First — thank you for the comic. Four pages of Traditional Chinese to explain a proxy lifecycle bug is not something I expected to receive this week, and it genuinely made my day.

On the close-code finding above: confirmed by hand, your PR #6 normalize handles it cleanly.

So the plan from my side is simple: I'll get #29 merged so you're unblocked, and the close-code normalization rides along with PR #6 since it's already part of that stack. No need to split anything out or rework the diff.

Whenever you're ready to send #6 upstream, just open it — no checklist, no ceremony. You've clearly thought this through more carefully than most drive-by PRs I see, and I trust your judgment on the timing.

Thanks again for the care you put into this. The comic is going on my wall.

— Justin

@shhtheonlyperson
Contributor Author

FYI, both PRs are ready: #29 and shhtheonlyperson#6

lis186 added a commit that referenced this pull request May 22, 2026
…ader forwarding

Three integration tests that lock in behaviors verified during PR #33 sign-off:

- test/auth-token-strip.e2e.test.js — spawns ccxray with AUTH_TOKEN set against
  a fake Anthropic upstream, sends `?token=...&trace=keepme`, and asserts the
  secret never reaches the upstream URL, SSE broadcasts, disk entry logs, or
  console output, while non-auth params are preserved (covers a5d28f0).

- test/socket-error-survival.e2e.test.js — exercises both the client-abort
  mid-SSE path and the upstream `socket.destroy()` path against a slow fake
  upstream, asserting the proxy stays alive (follow-up probe returns 200) and
  stderr contains no uncaughtException trace (covers efd4a70).

- test/websocket-headers-forward.e2e.test.js — opens a real WebSocket through
  ccxray to a fake WS upstream with `chatgpt-account-id` set, and asserts the
  custom and openai-beta headers reach upstream intact, host is rewritten,
  and ChatGPT routing transforms `/v1/realtime` to `/backend-api/codex/realtime`
  (covers PR #29 + 0ff5507).

npm test: 480 → 483 pass, 0 fail.
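The first test in the commit message above checks that the auth token is stripped from the forwarded URL while other query parameters survive. A minimal sketch of that transformation, assuming the parameter is named token as in the test's example query; `stripAuthToken` is a hypothetical helper, not the function from a5d28f0.

```javascript
// Remove the auth query parameter from a request URL before forwarding,
// leaving the path and all other parameters intact.
function stripAuthToken(requestUrl, param = 'token') {
  // Request URLs are path-relative, so parse against a placeholder base.
  const url = new URL(requestUrl, 'http://placeholder.invalid');
  url.searchParams.delete(param);
  return url.pathname + url.search;
}
```

For example, `stripAuthToken('/v1/messages?token=secret&trace=keepme')` yields `/v1/messages?trace=keepme`. The same cleaned URL would need to be used for logging and dashboard broadcasts as well, since the test asserts the secret does not appear in any of those sinks.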
@lis186 lis186 merged commit f695539 into lis186:main May 22, 2026
2 checks passed
@lis186
Owner

lis186 commented May 22, 2026

Merged via #33 — thanks @shhtheonlyperson. Two follow-up fixes I added on top, in case you want to glance: a5d28f0 (AUTH_TOKEN strip covering both HTTP + your new WS path), 06e0f56 (extended self-loop guard to chatgpt_base_url).
