fix(responses): native Responses-API passthrough for gpt-5.x + Codex CLI #427
Follow-up to #425. The tool format normalization in #425 handles flat function-tools but cannot express Responses-API-only features:

- top-level `instructions` and `input` fields
- custom tools `{type: "custom", name, format}` (apply_patch)
- built-in tools `{type: "web_search"}`, `image_generation`, etc.
- reasoning.effort, prompt_cache_key, encrypted reasoning content

Codex CLI and the AI SDK v5 Responses transport rely on these. Trying to collapse them into Chat Completions format drops information and produces the same class of "tools.0.function undefined" error #425 was meant to fix.

This PR adds a native passthrough path:

- AIProvider gains an optional `responses(body, options)` method that forwards an opaque body to the upstream `/responses` endpoint.
- vercel-gateway implements it against Vercel AI Gateway's `/responses` passthrough — no body transformation.
- /v1/responses detects native Responses-API payloads (gpt-5.x model, presence of `instructions`, or any non-function tool) and routes them through the passthrough, streaming the upstream response back verbatim. Older clients still go through the transform + normalize path unchanged.
- Credits are reserved via estimateRequestCost (50% safety buffer already applies) and settled to the reserved amount on completion. Accurate reconciliation from the `response.completed` SSE event is a TODO — filed for follow-up.

Tests cover the three routing branches (gpt-5.x + instructions → passthrough, custom tools on any model → passthrough, vanilla nested tools on gpt-4o → transform path) plus the existing 15-test normalization suite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
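For readers outside the diff, the routing heuristic above can be sketched roughly as follows. The function name `isNativeResponsesPayload` matches the route's detector; the field shapes and exact checks here are illustrative assumptions, not the shipped implementation (the regex is the one quoted later in this thread).

```typescript
// Hedged sketch of the detection heuristic; only the name and the three
// triggers (gpt-5 model, `instructions`, non-function tool) come from the PR.
type ResponsesBody = {
  model?: string;
  instructions?: unknown;
  tools?: Array<{ type?: string }>;
};

function isNativeResponsesPayload(body: ResponsesBody): boolean {
  // gpt-5-family models speak the Responses API natively.
  if (typeof body.model === "string" && /^gpt-5(\b|[-.])/.test(body.model)) return true;
  // Top-level `instructions` exists only in the Responses API.
  if (body.instructions !== undefined) return true;
  // Any non-function tool (custom, web_search, local_shell, ...) cannot be
  // expressed in Chat Completions format.
  return (body.tools ?? []).some((t) => t.type !== undefined && t.type !== "function");
}
```

Anything matching none of the three triggers falls through to the existing transform + normalize path.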
Builds on the initial native Responses-API passthrough with the follow-ups required to make it actually work against Vercel AI Gateway's /v1/responses endpoint and to cover the payment paths we missed in the first cut.

- Model prefix rewrite: Vercel AI Gateway requires `provider/model` format (e.g. `openai/gpt-5.4`), but Codex CLI and the AI SDK v5 Responses transport send bare ids like `gpt-5.4`. Added `normalizeGatewayModelId()` which prefixes `openai/` for bare OpenAI ids, `anthropic/` for `claude-*`, `google/` for `gemini-*`, and leaves already-prefixed ids untouched. The rewrite builds a shallow clone of the caller body so the original is not mutated.
- Loosened the "is native Responses payload" detector to match `gpt-5` and `gpt-5-mini` (previously it only matched `gpt-5.x`).
- Custom/web_search/local_shell tool pass-through explicitly tested — verified from Codex source that these are the only four tool variants it emits, and Vercel AI Gateway accepts them all verbatim on /v1/responses per live test against the production endpoint.

Tests added (13 new, bringing the file to 28 total):

- Model prefix rewrite for bare / prefixed / claude / gemini ids
- Caller body not mutated by rewrite
- Bare `gpt-5` / `gpt-5-mini` detected as native Responses
- Upstream 4xx forwarded verbatim with credit refund
- Insufficient credits → 402, upstream not touched
- Provider without .responses() method → 400 unsupported_provider
- Anonymous users skip credit reservation in passthrough
- Reconcile called with reserved amount on success
- Custom / web_search / local_shell tools forwarded verbatim in body

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
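The prefix rewrite described above is small enough to sketch. The function name `normalizeGatewayModelId` and the three prefix rules come from the PR text; the fallback behavior for unrecognized bare ids (treating them as OpenAI ids) is an assumption.

```typescript
// Illustrative sketch of the model-id rewrite; only the name and the
// openai/anthropic/google prefix rules are from the PR — the default
// branch is an assumption.
function normalizeGatewayModelId(model: string): string {
  if (model.includes("/")) return model; // already provider-prefixed
  if (model.startsWith("claude-")) return `anthropic/${model}`;
  if (model.startsWith("gemini-")) return `google/${model}`;
  return `openai/${model}`; // bare OpenAI ids like "gpt-5.4"
}
```

The route applies this to a shallow clone (`{ ...body, model: gatewayModel }`) so the caller's body object is never mutated.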
### Code Review

Clean implementation overall — the detection + passthrough approach is the right call for Responses-API-only features. A few findings below.

**Bugs / Issues**

1. `apiKey` parameter accepted but never used

```ts
// app/api/v1/responses/route.ts
async function handleNativeResponsesPassthrough(
  ...
  apiKey: { id: string } | null, // ← never used
  ...
```

At minimum, include `apiKey.id` in the credit-reservation metadata so charges can be traced back to the paying key.

2. Credit reconciliation fires before the stream is consumed

This is noted as a TODO but I'd flag it explicitly: the 50% safety buffer helps, but a cancelled mid-stream request (e.g. Codex CLI ctrl-C) will still be charged the full estimate. Consider settling on stream end.

**Code Quality**

3. `_startTime` is passed through but never used — no timing metrics for passthrough requests. The regular path tracks elapsed time for logging; passthrough latency is invisible.

4. `responses()` JSDoc contradicts the route

```ts
// types.ts — "Providers that cannot proxy Responses API should throw."
// route.ts — if (!providerInstance.responses) { return 400 }
```

The contract is actually "if missing, the route returns 400 before calling". The JSDoc is misleading — it implies a throwing implementation is expected, but the check prevents the call entirely.

5. Upstream response headers forwarded verbatim (beyond hop-by-hop)

**Minor**

6. For large Codex sessions (long inputs), a failed cost estimate falls back to a reservation floor that is too small to protect against a runaway session when estimation silently fails.

**Tests**

Coverage is solid for a new path (13 tests covering the main branches).
Addresses the Claude bot review on #427:

1. apiKey attribution gap (blocker) — `handleNativeResponsesPassthrough` accepted an `apiKey` parameter but never used it, creating an audit-trail hole. Now threads `apiKey?.id` into both the `reserveAndDeductCredits` and `reconcile` metadata, and includes it in the final `logger.info` so credits can be traced back to the paying key.
2. Dead code: `_startTime` → `startTime` is now used to compute and log `durationMs` on every passthrough response. The regular path had this; the passthrough was invisible.
3. Settlement-before-stream trade-off — an explicit comment now calls out the Codex CLI Ctrl-C / mid-stream-cancel case that the previous "TODO" glossed over. Keeps the behavior (the 50% safety buffer is an acceptable overcharge upper bound for now) but documents the gap.
4. `AIProvider.responses?()` JSDoc said "should throw" but the route actually checks for method existence and returns 400. Fixed the contract description so the next reader doesn't look for a throwing implementation that never existed.
5. Header leak: previously we stripped only the four hop-by-hop headers, passing `x-vercel-*`, `cf-ray`, `server`, `via`, etc. through to clients unchanged. Now strips all gateway-internal infra headers while explicitly keeping `x-ratelimit-*` for client transparency.
6. Fallback credit floor: when `estimateRequestCost` throws we now reserve $0.10 instead of $0.01. The old floor was too small to protect against a runaway session when estimation silently fails — the Chat Completions path gets a 50% safety buffer from `estimateRequestCost` that the passthrough was bypassing.

New tests (3, bringing total to 31):

- api_key_id flows into reserve + reconcile metadata
- Gateway-internal headers stripped, x-ratelimit-* forwarded
- Fallback floor is 0.10 when estimateRequestCost throws

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
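The header-stripping fix above can be sketched as a simple allow-by-default filter. The commit only names `x-vercel-*`, `cf-ray`, `server`, and `via` as stripped examples and `x-ratelimit-*` as explicitly kept, so the exact lists below are assumptions.

```typescript
// Hedged sketch of the gateway-header filter; the strip lists are
// assumptions beyond the examples named in the commit message.
const STRIP_PREFIXES = ["x-vercel-", "cf-"];
const STRIP_EXACT = new Set([
  "server",
  "via",
  "connection",
  "keep-alive",
  "transfer-encoding",
  "upgrade",
]);

function filterUpstreamHeaders(upstream: Headers): Headers {
  const out = new Headers();
  upstream.forEach((value, name) => {
    const n = name.toLowerCase();
    if (STRIP_EXACT.has(n)) return; // infra / hop-by-hop headers
    if (STRIP_PREFIXES.some((p) => n.startsWith(p))) return; // gateway-internal
    out.set(name, value); // x-ratelimit-* and content headers pass through
  });
  return out;
}
```

An allow-by-default filter with a strip list keeps `x-ratelimit-*` visible to clients without enumerating every header the upstream might send.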
Thanks for the review! All six findings addressed in 6f5224e, with 3 new tests (bringing the total to 31).
### Code Review — #427

Overall this is solid work. The design rationale is well-documented, the moderation/credit paths mirror the Chat Completions route correctly, and the 31-test suite covers the important branches. A few things worth addressing before merge:

**Bugs / Correctness**

1. Provider capability check happens after credit reservation

An unsupported provider causes a reserve → refund round-trip that touches the credits ledger for no reason. Check for `.responses()` before reserving.

2. The upstream gateway could theoretically return `set-cookie`, which is not in the strip list. Forwarding upstream cookies through a proxy breaks cookie domain/path semantics.

3. Type-safety bypass in the cost estimate

```ts
estimatedCost = await estimateRequestCost(
  model,
  pseudoMessages as never, // <-- bypasses type safety
  ...
);
```

**Security**

4. No body size guard

The raw request body is forwarded to the upstream gateway without an explicit size check. Chat Completions presumably has a size limit applied somewhere in the middleware stack — verify that limit also covers this route.

**Minor / Code quality**

5. The test suite covers `gpt-5.4` and `gpt-5-mini` but not bare `gpt-5` — worth a test to lock in the regex behavior.

6. Redundant condition on lines 108-115

```ts
const bodyForUpstream =
  gatewayModel === model ? body : { ...body, model: gatewayModel };
if (gatewayModel !== model) {
  logger.debug(...)
}
```

The second `gatewayModel !== model` comparison duplicates the ternary's check — compute it once.

7. At ~300 lines this function is doing a lot: validation, moderation, cost estimation, credit reservation, provider lookup, upstream fetch, error handling, header filtering, logging, and settlement. Consider extracting the credit-reservation block into its own helper.

**Acknowledged trade-offs (no action required)**

- Settlement before the stream is consumed — documented and acceptable for now given the 50% safety buffer.

Verdict: two correctness items (#1, #2) and the type-safety issue (#3) should be fixed before merge. The rest are minor polish. The test coverage and credit-audit improvements from the previous review round are appreciated.
…ough

Builds on #427. The native passthrough previously settled the credit reservation to the reserved estimate on every request regardless of actual usage — so a Codex turn that emitted 200 output tokens was charged at the same rate as one that emitted 4000, as long as both fit under the reserved upper bound.

This PR wraps the upstream ReadableStream with a pass-through reader that also extracts `response.usage` from the terminal `response.completed` SSE event. When the stream ends, we compute the real cost via `calculateCost(model, provider, inputTokens, outputTokens)` and reconcile the reservation down to actual (capped at the reservation as an upper bound — over-runs would need a separate post-hoc ledger entry which is out of scope).

### Design

- Zero behavioral impact on the client: bytes flow in the exact order and size the upstream produced them. We do not batch, rewrite, or buffer beyond a single SSE frame for parser bookkeeping.
- SSE events are parsed out-of-band on the side. Parse errors are swallowed — a malformed frame must never break the forward path.
- Exactly one terminal callback per stream lifecycle: `end`, `cancel`, or `error`. Client cancel (Codex CLI Ctrl-C, tab close) fires the callback with whatever usage we had seen before the cancel, so a turn that completed-then-cancelled still reconciles to actual.
- Pull-based ReadableStream so the upstream is only drained when the client reads, matching the semantics of a direct proxy.
- Cost is clamped at the reservation: `actualCost = min(computed, reserved)`.

### Trade-offs

- If the client aborts mid-stream before `response.completed` arrives, we settle to the reserved estimate (same as pre-stream-wrap behavior). The 50% safety buffer in `estimateRequestCost` stays as the upper bound for that case.
- If `calculateCost` throws, we fall back to settling at the reserved amount rather than crashing the reconciliation.
- We cap at reserved rather than allowing over-collection. Anything beyond the reservation would need a separate post-hoc charge which this PR doesn't implement.

### Tests

14 unit tests for `wrapWithUsageExtraction` covering:

- Passthrough fidelity (byte-exact, chunk-split frames, [DONE] sentinel, malformed JSON recovery)
- Usage extraction (headline + cached + reasoning tokens, missing fields default to 0, null when no completed event)
- Termination paths (end, cancel before completed, cancel after completed, error, throwing callback swallowed)

4 new integration tests in the route suite:

- Reconciles to actual cost when response.completed reports usage
- Caps actual cost at reserved when model over-runs the estimate
- Falls back to reserved when no response.completed arrives
- Existing reconcile/api_key_id tests updated to actually drain the stream (pull-based wrapper gates the callback on reader progress)

Total: 48 passing tests across the two affected suites (14 + 34).

### Stacked on #427

This branch is stacked on `fix/responses-native-passthrough`. Merge #427 first, then rebase this onto `dev`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
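The wrapper described above can be sketched as follows. The name `wrapWithUsageExtraction` and the design constraints (byte-exact forwarding, out-of-band parsing, swallowed parse errors, exactly one terminal callback, pull-based stream) come from the PR; the SSE frame layout assumed here (`data: {...}` lines, `\n\n` frame delimiter, `response.completed` carrying `response.usage`) is a simplification, not the shipped parser.

```typescript
// Hedged sketch under the assumptions stated above — illustrative, not the
// actual implementation in the PR.
type Usage = { input_tokens: number; output_tokens: number };

function wrapWithUsageExtraction(
  upstream: ReadableStream<Uint8Array>,
  onDone: (usage: Usage | null) => void,
): ReadableStream<Uint8Array> {
  const decoder = new TextDecoder();
  let buffered = "";
  let usage: Usage | null = null;
  let settled = false;
  const finish = () => {
    // Exactly one terminal callback per stream lifecycle.
    if (!settled) {
      settled = true;
      onDone(usage);
    }
  };
  const reader = upstream.getReader();
  return new ReadableStream<Uint8Array>({
    // Pull-based: the upstream is only drained when the client reads.
    async pull(controller) {
      const { done, value } = await reader.read();
      if (done) {
        finish();
        controller.close();
        return;
      }
      controller.enqueue(value); // forward bytes verbatim, first
      // Parse a decoded copy out-of-band for bookkeeping.
      buffered += decoder.decode(value, { stream: true });
      let idx: number;
      while ((idx = buffered.indexOf("\n\n")) !== -1) {
        const frame = buffered.slice(0, idx);
        buffered = buffered.slice(idx + 2);
        for (const line of frame.split("\n")) {
          if (!line.startsWith("data: ")) continue;
          try {
            const evt = JSON.parse(line.slice(6));
            if (evt.type === "response.completed") usage = evt.response?.usage ?? null;
          } catch {
            // Malformed frames (or the [DONE] sentinel) must never break
            // the forward path.
          }
        }
      }
    },
    cancel(reason) {
      finish(); // cancel settles with whatever usage was seen so far
      return reader.cancel(reason);
    },
  });
}
```

Enqueueing before parsing keeps the forward path independent of the bookkeeping: a parser bug can cost accurate reconciliation but never corrupt the client's stream.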
Addresses the Claude bot's second review on #427. All three correctness items (#1-#3) fixed; two of the minor items (#5, #6) fixed; the body size guard (#4) was flagged as a concern worth resolving so I added an explicit one.

1. Provider check now happens BEFORE credit reservation. Previously an unsupported provider caused a reserve → refund round-trip on every request that touched the credits ledger for no reason. Now the check returns 400 before any ledger activity. Also eliminates the failure mode where a reservation could leak if the refund path threw.
2. `set-cookie` added to STRIP_HEADERS. Forwarding upstream cookies through the proxy breaks cookie domain/path semantics and leaks upstream session state — must be stripped regardless of what the gateway returns.
3. `pseudoMessages as never` removed. The previous cast was silencing a non-existent type mismatch: `{role, content: string}[]` is a valid subtype of `{role, content: string | object}[]`, which is what `estimateRequestCost` accepts. Plain assignment works. Added an explicit type annotation on `pseudoMessages` so the subtyping relationship is visible at the call site.
4. Body size guard (4 MiB cap via `Content-Length` header) added at the top of `handlePOST` before any auth/parse/reserve work. The passthrough forwards bodies verbatim upstream, so an unbounded payload could bypass whatever limits the middleware stack has elsewhere. Chat Completions is still unguarded — that's a separate gap, but the passthrough is where we're most exposed. Returns 413 with `code: "request_too_large"`.
5. Added an explicit test for bare `model: "gpt-5"` (no suffix) to lock in the `/^gpt-5(\b|[-.])/` regex behavior.
6. Removed the redundant `gatewayModel !== model` check — now uses a single `needsModelRewrite` local and a cleaner ternary. Fixes the double-compare noise the review pointed out.

Type-system housekeeping: captured a local `providerResponses` reference after the early check so the non-null narrowing survives across the intervening credit reservation block (previously used a `!` non-null assertion which biome flagged). Added an unreachable safety fallback that returns 500 if provider.responses somehow vanishes between check and call — pure type-level belt and braces.

New tests (5, bringing total to 36):

- Bare `gpt-5` → native Responses detection
- Provider check happens before credit reservation (ledger untouched)
- set-cookie header stripped from passthrough response
- Request body over 4 MiB cap rejected with 413
- Request body well under cap accepted normally

Deferred: function length (#7) — acknowledged as worth extracting into helpers, but the reviewer flagged it as not a blocker; the Chat Completions route has the same shape, so doing one without the other is more churn than signal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thanks for the second pass. All three correctness items addressed in 9dbeb4f, plus both minor items, plus an explicit body-size guard for finding #4.

Type-system housekeeping: captured a local `providerResponses` reference so the non-null narrowing survives the intervening credit-reservation block.

Deferred (#7, function length): acknowledged as worth an extraction into helpers, but the Chat Completions route has the same shape, so doing one without the other is more noise than value. Worth a dedicated cleanup PR after the passthrough stabilizes.

Acknowledged trade-offs update: the settlement-before-stream-consumed trade-off you listed as "documented and acceptable" is actually resolved in the follow-up PR #428 (stacked on this branch). That PR wraps the upstream `ReadableStream` and reconciles the reservation to actual usage from the `response.completed` event.

Test totals (this PR): 36/36 passing (was 31 + 5 new).
### Code Review

Overall this is a well-structured PR with thorough test coverage and clear intent. The architecture of the passthrough path is sound — detect early, bail before touching the credits ledger if unsupported, forward verbatim. A few findings below.

**Security**

Body size guard can be bypassed (medium)

The `Content-Length` check can be sidestepped by a client that omits the header (`Transfer-Encoding: chunked`) or lies about it. For a passthrough route that forwards bodies verbatim upstream this is the highest-exposure spot in the file. Recommend enforcing a hard read limit:

```ts
// Before req.json(), enforce the actual read limit
const bodyText = await req.text();
if (bodyText.length > MAX_RESPONSES_BODY_BYTES) {
  return Response.json({ error: { message: `...`, code: "request_too_large" } }, { status: 413 });
}
const rawBody = JSON.parse(bodyText) as Record<string, unknown>;
```

The existing header check can stay as a cheap fast path.

Upstream error object forwarded verbatim (low)

In the catch block:

```ts
const gwErr = err as { status: number; error: unknown };
return Response.json({ error: gwErr.error }, { status: gwErr.status });
```

A hostile or misconfigured gateway payload could leak internals through this path.

**Correctness**

Background settlement race in serverless environments (medium)

```ts
void (async () => {
  try {
    await settle(reserved);
  } catch (err) { ... }
})();
```

In Next.js on Vercel (serverless), the function invocation can be frozen or terminated once the Response is returned, leaving the background promise incomplete. Recommendation: if the response is not a stream, `await` the settle before returning.

`isNativeResponsesPayload` only matches string instructions (low)

```ts
if (typeof body.instructions === "string") return true;
```

If a malformed payload sends a non-string `instructions`, it falls through to the Chat Completions path, which will choke on the unexpected field.

Double `providerResponses` null guard (nit)

```ts
const providerResponses = providerInstance.responses;
if (!providerResponses) {
  return Response.json({ error: { code: "unsupported_provider" } }, { status: 400 });
}
// providerResponses is now definitively non-null — no second guard needed
```

**Test Coverage**

Coverage is solid for the happy path and most error branches. A few gaps: no test for a body without a `Content-Length` header, none for a malformed JSON body, and the non-streaming passthrough path is untested.

**Summary**

The passthrough architecture itself is the right call — trying to collapse Responses-API-only features into Chat Completions would lose information. The test suite is well-structured and the credit/reservation logic is carefully handled (particularly the provider check before reservation). The items above are improvements rather than blockers, with the body size enforcement being the most actionable security fix.
Addresses the Claude bot's third review on #427. All 5 substantive findings fixed plus the 4 test gaps the reviewer flagged.

1. Body size guard bypassable via chunked encoding (medium): the Content-Length header check is now an early fast path only. The real enforcement is a post-read length check on the buffered request text. Clients using `Transfer-Encoding: chunked` (which omits Content-Length) or lying about Content-Length will be caught by the post-read check before we touch req.json(). The header check stays for the cheap-fast-path benefit.
2. Background settle in serverless (medium): for non-streaming responses (content-type != "text/event-stream") we now `await` the settle synchronously before returning. The body is fully materialized at that point so deferring serves no purpose, and on Vercel the function can be frozen once the Response is returned, leaving a background promise to never complete and stranding the reservation. Streaming responses still fire-and-forget — that path is the one PR #428 addresses properly via stream-teed reconciliation.
3. Upstream error object forwarded verbatim (low): added `sanitizeGatewayError()` which extracts only the well-known OpenAI-compatible fields (message, type, code, optional param) from an unknown gateway error envelope. Stack traces, infrastructure host names, and arbitrary nested objects are stripped. Whitelist + length cap means even a hostile gateway payload can't leak internals through this path.
4. `isNativeResponsesPayload` non-string instructions (low): now triggers passthrough on ANY presence of `instructions`, not just string. A malformed `instructions: 42` payload routes through the passthrough so the upstream returns a coherent validation error instead of falling through to Chat Completions, which would choke on the unexpected field.
5. Double `providerResponses` null guard (nit): captured the narrowed local at the early-bail check site so the second guard before the forward call is gone. Removed both the redundant if-block AND the unreachable 500 fallback that was load-bearing only for the type system.

New tests (6, bringing total to 42):

- 413 with no Content-Length (chunked encoding bypass test)
- 400 invalid_json on malformed JSON body
- Synchronous settle for non-streaming passthrough (no flush needed)
- Non-streaming JSON passthrough body forwarded unchanged
- Sanitized gateway error envelope (hostile fields stripped)
- Non-string `instructions: 42` routes to passthrough

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
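The `sanitizeGatewayError()` whitelist described above can be sketched as follows. The function name and the whitelisted fields (message, type, code, optional param) come from the commit text; the default envelope values and the exact length cap are assumptions.

```typescript
// Hedged sketch of the error sanitizer; field whitelist is from the commit,
// the 512-char cap and fallback message are illustrative assumptions.
const MAX_FIELD_LEN = 512;

function sanitizeGatewayError(raw: unknown): Record<string, string> {
  // Safe defaults if the upstream envelope is unusable.
  const out: Record<string, string> = {
    message: "Upstream gateway error",
    type: "api_error",
  };
  if (raw && typeof raw === "object") {
    for (const key of ["message", "type", "code", "param"] as const) {
      const v = (raw as Record<string, unknown>)[key];
      // Only plain strings pass; nested objects, stacks, and host names
      // are dropped by omission rather than enumerated.
      if (typeof v === "string") out[key] = v.slice(0, MAX_FIELD_LEN);
    }
  }
  return out;
}
```

Whitelisting by copy (rather than deleting known-bad keys) means a hostile payload cannot smuggle anything through a field the sanitizer never heard of.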
### Code Review — Round 4

Great work getting through three rounds of fixes. The code is in good shape. Two remaining issues I can see in the current state:

**Bugs**

1. Gateway-internal headers leak through non-ok upstream responses (low)

The success path carefully builds a filtered header set, but the error path forwards upstream headers raw:

```ts
if (!upstreamResponse.ok) {
  await settleReservation?.(0);
  return new Response(upstreamResponse.body, {
    status: upstreamResponse.status,
    statusText: upstreamResponse.statusText,
    headers: upstreamResponse.headers, // ← raw, unfiltered
  });
}
```

A gateway 429 or 503 response could still carry gateway-internal infra headers. Apply the same header filtering here.

2. Detection runs before the body shape is validated

```ts
if (isNativeResponsesPayload(rawBody)) {
  rawBody = JSON.parse(bodyText);
  if (!rawBody || typeof rawBody !== "object" || Array.isArray(rawBody)) {
    return Response.json(
      { error: { message: "Request body must be a JSON object", type: "invalid_request_error", code: "invalid_json" } },
      { status: 400 },
    );
  }
```

**Summary**

The three-round cleanup has been thorough — the credit-audit trail, error sanitization, body-size enforcement, serverless settlement fix, and header stripping on the success path are all solid. These two low items are the only things I can find still open.
Third review round addressed in c91a33d. All 5 substantive findings + the 4 test gaps fixed.

New tests (6, bringing total to 42):

- 413 with no Content-Length (chunked encoding bypass test)
- 400 invalid_json on malformed JSON body
- Synchronous settle for non-streaming passthrough
- Non-streaming JSON passthrough body forwarded unchanged
- Sanitized gateway error envelope (hostile fields stripped)
- Non-string `instructions: 42` routes to passthrough

#428 has been rebased onto this branch — clean conflict resolution combining the streaming wrapper (from #428) with the streaming/non-streaming dispatch (from this round). 59/59 passing on the rebased #428.

Test totals (this PR): 42/42 passing (was 36 + 6 new).
- CodingAgentSettingsSection: move updateConfig out of the setPrefs state updater into a debounced useEffect (400ms). React state updaters must be pure; the old path fired twice per keystroke in Strict Mode and persisted 40+ partial-key snapshots while a user typed an API key. The debounce coalesces rapid edits into a single POST so a mid-flight failure can no longer leave milady.json with a partial credential.
- coding-agent-bridge: drop the unused duplicate of the POST /api/coding-agents/auth/:agent handler. The canonical copy lives in server.ts's private handleCodingAgentsFallback — the bridge export was never wired in, so keeping both would silently drift.
- agent-orchestrator-compat: stop treating Codex as auth-ready under cloudReady. elizaOS/cloud#427 (responses-stream reconciliation) has not deployed, so Codex-through-cloud would fail at runtime with no explanation. Restore once cloud#427/#428 ship.
- credentials: call readClaudeCodeOAuthBlob() once per provider row and destructure, instead of hitting it twice (the second call shelled out to `security` on macOS on every status poll).

deploy/cloud-agent-template/package.json pin is left in place: the release contract check requires exact versions there (the template is published as a standalone deploy artifact, not a workspace member, so `workspace:*` won't resolve against any registry).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
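The coalescing behavior in the first bullet rests on a plain debounce: rapid successive calls collapse into one trailing invocation after a quiet period. This is a generic sketch of that primitive, not the React hook from the commit; the 400 ms window is the value named above.

```typescript
// Generic trailing-edge debounce; each call resets the timer, so only the
// last call within a quiet window actually fires.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  ms: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}
```

With persistence behind this, typing a 40-character API key produces one POST with the final value instead of 40 partial-key snapshots, which is what closes the partial-credential window described above.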
Summary
Follow-up to #425. The tool format normalization in #425 handled flat function-tools but can't express the full set of Responses-API-only features that gpt-5.x clients (Codex CLI, AI SDK v5 Responses transport) send. This PR adds a native passthrough path that proxies the raw body to Vercel AI Gateway's `/v1/responses` endpoint unchanged.

### What #425 couldn't handle

- `instructions` and `input` fields
- `type: "custom"` tools (Codex's `apply_patch` with Lark grammar)
- `type: "web_search"`, `type: "local_shell"` built-in tools
- `include: ["reasoning.encrypted_content"]`
- `prompt_cache_key`, `parallel_tool_calls`, reasoning summaries

Collapsing these into Chat Completions format drops information and produces the same class of `tools.0.function undefined` error #425 was meant to fix.

### What this PR adds

- `AIProvider.responses(body, options)` — optional method on the provider interface for native Responses-API passthrough.
- `vercel-gateway.ts` — implements `responses()` against Vercel AI Gateway's `https://ai-gateway.vercel.sh/v1/responses` passthrough. No body transformation.
- `/v1/responses` route — new early branch that detects native Responses-API payloads and routes them through the passthrough. Detection triggers on: presence of `instructions`, any tool with `type !== "function"` (custom, web_search, local_shell, image_generation, …), or a `gpt-5` model id.
- Model prefix rewrite: Vercel AI Gateway requires `provider/model` format. Codex sends bare `gpt-5.4`, so `normalizeGatewayModelId()` prefixes `openai/` for bare OpenAI ids, `anthropic/` for `claude-*`, `google/` for `gemini-*`. Caller body is cloned, not mutated.
- `ChatCompletionsTool` and `ChatCompletionsToolChoice` exported from `providers/types.ts` and shared with `OpenAIChatRequest`, eliminating structural drift.
- Credits reserved via `estimateRequestCost` (50% safety buffer applies) and settled on completion. Accurate reconciliation from the `response.completed` SSE event is a TODO.

### Verified against production upstream
Before opening this PR we ran a live HTTP proxy (`scripts/codex-sniff.ts` in the milady repo) between Codex CLI and `https://ai-gateway.vercel.sh/v1/responses` with a real gateway key. Captured and confirmed: `openai/gpt-5.4`, `instructions`, `input`, 12 tools (10 function + 1 custom `apply_patch` with Lark grammar + 1 web_search), `include: ["reasoning.encrypted_content"]`, `reasoning: {effort: "medium"}`, `prompt_cache_key`, `stream: true` → 200 OK SSE stream.

Cross-referenced Vercel AI Gateway Responses API docs and Codex source (`codex-rs/core/src/client_common.rs` ToolSpec enum + `codex-rs/core/src/client.rs:build_responses_request`) to confirm the payload shape is stable.

### Test coverage
28 tests in `packages/tests/unit/api/v1-responses-route.test.ts` (15 from #425 still passing + 13 new):

**Routing**

- `instructions` + flat/custom/web_search tools → passthrough
- `openai/gpt-5.4` → passthrough

**Model prefix rewrite**

- `gpt-5.4` → `openai/gpt-5.4`
- `claude-sonnet-4.6` → `anthropic/claude-sonnet-4.6`
- `gemini-2.5-pro` → `google/gemini-2.5-pro`

**Payment path**

- Upstream 4xx forwarded with credit refund (`actualCost: 0`)
- Provider without `.responses()` → 400 `unsupported_provider`

**Passthrough body integrity**

- `instructions` field forwarded

### Follow-ups (not blocking)
- Retry/backoff in `vercel-gateway.responses()`. Hit Vercel's free-tier throttle during testing and Codex's own retry budget exhausted. Exponential backoff inside the gateway wrapper would make this invisible to clients.
- Accurate cost reconciliation from the `response.completed` SSE event (currently settles to reserved estimate).
- Rate limiting: this route uses `RELAXED` (200/min). Agentic sessions can burn that in a burst — consider an `AGENTIC` preset or raising to 600/min on this route.
### Test plan

- `bun test packages/tests/unit/api/v1-responses-route.test.ts` — 28/28 passing
- `PARALLAX_LLM_PROVIDER=cloud` flow

🤖 Generated with Claude Code