
feat: forward cache routing headers for Responses API prompt caching#165

Merged
mcowger merged 2 commits into mcowger:main from zicochaos:feat/forward-cache-routing-headers
Apr 15, 2026

Conversation

Contributor

@zicochaos zicochaos commented Apr 15, 2026

Summary

  • forward session_id and x-client-request-id headers from incoming Responses API requests to upstream providers for prompt cache routing
  • add /v1/codex/responses route alias so Codex CLI and pi-agent clients can connect through plexus without URL mismatch
  • fall back to prompt_cache_key from the request body when client headers are absent

Details

  • OpenAI's Codex backend uses session_id and x-client-request-id headers to route requests to the same backend server that holds cached prompt prefixes. Without these headers, prompt_cache_key in the body is ineffective and every request is a full-price cache miss.
  • The official Codex CLI (codex-rs/codex-api/src/requests/headers.rs) sends these headers unconditionally. Third-party clients (pi-agent, hermes-agent) also send them. Plexus was silently dropping them in setupHeaders().
  • Cache routing headers are captured in the responses.ts route handler, attached to UnifiedChatRequest.cacheRoutingHeaders, and forwarded by setupHeaders() in dispatcher.ts. When client headers are absent, prompt_cache_key from the body is used as a fallback.
  • The /v1/codex/responses alias is needed because pi-agent's openai-codex-responses wire API appends /codex/responses to the base URL. The handler is shared -- no code duplication.
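The capture-with-fallback step described above can be sketched as a small pure function. This is an illustrative sketch, not the actual plexus code: the function name, the exact header names consulted, and the assumption that prompt_cache_key substitutes for session_id are inferred from the PR description.

```typescript
// Sketch of the header-capture logic from the PR description.
// The real implementation lives in routes/inference/responses.ts;
// names here are illustrative.
export interface CacheRoutingHeaders {
  session_id?: string;
  "x-client-request-id"?: string;
}

export function extractCacheRoutingHeaders(
  headers: Record<string, string | undefined>,
  body: { prompt_cache_key?: string }
): CacheRoutingHeaders | undefined {
  // Assumption: when the session_id header is missing, the body's
  // prompt_cache_key serves the same routing role.
  const sessionId = headers["session_id"] ?? body.prompt_cache_key;
  const requestId = headers["x-client-request-id"];
  if (!sessionId && !requestId) return undefined; // nothing to forward

  const result: CacheRoutingHeaders = {};
  if (sessionId) result.session_id = sessionId;
  if (requestId) result["x-client-request-id"] = requestId;
  return result;
}
```

Returning undefined when neither source is present lets the dispatcher skip the forwarding step entirely for clients that send no routing hints.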

Files changed

File Change
packages/backend/src/types/unified.ts add cacheRoutingHeaders to UnifiedChatRequest interface
packages/backend/src/routes/inference/responses.ts capture incoming session_id/x-client-request-id headers with prompt_cache_key fallback; extract handler for dual-route registration; add /v1/codex/responses alias
packages/backend/src/services/dispatcher.ts forward cacheRoutingHeaders to upstream in setupHeaders()
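The forwarding side in setupHeaders() might look roughly like this. A simplified sketch under assumptions: the UnifiedChatRequest shape is reduced to the fields relevant here, and the base-header construction is invented for illustration.

```typescript
// Simplified sketch of forwarding cacheRoutingHeaders in the dispatcher.
// Field and function names follow the PR description, not exact source.
interface UnifiedChatRequest {
  model: string;
  cacheRoutingHeaders?: Record<string, string>;
}

export function setupHeaders(
  request: UnifiedChatRequest,
  apiKey: string
): Record<string, string> {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    authorization: `Bearer ${apiKey}`,
  };
  // Forward cache routing headers verbatim so the upstream provider can
  // route the request to the server holding the cached prompt prefix.
  if (request.cacheRoutingHeaders) {
    Object.assign(headers, request.cacheRoutingHeaders);
  }
  return headers;
}
```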

Configuration note

No config schema changes needed. Providers that support the Responses API can already be configured using the existing api_base_url record format:

clawbay-direct:
  api_key: "ca_v1.YOUR_TOKEN"
  api_base_url:
    responses: "https://api.theclawbay.com/backend-api/codex"
  models:
    gpt-5.4:
      access_via: ["responses"]

With this configuration, getProviderTypes() returns ['responses'], the incoming request type matches directly, the pass-through optimization activates, and the cache routing headers are forwarded.

OAuth-based providers (api_base_url: oauth://) also work -- the OAuth path handles codex responses natively.
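The type-matching step described above can be sketched roughly as follows. Hypothetical names throughout; the actual provider-matching logic in plexus is more involved than this.

```typescript
// Rough sketch: a provider's supported API types are the keys of its
// api_base_url record, and a request can pass through untranslated
// (headers forwarded as-is) when the incoming type is among them.
type ApiType = "responses" | "chat" | "messages";

function getProviderTypes(apiBaseUrl: Record<string, string>): ApiType[] {
  return Object.keys(apiBaseUrl) as ApiType[];
}

export function canPassThrough(
  incoming: ApiType,
  apiBaseUrl: Record<string, string>
): boolean {
  return getProviderTypes(apiBaseUrl).includes(incoming);
}
```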

Verification

  • bun run typecheck -- zero type errors from changed files (all errors are pre-existing in test files, frontend, and index.ts bundle types)
  • cd packages/backend && bun test src/routes/inference/__tests__/auth.test.ts src/services/__tests__/dispatcher-failover.test.ts -- 35 tests pass, 0 failures
  • deployed to local plexus instance at 192.168.66.12:4000 against live config/SQLite:
    • /v1/codex/responses accepts requests and returns responses correctly
    • pi-agent with openai-codex-responses wire API (baseUrl http://192.168.66.12:4000/v1) successfully connects through plexus to theclawbay OAuth provider
    • session_id and x-client-request-id headers forwarded to upstream
    • no regression on existing /v1/responses endpoint -- pass-through optimization active for responses-to-responses routing

@zicochaos
Contributor Author

The claude-review check failure is unrelated to this PR -- it's an OIDC token permissions issue in the Claude Code Review workflow (added in #164). The error is:

Could not fetch an OIDC token. Did you remember to add `id-token: write` to your workflow permissions?

The workflow likely needs id-token: write in its permissions block and/or the ANTHROPIC_API_KEY secret configured for fork PRs.

Owner

@mcowger mcowger left a comment


Overall looks good, but it appears you accidentally captured some changes to the workflow definition.

Can you remove those (thanks for the note about the token fix) and then I'll merge?

Sebastian Bochna and others added 2 commits April 15, 2026 08:47
Clients (Codex CLI, pi-agent, hermes-agent) send session_id and
x-client-request-id headers for server-side cache routing. Without
these headers, upstream providers (theclawbay, OpenAI) cannot route
requests to the same backend server that holds cached prompt prefixes,
causing every request to be a cache miss.

Changes:
- types/unified.ts: add cacheRoutingHeaders to UnifiedChatRequest
- routes/inference/responses.ts: capture session_id and
  x-client-request-id from incoming request headers, with fallback
  to prompt_cache_key from the body
- services/dispatcher.ts: forward cache routing headers to upstream
  in setupHeaders()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Codex CLI and pi-agent with openai-codex-responses wire API send
requests to /v1/codex/responses. Register the same handler on both
/v1/responses and /v1/codex/responses so codex clients can go through
plexus without URL mismatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@zicochaos zicochaos force-pushed the feat/forward-cache-routing-headers branch from 6d8eb96 to 543b95c on April 15, 2026 07:48
@zicochaos
Contributor Author

Removed the workflow file changes -- force-pushed with only the 3 intended files:

  • packages/backend/src/types/unified.ts
  • packages/backend/src/routes/inference/responses.ts
  • packages/backend/src/services/dispatcher.ts

Those workflow files came from a sync merge with upstream before branching. Sorry about that.

@mcowger mcowger merged commit aaac2c6 into mcowger:main Apr 15, 2026
1 check failed