A small OpenAI-compatible HTTP proxy that forwards /v1/chat/completions
traffic to the Claude Agent SDK,
so IDEs and tools that only speak the OpenAI format can use a Claude subscription.
Built primarily for Perception.cx but works with any OpenAI-compatible client (Continue, Cursor, etc.).

    npm install
    cp .env.example .env   # optional; defaults are sane
    node server.js

The proxy listens on `:4001` by default. Point your IDE at
`http://localhost:4001/v1` and pick `sonnet`, `opus`, or `haiku` as the model.
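
For clients you script yourself, a request to the proxy might look like the sketch below. The base URL and model aliases come from the text above; `buildChatRequest` and `ask` are hypothetical helper names, and the body follows the standard OpenAI chat-completions shape. Whether the proxy expects an `Authorization` header is not stated here, so none is sent.

```javascript
// Hypothetical helper: builds the fetch() arguments for one chat turn.
// Base URL and model alias defaults are taken from the README text.
function buildChatRequest(prompt, { model = 'sonnet', baseUrl = 'http://localhost:4001/v1', stream = false } = {}) {
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        stream,
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Usage: needs the proxy running and Node 18+ (global fetch).
async function ask(prompt) {
  const { url, init } = buildChatRequest(prompt);
  const res = await fetch(url, init);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Calling `ask('hello')` returns the assistant text of a non-streaming response.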
- OpenAI-compatible: `/v1/chat/completions` (streaming + non-streaming), `/v1/models`. Tool calls are translated between the OpenAI JSON format and Claude's native tool format in both directions.
- Persistent SDK sessions: an LRU-keyed pool reuses warm Claude sessions across turns. Same conversation prefix + tool list = cache hit, dropping warm-turn latency by 50–70%. Pool size and idle TTL are tunable.
- Per-project context: hijacks the IDE's `update_notes` tool and routes it to `<workspace>/.proxy/context.md`. Context is workspace-scoped, so notes for one game don't bleed into a different project. Workspace root is inferred from absolute paths in tool calls and tool results.
- Auto-fallback: on `529` / `overloaded_error` / `rate_limit_error`, idle timeout, or the SDK's ~120K-token "Usage Policy" refusal, the proxy walks `opus → sonnet → haiku` and retries on a fresh AbortController. The IDE sees a status note in the stream.
- Thinking visibility: surfaces extended-thinking deltas as `<think>...</think>` blocks (default), `reasoning_content` deltas (DeepSeek/Cursor convention), or both.
- Image input: data URLs and `image_url` fields on user messages are decoded and passed through as native Claude image blocks.
- Built-in FS tools across every drive: `Read` / `Glob` / `Grep` are executed inside the proxy via an internal text-tool protocol, so the model can search the whole filesystem (every drive root on Windows, `/` on Unix) instead of being capped at the IDE workspace. Optional `Bash`, `Write` / `Edit`, and `WebSearch` tools are registered as native SDK tools when enabled.
- 1M-context beta on by default: the `context-1m-2025-08-07` beta is enabled out of the box so long conversations don't get clipped on plans where Opus isn't auto-upgraded. Toggle with `CLAUDE_1M_CONTEXT=0` or override the full list with `CLAUDE_BETAS`.
- Input-size guards: per-tool-result, per-message, and total-prompt char caps clip oversized history (e.g. a 5MB tool dump) before it reaches the SDK, preserving the tail of the conversation so the active query stays intact.
- Two-tier timeouts: a hard wall-clock cap plus an idle-token cap. The idle cap resets on every streaming delta and catches silently stuck upstreams that would otherwise hang for minutes.
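
The two-tier timeout in the last bullet could be sketched roughly as follows. This is an illustration of the idea, not the code in server.js: `createTwoTierTimeout`, `touch`, and `clear` are hypothetical names.

```javascript
// Sketch: one hard wall-clock timer plus an idle timer that is re-armed
// every time a streaming delta arrives. All names here are hypothetical.
function createTwoTierTimeout({ hardMs, idleMs, onTimeout }) {
  let done = false;
  let idleTimer = null;

  const stop = () => {
    done = true;
    clearTimeout(hardTimer);
    clearTimeout(idleTimer);
  };
  const fire = (reason) => {
    if (done) return;
    stop();
    onTimeout(reason); // reason is 'hard' or 'idle'
  };
  const hardTimer = setTimeout(() => fire('hard'), hardMs);

  // Call on every streaming delta: pushes the idle deadline forward.
  const touch = () => {
    if (done) return;
    clearTimeout(idleTimer);
    idleTimer = setTimeout(() => fire('idle'), idleMs);
  };

  touch(); // arm the idle timer right away
  return { touch, clear: stop };
}

// Usage sketch: abort the upstream call when either cap is hit.
//   const controller = new AbortController();
//   const timers = createTwoTierTimeout({
//     hardMs: 600000, idleMs: 120000,
//     onTimeout: () => controller.abort(),
//   });
//   for each delta: timers.touch();  when finished: timers.clear();
```

The hard timer is never reset, so a stream that keeps trickling tokens still ends at the wall-clock cap.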
All configuration is via environment variables. See .env.example
for the full annotated list with defaults.
Core
| Variable | Default | Notes |
|---|---|---|
| `PORT` | `4001` | HTTP listen port |
| `CLAUDE_MODEL` | `sonnet` | default alias when the request doesn't specify |
| `CLAUDE_THINKING` | `high` | `off` / `low` / `medium` / `high` / `max` |
| `STREAM_THINKING` | `tags` | `off` / `tags` / `reasoning_content` / `both` |
| `CLAUDE_FALLBACK` | `1` | `0` disables the opus→sonnet→haiku fallback |
| `INCLUDE_GLOBAL_NOTES` | `1` | `0` strips the IDE's global notes block entirely |
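
The fallback walk controlled by `CLAUDE_FALLBACK` can be pictured with the sketch below. It only covers the overload and rate-limit triggers; the idle-timeout and refusal triggers are left out. `runModel` is an assumed caller-supplied function `(model, signal) => Promise`, and the error fields `status` and `type` are assumptions about how upstream errors are surfaced.

```javascript
// Hypothetical sketch of the opus -> sonnet -> haiku walk.
const FALLBACK_ORDER = ['opus', 'sonnet', 'haiku'];

// Assumed error shape: { status: 529 } or { type: 'overloaded_error' | 'rate_limit_error' }.
function isRetryable(err) {
  if (!err) return false;
  return err.status === 529 || err.type === 'overloaded_error' || err.type === 'rate_limit_error';
}

async function runWithFallback(startModel, runModel, { enabled = true } = {}) {
  const idx = FALLBACK_ORDER.indexOf(startModel);
  // Unknown aliases, or fallback disabled: try only the requested model.
  const chain = enabled && idx !== -1 ? FALLBACK_ORDER.slice(idx) : [startModel];
  let lastErr;
  for (const model of chain) {
    const controller = new AbortController(); // fresh controller per attempt
    try {
      return { model, result: await runModel(model, controller.signal) };
    } catch (err) {
      lastErr = err;
      if (!isRetryable(err)) throw err; // non-retryable errors surface immediately
    }
  }
  throw lastErr; // every model in the chain failed
}
```

Starting from `sonnet` the chain is `sonnet → haiku`; it never walks back up to `opus`.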
Sessions & timeouts
| Variable | Default | Notes |
|---|---|---|
| `CLAUDE_SESSIONS` | `1` | `0` disables the LRU session pool (one-shot) |
| `CLAUDE_SESSION_MAX` | `20` | warm sessions held in the pool |
| `CLAUDE_SESSION_TTL_MS` | `1800000` | per-session idle TTL |
| `CLAUDE_REQUEST_TIMEOUT_MS` | `600000` | hard wall-clock cap per request |
| `CLAUDE_IDLE_TIMEOUT_MS` | `120000` | abort if no streaming token within this window |
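
To make the pool settings concrete, here is a minimal LRU-with-TTL sketch. It is not the proxy's implementation: `sessionKey` and `SessionPool` are hypothetical names, and hashing the conversation prefix together with the sorted tool names is an assumption about how a "same prefix + tool list" key could be derived.

```javascript
const crypto = require('node:crypto');

// Hypothetical key: SHA-256 over the conversation prefix and the tool names.
// Sorting the names (an assumption) makes the key independent of tool order.
function sessionKey(prefixMessages, toolNames) {
  return crypto
    .createHash('sha256')
    .update(JSON.stringify([prefixMessages, [...toolNames].sort()]))
    .digest('hex');
}

class SessionPool {
  constructor({ max = 20, ttlMs = 1800000, now = Date.now } = {}) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // Map keeps insertion order, used as LRU order
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key);
    if (this.now() - entry.lastUsed > this.ttlMs) return undefined; // idle too long
    entry.lastUsed = this.now();
    this.entries.set(key, entry); // re-insert as most recently used
    return entry.session;
  }

  set(key, session) {
    this.entries.delete(key);
    this.entries.set(key, { session, lastUsed: this.now() });
    while (this.entries.size > this.max) {
      // The first key in iteration order is the least recently used.
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}
```

The defaults mirror `CLAUDE_SESSION_MAX` and `CLAUDE_SESSION_TTL_MS` from the table; the injectable `now` exists only to make expiry testable.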
Built-in SDK tools
Read / Glob / Grep are executed inside the proxy (this is what gives
the model access to drives outside the IDE workspace); the rest are
registered with the SDK as native tools when enabled.
| Variable | Default | Notes |
|---|---|---|
| `ENABLE_FS_TOOLS` | `1` | `0` drops Read/Glob/Grep |
| `ENABLE_BASH_TOOL` | `0` | `1` adds Bash (shell exec) |
| `ENABLE_WRITE_TOOLS` | `0` | `1` adds Write + Edit |
| `ENABLE_WEB_SEARCH` | `0` | `1` adds the SDK WebSearch tool |
| `EXTRA_SDK_TOOLS` | (empty) | comma-separated extra SDK tool names |
| `FS_ADDITIONAL_DIRS` | (all drives) | comma-separated roots; defaults to every drive on Windows, `/` elsewhere |
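
As an illustration of how these flags combine, the sketch below derives the two tool groups from an env-style object. `resolveTools` is a hypothetical name, and treating only the literal string `1` as "on" is an assumption; the proxy's real parsing may be more lenient.

```javascript
// Hypothetical sketch: map the ENABLE_* flags to tool names.
// `internal` tools run inside the proxy; `native` ones are registered with the SDK.
function resolveTools(env = process.env) {
  const on = (name, dflt) => (env[name] ?? dflt) === '1';

  const internal = on('ENABLE_FS_TOOLS', '1') ? ['Read', 'Glob', 'Grep'] : [];

  const native = [];
  if (on('ENABLE_BASH_TOOL', '0')) native.push('Bash');
  if (on('ENABLE_WRITE_TOOLS', '0')) native.push('Write', 'Edit');
  if (on('ENABLE_WEB_SEARCH', '0')) native.push('WebSearch');

  // EXTRA_SDK_TOOLS is a comma-separated list; blanks are ignored.
  const extra = (env.EXTRA_SDK_TOOLS ?? '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);

  return { internal, native: [...native, ...extra] };
}
```

With no variables set, only the three read-only filesystem tools are active.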
Internal FS tool resource caps
| Variable | Default | Notes |
|---|---|---|
| `INTERNAL_READ_MAX_BYTES` | `2097152` | max bytes returned by one Read |
| `INTERNAL_GLOB_MAX` | `1000` | max paths returned by Glob |
| `INTERNAL_GREP_MAX` | `200` | max matches returned by Grep |
| `INTERNAL_RESULT_MAX_CHARS` | `100000` | char cap on any single tool result |
| `INTERNAL_WALK_DEPTH` | `12` | max recursion depth |
| `MAX_INTERNAL_TURNS` | `20` | sequential internal-tool calls per request |
| `INTERNAL_TOOL_TIMEOUT_MS` | `60000` | per-call timeout |
Input-size guards
| Variable | Default | Notes |
|---|---|---|
| `MAX_TOOL_RESULT_CHARS` | `100000` | per-tool-result body cap (≈ 25K tokens) |
| `MAX_HISTORY_MESSAGE_CHARS` | `200000` | per non-tool history message cap |
| `MAX_TOTAL_PROMPT_CHARS` | `3000000` | total chars across all messages (≈ 750K tokens) |
| `PROTECT_LAST_N_MESSAGES` | `2` | never clip the tail (active query stays intact) |
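
The guards can be pictured as two passes over the message list, as in the sketch below. This is a simplified illustration under stated assumptions: `clipMessages` and the clip marker are hypothetical, only string `content` is handled, the per-tool-result cap is omitted, and dropping the oldest unprotected messages to meet the total cap is an assumed strategy (a real implementation would likely keep the system prompt).

```javascript
// Hypothetical sketch of the per-message and total-prompt caps.
function clipMessages(messages, { maxMessageChars = 200000, maxTotalChars = 3000000, protectLastN = 2 } = {}) {
  const marker = '\n[...clipped...]';
  const protectedFrom = Math.max(0, messages.length - protectLastN);
  const size = (m) => (typeof m.content === 'string' ? m.content.length : 0);

  // Pass 1: cap each unprotected message; the protected tail is left alone.
  const capped = messages.map((m, i) => {
    if (i >= protectedFrom || size(m) <= maxMessageChars) return m;
    return { ...m, content: m.content.slice(0, maxMessageChars) + marker };
  });

  // Pass 2: drop the oldest unprotected messages until the total fits.
  let total = capped.reduce((n, m) => n + size(m), 0);
  let start = 0;
  while (total > maxTotalChars && start < protectedFrom) {
    total -= size(capped[start]);
    start += 1;
  }
  return capped.slice(start);
}
```

Because both passes stop at `protectedFrom`, the last `protectLastN` messages always come through unchanged.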
SDK betas
| Variable | Default | Notes |
|---|---|---|
| `CLAUDE_1M_CONTEXT` | `1` | `0` disables the 1M-context beta |
| `CLAUDE_BETAS` | `context-1m-2025-08-07` | comma-separated; overrides `CLAUDE_1M_CONTEXT` |
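
The precedence between the two variables could be expressed as below. `resolveBetas` is a hypothetical name, and treating an empty `CLAUDE_BETAS` as "not set" is an assumption.

```javascript
// Hypothetical sketch: CLAUDE_BETAS wins when it lists anything;
// otherwise CLAUDE_1M_CONTEXT decides whether the 1M beta is included.
function resolveBetas(env = process.env) {
  const explicit = (env.CLAUDE_BETAS ?? '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
  if (explicit.length > 0) return explicit;
  return (env.CLAUDE_1M_CONTEXT ?? '1') === '0' ? [] : ['context-1m-2025-08-07'];
}
```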
Observability
| Variable | Default | Notes |
|---|---|---|
| `VERBOSE` | `0` | `1` enables `[proxy]` logs (also `--verbose`) |
| `LIVE` | `0` | `1` enables the colored live request feed (`--live`) |
| `LOG_REQUESTS` | `0` | `1` dumps every request body to `logs/requests/` |
| `PROXY_TEST_HOOKS` | `0` | `1` exposes internals on `module.exports` for tests |

    POST /v1/chat/completions     # OpenAI-compatible, streaming or one-shot
    GET  /v1/models               # lists sonnet, opus, haiku, plus a few aliases
    GET  /                        # health
    GET  /debug/status            # loopback-only; runtime config + session count
    GET  /debug/last-exchange     # loopback-only; the last full request/response
    GET  /debug/last-request      # loopback-only; the last logged request body
    GET  /debug/exchanges         # loopback-only; recent exchange ring
    GET  /debug/tools             # loopback-only; tool catalog from the last request
    GET  /debug/workspace         # loopback-only; resolved per-project workspace cache
    POST /debug/workspace/reset   # loopback-only; clear the workspace cache

Debug endpoints are bound to loopback only and reject non-127.0.0.1 callers.
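
A loopback check of this kind might look like the sketch below. `isLoopback` and `rejectIfRemote` are hypothetical names; accepting `::1` and IPv4-mapped addresses in addition to `127.0.0.1`, and answering with a 403 JSON body, are assumptions rather than documented behavior.

```javascript
// Hypothetical sketch: decide whether a socket address is loopback.
function isLoopback(remoteAddress) {
  if (!remoteAddress) return false;
  const addr = remoteAddress.replace(/^::ffff:/, ''); // unwrap IPv4-mapped IPv6
  return addr === '127.0.0.1' || addr === '::1';
}

// Returns true when the request was rejected and the handler should stop.
// `req` and `res` are the usual node:http request and response objects.
function rejectIfRemote(req, res) {
  if (isLoopback(req.socket.remoteAddress)) return false;
  res.writeHead(403, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ error: 'debug endpoints are loopback-only' }));
  return true;
}
```

A handler for `/debug/*` would call `rejectIfRemote(req, res)` first and return early when it yields `true`.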
Whenever the model invokes the IDE's update_notes tool, the proxy intercepts
the content and writes it to <workspace_root>/.proxy/context.md instead of
the IDE's single global notes pool. On the next request, the proxy strips the
IDE's === YOUR WORKING NOTES === block from the system prompt and replaces
it with the project's own context.
Workspace root is inferred from any absolute file path the model or IDE has
mentioned. Detection is sticky across turns, since Perception elides
tool_call/tool_result from history each turn. If the wrong root is picked
up, POST /debug/workspace/reset clears the cache.
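
One way to picture the inference is sketched below. The heuristic shown, taking the deepest directory shared by every absolute path seen so far, is an assumption made for illustration; the proxy's actual rule is not described here. `extractAbsolutePaths` and `inferWorkspaceRoot` are hypothetical names, and the sticky cache is left out.

```javascript
// Matches Windows (C:\dir\file) and Unix (/dir/file) absolute paths.
// The lookbehind skips URLs and relative paths such as src/file.js.
const ABS_PATH_RE = /(?<![\w.:\/\\-])(?:[A-Za-z]:[\\\/]|\/)[^\s"'<>|*?:]+/g;

function extractAbsolutePaths(text) {
  // Normalize backslashes so both platforms compare the same way.
  return (text.match(ABS_PATH_RE) || []).map((p) => p.replace(/\\/g, '/'));
}

// Assumed heuristic: longest common directory of all mentioned paths.
function inferWorkspaceRoot(texts) {
  const dirs = texts
    .flatMap(extractAbsolutePaths)
    .map((p) => p.split('/').slice(0, -1)); // drop the file name
  if (dirs.length === 0) return null;

  let common = dirs[0];
  for (const segs of dirs.slice(1)) {
    let i = 0;
    while (i < common.length && i < segs.length && common[i] === segs[i]) i += 1;
    common = common.slice(0, i);
  }
  return common.join('/') || null; // null when nothing useful is shared
}
```

Returning `null` for paths that share nothing (for example, different drives) leaves any previously cached root in place.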
The INCLUDE_GLOBAL_NOTES=0 env var drops the IDE's globals entirely;
otherwise they're re-emitted as a clearly-labelled read-only "GLOBAL NOTES"
reference block alongside per-project notes.

    npm test

Boots the proxy on `:4099`, runs 28 unit + integration cases (message
conversion, tool-call round-trip, image pass-through, usage mapping, hashing,
session reuse, streaming, fallback). Requires a working Claude subscription
since some tests hit the live API.

    server.js         the whole proxy (one file, ~2100 lines)
    .env.example      every configurable env var
    package.json      start | dev | test scripts
    tests/            integration test harness
    logs/requests/    request dumps when LOG_REQUESTS=1 (rotated, last 50)