
systems thinking proxy

Nik Anand edited this page Jun 15, 2026 · 3 revisions

Thinking proxy

Active contributors: Ran, Nik

Purpose

The thinking proxy is a thin HTTP proxy over raw TCP that listens on localhost:8317 and is the endpoint Droid CLI connects to. It is built directly on Apple's Network.framework (NWListener / NWConnection) rather than a high-level HTTP server, so it can stream large request and response bodies and edit raw bytes without re-serializing JSON.

Its job is to receive each HTTP request, apply a small, closed set of mutations, and forward the request to the bundled cli-proxy-api backend on 127.0.0.1:8318 (or, for Cursor models, to the Cursor API). Reasoning effort is owned by Droid CLI — the proxy forwards whatever reasoning/thinking values the client sends and does not inject them.

The whole proxy lives in src/Sources/ThinkingProxy.swift. Claude thinking-block stripping is delegated to src/Sources/ClaudeThinkingBlockSanitizer.swift, which is covered by src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift.

Directory layout

src/Sources/ThinkingProxy.swift                                  # the proxy (listener, parsing, mutations, forwarding)
src/Sources/ClaudeThinkingBlockSanitizer.swift                   # strips stale Claude thinking/redacted_thinking blocks
src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift  # sanitizer unit tests
src/Tests/CLIProxyMenuBarTests/ThinkingProxySonnetMaxThinkingTests.swift  # Sonnet 4.6 max-thinking transform tests

Key abstractions

| Type / function | File | Purpose |
| --- | --- | --- |
| ThinkingProxy | src/Sources/ThinkingProxy.swift | The whole proxy: listener lifecycle, request parsing, mutation pipeline, forwarding, response streaming. |
| start() / startListener(allowCustomBindAddress:) / stop() | src/Sources/ThinkingProxy.swift | Listener lifecycle; custom bind-address handling with fallback to all-interfaces on failure. |
| receiveNextChunk(from:accumulatedData:) | src/Sources/ThinkingProxy.swift | Iterative request accumulation honoring Content-Length. |
| processRequest(data:connection:) | src/Sources/ThinkingProxy.swift | The decision tree: routing, mutations, path rewrites, logging. |
| RequestJSONFields / inspectRequestJSONFields(in:) | src/Sources/ThinkingProxy.swift | Surgical extraction of model / service_tier / thinking locations without full JSON parse. |
| headersForForwarding(_:requestFields:) / shouldRequestVisibleClaudeThinking(_:) | src/Sources/ThinkingProxy.swift | Anthropic-Beta header rewriting for visible Claude thinking. |
| processOpenAIFastMode(jsonString:path:fields:) | src/Sources/ThinkingProxy.swift | Injects service_tier:"priority" for enabled GPT 5.x fast-mode models. |
| applySonnetMaxThinking(jsonString:fields:) | src/Sources/ThinkingProxy.swift | Converts a Sonnet 4.6 output_config.effort:max request to classic extended thinking and pins max_tokens. |
| rewriteAntigravityModelAlias(...) / rewriteCursorModelAlias(...) | src/Sources/ThinkingProxy.swift | Surgical model-name aliasing. |
| forwardRequest(...) / streamNextChunk(...) / finishStreaming(target:client:) | src/Sources/ThinkingProxy.swift | Forward to :8318 backend and stream the response back. |
| forwardToCursor(...) / loadCursorApiKey() / receiveCursorResponse(...) | src/Sources/ThinkingProxy.swift | Cursor model routing to cursor-api.standardagents.ai. |
| findObjectFieldLocations(...) / consumeJSONValue(...) / parseJSONStringToken(...) | src/Sources/ThinkingProxy.swift | Hand-written JSON scanner used for all surgical edits. |
| ClaudeThinkingBlockSanitizer.sanitize(_:) | src/Sources/ClaudeThinkingBlockSanitizer.swift | Strips stale assistant thinking / redacted_thinking blocks. |
| fileLog(_:) | src/Sources/ThinkingProxy.swift | Appends to /tmp/droidproxy-debug.log. |

How it works

Starting and stopping the listener

start() calls startListener(allowCustomBindAddress: true). The listener uses NWParameters.tcp with allowLocalEndpointReuse = true on port 8317. If AppPreferences.bindAddress is set to something other than 0.0.0.0, the listener is restricted to that address via requiredLocalEndpoint; otherwise it binds to all interfaces.

Because a user-supplied bind address can be malformed or unassigned to any local interface, the listener has a one-shot fallback: if the stateUpdateHandler reports .failed while useCustomBind is true, it cancels the failed listener and re-invokes startListener(allowCustomBindAddress: false) on a background queue, falling back to the default all-interfaces bind so a bad address can't leave the proxy permanently down. The same fallback runs if NWListener init throws. stop() cancels the listener under stateQueue and clears isRunning.

Receiving a request

handleConnection starts the connection and calls receiveRequest, which delegates to receiveNextChunk. That method reads up to 1 MiB per receive call and accumulates bytes. It locates the end of the headers with a binary match on \r\n\r\n (Data([13, 10, 13, 10])) instead of UTF-8-decoding the whole buffer on every chunk, which would make accumulation O(N²). Once the headers are present, it parses Content-Length (parseContentLength) and keeps scheduling reads while the received body is shorter than advertised and the stream is still open. When the body is complete (or the headers or stream are truncated), it calls processRequest.

receiveNextChunk re-schedules itself asynchronously via connection.receive's completion handler rather than recursing synchronously, so a long stream of chunks does not build up the call stack.
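The completeness check described above can be sketched with plain Foundation types. This is a minimal illustration, not the proxy's code: bodyStartOffset, contentLength(inHeaders:), and isRequestComplete are hypothetical names, and the real parseContentLength may treat malformed or missing headers differently.

```swift
import Foundation

/// Hypothetical helper: byte offset where the body starts, or nil if the
/// CRLFCRLF header terminator has not arrived yet.
func bodyStartOffset(in buffer: Data) -> Int? {
    let terminator = Data([13, 10, 13, 10]) // "\r\n\r\n"
    guard let range = buffer.range(of: terminator) else { return nil }
    return range.upperBound - buffer.startIndex
}

/// Hypothetical helper: parses Content-Length from the header bytes only,
/// so the (possibly large) body is never decoded.
func contentLength(inHeaders headerData: Data) -> Int? {
    let text = String(decoding: headerData, as: UTF8.self)
    for line in text.split(whereSeparator: \.isNewline) {
        let parts = line.split(separator: ":", maxSplits: 1)
        guard parts.count == 2,
              parts[0].trimmingCharacters(in: .whitespaces).lowercased() == "content-length"
        else { continue }
        return Int(parts[1].trimmingCharacters(in: .whitespaces))
    }
    return nil
}

/// True once the accumulated bytes contain the full advertised body.
/// Assumption: a missing Content-Length is treated as a zero-length body.
func isRequestComplete(_ buffer: Data) -> Bool {
    guard let bodyStart = bodyStartOffset(in: buffer) else { return false }
    let expected = contentLength(inHeaders: buffer.prefix(bodyStart)) ?? 0
    return buffer.count - bodyStart >= expected
}
```

A receive loop would call isRequestComplete on the accumulated buffer after each chunk and schedule another read while it returns false and the stream is still open.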

The processRequest decision tree

processRequest decodes the request to a string, splits the request line into method / path / version, collects headers preserving original casing, and finds the body after \r\n\r\n. It then walks a fixed decision tree:

graph TD
    A["processRequest"] --> E{"POST with body?"}
    E -->|no| K["forwardRequest to :8318"]
    E -->|yes| S["applySonnetMaxThinking<br/>(Sonnet 4.6 effort:max -> classic)"]
    S --> F["rewriteAntigravityModelAlias"]
    F --> G{"isCursorModel?"}
    G -->|yes| G1["gate on BETA_FLAG + cursor enabled,<br/>rewriteCursorModelAlias,<br/>forwardToCursor"]
    G -->|no| H["processOpenAIFastMode<br/>(inject service_tier)"]
    H --> I{"isClaudeModel?"}
    I -->|yes| I1["ClaudeThinkingBlockSanitizer.sanitize"]
    I -->|no| J["reasoningSummaryLog -> fileLog"]
    I1 --> J
    J --> L{"responses path AND<br/>OAuth Code Assist Gemini?"}
    L -->|yes| L1["rewrite /responses -> /chat/completions"]
    L -->|no| M["headersForForwarding<br/>(Anthropic-Beta rewrite)"]
    L1 --> M
    M --> K

Notable branches:

  • Sonnet 4.6 max-thinking conversion: applySonnetMaxThinking runs first, before any model-alias rewrite. See the subsection below.
  • Antigravity model alias rewrite: rewriteAntigravityModelAlias maps client-facing aliases (ag-c46s-thinking → claude-sonnet-4-6, ag-c46o-thinking → claude-opus-4-6-thinking) by surgically replacing just the model value.
  • Cursor model gating + rewrite + forward: when the model starts with cursor-, the proxy requires BETA_FLAG and the cursor provider to be enabled (isCursorEnabled), rewrites the alias (cursor-composer-2.5 → composer-2.5), then forwards directly to the Cursor API with forwardToCursor. Failures return 400.
  • OpenAI fast mode: processOpenAIFastMode injects service_tier:"priority"; see ../features/fast-mode.md.
  • Claude thinking-block sanitization: for Claude models, ClaudeThinkingBlockSanitizer.sanitize strips stale assistant thinking; see the subsection below.
  • Reasoning summary log: reasoningSummaryLog builds the REQUEST REASONING: line written to /tmp/droidproxy-debug.log.
  • Gemini path rewrite: for /v1/responses (and /api/v1/responses) on an OAuth Code Assist Gemini model, the path is rewritten to /v1/chat/completions.

What it does and does not mutate

The proxy applies only this closed set of mutations:

  1. Sonnet 4.6 max-thinking conversion (output_config.effort:max → classic extended thinking).
  2. Anthropic-Beta header rewrite for visible Claude thinking.
  3. service_tier:"priority" injection for enabled GPT 5.x fast-mode models.
  4. Antigravity and Cursor model-name alias rewrites.
  5. Claude thinking-block sanitization (stripping stale thinking / redacted_thinking).
  6. Path rewrite: OAuth Code Assist Gemini /v1/responses → /v1/chat/completions.

Apart from the Sonnet 4.6 max-thinking conversion, it does not inject reasoning, reasoning_effort, thinking, output_config, budget_tokens, generationConfig.thinkingConfig, or any other reasoning field. Those are owned entirely by Droid CLI and forwarded unchanged. See ../features/reasoning-and-models.md.

Sonnet 4.6 max-thinking conversion

Sonnet 4.6 exposes a max reasoning effort in Droid's selector, but Sonnet's adaptive thinking rejects output_config.effort:max upstream with HTTP 400 ("level max not supported"). applySonnetMaxThinking is the one mutation that rewrites a reasoning field, and only for this exact case.

When the request's model is claude-sonnet-4-6 and sonnetRequestsMaxEffort finds output_config.effort == "max", the proxy:

  1. Replaces (or, if absent, inserts after model) the thinking field with classic extended thinking: {"type":"enabled","budget_tokens":63999}.
  2. Pins max_tokens to 64000 (the model's output ceiling) so the budget stays strictly below it — budget_tokens must be less than max_tokens.

For lower efforts the transform returns nil and the request passes through untouched, still using adaptive thinking. The sonnetRequestsMaxEffort scan is scoped to the Sonnet path so non-Sonnet requests never pay for the extra output_config scan (which would otherwise defeat the routing scan's early-exit before the large messages array). Edits are surgical in-place value replacements or single insertions, which preserves JSON key ordering. applySonnetMaxThinking(in:) is the static test entry point exercised by src/Tests/CLIProxyMenuBarTests/ThinkingProxySonnetMaxThinkingTests.swift.
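The gating and the budget arithmetic can be sketched as a pure decision function. This is only an illustration under stated assumptions: SonnetMaxThinkingEdit and sonnetMaxThinkingEdit(model:effort:) are hypothetical names, and the real transform operates on the raw JSON string rather than on pre-extracted values.

```swift
/// Hypothetical value describing the two edits the transform would apply.
struct SonnetMaxThinkingEdit {
    let thinkingJSON: String // replacement for the "thinking" value
    let maxTokens: Int       // pinned "max_tokens" value
}

/// Returns the edits for a Sonnet 4.6 effort:max request, or nil when the
/// request should pass through untouched (other models, lower efforts).
func sonnetMaxThinkingEdit(model: String, effort: String?) -> SonnetMaxThinkingEdit? {
    guard model == "claude-sonnet-4-6", effort == "max" else { return nil }
    let maxTokens = 64_000     // output ceiling stated in the text above
    let budget = maxTokens - 1 // budget_tokens must be less than max_tokens
    return SonnetMaxThinkingEdit(
        thinkingJSON: #"{"type":"enabled","budget_tokens":\#(budget)}"#,
        maxTokens: maxTokens)
}
```

Returning nil for the pass-through case keeps the caller's fast path trivial: no edit object means the body bytes are forwarded exactly as received.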

Anthropic-Beta header rewriting

When a Claude request enables thinking, Anthropic emits only signed, empty thinking blocks unless the redact-thinking-2026-02-12 beta is dropped and the visible-thinking beta set is present. headersForForwarding gates on shouldRequestVisibleClaudeThinking, which is true only when the model is a Claude model (claude- or gemini-claude- prefix) and the request's thinking.type is enabled, adaptive, or auto.

When that holds, headersWithVisibleClaudeThinkingBetas:

  1. Collects every existing Anthropic-Beta value (comma-split) and appends Config.claudeVisibleThinkingBetas:
    static let claudeVisibleThinkingBetas = [
        "claude-code-20250219",
        "oauth-2025-04-20",
        "interleaved-thinking-2025-05-14",
        "context-management-2025-06-27",
        "prompt-caching-scope-2026-01-05",
        "structured-outputs-2025-12-15",
        "fast-mode-2026-02-01",
        "token-efficient-tools-2026-03-28"
    ]
  2. De-duplicates case-insensitively and drops redact-thinking-2026-02-12.
  3. Re-emits a single Anthropic-Beta header with the merged, comma-joined list.
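The three steps above can be sketched as a single merge function. This is a minimal sketch, not the proxy's implementation: mergedBetaHeaderValue is a hypothetical name, the beta list is shortened to two entries for illustration, and the exact join separator (with or without a space) is an assumption.

```swift
import Foundation

// Assumed stand-ins for the Config values; the real list is longer.
let visibleThinkingBetas = ["interleaved-thinking-2025-05-14", "oauth-2025-04-20"]
let redactedThinkingBeta = "redact-thinking-2026-02-12"

/// Merges every existing Anthropic-Beta header value with the visible-thinking
/// set, de-duplicating case-insensitively and dropping the redact beta.
func mergedBetaHeaderValue(existing: [String]) -> String {
    var seen = Set<String>()
    var merged: [String] = []
    // A client may send several header lines, each holding a comma-separated list.
    let incoming = existing.flatMap { $0.split(separator: ",").map(String.init) }
    for raw in incoming + visibleThinkingBetas {
        let beta = raw.trimmingCharacters(in: .whitespaces)
        let key = beta.lowercased()
        guard !beta.isEmpty, key != redactedThinkingBeta, seen.insert(key).inserted else {
            continue
        }
        merged.append(beta) // first spelling wins; order of arrival is kept
    }
    return merged.joined(separator: ",")
}
```

Keeping the client's values first and appending the proxy's set afterwards means a client-supplied beta is never reordered, only de-duplicated.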

Surgical JSON helpers

The proxy never round-trips request bodies through JSONSerialization, because that does not preserve the original object key order and would break Anthropic's prompt-cache matching (the cache key depends on exact byte ordering). Instead it uses a hand-written scanner that locates fields by String.Index range and edits the raw string in place.

Two performance details matter:

  • routingInspectionKeys vs reasoningLogInspectionKeys. inspectRequestJSONFields scans only the small routingInspectionKeys set (model, service_tier, thinking). Because findObjectFieldLocations early-exits as soon as it has found all requested keys, routing decisions usually finish before the scanner ever reaches the (potentially huge) messages array. The debug-only keys (reasoning, reasoning_effort, output_config, generationConfig) are scanned separately in reasoningSummaryLog, and only when about to log, so a missing optional key never forces routing to traverse messages.
  • The scanner primitives. findObjectFieldLocations walks an object's key/value pairs; consumeJSONValue advances past a scalar, string, or composite value (delegating nested {}/[] to consumeCompositeJSONValue, which tracks string/escape state and brace depth); parseJSONStringToken reads a quoted string honoring \ escapes. Each returns the String.Index range of the matched span, which callers use with replaceSubrange / insert to mutate exact bytes.
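To make the scanning idea concrete, here is a much-reduced sketch of a surgical scanner. It is not the proxy's code: endOfString, endOfValue, and valueRange(forKey:in:) are hypothetical names, it works on UTF-8 byte offsets instead of String.Index for brevity, it looks up one key at a time, and it compares keys without decoding escapes.

```swift
let quote = UInt8(ascii: "\""), backslash = UInt8(ascii: "\\")

/// Offset just past the closing quote of the string token opening at `start`,
/// honoring backslash escapes.
func endOfString(_ b: [UInt8], from start: Int) -> Int? {
    guard start < b.count, b[start] == quote else { return nil }
    var i = start + 1
    while i < b.count {
        if b[i] == backslash { i += 2; continue } // skip the escaped byte
        if b[i] == quote { return i + 1 }
        i += 1
    }
    return nil
}

/// Offset just past the JSON value that begins at `start`.
func endOfValue(_ b: [UInt8], from start: Int) -> Int? {
    guard start < b.count else { return nil }
    switch b[start] {
    case quote:
        return endOfString(b, from: start)
    case UInt8(ascii: "{"), UInt8(ascii: "["):
        var depth = 0
        var i = start
        while i < b.count {
            switch b[i] {
            case quote: // braces inside strings must not affect depth
                guard let end = endOfString(b, from: i) else { return nil }
                i = end
                continue
            case UInt8(ascii: "{"), UInt8(ascii: "["):
                depth += 1
            case UInt8(ascii: "}"), UInt8(ascii: "]"):
                depth -= 1
                if depth == 0 { return i + 1 }
            default:
                break
            }
            i += 1
        }
        return nil
    default: // number, true, false, or null: runs until a delimiter
        var i = start
        while i < b.count, !",}] \t\r\n".utf8.contains(b[i]) { i += 1 }
        return i
    }
}

/// Byte range of the value for `key` in the top-level object. Returns as soon
/// as the key is found, so later (possibly huge) fields are never scanned.
func valueRange(forKey key: String, in b: [UInt8]) -> Range<Int>? {
    func skipWS(_ i: inout Int) {
        while i < b.count, " \t\r\n".utf8.contains(b[i]) { i += 1 }
    }
    var i = 0
    skipWS(&i)
    guard i < b.count, b[i] == UInt8(ascii: "{") else { return nil }
    i += 1
    while true {
        skipWS(&i)
        guard let keyEnd = endOfString(b, from: i) else { return nil }
        let name = String(decoding: b[(i + 1)..<(keyEnd - 1)], as: UTF8.self)
        i = keyEnd
        skipWS(&i)
        guard i < b.count, b[i] == UInt8(ascii: ":") else { return nil }
        i += 1
        skipWS(&i)
        guard let valueEnd = endOfValue(b, from: i) else { return nil }
        if name == key { return i..<valueEnd }
        i = valueEnd
        skipWS(&i)
        guard i < b.count, b[i] == UInt8(ascii: ",") else { return nil }
        i += 1
    }
}

// Usage: replace only the model value; every other byte is left as sent.
var body = Array(#"{"model":"ag-c46s-thinking","stream":true}"#.utf8)
if let range = valueRange(forKey: "model", in: body) {
    body.replaceSubrange(range, with: Array(#""claude-sonnet-4-6""#.utf8))
}
// body now holds {"model":"claude-sonnet-4-6","stream":true}
```

Because the edit touches only the located range, key order and whitespace elsewhere in the body survive byte-for-byte, which is the property the prompt-cache argument above relies on.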

Claude thinking-block sanitizer

ClaudeThinkingBlockSanitizer.sanitize (in src/Sources/ClaudeThinkingBlockSanitizer.swift) removes stale assistant reasoning blocks of type thinking and redacted_thinking from the messages array before forwarding.

The core rule: the latest active tool-use turn keeps its thinking. latestAssistantIndexWithTrailingToolResults walks backward from the end of messages across a run of trailing user turns that contain a tool_result block (isToolResultTurn); if that run is immediately preceded by an assistant turn, that assistant index is preserved. Anthropic requires the thinking block to remain on the assistant turn whose tool calls are still being answered, so stripping it would break the request. Every other assistant turn has its thinking / redacted_thinking blocks removed.
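The backward walk can be sketched over a simplified message model. Block and Message are hypothetical stand-ins (the real sanitizer works on raw string ranges), and this sketch assumes that at least one trailing tool-result turn is required before an assistant index is preserved.

```swift
// Hypothetical, simplified model of a conversation.
struct Block { let type: String }
struct Message { let role: String; let content: [Block] }

/// A user turn that carries at least one tool_result block.
func isToolResultTurn(_ message: Message) -> Bool {
    message.role == "user" && message.content.contains { $0.type == "tool_result" }
}

/// Index of the assistant turn whose tool calls are still being answered,
/// or nil when no such turn exists.
func latestAssistantIndexWithTrailingToolResults(_ messages: [Message]) -> Int? {
    var i = messages.count - 1
    var sawToolResult = false
    // Walk backward across the run of trailing tool-result turns.
    while i >= 0, isToolResultTurn(messages[i]) {
        sawToolResult = true
        i -= 1
    }
    // The run must be immediately preceded by an assistant turn.
    guard sawToolResult, i >= 0, messages[i].role == "assistant" else { return nil }
    return i
}
```

Every assistant index other than the one returned here would have its thinking and redacted_thinking blocks stripped.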

Clustered-merge exception. Droid sometimes squashes several reasoning/tool steps into one assistant message, producing a "clustered" layout where every thinking block sits before every tool_use block ([thinking, thinking, …, tool_use, tool_use, …]) instead of the valid interleaved ordering (thinking → tool_use → thinking → tool_use …). Anthropic rejects that turn with a 400 because the signed thinking sequence is out of order — so preserving it is exactly what triggers the failure. isClusteredThinkingMerge detects this shape (two or more removable thinking blocks whose last index precedes the first tool_use index) on the turn that would otherwise be preserved, and when it matches, preservation is dropped so the thinking is stripped like every other stale turn. The >= 2 thinking guard keeps the common single-thinking-then-parallel-tools turn untouched, and clean interleaved turns (even with multiple thinking blocks) stay preserved. Stripping leaves the tool_use blocks and their matching tool_result ids intact, keeping the request valid.
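The clustered-shape test reduces to comparing two indices. The sketch below works on a plain list of block type names, which is an assumption made for illustration; the real isClusteredThinkingMerge may take a different input.

```swift
/// True when two or more thinking blocks all sit before the first tool_use
/// block, i.e. the "clustered" layout that Anthropic rejects.
func isClusteredThinkingMerge(_ blockTypes: [String]) -> Bool {
    let removable = ["thinking", "redacted_thinking"]
    let thinkingIndices = blockTypes.indices.filter { removable.contains(blockTypes[$0]) }
    guard thinkingIndices.count >= 2, // the >= 2 guard from the text
          let lastThinking = thinkingIndices.last,
          let firstToolUse = blockTypes.firstIndex(of: "tool_use")
    else { return false }
    return lastThinking < firstToolUse
}
```

An interleaved turn fails the final comparison because its last thinking block comes after its first tool_use, so it stays preserved.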

Two edge cases are handled explicitly:

  • Empty content guard. If removing the blocks would leave "content":[] (which Anthropic rejects), the content array is replaced with emptyContentPlaceholder = [{"type":"text","text":"..."}] instead of being emptied or the message dropped, so user/assistant role alternation is preserved.
  • Surgical removal. Like the proxy, the sanitizer edits raw string ranges (grouping adjacent removable blocks so commas stay valid) rather than re-serializing.

The sanitizer is covered by src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift:

| Test | What it proves |
| --- | --- |
| testPreservesThinkingForTrailingToolResults | Thinking on the assistant turn answered by trailing tool_results is left untouched. |
| testPreservesRedactedThinkingForTrailingToolResults | Same for redacted_thinking. |
| testStripsThinkingAfterNormalUserContent | Thinking is stripped once a normal (non-tool-result) user turn follows. |
| testStripsRedactedThinkingAfterNormalUserContent | Same for redacted_thinking. |
| testStripsEarlierAssistantThinkingWhenLatestToolResultCycleIsPreserved | Only the latest tool-use cycle's thinking (signature:"new") is kept; the older one (signature:"old") is stripped. |
| testPreservesThinkingWhenTrailingToolResultIncludesText | A trailing user turn mixing tool_result + text still counts as an active tool-result turn, so thinking is preserved. |
| testReplacesEmptiedAssistantContentWithPlaceholder | An assistant turn whose only block was thinking becomes the placeholder text instead of "content":[]; roles still alternate user, assistant, user. |
| testStripsClusteredThinkingInLatestAssistantTurn | A latest assistant turn with clustered thinking before its tool_use blocks has its thinking stripped (not preserved); tool_use/tool_result ids survive and output is valid JSON. |
| testStripsClusteredThinkingWithInterposedTextBlock | A text block between the clustered thinking and tool_use blocks does not stop the clustered turn from being stripped. |
| testStripsClusteredRedactedThinkingInLatestAssistantTurn | Same clustered stripping for redacted_thinking. |
| testPreservesInterleavedThinkingInLatestAssistantTurn | Correctly interleaved thinking → tool_use → thinking → tool_use is left byte-for-byte unchanged even with two thinking blocks. |
| testPreservesSingleLeadingThinkingBeforeParallelToolUse | A single leading thinking block followed by parallel tool_use blocks stays preserved (the >= 2 guard). |

Forwarding and response streaming

forwardRequest opens a plain TCP NWConnection to 127.0.0.1:8318, rebuilds the request line and headers (excluding content-length, host, transfer-encoding), overrides Host to the backend, always sets Connection: close (the proxy does not support keep-alive/pipelining), recomputes Content-Length from the possibly-mutated body, and sends it. On .failed it returns 502.
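The header rebuild can be sketched as a pure function that produces the request head. forwardedRequestHead is a hypothetical name, and dropping a client-sent Connection header (so that only one Connection line is emitted) is an assumption the text does not confirm.

```swift
import Foundation

/// Builds the request line and headers to send to the backend. The caller
/// appends the (possibly mutated) body bytes after the returned head.
func forwardedRequestHead(method: String, path: String, version: String,
                          headers: [(name: String, value: String)],
                          body: Data, backendHost: String) -> String {
    // Headers the proxy recomputes itself; "connection" is an assumed addition.
    let dropped: Set<String> = ["content-length", "host", "transfer-encoding", "connection"]
    var lines = ["\(method) \(path) \(version)"]
    for header in headers where !dropped.contains(header.name.lowercased()) {
        lines.append("\(header.name): \(header.value)") // original casing kept
    }
    lines.append("Host: \(backendHost)")
    lines.append("Connection: close")
    lines.append("Content-Length: \(body.count)") // length of the mutated body
    return lines.joined(separator: "\r\n") + "\r\n\r\n"
}
```

Recomputing Content-Length from the final body matters because any mutation (an alias rewrite, a service_tier injection) changes the byte count the client originally advertised.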

receiveResponse → streamNextChunk streams the backend's response back to the client in ≤64 KiB chunks, again re-scheduling asynchronously rather than recursing. When the stream completes it signals end-of-stream (send(content: nil, isComplete: true)) and cancels both connections. finishStreaming is the shared idempotent teardown.

Cursor forwarding. forwardToCursor opens a TLS connection to cursor-api.standardagents.ai:443, strips the client authorization header, and injects Authorization: Bearer <key> where the key comes from loadCursorApiKey — which scans ~/.cli-proxy-api/ for the first enabled *.json whose type is cursor and reads its apiKey. receiveCursorResponse streams the reply back verbatim.

Debug log

fileLog appends ISO-8601-timestamped lines to /tmp/droidproxy-debug.log on a dedicated serial queue. Per request it can emit INCOMING REQUEST, REWRITE MODEL, INJECTED service_tier=priority, SANITIZED CLAUDE THINKING BLOCKS, REWRITE PATH, and the reasoning summary. The reasoning line has the form:

REQUEST REASONING: model=<model> reasoning=<snippet> reasoning_effort=<snippet> thinking=<snippet> ...

Fields appear in reasoningSummaryOrder (reasoning, reasoning_effort, thinking, output_config, service_tier, generationConfig), each raw value truncated to reasoningSummarySnippetLimit (512) chars with CR/LF flattened to spaces. If only model= would be present, no line is emitted.
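The formatting rules for the reasoning line can be sketched as follows. reasoningSnippet and reasoningSummaryLine are hypothetical names, the raw field values are assumed to be already extracted, and replacing each CR and LF with its own space is one possible reading of "flattened".

```swift
let reasoningSummaryOrder = ["reasoning", "reasoning_effort", "thinking",
                             "output_config", "service_tier", "generationConfig"]

/// Flattens CR/LF to spaces and truncates to `limit` characters.
func reasoningSnippet(_ raw: String, limit: Int = 512) -> String {
    var flattened = String.UnicodeScalarView()
    for scalar in raw.unicodeScalars {
        flattened.append(scalar == "\r" || scalar == "\n" ? " " : scalar)
    }
    return String(String(flattened).prefix(limit))
}

/// Builds the log line, or returns nil when only model= would be present.
func reasoningSummaryLine(model: String, rawFields: [String: String]) -> String? {
    let parts = reasoningSummaryOrder.compactMap { key -> String? in
        rawFields[key].map { "\(key)=\(reasoningSnippet($0))" }
    }
    guard !parts.isEmpty else { return nil }
    return "REQUEST REASONING: " + (["model=\(model)"] + parts).joined(separator: " ")
}
```

Iterating over the fixed order array rather than over the dictionary keeps the field order stable between requests, which makes the log easier to grep and diff.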

Integration points

  • cli-proxy-api backend (127.0.0.1:8318). Default forward target via forwardRequest. The backend process is owned by ServerManager — see server-manager.md.
  • cursor-api.standardagents.ai:443. Cursor model requests, authenticated with the key from cursor.json in the auth directory.
  • AppPreferences. Reads bindAddress (listener bind) and the fast-mode flags gpt54FastMode, gpt55FastMode.
  • UserDefaults enabledProviders. isCursorEnabled reads this dictionary to gate Cursor routing.
  • AuthPaths.authDirectory (~/.cli-proxy-api/). Source of the Cursor API key.
  • /tmp/droidproxy-debug.log. Per-request diagnostics.

Entry points for modification

  • Add/adjust a request mutation: edit the decision tree in processRequest. Keep mutations surgical — reuse inspectRequestJSONFields and the JSON scanner helpers; never re-serialize the body. See ../how-to-contribute/patterns-and-conventions.md.
  • New model alias: add to antigravityModelAliases / cursorModelAliases.
  • Change visible-thinking betas: edit Config.claudeVisibleThinkingBetas / claudeRedactedThinkingBeta.
  • Fast-mode model set: edit the switch in processOpenAIFastMode (and the matching AppPreferences flag).
  • Sonnet max-thinking transform: edit applySonnetMaxThinking / sonnetRequestsMaxEffort and the sonnetMaxThinking* constants; extend ThinkingProxySonnetMaxThinkingTests.
  • Thinking-block stripping rules: edit ClaudeThinkingBlockSanitizer and extend ClaudeThinkingBlockSanitizerTests.
  • A new routing target / path rewrite: add a branch in processRequest and a dedicated forwardTo… if it needs a different upstream.

Key source files

| File | Role |
| --- | --- |
| src/Sources/ThinkingProxy.swift | The proxy: listener, request parsing, mutation pipeline, forwarding, response streaming, Cursor routing, debug log. |
| src/Sources/ClaudeThinkingBlockSanitizer.swift | Strips stale Claude thinking / redacted_thinking blocks while preserving the latest active tool-use turn. |
| src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift | Unit tests pinning the sanitizer's preserve/strip rules, clustered-merge stripping, and the empty-content placeholder behavior. |
| src/Tests/CLIProxyMenuBarTests/ThinkingProxySonnetMaxThinkingTests.swift | Unit tests for the Sonnet 4.6 effort:max → classic extended thinking conversion. |

Related pages
