Skip to content

how to contribute testing

Nik Anand edited this page Jun 15, 2026 · 2 revisions

Testing

DroidProxy uses XCTest. The test target is CLIProxyMenuBarTests, defined in /Users/nikhilanand/droidproxy/src/Package.swift and located at /Users/nikhilanand/droidproxy/src/Tests/CLIProxyMenuBarTests/.

Running tests

cd src && swift test

Run a single test class or method while iterating:

cd src && swift test --filter ClaudeThinkingBlockSanitizerTests
cd src && swift test --filter ClaudeThinkingBlockSanitizerTests/testStripsThinkingAfterNormalUserContent

Current coverage

The target holds four test files, all under /Users/nikhilanand/droidproxy/src/Tests/CLIProxyMenuBarTests/:

Test file Covers
ClaudeThinkingBlockSanitizerTests.swift ClaudeThinkingBlockSanitizer.sanitize(_:) — stale-thinking stripping, clustered-merge handling, placeholder behavior.
ThinkingProxySonnetMaxThinkingTests.swift ThinkingProxy.applySonnetMaxThinking(in:) — Sonnet 4.6 effort:max → classic extended thinking.
DroidProxyModelCatalogTests.swift DroidProxyModelCatalog.settingsModels() reasoning metadata for Fable 5 and Sonnet 4.6.
OAuthUsageTrackerTests.swift OAuthUsageTracker.parseClaudeWindows(_:) — percent clamping, edge cases, malformed JSON.

The largest suite is ClaudeThinkingBlockSanitizerTests, exercising ClaudeThinkingBlockSanitizer.sanitize(_:) — the code that removes stale Claude thinking / redacted_thinking blocks from request bodies without breaking role alternation or Anthropic's prompt cache. See Thinking proxy for the runtime context.

Sanitizer test cases:

  • Preserves thinking for trailing tool_results (testPreservesThinkingForTrailingToolResults, plus the redacted_thinking variant) — when the last user turn is only tool_result blocks answering the preceding assistant's tool_use, the assistant's thinking is the active cycle and must be kept. Asserts the request is returned byte-for-byte unchanged.
  • Strips stale thinking after normal user content (testStripsThinkingAfterNormalUserContent, plus the redacted_thinking variant) — when a plain user text turn follows the tool cycle, the old thinking block is removed while the surrounding tool_use, tool_result, and text blocks survive.
  • Preserves the latest assistant tool-use cycle (testStripsEarlierAssistantThinkingWhenLatestToolResultCycleIsPreserved) — with two assistant tool-use cycles, the older thinking (signature:"old") is stripped while the most recent (signature:"new") is preserved; both tool_use_ids remain.
  • Preserves thinking when a trailing tool_result also carries text (testPreservesThinkingWhenTrailingToolResultIncludesText) — a final user turn with both a tool_result and a text block still counts as the active tool cycle, so thinking is kept.
  • Replaces emptied content with a placeholder (testReplacesEmptiedAssistantContentWithPlaceholder) — if stripping a block would leave an assistant message with "content":[] (which Anthropic rejects), the content is replaced with placeholder text so the assistant turn survives. Asserts the output contains no "content":[], no "type":"thinking", is valid JSON, and that roles still alternate user, assistant, user.
  • Strips clustered thinking in the latest assistant turn (testStripsClusteredThinkingInLatestAssistantTurn, plus the interposed-text and redacted_thinking variants) — when the latest assistant turn clusters two or more thinking blocks before its tool_use blocks (the shape Anthropic rejects), its thinking is stripped instead of preserved, while tool_use/tool_result ids survive and the output stays valid JSON.
  • Preserves valid interleaved / single-leading thinking (testPreservesInterleavedThinkingInLatestAssistantTurn, testPreservesSingleLeadingThinkingBeforeParallelToolUse) — correctly interleaved thinking and a single leading thinking block before parallel tool_use are left byte-for-byte unchanged, confirming the clustered-merge guard only fires on the malformed shape.

How these tests are written

The tests feed raw JSON request strings into the sanitizer and assert on the string contents of the result (XCTAssertTrue(sanitized.contains(...)), XCTAssertEqual(sanitized, request) for no-op cases) plus JSON validity (JSONSerialization.jsonObject(with:)). A private roleSequence(_:) helper parses the output to verify role alternation. There are no fixtures or mocks — each case is a self-contained literal request.

What to test when you change the proxy

The proxy's surgical JSON editing is the riskiest code in the repo: it mutates request bodies in flight via string-range insertion (never re-serialization, to preserve the prompt cache — see Patterns and conventions). A wrong byte range can corrupt JSON or break Anthropic's cache. When you touch ClaudeThinkingBlockSanitizer, the JSON-location helpers, fast-mode injection, or path/model rewriting, add cases that:

  • Assert the output is still valid JSON (JSONSerialization.jsonObject(with:)).
  • Assert role alternation is preserved where messages are involved.
  • Cover the no-op path with XCTAssertEqual(sanitized, request) so unrelated requests pass through byte-for-byte.
  • Exercise edge cases: empty content arrays, multiple cycles, blocks at the start/end of an array, and redacted_thinking alongside thinking.

Manual / UI testing

There is no UI test harness. Exercise the menu-bar app, settings window, and live request handling manually with /Users/nikhilanand/droidproxy/dev-relaunch.sh (see Development workflow) while watching the per-request log at /tmp/droidproxy-debug.log. See Debugging for what those log lines mean.

Clone this wiki locally