How to contribute: testing
DroidProxy uses XCTest. The test target is CLIProxyMenuBarTests, defined in /Users/nikhilanand/droidproxy/src/Package.swift and located at /Users/nikhilanand/droidproxy/src/Tests/CLIProxyMenuBarTests/.
Run the full suite:

```shell
cd src && swift test
```

Run a single test class or method while iterating:

```shell
cd src && swift test --filter ClaudeThinkingBlockSanitizerTests
cd src && swift test --filter ClaudeThinkingBlockSanitizerTests/testStripsThinkingAfterNormalUserContent
```

The target holds four test files, all under /Users/nikhilanand/droidproxy/src/Tests/CLIProxyMenuBarTests/:
| Test file | Covers |
|---|---|
| ClaudeThinkingBlockSanitizerTests.swift | ClaudeThinkingBlockSanitizer.sanitize(_:): stale-thinking stripping, clustered-merge handling, placeholder behavior. |
| ThinkingProxySonnetMaxThinkingTests.swift | ThinkingProxy.applySonnetMaxThinking(in:): Sonnet 4.6 effort:max → classic extended thinking. |
| DroidProxyModelCatalogTests.swift | DroidProxyModelCatalog.settingsModels(): reasoning metadata for Fable 5 and Sonnet 4.6. |
| OAuthUsageTrackerTests.swift | OAuthUsageTracker.parseClaudeWindows(_:): percent clamping, edge cases, malformed JSON. |
The largest suite is ClaudeThinkingBlockSanitizerTests. It exercises ClaudeThinkingBlockSanitizer.sanitize(_:), the code that removes stale Claude thinking and redacted_thinking blocks from request bodies without breaking role alternation or Anthropic's prompt cache. See Thinking proxy for the runtime context.
Sanitizer test cases:
- Preserves thinking for trailing tool_results (testPreservesThinkingForTrailingToolResults, plus the redacted_thinking variant): when the last user turn contains only tool_result blocks answering the preceding assistant's tool_use, the assistant's thinking belongs to the active cycle and must be kept. Asserts the request is returned byte-for-byte unchanged.
- Strips stale thinking after normal user content (testStripsThinkingAfterNormalUserContent, plus the redacted_thinking variant): when a plain user text turn follows the tool cycle, the old thinking block is removed while the surrounding tool_use, tool_result, and text blocks survive.
- Preserves the latest assistant tool-use cycle (testStripsEarlierAssistantThinkingWhenLatestToolResultCycleIsPreserved): with two assistant tool-use cycles, the older thinking (signature "old") is stripped while the most recent (signature "new") is preserved; both tool_use ids remain.
- Preserves thinking when a trailing tool_result also carries text (testPreservesThinkingWhenTrailingToolResultIncludesText): a final user turn with both a tool_result and a text block still counts as the active tool cycle, so thinking is kept.
- Replaces emptied content with a placeholder (testReplacesEmptiedAssistantContentWithPlaceholder): if stripping a block would leave an assistant message with "content":[] (which Anthropic rejects), the content is replaced with placeholder text so the assistant turn survives. Asserts that the output contains no "content":[] and no "type":"thinking", that it is valid JSON, and that roles still alternate user, assistant, user.
- Strips clustered thinking in the latest assistant turn (testStripsClusteredThinkingInLatestAssistantTurn, plus the interposed-text and redacted_thinking variants): when the latest assistant turn clusters two or more thinking blocks before its tool_use blocks (the shape Anthropic rejects), its thinking is stripped instead of preserved, while tool_use/tool_result ids survive and the output stays valid JSON.
- Preserves valid interleaved and single-leading thinking (testPreservesInterleavedThinkingInLatestAssistantTurn, testPreservesSingleLeadingThinkingBeforeParallelToolUse): correctly interleaved thinking, and a single leading thinking block before parallel tool_use, are left byte-for-byte unchanged, confirming that the clustered-merge guard fires only on the malformed shape.
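Taken together, the cases above pin down which assistant turn keeps its thinking. The sketch below is a simplified, hypothetical model of those rules, not the sanitizer's implementation: it reasons over block type names instead of byte ranges, ignores the placeholder behavior, and assumes that "clustered" means two or more thinking or redacted_thinking blocks ahead of the first tool_use (which also covers an interposed text block). Message, hasClusteredThinking, and preservedAssistantIndex are illustrative names.

```swift
import Foundation

// Hypothetical, simplified model of the keep/strip rules the sanitizer tests
// describe. It looks at block *types* only; the real sanitizer edits byte
// ranges inside the raw request string.

struct Message {
    let role: String          // "user" or "assistant"
    let blockTypes: [String]  // content block types in order, e.g. ["thinking", "tool_use"]
}

let thinkingTypes: Set<String> = ["thinking", "redacted_thinking"]

/// True when two or more thinking blocks appear before the first tool_use
/// (the clustered shape). Interleaved or single-leading thinking returns false.
func hasClusteredThinking(_ blockTypes: [String]) -> Bool {
    let beforeFirstToolUse = blockTypes.prefix(while: { $0 != "tool_use" })
    return beforeFirstToolUse.filter { thinkingTypes.contains($0) }.count >= 2
}

/// Index of the assistant message whose thinking is the active cycle and
/// should be kept, or nil when every thinking block counts as stale.
func preservedAssistantIndex(_ messages: [Message]) -> Int? {
    // An active cycle exists only if the final user turn answers a tool_use,
    // even when that turn also carries a text block.
    guard let last = messages.last, last.role == "user",
          last.blockTypes.contains("tool_result") else { return nil }
    // The latest assistant turn owns the active thinking; earlier ones are stale.
    guard let index = messages.lastIndex(where: { $0.role == "assistant" }) else { return nil }
    // Clustered thinking is stripped even in the latest turn.
    return hasClusteredThinking(messages[index].blockTypes) ? nil : index
}
```

Any thinking in an assistant turn other than the returned index would be stripped. Modelling the rules this way can help when deciding which new literal request a test case needs, before writing the JSON by hand.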
The tests feed raw JSON request strings into the sanitizer and assert on the string contents of the result (XCTAssertTrue(sanitized.contains(...)), XCTAssertEqual(sanitized, request) for no-op cases) plus JSON validity (JSONSerialization.jsonObject(with:)). A private roleSequence(_:) helper parses the output to verify role alternation. There are no fixtures or mocks — each case is a self-contained literal request.
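The assertion style described above needs only Foundation. The following is a hypothetical sketch of helpers in that spirit. The suite's own roleSequence(_:) is private and may be written differently, and isValidJSON and rolesAlternate are illustrative names, not functions in the repo. The sketch assumes request bodies are top-level JSON objects with a "messages" array.

```swift
import Foundation

// Hypothetical stand-ins for the kind of helpers the suite uses; the real
// roleSequence(_:) is private to the test file and may differ.

/// Parses a request body and returns each message's "role" in order,
/// or nil if the body is not a JSON object with a "messages" array.
func roleSequence(_ json: String) -> [String]? {
    guard let object = try? JSONSerialization.jsonObject(with: Data(json.utf8)),
          let root = object as? [String: Any],
          let messages = root["messages"] as? [[String: Any]] else {
        return nil
    }
    return messages.compactMap { $0["role"] as? String }
}

/// True when the string parses as JSON.
func isValidJSON(_ json: String) -> Bool {
    (try? JSONSerialization.jsonObject(with: Data(json.utf8))) != nil
}

/// True when no role repeats back to back (user, assistant, user, ...).
func rolesAlternate(_ roles: [String]) -> Bool {
    zip(roles, roles.dropFirst()).allSatisfy { pair in pair.0 != pair.1 }
}
```

In an XCTestCase these would back assertions such as XCTAssertTrue(isValidJSON(sanitized)) and XCTAssertEqual(roleSequence(sanitized), ["user", "assistant", "user"]), alongside plain string checks like XCTAssertFalse(sanitized.contains(#""content":[]"#)).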
The proxy's surgical JSON editing is the riskiest code in the repo: it mutates request bodies in flight via string-range insertion (never re-serialization, to preserve the prompt cache; see Patterns and conventions). A wrong byte range can corrupt the JSON or break Anthropic's prompt cache. When you touch ClaudeThinkingBlockSanitizer, the JSON-location helpers, fast-mode injection, or path/model rewriting, add cases that:
- Assert the output is still valid JSON (JSONSerialization.jsonObject(with:)).
- Assert role alternation is preserved where messages are involved.
- Cover the no-op path with XCTAssertEqual(sanitized, request) so unrelated requests pass through byte-for-byte.
- Exercise edge cases: empty content arrays, multiple cycles, blocks at the start or end of an array, and redacted_thinking alongside thinking.
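To see why string-range insertion is preferred over re-serialization, compare the two on a body with ordinary whitespace. The sketch below is hypothetical: insertTopLevelField and the "example_flag" member are invented for illustration and are not DroidProxy's fast-mode or model-rewriting code. It assumes the body is a top-level JSON object and splices the new member in right after the opening brace, so every original byte after that point is copied through unchanged.

```swift
import Foundation

// Hypothetical illustration of "string-range insertion, never re-serialization".
// insertTopLevelField and "example_flag" are made up for this sketch.

/// Inserts `member` (e.g. #""example_flag":true"#) right after the opening
/// brace of a top-level JSON object; every other byte is copied unchanged.
/// Returns nil if the body does not start with an object.
func insertTopLevelField(_ member: String, into body: String) -> String? {
    guard let brace = body.firstIndex(where: { !$0.isWhitespace }),
          body[brace] == "{" else { return nil }
    let afterBrace = body.index(after: brace)
    // An empty object ("{}") must not receive a trailing comma.
    let isEmptyObject = body[afterBrace...].first(where: { !$0.isWhitespace }) == "}"
    var edited = body
    edited.insert(contentsOf: isEmptyObject ? member : member + ",", at: afterBrace)
    return edited
}
```

Round-tripping the same body through JSONSerialization.data(withJSONObject:) would instead emit compact output (dropping the original whitespace, possibly in a different key order), so even a semantically no-op edit would change the bytes. A test for an edit like this can assert that the suffix after the insertion point is identical to the original and that the result still parses.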
There is no UI test harness. Exercise the menu-bar app, settings window, and live request handling manually with /Users/nikhilanand/droidproxy/dev-relaunch.sh (see Development workflow) while watching the per-request log at /tmp/droidproxy-debug.log. See Debugging for what those log lines mean.