fix: auto-redact tool_calls[*].function.arguments#588
Conversation
The un-redact path walked message.content / reasoning_content /
reasoning but not tool_calls[*].function.arguments, so when a model
emitted a tool call whose JSON arguments echoed the user's (now-
redacted) PII, the placeholder leaked to the client unchanged.
Caught while testing an agentic round-trip on staging:
POST /v1/chat/completions x-auto-redact: on
messages=[{user, "Send a thank-you to alice.chen@gmail.com ..."}]
tools=[{send_email(to, subject, body)}]
Provider (claude-sonnet-4-6) correctly received <email1><email2> in
its prompt, but the resulting tool_calls[0].function.arguments came
back as {"to":"<email1>",...} instead of {"to":"alice.chen@gmail",...}.
Fix:
- Non-streaming (unredact_chat_response_in_place): walk
message.tool_calls[*].function.arguments and apply map.unredact.
- Streaming (unredact_chunk_in_place + StreamUnredactStates): add a
per-(choice_index, tool_call_index) sliding-tail state map so
arguments JSON streamed in fragments handles split placeholders
the same way content does.
New e2e test auto_redact_unredacts_tool_call_arguments configures
the MockProvider to emit a send_email tool call with <email1> in
its arguments and asserts the client receives the substituted
original.
Known limitations (not addressed here; separate follow-ups):
- privacy-filter sometimes splits one PII into multiple adjacent
spans (alice.chen@gmail.com -> <email1><email2>), and a model that
treats placeholders as separate values may issue multiple tool
calls (one per placeholder). This is a privacy-filter span-merging
concern, not an un-redact bug.
- End-of-stream flush for tool_call_arguments tails is not yet
emitted as a synthetic chunk (the existing flush covers content
fields). Most tool-call args flows complete cleanly without mid-
token truncation, so deferring to a follow-up.
Critical ReviewTargeted fix; non-streaming path and the new test look correct. Two concerns worth addressing before merge, plus one nit.
|
There was a problem hiding this comment.
Pull request overview
This PR fixes an auto-redact round-trip gap in /v1/chat/completions where redacted placeholders inside tool_calls[*].function.arguments were not being un-redacted before returning to the client—particularly important for agentic flows that execute tool calls with PII in arguments.
Changes:
- Non-streaming: un-redact
choices[*].message.tool_calls[*].function.argumentsin-place using the request’sRedactionMap. - Streaming: add per-(choice_index, tool_call_index) un-redaction state so tool-call argument fragments can be safely un-redacted across chunk boundaries.
- Tests: add a new e2e test asserting provider sees placeholders while the client receives un-redacted tool-call arguments.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/api/src/routes/completions.rs | Extends streaming + non-streaming un-redaction to include tool-call function arguments and maintains per-tool-call stream state. |
| crates/api/tests/e2e_all/auto_redact.rs | Adds an e2e regression test covering un-redaction of tool-call arguments. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Addresses claude-review on #588: 1. JSON corruption (blocker): tool_calls[*].function.arguments is a JSON-encoded string. map.unredact() did raw substring replacement, so PII originals containing `"`, `\`, control chars, or non-ASCII would break the surrounding JSON. - Add RedactionMap::unredact_json_string which JSON-escapes each replacement via serde_json::to_string + strip outer quotes. - Add StreamUnredact::new_for_json_string variant for the streaming path; routes substitutions through unredact_json_string. - Non-streaming unredact_chat_response_in_place now uses the json variant for tool_calls[*].function.arguments. - Streaming unredact_chunk_in_place creates per-(choice_idx, tc_idx) state via new_for_json_string. Tests: - placeholders::unredact_json_string_{escapes_quotes, escapes_backslash, escapes_newline, safe_for_simple_pii} - stream_unredact::json_string_variant_{escapes_quote_in_replacement, escapes_across_chunk_split, no_op_for_simple_pii} 2. End-of-stream flush gap (blocker): build_flush_chunks drained content/reasoning fields but ignored tool_call_arguments — partial placeholders held in the tail were silently dropped (or could leak as literal `<email1>` if the tail contained an incomplete-but-recognizable shape). - Extend build_flush_chunks with a second pass that drains states.tool_call_arguments and emits a synthetic SSE chunk per (choice_idx, tc_idx) with the held bytes as a tool_calls delta. - Stable ordering: sort by (choice_idx, tc_idx) for deterministic output. - New e2e auto_redact_unredacts_tool_call_arguments_streaming covers the streamed-args reassembly path end-to-end. 3. Nit (collision safety): Streaming chunk handler no longer falls back to tc.index = 0 for indexless tool calls — uses enumerate position instead, so two parallel indexless tool calls in one delta don't collide on (idx, 0). Test counts: - services unit: 43 pass (was 38; +5 JSON-escape tests) - e2e auto_redact: 10 pass (was 9; +1 streaming-args test) - clippy -D warnings: clean - cargo fmt --check: clean
|
Pushed `11e8203` addressing both blockers and the nit. 1. JSON corruption (claude-review #1): Tested with names containing `"`, paths with `\`, addresses with newlines — all round-trip parse as valid JSON via `serde_json::from_str`. Simple PII (emails, phones) produces identical output to the non-JSON path so no behavior change for the common case. 2. End-of-stream flush gap (claude-review #2, also Copilot): New e2e `auto_redact_unredacts_tool_call_arguments_streaming` exercises the full streamed-args reassembly path end-to-end (mock streams arguments split by spaces, client reassembles, asserts valid JSON with the un-redacted email). 3. Nit — indexless tool-call collision: Test count:
|
There was a problem hiding this comment.
Code Review
This pull request implements un-redaction for tool call arguments in both standard and streaming chat completion responses, ensuring that PII placeholders are correctly restored before reaching the client. It includes updates to the state management for streaming responses and a new end-to-end test to verify the behavior. A review comment identifies a critical issue where the new tool_call_arguments state is not flushed in the final chunks, which could lead to truncated JSON or missing PII if a placeholder is split at the end of a stream.
|
@gemini-code-assist — appears to be a stale review (the comment is anchored to commit |
|
Thank you for the update and for verifying the implementation. I have reviewed the changes in commit |
Surfaced by an independent subagent code review of #588. Each is a distinct privacy regression vs. the design goal of "provider never sees raw PII". 1. **Input tool_calls.arguments now redacted.** In an agent loop the user resubmits the assistant's prior tool_call as part of conversation history. `collect_text_fragments` only walked `message.content`, so the JSON arguments string (which often echoes the original PII verbatim) was forwarded to the provider raw on every follow-up turn. - New `TextRef::ToolCallArg { msg_idx, tc_idx }` variant in `apply.rs`; `collect_text_fragments` + `write_back` extended to cover it. The arguments string is redacted as opaque text — our placeholders are pure ASCII (`<emailN>`) so they stay valid JSON. - Unit tests: collect_walks_assistant_tool_call_arguments, write_back_updates_tool_call_arguments. - E2E test: auto_redact_redacts_input_tool_call_arguments — sends a 4-turn history with a tool_call carrying bob@example.com and asserts the provider sees <email1>, not the raw email. 2. **Response message.refusal now un-redacted.** Safety-tuned models may quote our placeholders back ("I can't email <email1> per policy"). Without un-redacting `choice.message.refusal`, the placeholder leaked to the client. - Walk the field in `unredact_chat_response_in_place`. - E2E test: auto_redact_unredacts_refusal_field. - `ChatDelta` has no `refusal` field, so streaming has no corresponding gap. 3. **Privacy-filter error logs no longer leak upstream response body.** `PrivacyClassifyError::HttpError` carries the verbatim response.text() from the privacy-filter. A misbehaving filter that echoes its input in an error response would have routed customer PII straight to application logs via `tracing::warn!(error = %e, …)`. - Added `Self::privacy_classify_error_category(&e)` returning a bounded `&'static str` (`unauthorized`, `rate_limited`, `unavailable`, `server_error`, `client_error`, `http_other`, `request_failed`). - The "all providers failed" final error now hand-formats the status code only, bypassing `sanitize_error_message` which would have re-introduced the body via `Display`. - Demoted "Privacy classify completed successfully" from info to debug — high-volume info log on every redacted request. 4. **15-second wall-clock timeout on the redact step.** Detector retries cascade through `pool.privacy_classify` with each attempt bounded by `completion_timeout()` (default 600s) × multiple providers. Without an outer bound, a hung detector could hold the user's request hostage for tens of minutes before the 503 fires. Auto-redact is in the critical request path; we cap it tightly. - New `REDACT_TIMEOUT = Duration::from_secs(15)` constant. - `redact_messages` now wraps the inner work in `tokio::time::timeout` and maps `Elapsed` to `DetectorUnavailable`. Also includes 8 new adversarial e2e tests in `auto_redact_adversarial.rs` (from a separate subagent's testing pass): empty messages, empty content, PII in system messages, PII in multimodal content-parts arrays, repeated-PII dedup, user-supplied placeholder collision, 512 KB body, 26 MB body rejection. Test counts: - services unit: 47 pass (was 43; +4 collect/write/timeout tests) - e2e auto_redact: 12 pass (was 10; +2 new tests) - e2e auto_redact_adversarial: 8 pass (new file) - clippy -D warnings: clean - cargo fmt --check: clean
|
Two subagents reviewed the work — one critical code review, one writing adversarial e2e tests. Pushed `bedf8d7` closing all 3 remaining High-priority gaps from the review plus a timeout, and folded in the 8 adversarial tests from the test pass. Reviewer-flagged High items, all addressed:
Bonus (Medium #4): 15s outer timeout on `redact_messages` via `tokio::time::timeout`. Worst case before this was N providers × 600s per-attempt → up to ~20 min hold on the user request. Now: timeout → `DetectorUnavailable` → 503. Constant `REDACT_TIMEOUT` in `auto_redact/mod.rs`. Tester subagent: wrote `crates/api/tests/e2e_all/auto_redact_adversarial.rs` with 8 tests covering empty messages, empty content, system-message PII, multimodal parts, dedup, user-supplied placeholder collision, 512 KB body, 26 MB rejection. None surfaced a bug. Test counts now:
Re: gemini's earlier "flush gap" comment on `11e8203` — that flagged a missing flush on the latest commit, but `build_flush_chunks` Pass 2 was already there. Stale auto-review; replied separately. Items I left as deferred follow-ups (not blockers for this PR):
|
Summary
Caught while testing an agentic round-trip on staging: when a model returns a tool call whose JSON arguments echo redacted PII, the placeholder leaked to the client unchanged.
The non-streaming + streaming un-redact paths walked content + reasoning_content + reasoning but missed
tool_calls[*].function.arguments.Repro on staging before this fix:
```json
// Request: x-auto-redact: on, prompt contains "alice.chen@gmail.com"
// Response (broken):
"tool_calls":[{
"function":{
"name":"send_email",
"arguments":"{\"to\":\"\",\"subject\":\"Thanks!\",\"body\":\"...\"}"
}
}]
```
After this fix the client sees `{"to":"alice.chen@gmail.com",...}`. The provider still only ever saw ``.
Changes
Known limitations (deferred follow-ups)
Test plan
Follow-up to #585.