fix: auto-redact tool_calls[*].function.arguments by Evrard-Nil · Pull Request #588 · nearai/cloud-api

Evrard-Nil · 2026-05-13T13:26:07Z

Summary

Caught while testing an agentic round-trip on staging: when a model returns a tool call whose JSON arguments echo redacted PII, the placeholder leaked to the client unchanged.

The non-streaming + streaming un-redact paths walked content + reasoning_content + reasoning but missed tool_calls[*].function.arguments.

Repro on staging before this fix:

```json
// Request: x-auto-redact: on, prompt contains "alice.chen@gmail.com"
// Response (broken):
"tool_calls":[{
  "function":{
    "name":"send_email",
    "arguments":"{\"to\":\"<email1>\",\"subject\":\"Thanks!\",\"body\":\"...\"}"
  }
}]
```

After this fix the client sees `{"to":"alice.chen@gmail.com",...}`. The provider still only ever saw the placeholder.

Changes

`unredact_chat_response_in_place`: walk `message.tool_calls[*].function.arguments` and `map.unredact` each.
`unredact_chunk_in_place` + `StreamUnredactStates`: new per-`(choice_index, tool_call_index)` sliding-tail state map, so streamed argument fragments handle split placeholders the same way content does.
New e2e test `auto_redact_unredacts_tool_call_arguments` configures the MockProvider to emit a `send_email` tool call with `<email1>` in its args and asserts the client receives the substituted original. 9 auto_redact e2e tests now pass.
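
The sliding-tail idea behind the streaming change can be sketched roughly as follows. This is a minimal, std-only illustration and not the PR's code: `SlidingUnredact`, its fields, and the hold-back rule are assumptions made for the example. Per the description above, the real `StreamUnredactStates` keeps one such state per `(choice_index, tool_call_index)`.

```rust
use std::collections::HashMap;

/// Hypothetical per-stream state: a placeholder -> original map plus the
/// bytes held back because they might be the start of a placeholder.
pub struct SlidingUnredact {
    map: HashMap<String, String>,
    tail: String,
    max_len: usize,
}

impl SlidingUnredact {
    pub fn new(map: HashMap<String, String>) -> Self {
        let max_len = map.keys().map(|k| k.len()).max().unwrap_or(0);
        Self { map, tail: String::new(), max_len }
    }

    /// Feed one streamed fragment and get back the text that is safe to emit.
    pub fn push(&mut self, fragment: &str) -> String {
        self.tail.push_str(fragment);
        let mut buf = std::mem::take(&mut self.tail);
        // Substitute every placeholder that is now complete.
        for (placeholder, original) in &self.map {
            buf = buf.replace(placeholder.as_str(), original);
        }
        // Hold back a trailing, unterminated "<..." that is still a prefix
        // of some known placeholder; it may be completed by the next chunk.
        if let Some(pos) = buf.rfind('<') {
            let candidate = &buf[pos..];
            if !candidate.contains('>')
                && candidate.len() < self.max_len
                && self.map.keys().any(|k| k.starts_with(candidate))
            {
                self.tail = candidate.to_string();
                buf.truncate(pos);
            }
        }
        buf
    }

    /// End of stream: return whatever is still held so nothing is dropped.
    pub fn flush(&mut self) -> String {
        std::mem::take(&mut self.tail)
    }
}
```

Feeding the fragments `{"to":"<ema` and `il1>"}` through `push` emits `{"to":"` first and `alice.chen@gmail.com"}` second. The sketch deliberately ignores edge cases such as an original value that itself contains placeholder-like text.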

Known limitations (deferred follow-ups)

privacy-filter span splitting: a single PII value like `alice.chen@gmail.com` can come back as two adjacent spans, `<email1><email2>`. Some models (Claude) treat that as two values and issue two tool calls, one per placeholder. That is a privacy-filter concern, not a redaction bug.
End-of-stream flush for tool_call_arguments tails: the existing flush covers content fields but not tool_call args yet. Most flows complete cleanly without mid-token truncation, so this is deferred.

Test plan

`cargo check --workspace --all-targets`
`cargo clippy --workspace --all-targets -- -D warnings`
`cargo fmt --check`
`cargo test --test e2e_all auto_redact -- --test-threads 1` (9 pass)
Staging re-test once merged: re-run the agentic round-trip with `tool_choice: send_email` against Anthropic, confirm `to` field contains real email, provider still saw placeholder.

Follow-up to #585.

The un-redact path walked message.content / reasoning_content / reasoning but not tool_calls[*].function.arguments, so when a model emitted a tool call whose JSON arguments echoed the user's (now-redacted) PII, the placeholder leaked to the client unchanged.

Caught while testing an agentic round-trip on staging:

    POST /v1/chat/completions
    x-auto-redact: on
    messages=[{user, "Send a thank-you to alice.chen@gmail.com ..."}]
    tools=[{send_email(to, subject, body)}]

Provider (claude-sonnet-4-6) correctly received <email1><email2> in its prompt, but the resulting tool_calls[0].function.arguments came back as {"to":"<email1>",...} instead of {"to":"alice.chen@gmail.com",...}.

Fix:
- Non-streaming (unredact_chat_response_in_place): walk message.tool_calls[*].function.arguments and apply map.unredact.
- Streaming (unredact_chunk_in_place + StreamUnredactStates): add a per-(choice_index, tool_call_index) sliding-tail state map so arguments JSON streamed in fragments handles split placeholders the same way content does.

New e2e test auto_redact_unredacts_tool_call_arguments configures the MockProvider to emit a send_email tool call with <email1> in its arguments and asserts the client receives the substituted original.

Known limitations (not addressed here; separate follow-ups):
- privacy-filter sometimes splits one PII into multiple adjacent spans (alice.chen@gmail.com -> <email1><email2>), and a model that treats placeholders as separate values may issue multiple tool calls (one per placeholder). This is a privacy-filter span-merging concern, not an un-redact bug.
- End-of-stream flush for tool_call_arguments tails is not yet emitted as a synthetic chunk (the existing flush covers content fields). Most tool-call args flows complete cleanly without mid-token truncation, so deferring to a follow-up.

claude · 2026-05-13T13:29:07Z

Critical Review

Targeted fix; non-streaming path and the new test look correct. Two concerns worth addressing before merge, plus one nit.

⚠️ 1. JSON-encoded `arguments` will be corrupted when the original PII contains JSON-special chars

tool_calls[*].function.arguments is a JSON-encoded string (per OpenAI's schema and how clients parse it). map.unredact(args) does a raw substring substitution — it inserts the original PII verbatim into a JSON string context. For private_email / private_phone / account_number this is fine in practice, but private_name and private_address (both real categories in placeholders.rs:29-30) can absolutely contain double-quotes, backslashes, or newlines.

Concrete failure:

User prompt contains a name like Patrick O"Brien (quote in name) → minted as <name1>.
Model emits {"to":"...","name":"<name1>"} as arguments.
After un-redact: {"to":"...","name":"Patrick O"Brien"} — broken JSON, the client's JSON.parse(arguments) throws.

Same issue for any address with a newline or any value with a backslash.

Suggested fix: JSON-escape each replacement when un-redacting inside arguments. Either a dedicated unredact_into_json_string helper, or parse the args, walk strings, un-redact, and re-serialize. Worth a test with a name/address containing a quote.

This affects both unredact_chat_response_in_place (completions.rs:378-384) and the streaming tool_call_arguments path (completions.rs:440-457).
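
A minimal sketch of the "JSON-escape each replacement" option, under stated assumptions: `json_escape` and `unredact_json_string_sketch` are hypothetical names, and the escaper is hand-rolled only to keep the example dependency-free. A production version would more likely lean on an existing JSON serializer.

```rust
use std::collections::HashMap;

/// Escape a value so it can sit inside an existing JSON string literal.
/// Hand-rolled for illustration; covers quotes, backslashes, and control chars.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical JSON-safe un-redact: substitute each placeholder with the
/// escaped original so the surrounding `arguments` JSON stays parseable.
fn unredact_json_string_sketch(args: &str, map: &HashMap<String, String>) -> String {
    let mut out = args.to_string();
    for (placeholder, original) in map {
        out = out.replace(placeholder.as_str(), &json_escape(original));
    }
    out
}
```

With `<name1>` mapped to `Patrick O"Brien`, the arguments string `{"name":"<name1>"}` becomes `{"name":"Patrick O\"Brien"}`, which still parses. For simple PII such as emails the output matches the plain substring substitution.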

⚠️ 2. End-of-stream flush gap for `tool_call_arguments` is more than cosmetic

The PR description marks this as a deferred follow-up, but it is a privacy regression, not just a missing tail:

build_flush_chunks (completions.rs:479-541) drains content / reasoning_content / reasoning but never touches states.tool_call_arguments. If a stream ends with a partial placeholder buffered in the sliding tail (e.g. <emai held back waiting for the rest), those bytes are silently dropped when StreamUnredactStates is dropped. Worse — under the right split, the client receives the literal placeholder <email1> because the tail did not have enough bytes to decide. Either way the un-redact contract is broken specifically for streamed tool calls.

If a full flush is genuinely out of scope here, at minimum drain tool_call_arguments and emit a synthetic chunk per (choice_idx, tc_idx) — same shape as the content flush — and add a streaming test that splits <email1> across chunk boundaries inside arguments.
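
The suggested drain could look roughly like the sketch below. It assumes the held tails live in a map keyed by `(choice_idx, tc_idx)`; `FlushDelta` and `drain_tool_call_tails` are hypothetical names, and the real synthetic chunk would be an SSE tool_calls delta rather than this plain struct.

```rust
use std::collections::HashMap;

/// Hypothetical shape of one synthetic flush delta.
#[derive(Debug, PartialEq)]
pub struct FlushDelta {
    pub choice_idx: u32,
    pub tc_idx: u32,
    pub arguments: String,
}

/// Drain every held tool-call tail and return one delta per key, skipping
/// empty tails and sorting by (choice_idx, tc_idx) so the output order does
/// not depend on hash-map iteration order.
pub fn drain_tool_call_tails(tails: &mut HashMap<(u32, u32), String>) -> Vec<FlushDelta> {
    let mut out: Vec<FlushDelta> = tails
        .drain()
        .filter(|(_, held)| !held.is_empty())
        .map(|((choice_idx, tc_idx), arguments)| FlushDelta { choice_idx, tc_idx, arguments })
        .collect();
    out.sort_by_key(|d| (d.choice_idx, d.tc_idx));
    out
}
```

Emitting the held bytes as-is means a truncated placeholder such as `<emai` reaches the client literally, but nothing is silently lost when the stream state is dropped.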

Nits

tc.index.unwrap_or(0) (completions.rs:446): if a provider ever emits two tool calls in one delta, both without an index, they collide on (idx, 0) and cross-contaminate. Vanishingly rare, but a per-position fallback (e.g. enumerate) would be safer and the same line count.
No streaming test for tool calls — the streaming path is the more complex of the two changes and is currently uncovered. auto_redact_streaming_splits_placeholder_across_chunks is great prior art to copy.

⚠️ Issues found — JSON-corruption (#1) is the blocker; the flush gap (#2) should not be deferred.

Copilot

Pull request overview

This PR fixes an auto-redact round-trip gap in /v1/chat/completions where redacted placeholders inside tool_calls[*].function.arguments were not being un-redacted before returning to the client—particularly important for agentic flows that execute tool calls with PII in arguments.

Changes:

Non-streaming: un-redact choices[*].message.tool_calls[*].function.arguments in-place using the request’s RedactionMap.
Streaming: add per-(choice_index, tool_call_index) un-redaction state so tool-call argument fragments can be safely un-redacted across chunk boundaries.
Tests: add a new e2e test asserting provider sees placeholders while the client receives un-redacted tool-call arguments.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| crates/api/src/routes/completions.rs | Extends streaming + non-streaming un-redaction to include tool-call function arguments and maintains per-tool-call stream state. |
| crates/api/tests/e2e_all/auto_redact.rs | Adds an e2e regression test covering un-redaction of tool-call arguments. |


Addresses claude-review on #588:

1. JSON corruption (blocker): tool_calls[*].function.arguments is a JSON-encoded string. map.unredact() did raw substring replacement, so PII originals containing `"`, `\`, control chars, or non-ASCII would break the surrounding JSON.
   - Add RedactionMap::unredact_json_string which JSON-escapes each replacement via serde_json::to_string + strip outer quotes.
   - Add StreamUnredact::new_for_json_string variant for the streaming path; routes substitutions through unredact_json_string.
   - Non-streaming unredact_chat_response_in_place now uses the json variant for tool_calls[*].function.arguments.
   - Streaming unredact_chunk_in_place creates per-(choice_idx, tc_idx) state via new_for_json_string.

   Tests:
   - placeholders::unredact_json_string_{escapes_quotes, escapes_backslash, escapes_newline, safe_for_simple_pii}
   - stream_unredact::json_string_variant_{escapes_quote_in_replacement, escapes_across_chunk_split, no_op_for_simple_pii}

2. End-of-stream flush gap (blocker): build_flush_chunks drained content/reasoning fields but ignored tool_call_arguments, so partial placeholders held in the tail were silently dropped (or could leak as literal `<email1>` if the tail contained an incomplete-but-recognizable shape).
   - Extend build_flush_chunks with a second pass that drains states.tool_call_arguments and emits a synthetic SSE chunk per (choice_idx, tc_idx) with the held bytes as a tool_calls delta.
   - Stable ordering: sort by (choice_idx, tc_idx) for deterministic output.
   - New e2e auto_redact_unredacts_tool_call_arguments_streaming covers the streamed-args reassembly path end-to-end.

3. Nit (collision safety): Streaming chunk handler no longer falls back to tc.index = 0 for indexless tool calls. It uses the enumerate position instead, so two parallel indexless tool calls in one delta don't collide on (idx, 0).

Test counts:
- services unit: 43 pass (was 38; +5 JSON-escape tests)
- e2e auto_redact: 10 pass (was 9; +1 streaming-args test)
- clippy -D warnings: clean
- cargo fmt --check: clean

Evrard-Nil · 2026-05-13T13:38:34Z

Pushed `11e8203` addressing both blockers and the nit.

1. JSON corruption (claude-review #1):
Added `RedactionMap::unredact_json_string` that JSON-escapes each replacement via `serde_json::to_string` + strip outer quotes. Used for the `arguments` field in both non-streaming and streaming paths. Streaming side has a new `StreamUnredact::new_for_json_string` variant that routes substitutions through the JSON-safe path.

Tested with names containing `"`, paths with `\`, addresses with newlines — all round-trip parse as valid JSON via `serde_json::from_str`. Simple PII (emails, phones) produces identical output to the non-JSON path so no behavior change for the common case.

2. End-of-stream flush gap (claude-review #2, also Copilot):
`build_flush_chunks` now does a second pass that drains `states.tool_call_arguments` and emits a synthetic tool-call SSE delta per `(choice_idx, tc_idx)` with the held bytes. No silent loss; truncated placeholders still emit literally as designed.

New e2e `auto_redact_unredacts_tool_call_arguments_streaming` exercises the full streamed-args reassembly path end-to-end (mock streams arguments split by spaces, client reassembles, asserts valid JSON with the un-redacted email).

3. Nit — indexless tool-call collision:
Replaced `tc.index.unwrap_or(0)` with `tc.index.unwrap_or(pos as i64)` where `pos` is the iter position, so two parallel indexless tool calls in one delta no longer collide on `(idx, 0)`.
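
The index fallback can be illustrated with a small sketch. `ToolCallDelta` and `state_keys` are hypothetical stand-ins for the real delta type and chunk handler; only the `unwrap_or(pos as i64)` expression comes from the comment above.

```rust
/// Hypothetical, trimmed-down tool-call delta: only the fields the key needs.
pub struct ToolCallDelta {
    pub index: Option<i64>,
    pub arguments: String,
}

/// Compute the per-stream state key for each tool call in one delta.
/// An indexless call falls back to its position in the delta, not to 0,
/// so two indexless calls in the same delta get distinct keys.
pub fn state_keys(choice_idx: i64, tool_calls: &[ToolCallDelta]) -> Vec<(i64, i64)> {
    tool_calls
        .iter()
        .enumerate()
        .map(|(pos, tc)| (choice_idx, tc.index.unwrap_or(pos as i64)))
        .collect()
}
```

Two indexless calls under choice 0 map to `(0, 0)` and `(0, 1)`. A delta that mixes explicit and missing indexes could in principle still collide, which this sketch does not try to prevent.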

Test count:

43 unit pass (was 38)
10 auto_redact e2e pass (was 9)
clippy -D warnings clean, cargo fmt clean

gemini-code-assist

Code Review

This pull request implements un-redaction for tool call arguments in both standard and streaming chat completion responses, ensuring that PII placeholders are correctly restored before reaching the client. It includes updates to the state management for streaming responses and a new end-to-end test to verify the behavior. A review comment identifies a critical issue where the new tool_call_arguments state is not flushed in the final chunks, which could lead to truncated JSON or missing PII if a placeholder is split at the end of a stream.

Evrard-Nil · 2026-05-13T13:46:25Z

@gemini-code-assist — appears to be a stale review (the comment is anchored to commit 11e8203 but the flush IS implemented there). See build_flush_chunks lines 550–612 in crates/api/src/routes/completions.rs — Pass 2 drains states.tool_call_arguments and emits a synthetic tool-call SSE delta per (choice_idx, tc_idx) before [DONE]. Covered by e2e auto_redact_unredacts_tool_call_arguments_streaming. Happy to re-trigger a fresh review if useful.

gemini-code-assist · 2026-05-13T13:46:37Z

Thank you for the update and for verifying the implementation. I have reviewed the changes in commit 11e8203 and confirm that the build_flush_chunks logic in crates/api/src/routes/completions.rs (lines 550–612) correctly drains the tool_call_arguments state and emits the necessary synthetic SSE deltas. The addition of unredact_json_string and the corresponding streaming variant successfully addresses the JSON corruption concerns, and the new e2e tests provide excellent coverage for these paths. The PR looks solid and ready for review/merge.

Surfaced by an independent subagent code review of #588. Each is a distinct privacy regression vs. the design goal of "provider never sees raw PII".

1. **Input tool_calls.arguments now redacted.** In an agent loop the user resubmits the assistant's prior tool_call as part of conversation history. `collect_text_fragments` only walked `message.content`, so the JSON arguments string (which often echoes the original PII verbatim) was forwarded to the provider raw on every follow-up turn.
   - New `TextRef::ToolCallArg { msg_idx, tc_idx }` variant in `apply.rs`; `collect_text_fragments` + `write_back` extended to cover it. The arguments string is redacted as opaque text; our placeholders are pure ASCII (`<emailN>`) so they stay valid JSON.
   - Unit tests: collect_walks_assistant_tool_call_arguments, write_back_updates_tool_call_arguments.
   - E2E test: auto_redact_redacts_input_tool_call_arguments sends a 4-turn history with a tool_call carrying bob@example.com and asserts the provider sees <email1>, not the raw email.

2. **Response message.refusal now un-redacted.** Safety-tuned models may quote our placeholders back ("I can't email <email1> per policy"). Without un-redacting `choice.message.refusal`, the placeholder leaked to the client.
   - Walk the field in `unredact_chat_response_in_place`.
   - E2E test: auto_redact_unredacts_refusal_field.
   - `ChatDelta` has no `refusal` field, so streaming has no corresponding gap.

3. **Privacy-filter error logs no longer leak upstream response body.** `PrivacyClassifyError::HttpError` carries the verbatim response.text() from the privacy-filter. A misbehaving filter that echoes its input in an error response would have routed customer PII straight to application logs via `tracing::warn!(error = %e, …)`.
   - Added `Self::privacy_classify_error_category(&e)` returning a bounded `&'static str` (`unauthorized`, `rate_limited`, `unavailable`, `server_error`, `client_error`, `http_other`, `request_failed`).
   - The "all providers failed" final error now hand-formats the status code only, bypassing `sanitize_error_message` which would have re-introduced the body via `Display`.
   - Demoted "Privacy classify completed successfully" from info to debug, since it was a high-volume info log on every redacted request.

4. **15-second wall-clock timeout on the redact step.** Detector retries cascade through `pool.privacy_classify` with each attempt bounded by `completion_timeout()` (default 600s) × multiple providers. Without an outer bound, a hung detector could hold the user's request hostage for tens of minutes before the 503 fires. Auto-redact is in the critical request path; we cap it tightly.
   - New `REDACT_TIMEOUT = Duration::from_secs(15)` constant.
   - `redact_messages` now wraps the inner work in `tokio::time::timeout` and maps `Elapsed` to `DetectorUnavailable`.

Also includes 8 new adversarial e2e tests in `auto_redact_adversarial.rs` (from a separate subagent's testing pass): empty messages, empty content, PII in system messages, PII in multimodal content-parts arrays, repeated-PII dedup, user-supplied placeholder collision, 512 KB body, 26 MB body rejection.

Test counts:
- services unit: 47 pass (was 43; +4 collect/write/timeout tests)
- e2e auto_redact: 12 pass (was 10; +2 new tests)
- e2e auto_redact_adversarial: 8 pass (new file)
- clippy -D warnings: clean
- cargo fmt --check: clean
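
The input-side walk from this commit message can be sketched as below. The names `TextRef::ToolCallArg`, `collect_text_fragments`, and `write_back` come from the commit message, but the signatures, the `Content` variant, and the `Message`/`ToolCall` shapes are assumptions made for illustration; the real code in `apply.rs` may differ.

```rust
/// Where a collected text fragment came from, so it can be written back.
#[derive(Debug, Clone, PartialEq)]
pub enum TextRef {
    Content { msg_idx: usize },
    ToolCallArg { msg_idx: usize, tc_idx: usize },
}

/// Assumed, simplified request types.
pub struct ToolCall {
    pub arguments: String,
}

pub struct Message {
    pub content: Option<String>,
    pub tool_calls: Vec<ToolCall>,
}

/// Collect every string that must pass through the detector, including the
/// JSON arguments of prior tool calls resubmitted as conversation history.
pub fn collect_text_fragments(msgs: &[Message]) -> Vec<(TextRef, String)> {
    let mut out = Vec::new();
    for (msg_idx, msg) in msgs.iter().enumerate() {
        if let Some(text) = &msg.content {
            out.push((TextRef::Content { msg_idx }, text.clone()));
        }
        for (tc_idx, tc) in msg.tool_calls.iter().enumerate() {
            out.push((TextRef::ToolCallArg { msg_idx, tc_idx }, tc.arguments.clone()));
        }
    }
    out
}

/// Write redacted fragments back to the places they were collected from.
pub fn write_back(msgs: &mut [Message], redacted: Vec<(TextRef, String)>) {
    for (target, text) in redacted {
        match target {
            TextRef::Content { msg_idx } => msgs[msg_idx].content = Some(text),
            TextRef::ToolCallArg { msg_idx, tc_idx } => {
                msgs[msg_idx].tool_calls[tc_idx].arguments = text
            }
        }
    }
}
```

Because the arguments string is treated as opaque text and the placeholder is plain ASCII, replacing `bob@example.com` with `<email1>` inside `{"to":"bob@example.com"}` leaves valid JSON.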

Evrard-Nil · 2026-05-13T14:06:35Z

Two subagents reviewed the work — one critical code review, one writing adversarial e2e tests. Pushed `bedf8d7` closing all 3 remaining High-priority gaps from the review plus a timeout, and folded in the 8 adversarial tests from the test pass.

Reviewer-flagged High items, all addressed:

Input `tool_calls[*].arguments` weren't redacted. In an agent loop the user resubmits the assistant's prior tool_call as conversation history; the JSON arguments string echoed the original PII raw to the provider on every turn. Now `collect_text_fragments` walks them; new `TextRef::ToolCallArg` variant + write_back support. E2E: `auto_redact_redacts_input_tool_call_arguments` sends a 4-turn history with `{"to":"bob@example.com"}` in the tool_call args and asserts the provider only sees `<email1>`.
`message.refusal` wasn't un-redacted. Safety-tuned models can quote our placeholders back ("I can't email `<email1>` per policy"); the placeholder leaked to the client. Walked in `unredact_chat_response_in_place`. (No streaming counterpart: `ChatDelta` has no refusal field.)
Privacy-filter error logs leaked upstream response body. `PrivacyClassifyError::HttpError` carries `response.text()` verbatim — a misbehaving filter that echoed input could route customer PII to application logs via `tracing::warn!(error = %e)`. Replaced with a bounded `error_category` (`unauthorized`/`rate_limited`/`server_error`/etc.). Final user-facing error now hand-formats just the status code. Demoted "Privacy classify completed" from info → debug.
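
A bounded category function of the kind described could look like this sketch. The category strings are taken from the commit message, but the mapping from HTTP status to category is a guess, and `error_category` taking an `Option<u16>` is an assumed signature (with `None` standing for a transport-level failure).

```rust
/// Map an upstream failure to a fixed, log-safe label. Only a `&'static str`
/// is returned, so no part of the upstream response body can reach the logs.
/// NOTE: the status-to-category mapping here is an assumption.
pub fn error_category(status: Option<u16>) -> &'static str {
    match status {
        None => "request_failed",
        Some(401) | Some(403) => "unauthorized",
        Some(429) => "rate_limited",
        Some(503) => "unavailable",
        Some(500..=599) => "server_error",
        Some(400..=499) => "client_error",
        Some(_) => "http_other",
    }
}
```

Logging `error_category(status)` in place of the error's `Display` output keeps the set of possible log values finite, whatever the privacy-filter echoes back.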

Bonus (Medium #4): 15s outer timeout on `redact_messages` via `tokio::time::timeout`. Worst case before this was N providers × 600s per-attempt → up to ~20 min hold on the user request. Now: timeout → `DetectorUnavailable` → 503. Constant `REDACT_TIMEOUT` in `auto_redact/mod.rs`.
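
The outer-deadline pattern can be shown with a thread-based analogue. This is not the PR's implementation, which uses `tokio::time::timeout`; the sketch swaps in `std::sync::mpsc::Receiver::recv_timeout` so it runs without an async runtime. `with_deadline` and the `RedactError` enum are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum RedactError {
    DetectorUnavailable,
}

/// Run `work` on a helper thread and give up after `limit`.
/// A timeout (or a worker that died without sending) maps to
/// `DetectorUnavailable`, which the caller would surface as a 503.
pub fn with_deadline<T, F>(limit: Duration, work: F) -> Result<T, RedactError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the deadline passed; ignore that.
        let _ = tx.send(work());
    });
    rx.recv_timeout(limit).map_err(|_| RedactError::DetectorUnavailable)
}
```

Unlike the async version, the worker thread here keeps running after the deadline fires; the caller simply stops waiting for it.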

Tester subagent: wrote `crates/api/tests/e2e_all/auto_redact_adversarial.rs` with 8 tests covering empty messages, empty content, system-message PII, multimodal parts, dedup, user-supplied placeholder collision, 512 KB body, 26 MB rejection. None surfaced a bug.

Test counts now:

services unit: 47 pass (was 43)
e2e auto_redact: 12 pass (was 10)
e2e auto_redact_adversarial: 8 pass (new)
clippy -D warnings: clean
fmt: clean

Re: gemini's earlier "flush gap" comment on `11e8203` — that flagged a missing flush on the latest commit, but `build_flush_chunks` Pass 2 was already there. Stale auto-review; replied separately.

Items I left as deferred follow-ups (not blockers for this PR):

Annotations / audio / function_call (legacy) un-redact — additional response fields that could theoretically carry placeholders. Lower probability than refusal in practice.
Single source of truth for `auto_redact_enabled` (currently fine, just fragile).
Dead `accumulated_bytes` cleanup (pre-existing).

Copilot AI review requested due to automatic review settings May 13, 2026 13:26

Evrard-Nil temporarily deployed to Cloud API test env May 13, 2026 13:26 — with GitHub Actions Inactive

Copilot started reviewing on behalf of Evrard-Nil May 13, 2026 13:26

Copilot AI reviewed May 13, 2026

Evrard-Nil temporarily deployed to Cloud API test env May 13, 2026 13:38 — with GitHub Actions Inactive

gemini-code-assist Bot reviewed May 13, 2026

Evrard-Nil temporarily deployed to Cloud API test env May 13, 2026 14:06 — with GitHub Actions Inactive

Evrard-Nil merged commit d91edc9 into main May 13, 2026
3 checks passed

Evrard-Nil mentioned this pull request May 18, 2026

feat(auto_redact): natural-looking dummies + secret category + span merge #599

Merged

2 tasks

Evrard-Nil merged 3 commits into main from fix/auto-redact-tool-calls
