Skip to content

feat: x-auto-redact for chat completions#585

Merged
Evrard-Nil merged 3 commits into
mainfrom
feat/auto-redact-completions
May 13, 2026
Merged

feat: x-auto-redact for chat completions#585
Evrard-Nil merged 3 commits into
mainfrom
feat/auto-redact-completions

Conversation

@Evrard-Nil
Copy link
Copy Markdown
Collaborator

Summary

Opt-in PII redaction on /v1/chat/completions. The user enables it with either:

  • HTTP header: x-auto-redact: on
  • Body field: "auto_redact": true

Both spellings are equivalent. The body field is stripped before forwarding so strict-schema providers (Anthropic) don't 422.

How it works

  1. The handler invokes the privacy-filter model via pool.privacy_classify (the endpoint added in feat: add /v1/privacy/classify endpoint #584) on the user's message text.
  2. PII spans are replaced with stable placeholders: <email1>, <phone1>, <account1>, etc. Same literal PII within one request reuses the same placeholder (dedup). Ordinals are monotonic per category. If the user's own text already contains a <categoryN>-shaped literal, the allocator skips that ordinal to avoid collision.
  3. The provider — vLLM or 3p — only ever sees the redacted form. Provider-agnostic by design.
  4. The response is un-redacted before being returned to the client:
    • Non-streaming: walks choices[*].message.content, reasoning_content, and reasoning. Note: when auto-redact is on we re-serialize the response (drops the raw_bytes signing path — documented as an explicit tradeoff for the privacy guarantee).
    • Streaming: wraps each SSE chunk with a sliding-window unredacter. A 16-byte tail buffer holds any in-progress <…> token across chunks so a placeholder split mid-flight is never emitted partial.

Fail-closed behavior

If the privacy-filter model is unreachable, errors, or isn't registered in the cloud-api models DB, the handler returns 503 with \"type\": \"auto_redact_unavailable\" rather than silently degrading to send raw PII upstream. This honors the privacy guarantee the user opted into.

Out of scope (deferred)

  • /v1/completions — the legacy handler currently returns 501. The new maybe_redact helper is generic over ServiceCompletionRequest so wiring it in is a 5-line copy when that handler is enabled.
  • /v1/responses — depends on a separate conversation-history persistence decision (design memo discussed: store original PII in DB, redact fresh on each provider call).
  • Category opt-out filter — all-or-nothing for v1. Easy to add via header x-auto-redact-categories: … if needed.
  • raw_bytes signing parity with auto-redact on — the response is re-serialized post-unredact, breaking byte-for-byte signature verification. The user explicitly opted into munging the response; we accept this. (Signature verification still works when auto-redact is off.)
  • Token billing on the redacted form is already what happens since the provider sees the redacted prompt and reports its own token usage — slightly cheaper for PII-heavy prompts.

Module layout

```
crates/services/src/auto_redact/
mod.rs — public API: is_enabled, redact_messages, strip_body_field
placeholders.rs — RedactionMap (bidirectional, collision-safe minting)
detect.rs — call privacy-filter via pool.privacy_classify
apply.rs — walk CompletionMessage content (string + parts arrays)
stream_unredact.rs — sliding 16-byte tail for SSE
```

Tests

  • 34 unit tests in services/auto_redact covering placeholder dedup, collision avoidance, span application, stream chunk-splitting, UTF-8 boundaries, hallucinated placeholders, etc.
  • 7 e2e tests in crates/api/tests/e2e_all/auto_redact.rs covering:
    • Header off → raw PII reaches provider
    • Header "on" → provider sees <email1>, client gets original back
    • Body field auto_redact: true is equivalent to header
    • Body field is stripped from `extra` before forwarding
    • Streaming: placeholder split across SSE chunks is reassembled
    • Fail-closed when PII model is unavailable (503 auto_redact_unavailable)
    • Header off is a true no-op
    • Multi-PII per message (email + phone + SSN) round-trips correctly

The MockProvider's privacy_classify_raw does shape-based detection (email, SSN, phone regex) so e2e tests exercise the full pipeline without a live privacy-filter model.

Test plan

  • `cargo check --workspace --all-targets`
  • `cargo clippy --workspace --all-targets -- -D warnings`
  • `cargo fmt --all -- --check`
  • `cargo test --lib --bins -p services -p inference_providers -p api` (258 unit pass)
  • `cargo test --test e2e_all auto_redact -- --test-threads 1` (7 e2e pass)
  • Staging E2E: enable on a chat completion against a 3p model (e.g. anthropic/claude-sonnet-4-5) and verify provider logs don't contain the PII

Builds on #584. The privacy-filter model must be registered in the cloud-api models DB (already done on staging + prod after #584).

Adds an opt-in header (`x-auto-redact: on`) and body field
(`auto_redact: true`) on /v1/chat/completions that:

  1. Detects PII in prompt messages by calling the privacy-filter model
     via the inference provider pool.
  2. Mints stable per-request placeholders (<email1>, <phone2>, …) and
     rewrites the messages so the provider only ever sees the redacted
     form. Provider scope is intentional: works for vLLM and external
     (Anthropic/OpenAI/Gemini) alike.
  3. Strips the auto_redact body field before forwarding so providers
     with strict JSON schemas don't 422.
  4. Un-redacts the response: walks message.content / reasoning fields
     for non-streaming, wraps SSE chunks in a sliding-window unredacter
     (16-byte tail, holds incomplete <…> tokens across chunks) for
     streaming.

Fails closed: if the PII detector is unavailable the request is
rejected with 503 `auto_redact_unavailable` rather than degrading
silently to send raw PII to the provider.

New module: services::auto_redact
  - placeholders.rs: bidirectional placeholder ↔ original map with
    monotonic per-category ordinals, dedup of repeated PII, and
    collision-safe minting when the user's own text contains a
    <categoryN>-shaped literal.
  - detect.rs: pool.privacy_classify invocation + response parse.
  - apply.rs: walks CompletionMessage content (string + content-parts
    arrays), redacts spans, writes back.
  - stream_unredact.rs: sliding 16-byte tail buffer + regex
    `<[a-z_]+\d+>` for streaming replacement that never splits a
    placeholder across an emitted chunk.

Scope decisions:
  - /v1/completions left out: handler currently returns 501. The
    maybe_redact helper is generic over ServiceCompletionRequest so the
    wire-in is a 5-line copy when that handler is enabled.
  - /v1/responses out of scope for v1 per design; stored-conversation
    semantics need their own decision (PR description).
  - No category opt-out filter (all-or-nothing).

Tests: 34 unit + 7 e2e covering header/body activation parity,
streaming chunk splits, fail-closed on missing detector, body-field
stripping, multi-PII (email/phone/SSN) round-trip, and PII passthrough
when off.

Builds on /v1/privacy/classify (#584). The MockProvider's
privacy_classify_raw now does shape-based PII detection (email, SSN,
phone) so e2e tests exercise the full redact→provider→unredact loop
without a live privacy-filter model.
Copilot AI review requested due to automatic review settings May 12, 2026 13:37
@Evrard-Nil Evrard-Nil had a problem deploying to Cloud API test env May 12, 2026 13:37 — with GitHub Actions Failure
@claude
Copy link
Copy Markdown

claude Bot commented May 12, 2026

Review — auto-redact for chat completions

Solid design overall: fail-closed semantics on detector failure, the streaming sliding-window approach is well-tested, the body-field-stripped-before-forward concern is handled. A few issues worth addressing before merge.

⚠️ Critical

1. redact_one fails OPEN on UTF-8 boundary mismatch — leaks PII

crates/services/src/auto_redact/apply.rs:899-903

if let Ok(s) = std::str::from_utf8(&bytes[cursor..span.start]) {
    out.push_str(s);
} else {
    // Non-UTF8 boundary; refuse to redact this fragment safely.
    return text.to_string();   // ← returns ORIGINAL text containing PII
}

The detector said "PII at start..end" but because of byte-vs-char boundary mismatch we return the original (containing the PII span). This contradicts the fail-closed guarantee elsewhere in the design. Suggest propagating an AutoRedactError::Internal so the handler returns 5xx instead of silently sending raw PII upstream. (Same concern for the lower if let Ok(s) = ... { out.push_str(s); } block at line 914 — non-UTF8 trailing bytes are silently dropped instead of erroring.)

2. Tool call arguments are not unredacted in the response (both paths)

crates/api/src/routes/completions.rs:86-101 (unredact_chat_response_in_place) walks only content, reasoning_content, reasoning. It does not touch choice.message.tool_calls[*].function.arguments. Since the model only sees the redacted prompt, any user PII it echoes into a tool call's arguments will be the placeholder form (e.g. {\"to\":\"<email1>\"}), and the client will receive that placeholder instead of the original alice@example.com. This breaks tool-calling round-trips when auto-redact is on. Same issue in the streaming path: unredact_chunk_in_place (lines 106-133) ignores delta.tool_calls[*].function.arguments.

If tool-calling-with-auto-redact is out of scope for v1, please document it explicitly (similar to the raw_bytes signing tradeoff in the PR description) — currently the PR description implies the only deferred item is /v1/responses.

Minor

  • tokio::sync::Mutex is unnecessary at completions.rs:166-171. The locked region (lines 197-200) holds no .await points, so std::sync::Mutex is cheaper and avoids a potential cancellation-during-lock subtlety. Optional.
  • Comment drift in placeholders.rs: MAX_PLACEHOLDER_LEN doc says <account_number999> (19 bytes), but placeholder_prefix(\"account_number\") returns \"account\", so the longest you can mint is <account4294967295> (~18 bytes). The constant is still fine; the comment is just misleading.
  • safe_emit_boundary is O(n²) in the worst case: for each < in the last MAX_PLACEHOLDER_LEN bytes it scans bytes[i+1..] for >. The window is small (32 bytes) so the multiplier is bounded, but the inner contains() scans the rest of the buffer. For very large chunks containing many < chars in the tail, this could be noticeable. Probably not worth optimizing unless it shows up in profiling.

What's good

  • Placeholder collision avoidance via reserve_literal is a nice touch.
  • Stream tests cover the genuinely hard cases (char-by-char split, UTF-8 multibyte before a held <, hallucinated placeholders).
  • Strip-before-forward for auto_redact body field correctly prevents 422s on strict-schema providers.
  • Fail-closed e2e test confirms the privacy guarantee.
  • Logging is ID-only / sanitized — no PII content paths.

⚠️ Issues found — addressing items 1 and 2 above is recommended before merge.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements an auto-redaction feature for PII in chat completions, including support for both non-streaming and streaming responses. The implementation uses a sliding-window buffer to handle placeholders split across stream chunks. The code review identified several critical issues: the redaction logic currently fails open on UTF-8 errors, which poses a security risk of PII leakage; the streaming unredaction state is incorrectly shared across multiple choices and reasoning fields, which will lead to data corruption during interleaved streams; and the response parsing for the PII detector should be more robust against duplicate indices.

Comment on lines +125 to +130
if let Ok(s) = std::str::from_utf8(&bytes[cursor..span.start]) {
out.push_str(s);
} else {
// Non-UTF8 boundary; refuse to redact this fragment safely.
return text.to_string();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

The current implementation of redact_one silently returns the original unredacted text if it encounters a UTF-8 boundary error or a malformed span. This is a security risk as it leaks PII when redaction was explicitly requested. The function should return a Result and the caller should fail the request (fail-closed) if redaction cannot be performed safely, honoring the privacy guarantee the user opted into.

Comment thread crates/api/src/routes/completions.rs Outdated
Comment on lines +565 to +570
let content_unredact = Arc::new(tokio::sync::Mutex::new(StreamUnredact::new(
redaction_map.clone(),
)));
let reasoning_unredact = Arc::new(tokio::sync::Mutex::new(StreamUnredact::new(
redaction_map.clone(),
)));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The StreamUnredact instances are shared across all choices in the stream. If the request has n > 1, the provider may interleave chunks for different choices. Since StreamUnredact maintains a tail buffer for partial placeholders, interleaving chunks from different choices will corrupt the state and lead to incorrect un-redaction or data corruption. Unredacter state must be maintained per choice index to ensure correctness for multi-choice streaming.

Comment on lines +102 to +111
out[item.index] = item
.spans
.into_iter()
.map(|s| Span {
category: s.category,
start: s.start,
end: s.end,
text: s.text,
})
.collect();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If the privacy-filter model returns multiple entries in data for the same index, the current logic overwrites the previous spans for that index. It is safer to extend the vector to ensure all detected spans are captured.

        out[item.index].extend(item
            .spans
            .into_iter()
            .map(|s| Span {
                category: s.category,
                start: s.start,
                end: s.end,
                text: s.text,
            }));

Comment thread crates/api/src/routes/completions.rs Outdated
Comment on lines +390 to +395
if let Some(reasoning) = &mut delta.reasoning_content {
*reasoning = reasoning_un.process(reasoning);
}
if let Some(reasoning) = &mut delta.reasoning {
*reasoning = reasoning_un.process(reasoning);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Both reasoning_content and reasoning fields share the same reasoning_un unredacter. If a model provides both fields in the same stream (or interleaved), the shared state will be corrupted. Each field should have its own dedicated StreamUnredact instance to maintain independent tail buffers.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds opt-in automatic PII redaction for /v1/chat/completions, using the privacy-filter model to replace detected spans with stable placeholders before sending prompts to providers, then un-redacting responses back to the client (including streaming SSE handling).

Changes:

  • Introduces a new services::auto_redact module (placeholder minting, span application, detector call, and streaming-safe un-redaction).
  • Wires auto-redact into the API chat completions route via x-auto-redact header or auto_redact body field (and strips the body field before forwarding).
  • Extends the MockProvider to simulate privacy-filter behavior and adds dedicated E2E coverage for header/body/streaming/fail-closed behavior.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
crates/services/src/lib.rs Exposes the new auto_redact module from the services crate.
crates/services/src/auto_redact/mod.rs Public API/types/constants for auto-redact plus top-level design docs.
crates/services/src/auto_redact/apply.rs Extracts text fragments from messages and applies span-based redaction.
crates/services/src/auto_redact/detect.rs Calls privacy_classify via the provider pool and parses span results.
crates/services/src/auto_redact/placeholders.rs Implements bidirectional placeholder↔original mapping and regex matching.
crates/services/src/auto_redact/stream_unredact.rs Streaming-safe un-redaction with a bounded tail buffer across chunks.
crates/inference_providers/src/mock.rs Adds simple regex-based PII span detection for mock privacy-filter responses.
crates/api/src/routes/completions.rs Integrates auto-redact into chat completions request/response (stream + non-stream).
crates/api/tests/e2e_all/main.rs Registers the new auto-redact E2E test module.
crates/api/tests/e2e_all/auto_redact.rs E2E tests for header/body enablement, stripping, streaming, and fail-closed behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +105 to +145
pub fn redact_one(text: &str, spans: &[Span], map: &mut super::RedactionMap) -> String {
if spans.is_empty() {
return text.to_string();
}
let bytes = text.as_bytes();
let mut sorted: Vec<&Span> = spans.iter().collect();
sorted.sort_by_key(|s| s.start);

let mut out = String::with_capacity(text.len());
let mut cursor = 0usize;
for span in sorted {
if span.start < cursor {
// Overlapping with the previous span: drop the offending one.
continue;
}
if span.end > bytes.len() || span.start > span.end {
// Malformed span — skip rather than panic on slice.
continue;
}
// Append the text between the previous span and this one.
if let Ok(s) = std::str::from_utf8(&bytes[cursor..span.start]) {
out.push_str(s);
} else {
// Non-UTF8 boundary; refuse to redact this fragment safely.
return text.to_string();
}
// Prefer the model's own `text` field when it's a clean UTF-8 slice
// of the input; fall back to the raw slice otherwise.
let original = std::str::from_utf8(&bytes[span.start..span.end])
.map(str::to_string)
.unwrap_or_else(|_| span.text.clone());
let placeholder = map.lookup_or_mint(&span.category, &original);
out.push_str(&placeholder);
cursor = span.end;
}
if cursor < bytes.len() {
if let Ok(s) = std::str::from_utf8(&bytes[cursor..]) {
out.push_str(s);
}
}
out
Comment thread crates/api/src/routes/completions.rs Outdated
Comment on lines +592 to +599
// Auto-redact: swap any minted placeholders in this
// chunk's text deltas back to their originals before
// we serialize and send.
if auto_redact_enabled {
let mut c = content_un.lock().await;
let mut r = reasoning_un.lock().await;
unredact_chunk_in_place(&mut event.chunk, &mut c, &mut r);
}
Comment on lines +700 to +728
// When auto-redact is enabled, we substitute placeholders back to
// originals and re-serialize. The provider's raw_bytes are over the
// redacted form; we deliberately drop that signed payload because
// the client opted into munging the response.
let body_bytes = if auto_redact_enabled {
unredact_chat_response_in_place(
&mut response_with_bytes.response,
&redaction_map,
);
match serde_json::to_vec(&response_with_bytes.response) {
Ok(b) => b,
Err(e) => {
tracing::error!(error = %e, "failed to re-serialize unredacted chat response");
return (
StatusCode::INTERNAL_SERVER_ERROR,
ResponseJson(ErrorResponse::new(
"Failed to assemble response".to_string(),
"internal_server_error".to_string(),
)),
)
.into_response();
}
}
} else {
// Return the exact bytes from the provider for hash verification.
// This ensures clients can hash the response and compare with
// attestation endpoints.
response_with_bytes.raw_bytes
};
Comment thread crates/services/src/auto_redact/mod.rs Outdated
Comment on lines +4 to +5
//! See `docs/auto-redact.md` (TBD) for the end-to-end design. Quick
//! summary:
/// Maximum byte length of any placeholder we mint. Bounds the streaming
/// unredact tail buffer so we never split a placeholder across SSE chunks.
///
/// `<account_number999>` is 19 bytes. We round up to 32 to leave headroom
Comment thread crates/api/src/routes/completions.rs Outdated
Comment on lines +592 to +599
// Auto-redact: swap any minted placeholders in this
// chunk's text deltas back to their originals before
// we serialize and send.
if auto_redact_enabled {
let mut c = content_un.lock().await;
let mut r = reasoning_un.lock().await;
unredact_chunk_in_place(&mut event.chunk, &mut c, &mut r);
}
Critical / blocking fixes:

- redact_one is now fail-closed on malformed input. Previously a span
  whose offsets fell inside a multi-byte UTF-8 sequence or were
  out-of-range silently passed the original text through — leaking PII
  upstream when redaction was explicitly requested. Now returns
  AutoRedactError::Internal, propagated to a 500 in the handler.
  (gemini security-critical, Copilot)
- Streaming un-redact state is now keyed by choice index (HashMap<i64,
  StreamUnredact>) rather than a single shared instance, with separate
  maps for content / reasoning_content / reasoning. For n>1 completions
  the provider may interleave chunks across choice indices, which would
  have cross-contaminated the sliding 16-byte tail. (gemini high)

High-value fixes:

- End-of-stream flush: any bytes still held in a tail buffer when the
  upstream stream ends (e.g. mid-placeholder) are now emitted as a
  synthetic SSE chunk before [DONE], not silently dropped. (Copilot)
- The per-chunk `tracing::debug!("Completion stream event: ...")` line
  is suppressed when auto_redact_enabled. After un-redact the chunk
  holds the user's original PII; routing it to logs would defeat the
  privacy guarantee. (Copilot)

Other fixes:

- detect.rs: extend rather than overwrite when the detector returns
  multiple `data` entries for the same index. (gemini medium)
- Skip non-streaming response re-serialize when the redaction map is
  empty (request opted in but no PII detected). Preserves the
  raw_bytes/signing path for clean inputs. (Copilot)
- Doc comment in mod.rs no longer references a non-existent
  `docs/auto-redact.md`. (Copilot nit)
- MAX_PLACEHOLDER_LEN comment example matches what's actually minted
  (placeholder_prefix("account_number") -> "account", not
  "account_number"). (Copilot nit)
- Span.text field removed (it was only carried through from the
  detector response and never read; dead-code warning).
- MockProvider's privacy_classify_raw keeps usage.input_tokens=10 for
  backward compat with the privacy_classify e2e test from #584; only
  the spans field is computed from the input.

Tests:

- New unit tests in apply.rs for fail-closed-on-non-char-boundary and
  fail-closed-on-out-of-range-span.
- New e2e test
  auto_redact_skips_response_munging_when_no_pii_detected covers the
  empty-map short-circuit.
- All 260 unit + 8 auto_redact e2e + 7 privacy_classify e2e pass.
@Evrard-Nil Evrard-Nil temporarily deployed to Cloud API test env May 13, 2026 07:37 — with GitHub Actions Inactive
@Evrard-Nil
Copy link
Copy Markdown
Collaborator Author

Pushed 9107ea7 addressing the bot review. Walk-through:

Security-critical / blocking

redact_one silently passed through on malformed input (gemini critical, Copilot):
Now returns AutoRedactError::Internal on:

  • Span offsets that don't land on UTF-8 char boundaries (e.g. mid-multibyte-sequence)
  • span.end > text.len() or inverted bounds
  • Final cursor not on a char boundary

Propagated through redact_messages so the handler responds 500 rather than letting raw PII through. Two new unit tests cover the failure paths.

Streaming state shared across choices (gemini high):
Streaming un-redact now keyed by choice.index (HashMap<i64, StreamUnredact>) with separate maps for content, reasoning_content, and reasoning. For n>1 completions or models that emit content + reasoning concurrently, each (choice, field) gets its own sliding tail buffer.

High-value

Stream tail never flushed at end-of-stream (Copilot):
Added build_flush_chunks() that drains each per-(choice, field) state at [DONE] time. Any bytes still in a sliding tail (e.g. upstream cut off mid-<email1>) get emitted as a synthetic final SSE chunk before [DONE] with the same id/model/created as the first real chunk. Empty fields are skipped so the synthetic chunk only fires when there's something to emit.

PII reaches logs at debug level (Copilot):
The tracing::debug!("Completion stream event: ...") line is now suppressed when auto_redact_enabled. After un-redact the chunk content is the user's original PII; logging it would defeat the privacy invariant the user opted into. CLAUDE.md's "no customer content in logs, even at debug" rule applies here.

Worth-addressing

detect.rs overwriting spans for duplicate index (gemini medium): Changed out[item.index] = ... to out[item.index].extend(...).

Re-serialize even when map is empty (Copilot): auto_redact_enabled is now requested && !map.is_empty(). A request that opts in but contains no PII keeps the raw-bytes/signing path and the existing debug log.

Nits

  • Removed dead docs/auto-redact.md reference from mod.rs doc comment.
  • MAX_PLACEHOLDER_LEN doc example now matches what's actually minted (<address9999>, not <account_number999>).
  • Dropped Span.text (was only set from the detector response and never read).

Mock side-effect

MockProvider::privacy_classify_raw keeps usage.input_tokens=10 (the original constant) for backward compat with the test_privacy_classify_costs_deducted test from #584. Only spans is computed from the input.

Test pass

  • 260 unit pass (added 2 fail-closed tests in apply.rs)
  • 8 auto_redact e2e pass (added auto_redact_skips_response_munging_when_no_pii_detected)
  • 7 privacy_classify e2e pass (the failing one from CI is now green)
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • cargo fmt --check clean

@Evrard-Nil Evrard-Nil temporarily deployed to Cloud API test env May 13, 2026 09:16 — with GitHub Actions Inactive
@Evrard-Nil Evrard-Nil merged commit 972de6c into main May 13, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants