feat: x-auto-redact for chat completions#585
Conversation
Adds an opt-in header (`x-auto-redact: on`) and body field
(`auto_redact: true`) on /v1/chat/completions that:
1. Detects PII in prompt messages by calling the privacy-filter model
via the inference provider pool.
2. Mints stable per-request placeholders (<email1>, <phone2>, …) and
rewrites the messages so the provider only ever sees the redacted
form. Provider scope is intentional: works for vLLM and external
(Anthropic/OpenAI/Gemini) alike.
3. Strips the auto_redact body field before forwarding so providers
with strict JSON schemas don't 422.
4. Un-redacts the response: walks message.content / reasoning fields
for non-streaming, wraps SSE chunks in a sliding-window unredacter
(16-byte tail, holds incomplete <…> tokens across chunks) for
streaming.
Fails closed: if the PII detector is unavailable the request is
rejected with 503 `auto_redact_unavailable` rather than degrading
silently to send raw PII to the provider.
New module: services::auto_redact
- placeholders.rs: bidirectional placeholder ↔ original map with
monotonic per-category ordinals, dedup of repeated PII, and
collision-safe minting when the user's own text contains a
<categoryN>-shaped literal.
- detect.rs: pool.privacy_classify invocation + response parse.
- apply.rs: walks CompletionMessage content (string + content-parts
arrays), redacts spans, writes back.
- stream_unredact.rs: sliding 16-byte tail buffer + regex
`<[a-z_]+\d+>` for streaming replacement that never splits a
placeholder across an emitted chunk.
Scope decisions:
- /v1/completions left out: handler currently returns 501. The
maybe_redact helper is generic over ServiceCompletionRequest so the
wire-in is a 5-line copy when that handler is enabled.
- /v1/responses out of scope for v1 per design; stored-conversation
semantics need their own decision (PR description).
- No category opt-out filter (all-or-nothing).
Tests: 34 unit + 7 e2e covering header/body activation parity,
streaming chunk splits, fail-closed on missing detector, body-field
stripping, multi-PII (email/phone/SSN) round-trip, and PII passthrough
when off.
Builds on /v1/privacy/classify (#584). The MockProvider's
privacy_classify_raw now does shape-based PII detection (email, SSN,
phone) so e2e tests exercise the full redact→provider→unredact loop
without a live privacy-filter model.
Review — auto-redact for chat completionsSolid design overall: fail-closed semantics on detector failure, the streaming sliding-window approach is well-tested, the body-field-stripped-before-forward concern is handled. A few issues worth addressing before merge.
|
There was a problem hiding this comment.
Code Review
This pull request implements an auto-redaction feature for PII in chat completions, including support for both non-streaming and streaming responses. The implementation uses a sliding-window buffer to handle placeholders split across stream chunks. The code review identified several critical issues: the redaction logic currently fails open on UTF-8 errors, which poses a security risk of PII leakage; the streaming unredaction state is incorrectly shared across multiple choices and reasoning fields, which will lead to data corruption during interleaved streams; and the response parsing for the PII detector should be more robust against duplicate indices.
| if let Ok(s) = std::str::from_utf8(&bytes[cursor..span.start]) { | ||
| out.push_str(s); | ||
| } else { | ||
| // Non-UTF8 boundary; refuse to redact this fragment safely. | ||
| return text.to_string(); | ||
| } |
There was a problem hiding this comment.
The current implementation of redact_one silently returns the original unredacted text if it encounters a UTF-8 boundary error or a malformed span. This is a security risk as it leaks PII when redaction was explicitly requested. The function should return a Result and the caller should fail the request (fail-closed) if redaction cannot be performed safely, honoring the privacy guarantee the user opted into.
| let content_unredact = Arc::new(tokio::sync::Mutex::new(StreamUnredact::new( | ||
| redaction_map.clone(), | ||
| ))); | ||
| let reasoning_unredact = Arc::new(tokio::sync::Mutex::new(StreamUnredact::new( | ||
| redaction_map.clone(), | ||
| ))); |
There was a problem hiding this comment.
The StreamUnredact instances are shared across all choices in the stream. If the request has n > 1, the provider may interleave chunks for different choices. Since StreamUnredact maintains a tail buffer for partial placeholders, interleaving chunks from different choices will corrupt the state and lead to incorrect un-redaction or data corruption. Unredacter state must be maintained per choice index to ensure correctness for multi-choice streaming.
| out[item.index] = item | ||
| .spans | ||
| .into_iter() | ||
| .map(|s| Span { | ||
| category: s.category, | ||
| start: s.start, | ||
| end: s.end, | ||
| text: s.text, | ||
| }) | ||
| .collect(); |
There was a problem hiding this comment.
If the privacy-filter model returns multiple entries in data for the same index, the current logic overwrites the previous spans for that index. It is safer to extend the vector to ensure all detected spans are captured.
out[item.index].extend(item
.spans
.into_iter()
.map(|s| Span {
category: s.category,
start: s.start,
end: s.end,
text: s.text,
}));| if let Some(reasoning) = &mut delta.reasoning_content { | ||
| *reasoning = reasoning_un.process(reasoning); | ||
| } | ||
| if let Some(reasoning) = &mut delta.reasoning { | ||
| *reasoning = reasoning_un.process(reasoning); | ||
| } |
There was a problem hiding this comment.
There was a problem hiding this comment.
Pull request overview
Adds opt-in automatic PII redaction for /v1/chat/completions, using the privacy-filter model to replace detected spans with stable placeholders before sending prompts to providers, then un-redacting responses back to the client (including streaming SSE handling).
Changes:
- Introduces a new
services::auto_redactmodule (placeholder minting, span application, detector call, and streaming-safe un-redaction). - Wires auto-redact into the API chat completions route via
x-auto-redactheader orauto_redactbody field (and strips the body field before forwarding). - Extends the MockProvider to simulate privacy-filter behavior and adds dedicated E2E coverage for header/body/streaming/fail-closed behavior.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/services/src/lib.rs | Exposes the new auto_redact module from the services crate. |
| crates/services/src/auto_redact/mod.rs | Public API/types/constants for auto-redact plus top-level design docs. |
| crates/services/src/auto_redact/apply.rs | Extracts text fragments from messages and applies span-based redaction. |
| crates/services/src/auto_redact/detect.rs | Calls privacy_classify via the provider pool and parses span results. |
| crates/services/src/auto_redact/placeholders.rs | Implements bidirectional placeholder↔original mapping and regex matching. |
| crates/services/src/auto_redact/stream_unredact.rs | Streaming-safe un-redaction with a bounded tail buffer across chunks. |
| crates/inference_providers/src/mock.rs | Adds simple regex-based PII span detection for mock privacy-filter responses. |
| crates/api/src/routes/completions.rs | Integrates auto-redact into chat completions request/response (stream + non-stream). |
| crates/api/tests/e2e_all/main.rs | Registers the new auto-redact E2E test module. |
| crates/api/tests/e2e_all/auto_redact.rs | E2E tests for header/body enablement, stripping, streaming, and fail-closed behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub fn redact_one(text: &str, spans: &[Span], map: &mut super::RedactionMap) -> String { | ||
| if spans.is_empty() { | ||
| return text.to_string(); | ||
| } | ||
| let bytes = text.as_bytes(); | ||
| let mut sorted: Vec<&Span> = spans.iter().collect(); | ||
| sorted.sort_by_key(|s| s.start); | ||
|
|
||
| let mut out = String::with_capacity(text.len()); | ||
| let mut cursor = 0usize; | ||
| for span in sorted { | ||
| if span.start < cursor { | ||
| // Overlapping with the previous span: drop the offending one. | ||
| continue; | ||
| } | ||
| if span.end > bytes.len() || span.start > span.end { | ||
| // Malformed span — skip rather than panic on slice. | ||
| continue; | ||
| } | ||
| // Append the text between the previous span and this one. | ||
| if let Ok(s) = std::str::from_utf8(&bytes[cursor..span.start]) { | ||
| out.push_str(s); | ||
| } else { | ||
| // Non-UTF8 boundary; refuse to redact this fragment safely. | ||
| return text.to_string(); | ||
| } | ||
| // Prefer the model's own `text` field when it's a clean UTF-8 slice | ||
| // of the input; fall back to the raw slice otherwise. | ||
| let original = std::str::from_utf8(&bytes[span.start..span.end]) | ||
| .map(str::to_string) | ||
| .unwrap_or_else(|_| span.text.clone()); | ||
| let placeholder = map.lookup_or_mint(&span.category, &original); | ||
| out.push_str(&placeholder); | ||
| cursor = span.end; | ||
| } | ||
| if cursor < bytes.len() { | ||
| if let Ok(s) = std::str::from_utf8(&bytes[cursor..]) { | ||
| out.push_str(s); | ||
| } | ||
| } | ||
| out |
| // Auto-redact: swap any minted placeholders in this | ||
| // chunk's text deltas back to their originals before | ||
| // we serialize and send. | ||
| if auto_redact_enabled { | ||
| let mut c = content_un.lock().await; | ||
| let mut r = reasoning_un.lock().await; | ||
| unredact_chunk_in_place(&mut event.chunk, &mut c, &mut r); | ||
| } |
| // When auto-redact is enabled, we substitute placeholders back to | ||
| // originals and re-serialize. The provider's raw_bytes are over the | ||
| // redacted form; we deliberately drop that signed payload because | ||
| // the client opted into munging the response. | ||
| let body_bytes = if auto_redact_enabled { | ||
| unredact_chat_response_in_place( | ||
| &mut response_with_bytes.response, | ||
| &redaction_map, | ||
| ); | ||
| match serde_json::to_vec(&response_with_bytes.response) { | ||
| Ok(b) => b, | ||
| Err(e) => { | ||
| tracing::error!(error = %e, "failed to re-serialize unredacted chat response"); | ||
| return ( | ||
| StatusCode::INTERNAL_SERVER_ERROR, | ||
| ResponseJson(ErrorResponse::new( | ||
| "Failed to assemble response".to_string(), | ||
| "internal_server_error".to_string(), | ||
| )), | ||
| ) | ||
| .into_response(); | ||
| } | ||
| } | ||
| } else { | ||
| // Return the exact bytes from the provider for hash verification. | ||
| // This ensures clients can hash the response and compare with | ||
| // attestation endpoints. | ||
| response_with_bytes.raw_bytes | ||
| }; |
| //! See `docs/auto-redact.md` (TBD) for the end-to-end design. Quick | ||
| //! summary: |
| /// Maximum byte length of any placeholder we mint. Bounds the streaming | ||
| /// unredact tail buffer so we never split a placeholder across SSE chunks. | ||
| /// | ||
| /// `<account_number999>` is 19 bytes. We round up to 32 to leave headroom |
| // Auto-redact: swap any minted placeholders in this | ||
| // chunk's text deltas back to their originals before | ||
| // we serialize and send. | ||
| if auto_redact_enabled { | ||
| let mut c = content_un.lock().await; | ||
| let mut r = reasoning_un.lock().await; | ||
| unredact_chunk_in_place(&mut event.chunk, &mut c, &mut r); | ||
| } |
Critical / blocking fixes:
- redact_one is now fail-closed on malformed input. Previously a span
whose offsets fell inside a multi-byte UTF-8 sequence or were
out-of-range silently passed the original text through — leaking PII
upstream when redaction was explicitly requested. Now returns
AutoRedactError::Internal, propagated to a 500 in the handler.
(gemini security-critical, Copilot)
- Streaming un-redact state is now keyed by choice index (HashMap<i64,
StreamUnredact>) rather than a single shared instance, with separate
maps for content / reasoning_content / reasoning. For n>1 completions
the provider may interleave chunks across choice indices, which would
have cross-contaminated the sliding 16-byte tail. (gemini high)
High-value fixes:
- End-of-stream flush: any bytes still held in a tail buffer when the
upstream stream ends (e.g. mid-placeholder) are now emitted as a
synthetic SSE chunk before [DONE], not silently dropped. (Copilot)
- The per-chunk `tracing::debug!("Completion stream event: ...")` line
is suppressed when auto_redact_enabled. After un-redact the chunk
holds the user's original PII; routing it to logs would defeat the
privacy guarantee. (Copilot)
Other fixes:
- detect.rs: extend rather than overwrite when the detector returns
multiple `data` entries for the same index. (gemini medium)
- Skip non-streaming response re-serialize when the redaction map is
empty (request opted in but no PII detected). Preserves the
raw_bytes/signing path for clean inputs. (Copilot)
- Doc comment in mod.rs no longer references a non-existent
`docs/auto-redact.md`. (Copilot nit)
- MAX_PLACEHOLDER_LEN comment example matches what's actually minted
(placeholder_prefix("account_number") -> "account", not
"account_number"). (Copilot nit)
- Span.text field removed (it was only carried through from the
detector response and never read; dead-code warning).
- MockProvider's privacy_classify_raw keeps usage.input_tokens=10 for
backward compat with the privacy_classify e2e test from #584; only
the spans field is computed from the input.
Tests:
- New unit tests in apply.rs for fail-closed-on-non-char-boundary and
fail-closed-on-out-of-range-span.
- New e2e test
auto_redact_skips_response_munging_when_no_pii_detected covers the
empty-map short-circuit.
- All 260 unit + 8 auto_redact e2e + 7 privacy_classify e2e pass.
|
Pushed Security-critical / blocking
Propagated through Streaming state shared across choices (gemini high): High-valueStream tail never flushed at end-of-stream (Copilot): PII reaches logs at debug level (Copilot): Worth-addressing
Re-serialize even when map is empty (Copilot): Nits
Mock side-effect
Test pass
|
Summary
Opt-in PII redaction on
/v1/chat/completions. The user enables it with either:x-auto-redact: on"auto_redact": trueBoth spellings are equivalent. The body field is stripped before forwarding so strict-schema providers (Anthropic) don't 422.
How it works
pool.privacy_classify(the endpoint added in feat: add /v1/privacy/classify endpoint #584) on the user's message text.<email1>,<phone1>,<account1>, etc. Same literal PII within one request reuses the same placeholder (dedup). Ordinals are monotonic per category. If the user's own text already contains a<categoryN>-shaped literal, the allocator skips that ordinal to avoid collision.choices[*].message.content,reasoning_content, andreasoning. Note: when auto-redact is on we re-serialize the response (drops the raw_bytes signing path — documented as an explicit tradeoff for the privacy guarantee).<…>token across chunks so a placeholder split mid-flight is never emitted partial.Fail-closed behavior
If the privacy-filter model is unreachable, errors, or isn't registered in the cloud-api models DB, the handler returns 503 with
\"type\": \"auto_redact_unavailable\"rather than silently degrading to send raw PII upstream. This honors the privacy guarantee the user opted into.Out of scope (deferred)
maybe_redacthelper is generic overServiceCompletionRequestso wiring it in is a 5-line copy when that handler is enabled.x-auto-redact-categories: …if needed.raw_bytessigning parity with auto-redact on — the response is re-serialized post-unredact, breaking byte-for-byte signature verification. The user explicitly opted into munging the response; we accept this. (Signature verification still works when auto-redact is off.)Module layout
```
crates/services/src/auto_redact/
mod.rs — public API: is_enabled, redact_messages, strip_body_field
placeholders.rs — RedactionMap (bidirectional, collision-safe minting)
detect.rs — call privacy-filter via pool.privacy_classify
apply.rs — walk CompletionMessage content (string + parts arrays)
stream_unredact.rs — sliding 16-byte tail for SSE
```
Tests
<email1>, client gets original backauto_redact: trueis equivalent to headerauto_redact_unavailable)offis a true no-opThe MockProvider's
privacy_classify_rawdoes shape-based detection (email, SSN, phone regex) so e2e tests exercise the full pipeline without a live privacy-filter model.Test plan
Builds on #584. The privacy-filter model must be registered in the cloud-api models DB (already done on staging + prod after #584).