Context
RedactionMapping was just slimmed down to carry only entity_id and location. Previously it carried original: String and replacement: Option<String>, but those fields were structurally incapable of holding image/audio originals — only text values fit. For non-text modalities the fields stored degenerate placeholder strings (e.g. [REDACTED IMAGE]), so they weren't useful as either an audit trail or a reversibility ledger.
The simplification:
- The audit trail intent is satisfied: AuditEntry.value: RedactionValue { original, replacement } already carries text values, and audit.entries records which entities were touched and by what policy.
- The reversibility intent is deferred: we no longer pretend to support it for image/audio. The map is now a thin entity-to-location index.
Goal
Make image and audio redactions reversible by pairing the redaction map with a blob store keyed by content hash. Audit metadata stays compact in RedactionMap; original bytes live in a separate, access-controlled store and are addressed by reference.
Suggested shape
    struct RedactionMapping {
        entity_id: Uuid,
        location: Location,
        original_ref: ContentRef,
        replacement_ref: Option<ContentRef>,
    }

    enum ContentRef {
        Inline(String), // small text values
        Blob { content_hash: Sha256, modality: Modality, size_bytes: u64 },
        Empty, // e.g. Remove output
    }
The engine computes content hashes (and inline text values) at apply time. Whether/where to store the bytes is the caller's choice — the engine emits the references but doesn't own the blob storage.
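As a rough illustration of what "computes content hashes (and inline text values) at apply time" could look like, here is a minimal sketch. Several names are assumptions that do not appear in the design: the `Text` variant of `Modality`, the `INLINE_MAX_BYTES` cutoff, and the `content_ref_for` helper. `Sha256` is modelled as a plain 32-byte array, and the hash function is passed in as a parameter so the sketch does not commit to a particular hashing crate.

```rust
/// 32-byte digest; stands in for the `Sha256` type named in the suggested shape.
type Sha256 = [u8; 32];

/// Assumed variants; the design only names the type `Modality`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Modality {
    Text,
    Image,
    Audio,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum ContentRef {
    Inline(String), // small text values
    Blob { content_hash: Sha256, modality: Modality, size_bytes: u64 },
    Empty, // e.g. Remove output
}

/// Hypothetical cutoff below which text is stored inline rather than by hash.
const INLINE_MAX_BYTES: usize = 256;

/// Hypothetical helper: build the reference for one original or replacement value.
/// `hash` is injected by the caller (for example, a SHA-256 implementation).
fn content_ref_for(
    bytes: &[u8],
    modality: Modality,
    hash: impl Fn(&[u8]) -> Sha256,
) -> ContentRef {
    if bytes.is_empty() {
        return ContentRef::Empty;
    }
    // Small, valid UTF-8 text stays inline so the map remains self-describing.
    if modality == Modality::Text && bytes.len() <= INLINE_MAX_BYTES {
        if let Ok(text) = std::str::from_utf8(bytes) {
            return ContentRef::Inline(text.to_owned());
        }
    }
    // Everything else is addressed by content hash; the bytes live elsewhere.
    ContentRef::Blob {
        content_hash: hash(bytes),
        modality,
        size_bytes: bytes.len() as u64,
    }
}
```

With this split, the map never holds image or audio bytes; it holds only a fixed-size reference that a consumer can correlate with whatever store it chooses.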
Open questions
- Storage interface. Introduce a BlobSink trait the engine takes as a dependency, or push the storage decision entirely to the consumer (engine just emits hashes; consumer correlates with their own store)?
- When to extract originals. Today the codec applies redactions in place; extracting image regions or audio segments before mutation is an extra read step. Decide whether to make extraction unconditional or gated on a per-entity `reversible: bool` flag.
- Replacement materialization for in-place ops. Blur / Pixelate / Block / Silence are in-place transforms; they don't produce a discrete "replacement" blob. Either (a) leave replacement_ref: None for these and only populate it for Replace { data } outputs, or (b) re-extract the redacted region after the fact (doubles IO).
- Strategy reversibility. Strategy::is_reversible_for already returns false for image/audio. Even with blob storage, blur/pixelate are mathematically irreversible regardless of what's kept. Storing originals enables reversibility only for Replace { data } outputs; for the rest, the stored originals serve as audit evidence.
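To make the first two questions more concrete, the sketch below shows one possible BlobSink trait together with extraction gated on a per-entity `reversible` flag. It is an illustration under assumptions, not a proposal for the final API: the `put` signature, `MemorySink`, `NullSink`, and `redact_in_place` are all hypothetical names, the hash function is injected rather than tied to a specific crate, and zero-filling stands in for an in-place strategy such as Block or Silence.

```rust
use std::collections::HashMap;
use std::convert::Infallible;

/// 32-byte digest; stands in for the `Sha256` type in the suggested shape.
type Sha256 = [u8; 32];

/// Hypothetical storage interface the engine could take as a dependency.
trait BlobSink {
    type Error;
    /// Store `bytes` under `content_hash`. Re-storing a known hash is a no-op.
    fn put(&mut self, content_hash: Sha256, bytes: &[u8]) -> Result<(), Self::Error>;
}

/// In-memory sink, useful for tests.
#[derive(Default)]
struct MemorySink {
    blobs: HashMap<Sha256, Vec<u8>>,
}

impl BlobSink for MemorySink {
    type Error = Infallible;
    fn put(&mut self, content_hash: Sha256, bytes: &[u8]) -> Result<(), Self::Error> {
        self.blobs.entry(content_hash).or_insert_with(|| bytes.to_vec());
        Ok(())
    }
}

/// Sink for the "engine just emits hashes" option: the bytes are discarded
/// and the consumer correlates hashes with its own store.
struct NullSink;

impl BlobSink for NullSink {
    type Error = Infallible;
    fn put(&mut self, _content_hash: Sha256, _bytes: &[u8]) -> Result<(), Self::Error> {
        Ok(())
    }
}

/// Capture the original region only when the entity is flagged reversible,
/// then apply the in-place transform. Returns the hash of the original, if kept.
fn redact_in_place<S: BlobSink>(
    region: &mut [u8],
    reversible: bool,
    hash: impl Fn(&[u8]) -> Sha256,
    sink: &mut S,
) -> Result<Option<Sha256>, S::Error> {
    let original_hash = if reversible {
        // The extra read step: hash and store the bytes before mutating them.
        let h = hash(&*region);
        sink.put(h, &*region)?;
        Some(h)
    } else {
        None
    };
    region.fill(0); // stand-in for an in-place strategy such as Block or Silence
    Ok(original_hash)
}
```

In this shape the two storage options are not mutually exclusive: a consumer that wants to own storage passes something like `NullSink` and keeps only the emitted hashes, while the gating flag keeps the extra read off the path for entities that do not need it.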
Out of scope
- The actual blob storage backend (S3/disk/...).
- A cross-modality RedactedValue enum embedded directly in RedactionMapping (rejected in design discussion because of audit-log size implications).
Related
Follow-up to the RedactionMapping simplification in the same branch (feat/policy-precedence).