WAV redaction + rework Mergeable/Redactions/handler shape, fold transform/ into handler/#150

Merged
martsokha merged 5 commits into main from feat/audio-redaction
May 19, 2026

Summary

End-to-end audio redaction (WAV) plus a substantial cleanup of the redaction-handler API: relocate Mergeable to Location, drop range fields that duplicated location data from each *Redaction, flatten Redactions<S, R> to Vec<(S, R)>, narrow handler hooks to redact_at(&Location, Redaction), and collapse the now-unnecessary transform/ module into handler/.

Architectural changes

  • Mergeable moved to ontology, implemented on *Location. Overlap detection and merge semantics are a Location concern, not a Redaction concern. Redactions<S, R> now requires S: Overlap + Mergeable, R: Mergeable; the Merge policy needs both to succeed.
  • Flatten Redactions<S, R> to a Vec<(S, R)> ordered by insertion. Bucketing by key was masquerading as overlap detection.
  • Drop range fields from *Redaction payloads. AudioRedaction.time_span, ImageRedaction.bounding_box, and TabularRedaction.start/end all duplicated data already on the matching *Location. TextRedaction initially kept its start/end because TextLocation was treated as line-level with an intra-line range; the last commit in this PR drops those fields too, so the byte range now comes from the containing TextLocation.
  • Handler API narrows to redact_at(&Location, Redaction). Each *Handler capability trait gets a provided default redact(Redactions<L, R>) that loops redact_at. AudioHandler::redact overrides the default to sort right-to-left by time_span.start_us so AudioOutput::Remove doesn't shift later sample indices (this was the bug that motivated the redesign).
  • transform/ folded into handler/. Redactions, ConflictPolicy, InsertError, and the per-modality *Redaction/*Output types all live under handler/ now. transform/ is deleted.
  • Buffer-mutation helpers (apply_*_redaction, ImageOps) moved to handler/{text,image,audio,tabular}/ next to their only consumers.
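
The flattened Redactions shape and the "both sides must merge" rule can be sketched as follows. This is a minimal illustration, not the nvisy-codec implementation: the trait signatures, the Span and Mask types, the Reject variant, and the Clone bounds are all assumptions made so the sketch compiles on its own.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the ontology-level overlap check.
trait Overlap {
    fn overlaps(&self, other: &Self) -> bool;
}

/// Hypothetical merge contract: `None` means the two values cannot merge.
trait Mergeable: Sized {
    fn try_merge(self, other: Self) -> Option<Self>;
}

/// Illustrative location: a half-open range.
#[derive(Debug, Clone, PartialEq)]
struct Span(Range<u64>);

impl Overlap for Span {
    fn overlaps(&self, other: &Self) -> bool {
        self.0.start < other.0.end && other.0.start < self.0.end
    }
}

impl Mergeable for Span {
    // Overlapping spans merge into their union.
    fn try_merge(self, other: Self) -> Option<Self> {
        Some(Span(self.0.start.min(other.0.start)..self.0.end.max(other.0.end)))
    }
}

/// Illustrative redaction payload: only the replacement output.
#[derive(Debug, Clone, PartialEq)]
struct Mask(String);

impl Mergeable for Mask {
    // Payloads merge only when they agree.
    fn try_merge(self, other: Self) -> Option<Self> {
        (self == other).then_some(self)
    }
}

#[derive(Debug, PartialEq)]
enum InsertError {
    Conflict,
}

#[derive(Debug, Clone, Copy)]
enum ConflictPolicy {
    Reject,
    Merge,
}

/// Flat, insertion-ordered list of (location, redaction) pairs.
struct Redactions<S, R> {
    items: Vec<(S, R)>,
    policy: ConflictPolicy,
}

impl<S, R> Redactions<S, R>
where
    S: Overlap + Mergeable + Clone, // Clone is a simplification of this sketch
    R: Mergeable + Clone,
{
    fn new(policy: ConflictPolicy) -> Self {
        Self { items: Vec::new(), policy }
    }

    fn insert(&mut self, location: S, redaction: R) -> Result<(), InsertError> {
        // Only the first overlapping entry is considered in this sketch.
        let Some(idx) = self.items.iter().position(|(s, _)| s.overlaps(&location)) else {
            self.items.push((location, redaction));
            return Ok(());
        };
        match self.policy {
            ConflictPolicy::Reject => Err(InsertError::Conflict),
            ConflictPolicy::Merge => {
                let (s, r) = self.items[idx].clone();
                // Merge succeeds only if location AND redaction both merge.
                match (s.try_merge(location), r.try_merge(redaction)) {
                    (Some(s), Some(r)) => {
                        self.items[idx] = (s, r);
                        Ok(())
                    }
                    _ => Err(InsertError::Conflict),
                }
            }
        }
    }
}
```

With the Merge policy, inserting Span(0..10) and then Span(5..15) with the same Mask leaves a single Span(0..15) entry, while an overlapping insert with a different Mask is rejected because the payload side refuses to merge.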

WAV redaction

  • New WavHandler::redact_at decodes via hound, applies the sample-level mutation, and re-encodes. Supported: i8/i16/i32 PCM and f32 IEEE float.
  • Mp3Handler::redact_at returns an explicit "MP3 redaction is not supported" error (no pure-Rust encoder; libmp3lame out of scope). The pipeline fails fast at the first redaction with a clear message; callers should convert to WAV upstream.
  • The right-to-left sort in the provided AudioHandler::redact implementation means callers passing multiple Remove redactions get correct indices without each handler having to re-implement the ordering.
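
To see why the ordering matters for Remove, here is a small sketch on a plain sample buffer. It is an illustration under assumptions: SampleSpan and both functions are hypothetical, and spans are given in sample indices rather than microseconds to keep sample-rate math out of the picture.

```rust
/// Hypothetical half-open span in sample indices.
#[derive(Debug, Clone, Copy)]
struct SampleSpan {
    start: usize,
    end: usize,
}

/// Removes each span in the order given. Every removal shifts the
/// samples after it to the left, so later spans that were computed
/// against the original buffer may hit the wrong samples.
fn remove_in_given_order(samples: &mut Vec<i16>, spans: &[SampleSpan]) {
    for span in spans {
        // Clamp so an out-of-range span cannot panic.
        let end = span.end.min(samples.len());
        let start = span.start.min(end);
        samples.drain(start..end);
    }
}

/// Sorts spans by descending start before removing, so each removal
/// only shifts samples that no remaining span refers to.
fn remove_right_to_left(samples: &mut Vec<i16>, spans: &[SampleSpan]) {
    let mut sorted = spans.to_vec();
    sorted.sort_by(|a, b| b.start.cmp(&a.start));
    remove_in_given_order(samples, &sorted);
}
```

For a buffer holding 0..10 and the spans 1..3 and 6..8, left-to-right removal drops samples 1, 2, 8, 9 (the second span lands two positions too late), whereas the right-to-left version drops 1, 2, 6, 7 as intended.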

Minor cleanups

  • ContentKind: replace four hand-written is_*() predicates with derive_more::IsVariant. is_text_based renamed to is_text to match the variant (no callers in the workspace).
  • Redactions<S, R>: IntoIterator via derive_more::IntoIterator on the items field.
  • Entities: extend the existing derive to include ref and ref_mut iterators via #[into_iterator(owned, ref, ref_mut)]; delete the hand-written &'a Entities impl.

Test plan

  • cargo check --workspace --all-features — clean
  • cargo test --workspace --all-features — all green (425+ tests across 11 crates)
  • cargo clippy --workspace --all-features --no-deps — clean
  • cargo +nightly fmt --all — clean
  • cargo doc --workspace --all-features --no-deps — no rustdoc warnings (pre-existing nvisy-cli/nvisy-server filename collision unrelated)
  • WAV redaction covered by new unit tests in wav_handler.rs (silence + remove across mono/stereo/i16/i32/f32)
  • MP3 explicit-error path covered by mp3_handler.rs tests
  • Manual smoke through the audio→STT→redact pipeline once an integration env is available

🤖 Generated with Claude Code

martsokha and others added 4 commits May 19, 2026 14:51
…ion) handler API

- Move Mergeable from codec to ontology and implement on the four
  *Location types; overlap detection and merge semantics are a
  Location concern, not a Redaction concern.
- Flatten Redactions<S, R> to Vec<(S, R)>; conflict policy uses
  S::overlaps and requires both S::try_merge and R::try_merge to
  succeed for Merge.
- Drop range fields duplicated by Locations from *Redaction payloads
  (AudioRedaction.time_span, ImageRedaction.bounding_box,
  TabularRedaction.start/end). TextRedaction keeps start/end since
  TextLocation is line-level and the range is intra-line.
- Reintroduce *Transform traits as blanket impls over *Handler that
  iterate Redactions and dispatch each (location, redaction) pair to
  the handler's narrow redact_at hook. AudioTransform pre-sorts
  right-to-left by time_span.start_us so Remove ops don't shift
  indices for later calls.
- Add hound-based WAV redaction; MP3 returns an explicit
  "not supported" error.
- Move buffer-mutation helpers (apply_*_redaction, ImageOps) from
  transform/ to handler/{text,image,audio,tabular}/ next to their
  only consumers. transform/ now owns just the iteration protocol.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nsform/ into handler/

- Fold each *Transform trait into the matching *Handler as a provided
  default method redact(Redactions<L, R>) that loops redact_at in
  insertion order. AudioHandler::redact overrides the default to sort
  right-to-left by time_span.start_us so AudioOutput::Remove doesn't
  shift later sample indices. One trait per modality instead of two;
  no separate blanket-impl extension trait.
- Move all of transform/'s contents into handler/:
  - *Redaction / *Output structs into handler/{text,image,audio,tabular}/
    next to the redact_at hook they feed.
  - Redactions<S, R>, ConflictPolicy, InsertError into handler/ as
    the cross-modality batching primitives.
  - Mergeable re-exported from handler/ (still defined in ontology).
- Delete transform/ entirely; nothing in the engine or codec imports
  from it anymore. Engine and downstream consumers now go through
  nvisy_codec::handler::{*Redaction, Redactions, ConflictPolicy, ...}.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
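
The "one trait per modality" shape described in this commit might look roughly like the sketch below: a required narrow redact_at hook plus a provided redact that owns the iteration order. The concrete types, the String error, and the two example handlers are assumptions for illustration; only the names redact_at, redact, start_us, and the error message text come from the PR.

```rust
/// Hypothetical audio location; the PR sorts on `time_span.start_us`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AudioLocation {
    start_us: u64,
    end_us: u64,
}

/// Hypothetical output kinds for an audio redaction.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AudioOutput {
    Silence,
    Remove,
}

trait AudioHandler {
    /// Narrow hook: apply exactly one redaction at one location.
    fn redact_at(&mut self, location: &AudioLocation, output: AudioOutput) -> Result<(), String>;

    /// Provided batch method: sort right-to-left, then dispatch.
    /// Stops at the first error (fail fast).
    fn redact(&mut self, mut items: Vec<(AudioLocation, AudioOutput)>) -> Result<(), String> {
        items.sort_by(|a, b| b.0.start_us.cmp(&a.0.start_us));
        for (location, output) in items {
            self.redact_at(&location, output)?;
        }
        Ok(())
    }
}

/// Test double that records the order in which redact_at is called.
struct RecordingHandler {
    calls: Vec<u64>,
}

impl AudioHandler for RecordingHandler {
    fn redact_at(&mut self, location: &AudioLocation, _output: AudioOutput) -> Result<(), String> {
        self.calls.push(location.start_us);
        Ok(())
    }
}

/// Handler that rejects every redaction, like the MP3 path in this PR.
struct UnsupportedHandler;

impl AudioHandler for UnsupportedHandler {
    fn redact_at(&mut self, _location: &AudioLocation, _output: AudioOutput) -> Result<(), String> {
        Err("MP3 redaction is not supported".to_string())
    }
}
```

Because the sort lives in the provided method, an implementor only writes redact_at; a handler that needs a different order can still override redact.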
…itten sites

- ContentKind: replace four hand-written is_*() predicates with
  derive_more::IsVariant. Note: is_text_based renamed to is_text to
  match the variant name (no callers in the workspace).
- Redactions<S, R>: replace the hand-written owned IntoIterator impl
  with derive_more::IntoIterator on the `items` field.
- Entities: extend the existing derive_more::IntoIterator to also
  derive the ref and ref_mut variants via #[into_iterator(owned, ref,
  ref_mut)] on the field, deleting the hand-written `&'a Entities`
  impl.
- Add the `is_variant` derive_more feature to nvisy-core and the
  `into_iterator` feature to nvisy-codec.

TextData was also a candidate (collapse its From<String> and From<&str>
into derive_more::From with #[from(forward)]) but HipStr<'static>'s
From impls are parameterized on the backing storage (Arc) and forward
mode can't see them through the wrapper. Left as hand-written.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rename

The sed-driven path rewrite produced split imports (`use super::apply_*;`
+ `use crate::handler::*Redaction;` instead of a single grouped import,
plus a few `crate::handler::TextData; use crate::handler::Handler;`
pairs that should fold together). Nightly rustfmt regrouped them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@martsokha martsokha self-assigned this May 19, 2026
@martsokha martsokha added the feat (request for or implementation of a new feature), codec (nvisy-codec: loaders, transforms, handlers), and refactor (code restructuring without behavior change) labels May 19, 2026
… uses location range

Brings text in line with the other modalities: the redaction payload
now carries only `output`, the byte range comes from the containing
TextLocation. One coordinate system, not two.

- TextRedaction::new(output) — drops start/end fields. Mergeable
  collapses to (output == other.output).then_some(self), matching
  Image/Audio/Tabular.
- apply_text_redaction now takes explicit start/end parameters (same
  shape as apply_tabular_redaction).
- Each text handler's redact_at (txt, json, html, pdf) finds the line
  / node / page / span containing location.start_offset..end_offset,
  computes span-relative offsets, and forwards them to the apply
  helper. The "exact start match" requirement is gone — entity-shaped
  locations (substrings) now work, not just whole-line locations.
- Engine RedactionApplicator drops the now-redundant offsets:
  TextRedaction::new(loc.start_offset, loc.end_offset, output) becomes
  TextRedaction::new(output).
- Tests updated. Added redact_substring_within_line covering the
  entity-shaped case explicitly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
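
The span-relative offset computation described in this commit can be sketched for a line-based handler as follows. This is a hedged illustration, not the txt handler's code: the struct layouts, the String error, the free-function form of redact_at, and the assumption that lines are separated by a single '\n' byte are all hypothetical.

```rust
/// Hypothetical document-level byte range (start_offset..end_offset).
struct TextLocation {
    start_offset: usize,
    end_offset: usize,
}

/// Payload carries only the output, as in the final shape of this PR.
#[derive(Debug, Clone, PartialEq)]
struct TextRedaction {
    output: String,
}

impl TextRedaction {
    fn new(output: impl Into<String>) -> Self {
        Self { output: output.into() }
    }

    /// Two redactions merge only when their outputs agree.
    fn try_merge(self, other: Self) -> Option<Self> {
        (self.output == other.output).then_some(self)
    }
}

/// Replaces `line[start..end]` with `output`; offsets are line-relative.
fn apply_text_redaction(line: &mut String, start: usize, end: usize, output: &str) {
    line.replace_range(start..end, output);
}

/// Finds the line containing the location, converts the document-level
/// offsets to line-relative ones, and forwards them to the apply helper.
fn redact_at(
    lines: &mut [String],
    location: &TextLocation,
    redaction: &TextRedaction,
) -> Result<(), String> {
    if location.start_offset > location.end_offset {
        return Err("inverted range".to_string());
    }
    let mut line_start = 0usize;
    for line in lines.iter_mut() {
        let line_end = line_start + line.len();
        if location.start_offset >= line_start && location.end_offset <= line_end {
            let start = location.start_offset - line_start;
            let end = location.end_offset - line_start;
            if !line.is_char_boundary(start) || !line.is_char_boundary(end) {
                return Err("range splits a UTF-8 character".to_string());
            }
            apply_text_redaction(line, start, end, &redaction.output);
            return Ok(());
        }
        // Assumes one '\n' byte between consecutive lines.
        line_start = line_end + 1;
    }
    Err("location does not fall within a single line".to_string())
}
```

Note that a replacement whose length differs from the original shifts the offsets of everything after it, so when several locations were computed against the original text, applying them from the end of the document backwards keeps the remaining offsets valid.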
@martsokha martsokha merged commit e7fde34 into main May 19, 2026
5 checks passed
@martsokha martsokha deleted the feat/audio-redaction branch May 19, 2026 14:38
@martsokha martsokha restored the feat/audio-redaction branch May 19, 2026 14:39
@martsokha martsokha deleted the feat/audio-redaction branch May 19, 2026 14:39