refactor(codec)!: location/read/redact handler API #148
Merged
Reworks the codec *Transform layer to take a generic Redactions
collection keyed by span identity instead of a flat slice.
New types in nvisy-codec/transform/:
- `Redactions<S, R>` (redactions.rs): groups payloads by span, with
overlap detection on insert. Consumed via IntoIterator; no raw
map access.
- `Mergeable` (mergeable.rs): trait for the redaction payload `R`.
`overlaps()` for detection, `try_merge()` for merging with
honest failure semantics (returns `None` when outputs differ).
- `ConflictPolicy` (policy.rs): Reject / Merge / Replace.
Merge falls back to `InsertError::NotMergeable` when `try_merge`
returns `None`, rather than picking a magic default.
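The insert semantics described above can be sketched roughly as follows. This is a minimal illustration, not the code in `nvisy-codec`: the `BTreeMap` backing store, the `Rejected` variant name, the constructor signature, and the "first overlapping item only" simplification are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Payload contract: detect overlap, and merge only when that is lossless.
pub trait Mergeable: Sized {
    fn overlaps(&self, other: &Self) -> bool;
    /// Returns `None` when the two payloads cannot be merged honestly.
    fn try_merge(&self, other: &Self) -> Option<Self>;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConflictPolicy { Reject, Merge, Replace }

/// `Rejected` is a hypothetical variant name; only `NotMergeable` is named in the commit.
#[derive(Debug, PartialEq, Eq)]
pub enum InsertError { Rejected, NotMergeable }

/// Payloads grouped by span identity `S`; conflicts are resolved on insert.
#[derive(Debug)]
pub struct Redactions<S, R> {
    groups: BTreeMap<S, Vec<R>>, // assumed backing store
    policy: ConflictPolicy,
}

impl<S: Ord, R: Mergeable> Redactions<S, R> {
    pub fn new(policy: ConflictPolicy) -> Self {
        Self { groups: BTreeMap::new(), policy }
    }

    /// Simplification: only the first overlapping item in the group is considered.
    pub fn try_insert(&mut self, span: S, item: R) -> Result<(), InsertError> {
        let policy = self.policy;
        let items = self.groups.entry(span).or_default();
        let Some(idx) = items.iter().position(|existing| existing.overlaps(&item)) else {
            items.push(item);
            return Ok(());
        };
        match policy {
            ConflictPolicy::Reject => Err(InsertError::Rejected),
            ConflictPolicy::Replace => { items[idx] = item; Ok(()) }
            ConflictPolicy::Merge => {
                // No magic default: a failed merge is reported to the caller.
                let merged = items[idx].try_merge(&item).ok_or(InsertError::NotMergeable)?;
                items[idx] = merged;
                Ok(())
            }
        }
    }
}

/// Consumed by value; callers never touch the map directly.
impl<S, R> IntoIterator for Redactions<S, R> {
    type Item = (S, Vec<R>);
    type IntoIter = std::collections::btree_map::IntoIter<S, Vec<R>>;
    fn into_iter(self) -> Self::IntoIter { self.groups.into_iter() }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TextRedaction { start: usize, end: usize, output: String }

impl TextRedaction {
    pub fn new(start: usize, end: usize, output: impl Into<String>) -> Self {
        Self { start, end, output: output.into() }
    }
}

impl Mergeable for TextRedaction {
    fn overlaps(&self, other: &Self) -> bool {
        self.start < other.end && other.start < self.end
    }
    fn try_merge(&self, other: &Self) -> Option<Self> {
        // Merging is only honest when both redactions produce the same output.
        (self.output == other.output).then(|| Self {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
            output: self.output.clone(),
        })
    }
}
```

Under `ConflictPolicy::Merge`, two overlapping redactions with the same output collapse into their union, while overlapping redactions with different outputs yield `InsertError::NotMergeable`.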
*Redaction structs lose their span_id field:
- TextRedaction { start, end, output }
- ImageRedaction { bounding_box, output }
- AudioRedaction { time_span, output }
- TabularRedaction { start, end, output }
- Each gets a `::new()` constructor.
- Mergeable impls reuse ontology primitives' overlaps()/union().
Transform traits now take `Redactions<Location, Payload>` by value:
- TextTransform::redact_text
- ImageTransform::redact_images
- AudioTransform::redact_audio
- TabularTransform::redact_tabular
Transforms iterate `for (loc, mut items) in redactions` instead of
re-grouping a flat slice. Overlap checking is no longer duplicated
per handler — the collection enforces it on insert.
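As a rough sketch of the `for (loc, mut items) in redactions` loop shape, assuming the collection iterates as `(location, Vec<payload>)` pairs: the example below uses a plain `BTreeMap<usize, Vec<Edit>>` as a stand-in for `Redactions`, and `Edit` and `redact_lines` are hypothetical names. Sorting each group from right to left is one common way to keep earlier byte offsets valid while replacing; the source does not say how the real transforms order their edits.

```rust
use std::collections::BTreeMap;

/// Hypothetical payload: replace bytes `start..end` of a line with `output`.
#[derive(Debug, Clone)]
pub struct Edit {
    pub start: usize,
    pub end: usize,
    pub output: String,
}

/// Applies grouped, non-overlapping edits to `lines`.
/// Returns `None` on an unknown line or an invalid byte range; edits applied
/// before the failure are not rolled back in this sketch.
pub fn redact_lines(lines: &mut [String], redactions: BTreeMap<usize, Vec<Edit>>) -> Option<()> {
    for (loc, mut items) in redactions {
        let line = lines.get_mut(loc)?;
        // Right-to-left, so replacing one range does not shift the ranges before it.
        items.sort_by(|a, b| b.start.cmp(&a.start));
        for item in items {
            // `str::get` checks bounds and UTF-8 char boundaries without panicking.
            line.get(item.start..item.end)?;
            line.replace_range(item.start..item.end, &item.output);
        }
    }
    Some(())
}
```

Because overlap is ruled out at insert time, the loop body needs no conflict handling of its own.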
Engine apply.rs builds redactions via `try_insert`; insertion
failures surface as validation errors with the rejected/unmergeable
reason. Tests use `*Redaction::new()` and `TabularLocationBuilder`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweeps the workspace for inline-qualified `nvisy_core::Result<...>` and
`nvisy_core::Error::...` uses and adds proper `use nvisy_core::{...};`
imports following the existing convention used across other engine
files.
Affected:
- nvisy-engine/src/operation/redaction/apply.rs
- nvisy-engine/src/operation/mod.rs
- nvisy-engine/src/utility/encryption/provider.rs
- nvisy-provider/src/http/mod.rs
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The four redaction payload structs (TextRedaction, ImageRedaction,
AudioRedaction, TabularRedaction) are constructed via ::new() and their
fields are only read inside nvisy-codec (by transforms and by Mergeable
impls). Tightens the surface to pub(crate); external crates already use
::new() exclusively.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces Span<Id, Data> + SpanStream with Located<L> + LocationStream.
Handler capability traits now expose:
- locations() (cheap identity-only streams)
- read(&L) -> Option<*Data> (typed per-modality fetch)
- redact(Redactions<L, R>) -> Result<()> (direct batch application)
ContentHandle gains typed read_text/read_image/read_audio in place of
the modality-erased value_at(&Location) -> Option<String>.
Tabular handlers (CSV, XLSX) move into handler/text since they implement
TextHandler.
The *Transform blanket-impl traits are removed; helpers will live
alongside the per-modality instruction types.
Concrete handlers still implement the old API and do not compile after
this commit; follow-up commits migrate them and the engine callers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
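The locations/read/redact split can be pictured with a toy handler. Everything below is a guess at the shape based on this commit message, not the actual `nvisy-codec` signatures: the boxed iterator stands in for `LocationStream`, a `Vec<(LineLocation, String)>` stands in for `Redactions<L, R>`, and `PlainText`, `LineLocation`, and the `String` error type are hypothetical.

```rust
/// Stand-in provenance tag; the real `ContentSource` is not shown in this PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ContentSource(pub u32);

/// Identity plus provenance, with no content attached.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Located<L> {
    pub source: ContentSource,
    pub location: L,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LineLocation(pub usize);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TextData(pub String);

/// Assumed trait shape: enumerate identities, read on demand, redact in batch.
pub trait TextHandler {
    fn locations(&self) -> Box<dyn Iterator<Item = Located<LineLocation>> + '_>;
    fn read(&self, loc: &LineLocation) -> Option<TextData>;
    fn redact(&mut self, batch: Vec<(LineLocation, String)>) -> Result<(), String>;
}

/// Hypothetical handler whose internal model is a list of lines.
pub struct PlainText {
    source: ContentSource,
    lines: Vec<String>,
}

impl PlainText {
    pub fn new(source: ContentSource, lines: Vec<String>) -> Self {
        Self { source, lines }
    }
}

impl TextHandler for PlainText {
    fn locations(&self) -> Box<dyn Iterator<Item = Located<LineLocation>> + '_> {
        let source = self.source;
        // Only indices are produced here; no line content is cloned.
        Box::new((0..self.lines.len()).map(move |i| Located { source, location: LineLocation(i) }))
    }

    fn read(&self, loc: &LineLocation) -> Option<TextData> {
        self.lines.get(loc.0).cloned().map(TextData)
    }

    fn redact(&mut self, batch: Vec<(LineLocation, String)>) -> Result<(), String> {
        // Validate the whole batch first so a bad location leaves the model untouched.
        if let Some((bad, _)) = batch.iter().find(|(loc, _)| loc.0 >= self.lines.len()) {
            return Err(format!("unknown line {}", bad.0));
        }
        for (loc, output) in batch {
            self.lines[loc.0] = output;
        }
        Ok(())
    }
}
```

The point of the split is that enumerating locations never pays for content; callers fetch data only for the locations they actually inspect.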
Every text, image, audio, and rich handler now implements the new
capability traits directly: locations() yields cheap Located<L>
identities, read(&loc) fetches typed *Data on demand, and
redact(Redactions<L, R>) applies a batch in place.
Byte-level replacement logic lives in pub(crate) helpers under
transform/text/apply.rs and transform/image/apply.rs; handlers walk the
Redactions collection and call the helper on the affected slice of their
internal model (lines, cells, pages, image buffer).
Per-handler tests are rewritten against the new API. The codec crate
compiles and 85/85 codec tests pass. The engine still references the old
API and does not compile; that is the next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Engine operations now consume the new codec surface:
- Document gains collect_text/image/audio_locations +
read_text/image/audio + value_at(&Location); old
text_spans/image_spans/audio_spans and the *_at typed accessors are gone.
- EntityRecognitionOp and PatternRecognitionOp build their
(TextLocation, TextData) work lists by walking locations() and calling
read_text per item.
- VisualExtractionOp builds (ContentSource, ImageData) pairs the same
way for OCR batches and verification.
- ValidationOp concatenates current text by reading each location.
- RedactionApplicator reads each entity's text value via read_text
instead of the old enum-typed value_at(&Location).
- Document::apply_tabular_redactions is dropped: nothing in the engine
drove it, and the (row, col) → byte-offset bridge it relied on
(TabularTransform) is gone with the rest of the *Transform blanket impls.
cargo check + clippy + tests all clean across the workspace.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Engine operations that enumerate a document's locations together
with their content (NER detector, pattern scanner, OCR extractor)
previously walked locations() + read_*() and pushed (Location, Data)
tuples into a Vec. Tuples obscure intent and force destructuring at
every call site.
Span<L, D> { source, location, data } lives in codec — same shape as
the type we deleted at the start of the refactor, but intentionally
*not* used on handler trait signatures. Handlers still expose only
cheap identity via locations() plus on-demand read(); engine callers
that want enumerate-with-content build Span::from_located in their
read loops.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
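A minimal sketch of the enumerate-with-content pattern this commit describes, assuming `Span::from_located` takes a `Located<L>` plus the data that was read for it. The real constructor signature is not shown in this PR text; `collect_spans` and the `u32` source are hypothetical stand-ins.

```rust
/// Identity plus provenance (source simplified to a `u32` for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Located<L> {
    pub source: u32,
    pub location: L,
}

/// A location bundled with the content that was read for it.
#[derive(Debug, Clone, PartialEq)]
pub struct Span<L, D> {
    pub source: u32,
    pub location: L,
    pub data: D,
}

impl<L, D> Span<L, D> {
    /// Assumed signature: pair an already-emitted location with its data.
    pub fn from_located(located: Located<L>, data: D) -> Self {
        Self { source: located.source, location: located.location, data }
    }
}

/// Hypothetical engine-side helper: walk locations, read each one,
/// and keep only the locations that yielded content.
pub fn collect_spans<L, D>(
    locations: impl IntoIterator<Item = Located<L>>,
    read: impl Fn(&L) -> Option<D>,
) -> Vec<Span<L, D>> {
    locations
        .into_iter()
        .filter_map(|located| {
            let data = read(&located.location)?;
            Some(Span::from_located(located, data))
        })
        .collect()
}
```

Compared with `(Location, Data)` tuples, call sites can use named fields (`span.location`, `span.data`) instead of destructuring, while handler traits stay free of content-carrying types.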
Two parallel sweeps:
1. Inline rustdoc reference-style links, [`Foo`](crate::path::Foo),
moved to bottom-of-docblock references: [`Foo`] plus
`[`Foo`]: crate::path::Foo` after a blank /// separator. 83 conversions
across 51 files.
2. Inline `use` statements inside function bodies, impl scopes, and
non-test inner blocks hoisted to top of file. Cfg-gated inline uses
preserved their cfg with a wrapping #[cfg(...)] on the hoisted form.
Macro-body uses and test-module-top uses left as-is.
Tests, clippy, and rustdoc all clean across the workspace.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
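Both conventions from this sweep can be illustrated on a small standalone function. The function and its names are made up for the example and do not come from the workspace; only the doc-link placement and the cfg-wrapped hoisted `use` reflect what the commit describes.

```rust
use std::collections::BTreeSet;
// Hoisted from inside the function body; the cfg travels with the import
// so builds without debug assertions do not see an unused `use`.
#[cfg(debug_assertions)]
use std::time::Instant;

/// Deduplicates `words` into a sorted [`BTreeSet`].
///
/// The link target sits at the bottom of the docblock, after a blank `///`
/// line, instead of inline as `[`BTreeSet`](std::collections::BTreeSet)`.
///
/// [`BTreeSet`]: std::collections::BTreeSet
pub fn dedup(words: &[&str]) -> BTreeSet<String> {
    #[cfg(debug_assertions)]
    let started = Instant::now();

    let out: BTreeSet<String> = words.iter().map(|w| w.to_string()).collect();

    #[cfg(debug_assertions)]
    eprintln!("dedup took {:?}", started.elapsed());

    out
}
```

Keeping the target in one place at the bottom makes the prose line shorter and lets several mentions of the same item share a single reference definition.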
Summary
Ground-up redesign of the codec around locations as the cross-boundary identifier. Handlers no longer yield content tuples; they expose cheap identity (`locations()`), on-demand typed reads (`read(&L)`), and direct batch redaction (`redact(Redactions<L, R>)`).
Earlier commits on this branch landed `Redactions<S, R>` as a grouped, overlap-aware collection. This PR builds on that to remove the rest of the `Span`-based reading/editing pipeline.
Shape
Types
- `Located<L> { source: ContentSource, location: L }`: handlers tag every location they emit with their `ContentSource`, so provenance travels alongside identity without polluting the location's `PartialEq`.
- `LocationStream<'a, L>`: replaces `SpanStream<'a, Id, Data>`. Cheap to enumerate; no payload allocation.
- `Span<L, D> { source, location, data }`: same name and shape as before, but not on handler traits. Codec utility for callers that want enumerate-with-content.
Handler capability traits (`TextHandler` / `ImageHandler` / `AudioHandler`)
The `*Transform` blanket-impl traits and `edit_*` methods are gone. Byte-level replacement logic lives in `pub(crate)` helpers under `transform/text/apply.rs` and `transform/image/apply.rs`; handlers walk the `Redactions` collection and call the helper on the affected slice of their internal model.
`ContentHandle` surface
- `text_locations()` / `image_locations()` / `audio_locations()`: typed streams per modality.
- `read_text(&TextLocation)` / `read_image(&ImageLocation)` / `read_audio(&AudioLocation)`: typed per-modality fetches. Replaces the unsound `value_at(&Location) -> Option<String>` that returned `String` even for image/audio locations.
- `apply_text_redactions` / `apply_image_redactions` / `apply_audio_redactions`: unchanged from the caller's perspective.
Layout changes
- `handler/tabular/` folded into `handler/text/`: `CsvHandler` and `XlsxHandler` implement `TextHandler`; there is no `TabularHandler` capability trait. Tabular redaction is a transform-layer concern, not a handler modality.
- `document/span.rs` deleted, then reintroduced, but only as a codec utility (`Span<L, D>`), never on trait signatures.
- `transform/{text,image,audio,tabular}/transform.rs` (the blanket-impl files) deleted. Replaced by `apply.rs` helpers.
Engine
- `Document` gains `collect_text/image/audio_locations` + `read_text/image/audio` + a narrower `value_at(&Location) -> Option<String>` (text and audio only).
- `EntityRecognitionOp`, `PatternRecognitionOp`, `VisualExtractionOp`, and `ValidationOp` walk `locations()` and build `Vec<Span<L, D>>` work lists via `Span::from_located` + `read_*`.
- `RedactionApplicator` switches from `value_at(&Location)` to `read_text(loc).into_inner()`: typed, no enum dispatch.
- `Document::apply_tabular_redactions` dropped; nothing in the engine drove it after `TabularTransform`'s removal.
Style sweeps (workspace-wide)
Two parallel mechanical sweeps in the last commit:
- Inline [`Foo`](crate::path) forms across 51 files moved to bottom-of-docblock `[`Foo`]: crate::path` reference style.
- Inline `use` statements inside function bodies, impl scopes, and cfg-gated inner blocks hoisted to top-of-file (with cfg-attr wrapping where needed). 14+ production sites; test-module-top and macro-body uses left alone.
Commit walkthrough
Test plan
🤖 Generated with Claude Code