Adopt elide reconcile API + schema feature; collapse mirrors; docs refresh #289

Merged
martsokha merged 3 commits into main from refactor/reconcile on Jun 28, 2026

@martsokha

Summary

Pulls upstream elide twice in this branch (0ca39cf3 → 416f3165), absorbs the two breaking changes that came with it, deletes the duplication those changes made obsolete, and rewrites the docs around the new toolkit/runtime split.

Reconcile API adoption

Upstream collapsed the dedup pipeline calibrate → fuse → resolve → filter into calibrate → reconcile → filter. One ReconcileLayer<G, R> covers both passes via two configurations:

  • same_label(Merging<scoring>) — fuse same-label overlaps (scoring: Max or NoisyOr; Mean is gone).
  • cross_label(Structural<tiebreaker>) — break cross-label ties (HighestConfidence or LongestSpan).
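
To make the two remaining scoring modes concrete, here is a minimal sketch of how same-label confidences could be fused. The `Scoring` enum and `fuse` function are hypothetical stand-ins written for this illustration; they are not elide's types or signatures. The noisy-OR formula used is the standard one, 1 − ∏(1 − pᵢ).

```rust
/// Hypothetical stand-in for the scoring choice; not elide's actual type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Scoring {
    Max,
    NoisyOr,
}

/// Fuse the confidences of a group of overlapping same-label detections
/// into a single score. Returns `None` for an empty group.
fn fuse(scoring: Scoring, confidences: &[f64]) -> Option<f64> {
    if confidences.is_empty() {
        return None;
    }
    Some(match scoring {
        // Keep the strongest single detection.
        Scoring::Max => confidences
            .iter()
            .copied()
            .fold(f64::NEG_INFINITY, f64::max),
        // Treat detections as independent evidence: 1 - prod(1 - p_i).
        Scoring::NoisyOr => {
            1.0 - confidences.iter().map(|p| 1.0 - p).product::<f64>()
        }
    })
}

fn main() {
    let group = [0.6, 0.5];
    println!("max      = {:?}", fuse(Scoring::Max, &group));
    println!("noisy-or = {:?}", fuse(Scoring::NoisyOr, &group));
}
```

Under noisy-OR, two agreeing detections yield a score above either input, whereas `Max` never exceeds the strongest one.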

Wire mirror in nvisy-core::plan::deduplication:

  • FusionStrategyParams → MergingStrategyParams (MaxConfidence → Max; Mean deleted; NoisyOr kept).
  • ResolutionStrategyParams → TiebreakerParams (variants unchanged).
  • DeduplicationParams.fusion → merging; .resolution → tiebreaker.
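
For anyone carrying an older config forward, the renames above can be sketched as a small lookup. Everything here is illustrative: `migrate_fusion` and `migrate_field` are hypothetical helpers, and the string spellings assume the wire names match the Rust identifiers, which the PR does not state.

```rust
/// Hypothetical mirror of the renamed merging strategy; variant names
/// follow the PR text, the type itself is only a sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MergingStrategyParams {
    Max,
    NoisyOr,
}

/// Map a pre-rename fusion variant name onto the new merging variant.
/// `Mean` was deleted, so it has no successor and yields an error.
fn migrate_fusion(old: &str) -> Result<MergingStrategyParams, String> {
    match old {
        "MaxConfidence" => Ok(MergingStrategyParams::Max),
        "NoisyOr" => Ok(MergingStrategyParams::NoisyOr),
        "Mean" => Err("`Mean` was removed; choose `Max` or `NoisyOr`".to_string()),
        other => Err(format!("unknown fusion strategy `{other}`")),
    }
}

/// Map a pre-rename `DeduplicationParams` field name to its new name.
/// Field names that were not renamed pass through unchanged.
fn migrate_field(old: &str) -> &str {
    match old {
        "fusion" => "merging",
        "resolution" => "tiebreaker",
        other => other,
    }
}

fn main() {
    for name in ["MaxConfidence", "NoisyOr", "Mean"] {
        println!("{name} -> {:?}", migrate_fusion(name));
    }
    for field in ["fusion", "resolution"] {
        println!("{field} -> {}", migrate_field(field));
    }
}
```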

Adopt elide's schema feature; delete nvisy-core::schema

Upstream now derives JsonSchema on every domain type behind a schema feature. Eight mirror types in nvisy-core::schema are now redundant duplicates of the elide canonical types — same field names, same wire format. The whole schema module is deleted (~250 LOC) and call sites import the elide types directly:

| Mirror (deleted) | Replacement |
| --- | --- |
| ColorSchema | elide_core::primitive::Color |
| BoundingBoxSchema | elide_core::primitive::BoundingBox |
| PointSchema | elide_core::primitive::Point |
| PolygonSchema | elide_core::primitive::Polygon |
| LabelSchema | elide_core::entity::Label |
| LanguageTagSchema | elide_core::primitive::LanguageTag |
| TimeSpanSchema | elide_core::primitive::TimeSpan |
| WaveformSchema | elide_core::modality::audio::Waveform |
| OperatorIdSchema | deleted unconditionally (no callers) |

Engine state derives JsonSchema

Now that elide's schema feature is on, Entity<M> is schema-able. The engine's persistence types follow: EntityRecord<M> (generic bound on M::Location + M::Data), Run, RunState, RunDocument, RunDocState, DocBody, StartBatch, DocumentInput. Run's Timestamp fields get #[schemars(with = "String")].

Server DTO collapse

Two response-side DTOs whose engine equivalents now derive JsonSchema natively are gone: ResourceRefDto and ModalityDto. The request-side ResourceRef mirror module is also deleted. Handlers import nvisy_engine::runs::{ResourceRef, ModalityKind} directly.

The remaining response-side DTOs stay because they do real shape-flattening (provenance drop, HipStr→String, BoundingBox→flat {x,y,width,height}), not just JsonSchema-bridging.
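
As an illustration of what "shape-flattening" means here, the sketch below converts a nested engine-side value into a flat response DTO, dropping provenance along the way. All types (`Point`, `BoundingBox`, `Detection`, `DetectionDto`) are hypothetical and defined locally; in particular, the min/max corner layout of the nested box is an assumption, not the actual `elide_core` definition.

```rust
/// Hypothetical nested types standing in for the engine-side shapes.
struct Point {
    x: f64,
    y: f64,
}

struct BoundingBox {
    min: Point,
    max: Point,
}

struct Detection {
    label: String,
    bbox: BoundingBox,
    /// Internal bookkeeping that the response does not expose.
    provenance: Vec<String>,
}

/// Hypothetical flat response shape: {label, x, y, width, height}.
#[derive(Debug, PartialEq)]
struct DetectionDto {
    label: String,
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

impl From<&Detection> for DetectionDto {
    fn from(d: &Detection) -> Self {
        // Provenance is intentionally not copied into the DTO.
        DetectionDto {
            label: d.label.clone(),
            x: d.bbox.min.x,
            y: d.bbox.min.y,
            width: d.bbox.max.x - d.bbox.min.x,
            height: d.bbox.max.y - d.bbox.min.y,
        }
    }
}

fn main() {
    let d = Detection {
        label: "face".to_string(),
        bbox: BoundingBox {
            min: Point { x: 1.0, y: 2.0 },
            max: Point { x: 4.0, y: 6.0 },
        },
        provenance: vec!["ocr".to_string()],
    };
    println!("{} provenance entries dropped", d.provenance.len());
    println!("{:?}", DetectionDto::from(&d));
}
```

A conversion like this changes the wire shape, so it cannot be replaced by deriving JsonSchema on the source type, which is why such DTOs remain.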

Docs refresh

The four toolkit-side white papers (README, DETECTION, INGESTION, REDACTION) now live in elide/docs and are deleted here. DEVELOPER.md is also deleted. The four remaining docs are rewritten runtime-first:

  • README.md — runtime overview in the same white-paper style as elide's README; positions the runtime as the engine wrapping the toolkit (tenancy, durable governance, two-phase review).
  • PIPELINE.md — runtime-specific lifecycle: the run as the unit of work, two-phase analyze→review→apply, files as the input/output interface, artifact + override identity, policy/context resolution, failure + cancellation.
  • COMPLIANCE.md — composite audit (toolkit provenance + runtime attribution), the four decision provenances the two-phase pipeline surfaces, retention boundary, multi-tenant isolation, what the runtime does and does not underwrite.
  • INFRASTRUCTURE.md — deployment shape (one process, one data directory, no service deps), the six persisted keyspaces, operational primitives, scaling by sharding actors across instances, what the deployment must provide.

Commits

  1. a9156783 — refactor(plan,engine): adopt elide reconcile API; rename fusion/resolution → merging/tiebreaker
  2. f6c97554 — refactor(core,engine,server): adopt elide schema feature; delete schema mirrors; collapse DTOs
  3. 579ea571 — docs: refresh for the elide-runtime split; runtime-side concerns only

Test plan

  • cargo build --workspace
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test --workspace
  • RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps
  • cargo deny check all

🤖 Generated with Claude Code

martsokha and others added 3 commits June 28, 2026 14:20
…ution → merging/tiebreaker

Upstream elide (`0ca39cf3` → `56f883fd`) collapsed the dedup
pipeline from `calibrate → fuse → resolve → filter` into
`calibrate → reconcile → filter`. One `ReconcileLayer<G, R>` covers
both passes via two configurations:

- `same_label(Merging<scoring>)` — fuse same-label overlaps
  (scoring: `Max` or `NoisyOr`; `Mean` is gone).
- `cross_label(Structural<tiebreaker>)` — break cross-label ties
  (`HighestConfidence` or `LongestSpan`). `Structural::standard()`
  ships sensible IoU + nesting-margin defaults; the engine chains
  `.with_tiebreaker(...)` to override the strategy.
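
The cross-label pass can be pictured with a small sketch: measure how much two differently labeled spans overlap, then let the tiebreaker choose a winner. `Span`, `iou`, `Tiebreaker`, and `pick` are hypothetical names defined only for this example; how elide actually combines IoU thresholds and nesting margins is not described in this PR, so no thresholds are modeled here.

```rust
/// Hypothetical detection over a half-open character range `[start, end)`.
/// Assumes `start <= end`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    label: &'static str,
    start: usize,
    end: usize,
    confidence: f64,
}

impl Span {
    fn len(&self) -> usize {
        self.end - self.start
    }
}

/// Intersection-over-union of two spans; 0.0 when they are disjoint
/// or both empty.
fn iou(a: &Span, b: &Span) -> f64 {
    let inter = a.end.min(b.end).saturating_sub(a.start.max(b.start));
    let union = a.len() + b.len() - inter;
    if union == 0 {
        0.0
    } else {
        inter as f64 / union as f64
    }
}

/// Variant names follow the PR text; the enum itself is a sketch.
enum Tiebreaker {
    HighestConfidence,
    LongestSpan,
}

/// Choose the surviving span of a conflicting pair. On an exact tie the
/// first argument wins (an arbitrary choice made for this sketch).
fn pick<'a>(tiebreaker: Tiebreaker, a: &'a Span, b: &'a Span) -> &'a Span {
    let b_wins = match tiebreaker {
        Tiebreaker::HighestConfidence => b.confidence > a.confidence,
        Tiebreaker::LongestSpan => b.len() > a.len(),
    };
    if b_wins { b } else { a }
}

fn main() {
    let a = Span { label: "name", start: 0, end: 10, confidence: 0.7 };
    let b = Span { label: "address", start: 5, end: 12, confidence: 0.9 };
    println!("iou = {:.3}", iou(&a, &b));
    println!("by confidence: {}", pick(Tiebreaker::HighestConfidence, &a, &b).label);
    println!("by length:     {}", pick(Tiebreaker::LongestSpan, &a, &b).label);
}
```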

Wire-shape mirror (nvisy-core::plan::deduplication):

- `FusionStrategyParams` → `MergingStrategyParams`; variants
  `MaxConfidence` → `Max`; `Mean` deleted; `NoisyOr` kept.
- `ResolutionStrategyParams` → `TiebreakerParams`; variants
  unchanged.
- `DeduplicationParams.fusion` → `merging`; `.resolution` →
  `tiebreaker`.

Engine: `attach_dedup` rewritten to drive the new layer API.

`Nvisy.example.toml` updated to the renamed fields.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ma mirrors; collapse DTOs

Upstream elide (`0ca39cf3` → `416f3165`) ships a `schema` feature that
derives `JsonSchema` across the toolkit's domain types. Pull it through
to runtime crates and use elide's canonical types directly where the
nvisy-core mirrors were structurally identical.

`nvisy-core::schema` deleted in full (8 files, ~250 LOC). Replacements
at call sites:

- `ColorSchema` → `elide_core::primitive::Color`
- `BoundingBoxSchema` → `elide_core::primitive::BoundingBox`
- `PointSchema` → `elide_core::primitive::Point`
- `PolygonSchema` → `elide_core::primitive::Polygon`
- `LabelSchema` → `elide_core::entity::Label`
- `LanguageTagSchema` → `elide_core::primitive::LanguageTag`
- `TimeSpanSchema` → `elide_core::primitive::TimeSpan`
- `WaveformSchema` → `elide_core::modality::audio::Waveform`
- `OperatorIdSchema` deleted unconditionally (no callers).

The shapes match the wire format the mirrors had previously been
defined to emit; conversions at the engine seam (`compile_scope`,
`build_catalog`, audio/image anonymizer build) drop the now-useless
`.into()` calls. `compile_scope` no longer parses language tags from
String — `ScopeParams.languages` is `Vec<LanguageTag>` natively.

Engine persistence types gain `JsonSchema`:
`EntityRecord<M>` (generic bound), `Run`, `RunState`, `RunDocument`,
`RunDocState`, `DocBody`, `StartBatch`, `DocumentInput`. `Run`'s
`Timestamp` fields use `#[schemars(with = "String")]`.
`ResourceRef` + `ModalityKind` had already gained `JsonSchema` in the
preceding work.

Server collapses two DTOs whose elide-typed engine equivalents now
carry JsonSchema natively: `ResourceRefDto` and `ModalityDto` are
gone, replaced by direct `nvisy_engine::runs::ResourceRef` and
`ModalityKind` uses; the request-side `ResourceRef` mirror module is
deleted. The remaining response-side DTOs stay because they do real
shape-flattening (provenance drop, HipStr→String, BoundingBox→flat
xywh), not just JsonSchema-bridging.

Reconcile API adoption from earlier upstream bump is already in commit
a915678 on this branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The toolkit's white-paper docs (Ingestion / Detection / Redaction) now
live in the elide repo; this repo's README + the corresponding docs
were deleted in the previous commit. This commit rewrites what
remains.

- README.md (new) — runtime overview in the same white-paper style as
  elide's README. Positions the runtime as the engine wrapping the
  toolkit: tenancy, durable governance, two-phase review. Reader's
  guide points at the three remaining docs; glossary covers
  runtime-specific terms only (actor, engine, run, run document,
  detection artifact, override).
- PIPELINE.md (rewrite) — runtime-specific lifecycle: the run as
  the unit of work, the two-phase analyze→review→apply split, files
  as the input/output interface, the artifact + override identity
  model, policy/context resolution, per-document failure and
  cancellation semantics.
- COMPLIANCE.md (rewrite) — composite audit (toolkit per-entity
  provenance + runtime per-redaction attribution), the four
  decision provenances the two-phase pipeline surfaces, the
  retention boundary, multi-tenant isolation, an explicit list of
  what the runtime does and does not underwrite.
- INFRASTRUCTURE.md (rewrite) — deployment shape (one process, one
  data directory, no service deps), the six persisted keyspaces and
  their actor-prefixed key shape, operational primitives, scaling
  by sharding actors across instances, and the responsibilities the
  runtime delegates to the surrounding deployment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
martsokha added labels on Jun 28, 2026:
  • refactor: code restructuring without behavior change
  • docs: improvements, updates or additions to docs
  • engine: redaction engine, pipeline runtime, orchestration, configuration
  • core: content model, errors, shared types
  • server: server, API handlers, middleware
  • recognition: pattern, NER, LLM, and OCR backends (elide::recognition::*)
  • redaction: anonymizer, deanonymizer, redaction operators, replacements
martsokha self-assigned this on Jun 28, 2026
martsokha merged commit 7ec7ac0 into main on Jun 28, 2026 (6 checks passed)
martsokha deleted the refactor/reconcile branch on June 28, 2026 14:42