feat(pattern, rig): dictionary & llm-driven identification#26

Merged
martsokha merged 24 commits into main from feature/identify
Feb 26, 2026

Conversation

@martsokha
Member

No description provided.

martsokha and others added 2 commits February 24, 2026 14:13
…gories

Ensure all dependency versions specify major.minor, add
tracing-subscriber to workspace dependencies, sort members and
internal crates alphabetically, and fix dependency category groupings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename ServerConfig to Cli as top-level parser, extract ServerConfig
into config/server.rs for network binding. Split server/ into listen.rs
and shutdown.rs, add shutdown timeout with structured tracing, move
init_tracing to Cli, and use anyhow::Result for error propagation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@martsokha martsokha self-assigned this Feb 24, 2026
@martsokha added the docs (improvements, updates or additions to docs) and feat (request for or implementation of a new feature) labels Feb 24, 2026
martsokha and others added 21 commits February 24, 2026 20:50
…crates

Reorganize nvisy-identify from modality-based layout (text/, image/) to
detection-method-based layout (pattern/, ner/, llm/, vision/, audio/,
fusion/) so the module structure mirrors identification strategies.

- Create nvisy-ocr crate: OcrBackend trait, config, parsing, PythonBridge
- Create nvisy-asr crate: TranscribeBackend trait, config, parsing, PythonBridge
- Add LlmBackend trait and parse_llm_entities to nvisy-rig
- Update nvisy-augment to import from nvisy-ocr/nvisy-asr
- Add LLM contextual detection layer (llm/detection.rs, llm/prompt.rs)
- Add OCR detection layer (vision/ocr.rs)
- Add audio transcript+NER composite layer (audio/transcript.rs)
- Add ensemble fusion with MaxConfidence/WeightedAverage/NoisyOr strategies
- Remove stale nvisy-object workspace references
- Sort workspace members, deps, Dockerfile crate lists, and changelog

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
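
For readers unfamiliar with the three fusion strategies named in this commit, here is a minimal sketch of how such an ensemble step might combine confidence scores from several detectors that flagged the same span. The `FusionStrategy` enum, the `fuse` method, and the exact arithmetic are assumptions for illustration; only the strategy names come from the commit message.

```rust
/// Hypothetical fusion strategies; variant names mirror the commit message,
/// everything else is an illustrative assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum FusionStrategy {
    /// Keep the highest score reported by any detector.
    MaxConfidence,
    /// Weighted mean; one weight per score, in the same order.
    WeightedAverage(Vec<f64>),
    /// Treat detectors as independent evidence: 1 - prod(1 - p_i).
    NoisyOr,
}

impl FusionStrategy {
    /// Combine per-detector scores (each expected in 0.0..=1.0).
    /// Returns `None` for empty input or mismatched/zero weights.
    pub fn fuse(&self, scores: &[f64]) -> Option<f64> {
        if scores.is_empty() {
            return None;
        }
        let fused = match self {
            Self::MaxConfidence => scores.iter().copied().fold(0.0_f64, f64::max),
            Self::WeightedAverage(weights) => {
                if weights.len() != scores.len() {
                    return None;
                }
                let total: f64 = weights.iter().sum();
                if total <= 0.0 {
                    return None;
                }
                let weighted: f64 = scores.iter().zip(weights).map(|(s, w)| *s * *w).sum();
                weighted / total
            }
            Self::NoisyOr => 1.0 - scores.iter().map(|s| 1.0 - *s).product::<f64>(),
        };
        // Guard against rounding drift outside the valid range.
        Some(fused.clamp(0.0, 1.0))
    }
}
```

Under these assumptions, two detectors that each report 0.5 fuse to 0.5 with `MaxConfidence` but to 0.75 with `NoisyOr`, which rewards agreement between independent sources.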
…onfidence

- Narrow nvisy-pattern root exports to only externally-used types
  (PatternEngine, PatternEngineBuilder, PatternMatch, DetectionSource,
  ContextRule); move AllowList/DenyList/PatternEngineError/default_engine
  behind `pub mod engine` for opt-in access
- Add `column_confidence` to DictionaryPattern so CSV dictionary columns
  can have different confidence scores (e.g. full name vs short code)
- Track source column index in CsvDictionary via new Dictionary::columns()
- Apply column-specific confidence in PatternEngine::scan_dict
- Update currencies/cryptocurrencies/languages patterns with per-column
  confidence (full names 0.85, codes 0.55/0.45)
- Remove API Status link from root README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move confidence from a top-level JSON field into the match source
objects so each source type owns its own scoring:

- RegexPattern gains a `confidence: f64` field (default 1.0)
- DictionaryPattern.confidence accepts a number (uniform) or array
  (per-column) via DictionaryConfidence enum
- Remove Pattern::confidence() from the trait — confidence is now
  read directly from the match source during engine compilation
- Remove top-level `confidence` from all 27 pattern JSON definitions
- Rename `column_confidence` to `confidence` in dictionary patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
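
The "number or array" confidence described above could be modeled roughly as follows. This is a sketch only: the variant names, the `for_column` helper, and the choice to return `None` for an out-of-range column are assumptions, and JSON deserialization is left out to keep the example dependency-free.

```rust
/// Hypothetical shape of a dictionary confidence setting: one score for
/// every column, or one score per CSV column.
#[derive(Debug, Clone, PartialEq)]
pub enum DictionaryConfidence {
    /// A single number applied to matches from any column.
    Uniform(f64),
    /// An array with one entry per column, indexed by source column.
    PerColumn(Vec<f64>),
}

impl DictionaryConfidence {
    /// Confidence for a match that came from CSV column `column`.
    /// Returns `None` when a per-column array has no entry for that column
    /// (an assumption; a real engine might reject this at compile time).
    pub fn for_column(&self, column: usize) -> Option<f64> {
        match self {
            Self::Uniform(score) => Some(*score),
            Self::PerColumn(scores) => scores.get(column).copied(),
        }
    }
}
```

With `PerColumn(vec![0.85, 0.55])`, a match on a full name in column 0 would score 0.85 and a match on a short code in column 1 would score 0.55, matching the values quoted for the currency patterns.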
Absorb small utility modules (error, retry, metrics, compact) into
backend/ and rename structured/ to agent/, reducing module sprawl
while keeping all public re-exports intact.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r wrapper

Introduce layered agent architecture:
- BaseAgent<M> with builder handling rig-core's typestate for tools
- NerAgent<M> replacing StructuredAgent with NER-specific prompts
- OcrProvider/CvProvider traits in their respective agent modules
- ResponseParser as Cow<str> wrapper with extract_text constructor
- Stub modules for ocr, cv, and redactor agents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…edactionMethod

Implement the three remaining stub agents in nvisy-rig:

- OcrAgent: VLM agent with OcrProvider-backed tool, extracts text from
  images and detects entities via OcrPromptBuilder
- CvAgent: VLM agent with CvProvider-backed tool, detects faces/plates/
  signatures via CvPromptBuilder
- RedactorAgent: pure LLM agent that recommends TextRedactionMethod for
  each detected entity via RedactorPromptBuilder

Ontology changes (nvisy-ontology):
- Rename spec/ to specification/
- Split mod.rs into input.rs (*RedactionInput enums + RedactorInput) and
  method.rs (TextRedactionMethod, ImageRedactionMethod,
  AudioRedactionMethod, RedactionMethod)

Rig structural changes (nvisy-rig):
- Rename agent dirs: ner→recognize, ocr→extract, cv→detect
- Flatten agent/mod.rs re-exports (no pub submodules)
- Add PromptBuilder structs for all agents (OcrPromptBuilder,
  CvPromptBuilder, RedactorPromptBuilder)
- Add base64 and thiserror dependencies
- Improve docs and tracing across all agents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- BaseAgent.prompt_text() now uses agent.completion() instead of
  building raw requests from the model, so preamble/tools/config
  are preserved
- Remove model: Arc<M> from BaseAgent (agent owns it)
- Remove system: Option<&str> param from prompt methods (preamble is
  on the agent)
- Replace BaseAgentConfig field with context_window: Option<ContextWindow>
  since temperature/max_tokens are baked into the rig Agent at build time
- Split base.rs into base/{agent,builder,context}.rs
- Rename redactor/ → redact/ to match action-verb convention
- OcrProvider returns Vec<OcrTextRegion> with bbox support
- Add fn new() constructors to OcrRigTool and CvRigTool
- Add from_prompt error mapper for rig::PromptError
- Export OcrTextRegion from lib.rs and prelude

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add BaseAgent.id (UUIDv7) for observability; expose id() on all
  specialized agents and include agent_id in tracing spans
- Make RetryPolicy generic over any Req: Clone + Res instead of
  hardcoding DetectionRequest/DetectionResponse
- Use : instead of — as doc separator
- Use 0.0..=1.0 range notation in confidence docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…efactor

Fix UTF-8 panics in split_to_fit/truncate_to_fit by snapping byte
positions to char boundaries. Rewrite prompt_structured to use
completion()+output_schema so usage is always recorded. Refactor
RigBackend into generic ServiceBackend<S> wrapping any inner Tower
service with usage tracking and tracing. Export BaseAgentConfig and
ContextWindow for external consumers. Add Clone+PartialEq to all
public output types. Restrict from_completion/from_prompt to
pub(crate). Deduplicate ALL_TYPES_HINT. Remove dead parse_json_array.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
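
The UTF-8 fix mentioned above relies on a standard technique: slicing a `&str` at a byte index that falls inside a multi-byte character panics, so the index is first moved back to the nearest character boundary. The sketch below illustrates that idea with hypothetical helper names and a byte budget; the real `split_to_fit`/`truncate_to_fit` signatures are not shown in this PR and may differ.

```rust
/// Largest byte index <= `index` that lies on a char boundary of `text`.
fn snap_to_char_boundary(text: &str, index: usize) -> usize {
    if index >= text.len() {
        return text.len();
    }
    let mut i = index;
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Hypothetical truncation helper: keep at most `max_bytes` bytes of `text`
/// without ever cutting a character in half.
pub fn truncate_at_boundary(text: &str, max_bytes: usize) -> &str {
    &text[..snap_to_char_boundary(text, max_bytes)]
}
```

For example, in `"héllo"` the `é` occupies bytes 1 and 2, so a budget of 2 bytes snaps back to index 1 and yields `"h"` instead of panicking.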
… remove RedactorAgent

- Add LLM-based compact() on ContextWindow and prompt_compact() on
  BaseAgent for summarizing text that exceeds the token budget
- Delete nvisy-ocr crate; move OcrBackend, OcrConfig, parse_ocr_entities,
  and PythonBridge impl into nvisy-rig/src/paddle module
- Update nvisy-identify and nvisy-augment to import from nvisy_rig::paddle
- Remove RedactorAgent, keeping NerAgent, OcrAgent, and CvAgent
- Clean up workspace Cargo.toml, Dockerfile, and all re-exports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…paddle crate

Move OCR backend code out of nvisy-rig/src/paddle/ into a new
nvisy-paddle crate so nvisy-rig no longer depends on nvisy-python.
Consumers (nvisy-identify, nvisy-augment) now import from nvisy_paddle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… src/error.rs

Add a proper Error enum that implements From<CompletionError>,
From<PromptError>, and Into<nvisy_core::Error>. Delete the old
backend/error.rs helper functions and update all call sites.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…h plain connection params

Replace all CompletionModel generics with a Provider enum holding
connection parameters (api_key, base_url). Client construction is
deferred to build time via ProviderClient. Agent and backend constructors
now return Result to propagate client errors instead of panicking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Tower service layer with reqwest-middleware + reqwest-retry for
transparent HTTP-level retries. Delete ServiceBackend, RigBackend,
RetryPolicy, and dispatch_model! macro. Replace tower::Service bound in
nvisy-identify with LlmBackend async trait. Rename agent submodules:
detect→cv, extract→ocr, recognize→ner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…aw* types, move compact to BaseAgent

Extract max_retries from provider structs into standalone RetryConfig.
Replace HttpClient type alias with ClientWithMiddleware directly. Rename
entity types: RawEntity→NerEntity, RawCvEntity→CvEntity,
RawOcrEntity→OcrEntity. Move compact logic from ContextWindow to
BaseAgent::prompt_compact where it belongs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… RetryConfig into BaseAgentConfig

Fold client construction directly into Agents::build(), eliminating the
ProviderClient intermediary. Move model_name from a separate parameter
into Provider variants so each provider carries its full identity.
Merge max_retries into BaseAgentConfig, removing the standalone
RetryConfig struct.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ivial tests

Move agent/base/* files (BaseAgent, BaseAgentBuilder, BaseAgentConfig,
ContextWindow, Provider) into backend/ so the agent infrastructure lives
alongside usage tracking and detection types. Make the agent module
private (was pub(crate)) and re-export public types through backend/.

Improve module and type documentation across the crate. Remove 9 trivial
tests that only verified arithmetic or getters (23 → 14 tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… delete EntityParser/vision/ontology

- Add reqwest-tracing middleware and 120s timeout to HTTP client
- Move base agent from backend/agent/ to agent/base/ module
- Delete EntityParser from nvisy-rig, inline logic in nvisy-identify
- Delete vision/ and ontology/ modules from nvisy-identify
- Make all internal modules private, re-export from parent mod.rs
- Remove nvisy-paddle dependency from nvisy-identify

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ema for tool args

- Fix nvisy_ontology::spec → nvisy_ontology::specification in engine test
- Replace hand-written json!() tool schemas with schemars::schema_for!()
- Add Debug, Clone, JsonSchema derives to CvToolArgs and OcrToolArgs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…y-server

- Add missing features = [] to reqwest-middleware, reqwest-retry,
  reqwest-tracing in workspace Cargo.toml
- Remove pub use re-exports (routes, ServiceState) from nvisy-server
- Update nvisy-cli to use full module paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ed offset resolution, and KnownNerEntity accumulation

Move preamble into BaseAgentConfig so specialized agents set it via
config. Redesign NerEntity with entity_id for coreference, optional
category/entity_type/confidence, context snippet for deterministic
offset resolution, and LLM-produced description. Add KnownNerEntity
for lightweight cross-chunk context, NerContext with merge/set_text
for accumulating surface forms and descriptions across calls, and
ResolvedOffsets with type-safe resolve_offsets tied to the source
NerContext.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
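
One way to read "context snippet for deterministic offset resolution" is that the LLM returns a short surrounding snippet alongside each entity, and the byte offsets are then computed locally by string search rather than trusted from the model. The function below is a sketch of that idea under stated assumptions: the free-function form, the parameter names, and the tuple return type are hypothetical and do not reflect the actual `resolve_offsets` or `ResolvedOffsets` API.

```rust
/// Locate `surface` in `text` by first finding the `context` snippet that
/// surrounds it, then the surface form inside that snippet.
/// Returns byte offsets `(start, end)` into `text`, or `None` if either
/// search fails. (Hypothetical signature, for illustration only.)
pub fn resolve_offsets_sketch(text: &str, context: &str, surface: &str) -> Option<(usize, usize)> {
    // Anchor on the snippet so repeated surface forms are disambiguated.
    let context_start = text.find(context)?;
    // Position of the entity relative to the start of the snippet.
    let relative = context.find(surface)?;
    let start = context_start + relative;
    Some((start, start + surface.len()))
}
```

In `"Alice met Bob. Later Bob called Alice."`, a plain search for `"Bob"` finds the first mention at byte 10, whereas searching via the snippet `"Later Bob called"` resolves the second mention at bytes 21..24.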
@martsokha changed the title from "feat(identify): complete identification pipeline" to "feat(pattern, rig): complete pattern & llm-driven identification" Feb 26, 2026
@martsokha changed the title from "feat(pattern, rig): complete pattern & llm-driven identification" to "feat(pattern, rig): dictionary & llm-driven identification" Feb 26, 2026
… adapters

Delete the old detection modules that duplicated logic now provided by
nvisy-rig and nvisy-pattern. Replace them with thin adapter structs in a
new method/ module: NerMethod (wraps NerAgent), CvMethod (wraps CvAgent),
and PatternDetection (migrated as-is). Remove nvisy-python and bytes deps
that were only needed by the deleted code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@martsokha martsokha merged commit e9cb484 into main Feb 26, 2026
5 checks passed
@martsokha martsokha deleted the feature/identify branch February 26, 2026 21:23