Future: reintroduce Azure Document Intelligence OCR backend #203

Description

@martsokha

Status

Deferred — removed in #195 to clear the deck for the externalised inference architecture (#194). Reintroduce as opt-in when there's user demand.

Context

AzureDocaiBackend was a direct cloud OCR backend in nvisy-ocr/src/provider/azure_docai/, gated behind the azure-docai cargo feature (forwarded as microsoft through nvisy-engine/nvisy-server/nvisy-cli). It used Azure's async analyzeDocument POST + operationId polling flow.
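For orientation, the submit-then-poll flow can be modelled roughly as below. This is a minimal sketch, not the removed code: `AnalyzeStatus` and `operation_id` are hypothetical names. The status strings (`notStarted`, `running`, `succeeded`, `failed`) and the assumption that the operation id is the last path segment of the `Operation-Location` URL returned by the POST reflect my understanding of Azure's API, so verify both against the current API version before relying on them.

```rust
/// Hypothetical model of the operation states seen while polling.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AnalyzeStatus {
    NotStarted,
    Running,
    Succeeded,
    Failed,
}

impl AnalyzeStatus {
    /// Parse the `status` field of a poll response (assumed string values).
    pub fn parse(raw: &str) -> Option<Self> {
        match raw {
            "notStarted" => Some(Self::NotStarted),
            "running" => Some(Self::Running),
            "succeeded" => Some(Self::Succeeded),
            "failed" => Some(Self::Failed),
            _ => None,
        }
    }

    /// True once the operation can no longer change state, so polling stops.
    pub fn is_terminal(self) -> bool {
        matches!(self, Self::Succeeded | Self::Failed)
    }
}

/// Take the last path segment of the `Operation-Location` URL as the
/// operation id (assumed URL shape; the query string is ignored).
pub fn operation_id(operation_location: &str) -> Option<&str> {
    let path = operation_location.split('?').next()?;
    let id = path.trim_end_matches('/').rsplit('/').next()?;
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}
```

Keeping the status parsing and id extraction as pure functions means they can be unit-tested without any network access.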

The new architecture (#194) centres on externalised inference services (see nvisycom/inference). Cloud OCR backends carry provider-specific auth, polling, and parsing work that doesn't pay off until someone actually uses them. This issue tracks reintroducing the Azure backend when it is needed.

What this issue becomes

Add AzureDocaiBackend back to nvisy-ocr behind an azure-docai cargo feature.
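A rough sketch of what "behind a cargo feature" means at the type level, assuming the variant and its dispatch arm are both gated with `#[cfg(feature = "azure-docai")]`. The enum here is a stand-in, not the real `OcrBackend`: the `Bento` variant and `backend_name` are hypothetical, and the Azure variant's fields are left empty because the issue does not specify them.

```rust
/// Hypothetical stand-in for the real `OcrBackend` configuration enum.
#[derive(Debug, Clone, PartialEq)]
pub enum OcrBackend {
    /// Placeholder for a backend that is always compiled in.
    Bento,
    /// Only compiled when the `azure-docai` feature is enabled.
    /// The real fields are not given in the issue, so none are shown.
    #[cfg(feature = "azure-docai")]
    AzureDocai {},
}

/// Name of the backend a config would dispatch to. The gated arm
/// disappears together with the variant when the feature is off,
/// so the match stays exhaustive in both builds.
pub fn backend_name(backend: &OcrBackend) -> &'static str {
    match backend {
        OcrBackend::Bento => "bento",
        #[cfg(feature = "azure-docai")]
        OcrBackend::AzureDocai { .. } => "azure-docai",
    }
}
```

With the feature disabled, default builds carry no Azure-specific code paths, which is the point of making the backend opt-in.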

Triggers to revisit

  • A customer requests DocAI for a self-hosted runtime
  • We want DocAI as a fallback when the externalised Bento OCR is unavailable
  • An Azure-resident deployment where calling DocAI directly is simpler than running a sidecar

Scope when reintroduced

  • Restore the backend as nvisy-ocr/src/backend/azure_docai_backend.rs, following the new backend/-based layout (it previously lived under nvisy-ocr/src/provider/azure_docai/)
  • Restore the azure-docai feature on nvisy-ocr (with tokio/time for the poll loop)
  • Restore the OcrBackend::AzureDocai { … } variant + OcrExtractor::from_config dispatch
  • Restore feature forwarding: nvisy-ocr/azure-docai → nvisy-engine/microsoft → nvisy-server/microsoft → nvisy-cli/microsoft
  • Auth: Ocp-Apim-Subscription-Key header from env or config
  • Poll: exponential backoff with a hard timeout cap; expose the cap via a BentoNerParams-style config struct
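The auth and poll bullets above could be sketched along these lines. Everything here is an assumption for illustration: `PollParams`, `backoff_schedule`, `resolve_key`, and the `AZURE_DOCAI_KEY` env var name are hypothetical, and the precedence (config before env) is one possible choice that the issue does not settle. Only the header name comes from the issue.

```rust
use std::time::Duration;

/// Header name from the issue; its value comes from config or the environment.
pub const SUBSCRIPTION_KEY_HEADER: &str = "Ocp-Apim-Subscription-Key";

/// Hypothetical env var name, used only in this sketch.
pub const KEY_ENV_VAR: &str = "AZURE_DOCAI_KEY";

fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Prefer an explicit config value, then fall back to the environment value.
/// Blank values are treated as missing.
pub fn resolve_key(from_config: Option<&str>, from_env: Option<&str>) -> Option<String> {
    non_empty(from_config)
        .or_else(|| non_empty(from_env))
        .map(str::to_owned)
}

/// Hypothetical poll settings; `timeout` is the hard cap on total wait time.
#[derive(Debug, Clone, Copy)]
pub struct PollParams {
    pub initial_delay: Duration,
    pub max_delay: Duration,
    pub timeout: Duration,
}

/// Delays to sleep between polls: doubling from `initial_delay`, each capped
/// at `max_delay`, with the last one truncated so the sum never exceeds
/// `timeout`.
pub fn backoff_schedule(p: PollParams) -> Vec<Duration> {
    let mut delays = Vec::new();
    let mut elapsed = Duration::ZERO;
    let mut next = p.initial_delay.min(p.max_delay);
    // A zero delay would never advance `elapsed`, so stop instead of spinning.
    while elapsed < p.timeout && !next.is_zero() {
        let step = next.min(p.timeout - elapsed);
        delays.push(step);
        elapsed += step;
        next = next.saturating_mul(2).min(p.max_delay);
    }
    delays
}
```

A caller would pass `std::env::var(KEY_ENV_VAR).ok().as_deref()` as `from_env`, and the real poll loop would sleep for each delay in turn (for example with `tokio::time::sleep`, given the tokio/time dependency noted above), returning a timeout error once the schedule is exhausted. Computing the schedule up front keeps the cap easy to test without waiting on real time.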

Reference

The deleted code is preserved in git history; the last commit including it is the parent of the removal commit on #195.

Metadata

    Labels

    feat: request for or implementation of a new feature
    ocr: OCR backends and providers

    Projects

    Status
    Backlog
