Status
Deferred — removed in #195 to clear the deck for the externalised inference architecture (#194). Reintroduce as opt-in when there's user demand.
Context
AzureDocaiBackend was a direct cloud OCR backend in nvisy-ocr/src/provider/azure_docai/, gated behind the azure-docai cargo feature (forwarded as microsoft through nvisy-engine/nvisy-server/nvisy-cli). It used Azure's async analyzeDocument POST + operationId polling flow.
The new architecture (#194) centres on externalised inference services (see nvisycom/inference). Cloud OCR backends carry provider-specific auth, polling, and parsing work that doesn't pay off without a user who needs them. This issue tracks reintroducing the Azure backend when needed.
What this issue becomes
Add AzureDocaiBackend back to nvisy-ocr behind an azure-docai cargo feature.
Triggers to revisit
- A customer requests DocAI for a self-hosted runtime
- We want DocAI as a fallback when the externalised Bento OCR is unavailable
- An Azure-resident deployment where calling DocAI directly is simpler than running a sidecar
Scope when reintroduced
- Restore nvisy-ocr/src/backend/azure_docai_backend.rs (new backend/-based layout)
- Restore the azure-docai feature on nvisy-ocr (with tokio/time for the poll loop)
- Restore the OcrBackend::AzureDocai { … } variant + OcrExtractor::from_config dispatch
- Restore feature forwarding nvisy-ocr/azure-docai → nvisy-engine/microsoft → nvisy-server/microsoft → nvisy-cli/microsoft
- Auth: Ocp-Apim-Subscription-Key header from env or config
- Poll: exponential backoff with a hard timeout cap; expose the cap through a BentoNerParams-style config
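
The poll item above could look roughly like the following. This is a minimal sketch, not the deleted implementation: PollParams, PollStatus, PollError, backoff_delay, and poll_until_done are hypothetical names, and the default values are placeholders. It uses only std and takes the sleep function as a parameter so it stays self-contained; an async version would await tokio::time::sleep at the same point.

```rust
use std::time::{Duration, Instant};

/// Hypothetical poll settings (names and defaults are assumptions for illustration).
#[derive(Debug, Clone)]
pub struct PollParams {
    /// Delay before the second status check.
    pub initial_delay: Duration,
    /// Upper bound for any single delay.
    pub max_delay: Duration,
    /// Hard cap on the total time spent polling.
    pub timeout: Duration,
}

impl Default for PollParams {
    fn default() -> Self {
        Self {
            initial_delay: Duration::from_millis(500),
            max_delay: Duration::from_secs(8),
            timeout: Duration::from_secs(120),
        }
    }
}

/// Result of one status check against the remote operation.
pub enum PollStatus<T> {
    Pending,
    Done(T),
}

#[derive(Debug, PartialEq)]
pub enum PollError {
    TimedOut,
}

/// Delay after the 0-based `attempt`: initial_delay * 2^attempt, clamped to max_delay.
pub fn backoff_delay(params: &PollParams, attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    params.initial_delay.saturating_mul(factor).min(params.max_delay)
}

/// Calls `check` until it reports Done, sleeping with exponential backoff in between.
/// Gives up once wall-clock time, or the accumulated delay including the next one,
/// would exceed `params.timeout`.
pub fn poll_until_done<T>(
    params: &PollParams,
    mut check: impl FnMut() -> PollStatus<T>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, PollError> {
    let start = Instant::now();
    let mut slept = Duration::ZERO;
    let mut attempt = 0u32;
    loop {
        if let PollStatus::Done(value) = check() {
            return Ok(value);
        }
        let delay = backoff_delay(params, attempt);
        if start.elapsed() >= params.timeout || slept.saturating_add(delay) > params.timeout {
            return Err(PollError::TimedOut);
        }
        sleep(delay);
        slept = slept.saturating_add(delay);
        attempt = attempt.saturating_add(1);
    }
}
```

In a blocking context the sleep argument can be std::thread::sleep; passing it in also lets tests record the delay schedule without waiting.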
Reference
The deleted code is preserved in git history; the last commit including it is the parent of the removal commit on #195.