Azure-native legal-contract intelligence platform. POC scaffold.
This repo holds the POC scope: a minimum viable Azure stack to validate metadata extraction, RAG retrieval, clause comparison against a gold standard, and human-in-the-loop review, on a 500-document corpus.
▶ View the interactive overview — a visual tour of the query router, extraction, HITL review, and cross-domain reuse.
- POC overview and success criteria: docs/poc/00-overview.md
- POC architecture (diagram + components): docs/poc/01-architecture.md
- Data model (SQL DDL, AI Search indexes, blob layout): docs/poc/02-data-model.md
- Model selection and prompts per stage: docs/poc/03-models-and-prompts.md
- Cost considerations: docs/poc/04-cost-considerations.md
- Architectural tradeoffs (Event Grid vs Service Bus, etc.): docs/poc/05-tradeoffs.md
- Low-code alternatives (Copilot Studio, Power Automate): docs/poc/06-low-code-alternatives.md
- Sample-document sourcing strategy: docs/poc/07-sample-documents.md
- Router design: docs/poc/08-router-design.md
- Evaluation harness: docs/poc/09-evaluation.md
- Diagrams (architecture, data flows, state lifecycles, UI flows, HITL modes): docs/poc/10-diagrams.md
- Ingestion pipeline (Azure Event Grid vs local polling): docs/poc/11-ingestion-pipeline.md
- Local runtime (docker-compose runbook + parity matrix): docs/poc/12-local-runtime.md
- Tenant setup + permissions matrix: docs/poc/13-tenant-setup.md
- Deployment guide (Azure runbook): docs/poc/14-deployment-guide.md
- Observability (App Insights + SQL audit + KQL/SQL queries): docs/poc/15-observability.md
- Azure DevOps operator guide (CI/CD, promotion, rotation, alerts, runbook): docs/poc/16-azure-ops-guide.md
- Scaling considerations (POC → 100k contracts): docs/poc/17-scaling-considerations.md
- LLM orchestration (no-framework rationale + adoption triggers): docs/poc/18-llm-orchestration.md
- Eval baselines (golden-QA + field-extraction): docs/poc/19-eval-baselines.md
- Corpus and gold clauses reference (the 16 synthetic contracts + 9 gold clauses + applicability map): docs/poc/20-corpus-and-gold-clauses.md
- HITL review workflow (queue, per-field correction, append-only lineage, three-axis state, reviewer auth): docs/poc/22-hitl-review.md (state model ADR-0011, auth ADR-0012)
- Reusing this codebase for other domains (sales, surveys, support calls): docs/reuse-for-other-domains.md
- Architecture Decision Records: docs/adr/
docs/ reference architecture + POC docs + ADRs
infra/
bicep/ Azure IaC (subscription-scoped main.bicep + 12 modules)
local/ docker-compose stack (mssql, azurite, qdrant, ollama, unstructured)
scripts/ SQL DDL, AI Search index definitions, data-prep, function packaging
samples/ gold-clause templates + synthetic contracts (PDFs built on demand)
src/
shared/ profile, config, clients, router, sql_builder, api, prompts,
auth, openapi, layout, vector_search, coercions, embedding_text
functions/
ingestion/ Event Grid → process_blob_event (azure profile)
api/ HTTP query/contracts/compare endpoints (azure profile)
local/ FastAPI wrapper (api_server.py) + Azurite poll watcher
(ingest_watcher.py) for the docker-compose runtime
web/ React + Vite + TypeScript SPA with light/dark theming
(Tailwind v4) — Static Web App ready
site/ static GitHub Pages overview site (index.html + screenshots)
tests/
unit/ fast tests, no Azure deps
eval/ integration eval runner (RUN_INTEGRATION_EVAL=1)
- Read docs/poc/00-overview.md for scope.
- Run the local stack: docs/poc/12-local-runtime.md.
- When ready for cloud: docs/poc/13-tenant-setup.md → docs/poc/14-deployment-guide.md.
- Source contracts: docs/poc/07-sample-documents.md.
Counterparties in samples/contracts-synthetic/ and tests/golden_qa.jsonl are fictional. Build the PDFs with bash scripts/data-prep/build-synthetic-pdfs.sh. Real corpora (CUAD, SEC EDGAR) are not redistributed — see docs/poc/07-sample-documents.md.
The full POC stack runs end-to-end in two profiles selected by RUNTIME_PROFILE:
- azure (default): Functions on Event Grid + Document Intelligence + Azure OpenAI + Azure SQL + Azure AI Search + Static Web App. Bicep is idempotent and zero-warning; the Bicep-↔-app contract is enforced by tests/unit/test_bicep_app_contract.py. Deployment to a real subscription has not been performed; see docs/poc/13-tenant-setup.md for prerequisites.
- local (docker-compose, no cloud): FastAPI wrapper + Azurite-poll watcher driving the same pipeline.process_blob_event and shared.api.query codepaths, with mssql / Azurite / Qdrant / Ollama / unstructured.io as drop-in service replacements. See docs/poc/12-local-runtime.md.
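As a rough illustration of how a RUNTIME_PROFILE switch like the one above could work, here is a minimal Python sketch. It is not taken from src/shared/profile.py or src/shared/clients.py: the names `RuntimeProfile`, `resolve_profile`, `SERVICES`, and `backend_for` are hypothetical, and the pairing of each local service with an Azure counterpart is an assumption inferred from the two profile descriptions.

```python
import os
from enum import Enum


class RuntimeProfile(str, Enum):
    AZURE = "azure"
    LOCAL = "local"


def resolve_profile(env=None) -> RuntimeProfile:
    """Read RUNTIME_PROFILE, falling back to 'azure' when unset or empty."""
    env = os.environ if env is None else env
    raw = (env.get("RUNTIME_PROFILE") or "azure").strip().lower()
    try:
        return RuntimeProfile(raw)
    except ValueError:
        raise ValueError(
            f"unknown RUNTIME_PROFILE {raw!r}; expected 'azure' or 'local'"
        ) from None


# Hypothetical capability -> backend map. The pairings are an assumption;
# the real factories may be organised differently.
SERVICES = {
    RuntimeProfile.AZURE: {
        "sql": "Azure SQL",
        "vectors": "Azure AI Search",
        "llm": "Azure OpenAI",
        "layout": "Document Intelligence",
    },
    RuntimeProfile.LOCAL: {
        "sql": "mssql",
        "vectors": "Qdrant",
        "llm": "Ollama",
        "layout": "unstructured.io",
    },
}


def backend_for(capability: str, env=None) -> str:
    """Name the backend that serves a capability under the active profile."""
    return SERVICES[resolve_profile(env)][capability]
```

With this shape, calling code asks for a capability ("vectors", "llm") rather than a concrete SDK, which is one way the same pipeline code could run unchanged under either profile.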
src/ highlights (all checked in and exercised by the local stack):
| Area | Where |
|---|---|
| Ingestion pipeline (DI/unstructured → LLM extraction → SQL + vectors + audit) | src/functions/ingestion/pipeline.py — flow walkthrough in docs/poc/11-ingestion-pipeline.md |
| Query API (router → reporting / search / clause-comparison / mixed handlers) | src/shared/api.py, src/shared/router.py, src/shared/sql_builder.py — design narrative in docs/poc/18-llm-orchestration.md |
| HITL review (queue + per-field write-through correction, append-only lineage, three-axis state, reviewer auth) | src/shared/api.py, src/shared/auth.py — workflow in docs/poc/22-hitl-review.md |
| Profile-aware client factories (Azure SDK ↔ local equivalents) | src/shared/clients.py, src/shared/layout.py, src/shared/vector_search.py |
| Web frontend (4 tabs — Chat, Contracts, Review, Gold Clauses — with HITL per-field review + lineage, shared drawer + compare modal, light/dark theme via Tailwind v4) | src/web/ |
| OpenAPI spec + Swagger UI served by the API | src/shared/openapi.py |
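The Query API row above names four handler families behind the router. The sketch below shows one plausible way to dispatch a route label to a handler; it is not the code in src/shared/router.py or src/shared/api.py. The handler functions, the `HANDLERS` table, the `dispatch` function, and the idea that a mixed query combines a reporting part with a search part are all assumptions made for illustration. How the route label itself is produced is out of scope here (see docs/poc/08-router-design.md).

```python
from typing import Callable, Dict

Handler = Callable[[str], dict]


def handle_reporting(question: str) -> dict:
    # Placeholder: a real handler would build and run a SQL query.
    return {"route": "reporting", "question": question}


def handle_search(question: str) -> dict:
    # Placeholder: a real handler would run vector / keyword retrieval.
    return {"route": "search", "question": question}


def handle_clause_comparison(question: str) -> dict:
    # Placeholder: a real handler would compare a clause to a gold clause.
    return {"route": "clause-comparison", "question": question}


def handle_mixed(question: str) -> dict:
    # Assumption: a mixed query is answered from both structured and
    # retrieved data, so it fans out to both handlers.
    return {
        "route": "mixed",
        "parts": [handle_reporting(question), handle_search(question)],
    }


HANDLERS: Dict[str, Handler] = {
    "reporting": handle_reporting,
    "search": handle_search,
    "clause-comparison": handle_clause_comparison,
    "mixed": handle_mixed,
}


def dispatch(route: str, question: str) -> dict:
    """Send the question to the handler registered for the route label."""
    try:
        handler = HANDLERS[route]
    except KeyError:
        raise ValueError(f"unknown route {route!r}") from None
    return handler(question)
```

A plain dictionary keeps the set of routes visible in one place and makes an unrecognised label fail loudly instead of falling through to a default handler.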
Tests: run the unit suite with PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest tests/unit -q. A golden-question eval lives in tests/golden_qa.jsonl and runs against the live API via tests/eval/ when RUN_INTEGRATION_EVAL=1.
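The opt-in gate for the integration eval can be pictured with a small standard-library sketch. This is not the runner in tests/eval/: the helper names `integration_eval_enabled` and `load_jsonl` are hypothetical, and the `question` field used in the usage example is an assumed schema, since the actual layout of tests/golden_qa.jsonl is not described here.

```python
import json
import os


def integration_eval_enabled(env=None) -> bool:
    """True only when RUN_INTEGRATION_EVAL is set to exactly '1'."""
    env = os.environ if env is None else env
    return env.get("RUN_INTEGRATION_EVAL") == "1"


def load_jsonl(text: str) -> list:
    """Parse JSON Lines text into a list of records, skipping blank lines."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Report the offending line so a broken fixture is easy to find.
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
    return records


if __name__ == "__main__":
    sample = '{"question": "Which contracts expire in 2025?"}\n'
    if integration_eval_enabled():
        for record in load_jsonl(sample):
            print(record["question"])
    else:
        print("integration eval skipped (set RUN_INTEGRATION_EVAL=1)")
```

Keeping the gate as an explicit environment check means the unit suite never needs a running API, and the eval only runs when someone asks for it.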