AI Document Intelligence Platform (POC) - Overview

This repository is the public overview of the organization's project.

It intentionally contains no application code.
It explains the product vision, architecture, capabilities, roadmap, and deployment links.

1. Project Purpose

Build a configurable, multi-tenant platform that can:

Read unstructured documents from different departments
Convert them into structured JSON
Store and organize output by tenant and department
Prepare data for analytics, automation, and future AI/ML workflows

The product is designed to work across multiple business functions, not only healthcare.

2. Department-Driven Model

The platform now supports department isolation.
A department is selected before upload/configuration, and data remains scoped to that department.

Current department set:

Clinic/Pharma
HR
Billing/Finance
Electricity Bills
Water Bills
Sales
Purchasing
Store/Stock

What this means in practice:

Each department has its own extraction template
Uploads are tagged by department
Document views and downstream reports can be filtered per department
Future training/analytics can be run department-wise
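
As an illustration only (this repository holds no application code), department scoping could be sketched as below. The `Department` values mirror the department list above; `DocumentRecord`, `scoped_documents`, and the tenant identifiers are hypothetical names, not taken from the platform.

```python
from dataclasses import dataclass
from enum import Enum


class Department(str, Enum):
    # Values mirror the department set listed in this overview.
    CLINIC_PHARMA = "clinic_pharma"
    HR = "hr"
    BILLING_FINANCE = "billing_finance"
    ELECTRICITY_BILLS = "electricity_bills"
    WATER_BILLS = "water_bills"
    SALES = "sales"
    PURCHASING = "purchasing"
    STORE_STOCK = "store_stock"


@dataclass(frozen=True)
class DocumentRecord:
    # Hypothetical record shape: every upload carries tenant and department tags.
    doc_id: str
    tenant_id: str
    department: Department
    filename: str


def scoped_documents(records, tenant_id, department):
    """Return only the records that belong to one tenant and one department."""
    return [
        r for r in records
        if r.tenant_id == tenant_id and r.department == department
    ]


records = [
    DocumentRecord("d1", "tenant-a", Department.HR, "offer_letter.pdf"),
    DocumentRecord("d2", "tenant-a", Department.SALES, "quote.docx"),
    DocumentRecord("d3", "tenant-b", Department.HR, "payslip.pdf"),
]
hr_docs = scoped_documents(records, "tenant-a", Department.HR)
```

Filtering on both keys at once is the point of the sketch: a department filter alone would leak HR documents across tenants.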

3. Core Product Capabilities

A) Document Reader and Extractor

Upload mixed formats (PDF, DOC, DOCX, images)
OCR + native parsing for text acquisition
LLM-based structured extraction into JSON
Error tracking per document for actionable failure handling
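
A minimal sketch of how mixed-format routing with per-document error tracking might look. The route names follow the architecture flow in this overview; the image extensions, `route_for`, `plan_acquisition`, and the result dictionary shape are assumptions made for illustration.

```python
from pathlib import Path

# Assumed mapping from file extension to a text-acquisition route.
ROUTES = {
    ".pdf": "pdf_parsing_plus_ocr",
    ".docx": "docx_parser",
    ".doc": "doc_parser",
    ".png": "image_ocr",
    ".jpg": "image_ocr",
    ".jpeg": "image_ocr",
}


class UnsupportedFormatError(ValueError):
    """Raised when no acquisition route exists for a file type."""


def route_for(filename: str) -> str:
    suffix = Path(filename).suffix.lower()
    if suffix not in ROUTES:
        raise UnsupportedFormatError(
            f"Unsupported file type: {suffix or '(none)'}"
        )
    return ROUTES[suffix]


def plan_acquisition(filenames):
    """Route each file and keep a per-document error instead of failing the batch."""
    results = []
    for name in filenames:
        try:
            results.append(
                {"filename": name, "status": "routed",
                 "route": route_for(name), "error": None}
            )
        except UnsupportedFormatError as exc:
            results.append(
                {"filename": name, "status": "failed",
                 "route": None, "error": str(exc)}
            )
    return results


plan = plan_acquisition(["scan.PNG", "contract.docx", "notes.txt"])
```

Recording the error on the document itself, rather than raising for the whole upload, is what makes the failure actionable per file.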

B) Configurable Extraction Templates

Non-technical field configuration from UI
Field-wise controls: field_name, data_type, description, format_rules
Department-level save and isolation
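
To make the field controls concrete, here is a hedged sketch of a template represented in code and converted to a JSON Schema style description that could guide structured extraction. The four attribute names come from this overview; `FieldSpec`, `template_to_json_schema`, the allowed `data_type` values, and the sample fields are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    field_name: str
    data_type: str
    description: str
    format_rules: str = ""


# Assumed set of data types; dates travel as strings in JSON.
JSON_TYPES = {
    "string": "string",
    "number": "number",
    "boolean": "boolean",
    "date": "string",
}


def template_to_json_schema(fields):
    """Build a JSON Schema style object description from a department template."""
    properties = {}
    for f in fields:
        if f.data_type not in JSON_TYPES:
            raise ValueError(f"Unknown data_type: {f.data_type!r}")
        description = f.description
        if f.format_rules:
            description = f"{description} Format: {f.format_rules}"
        properties[f.field_name] = {
            "type": JSON_TYPES[f.data_type],
            "description": description,
        }
    return {
        "type": "object",
        "properties": properties,
        "required": [f.field_name for f in fields],
    }


billing_template = [
    FieldSpec("invoice_no", "string", "Invoice identifier."),
    FieldSpec("due_date", "date", "Payment due date.", "YYYY-MM-DD"),
    FieldSpec("amount", "number", "Total amount payable."),
]
schema = template_to_json_schema(billing_template)
```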

C) Operational Data Layer

Structured output stored in PostgreSQL
Multi-tenant boundaries
Document lifecycle controls (status, error, delete, extracted-data views)
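
The lifecycle controls could be modeled as a small state machine. The status names and allowed transitions below are hypothetical; the overview only states that status, error, and delete controls exist.

```python
from enum import Enum


class DocStatus(str, Enum):
    # Hypothetical status set for illustration.
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    DELETED = "deleted"


# Assumed transition table: failed documents may be retried or deleted.
ALLOWED = {
    DocStatus.UPLOADED: {DocStatus.PROCESSING, DocStatus.DELETED},
    DocStatus.PROCESSING: {DocStatus.COMPLETED, DocStatus.FAILED},
    DocStatus.COMPLETED: {DocStatus.DELETED},
    DocStatus.FAILED: {DocStatus.PROCESSING, DocStatus.DELETED},
    DocStatus.DELETED: set(),
}


def transition(current: DocStatus, target: DocStatus) -> DocStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


status = transition(DocStatus.UPLOADED, DocStatus.PROCESSING)
status = transition(status, DocStatus.FAILED)
```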

4. Architecture Flow (Current + RAG-ready)

flowchart TD
    A[User Upload] --> B[FastAPI API Layer]
    B --> C[Storage + Metadata]
    C --> D[Text Acquisition Layer]
    D --> D1[PDF parsing + OCR]
    D --> D2[DOCX parser]
    D --> D3[DOC parser]
    D --> D4[Image OCR]
    D1 --> E[LLM Structuring]
    D2 --> E
    D3 --> E
    D4 --> E
    E --> F[Validation + Normalization]
    F --> G[PostgreSQL Persistence]
    G --> H[Dashboard + Extracted Data Views]
    G --> I[RAG Ingestion - Planned]
    I --> J[Chunking + Metadata]
    J --> K[Embeddings]
    K --> L[Vector Index]
    L --> M[Retriever + Filters + Rerank]
    M --> N[Grounded Answer + Citations]
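
The Validation + Normalization step in the flow above sits between the LLM output and persistence. A minimal sketch of what it might do, assuming the type names `string`, `number`, and `date` and a small set of accepted date formats (all assumptions, not the platform's rules):

```python
from datetime import datetime

# Assumed input date formats; output is always ISO 8601 (YYYY-MM-DD).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def normalize_value(value, data_type):
    """Coerce one raw extracted value to its declared type."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "":
        return None
    if data_type == "string":
        return text
    if data_type == "number":
        # float() raises ValueError on non-numeric text.
        return float(text.replace(",", ""))
    if data_type == "date":
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(text, fmt).date().isoformat()
            except ValueError:
                continue
        raise ValueError(f"Unrecognized date: {text!r}")
    raise ValueError(f"Unknown data_type: {data_type!r}")


def validate_and_normalize(raw, field_types):
    """Return (clean_values, errors_by_field) for one extracted document."""
    clean, errors = {}, {}
    for name, data_type in field_types.items():
        try:
            clean[name] = normalize_value(raw.get(name), data_type)
        except ValueError as exc:
            errors[name] = str(exc)
    return clean, errors


raw = {"invoice_no": " INV-7 ", "amount": "1,250.50",
       "due_date": "31/01/2025", "paid_on": "soon"}
types = {"invoice_no": "string", "amount": "number",
         "due_date": "date", "paid_on": "date"}
clean, errors = validate_and_normalize(raw, types)
```

Collecting errors per field, instead of rejecting the whole document, keeps valid values usable while the failure is surfaced for review.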

5. Technology Stack (Current Implementation)

Frontend

Next.js: 16.1.6
React: 19.2.3
Tailwind CSS: 4.x
TypeScript: 5.x

Backend

FastAPI: >=0.110.0
Python: 3.11
SQLAlchemy: >=2.0.0
Uvicorn: >=0.27.0

Data and Storage

PostgreSQL: 15 (15-alpine Docker image)
Local/GCP-style storage abstraction

Document and OCR

pdfplumber: >=0.11.0
pytesseract: >=0.3.10
python-docx: >=1.1.2
Pillow: >=10.3.0
antiword: for legacy .doc

AI Layer

Anthropic Claude API (4.x family with fallback strategy)
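
The overview does not describe the fallback strategy itself. One common shape, shown here purely as an assumption, is to try an ordered list of models and move to the next on failure. The model identifiers and `fake_call` are placeholders; no real SDK call is shown.

```python
def call_with_fallback(models, call, prompt):
    """Try each model in order; return (model_used, result) for the first success."""
    errors = {}
    for model in models:
        try:
            return model, call(model, prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors[model] = repr(exc)
    raise RuntimeError(f"All models failed: {errors}")


def fake_call(model, prompt):
    # Stand-in for a real API client; simulates the primary model timing out.
    if model == "primary-model":
        raise TimeoutError("simulated timeout")
    return f"{model} handled: {prompt}"


used_model, answer = call_with_fallback(
    ["primary-model", "fallback-model"], fake_call, "Extract invoice fields"
)
```

In a real client the `except` clause would likely be narrowed to retryable errors such as timeouts or rate limits.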

Auth

Google OAuth 2.0 (authorization code flow)
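
For readers unfamiliar with the authorization code flow, its first step is redirecting the user to Google's authorization endpoint with a `state` value that is checked again on callback. The sketch below builds that URL with the standard library only; the client ID, redirect URI, and scope choice are placeholder assumptions.

```python
import hmac
import secrets
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorization_url(client_id, redirect_uri, state):
    """Build the URL the browser is sent to in step one of the code flow."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",  # authorization code flow
        "scope": "openid email profile",
        "state": state,  # echoed back by Google on the callback
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


def state_matches(expected, returned):
    """Constant-time comparison of the stored and returned state values."""
    return hmac.compare_digest(expected, returned)


state = secrets.token_urlsafe(32)
url = build_authorization_url(
    "example-client-id.apps.googleusercontent.com",  # placeholder
    "https://app.example.com/auth/callback",         # placeholder
    state,
)
```

On callback, the server would verify `state` with `state_matches` and then exchange the returned `code` for tokens server-side; that exchange is omitted here.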

6. Current Status

Completed

End-to-end extraction pipeline
Multi-format file ingestion
Department-aware config and upload behavior
Per-document error message tracking and display
Document delete and extracted-data navigation
Architecture and dev-status experience pages

In Progress

Quality benchmarking and extraction accuracy baselines
Operational hardening and migration maturity

Planned

RAG ingestion + retrieval pipeline
Citation-grounded Q&A
Department-wise analytics and KPI dashboards
Optional model fine-tuning only after quality baselines are stable
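
Since the RAG pipeline is still planned, the following is only a sketch of what the chunking-with-metadata step might look like: fixed-size character windows with overlap, each tagged so retrieval can later filter by department. The function name, chunk sizes, and metadata keys are assumptions.

```python
def chunk_text(text, doc_id, department, max_chars=200, overlap=40):
    """Split text into overlapping character windows tagged with metadata."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    step = max_chars - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(
            {
                "doc_id": doc_id,
                "department": department,
                "chunk_index": len(chunks),
                "start": start,
                "text": text[start:start + max_chars],
            }
        )
        if start + max_chars >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


chunks = chunk_text("a" * 500, doc_id="d1", department="hr")
```

Production chunkers often split on sentence or section boundaries instead of raw character offsets; the fixed window is used here only to keep the example short.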

7. Product Rollout Plan (Phased)

Foundation (Done)

Upload, extraction, storage, UI visibility

Reliability (Done/In progress)

Error visibility, delete controls, quality fixes

Quality Baseline (In progress)

Field-level scoring, benchmark set, regression tracking

RAG Enablement (Planned)

Chunking, embeddings, vector index, retriever pipeline

Production Hardening (Planned)

Queue workers, migration discipline, observability, compliance controls

Advanced ML Training (Conditional)

Department-specific training only when measurable benefit is proven
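
The field-level scoring and regression tracking named in the Quality Baseline phase could be sketched as follows. Exact-match comparison, the function names, the tolerance, and the sample data are illustrative assumptions, not the project's benchmark.

```python
from collections import defaultdict


def field_accuracy(expected_docs, predicted_docs):
    """Exact-match accuracy per field across a benchmark set."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for expected, predicted in zip(expected_docs, predicted_docs):
        for field, truth in expected.items():
            totals[field] += 1
            if predicted.get(field) == truth:
                hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


def regressions(baseline, current, tolerance=0.05):
    """List fields whose accuracy fell more than `tolerance` below baseline."""
    return [
        field for field in sorted(baseline)
        if current.get(field, 0.0) < baseline[field] - tolerance
    ]


expected = [
    {"invoice_no": "A1", "total": 100.0},
    {"invoice_no": "A2", "total": 250.0},
]
predicted = [
    {"invoice_no": "A1", "total": 100.0},
    {"invoice_no": "A2", "total": 205.0},  # digits transposed by the extractor
]
scores = field_accuracy(expected, predicted)
dropped = regressions({"invoice_no": 1.0, "total": 0.9}, scores)
```

Scoring per field rather than per document shows which template fields need attention, which is the evidence the conditional training phase depends on.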

8. Why This Approach

Faster business value with pre-trained models
Lower initial risk than immediate custom-model training
Strong path to scale via department templates + tenant isolation
Clean progression from extraction -> retrieval -> intelligence

9. Deployment Links

Update this section after deployment:

Product URL: TBD
API URL: TBD
API Docs: TBD
Architecture Page: TBD
Dev Status Page: TBD

10. Repository Scope

This repository is the public overview hub for stakeholders, clients, and partners.

It is intended for:

Product narrative
Capability visibility
Architecture communication
Roadmap alignment
Deployment link sharing

No runtime code is maintained here.
