# Document Intelligence Platform

A production-ready AI system for intelligent document analysis using Retrieval-Augmented Generation (RAG) with comprehensive traceability. The platform ingests documents, performs semantic search with pgvector, and delivers structured analysis with full citation tracking and cost transparency.
Live Demo: https://document-intelligence-platform.vercel.app
Backend API: https://document-intelligence-platform.onrender.com

Status: ✅ Fully deployed and operational
## Features

- ✅ Document Upload & Processing: PDF, Markdown, HTML, and text file ingestion
- ✅ Intelligent Chunking: Semantic document splitting with configurable parameters
- ✅ Structured LLM Analysis: JSON-mode analysis with confidence scoring
- ✅ Cost Tracking: Token usage and cost estimation for every operation
- ✅ Modern UI: Next.js frontend with real-time analysis display
- ✅ Vector Embeddings: OpenAI text-embedding-3-small with Supabase storage
- ✅ Semantic Search: pgvector similarity search with metadata filtering (threshold: 0.3)
- ✅ Retrieval Pipeline: Query embedding → vector search → ranked results with logging
- ✅ Citation Tracking: Every analysis includes source chunk references
- ✅ Retrieval Traceability: Full visibility into which chunks influenced the response
- ✅ Advanced Filtering: Filter by document type, specific documents, or metadata
- ✅ Frontend Visualization: Interactive display of retrieved chunks with similarity scores
- ✅ Automatic Embeddings: Documents are automatically embedded after upload
- ✅ Production Deployment: Backend on Render, frontend on Vercel, database on Supabase
- ✅ Secure API Proxy: Server-side proxy hides API keys from the client
## Table of Contents

- [Architecture Overview](#architecture-overview)
- [Production Deployment](#production-deployment)
- [Getting Started](#getting-started)
- [API Reference](#api-reference)
- [RAG Pipeline](#rag-pipeline)
- [Configuration](#configuration)
- [Development](#development)
- [Testing](#testing)
## Architecture Overview

```
┌──────────────────────────────────────────────────────────────────┐
│                        Frontend (Next.js)                        │
│  - Document Upload UI          - RAG Analysis Interface          │
│  - Retrieval Traceability Visualization                          │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                         FastAPI Backend                          │
│                                                                  │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│   │   Document   │   │   Chunking   │   │  Embedding   │         │
│   │   Service    │──▶│   Service    │──▶│   Service    │         │
│   └──────────────┘   └──────────────┘   └──────┬───────┘         │
│                                                │                 │
│                                                ▼                 │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│   │  Retrieval   │◀──│    Vector    │◀──│   Supabase   │         │
│   │   Service    │   │    Search    │   │  + pgvector  │         │
│   └──────┬───────┘   └──────────────┘   └──────────────┘         │
│          │                                                       │
│          ▼                                                       │
│   ┌──────────────┐                                               │
│   │     LLM      │  (OpenAI GPT-4o)                              │
│   │   Service    │                                               │
│   └──────────────┘                                               │
└──────────────────────────────────────────────────────────────────┘
```
RAG request flow:

1. Query Embedding: User query → OpenAI embeddings (1536 dimensions)
2. Vector Search: Semantic similarity search via pgvector with cosine distance
3. Filtering: Apply document_ids, doc_type, or custom metadata filters
4. Ranking: Sort by similarity score, apply threshold (default: 0.3)
5. Context Building: Format top-k chunks with source references
6. LLM Analysis: GPT-4o generates a response with chunk citations
7. Traceability: Return full metadata (chunks used, scores, tokens, cost)
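A minimal sketch of steps 1-4 in Python, using the OpenAI and Supabase clients the backend is built on. The `match_chunks()` parameter and column names below are assumptions, not the project's actual code (the function itself is mentioned under Production Deployment):

```python
# Sketch of the retrieval flow, under assumed names; not the real service code.
import os

from openai import OpenAI
from supabase import create_client

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])


def retrieve_chunks(query: str, top_k: int = 5, threshold: float = 0.3) -> list[dict]:
    # 1. Embed the query (1536-dimensional vector)
    response = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query
    )
    query_embedding = response.data[0].embedding

    # 2-3. Vector search via the match_chunks() SQL function (pgvector cosine
    #      distance); the RPC parameter names here are illustrative
    result = supabase.rpc(
        "match_chunks",
        {
            "query_embedding": query_embedding,
            "match_threshold": threshold,
            "match_count": top_k,
        },
    ).execute()

    # 4. Rank by similarity score, highest first
    return sorted(result.data, key=lambda c: c["similarity_score"], reverse=True)
```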
## Production Deployment

- Frontend: https://document-intelligence-platform.vercel.app (Vercel)
- Backend API: https://document-intelligence-platform.onrender.com (Render)
- Database: Supabase PostgreSQL with pgvector extension
Frontend (Vercel):

- Next.js 14 with TypeScript and Tailwind CSS
- Server-side API proxy (`/api/proxy`) to hide API keys
- Automatic deployments from the main branch
- Environment variables: `API_KEY`, `BACKEND_API_URL`
Backend (Render):

- FastAPI with Python 3.12
- Auto-deploy from GitHub on push to main
- Environment variables: `OPENAI_API_KEY`, `SUPABASE_URL`, `SUPABASE_KEY`, `API_KEY`, `CORS_ORIGINS`
- Free tier with auto-sleep after inactivity (cold starts ~30s)
Database (Supabase):

- PostgreSQL 15 with pgvector 0.5.1
- Tables: `documents`, `chunks`, `embeddings`
- HNSW vector index for fast similarity search
- Foreign key constraints and cascade deletes
Key fixes applied during deployment:

- ✅ Dependency Compatibility: Upgraded to supabase 2.10.0, httpx 0.27.2
- ✅ UUID Serialization: Fixed with `model_dump(mode='json')` for PostgreSQL
- ✅ Null Byte Sanitization: Strip `\x00` from PDF text before DB insert
- ✅ Vector Parsing: Convert DB string vectors back to lists with `json.loads()`
- ✅ Similarity Threshold: Lowered from 0.5 to 0.3 for better recall
- ✅ Auto-Embedding: Documents automatically embedded after upload
- ✅ Vector Search Function: Added `match_chunks()` SQL function for semantic search
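Two of these fixes are small enough to sketch in isolation (the function names here are illustrative, not the project's actual code):

```python
import json


def sanitize_pdf_text(text: str) -> str:
    # PostgreSQL rejects strings containing null bytes, so strip \x00
    # from extracted PDF text before inserting it into the database
    return text.replace("\x00", "")


def parse_db_vector(raw: str) -> list[float]:
    # pgvector columns come back from the database as strings like
    # "[0.01, -0.23, ...]"; json.loads() turns them back into lists
    return json.loads(raw)
```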
Security:

- API keys stored server-side only (never exposed to the client)
- Next.js API routes act as secure proxy
- CORS restricted to frontend domain
- Supabase service keys protected via environment variables
## Getting Started

Prerequisites:

- Python 3.11+
- Node.js 18+
- Supabase account with pgvector enabled
- OpenAI API key
Backend setup:

```bash
cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your credentials:
#   OPENAI_API_KEY=sk-...
#   SUPABASE_URL=https://xxx.supabase.co
#   SUPABASE_KEY=eyJhbG...

# Run database migrations
# (Ensure the pgvector extension is enabled in Supabase)

# Start development server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Frontend setup:

```bash
cd frontend

# Install dependencies
npm install

# Configure environment
cp .env.example .env.local
# Edit .env.local:
#   NEXT_PUBLIC_API_URL=http://localhost:8000

# Start development server
npm run dev
```

Visit http://localhost:3000 to use the application.
## API Reference

Base URL: `http://localhost:8000/api/v1`
### Upload Document

```http
POST /documents/upload
Content-Type: multipart/form-data

file: <binary>
doc_type: "resume" | "cover_letter" | "other"
```

Response:

```json
{
"id": "uuid",
"filename": "resume.pdf",
"doc_type": "resume",
"file_size": 45678,
"upload_date": "2026-01-04T14:30:00Z"
}
```

### Generate Embeddings

```http
POST /documents/{document_id}/generate-embeddings
```

Response:

```json
{
"document_id": "uuid",
"chunks_embedded": 12,
"embedding_model": "text-embedding-3-small",
"embedding_dimensions": 1536,
"cost": 0.00015
}
```
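As a sanity check on the `cost` field: OpenAI prices text-embedding-3-small at $0.02 per million input tokens (as of this writing), so $0.00015 implies roughly 7,500 tokens across the 12 chunks:

```python
# Rough derivation of the embedding cost above; assumes OpenAI's posted
# price of $0.02 per 1M tokens for text-embedding-3-small
price_per_token = 0.02 / 1_000_000
tokens = 7_500  # ~625 tokens per chunk across 12 chunks
print(round(tokens * price_per_token, 5))  # 0.00015
```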
### Search Chunks

```http
POST /search/chunks
Content-Type: application/json
```

Request:

```json
{
"query": "What are the candidate's Python skills?",
"top_k": 5,
"similarity_threshold": 0.5,
"filters": {
"document_ids": ["uuid1", "uuid2"],
"doc_type": "resume"
}
}
```

Response:

```json
{
"chunks": [
{
"chunk_id": "uuid",
"document_id": "uuid",
"document_title": "John_Doe_Resume.pdf",
"doc_type": "resume",
"chunk_index": 3,
"text": "Python: 5 years experience with Django, FastAPI...",
"similarity_score": 0.89
}
],
"metadata": {
"chunks_retrieved": 5,
"query_embedding_model": "text-embedding-3-small",
"timestamp": "2026-01-04T14:35:00Z",
"filters_applied": { "doc_type": "resume" }
}
}
```

### RAG Analysis

```http
POST /analyze-rag
Content-Type: application/json
```

Request:

```json
{
"query": "Evaluate the candidate's qualifications for a senior backend role",
"document_ids": ["uuid1"],
"top_k": 5,
"similarity_threshold": 0.5,
"temperature": 0.7
}
```

Response:

```json
{
"analysis_id": "uuid",
"query": "Evaluate the candidate's qualifications...",
"output": {
"summary": "Strong backend engineer with 5+ years Python...",
"key_points": ["..."],
"confidence_score": 0.85
},
"citations": [
{
"chunk_id": "uuid",
"document_id": "uuid",
"document_title": "resume.pdf",
"chunk_index": 3,
"text": "Python: 5 years with Django..."
}
],
"retrieved_chunks": [
{
"chunk_id": "uuid",
"document_id": "uuid",
"document_title": "resume.pdf",
"doc_type": "resume",
"chunk_index": 3,
"text": "Python: 5 years...",
"similarity_score": 0.89
}
],
"retrieval_metadata": {
"chunks_retrieved": 5,
"query_embedding_model": "text-embedding-3-small",
"timestamp": "2026-01-04T14:35:00Z"
},
"llm_metadata": {
"model": "gpt-4o-2024-11-20",
"temperature": 0.7,
"prompt_tokens": 850,
"completion_tokens": 320,
"total_tokens": 1170
},
"cost": {
"embedding_cost": 0.00002,
"llm_cost": 0.00585,
"total_cost": 0.00587
}
}
```
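For reference, a minimal Python client for the analyze endpoint above, using the `requests` library; authentication (the deployed proxy's `API_KEY`) is omitted, so treat this as a local-development sketch:

```python
# Sketch of calling POST /analyze-rag; payload shape taken from the
# reference above, error handling kept minimal
import requests

BASE_URL = "http://localhost:8000/api/v1"

payload = {
    "query": "Evaluate the candidate's qualifications for a senior backend role",
    "top_k": 5,
    "similarity_threshold": 0.5,
    "temperature": 0.7,
}

resp = requests.post(f"{BASE_URL}/analyze-rag", json=payload, timeout=60)
resp.raise_for_status()
data = resp.json()

print(data["output"]["summary"])
for chunk in data["retrieved_chunks"]:
    print(f'{chunk["similarity_score"]:.2f}  {chunk["document_title"]} '
          f'(chunk {chunk["chunk_index"]})')
print(f'Total cost: ${data["cost"]["total_cost"]:.5f}')
```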
## RAG Pipeline

The RAG (Retrieval-Augmented Generation) pipeline combines semantic search with LLM reasoning for accurate, source-grounded analysis:

1. Query Processing
   - User submits a natural language query
   - The query is embedded using OpenAI's text-embedding-3-small (1536 dims)
2. Semantic Retrieval
   - Vector similarity search via pgvector (cosine distance)
   - Optional filters: document IDs, doc types, metadata
   - Returns the top-k most relevant chunks above the similarity threshold
3. Context Construction
   - Retrieved chunks are formatted with source metadata
   - Example: `[CHUNK 1] [Doc: resume.pdf, Chunk 3] Python: 5 years...`
4. LLM Analysis
   - GPT-4o receives the query plus the formatted context
   - Generates a structured response citing source chunks
   - Configurable temperature (default: 0.7) balances creativity vs. precision
5. Traceability Response
   - Complete chunk details with similarity scores
   - Token usage and cost breakdown
   - Embedding and LLM model metadata
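Step 3 is simple enough to sketch. A minimal context builder, assuming chunk dicts shaped like the `/search/chunks` response (the function name is illustrative, not the project's actual code):

```python
def build_context(chunks: list[dict]) -> str:
    # Format retrieved chunks using the [CHUNK n] [Doc: ..., Chunk i]
    # convention shown above so the LLM can cite sources by reference
    lines = []
    for n, chunk in enumerate(chunks, start=1):
        lines.append(
            f'[CHUNK {n}] [Doc: {chunk["document_title"]}, '
            f'Chunk {chunk["chunk_index"]}] {chunk["text"]}'
        )
    return "\n\n".join(lines)
```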
Traceability features:

- Citation Tracking: Every claim links to source chunks
- Similarity Scores: Understand retrieval confidence (0.0-1.0)
- Cost Transparency: Embedding + LLM costs per request
- Flexible Filtering: Narrow search by document or type
- Configurable Parameters: Control top_k, threshold, temperature
| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | required | Natural language question |
| `document_ids` | UUID[] | null | Limit search to specific docs |
| `doc_type` | string | null | Filter by type (resume, cover_letter, etc.) |
| `top_k` | int | 5 | Number of chunks to retrieve |
| `similarity_threshold` | float | 0.3 | Minimum cosine similarity (0.0-1.0) |
| `temperature` | float | 0.7 | LLM creativity (0.0 = focused, 1.0 = creative) |
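These parameters map naturally onto a Pydantic request model (the backend keeps its schemas under `app/models/`). The class below is a hypothetical sketch consistent with the table, not the project's actual model:

```python
from uuid import UUID

from pydantic import BaseModel, Field


class RAGAnalysisRequest(BaseModel):
    # Defaults and ranges mirror the parameter table above
    query: str
    document_ids: list[UUID] | None = None
    doc_type: str | None = None
    top_k: int = Field(default=5, ge=1)
    similarity_threshold: float = Field(default=0.3, ge=0.0, le=1.0)
    temperature: float = Field(default=0.7, ge=0.0, le=1.0)
```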
## Configuration

Backend (`backend/.env`):

```bash
# OpenAI
OPENAI_API_KEY=sk-...

# Supabase
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_KEY=eyJhbG...

# Optional
LOG_LEVEL=INFO
CORS_ORIGINS=http://localhost:3000
```

Frontend (`frontend/.env.local`):

```bash
NEXT_PUBLIC_API_URL=http://localhost:8000
```

Enable the pgvector extension in Supabase:

```sql
-- Run in Supabase SQL Editor
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';
```

Schema includes:

- `documents`: Uploaded file metadata
- `chunks`: Chunked document content
- `embeddings`: Vector embeddings (1536 dimensions)
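A quick way to confirm the schema is reachable from Python, using the same Supabase client the backend relies on (a sketch; error handling omitted):

```python
# Smoke test: table names come from the schema list above
import os

from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

for table in ("documents", "chunks", "embeddings"):
    rows = supabase.table(table).select("*").limit(1).execute()
    print(f"{table}: reachable, sample rows returned: {len(rows.data)}")
```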
## Development

Project structure:

```
document-intelligence-platform/
├── backend/
│   ├── app/
│   │   ├── api/                   # FastAPI routes
│   │   ├── services/              # Business logic
│   │   │   ├── chunking_service.py
│   │   │   ├── embedding_service.py
│   │   │   ├── retrieval.py       # RAG retrieval pipeline
│   │   │   └── llm_service.py     # LLM with citations
│   │   ├── models/                # Pydantic models
│   │   └── database/              # Supabase client
│   ├── tests/                     # 37 comprehensive tests
│   └── requirements.txt
├── frontend/
│   ├── app/                       # Next.js pages
│   ├── lib/
│   │   ├── api-client.ts          # Typed API client
│   │   └── hooks.ts               # React hooks (useRAGAnalysis)
│   └── package.json
└── README.md
```
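For a feel of what `chunking_service.py` does (semantic splitting with configurable parameters, per the Features list), here is a deliberately naive paragraph-based splitter; the real service's strategy and parameters are not shown in this README:

```python
def chunk_text(text: str, max_chars: int = 1200, overlap: int = 200) -> list[str]:
    # Naive illustration: split on paragraph boundaries, then pack
    # paragraphs into chunks of at most max_chars with a character overlap
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = current[-overlap:]  # carry overlap into the next chunk
        current = f"{current}\n\n{para}".strip()
    if current:
        chunks.append(current)
    return chunks
```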
## Testing

```bash
cd backend

# Run all tests
pytest

# Run with coverage
pytest --cov=app --cov-report=html

# Run specific test suites
pytest tests/test_retrieval.py        # Retrieval pipeline (12 tests)
pytest tests/test_llm_service_rag.py  # LLM RAG features (13 tests)
pytest tests/test_rag_endpoints.py    # API endpoints (12 tests)
```

Test Coverage:
- ✅ 37 total tests
- ✅ Retrieval pipeline with filters
- ✅ LLM analysis with citations
- ✅ API endpoints with error handling
- ✅ Vector search operations
- ✅ Cost tracking and metadata
All core functionality is tested with comprehensive coverage:
- Unit Tests: Services tested in isolation with mocks
- Integration Tests: API endpoints with TestClient
- Traceability Tests: Verify complete metadata in responses
- Error Handling: Invalid inputs, missing data, edge cases
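As an illustration of that style, a hedged sketch of one endpoint test; the `app` import path and patch target are assumptions about the project layout, not the actual test code:

```python
# Sketch of an integration test with FastAPI's TestClient and a mocked service
from unittest.mock import patch

from fastapi.testclient import TestClient

from app.main import app  # assumed module path

client = TestClient(app)


def test_search_chunks_returns_traceability_metadata():
    fake_chunks = [{
        "chunk_id": "uuid", "document_id": "uuid",
        "document_title": "resume.pdf", "doc_type": "resume",
        "chunk_index": 3, "text": "Python: 5 years...",
        "similarity_score": 0.89,
    }]
    # Patch target is hypothetical; point it at wherever the route
    # resolves its retrieval service
    with patch("app.api.search.retrieval_service.search") as mock_search:
        mock_search.return_value = fake_chunks
        resp = client.post("/api/v1/search/chunks", json={"query": "Python skills"})

    assert resp.status_code == 200
    body = resp.json()
    assert body["chunks"][0]["similarity_score"] == 0.89
    assert "metadata" in body  # retrieval traceability is part of the contract
```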
Sample test run:

```
$ pytest tests/test_rag_endpoints.py -v

tests/test_rag_endpoints.py::test_generate_embeddings_success PASSED
tests/test_rag_endpoints.py::test_search_chunks_basic PASSED
tests/test_rag_endpoints.py::test_search_chunks_with_filters PASSED
tests/test_rag_endpoints.py::test_analyze_rag_basic PASSED
tests/test_rag_endpoints.py::test_analyze_rag_with_document_filter PASSED
tests/test_rag_endpoints.py::test_rag_endpoint_traceability PASSED
...
======================== 12 passed in 2.34s =========================
```

The frontend provides rich visualization of RAG operations:
- Query Input: Natural language question entry
- Filter Controls: Select documents, types, parameters
- Analysis Display: Structured output with confidence scores
- Retrieval Pipeline Section (collapsible):
  - Chunks retrieved count
  - Embedding model used
  - LLM token usage (prompt/completion/total)
  - Temperature setting
  - Applied filters (JSON display)
- Retrieved Chunks Section:
  - Document title and chunk index badges
  - Similarity scores with visual progress bars
  - Chunk text preview (expandable)
  - Doc type indicators
Example workflow:

1. Upload a resume and cover letter
2. Generate embeddings for both documents
3. Ask: "What makes this candidate qualified for a backend role?"
4. View:
   - Retrieved 5 chunks (similarity: 0.85-0.92)
   - LLM analysis citing specific sections
   - Cost: $0.00587 total
## Roadmap

- [x] Document upload & processing
- [x] Chunking with configurable parameters
- [x] LLM analysis with structured outputs
- [x] Basic frontend
- [x] Vector embeddings with Supabase
- [x] Semantic search with pgvector
- [x] Retrieval pipeline with comprehensive logging
- [x] Citation tracking
- [x] Frontend traceability visualization
- [ ] Batch evaluation framework
- [ ] Retrieval quality metrics
- [ ] LLM response evaluation
- [ ] A/B testing infrastructure
- [ ] Performance optimization
- [ ] Authentication & authorization
- [ ] Rate limiting & caching
- [ ] Monitoring & alerting
- [ ] Deployment configuration
- [ ] API documentation (OpenAPI/Swagger)
## Contributing

Contributions welcome! This project follows a test-driven development approach:

1. Write tests for new features
2. Implement functionality
3. Ensure all tests pass (including existing ones)
4. Update documentation
5. Submit a pull request
## License

MIT License - See the LICENSE file for details.
## Acknowledgments

- OpenAI: GPT-4o and text-embedding-3-small models
- Supabase: PostgreSQL + pgvector hosting
- FastAPI: High-performance Python API framework
- Next.js: React framework for production-ready frontends
Built with ❤️ for production-grade document intelligence