# OpenBrief

Open-source multi-agent legal document intelligence platform.

Users upload legal documents (contracts, briefs, case filings) and a team of AI agents collaborates to analyze them — with a production RAG pipeline, transparent evaluation metrics, and BYOK (Bring Your Own Key) model access.

**Status:** Phases 1–4 complete. Custom legal entity extraction models (Qwen3.5-9B + LoRA and a Llama + QLoRA variant) are published on Hugging Face for local use, while production extraction runs through the user's BYOK LLM with a per-user opt-out toggle. Six deterministic contract checkers run inline at upload, and the intelligence sidebar on the document detail page renders both kinds of results. Next up: Phase 5 (MCP Server), which will let any AI assistant analyze legal documents through OpenBrief.

## What It Does

1. **Upload a legal document (PDF)** — automatically parsed, chunked, and embedded for semantic search
2. **Ask questions** — the AI retrieves relevant sections and generates cited answers
3. **Full document review** — comprehensive analysis identifying risks, obligations, deadlines, missing clauses, contradictions, and ambiguities
4. **Evaluation dashboard** — transparent metrics tracking hallucination rate, retrieval precision, citation accuracy, and answer relevance
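As an illustration of the deterministic checkers mentioned in the status note above, a "missing clause" check can be implemented without any LLM at all. The checker name, clause list, and `Finding` shape below are hypothetical sketches, not OpenBrief's actual code:

```python
# Hypothetical sketch of a deterministic "missing clause" checker.
# The required-clause list and the Finding shape are illustrative only.
from dataclasses import dataclass

REQUIRED_CLAUSES = {
    "governing law": ["governing law", "governed by the laws"],
    "termination": ["termination", "terminate this agreement"],
    "limitation of liability": ["limitation of liability", "liable for"],
}

@dataclass
class Finding:
    check: str        # which checker produced this finding
    severity: str     # flat "warning" here; real checkers may grade severity
    message: str

def check_missing_clauses(text: str) -> list[Finding]:
    """Flag required clauses that never appear in the contract text."""
    lowered = text.lower()
    findings = []
    for clause, phrases in REQUIRED_CLAUSES.items():
        if not any(p in lowered for p in phrases):
            findings.append(Finding(
                check="missing_clause",
                severity="warning",
                message=f"No '{clause}' clause detected",
            ))
    return findings
```

Because checks like this are pure string logic, they can run inline at upload time with no API key configured.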

## Tech Stack

### Backend

- Python 3.11+, FastAPI, SQLAlchemy 2.0 (async)
- PostgreSQL + pgvector — database with vector similarity search
- Sentence Transformers (bge-small-en-v1.5) — document embeddings
- PyMuPDF4LLM — layout-aware PDF parsing with header/footer stripping
- DeepEval — RAG evaluation with 4 metrics, 52 test cases
- LangGraph — multi-agent orchestration (in progress)
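Retrieval over pgvector reduces to nearest-neighbor search on embedding vectors. The pure-Python sketch below mimics that with cosine similarity so the idea is visible without a database; the chunks and vectors are made up, and in production the ranking happens server-side in SQL:

```python
# Toy cosine-similarity retrieval, mimicking what pgvector does server-side.
# Vectors here are tiny and hand-made; OpenBrief embeds real chunks with
# bge-small-en-v1.5 (384-dimensional vectors).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunks: list[dict], k: int = 2) -> list[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return [c["text"] for c in ranked[:k]]
```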

### Frontend

- Next.js 16 (App Router), TypeScript (strict), Tailwind CSS
- TanStack Query — data fetching
- Recharts — evaluation dashboard charts
- Claude AI-inspired dark theme

### Supported LLM Providers (BYOK)

- OpenAI — GPT-5.4, GPT-5.4 Mini, GPT-5.4 Nano
- Anthropic — Claude Opus 4.6, Claude Sonnet 4.6
- DeepSeek — DeepSeek R1
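A BYOK abstraction like the one in `core/llm/` typically starts from a model catalog plus validation before an API key is stored. The registry below is a guessed-at shape using the provider and model names listed above; it is not the project's actual classes:

```python
# Hypothetical BYOK model catalog. Provider and model names come from the
# README's supported-provider list; the dict layout and validation logic
# are illustrative, not OpenBrief's real implementation.
BYOK_PROVIDERS = {
    "openai": ["gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano"],
    "anthropic": ["claude-opus-4.6", "claude-sonnet-4.6"],
    "deepseek": ["deepseek-r1"],
}

def validate_model(provider: str, model: str) -> None:
    """Reject unknown providers or models before persisting a user's key."""
    if provider not in BYOK_PROVIDERS:
        raise ValueError(f"Unknown provider: {provider}")
    if model not in BYOK_PROVIDERS[provider]:
        raise ValueError(f"{model} is not a known {provider} model")
```

Validating up front keeps a bad provider/model pairing from reaching the encrypted key store or the routing layer.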

## Getting Started

### Prerequisites

- Python 3.11+
- Node.js 18+
- PostgreSQL 15+ with the pgvector extension

### Backend Setup

```bash
cd backend
cp .env.example .env
# Edit .env with your database credentials and encryption key

pip install -r requirements.txt
alembic upgrade head
uvicorn main:app --reload
```

### Frontend Setup

```bash
cd frontend
cp .env.example .env.local
# Edit .env.local with your API URL

npm install
npm run dev
```

### Health Check

```
GET http://localhost:8000/health
```

## Project Structure

```
openbrief/
├── backend/
│   ├── main.py              # FastAPI app entry point
│   ├── config.py            # Pydantic Settings (loads from .env)
│   ├── api/routes/          # API endpoints (documents, analysis, settings, evaluation, routing)
│   ├── core/
│   │   ├── ingestion/       # PDF parsing (pymupdf4llm), chunking, embedding
│   │   ├── rag/             # Retriever, RAG pipeline, full review, prompts, pricing
│   │   ├── agents/          # Multi-agent system (Research, Analysis, Draft, Fact-Check)
│   │   ├── routing/         # Semantic query routing (targeted vs full review)
│   │   ├── llm/             # LLM provider abstraction, encryption
│   │   ├── extraction/      # Entity extraction (BYOK PromptExtractor + OpenBrief Nano models on HF)
│   │   ├── checkers/        # 6 deterministic contract checkers (no LLM, run inline at upload)
│   │   └── evaluation/      # DeepEval integration, 52 test cases (one-time benchmark, not continuous)
│   ├── db/                  # SQLAlchemy models, async database connection
│   └── alembic/             # Database migrations
├── frontend/
│   ├── app/                 # Next.js pages (dashboard, documents, evaluation, settings)
│   └── components/          # React components (upload, status badge, providers)
├── training/                # Local training data (CUAD) for OpenBrief Nano — the training pipeline and published model live on Hugging Face: illuminator22/openbrief-nano-qwen35-entity-extractor
└── tests/                   # Pytest test suite
```
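The chunking step in `core/ingestion/` is the kind of routine worth seeing concretely. Below is a minimal fixed-window chunker with overlap; the sizes are arbitrary example values, and OpenBrief's real chunker may well split on layout or headings instead:

```python
# Illustrative fixed-size chunker with overlap, sketching the step that
# core/ingestion/ performs between PDF parsing and embedding. Window and
# overlap sizes are arbitrary; this is not OpenBrief's actual algorithm.
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - overlap, 1), step)
    ]
```

The overlap means a clause that straddles a window boundary still appears whole in at least one chunk, which matters for retrieval recall.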

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/api/documents/upload` | POST | Upload a PDF document |
| `/api/documents/` | GET | List all documents |
| `/api/documents/{id}` | GET | Document details with chunk count |
| `/api/documents/{id}/search` | GET | Vector similarity search |
| `/api/documents/{id}/entities` | GET | Extracted entities + extraction status |
| `/api/documents/{id}/findings` | GET | Deterministic checker findings, grouped by type and severity |
| `/api/analysis/query` | POST | Ask a question (RAG pipeline) |
| `/api/analysis/full-review` | POST | Full document review |
| `/api/analysis/estimate` | POST | Cost estimate before running |
| `/api/analysis/unified` | POST | Auto-routing (question vs review) |
| `/api/settings/llm-key` | POST | Set API key (encrypted) |
| `/api/settings/models` | GET | Available models with pricing |
| `/api/evaluation/summary` | GET | Evaluation metrics dashboard |
| `/api/evaluation/run` | POST | Run evaluation test suite |
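The `/api/analysis/estimate` endpoint implies a simple pricing computation: tokens in and out, multiplied by per-token rates. A plausible sketch follows; the model name and dollar figures are placeholders, not real BYOK provider rates or OpenBrief's pricing table:

```python
# Hypothetical cost estimator behind an endpoint like /api/analysis/estimate.
# "example-model" and its prices are made-up placeholders (USD per 1M tokens),
# not real provider rates.
PRICING = {
    "example-model": {"input": 2.50, "output": 10.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate a query's USD cost before running the analysis."""
    p = PRICING[model]
    cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return round(cost, 6)
```

Surfacing an estimate like this before a full review runs lets BYOK users decide whether a comprehensive pass is worth the spend.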

## Evaluation Results

52 test cases evaluated against GPT-5.4:

| Metric | Score | Target |
|---|---|---|
| Hallucination Rate | 2.3% | <5% |
| Answer Relevance | 91.9% | >90% |
| Faithfulness | 99.4% | >90% |
| Retrieval Precision | 70.3% | >85% |
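The dashboard's pass/fail framing amounts to comparing each score against its target threshold. The sketch below reproduces that with the table's numbers; the target-string parsing is illustrative, not DeepEval's or OpenBrief's actual logic:

```python
# Compare measured metrics against their targets, as the evaluation dashboard
# does. Scores and targets are taken from the results table; the parsing of
# target strings like "<5" and ">90" is an illustrative convention.
RESULTS = {
    "Hallucination Rate": (2.3, "<5"),
    "Answer Relevance": (91.9, ">90"),
    "Faithfulness": (99.4, ">90"),
    "Retrieval Precision": (70.3, ">85"),
}

def meets_target(score: float, target: str) -> bool:
    op, threshold = target[0], float(target[1:])
    return score < threshold if op == "<" else score > threshold

failing = [
    name for name, (score, target) in RESULTS.items()
    if not meets_target(score, target)
]
```

Run against the table above, only retrieval precision misses its target, which is consistent with the numbers reported.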

## License

MIT

## Author

Ivan Arshakyan — BrainX
