Open-source multi-agent legal document intelligence platform.
Users upload legal documents (contracts, briefs, case filings) and a team of AI agents collaborates to analyze them — with a production RAG pipeline, transparent evaluation metrics, and BYOK (Bring Your Own Key) model access.
Status: Phases 1–4 complete. Custom legal entity extraction models (a Qwen3.5-9B + LoRA variant and a Llama + QLoRA variant) are published on Hugging Face for local use. Production extraction runs through the user's BYOK LLM, with a per-user opt-out toggle. Six deterministic contract checkers run inline at upload, and the intelligence sidebar on the document detail page renders both the extracted entities and the checker findings. Next up: Phase 5 (MCP server) so any AI assistant can analyze legal documents through OpenBrief.
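The checkers are deterministic by design: pure pattern logic, no LLM call, so they run inline at upload with negligible latency. As an illustrative sketch only (the clause pattern, checker name, and `Finding` shape below are assumptions, not OpenBrief's actual code), a governing-law checker might look like:

```python
import re
from dataclasses import dataclass

@dataclass
class Finding:
    checker: str
    severity: str
    message: str

def check_governing_law(text: str) -> list[Finding]:
    """Flag contracts that never specify a governing-law clause.

    Purely deterministic: a regex scan with no LLM call, so it is cheap
    enough to run on every upload.
    """
    pattern = re.compile(r"governed by the laws? of", re.IGNORECASE)
    if pattern.search(text):
        return []
    return [Finding(
        checker="governing_law",
        severity="high",
        message="No governing-law clause found.",
    )]
```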
- Upload a legal document (PDF) — automatically parsed, chunked, and embedded for semantic search
- Ask questions — AI retrieves relevant sections and generates cited answers
- Full document review — comprehensive analysis identifying risks, obligations, deadlines, missing clauses, contradictions, and ambiguities
- Evaluation dashboard — transparent metrics tracking hallucination rate, retrieval precision, citation accuracy, and answer relevance
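The upload-to-answer flow above (parse, chunk, embed, retrieve) can be sketched end to end. Everything here is an illustrative stand-in: the character-window chunker approximates the real chunking step, and the hash-based embedder replaces bge-small-en-v1.5 so the sketch runs offline; in production the cosine search happens inside pgvector, not in Python.

```python
import hashlib
import math

def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping character windows (a crude stand-in
    for the real chunker)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str, dim: int = 32) -> list[float]:
    """Toy deterministic embedding: hash character trigrams into a fixed
    vector, then L2-normalize. A stand-in for bge-small-en-v1.5."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def search(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Cosine-similarity top-k, as pgvector would compute server-side."""
    q = embed(query)
    return sorted(
        chunks,
        key=lambda c: -sum(a * b for a, b in zip(q, embed(c))),
    )[:k]
```

The retrieved chunks are what the LLM sees when generating a cited answer, which is why retrieval precision (tracked on the evaluation dashboard) bounds answer quality.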
- Python 3.11+, FastAPI, SQLAlchemy 2.0 (async)
- PostgreSQL + pgvector — database with vector similarity search
- Sentence Transformers (bge-small-en-v1.5) — document embeddings
- PyMuPDF4LLM — layout-aware PDF parsing with header/footer stripping
- DeepEval — RAG evaluation with 4 metrics, 52 test cases
- LangGraph — multi-agent orchestration (in progress)
- Next.js 16 (App Router), TypeScript (strict), Tailwind CSS
- TanStack Query — data fetching
- Recharts — evaluation dashboard charts
- Claude AI-inspired dark theme
- OpenAI — GPT-5.4, GPT-5.4 Mini, GPT-5.4 Nano
- Anthropic — Claude Opus 4.6, Claude Sonnet 4.6
- DeepSeek — DeepSeek R1
- Python 3.11+
- Node.js 18+
- PostgreSQL 15+ with the pgvector extension
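pgvector ships as a PostgreSQL extension and must be enabled once per database before the migrations can create vector columns. Assuming your database is named `openbrief` (adjust to your setup):

```sql
-- Run once per database, as a role with CREATE EXTENSION privileges:
CREATE EXTENSION IF NOT EXISTS vector;
```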
Backend:

```bash
cd backend
cp .env.example .env
# Edit .env with your database credentials and encryption key
pip install -r requirements.txt
alembic upgrade head
uvicorn main:app --reload
```

Frontend:

```bash
cd frontend
cp .env.example .env.local
# Edit .env.local with your API URL
npm install
npm run dev
```

Health check:

```
GET http://localhost:8000/health
```
```
openbrief/
├── backend/
│   ├── main.py          # FastAPI app entry point
│   ├── config.py        # Pydantic Settings (loads from .env)
│   ├── api/routes/      # API endpoints (documents, analysis, settings, evaluation, routing)
│   ├── core/
│   │   ├── ingestion/   # PDF parsing (pymupdf4llm), chunking, embedding
│   │   ├── rag/         # Retriever, RAG pipeline, full review, prompts, pricing
│   │   ├── agents/      # Multi-agent system (Research, Analysis, Draft, Fact-Check)
│   │   ├── routing/     # Semantic query routing (targeted vs full review)
│   │   ├── llm/         # LLM provider abstraction, encryption
│   │   ├── extraction/  # Entity extraction (BYOK PromptExtractor + OpenBrief Nano models on HF)
│   │   ├── checkers/    # 6 deterministic contract checkers (no LLM, run inline at upload)
│   │   └── evaluation/  # DeepEval integration, 52 test cases (one-time benchmark, not continuous)
│   ├── db/              # SQLAlchemy models, async database connection
│   └── alembic/         # Database migrations
├── frontend/
│   ├── app/             # Next.js pages (dashboard, documents, evaluation, settings)
│   └── components/      # React components (upload, status badge, providers)
├── training/            # Local training data (CUAD) for OpenBrief Nano; the training pipeline
│                        # and published model live on Hugging Face:
│                        # illuminator22/openbrief-nano-qwen35-entity-extractor
└── tests/               # Pytest test suite
```
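The `core/routing/` module decides whether a request is a targeted question (answerable from a few retrieved chunks) or needs a full document review. The actual router is semantic; the sketch below is a deliberately simplified stand-in that scores a query against prototype phrases with bag-of-words cosine similarity, and the prototype lists are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical prototype phrases; the real router uses semantic embeddings.
FULL_REVIEW_PROTOTYPES = [
    "review this contract",
    "analyze the whole document",
    "full risk assessment",
]
TARGETED_PROTOTYPES = [
    "what is the termination date",
    "who are the parties",
    "is there an arbitration clause",
]

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cos(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(query: str) -> str:
    """Pick the route whose prototype set is most similar to the query."""
    qv = _vec(query)
    full = max(_cos(qv, _vec(p)) for p in FULL_REVIEW_PROTOTYPES)
    targeted = max(_cos(qv, _vec(p)) for p in TARGETED_PROTOTYPES)
    return "full_review" if full >= targeted else "targeted"
```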
| Endpoint | Method | Description |
|---|---|---|
| `/api/documents/upload` | POST | Upload a PDF document |
| `/api/documents/` | GET | List all documents |
| `/api/documents/{id}` | GET | Document details with chunk count |
| `/api/documents/{id}/search` | GET | Vector similarity search |
| `/api/documents/{id}/entities` | GET | Extracted entities + extraction status |
| `/api/documents/{id}/findings` | GET | Deterministic checker findings, grouped by type and severity |
| `/api/analysis/query` | POST | Ask a question (RAG pipeline) |
| `/api/analysis/full-review` | POST | Full document review |
| `/api/analysis/estimate` | POST | Cost estimate before running |
| `/api/analysis/unified` | POST | Auto-routing (question vs review) |
| `/api/settings/llm-key` | POST | Set API key (encrypted) |
| `/api/settings/models` | GET | Available models with pricing |
| `/api/evaluation/summary` | GET | Evaluation metrics dashboard |
| `/api/evaluation/run` | POST | Run evaluation test suite |
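`/api/analysis/estimate` prices a run before any tokens are spent. The arithmetic is linear token pricing; note that the model names and per-million-token prices below are placeholders for illustration, not OpenBrief's actual pricing table.

```python
# Hypothetical (input, output) prices in USD per million tokens.
PRICING = {
    "gpt-5.4-mini": (0.40, 1.60),
    "claude-sonnet-4.6": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, expected_output_tokens: int) -> float:
    """Linear token pricing: tokens / 1e6 * per-million rate, summed over
    the input and the expected output."""
    in_price, out_price = PRICING[model]
    return round(
        input_tokens / 1e6 * in_price
        + expected_output_tokens / 1e6 * out_price,
        6,
    )
```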
52 test cases evaluated against GPT-5.4:
| Metric | Score | Target |
|---|---|---|
| Hallucination Rate | 2.3% | <5% |
| Answer Relevance | 91.9% | >90% |
| Faithfulness | 99.4% | >90% |
| Retrieval Precision | 70.3% | >85% |
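Hallucination rate here follows the usual definition: the fraction of generated claims not supported by the retrieved context. DeepEval judges support with an LLM; the sketch below uses naive substring matching as a stand-in, just to make the arithmetic concrete.

```python
def hallucination_rate(claims: list[str], context: str) -> float:
    """Fraction of claims with no support in the retrieved context.

    Support is checked by case-insensitive substring match; the real
    metric uses an LLM judge for entailment.
    """
    if not claims:
        return 0.0
    unsupported = sum(1 for c in claims if c.lower() not in context.lower())
    return unsupported / len(claims)
```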
Ivan Arshakyan — BrainX