Turn Documents Into Knowledge.
DocLens is an open-source document intelligence platform that transforms PDFs, research papers, reports, manuals, and other documents into searchable, structured, and AI-ready knowledge.
Instead of simply extracting text, DocLens helps users understand, explore, and interact with information through AI-powered summaries, knowledge graphs, document search, and intelligent analysis.
DocLens runs locally as a single `docker compose` stack: Postgres, Redis, MinIO, the Go API, the extraction worker, and the Next.js web client. The whole loop — sign in, upload a PDF, watch it extract, read the Markdown, search across documents, delete — runs offline against the local stack with no third-party services.
- Docker Desktop (or `colima` on macOS) with the `docker compose` plugin
- Node 20+ and pnpm 9+
- Go 1.23+
- A few minutes: the first `make dev` pulls images and runs migrations
git clone https://github.com/tomeku/doclens.git
cd doclens
make bootstrap # installs JS deps + syncs the Go workspace
make dev # starts Postgres, Redis, MinIO, api, worker, web

You should see:
- API healthy at http://localhost:8080/v1/health
- Web at http://localhost:3000
- MinIO console at http://localhost:9001 (user/pass: `doclens`/`doclens`)
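Because the first `make dev` takes a while, a script that depends on the stack may want to wait for the API before doing anything else. The following is a minimal sketch, not part of DocLens: it assumes the health endpoint listed above answers with HTTP 200 once the API is ready, and the helper names `http_ok` and `wait_until_healthy` are hypothetical.

```python
import time
import urllib.error
import urllib.request

# Health endpoint as listed above.
HEALTH_URL = "http://localhost:8080/v1/health"


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


def wait_until_healthy(probe=http_ok, url=HEALTH_URL, attempts=30,
                       delay=1.0, sleep=time.sleep) -> bool:
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy()` after `make dev` returns `True` as soon as the endpoint responds, or `False` after the attempts run out. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running stack.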
The API ships with a local-auth provider for development (`AUTH_PROVIDER=local`), so no Clerk account is required to upload documents and exercise the full flow. To enable Clerk for browser sign-in, set `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` and `CLERK_SECRET_KEY` in `apps/web/.env`, and set `AUTH_PROVIDER=clerk` plus the Clerk environment variables in `apps/api/.env`.
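A half-finished switch to Clerk is easy to miss, so a small pre-flight check can help. The sketch below is illustrative only: `parse_env` and `missing_clerk_settings` are hypothetical helpers, the parser handles only plain `KEY=VALUE` lines, and treating an unset `AUTH_PROVIDER` as `local` is an assumption. It checks only the two web variables named above, since the API-side Clerk variables are not listed here.

```python
# Variables the web client needs for Clerk, as named above.
WEB_CLERK_KEYS = ("NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY", "CLERK_SECRET_KEY")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_clerk_settings(api_env: dict, web_env: dict) -> list:
    """List web settings that are absent when the API is switched to Clerk.

    Returns an empty list when AUTH_PROVIDER is not "clerk", because the
    local provider needs no Clerk configuration. An unset AUTH_PROVIDER is
    treated as "local" here, which is an assumption of this sketch.
    """
    if api_env.get("AUTH_PROVIDER", "local") != "clerk":
        return []
    return [f"apps/web/.env: {key}" for key in WEB_CLERK_KEYS if not web_env.get(key)]
```

To use it, read `apps/api/.env` and `apps/web/.env`, pass their contents through `parse_env`, and print whatever `missing_clerk_settings` returns.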
make test # go test ./... + pnpm vitest run
make lint # golangci-lint + eslint
make gen # regenerate OpenAPI client + server stubs
make migrate # apply Postgres migrations against the running compose stack
make down # stop everything

See CONTRIBUTING.md for the full development workflow.
Modern document extraction tools focus on converting files into text.
DocLens focuses on helping humans understand information.
Upload a document and instantly:
- 📄 Extract text, tables, images, and metadata
- 🧠 Generate AI-powered summaries
- 🔍 Search documents intelligently
- 🌐 Visualize relationships using knowledge graphs
- 🛡️ Verify extraction confidence
- ⚡ Prepare content for AI and RAG workflows
Extract structured content from:
- PDFs
- Research Papers
- Reports
- Manuals
- Documentation
- Office Documents
Generate concise summaries and key insights.
TL;DR
Key Findings
Important Concepts
Actionable Insights
Visualize connections between:
- Topics
- Concepts
- Entities
- References
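To make the idea of typed connections concrete, here is a minimal in-memory sketch of such a graph. It is not DocLens's data model: the `KnowledgeGraph` class, its method names, and the choice of undirected edges are all assumptions made for illustration. Only the four node kinds come from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Node kinds taken from the list above.
NODE_KINDS = {"topic", "concept", "entity", "reference"}


@dataclass
class KnowledgeGraph:
    kinds: dict = field(default_factory=dict)  # node name -> kind
    edges: dict = field(default_factory=lambda: defaultdict(set))  # node name -> neighbours

    def add_node(self, name: str, kind: str) -> None:
        """Register a node, rejecting kinds outside NODE_KINDS."""
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.kinds[name] = kind

    def connect(self, a: str, b: str) -> None:
        """Add an undirected edge between two existing nodes."""
        for name in (a, b):
            if name not in self.kinds:
                raise KeyError(name)
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbours(self, name: str, kind: Optional[str] = None) -> list:
        """Return the sorted neighbours of `name`, optionally filtered by kind."""
        found = self.edges.get(name, set())
        return sorted(n for n in found if kind is None or self.kinds[n] == kind)
```

With a structure like this, a question such as "which concepts are linked to this topic?" becomes a single `neighbours(topic, kind="concept")` call.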
Know how reliable extracted information is.
Confidence Score: 92%
Tables Found: 12
Topics Extracted: 23
Find information instantly across large documents.
"What does this document say about neural networks?"
Study smarter with AI-generated summaries.
Extract findings and discover connections across papers.
Prepare documents for RAG pipelines and AI applications.
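A common preparation step for RAG is splitting extracted Markdown into chunks small enough to embed. The sketch below shows one simple approach, packing blank-line-separated blocks up to a character budget. It is an illustration rather than DocLens's export format, and the function name `chunk_markdown` and the `max_chars` parameter are hypothetical.

```python
def chunk_markdown(markdown: str, max_chars: int = 1000) -> list:
    """Pack blank-line-separated blocks into chunks of at most `max_chars`.

    A single block longer than `max_chars` is kept whole rather than cut
    mid-sentence, so a chunk can exceed the limit in that one case.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    chunks = []
    current = ""
    for block in blocks:
        candidate = f"{current}\n\n{block}" if current else block
        if current and len(candidate) > max_chars:
            # Adding this block would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on block boundaries keeps headings, paragraphs, and tables intact inside a chunk. A production pipeline would more likely budget by tokens than by characters.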
Analyze reports, contracts, and manuals faster.
We believe documents should be:
- Searchable
- Understandable
- Verifiable
- AI-ready
DocLens aims to become the open platform for document intelligence.
Document
↓
Extraction Engine
↓
Document Intelligence Layer
↓
AI Processing
↓
Knowledge Graph
↓
Human Interface Layer
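The flow above can be read as a chain of stages, each taking the document state and returning an enriched copy. The following sketch illustrates that shape only. Every stage body is a deliberately trivial placeholder (no real parsing, model call, or graph building), and all function names and dictionary keys are hypothetical.

```python
from functools import reduce


def extraction_engine(doc: dict) -> dict:
    # Placeholder: a real extractor would parse a file; here the text is given.
    return {**doc, "text": doc["raw"].strip()}


def intelligence_layer(doc: dict) -> dict:
    # Placeholder structure: one block per non-empty line.
    return {**doc, "blocks": [l for l in doc["text"].splitlines() if l.strip()]}


def ai_processing(doc: dict) -> dict:
    # Placeholder "summary": the first block stands in for a model call.
    return {**doc, "summary": doc["blocks"][0] if doc["blocks"] else ""}


def knowledge_graph(doc: dict) -> dict:
    # Placeholder graph: link the document to each capitalised word.
    words = {w.strip(".,") for w in doc["text"].split() if w[:1].isupper()}
    return {**doc, "graph": {doc["name"]: sorted(words)}}


# Stage order mirrors the diagram above; the interface layer consumes the result.
PIPELINE = (extraction_engine, intelligence_layer, ai_processing, knowledge_graph)


def run_pipeline(doc: dict, stages=PIPELINE) -> dict:
    """Feed the document through each stage in order."""
    return reduce(lambda acc, stage: stage(acc), stages, doc)
```

Keeping each stage a plain function of the document state makes it possible to test stages in isolation and to swap one out without touching the rest.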
- Document Upload
- Text Extraction
- Markdown Output
- Search
- AI Summaries
- Metadata Extraction
- Table Detection
- Knowledge Graphs
- Confidence Scoring
- Verification Layer
- RAG Export
- Embeddings
- AI Agents
- Full Document Intelligence Platform
- Next.js
- React
- TypeScript
- Tailwind CSS
- shadcn/ui
- Go
- PostgreSQL
- Redis
- OpenAI
- Anthropic
- AWS Bedrock
- Docker
- Kubernetes
- Cloudflare
Contributions are welcome.
Whether you're a developer, designer, researcher, or student, we'd love your help building the future of document intelligence.
- Fork the repository
- Create a feature branch
- Commit your changes
- Open a Pull Request
Website: https://doclens.org
GitHub Discussions: Coming Soon
Discord: Coming Soon
MIT License
Built with ❤️ by Tomeku.
