Roadmap
Honest list of what is shipped, what is planned, and what is not happening.
- PDF parsing with page-by-page text extraction
- Fixed-size chunking with overlap and page tracking
- OpenAI embeddings
- In-memory vector store with cosine similarity
- Streaming chat responses
- Single-page UI for upload and chat
- Hybrid search (BM25 plus dense, fused with Reciprocal Rank Fusion)
- Reranker step (LLM cross-encoder with a deterministic lexical fallback)
- Citation streaming (NDJSON: citations then tokens)
- Multi-document chat (index many, ask across all or scope to a subset)
- Page-level highlights (every citation carries source and page)
- End-to-end and unit tests with fixture PDFs
- Vercel one-click deploy
- MIT licence, full docs
- pgvector adapter as a drop-in behind a flag, keeping BM25 via Postgres tsvector. Schema and migration are in "Swap to pgvector".
- Local embedding option (Ollama) for users who do not want to send data to OpenAI.
- Semantic chunking (split by topic rather than character count).
- Streaming progress during upload (live progress for big PDFs)
- Auth and per-user indexes
- OCR for scanned PDFs (Tesseract or a vision model)
- Custom prompt templates per use case
- Export question-and-answer history as JSON or Markdown
- Highlight the exact character span on the page, not just the page number
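For readers new to the hybrid search item above, here is a minimal sketch of Reciprocal Rank Fusion over a BM25 ranking and a dense ranking. It is not the project's actual code: the function name `reciprocalRankFusion`, the chunk ids, and the constant `k = 60` (a commonly used default) are all assumptions made for illustration.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over rankings of 1 / (k + rank),
// where rank is 1-based. k = 60 is a common default and is assumed here.
type Ranking = string[]; // chunk ids, best match first

function reciprocalRankFusion(
  rankings: Ranking[],
  k = 60,
): [string, number][] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties are broken by id so the order is stable.
  return [...scores.entries()].sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
}

// Hypothetical chunk ids from a BM25 ranking and a dense (embedding) ranking.
const bm25Ranking: Ranking = ["chunk-3", "chunk-1", "chunk-7"];
const denseRanking: Ranking = ["chunk-1", "chunk-9", "chunk-3"];

const fused = reciprocalRankFusion([bm25Ranking, denseRanking]);
console.log(fused.map(([id]) => id));
```

Because the fusion uses only rank positions, the BM25 and cosine scores never need to be normalised against each other. A chunk that ranks well in both lists outranks one that tops a single list.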
These belong in their own projects:
- Production multi-tenant SaaS layer. A different product, not a starter.
- Payment and billing. Same.
- Heavy framework dependencies (LangChain, LlamaIndex). Defeats the point of this being readable.
- A hosted vector database service (Pinecone, Weaviate). pgvector covers what most people need.
Major dependency upgrades are tracked as issues rather than applied in the non-breaking dependency pass: Next.js 16, React 19, openai SDK 6, pdf-parse 2, Tailwind 4, ESLint 10, TypeScript 6. See the repository issues.
- Pick something from the "Next up" list.
- Open an issue saying you are taking it so two people do not duplicate work.
- Fork, branch, push, PR.
- Keep PRs small and focused. One feature per PR.
I will accept PRs that fix bugs, add tests, implement something on the "Next up" list, or improve docs. I do not want PRs that add a framework dependency without strong justification, refactor working code without changing behaviour, or add a UI library when the inline Tailwind is fine.
Semver. Breaking changes (renamed env var, removed API route, changed default behaviour) bump major. New features that do not break existing setups bump minor. Bug fixes bump patch.
Current version: 1.1.0.
See Releases on GitHub and the CHANGELOG.
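The versioning rule above can be sketched as a small helper that maps a change type to the next version. This is an illustration only: `bumpVersion` and the `Change` labels are hypothetical names, not part of the project, and the sketch handles plain MAJOR.MINOR.PATCH strings without pre-release or build suffixes.

```typescript
// Hypothetical labels for the three kinds of change described above.
type Change = "breaking" | "feature" | "fix";

function bumpVersion(version: string, change: Change): string {
  // Accept only plain MAJOR.MINOR.PATCH; suffixes are out of scope here.
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  switch (change) {
    case "breaking": // e.g. renamed env var, removed API route
      return `${major + 1}.0.0`;
    case "feature": // new, backwards-compatible functionality
      return `${major}.${minor + 1}.0`;
    case "fix": // bug fix only
      return `${major}.${minor}.${patch + 1}`;
  }
}

console.log(bumpVersion("1.1.0", "breaking")); // 2.0.0
console.log(bumpVersion("1.1.0", "feature")); // 1.2.0
console.log(bumpVersion("1.1.0", "fix")); // 1.1.1
```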