A local, privacy-first question-answering system for your personal documents. Ask natural language questions about your files — PDFs, Word docs, emails, text files — without any data ever leaving your machine.
Most document Q&A tools require uploading your files to a cloud service. Private Doc Brain runs entirely on your hardware using Ollama for local LLM inference and ChromaDB for local vector storage. No API keys, no subscriptions, no data exposure.
- Multi-format document support — PDF, DOCX, TXT, MD, EML
- Hybrid search — combines BM25 keyword search with semantic vector search via Reciprocal Rank Fusion for better retrieval than either alone
- HyDE (Hypothetical Document Embeddings) — optional mode where the LLM generates a hypothetical answer first, then uses that to find more relevant chunks
- Incremental indexing — SHA256-based change detection means only modified or new files get re-embedded
- Source citations — every answer shows which document and chunk it came from
- Conversational memory — maintains the last 6 turns so you can ask follow-up questions
- Fully local — Ollama runs the embedding model and LLM on your machine, ChromaDB stores vectors on disk
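The incremental-indexing idea can be sketched as follows. The helper names and the flat `{filename: hash}` state layout are illustrative assumptions, not the project's actual `.index_state.json` schema:

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash file contents, so a touch without an edit is not a change."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def files_to_reindex(docs_dir: Path, state_file: Path) -> list[Path]:
    """Return only files whose content hash differs from the saved state."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    return [
        path
        for path in sorted(docs_dir.iterdir())
        if path.is_file() and state.get(path.name) != file_sha256(path)
    ]
```

Because the comparison is content-based, re-saving a file without changing it costs nothing beyond the hash itself.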
private-doc-brain/
├── main.py # CLI entry point (ingest / chat / list / remove)
├── config.py # Centralized settings (paths, models, chunking, retrieval)
├── ingest.py # Document parsing, chunking, and indexing pipeline
├── ollama.py # HTTP client for Ollama (embeddings + chat)
├── search.py # Hybrid retrieval: BM25 + vector search + RRF + HyDE
├── brain.py # Interactive REPL, prompt engineering, citation display
├── requirements.txt
├── docs/ # Put your documents here
├── .chroma_db/ # ChromaDB vector store (auto-generated)
└── .index_state.json # Index metadata (auto-generated)
- Ingest — Documents in `docs/` are parsed, split into ~500-character overlapping chunks, embedded using `nomic-embed-text`, and stored in ChromaDB.
- Search — At query time, a BM25 index is built in-memory from all stored chunks. Your question is embedded and used for vector search. Both result sets are merged using Reciprocal Rank Fusion and the top 5 chunks are selected.
- HyDE (optional) — Before embedding the question, the LLM generates a hypothetical document snippet that would answer it. That synthetic snippet is embedded instead of the raw question, which tends to match real document language more closely.
- Chat — The top chunks are injected into a prompt along with the conversation history. `llama3.2` streams the response in real time, and citations are printed afterward.
| Component | Technology |
|---|---|
| Language | Python 3.10+ |
| LLM backend | Ollama (llama3.2) |
| Embeddings | Ollama (nomic-embed-text) |
| Vector store | ChromaDB (cosine similarity) |
| Keyword search | rank-bm25 |
| PDF parsing | pdfplumber |
| DOCX parsing | python-docx |
| Terminal output | colorama |
Download from https://ollama.com, then pull the required models:
ollama pull nomic-embed-text
ollama pull llama3.2

Start the Ollama server (it may already be running as a background service):
ollama serve

Install the Python dependencies:

pip install -r requirements.txt

Copy any PDFs, Word docs, emails, or text files into the `docs/` directory:
docs/
├── contract.pdf
├── notes.md
├── report.docx
└── archive.eml
python main.py ingest

Only new or modified files are re-indexed on subsequent runs.
python main.py chat

With HyDE enabled (recommended for better semantic recall):

python main.py chat --hyde

You'll enter an interactive session. Type your question and press Enter. Type `exit` or press Ctrl+C to quit.
You: What were the key terms of the vendor contract?
[streams answer with citations...]
You: What about the payment schedule?
[follow-up using conversation context...]
# List all indexed documents
python main.py list
# Remove a specific file from the index
python main.py remove contract.pdf

All settings are in `config.py`:
| Setting | Default | Description |
|---|---|---|
| `DOCS_DIR` | `docs/` | Input directory |
| `CHROMA_DIR` | `.chroma_db/` | Vector store path |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama endpoint |
| `EMBED_MODEL` | `nomic-embed-text` | Embedding model |
| `CHAT_MODEL` | `llama3.2` | Chat/generation model |
| `CHUNK_SIZE` | 500 | Target chunk size (chars) |
| `CHUNK_OVERLAP` | 50 | Overlap between chunks (chars) |
| `TOP_K_VECTOR` | 20 | Vector search candidates |
| `TOP_K_BM25` | 20 | BM25 search candidates |
| `TOP_K_FINAL` | 5 | Chunks passed to LLM |
Chunking without a tokenizer — Chunks are split on sentence and paragraph boundaries using character count (4 chars ≈ 1 token) as a lightweight approximation. This avoids adding a tokenizer dependency while still respecting semantic boundaries.
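A minimal sketch of such a chunker, assuming greedy sentence packing with a character budget (illustrative, not the project's actual `ingest.py`):

```python
import re

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Greedily pack sentences into ~chunk_size-char chunks.

    Splits on sentence-ending punctuation and carries the tail of each
    finished chunk forward so context isn't cut mid-thought.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) + 1 > chunk_size:
            chunks.append(current)
            current = current[-overlap:]  # overlap carried into next chunk
        current = (current + " " + sent).strip()
    if current:
        chunks.append(current)
    return chunks
```

Splitting only at sentence boundaries means a chunk can slightly exceed the target when a single sentence is long, which is the usual trade-off of boundary-respecting chunkers.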
Reciprocal Rank Fusion — Rather than manually weighting BM25 vs. vector scores (which are on incompatible scales), RRF uses the rank position of each chunk in each result list. This is robust and requires no tuning.
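The fusion step can be sketched in a few lines; the function name and `k=60` constant (a common default from the RRF literature) are assumptions, not necessarily what `search.py` uses:

```python
def rrf_merge(ranked_lists, k=60, top_n=5):
    """Merge ranked result lists via Reciprocal Rank Fusion.

    Each chunk's fused score is the sum over lists of 1 / (k + rank),
    so only rank positions matter, never the raw BM25/cosine scores.
    """
    scores = {}
    for results in ranked_lists:
        for rank, chunk_id in enumerate(results, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

bm25_hits = ["c1", "c2", "c3"]
vector_hits = ["c3", "c1", "c4"]
print(rrf_merge([bm25_hits, vector_hits]))  # → ['c1', 'c3', 'c2', 'c4']
```

Note how `c1`, ranked high in both lists, beats `c3` even though `c3` tops one list: agreement across retrievers is rewarded without any score normalization.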
HyDE — The prompt instructs the model to write as if extracting text from a real document, avoiding meta-language like "according to...". The resulting snippet lives in the same embedding space as actual document text, improving recall when question phrasing diverges from document phrasing.
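The shape of the HyDE step might look like the sketch below. The prompt wording is an illustrative assumption (the project's actual prompt may differ), and the LLM/embedding calls are injected as plain callables so the logic is testable without a running Ollama server:

```python
# Hypothetical prompt text — not the project's exact wording.
HYDE_PROMPT = (
    "Write a short passage that could appear in a real document and that "
    "directly answers the question below. Quote as the document would; "
    "do not use meta-language like 'according to'.\n\n"
    "Question: {question}\n\nPassage:"
)

def hyde_query_vector(question, generate, embed):
    """Embed a hypothetical answer instead of the raw question.

    `generate` maps prompt -> text; `embed` maps text -> vector.
    """
    passage = generate(HYDE_PROMPT.format(question=question))
    return embed(passage)
```

The returned vector is then used for the normal ChromaDB search; BM25 still sees the original question, so keyword recall is unaffected.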
Embedding batching — Texts are embedded 32 at a time to avoid memory spikes when indexing large document sets.
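The batching pattern is straightforward; this is a generic sketch (the callable `embed_batch` stands in for whatever the project's Ollama client exposes):

```python
def batched(items, batch_size=32):
    """Yield successive fixed-size batches; the last may be smaller."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def embed_all(texts, embed_batch):
    """Embed texts 32 at a time to bound peak memory on large corpora."""
    vectors = []
    for batch in batched(texts):
        vectors.extend(embed_batch(batch))
    return vectors
```

A batch size of 32 keeps each request small enough to avoid memory spikes while still amortizing per-call overhead.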
Session-scoped BM25 — The BM25 index is built once per chat session from all ChromaDB chunks loaded into memory. ChromaDB remains the source of truth; BM25 is a fast in-memory layer.
- Python 3.10+
- Ollama running locally with `nomic-embed-text` and `llama3.2` pulled
- ~4 GB RAM for `llama3.2` (or more for larger models)
MIT