A RAG (retrieval-augmented generation) system that lets you ask natural-language questions across your PM documents and get answers grounded in the actual text, with citations back to the source.
Built with sentence-transformers (local embeddings, no API key needed), ChromaDB (vector store), and Claude (answer generation). Ships with a synthetic set of Petal & Co PRDs, strategy docs, and retro notes so you can run it immediately.
Documents are write-only in practice. You write the PRD, it gets approved, and then it sits in Confluence being slowly forgotten. Six months later someone asks "why did we decide not to build X?" and you spend 20 minutes searching before giving up and guessing.
The second failure mode is rebuilding context when re-entering a product area. Before a quarterly review you want to quickly re-read everything relevant, but that means opening five documents, skimming, and doing the synthesis yourself.
RAG solves both problems. You ingest your documents once, and then ask natural-language questions against them. The system retrieves the most relevant passages and hands them to a language model to synthesise a cited answer.
User question
|
v
Embed query (sentence-transformers, runs locally)
|
v
ChromaDB: cosine similarity search -> top-k chunks
|
v
Prompt: system + retrieved context + question
|
v
Claude (Anthropic API) -> cited answer
|
v
Streamlit UI with source expander
Chunking: Documents are split on ## headings first (semantic boundaries), then capped at 300 words with 40-word overlap. Chunking is a real design decision: too small and you lose context, too large and irrelevant text pollutes retrieval. The overlap prevents a fact from being split across a chunk boundary and becoming unretrievable.
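The splitting strategy above can be sketched in a few lines. This is an illustrative sketch, not the repo's actual `ingest.py`; the function name and regex are assumptions, but the logic matches the described behaviour (split on `## ` headings, then cap at 300 words with a 40-word overlap):

```python
import re

def chunk_markdown(text: str, max_words: int = 300, overlap: int = 40) -> list[str]:
    """Split on ## headings first, then cap each section at max_words,
    overlapping windows so a fact straddling a boundary stays retrievable."""
    sections = re.split(r"(?m)^(?=## )", text)
    chunks = []
    for section in sections:
        words = section.split()
        if not words:
            continue
        start = 0
        while start < len(words):
            chunks.append(" ".join(words[start:start + max_words]))
            if start + max_words >= len(words):
                break
            start += max_words - overlap  # step back by `overlap` words
    return chunks
```

A 500-word section under one heading produces two chunks whose boundary words are shared, so a sentence at word 290 appears in both.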
Embeddings: Each chunk is converted to a 384-dimensional vector using all-MiniLM-L6-v2 from sentence-transformers. This runs entirely on your machine with no API key. Embeddings are learned numeric representations where semantically similar text has similar vectors, which is what makes "why did we descope saved cards?" match the decision log entry even though none of those exact words appear in it.
Cosine similarity: ChromaDB retrieves chunks by cosine similarity, which measures the angle between vectors rather than their magnitude. This makes retrieval robust to document length: a short decision log entry and a long PRD section on the same topic are equally findable for the same query.
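The length-invariance claim is easy to verify directly: scaling a vector changes its magnitude but not its angle, so cosine similarity is unchanged. A small self-contained check (toy 3-d vectors standing in for the real 384-d embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Angle-based similarity: 1.0 for parallel vectors, 0.0 for orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

short_vec = np.array([1.0, 2.0, 3.0])
long_vec = 10 * short_vec              # same direction, 10x the magnitude
query = np.array([3.0, -1.0, 0.5])

# Scaling a document's vector does not change its similarity to any query,
# which is why a short log entry and a long PRD section rank comparably.
assert abs(cosine_similarity(short_vec, query) - cosine_similarity(long_vec, query)) < 1e-9
```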
Top-k retrieval: The retriever returns the k most similar chunks. k is a hyperparameter you control via the sidebar slider. Higher k gives Claude more context but risks adding noise. The default of 4 works well for this dataset.
Prompt stuffing: The retrieved chunks are injected directly into the prompt alongside the question. This is why Claude can answer questions about your specific documents without fine-tuning. The model's weights do not change. All the knowledge comes from retrieval at query time.
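Assembling that prompt is plain string formatting. A hedged sketch (the exact wording of the real system prompt differs; the `[Source N]` labelling is illustrative, though it is what makes cited answers possible):

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Inject retrieved chunks into the prompt; the model's weights never change."""
    context = "\n\n".join(
        f"[Source {i + 1}] {chunk}" for i, chunk in enumerate(chunks)
    )
    return (
        "Answer using only the context below, citing passages like [Source 1]. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The resulting string is sent as the user message in a single Anthropic API call; no fine-tuning or state is involved.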
Requirements: Python 3.10+, uv, an Anthropic API key
git clone https://github.com/jackhendon/pm-doc-chat
cd pm-doc-chat
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env
uv sync
# Step 1: index the documents (run once, or re-run when docs change)
uv run python ingest.py
# Step 2: start the chat interface
uv run streamlit run app.py

The first uv sync downloads sentence-transformers and its dependencies (including PyTorch), which is around 500MB. Subsequent runs are fast.
These all work against the included Petal & Co sample documents:
Checkout PRD
- "What are the success metrics for the checkout redesign?"
- "Why was saved card management descoped from v1?"
- "What accessibility requirements does the checkout project have?"
Gifting Suite PRD
- "What are the main features in the gifting suite?"
- "What external dependencies could block the gifting suite?"
- "Why is gift registry out of scope?"
API Platform PRD
- "Why did we choose Redis for rate limiting?"
- "How long will v1 of the API be supported after v2 launches?"
- "Who are the target customers for the API platform?"
Strategy doc
- "What are Petal & Co's three strategic pillars for 2026?"
- "Why aren't we building a mobile app this year?"
Decision log
- "Why did we choose Stripe over other payment providers?"
- "What alternatives to Redis were considered?"
Retro notes
- "What action items came out of the sprint 13 retrospective?"
- "How is team morale?"
Cross-document
- "What is the current status of the Redis staging blocker and how did it come up in planning?"
- "What is the overall product strategy and how does checkout fit into it?"
| Document | What it contains |
|---|---|
| prd_checkout_redesign.md | Goals, user stories, success metrics, open questions, decisions (including why saved cards were descoped) |
| prd_gifting_suite.md | Gift message, wrap, scheduled delivery, anonymous receipt; gift registry out of scope; fulfilment dependency |
| prd_api_platform_v2.md | Rate limiting, webhooks, key management UI, v1 deprecation plan, Redis decision |
| strategy_fy2026.md | Three pillars, OKRs, resourcing, what we are not doing (mobile app, international, marketplace) |
| decision_log.md | Running log: Stripe vs Adyen, Redis vs in-memory, saved cards descoped, gift registry deferred, v1 deprecation timeline |
| retro_sprint_13.md | What went well, Redis blocker, fulfilment dependency, action items, team mood |
Replace the files in docs/ with your own markdown exports from Confluence, Notion, or wherever your documents live. Then re-run ingest:
uv run python ingest.py

Nothing else needs to change. The chunker, embedder, and retriever are document-agnostic.