Quick Start
sarmakska edited this page May 31, 2026
Five steps. Roughly 90 seconds if you have an OpenAI key handy.
git clone https://github.com/sarmakska/rag-over-pdf.git
cd rag-over-pdf
pnpm install
(npm install works too, but pnpm is faster.)
If you don't have an OpenAI API key, create one here. Free tier credits are usually enough to test this repo.
cp .env.example .env.local
Open .env.local and paste your key:
OPENAI_API_KEY=sk-proj-...
pnpm dev
Open http://localhost:3000, upload one or more PDFs, and ask questions.
What you should see:
- An upload form.
- After upload, the document appears in a list with its chunk and page counts. Upload more to build a multi-document corpus.
- Tick the documents you want to search, or leave all unticked to search everything.
- Ask a question. The answer streams in token by token, with a numbered source list underneath linking each citation to a document and page.
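The selection rule and the numbered source list described above can be sketched roughly as follows. This is an illustration only, not the repo's actual code: the `Chunk` shape and the `filterBySelection` and `formatSources` helpers are hypothetical names chosen for this example.

```typescript
// Hypothetical shapes for illustration; the repo's real types may differ.
interface Chunk {
  docId: string;
  docName: string;
  page: number;
  text: string;
}

// An empty selection means "search everything", matching the UI rule above.
function filterBySelection(chunks: Chunk[], selectedDocIds: string[]): Chunk[] {
  if (selectedDocIds.length === 0) return chunks;
  const selected = new Set(selectedDocIds);
  return chunks.filter((c) => selected.has(c.docId));
}

// Build a numbered source list, one entry per retrieved chunk.
function formatSources(chunks: Chunk[]): string[] {
  return chunks.map((c, i) => `[${i + 1}] ${c.docName}, p. ${c.page}`);
}

const corpus: Chunk[] = [
  { docId: "a", docName: "manual.pdf", page: 4, text: "..." },
  { docId: "b", docName: "handbook.pdf", page: 12, text: "..." },
];

// Nothing ticked: both documents are searched.
console.log(formatSources(filterBySelection(corpus, [])));
// Only "b" ticked: the search is restricted to that document.
console.log(formatSources(filterBySelection(corpus, ["b"])));
```

Treating an empty selection as "all documents" keeps the default case to one click: upload, then ask.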
| Error | Cause | Fix |
|---|---|---|
| OPENAI_API_KEY is not set | env not loaded | Restart pnpm dev after editing .env.local |
| PDF has no extractable text | scanned PDF | Use a different PDF or add OCR (out of scope) |
| Upload hangs forever | huge PDF | Try one under 5MB first |
| 429 from OpenAI | rate limited | Wait, or upgrade your OpenAI tier |
| Empty answers | retrieval missed | Rephrase the question, and check that the PDF actually contains the answer |
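For the 429 row, "wait" can be automated with a retry wrapper. The sketch below is one possible approach, not something this repo is known to ship: `withBackoff` is a hypothetical helper, and it assumes the thrown error exposes an HTTP `status` field.

```typescript
// Hypothetical error shape: assumes the thrown error carries an HTTP `status`.
interface HttpError extends Error {
  status?: number;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `fn` on 429 with exponential backoff; rethrow anything else immediately.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as HttpError).status;
      // Give up on non-rate-limit errors or once the retry budget is spent.
      if (status !== 429 || attempt >= maxRetries) throw err;
      // Delay doubles each attempt: 500 ms, 1 s, 2 s with the defaults.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Wrap whichever call hits OpenAI, for example `withBackoff(() => askQuestion(q))`, where `askQuestion` stands in for your own function. If you are rate limited constantly, backoff only hides the problem and upgrading the tier is the real fix.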
A few PDFs that work well for testing:
- A product manual or spec sheet
- A research paper (arXiv works great)
- An employee handbook
- A privacy policy
- Last year's annual report
Avoid PDFs that are mostly images, charts, or scanned pages. They have no extractable text.
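To catch such files before uploading, a rough heuristic is to count how many pages yield real text. The sketch below assumes you already have per-page text from some PDF parser; `hasExtractableText`, the 20-character threshold, and the "half the pages" rule are arbitrary assumptions for illustration, not values from this repo.

```typescript
// Hypothetical helper: `pageTexts` is whatever text your PDF parser returned per page.
// The default threshold is an arbitrary assumption, not a value from the repo.
function hasExtractableText(
  pageTexts: string[],
  minCharsPerPage = 20,
): boolean {
  if (pageTexts.length === 0) return false;
  // Count pages whose non-whitespace content meets the threshold.
  const pagesWithText = pageTexts.filter(
    (t) => t.replace(/\s+/g, "").length >= minCharsPerPage,
  ).length;
  // Require at least half of the pages to carry real text.
  return pagesWithText * 2 >= pageTexts.length;
}

// A scanned document typically yields empty or whitespace-only pages.
console.log(hasExtractableText(["", " \n ", ""]));
// A text PDF with one blank page still passes.
console.log(hasExtractableText(["Section 1. Installation and setup guide", ""]));
```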
- Read How RAG Works to understand hybrid retrieval, reranking, and citations
- Read Configuration to tune chunk size, top-k, and models
- Read Swap to pgvector to make it persistent
- Read Deployment to ship it on Vercel