ask questions about your own pdfs. drop a doc in, get cited answers back.
drop in a pdf and it gets chunked at around 800 chars with 100 overlap, then embedded with text-embedding-3-small. vectors live in a json-backed in-memory store. when you ask something, the query is embedded, the top chunks come back via cosine similarity search, and gpt-4o-mini answers with those chunks as context. the ui streams the answer and shows which chunks it pulled from.
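a minimal sketch of the chunk and search steps described above, assuming plain fixed-size windows and a brute-force scan. the names (`chunkText`, `cosine`, `topK`, `StoredChunk`) and the default `k` are illustrative assumptions, not necessarily what the repo uses.

```typescript
// sketch only: names and defaults are assumptions, not the repo's actual code

/** split text into fixed-size windows that overlap by `overlap` chars */
function chunkText(text: string, size = 800, overlap = 100): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("need size > 0 and 0 <= overlap < size");
  }
  const step = size - overlap; // how far each window advances
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window reached the end
  }
  return chunks;
}

/** cosine similarity of two equal-length vectors (0 if either is all zeros) */
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// hypothetical shape of one stored chunk
interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

/** score every chunk against the query embedding and keep the best k */
function topK(query: number[], chunks: StoredChunk[], k = 4) {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

a linear scan like this is fine at the size a single json file can hold. it stops being fine well before you'd want a real vector index.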
next.js 14 (app router), typescript, tailwind. openai sdk for embeddings and chat. pdf-parse for text extraction. no database; chunks persist to .data/store.json.
cp .env.example .env.local
# fill in OPENAI_API_KEY
npm install
npm run dev
then open http://localhost:3000.
| var | what |
|---|---|
| OPENAI_API_KEY | required. used for both embeddings and chat |
vercel works out of the box. set OPENAI_API_KEY in project env. the .data/store.json persistence is local only: on serverless platforms the filesystem is ephemeral, so uploaded docs won't reliably survive between invocations or deploys. for a real deployment swap lib/vector-store.ts for pgvector or pinecone.
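one way to keep that swap small is to hide the store behind a narrow async interface. the sketch below is an assumption about what such a seam could look like: `VectorStore`, `ChunkRecord`, `ScoredChunk`, and `MemoryVectorStore` are hypothetical names, and the real lib/vector-store.ts may be shaped differently.

```typescript
// hypothetical seam: none of these names are confirmed to exist in lib/vector-store.ts

interface ChunkRecord {
  id: string;
  text: string;
  embedding: number[];
}

interface ScoredChunk {
  chunk: ChunkRecord;
  score: number;
}

// async on purpose, so a network-backed store can implement the same interface
interface VectorStore {
  upsert(records: ChunkRecord[]): Promise<void>;
  query(embedding: number[], k: number): Promise<ScoredChunk[]>;
}

function cosineSim(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// in-memory stand-in: brute-force scan, no persistence
class MemoryVectorStore implements VectorStore {
  private records = new Map<string, ChunkRecord>();

  async upsert(records: ChunkRecord[]): Promise<void> {
    for (const r of records) this.records.set(r.id, r); // same id overwrites
  }

  async query(embedding: number[], k: number): Promise<ScoredChunk[]> {
    return Array.from(this.records.values())
      .map((chunk) => ({ chunk, score: cosineSim(embedding, chunk.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}
```

with a seam like this, a pgvector or pinecone version only has to implement `upsert` and `query`, and the api routes don't need to know which one they're talking to.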
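no auth on purpose. clerk drops in cleanly if you want it: wrap the root layout in app/layout.tsx with ClerkProvider and gate api routes in middleware.ts with authMiddleware (clerkMiddleware on @clerk/nextjs v5+, where authMiddleware is deprecated).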
chunking is intentionally dumb (roughly fixed-size character windows). for production, use semantic chunking and a reranker.