A terminal-style document intelligence system built with Next.js 15, Jina Embeddings, Pinecone, Groq LLaMA 3.3, and Cloudinary.
The application allows you to:
- Upload PDF files
- Extract and clean PDF text
- Auto-chunk content dynamically
- Generate embeddings of the extracted text
- Store vectors in Pinecone
- Query the PDF using natural language
- Stream AI responses in real-time
- Enforce safe, context-only answers
PDF files are uploaded using Cloudinary’s raw resource mode.
Text extraction is performed using pdf-parse-fixed, a Node-only PDF parser that works reliably on Vercel without requiring any DOM, canvas, or worker polyfills. This keeps text extraction fast and free of external service costs for all uploaded PDFs.
Automatic chunk-size calculation (500–1800 chars) with ~12% overlap.
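The exact sizing heuristic isn't shown in this README; a minimal sketch, assuming chunk size scales with document length and is clamped to the documented 500–1800 range with ~12% overlap, could look like:

```typescript
// Sketch of dynamic chunking. The "aim for ~20 chunks" heuristic is an
// assumption; only the 500–1800 char clamp and ~12% overlap come from
// the description above.
function chunkParams(textLength: number): { size: number; overlap: number } {
  // Target roughly 20 chunks, clamped to 500–1800 characters.
  const size = Math.min(1800, Math.max(500, Math.ceil(textLength / 20)));
  const overlap = Math.round(size * 0.12); // ~12% overlap
  return { size, overlap };
}

function chunkText(text: string): string[] {
  const { size, overlap } = chunkParams(text.length);
  const chunks: string[] = [];
  // Step forward by (size - overlap) so consecutive chunks share ~12%.
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

Overlap keeps sentences that straddle a chunk boundary retrievable from either side.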
Multi-chunk embedding using 20 concurrent Jina API calls.
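A concurrency cap like this is typically implemented with a small worker-pool helper; a sketch (the `embedOne` call stands in for the real POST to Jina's embeddings API, which is not shown here):

```typescript
// Run fn over items with at most `limit` calls in flight at once.
// With limit = 20 this matches the "20 concurrent Jina API calls" above.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index.
  // (Safe without locks: `next++` runs synchronously before any await.)
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    async () => {
      while (next < items.length) {
        const i = next++;
        results[i] = await fn(items[i]);
      }
    },
  );
  await Promise.all(workers);
  return results;
}

// Usage sketch: embed all chunks with at most 20 requests in flight.
// const vectors = await mapWithConcurrency(chunks, 20, embedOne);
```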
Each chunk is stored with the metadata fields:
- profile
- file
- text
Supports filtered similarity search.
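A sketch of the per-chunk record shape, assuming the three metadata fields listed above; the filter object follows Pinecone's metadata-filter syntax for scoping a query to a single upload:

```typescript
// Shape of one stored vector record (field names from the list above;
// the id scheme and example values are illustrative).
interface ChunkRecord {
  id: string;
  values: number[]; // Jina embedding vector
  metadata: { profile: string; file: string; text: string };
}

function toRecord(
  id: string,
  vector: number[],
  profile: string,
  file: string,
  text: string,
): ChunkRecord {
  return { id, values: vector, metadata: { profile, file, text } };
}

// A filtered similarity search restricts matches to one file, e.g.:
const exampleFilter = { file: { $eq: "report.pdf" } };
```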
- Normalize query text
- Embed query
- Pinecone similarity search
- Return context
- Stream LLaMA response
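The steps above can be sketched as follows. The normalization rule (trim, lowercase, collapse whitespace) is an assumption; the embedding, Pinecone, and Groq calls are placeholders for the real API requests:

```typescript
// Step 1: normalize the raw user query before embedding it.
function normalizeQuery(q: string): string {
  return q.trim().toLowerCase().replace(/\s+/g, " ");
}

// Remaining pipeline outline (remote calls omitted):
// 2. const vector  = await embed(normalizeQuery(raw));              // Jina
// 3. const matches = await index.query({ vector, topK: 5, filter }); // Pinecone
// 4. const context = matches.map(m => m.metadata.text).join("\n");
// 5. Stream a LLaMA 3.3 completion over `context` back to the client. // Groq
```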
If the answer is not in the PDF → reply exactly: "Not found in the provided document."
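This guard is typically enforced in the system prompt. A sketch, where only the fallback sentence comes from the document and the surrounding wording is assumed:

```typescript
// Exact fallback string required by the app's answer policy.
const FALLBACK = "Not found in the provided document.";

// Build a context-only prompt for the LLM (wording is illustrative).
function buildPrompt(context: string, question: string): string {
  return [
    "Answer ONLY from the context below.",
    `If the answer is not in the context, reply exactly: "${FALLBACK}"`,
    "",
    `Context:\n${context}`,
    `Question: ${question}`,
  ].join("\n");
}
```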
- White terminal-style progress bar
- Upload → Parsing → Embedding progress
- Real-time streamed answers
- CTRL+C session termination
- Jina embeddings rate limits
- Groq request limits
- Pinecone query/write caps
- Cloudinary file size limits
Avoid spamming uploads or excessive PDF reprocessing.
Live Site: DocShadow
- Next.js 15
- TypeScript
- Jina Embeddings v2
- Pinecone
- Groq LLaMA 3.3
- Cloudinary RAW
- pdf-parse-fixed
- Tailwind CSS
- LinkedIn – Jaafar Youssef
For learning and personal use only.
Not intended for heavy industrial document processing.


