A terminal-style document intelligence system built with Next.js 15, Jina Embeddings, Pinecone, Groq LLaMA 3.3, and Cloudinary.
The application allows you to:
- Upload PDF files
- Extract and clean PDF text
- Auto-chunk content dynamically
- Generate embeddings of the extracted text
- Store vectors in Pinecone
- Query the PDF using natural language
- Stream AI responses in real-time
- Enforce safe, context-only answers
PDF files are uploaded using Cloudinary’s raw resource mode.
Text extraction is performed using pdf-parse-fixed, a Node-only PDF parser that works reliably on Vercel without requiring any DOM, canvas, or worker polyfills. This keeps text extraction fast and free of external service costs for all uploaded PDFs.
Automatic chunk-size calculation (500–1800 chars) with ~12% overlap.
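The exact sizing heuristic isn't shown in this README; a minimal sketch, assuming chunk size scales with document length and is clamped to the documented 500–1800 range with ~12% overlap, could look like:

```typescript
// Sketch of dynamic chunking. The "aim for ~20 chunks" heuristic is an
// assumption; only the 500–1800 char clamp and ~12% overlap come from
// the description above.
function chunkParams(textLength: number): { size: number; overlap: number } {
  // Target roughly 20 chunks, clamped to 500–1800 characters.
  const size = Math.min(1800, Math.max(500, Math.ceil(textLength / 20)));
  const overlap = Math.round(size * 0.12); // ~12% overlap
  return { size, overlap };
}

function chunkText(text: string): string[] {
  const { size, overlap } = chunkParams(text.length);
  const chunks: string[] = [];
  // Step forward by (size - overlap) so consecutive chunks share ~12%.
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

Overlap keeps sentences that straddle a chunk boundary retrievable from either side.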
Multi-chunk embedding using 20 concurrent Jina API calls.
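A concurrency cap like this is typically implemented with a small worker-pool helper; a sketch (the `embedOne` call stands in for the real POST to Jina's embeddings API, which is not shown here):

```typescript
// Run fn over items with at most `limit` calls in flight at once.
// With limit = 20 this matches the "20 concurrent Jina API calls" above.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index.
  // (Safe without locks: `next++` runs synchronously before any await.)
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    async () => {
      while (next < items.length) {
        const i = next++;
        results[i] = await fn(items[i]);
      }
    },
  );
  await Promise.all(workers);
  return results;
}

// Usage sketch: embed all chunks with at most 20 requests in flight.
// const vectors = await mapWithConcurrency(chunks, 20, embedOne);
```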
Each chunk is stored with the metadata fields:
- profile
- file
- text
Supports filtered similarity search.
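A sketch of the per-chunk record shape, assuming the three metadata fields listed above; the filter object follows Pinecone's metadata-filter syntax for scoping a query to a single upload:

```typescript
// Shape of one stored vector record (field names from the list above;
// the id scheme and example values are illustrative).
interface ChunkRecord {
  id: string;
  values: number[]; // Jina embedding vector
  metadata: { profile: string; file: string; text: string };
}

function toRecord(
  id: string,
  vector: number[],
  profile: string,
  file: string,
  text: string,
): ChunkRecord {
  return { id, values: vector, metadata: { profile, file, text } };
}

// A filtered similarity search restricts matches to one file, e.g.:
const exampleFilter = { file: { $eq: "report.pdf" } };
```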
- Normalize query text
- Embed query
- Pinecone similarity search
- Return context
- Stream LLaMA response
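The steps above can be sketched as follows. The normalization rule (trim, lowercase, collapse whitespace) is an assumption; the embedding, Pinecone, and Groq calls are placeholders for the real API requests:

```typescript
// Step 1: normalize the raw user query before embedding it.
function normalizeQuery(q: string): string {
  return q.trim().toLowerCase().replace(/\s+/g, " ");
}

// Remaining pipeline outline (remote calls omitted):
// 2. const vector  = await embed(normalizeQuery(raw));              // Jina
// 3. const matches = await index.query({ vector, topK: 5, filter }); // Pinecone
// 4. const context = matches.map(m => m.metadata.text).join("\n");
// 5. Stream a LLaMA 3.3 completion over `context` back to the client. // Groq
```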
If the answer is not in the PDF → reply exactly: "Not found in the provided document."
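This guard is typically enforced in the system prompt. A sketch, where only the fallback sentence comes from the document and the surrounding wording is assumed:

```typescript
// Exact fallback string required by the app's answer policy.
const FALLBACK = "Not found in the provided document.";

// Build a context-only prompt for the LLM (wording is illustrative).
function buildPrompt(context: string, question: string): string {
  return [
    "Answer ONLY from the context below.",
    `If the answer is not in the context, reply exactly: "${FALLBACK}"`,
    "",
    `Context:\n${context}`,
    `Question: ${question}`,
  ].join("\n");
}
```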
- White terminal-style progress bar
- Upload → Parsing → Embedding progress
- Real-time streamed answers
- CTRL+C session termination
- Jina embeddings rate limits
- Groq request limits
- Pinecone query/write caps
- Cloudinary file size limits
Avoid spamming uploads or excessive PDF reprocessing.
Live Site: DocShadow
- Next.js 15
- TypeScript
- Jina Embeddings v2
- Pinecone
- Groq LLaMA 3.3
- Cloudinary RAW
- pdf-parse-fixed
- Tailwind CSS
- LinkedIn – Jaafar Youssef
For learning and personal use only.
Not intended for heavy industrial document processing.


