GitHub - param20h/PDF-Assistant-RAG: PDF-Assistant-RAG is a complete, production-ready AI document assistant that lets users upload complex PDFs, financial reports, legal contracts, and research papers — then chat with an AI that provides accurate, cited answers powered by a multi-stage Retrieval-Augmented Generation pipeline.

title	Document AI Analyst
emoji	🧠
colorFrom	indigo
colorTo	purple
sdk	docker
app_port	7860
pinned	true
license	mit
short_description	Enterprise Agentic RAG — upload PDFs and chat with AI

██████╗ ██████╗ ███████╗     █████╗ ███████╗███████╗██╗███████╗████████╗ █████╗ ███╗   ██╗████████╗
██╔══██╗██╔══██╗██╔════╝    ██╔══██╗██╔════╝██╔════╝██║██╔════╝╚══██╔══╝██╔══██╗████╗  ██║╚══██╔══╝
██████╔╝██║  ██║█████╗      ███████║███████╗███████╗██║███████╗   ██║   ███████║██╔██╗ ██║   ██║
██╔═══╝ ██║  ██║██╔══╝      ██╔══██║╚════██║╚════██║██║╚════██║   ██║   ██╔══██║██║╚██╗██║   ██║
██║     ██████╔╝██║         ██║  ██║███████║███████║██║███████║   ██║   ██║  ██║██║ ╚████║   ██║
╚═╝     ╚═════╝ ╚═╝         ╚═╝  ╚═╝╚══════╝╚══════╝╚═╝╚══════╝   ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═══╝   ╚═╝
                                                                                                    
                        ██████╗  █████╗  ██████╗
                        ██╔══██╗██╔══██╗██╔════╝
                        ██████╔╝███████║██║  ███╗
                        ██╔══██╗██╔══██║██║   ██║
                        ██║  ██║██║  ██║╚██████╔╝
                        ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝

Enterprise Agentic Retrieval-Augmented Generation System

Upload · Embed · Retrieve · Chat — A production-grade AI document assistant built end-to-end with an agentic RAG pipeline, streaming responses, and per-user data isolation.

Features · Tech Stack · Getting Started · Architecture · RAG Pipeline · API Reference · Deployment · Contributing

🤝 Contributors

Thanks to all the amazing people who have contributed to PDF-Assistant-RAG! 🎉

📋 Contributions

Contributor	Role	Key Contributions
@param20h — Paramjit Singh	🧭 Project Lead	Founded the project; core RAG architecture (FastAPI + ChromaDB + Next.js); Docker multi-stage & HuggingFace Spaces deployment; GitHub Actions CI/CD; JWT auth & Google OAuth; documentation with pipeline diagrams; project governance
@Yuvraj-Sarathe — Yuvraj Sarathe	📐 Docs & Build	Added Mermaid architecture diagram; inline RAG pipeline comments; `.env.example` docs; created `Makefile` with concurrent dev commands & `CHANGELOG.md`
@SatyamPrakash09 — Satyam Prakash	⚙️ Backend Engineer	Chat history export & auto-refresh auth; user profile endpoints; `/health` endpoint (Vector DB + SQL DB monitoring); document pagination; MIME file validation; JWT access + refresh token system
@akmhatey-ai	🎨 UI/UX	Chat textarea auto-resize; clear messages on document switch
@drishtisharma14052007-eng — Drishti Sharma	📝 Documentation	GSSOC contributor FAQ content
@Pika-pika06 — Pika	📝 Documentation	Changelog tracking historical commits through v0.4.0
@blinkerbit / @algojogacor — Arya Rizky	🐛 Bug Buster	Fixed 0-indexed page number display in source cards
@HirenGajjar	🔒 DevOps	Restricted CORS origins via `ALLOWED_ORIGINS` environment variable
@Kaustub26Pvgda — Kaustub Pavagada	⚡ Frontend Engineer	Copy LLM response capability; increased max file upload size to 20 MB
@akshy-yy — Akshaya	🎨 Frontend Engineer	Typing indicator animation while AI responds
@GHX5T-SOL — Bruce Wayne	🐛 Bug Buster	Backend offline error message display
@viswanatha	📊 Observability	Health check endpoint

🌟 Want to join them? Check out CONTRIBUTING.md for contribution guidelines and look for good first issues to get started!

🌟 Overview

PDF-Assistant-RAG is a complete, production-ready AI document assistant that lets users upload complex PDFs, financial reports, legal contracts, and research papers — then chat with an AI that provides accurate, cited answers powered by a multi-stage Retrieval-Augmented Generation pipeline.

The system uses semantic search + cross-encoder reranking to find the most relevant document chunks, streams AI-generated answers token-by-token, and highlights exact source citations with page numbers — all inside a sleek Next.js UI with JWT-secured per-user data isolation.

🏗️ Architecture

graph TD
    subgraph Frontend["Frontend (Next.js 16)"]
        UI["Dashboard UI (React)"]
        Chat["Chat Panel (SSE)"]
        Viewer["PDF Viewer (iframe)"]
    end

    subgraph Backend["Backend (FastAPI 0.115+)"]
        API["API Router (/api/v1)"]
        Auth["Auth (JWT/bcrypt)"]
        DB[(SQLite Metadata)]

        subgraph RAG["RAG Pipeline"]
            Upload["Ingestion Task (Chunking)"]
            Embed["Local Embeddings (all-MiniLM-L6-v2)"]
            Retriever["Two-Stage Retriever"]
            Rerank["Cross-Encoder Reranker"]
            Agent["Agent/Generator"]
        end
    end

    subgraph Storage["Vector Storage"]
        Chroma[(ChromaDB)]
    end

    subgraph External["External Services"]
        HF["HuggingFace Inference API (Qwen2.5-72B-Instruct)"]
    end

    %% Frontend to Backend Connections
    UI <-->|REST / Auth| API
    Chat <-->|SSE Streaming| API
    Viewer -->|Fetch PDF| API

    %% Backend Internals
    API <--> Auth
    API <--> DB
    API --> Upload
    API <--> Retriever
    API <--> Agent

    %% RAG Ingestion Flow
    Upload --> Embed
    Embed -->|Store Vectors| Chroma

    %% RAG Query Flow
    Retriever -->|1. Semantic Search| Chroma
    Retriever -->|2. Score & Sort| Rerank
    Retriever -->|Context| Agent

    %% External LLM Flow
    Agent <-->|LLM Generation| HF

🔄 System Flow Overview

The user interacts with the Next.js frontend to upload documents and ask questions.
FastAPI handles authentication, document ingestion, and chat APIs.
Uploaded documents are parsed, chunked, and converted into vector embeddings.
Embeddings are stored in ChromaDB for semantic retrieval.
During querying, the retriever fetches relevant chunks from ChromaDB.
A reranker improves retrieval quality before sending context to the LLM.
Hugging Face Inference API generates the final response.
Responses are streamed back to the frontend using SSE.
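
The last step of the flow above can be sketched as follows. This is a minimal illustration of Server-Sent Events framing, not the project's actual streaming code: the helper name `sse_events` and the JSON payload shape (`type`, `content`, and a final `done` event) are assumptions made for the example.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each generated token in a Server-Sent Events frame."""
    for token in tokens:
        # JSON-encode the payload so a newline inside a token cannot break framing.
        payload = json.dumps({"type": "token", "content": token})
        # An SSE frame is one or more "data:" lines terminated by a blank line.
        yield f"data: {payload}\n\n"
    # Hypothetical terminal event so the client knows the answer is complete.
    yield 'data: {"type": "done"}\n\n'


if __name__ == "__main__":
    for frame in sse_events(["Hello", ",", " world"]):
        print(frame, end="")
```

In a FastAPI app, a generator like this would typically be passed to a `StreamingResponse` with the `text/event-stream` media type.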

🛠 Tech Stack

Backend

	Technology	Purpose
	FastAPI 0.115+	Async REST API framework
	Python 3.11	Runtime environment
	SQLite + SQLAlchemy	User & document metadata storage
	JWT + Passlib	Authentication & authorization
	LangChain	RAG orchestration
	ChromaDB	Persistent vector store (per-user)
	HuggingFace Hub	LLM inference API

Frontend

	Technology	Purpose
	Next.js 16	React framework (App Router)
	Tailwind CSS v4	Utility-first styling
	shadcn/ui	Accessible component library
	TypeScript	Type-safe frontend
	react-pdf	In-browser PDF viewer
	react-markdown + GFM	Markdown-rendered AI responses

AI / ML Pipeline

	Technology	Purpose
	all-MiniLM-L6-v2	Local sentence embeddings
	ms-marco-MiniLM-L-6-v2	Cross-encoder reranker
	Qwen2.5-72B-Instruct	LLM (HuggingFace Inference API)
	PyMuPDF + python-docx	Document parsing

DevOps & Tooling

	Technology	Purpose
	Docker Multi-Stage	Containerized deployment
	GitHub Actions	CI pipeline (dev branch)
	Git LFS	Binary asset management
	HuggingFace Spaces	Production deployment

✨ Key Features

👤 Users

🔐 JWT-secured register & login
📄 Upload PDF and DOCX documents
💬 Ask questions in natural language
🌊 Streaming AI responses token-by-token
📚 Inline source citations with page numbers
🗂️ Per-user complete data isolation
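
The README does not say which JWT library the backend uses, so the following is only a standard-library sketch of what HS256 signing with a 72-hour expiry (the documented `JWT_ALGORITHM` and `JWT_EXPIRY_HOURS` defaults) involves. The function names `create_token` and `verify_token` and the `sub`/`iat`/`exp` claim layout are assumptions for illustration; a real deployment would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped when encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_token(user_id: str, secret: str, expiry_hours: int = 72,
                 now: Optional[int] = None) -> str:
    """Build an HS256-signed token for user_id (hypothetical claim layout)."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": user_id, "iat": now, "exp": now + expiry_hours * 3600}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[int] = None) -> dict:
    """Return the claims if the signature is valid and the token has not expired."""
    header_b64, claims_b64, sig_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{claims_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    now = int(time.time()) if now is None else now
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```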

🤖 RAG Pipeline

🔪 Smart recursive text chunking (configurable size & overlap)
🧠 Local embeddings — no data leaves your machine
🔍 Two-stage retrieval — semantic search → cross-encoder rerank
✂️ Top-K filtering for precision answers
📝 Custom system prompts with citation instructions
🧾 Source scoring with confidence levels

⚙️ Engineering

🚀 Async FastAPI with Server-Sent Events streaming
🗄️ ChromaDB with persistent per-user collections
🐳 Multi-stage Docker build (Node → Python)
🔄 GitHub Actions CI on dev branch
🛡️ CORS, file validation, JWT expiry
📊 Chat history persistence per document
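
As a rough illustration of the file validation mentioned above, the sketch below checks the extension, the size limit, and the leading file signature. It is not the project's validator: `validate_upload` and `MAGIC_BYTES` are hypothetical names, and the defaults are taken from the documented `MAX_FILE_SIZE_MB` and `ALLOWED_EXTENSIONS` values.

```python
from pathlib import Path
from typing import Sequence

# File signatures: PDFs start with "%PDF-"; DOCX files are ZIP archives ("PK\x03\x04").
# Plain-text formats (txt, md) have no signature, so only the extension is checked.
MAGIC_BYTES = {"pdf": b"%PDF-", "docx": b"PK\x03\x04"}


def validate_upload(
    filename: str,
    data: bytes,
    max_size_mb: int = 50,
    allowed: Sequence[str] = ("pdf", "docx", "txt", "md"),
) -> None:
    """Raise ValueError if the upload fails any check; return None otherwise."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in allowed:
        raise ValueError(f"extension '{ext}' is not allowed")
    if len(data) > max_size_mb * 1024 * 1024:
        raise ValueError(f"file exceeds {max_size_mb} MB")
    magic = MAGIC_BYTES.get(ext)
    # Compare the first bytes with the expected signature so a renamed
    # executable cannot pass as a PDF just by carrying a .pdf extension.
    if magic is not None and not data.startswith(magic):
        raise ValueError(f"content does not look like a .{ext} file")
```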

📁 Project Structure

PDF-Assistant-RAG/
│
├── backend/                          # FastAPI + RAG server
│   ├── app/
│   │   ├── main.py                   # App entrypoint, middleware, static files
│   │   ├── config.py                 # Pydantic settings (env vars)
│   │   ├── database.py               # SQLAlchemy async engine
│   │   ├── models.py                 # ORM models (User, Document, Message)
│   │   ├── schemas.py                # Pydantic request/response schemas
│   │   ├── auth.py                   # JWT creation & verification
│   │   │
│   │   ├── routes/
│   │   │   ├── auth.py               # POST /register, POST /login, GET /me
│   │   │   ├── documents.py          # Upload, list, delete, retrieve
│   │   │   └── chat.py               # Streaming chat + history
│   │   │
│   │   └── rag/
│   │       ├── agent.py              # Main RAG orchestrator
│   │       ├── chunker.py            # Recursive text splitter
│   │       ├── embeddings.py         # SentenceTransformer wrapper
│   │       ├── vectorstore.py        # ChromaDB collection manager
│   │       ├── retriever.py          # Semantic search + reranking
│   │       └── prompts.py            # System & user prompt templates
│   │
│   ├── requirements.txt
│   └── .env                          # Local env (never committed)
│
├── frontend/                         # Next.js 16 App Router
│   └── src/
│       ├── app/
│       │   ├── layout.tsx            # Root layout + fonts
│       │   ├── page.tsx              # Landing / redirect
│       │   ├── login/                # Auth pages
│       │   ├── register/
│       │   └── dashboard/            # Main app page
│       │
│       ├── components/
│       │   ├── chat/
│       │   │   ├── ChatPanel.tsx     # Chat UI + SSE streaming
│       │   │   ├── MessageBubble.tsx # User / assistant message
│       │   │   └── SourceCard.tsx    # Citation cards
│       │   ├── document/             # Upload + sidebar components
│       │   └── layout/               # Navbar, sidebar shell
│       │
│       └── lib/
│           └── api.ts                # Typed API client + SSE stream helper
│
├── .github/
│   ├── workflows/
│   │   ├── ci.yml                    # CI — runs on dev branch only
│   │   ├── deploy.yml                # Docker build — main branch only
│   │   └── devsecops.yml             # Security scans — main branch only
│   ├── ISSUE_TEMPLATE/               # Bug report & feature request forms
│   ├── pull_request_template.md      # PR checklist
│   └── CODEOWNERS                    # Auto-review assignment
│
├── Dockerfile                        # Multi-stage: Node build → Python serve
├── docker-compose.yml                # Local Docker stack
├── CONTRIBUTING.md                   # Contributor guide
└── .env.example                      # Template for environment variables

🚀 Getting Started

Prerequisites

Python 3.11+
Node.js 20+
HuggingFace account (free) for LLM inference

1. Clone the Repository

git clone https://github.com/param20h/PDF-Assistant-RAG.git
cd PDF-Assistant-RAG

2. Configure Environment

cp .env.example backend/.env

Edit backend/.env:

SECRET_KEY=your-strong-random-secret
DATABASE_URL=sqlite:///./data/app.db
HF_TOKEN=hf_your_huggingface_token_here
UPLOAD_DIR=./data/uploads
CHROMA_PERSIST_DIR=./data/chroma_db

Get your free HuggingFace token at huggingface.co/settings/tokens

3. Run Locally

Open two terminals:

# Terminal A — Backend
cd backend
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
# → API running at http://localhost:8000
# → Swagger docs at http://localhost:8000/docs

# Terminal B — Frontend
cd frontend
npm install
npm run dev
# → App running at http://localhost:3000

4. Run with Docker

docker compose up --build
# → Full stack at http://localhost:7860

🧠 RAG Pipeline

                    ┌─────────────────────────────────────────────┐
                    │              PDF / DOCX Upload               │
                    └───────────────────┬─────────────────────────┘
                                        │
                                        ▼
                    ┌─────────────────────────────────────────────┐
                    │         PyMuPDF / python-docx Parser         │
                    │         (text extraction per page)           │
                    └───────────────────┬─────────────────────────┘
                                        │
                                        ▼
                    ┌─────────────────────────────────────────────┐
                    │      Recursive Character Text Splitter       │
                    │   chunk_size=1000  |  overlap=200            │
                    └───────────────────┬─────────────────────────┘
                                        │
                                        ▼
                    ┌─────────────────────────────────────────────┐
                    │    all-MiniLM-L6-v2  (local embeddings)      │
                    │    384-dim dense vectors                      │
                    └───────────────────┬─────────────────────────┘
                                        │
                                        ▼
                    ┌─────────────────────────────────────────────┐
                    │   ChromaDB  — per-user persistent collection │
                    └─────────────────────────────────────────────┘

                              ── At Query Time ──

  User Question ──▶ Embed ──▶ Semantic Search (Top-K=10)
                                        │
                                        ▼
                         Cross-Encoder Reranker (Top-K=5)
                         ms-marco-MiniLM-L-6-v2
                                        │
                                        ▼
                    Prompt Assembly (system + context + question)
                                        │
                                        ▼
                    Qwen2.5-72B-Instruct (HF Inference API)
                                        │
                                        ▼
                    Streamed SSE tokens ──▶ Frontend ChatPanel
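
The query-time half of the diagram can be sketched in plain Python. This is a toy illustration of two-stage retrieval, not the project's retriever: hand-written two-dimensional vectors stand in for all-MiniLM-L6-v2 embeddings, and the `rerank_score` callable stands in for the ms-marco-MiniLM-L-6-v2 cross-encoder. The names `retrieve`, `cosine`, and `word_overlap` are assumptions for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Chunk = Tuple[str, Sequence[float]]  # (chunk text, embedding vector)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_text: str,
    query_vec: Sequence[float],
    chunks: List[Chunk],
    rerank_score: Callable[[str, str], float],
    top_k_retrieval: int = 10,
    top_k_rerank: int = 5,
) -> List[str]:
    # Stage 1: cheap vector similarity narrows the corpus to a candidate set.
    candidates = sorted(
        chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True
    )[:top_k_retrieval]
    # Stage 2: a more expensive scorer reads the query and each candidate together.
    reranked = sorted(
        (text for text, _ in candidates),
        key=lambda text: rerank_score(query_text, text),
        reverse=True,
    )
    return reranked[:top_k_rerank]


def word_overlap(query: str, text: str) -> float:
    """Stand-in reranker: counts the words the query and the chunk share."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


if __name__ == "__main__":
    corpus = [
        ("revenue grew 10%", [1.0, 0.0]),
        ("the cat sat", [0.0, 1.0]),
        ("revenue and profit rose", [0.9, 0.1]),
    ]
    print(retrieve("revenue and profit", [1.0, 0.0], corpus, word_overlap, 2, 1))
```

The split matters because the first stage only compares precomputed vectors, while the second stage scores each query and chunk pair jointly and is therefore run only on the small candidate set.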

📡 API Reference

Method	Endpoint	Auth	Description
`POST`	`/api/v1/auth/register`	❌	Create a new user account
`POST`	`/api/v1/auth/login`	❌	Log in and receive a JWT
`GET`	`/api/v1/auth/me`	✅	Get current user profile
`POST`	`/api/v1/documents/upload`	✅	Upload PDF/DOCX and enqueue background indexing (`202 Accepted`)
`GET`	`/api/v1/documents`	✅	List all documents for current user
`GET`	`/api/v1/documents/{id}/status`	✅	Poll background document processing status
`DELETE`	`/api/v1/documents/{id}`	✅	Delete a document and its vector data
`POST`	`/api/v1/chat/ask/stream`	✅	Ask a question (SSE streaming response)
`GET`	`/api/v1/chat/history/{doc_id}`	✅	Get chat history for a document
`DELETE`	`/api/v1/chat/history/{doc_id}`	✅	Clear chat history for a document
`GET`	`/health`	❌	Health check (db + chroma status)

Full interactive docs available at /docs (Swagger UI) when running locally.
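
Because uploads return `202 Accepted` and indexing runs in the background, a client has to poll the status endpoint before chatting. The sketch below shows one way to structure that loop. The status strings (`pending`, `processing`) and the helper name `wait_until_indexed` are assumptions, since the README does not list the actual status values; `fetch_status` is any callable that performs the `GET /api/v1/documents/{id}/status` request and returns the status string.

```python
import time
from typing import Callable

# Hypothetical status values; check the Swagger docs for the real ones.
PENDING_STATES = {"pending", "processing"}


def wait_until_indexed(
    fetch_status: Callable[[], str],
    interval_s: float = 1.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the document leaves a pending state, then return the final status."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status not in PENDING_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("document was not indexed before the timeout")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated responses in place of real HTTP calls.
    responses = iter(["pending", "processing", "ready"])
    print(wait_until_indexed(lambda: next(responses), sleep=lambda _: None))
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP library and makes it easy to test without a running server.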

📦 Environment Variables

Variable	Required	Default	Description	Where to Get It
`SECRET_KEY`	✅	—	JWT signing & session secret. Use a strong random string.	Generate: `python -c "import secrets; print(secrets.token_urlsafe(32))"`
`HF_TOKEN`	✅	—	HuggingFace API token for LLM inference via Inference API.	huggingface.co/settings/tokens (free)
`ENVIRONMENT`	❌	`development`	Runtime mode. Set to `production` in deployed environments so that the `ALLOWED_ORIGINS` CORS restriction is enforced.	—
`DEBUG`	❌	`False`	Enable debug mode with detailed error pages. Never enable in production.	—
`ALLOWED_ORIGINS`	❌	`http://localhost:3000,http://localhost:7860`	Comma-separated CORS origins (only enforced in production).	Your deployed domain(s)
`DATABASE_URL`	❌	`sqlite:///./data/app.db`	SQLAlchemy database connection string.	SQLite (default), or your Postgres/MySQL connection string
`JWT_ALGORITHM`	❌	`HS256`	JWT signing algorithm.	—
`JWT_EXPIRY_HOURS`	❌	`72`	JWT lifetime in hours; once it expires the user must log in again.	—
`GOOGLE_CLIENT_ID`	❌	—	Google OAuth web client ID used by FastAPI to verify ID tokens.	Google Cloud Console
`NEXT_PUBLIC_GOOGLE_CLIENT_ID`	❌	—	Google OAuth web client ID exposed to the Next.js Google sign-in button.	Google Cloud Console
`UPLOAD_DIR`	❌	`./data/uploads`	Local directory for storing uploaded documents.	—
`MAX_FILE_SIZE_MB`	❌	`50`	Maximum allowed upload file size in MB.	—
`ALLOWED_EXTENSIONS`	❌	`pdf,docx,txt,md`	Comma-separated list of permitted file extensions.	—
`CHROMA_PERSIST_DIR`	❌	`./data/chroma_db`	Directory where ChromaDB persists its vector index.	—
`LLM_MODEL`	❌	`Qwen/Qwen2.5-72B-Instruct`	HuggingFace model ID for answer generation.	huggingface.co/models
`LLM_TEMPERATURE`	❌	`0.3`	LLM sampling temperature (0 = deterministic, 1 = creative).	—
`LLM_MAX_NEW_TOKENS`	❌	`1024`	Maximum tokens per LLM response.	—
`EMBEDDING_MODEL`	❌	`sentence-transformers/all-MiniLM-L6-v2`	SentenceTransformer model for local embeddings (no external API).	huggingface.co/sentence-transformers
`EMBEDDING_DIMENSION`	❌	`384`	Embedding vector dimension (must match the model).	—
`RERANKER_MODEL`	❌	`cross-encoder/ms-marco-MiniLM-L-6-v2`	Cross-encoder model for reranking retrieved chunks by relevance.	huggingface.co/cross-encoder
`CHUNK_SIZE`	❌	`1000`	Characters per document chunk. Larger = more context, smaller = better precision.	—
`CHUNK_OVERLAP`	❌	`200`	Overlap between consecutive chunks to maintain boundary context.	—
`TOP_K_RETRIEVAL`	❌	`10`	Candidate chunks retrieved from vector store during semantic search.	—
`TOP_K_RERANK`	❌	`5`	Final chunks passed to the LLM after reranking (must be ≤ `TOP_K_RETRIEVAL`).	—
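
The project loads these variables through Pydantic settings in `backend/app/config.py`. As a dependency-free illustration of how the defaults and constraints in the table fit together, here is a sketch using a standard-library dataclass instead. The names `Settings` and `load_settings` are assumptions, only a subset of the variables is covered, and the check that `CHUNK_OVERLAP` is smaller than `CHUNK_SIZE` is an added sanity rule rather than something the README states.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    secret_key: str
    hf_token: str
    allowed_origins: Tuple[str, ...]
    chunk_size: int
    chunk_overlap: int
    top_k_retrieval: int
    top_k_rerank: int


def _csv(value: str) -> Tuple[str, ...]:
    """Split a comma-separated value, dropping blanks and surrounding whitespace."""
    return tuple(item.strip() for item in value.split(",") if item.strip())


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # SECRET_KEY and HF_TOKEN are the only required variables in the table.
    missing = [name for name in ("SECRET_KEY", "HF_TOKEN") if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")

    settings = Settings(
        secret_key=env["SECRET_KEY"],
        hf_token=env["HF_TOKEN"],
        allowed_origins=_csv(
            env.get("ALLOWED_ORIGINS", "http://localhost:3000,http://localhost:7860")
        ),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        top_k_retrieval=int(env.get("TOP_K_RETRIEVAL", "10")),
        top_k_rerank=int(env.get("TOP_K_RERANK", "5")),
    )
    # Documented constraint: the reranker cannot return more chunks than it receives.
    if settings.top_k_rerank > settings.top_k_retrieval:
        raise ValueError("TOP_K_RERANK must be <= TOP_K_RETRIEVAL")
    # Assumed sanity rule: an overlap as large as the chunk would never advance.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```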

📜 Scripts

Backend (`backend/`)

Command	Description
`uvicorn app.main:app --reload`	Start FastAPI with hot reload
`uvicorn app.main:app --port 8000`	Start FastAPI on port 8000

Frontend (`frontend/`)

Command	Description
`npm run dev`	Start Next.js dev server
`npm run build`	Production build → `out/` (static export)
`npm run lint`	Run ESLint
`npm run test:e2e`	Run Playwright end-to-end tests

Docker

Command	Description
`docker compose up --build`	Build and start the full stack
`docker compose down`	Stop all containers

🌐 Deployment

This project is deployed on HuggingFace Spaces using Docker.

HuggingFace Spaces

Fork this repo and create a new Space at huggingface.co/new-space (SDK: Docker)
Set the following Space secrets:
- HF_TOKEN — your HuggingFace API token
- SECRET_KEY — a strong random string
Push to the hf remote — the Space will auto-build

git remote add hf https://<username>:<HF_TOKEN>@huggingface.co/spaces/<username>/<space-name>
git push hf main

Self-Hosted / VPS

docker compose up -d --build
# App available at http://your-server:7860

🤝 Contributing

This project is participating in GirlScript Summer of Code! We welcome contributors of all skill levels.

Branch Strategy:

Branch	Purpose
`main`	Production — HuggingFace deployed (admin only)
`dev`	All contributor PRs target here
`feature/*` / `fix/*` / `docs/*`	Your working branches

# Always branch from dev (assumes the main repository is configured and fetched as the `upstream` remote)
git checkout -b feature/my-feature upstream/dev

📄 License

Distributed under the MIT License. See LICENSE for more information.

Built with 💙 as a flagship AI engineering project

If you found this project helpful, please give it a ⭐ — it helps contributors discover it!
