param20h/PDF-Assistant-RAG

title Document AI Analyst
emoji 🧠
colorFrom indigo
colorTo purple
sdk docker
app_port 7860
pinned true
license mit
short_description Enterprise Agentic RAG — upload PDFs and chat with AI

██████╗ ██████╗ ███████╗     █████╗ ███████╗███████╗██╗███████╗████████╗ █████╗ ███╗   ██╗████████╗
██╔══██╗██╔══██╗██╔════╝    ██╔══██╗██╔════╝██╔════╝██║██╔════╝╚══██╔══╝██╔══██╗████╗  ██║╚══██╔══╝
██████╔╝██║  ██║█████╗      ███████║███████╗███████╗██║███████╗   ██║   ███████║██╔██╗ ██║   ██║
██╔═══╝ ██║  ██║██╔══╝      ██╔══██║╚════██║╚════██║██║╚════██║   ██║   ██╔══██║██║╚██╗██║   ██║
██║     ██████╔╝██║         ██║  ██║███████║███████║██║███████║   ██║   ██║  ██║██║ ╚████║   ██║
╚═╝     ╚═════╝ ╚═╝         ╚═╝  ╚═╝╚══════╝╚══════╝╚═╝╚══════╝   ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═══╝   ╚═╝
                                                                                                    
                        ██████╗  █████╗  ██████╗
                        ██╔══██╗██╔══██╗██╔════╝
                        ██████╔╝███████║██║  ███╗
                        ██╔══██╗██╔══██║██║   ██║
                        ██║  ██║██║  ██║╚██████╔╝
                        ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝

Enterprise Agentic Retrieval-Augmented Generation System


FastAPI Next.js Python LangChain ChromaDB HuggingFace Docker License: MIT


Upload · Embed · Retrieve · Chat — A production-grade AI document assistant built end-to-end with an agentic RAG pipeline, streaming responses, and per-user data isolation.


Features · Tech Stack · Getting Started · Architecture · RAG Pipeline · API Reference · Deployment · Contributing


🤝 Contributors

Thanks to all the amazing people who have contributed to PDF-Assistant-RAG! 🎉

📋 Contributions

Avatar Contributor Role Key Contributions
@param20h — Paramjit Singh 🧭 Project Lead Founded the project; core RAG architecture (FastAPI + ChromaDB + Next.js); Docker multi-stage & HuggingFace Spaces deployment; GitHub Actions CI/CD; JWT auth & Google OAuth; documentation with pipeline diagrams; project governance
@Yuvraj-Sarathe — Yuvraj Sarathe 📐 Docs & Build Added Mermaid architecture diagram; inline RAG pipeline comments; .env.example docs; created Makefile with concurrent dev commands & CHANGELOG.md
@SatyamPrakash09 — Satyam Prakash ⚙️ Backend Engineer Chat history export & auto-refresh auth; user profile endpoints; /health endpoint (Vector DB + SQL DB monitoring); document pagination; MIME file validation; JWT access + refresh token system
@akmhatey-ai 🎨 UI/UX Chat textarea auto-resize; clear messages on document switch
@drishtisharma14052007-eng — Drishti Sharma 📝 Documentation GSSOC contributor FAQ content
@Pika-pika06 — Pika 📝 Documentation Changelog tracking historical commits through v0.4.0
@blinkerbit / @algojogacor — Arya Rizky 🐛 Bug Buster Fixed 0-indexed page number display in source cards
@HirenGajjar 🔒 DevOps Restricted CORS origins via ALLOWED_ORIGINS environment variable
@Kaustub26Pvgda — Kaustub Pavagada Frontend Engineer Copy LLM response capability; increased max file upload size to 20 MB
@akshy-yy — Akshaya 🎨 Frontend Engineer Typing indicator animation while AI responds
@GHX5T-SOL — Bruce Wayne 🐛 Bug Buster Backend offline error message display
@viswanatha 📊 Observability Health check endpoint

🌟 Want to join them? Check out CONTRIBUTING.md for contribution guidelines and look for good first issues to get started!



🌟 Overview

PDF-Assistant-RAG is a complete, production-ready AI document assistant that lets users upload complex PDFs, financial reports, legal contracts, and research papers — then chat with an AI that provides accurate, cited answers powered by a multi-stage Retrieval-Augmented Generation pipeline.

The system uses semantic search + cross-encoder reranking to find the most relevant document chunks, streams AI-generated answers token-by-token, and highlights exact source citations with page numbers — all inside a sleek Next.js UI with JWT-secured per-user data isolation.


🏗️ Architecture

graph TD
    subgraph Frontend["Frontend (Next.js 16)"]
        UI["Dashboard UI (React)"]
        Chat["Chat Panel (SSE)"]
        Viewer["PDF Viewer (iframe)"]
    end

    subgraph Backend["Backend (FastAPI 0.115+)"]
        API["API Router (/api/v1)"]
        Auth["Auth (JWT/bcrypt)"]
        DB[(SQLite Metadata)]

        subgraph RAG["RAG Pipeline"]
            Upload["Ingestion Task (Chunking)"]
            Embed["Local Embeddings (all-MiniLM-L6-v2)"]
            Retriever["Two-Stage Retriever"]
            Rerank["Cross-Encoder Reranker"]
            Agent["Agent/Generator"]
        end
    end

    subgraph Storage["Vector Storage"]
        Chroma[(ChromaDB)]
    end

    subgraph External["External Services"]
        HF["HuggingFace Inference API (Qwen 72B)"]
    end

    %% Frontend to Backend Connections
    UI <-->|REST / Auth| API
    Chat <-->|SSE Streaming| API
    Viewer -->|Fetch PDF| API

    %% Backend Internals
    API <--> Auth
    API <--> DB
    API --> Upload
    API <--> Retriever
    API <--> Agent

    %% RAG Ingestion Flow
    Upload --> Embed
    Embed -->|Store Vectors| Chroma

    %% RAG Query Flow
    Retriever -->|1. Semantic Search| Chroma
    Retriever -->|2. Score & Sort| Rerank
    Retriever -->|Context| Agent

    %% External LLM Flow
    Agent <-->|LLM Generation| HF

🔄 System Flow Overview

  1. The user interacts with the Next.js frontend to upload documents and ask questions.
  2. FastAPI handles authentication, document ingestion, and chat APIs.
  3. Uploaded documents are parsed, chunked, and converted into vector embeddings.
  4. Embeddings are stored in ChromaDB for semantic retrieval.
  5. During querying, the retriever fetches relevant chunks from ChromaDB.
  6. A reranker improves retrieval quality before sending context to the LLM.
  7. Hugging Face Inference API generates the final response.
  8. Responses are streamed back to the frontend using SSE.
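As a rough illustration of the last step, the sketch below frames tokens as Server-Sent Events. The function name `sse_events`, the `{"token": ...}` payload shape, and the `[DONE]` end marker are assumptions made for this example; the actual backend may structure its events differently.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in a Server-Sent Events frame (hypothetical payload shape)."""
    for token in tokens:
        # JSON-encode the token so that a newline inside it cannot break the framing.
        yield f"data: {json.dumps({'token': token})}\n\n"
    # Hypothetical end-of-stream marker; the real API may signal completion differently.
    yield "data: [DONE]\n\n"


if __name__ == "__main__":
    for frame in sse_events(["Hello", ",", " world"]):
        print(frame, end="")
```

Each frame ends with a blank line, which is what lets a browser `EventSource` or a fetch-based reader split the stream into discrete events.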

🛠 Tech Stack

Backend

Technology Purpose
FastAPI 0.115+ Async REST API framework
Python 3.11 Runtime environment
SQLite + SQLAlchemy User & document metadata storage
JWT + Passlib Authentication & authorization
LangChain RAG orchestration
ChromaDB Persistent vector store (per-user)
HuggingFace Hub LLM inference API

Frontend

Technology Purpose
Next.js 16 React framework (App Router)
Tailwind CSS v4 Utility-first styling
shadcn/ui Accessible component library
TypeScript Type-safe frontend
react-pdf In-browser PDF viewer
react-markdown + GFM Markdown-rendered AI responses

AI / ML Pipeline

Technology Purpose
all-MiniLM-L6-v2 Local sentence embeddings
ms-marco-MiniLM-L-6-v2 Cross-encoder reranker
Qwen2.5-72B-Instruct LLM (HuggingFace Inference API)
PyMuPDF + python-docx Document parsing

DevOps & Tooling

Technology Purpose
Docker Multi-Stage Containerized deployment
GitHub Actions CI pipeline (dev branch)
Git LFS Binary asset management
HuggingFace Spaces Production deployment

✨ Key Features

👤 Users

  • 🔐 JWT-secured register & login
  • 📄 Upload PDF and DOCX documents
  • 💬 Ask questions in natural language
  • 🌊 Streaming AI responses token-by-token
  • 📚 Inline source citations with page numbers
  • 🗂️ Complete per-user data isolation

🤖 RAG Pipeline

  • 🔪 Smart recursive text chunking (configurable size & overlap)
  • 🧠 Local embeddings — no data leaves your machine
  • 🔍 Two-stage retrieval — semantic search → cross-encoder rerank
  • ✂️ Top-K filtering for more precise answers
  • 📝 Custom system prompts with citation instructions
  • 🧾 Source scoring with confidence levels
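To make the size and overlap settings concrete, here is a minimal sketch of overlapping chunking. It is a simplified fixed-window splitter, not the recursive splitter the project uses, and the function name `chunk_text` is hypothetical. The defaults mirror the documented `CHUNK_SIZE` and `CHUNK_OVERLAP` values.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap their neighbours.

    Simplified stand-in: a recursive splitter would also prefer to cut on
    paragraph and sentence boundaries, which this sketch does not attempt.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    parts = chunk_text("x" * 2500)
    print([len(p) for p in parts])
```

With the default settings, consecutive chunks share 200 characters, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.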

⚙️ Engineering

  • 🚀 Async FastAPI with Server-Sent Events streaming
  • 🗄️ ChromaDB with persistent per-user collections
  • 🐳 Multi-stage Docker build (Node → Python)
  • 🔄 GitHub Actions CI on dev branch
  • 🛡️ CORS, file validation, JWT expiry
  • 📊 Chat history persistence per document
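For readers unfamiliar with how JWT expiry works, the sketch below signs and verifies an HS256 token using only the standard library. It is illustrative, not the project's `auth.py`: the function names, the `sub` claim, and the `now` parameter are assumptions, and a real deployment would normally rely on a maintained JWT library. The 72-hour default mirrors `JWT_EXPIRY_HOURS`.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(subject: str, secret: str, expiry_hours: int = 72,
                 now: Optional[float] = None) -> str:
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + expiry_hours * 3600}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload


if __name__ == "__main__":
    token = create_token("user-1", "your-strong-random-secret")
    print(verify_token(token, "your-strong-random-secret")["sub"])
```

Once `exp` has passed, verification fails and the client has to log in again or use a refresh token.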

📁 Project Structure

PDF-Assistant-RAG/
│
├── backend/                          # FastAPI + RAG server
│   ├── app/
│   │   ├── main.py                   # App entrypoint, middleware, static files
│   │   ├── config.py                 # Pydantic settings (env vars)
│   │   ├── database.py               # SQLAlchemy async engine
│   │   ├── models.py                 # ORM models (User, Document, Message)
│   │   ├── schemas.py                # Pydantic request/response schemas
│   │   ├── auth.py                   # JWT creation & verification
│   │   │
│   │   ├── routes/
│   │   │   ├── auth.py               # POST /register, /login, /me
│   │   │   ├── documents.py          # Upload, list, delete, retrieve
│   │   │   └── chat.py               # Streaming chat + history
│   │   │
│   │   └── rag/
│   │       ├── agent.py              # Main RAG orchestrator
│   │       ├── chunker.py            # Recursive text splitter
│   │       ├── embeddings.py         # SentenceTransformer wrapper
│   │       ├── vectorstore.py        # ChromaDB collection manager
│   │       ├── retriever.py          # Semantic search + reranking
│   │       └── prompts.py            # System & user prompt templates
│   │
│   ├── requirements.txt
│   └── .env                          # Local env (never committed)
│
├── frontend/                         # Next.js 16 App Router
│   └── src/
│       ├── app/
│       │   ├── layout.tsx            # Root layout + fonts
│       │   ├── page.tsx              # Landing / redirect
│       │   ├── login/                # Auth pages
│       │   ├── register/
│       │   └── dashboard/            # Main app page
│       │
│       ├── components/
│       │   ├── chat/
│       │   │   ├── ChatPanel.tsx     # Chat UI + SSE streaming
│       │   │   ├── MessageBubble.tsx # User / assistant message
│       │   │   └── SourceCard.tsx    # Citation cards
│       │   ├── document/             # Upload + sidebar components
│       │   └── layout/               # Navbar, sidebar shell
│       │
│       └── lib/
│           └── api.ts                # Typed API client + SSE stream helper
│
├── .github/
│   ├── workflows/
│   │   ├── ci.yml                    # CI — runs on dev branch only
│   │   ├── deploy.yml                # Docker build — main branch only
│   │   └── devsecops.yml             # Security scans — main branch only
│   ├── ISSUE_TEMPLATE/               # Bug report & feature request forms
│   ├── pull_request_template.md      # PR checklist
│   └── CODEOWNERS                    # Auto-review assignment
│
├── Dockerfile                        # Multi-stage: Node build → Python serve
├── docker-compose.yml                # Local Docker stack
├── CONTRIBUTING.md                   # Contributor guide
└── .env.example                      # Template for environment variables

🚀 Getting Started

Prerequisites

  • Python 3.11+
  • Node.js 20+
  • HuggingFace account (free) for LLM inference

1. Clone the Repository

git clone https://github.com/param20h/PDF-Assistant-RAG.git
cd PDF-Assistant-RAG

2. Configure Environment

cp .env.example backend/.env

Edit backend/.env:

SECRET_KEY=your-strong-random-secret
DATABASE_URL=sqlite:///./data/app.db
HF_TOKEN=hf_your_huggingface_token_here
UPLOAD_DIR=./data/uploads
CHROMA_PERSIST_DIR=./data/chroma_db

Get your free HuggingFace token at huggingface.co/settings/tokens
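The sketch below shows one way such variables could be read and checked at startup. It uses only the standard library, whereas the project itself loads settings through Pydantic in `config.py`. The `Settings` class and `load_settings` function are hypothetical, and treating `SECRET_KEY` and `HF_TOKEN` as mandatory is an assumption based on their having no documented default.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    secret_key: str
    hf_token: str
    database_url: str
    upload_dir: str
    chroma_persist_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the five variables from backend/.env, failing fast when a secret is absent."""
    # Assumption: these two have no default, so they are treated as required here.
    missing = [name for name in ("SECRET_KEY", "HF_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return Settings(
        secret_key=env["SECRET_KEY"],
        hf_token=env["HF_TOKEN"],
        # Fallbacks mirror the defaults listed under Environment Variables.
        database_url=env.get("DATABASE_URL", "sqlite:///./data/app.db"),
        upload_dir=env.get("UPLOAD_DIR", "./data/uploads"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./data/chroma_db"),
    )


if __name__ == "__main__":
    demo = load_settings({"SECRET_KEY": "change-me", "HF_TOKEN": "hf_example"})
    print(demo.database_url)
```

Failing at startup when a secret is missing is usually easier to debug than a 401 from the inference API on the first chat request.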

3. Run Locally

Open two terminals:

# Terminal A — Backend
cd backend
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
# → API running at http://localhost:8000
# → Swagger docs at http://localhost:8000/docs

# Terminal B — Frontend
cd frontend
npm install
npm run dev
# → App running at http://localhost:3000

4. Run with Docker

docker compose up --build
# → Full stack at http://localhost:7860

🧠 RAG Pipeline

                    ┌─────────────────────────────────────────────┐
                    │              PDF / DOCX Upload               │
                    └───────────────────┬─────────────────────────┘
                                        │
                                        ▼
                    ┌─────────────────────────────────────────────┐
                    │         PyMuPDF / python-docx Parser         │
                    │         (text extraction per page)           │
                    └───────────────────┬─────────────────────────┘
                                        │
                                        ▼
                    ┌─────────────────────────────────────────────┐
                    │      Recursive Character Text Splitter       │
                    │   chunk_size=1000  |  overlap=200            │
                    └───────────────────┬─────────────────────────┘
                                        │
                                        ▼
                    ┌─────────────────────────────────────────────┐
                    │    all-MiniLM-L6-v2  (local embeddings)      │
                    │    384-dim dense vectors                      │
                    └───────────────────┬─────────────────────────┘
                                        │
                                        ▼
                    ┌─────────────────────────────────────────────┐
                    │   ChromaDB  — per-user persistent collection │
                    └─────────────────────────────────────────────┘

                              ── At Query Time ──

  User Question ──▶ Embed ──▶ Semantic Search (Top-K=10)
                                        │
                                        ▼
                         Cross-Encoder Reranker (Top-K=5)
                         ms-marco-MiniLM-L-6-v2
                                        │
                                        ▼
                    Prompt Assembly (system + context + question)
                                        │
                                        ▼
                    Qwen2.5-72B-Instruct (HF Inference API)
                                        │
                                        ▼
                    Streamed SSE tokens ──▶ Frontend ChatPanel
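The query-time half of the diagram can be sketched as a two-stage ranking function. This is a toy illustration under stated assumptions: the vectors are hand-written rather than produced by all-MiniLM-L6-v2, `word_overlap` is a stand-in for the cross-encoder score, and every name here is hypothetical rather than taken from `retriever.py`.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def word_overlap(query: str, text: str) -> float:
    # Stand-in scorer: counts shared words. A cross-encoder would score the
    # (query, chunk) pair jointly with a neural model instead.
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def retrieve(
    query_text: str,
    query_vec: Vector,
    chunks: Sequence[tuple[str, Vector]],
    rerank_score: Callable[[str, str], float] = word_overlap,
    top_k_retrieval: int = 10,
    top_k_rerank: int = 5,
) -> list[str]:
    # Stage 1: cheap vector similarity narrows the corpus to a candidate set.
    candidates = sorted(
        chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True
    )[:top_k_retrieval]
    # Stage 2: a more expensive pairwise scorer reorders only those candidates.
    reranked = sorted(
        candidates, key=lambda c: rerank_score(query_text, c[0]), reverse=True
    )
    return [text for text, _ in reranked[:top_k_rerank]]


if __name__ == "__main__":
    corpus = [
        ("cats purr", [1.0, 0.0]),
        ("dogs bark loudly", [0.9, 0.1]),
        ("stocks fell", [0.0, 1.0]),
    ]
    print(retrieve("why do dogs bark", [1.0, 0.0], corpus,
                   top_k_retrieval=2, top_k_rerank=1))
```

The split exists because the reranker is far more expensive per chunk than a vector comparison, so it only runs on the short list that stage 1 returns.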

📡 API Reference

Method Endpoint Auth Description
POST /api/v1/auth/register Create a new user account
POST /api/v1/auth/login Login and receive JWT token
GET /api/v1/auth/me Get current user profile
POST /api/v1/documents/upload Upload PDF/DOCX and enqueue background indexing (202 Accepted)
GET /api/v1/documents List all documents for current user
GET /api/v1/documents/{id}/status Poll background document processing status
DELETE /api/v1/documents/{id} Delete a document and its vector data
POST /api/v1/chat/ask/stream Ask a question (SSE streaming response)
GET /api/v1/chat/history/{doc_id} Get chat history for a document
DELETE /api/v1/chat/history/{doc_id} Clear chat history for a document
GET /health Health check (db + chroma status)

Full interactive docs available at /docs (Swagger UI) when running locally.


📦 Environment Variables

| Variable | Required | Default | Description | Where to Get It |
| --- | --- | --- | --- | --- |
| SECRET_KEY | | | JWT signing & session secret. Use a strong random string. | Generate: python -c "import secrets; print(secrets.token_urlsafe(32))" |
| HF_TOKEN | | | HuggingFace API token for LLM inference via Inference API. | huggingface.co/settings/tokens (free) |
| ENVIRONMENT | | development | Runtime mode. Set to production for deployment to lock CORS. | |
| DEBUG | | False | Enable debug mode with detailed error pages. Never enable in production. | |
| ALLOWED_ORIGINS | | http://localhost:3000,http://localhost:7860 | Comma-separated CORS origins (only enforced in production). | Your deployed domain(s) |
| DATABASE_URL | | sqlite:///./data/app.db | SQLAlchemy database connection string. | SQLite (default), or your Postgres/MySQL connection string |
| JWT_ALGORITHM | | HS256 | JWT signing algorithm. | |
| JWT_EXPIRY_HOURS | | 72 | JWT token lifetime in hours before re-login is required. | |
| GOOGLE_CLIENT_ID | | | Google OAuth web client ID used by FastAPI to verify ID tokens. | Google Cloud Console |
| NEXT_PUBLIC_GOOGLE_CLIENT_ID | | | Google OAuth web client ID exposed to the Next.js Google sign-in button. | Google Cloud Console |
| UPLOAD_DIR | | ./data/uploads | Local directory for storing uploaded documents. | |
| MAX_FILE_SIZE_MB | | 50 | Maximum allowed upload file size in MB. | |
| ALLOWED_EXTENSIONS | | pdf,docx,txt,md | Comma-separated list of permitted file extensions. | |
| CHROMA_PERSIST_DIR | | ./data/chroma_db | Directory where ChromaDB persists its vector index. | |
| LLM_MODEL | | Qwen/Qwen2.5-72B-Instruct | HuggingFace model ID for answer generation. | huggingface.co/models |
| LLM_TEMPERATURE | | 0.3 | LLM sampling temperature (0 = deterministic, 1 = creative). | |
| LLM_MAX_NEW_TOKENS | | 1024 | Maximum tokens per LLM response. | |
| EMBEDDING_MODEL | | sentence-transformers/all-MiniLM-L6-v2 | SentenceTransformer model for local embeddings (no external API). | huggingface.co/sentence-transformers |
| EMBEDDING_DIMENSION | | 384 | Embedding vector dimension (must match the model). | |
| RERANKER_MODEL | | cross-encoder/ms-marco-MiniLM-L-6-v2 | Cross-encoder model for reranking retrieved chunks by relevance. | huggingface.co/cross-encoder |
| CHUNK_SIZE | | 1000 | Characters per document chunk. Larger = more context, smaller = better precision. | |
| CHUNK_OVERLAP | | 200 | Overlap between consecutive chunks to maintain boundary context. | |
| TOP_K_RETRIEVAL | | 10 | Candidate chunks retrieved from vector store during semantic search. | |
| TOP_K_RERANK | | 5 | Final chunks passed to the LLM after reranking (must be ≤ TOP_K_RETRIEVAL). | |
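Several of the tuning variables constrain one another, so it can help to validate them together at startup. The sketch below is one possible check, not the project's own validation: `load_rag_settings` is a hypothetical helper, and the rule that `CHUNK_OVERLAP` must be smaller than `CHUNK_SIZE` is an assumption added here (only the `TOP_K_RERANK` ≤ `TOP_K_RETRIEVAL` rule is documented above).

```python
from typing import Mapping

# Defaults copied from the table above.
DEFAULTS = {
    "CHUNK_SIZE": "1000",
    "CHUNK_OVERLAP": "200",
    "TOP_K_RETRIEVAL": "10",
    "TOP_K_RERANK": "5",
}


def load_rag_settings(env: Mapping[str, str]) -> dict[str, int]:
    """Parse the RAG tuning variables and reject inconsistent combinations."""
    settings = {name: int(env.get(name, default)) for name, default in DEFAULTS.items()}

    if settings["CHUNK_SIZE"] < 1:
        raise ValueError("CHUNK_SIZE must be at least 1")
    # Assumption: an overlap as large as the chunk would never advance the window.
    if not 0 <= settings["CHUNK_OVERLAP"] < settings["CHUNK_SIZE"]:
        raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")
    if settings["TOP_K_RETRIEVAL"] < 1:
        raise ValueError("TOP_K_RETRIEVAL must be at least 1")
    # Documented rule: the reranker cannot return more chunks than it was given.
    if not 1 <= settings["TOP_K_RERANK"] <= settings["TOP_K_RETRIEVAL"]:
        raise ValueError("TOP_K_RERANK must be between 1 and TOP_K_RETRIEVAL")
    return settings


if __name__ == "__main__":
    print(load_rag_settings({}))
```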

📜 Scripts

Backend (backend/)

Command Description
uvicorn app.main:app --reload Start FastAPI with hot reload
uvicorn app.main:app --port 8000 Start FastAPI on port 8000

Frontend (frontend/)

Command Description
npm run dev Start Next.js dev server
npm run build Production build → out/ (static export)
npm run lint Run ESLint
npm run test:e2e Run Playwright end-to-end tests

Docker

Command Description
docker compose up --build Build and start the full stack
docker compose down Stop all containers

🌐 Deployment

This project is deployed on HuggingFace Spaces using Docker.

HuggingFace Spaces

  1. Fork this repo and create a new Space at huggingface.co/new-space (SDK: Docker)
  2. Set the following Space secrets:
    • HF_TOKEN — your HuggingFace API token
    • SECRET_KEY — a strong random string
  3. Push to the hf remote; the Space will then build automatically:

git remote add hf https://<username>:<HF_TOKEN>@huggingface.co/spaces/<username>/<space-name>
git push hf main

Self-Hosted / VPS

docker compose up -d --build
# App available at http://your-server:7860

🤝 Contributing

This project is participating in GirlScript Summer of Code! We welcome contributors of all skill levels.

Branch Strategy:

Branch Purpose
main Production — HuggingFace deployed (admin only)
dev All contributor PRs target here
feature/* / fix/* / docs/* Your working branches

# Always branch from dev
git checkout -b feature/my-feature upstream/dev


📄 License

Distributed under the MIT License. See LICENSE for more information.



Built with 💙 as a flagship AI engineering project

If you found this project helpful, please give it a ⭐ — it helps contributors discover it!

