MasterRAG — Multimodal RAG with Gemini & YOLO

A production-ready pipeline for extracting and retrieving information from complex documents containing text, tables, images, and formulas. Uses YOLO for document-layout detection, Camelot for table parsing, and Gemini for semantic understanding.

Features

"Eyes & Brain" Approach — Computer Vision (YOLO) + Multimodal LLMs (Gemini).
Hybrid Retrieval — Vector search + BM25 keyword search fused with Reciprocal Rank Fusion (RRF).
Agentic RAG — LangGraph-based agent with query optimization and optional fact-checking via Tavily.
Visual Understanding — Extracts and captions charts, diagrams, formulas, and tables.
Auth & Multi-tenancy — JWT-based auth with per-user document isolation.
Fully Configurable — All models, keys, and settings controlled via .env.
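To make the Hybrid Retrieval bullet concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked result lists. It is an illustration of the general technique, not the code in db.py or retriever.py; the function name, the document IDs, and the constant k=60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. k=60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only rank positions, it needs no normalization between vector similarity scores and BM25 scores, which live on different scales.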

├── config.py           # Central configuration (reads .env)
├── db.py               # MongoDB vector service (handles indexing & vector/BM25 search)
├── embedding.py        # Embedding generation script (using Gemini models)
├── ingest.py           # PDF extraction pipeline (YOLO layout detection + Gemini vision)
├── retriever.py        # LangGraph RAG agent (query logic & fact checking)
├── server.py           # FastAPI backend server (auth, upload, query endpoints)
├── main.py             # CLI entry point for local operations (ingest, query)
├── diagnose_rag.py     # Diagnostic tool for testing vector/BM25/hybrid search manually
├── ui.py               # Streamlit chat UI frontend
├── prompts.py          # LLM prompt templates for analysis, extraction, and generation
├── logger.py           # Colored CLI logging setup
├── .env.example        # Template for environment variables (.env should be created locally)
├── static/             # Frontend assets (HTML/CSS/JS for the dashboard)
├── uploads/            # Temporary directory where uploaded PDFs are stored before ingestion
├── models/             # Directory for downloaded local weights (e.g., YOLO model)
├── data/               # Local data repository (used for large sample PDFs not pushed to Git)
└── output/             # Stores intermediate ingestion artifacts (useful for debugging!)

🔎 Verifying Extraction Quality in the `output/` Folder

Whenever a new document is ingested (e.g., document.pdf), the system automatically saves local artifacts inside the output/document/ directory. You can use these files to verify whether the pipeline extracted data properly:

output/{filename}/full_text.txt: The complete extracted text of the document before chunking. It shows how Gemini has represented or captioned tables, figures, and formulas.
output/{filename}/chunks.json: The chunks as they will be stored in MongoDB, so you can inspect chunk size, overlap, and the embedding representation.
output/{filename}/yolo/page_{num}.png: One image per processed page, with the bounding boxes predicted by the YOLO model drawn over it (tables, figures, headings, etc.). Use these to check visually that layout detection is accurate.
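A quick way to sanity-check chunk sizes is to summarize chunks.json from a script. The sketch below is a hedged example: the schema of chunks.json is not documented here, so it assumes a top-level JSON array whose items are either plain strings or objects with a "text" field. Both the helper name and the "text" key are hypothetical.

```python
import json
from pathlib import Path
from statistics import mean


def summarize_chunks(chunks_path, text_key="text"):
    """Return basic length statistics for the chunks in a chunks.json file.

    Assumption: the file holds a JSON array of strings or of objects
    that store the chunk text under `text_key`.
    """
    data = json.loads(Path(chunks_path).read_text(encoding="utf-8"))
    texts = [c if isinstance(c, str) else str(c.get(text_key, "")) for c in data]
    lengths = [len(t) for t in texts]
    return {
        "count": len(lengths),
        "min_chars": min(lengths, default=0),
        "max_chars": max(lengths, default=0),
        "mean_chars": round(mean(lengths), 1) if lengths else 0.0,
    }
```

For example, `summarize_chunks("output/document/chunks.json")` after ingesting document.pdf. A very small minimum or a very large maximum usually points to a chunking problem worth checking against full_text.txt.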

Quick Start

1. Install dependencies

python -m venv .venv
.venv\Scripts\activate      # Windows
source .venv/bin/activate   # macOS/Linux
pip install -r requirements.txt

Note: You also need Poppler for PDF-to-image conversion.

2. Configure environment

cp .env.example .env

Edit .env with your keys:

GOOGLE_API_KEY=your_gemini_key
TAVILY_API_KEY=your_tavily_key
MONGO_URI=your_mongodb_uri
DB_NAME=MasterRAG
GEMINI_MODEL=models/gemini-2.5-flash
EMBEDDING_MODEL=models/gemini-embedding-001
JWT_SECRET=your_secret
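As an illustration of how these variables might be consumed, the following sketch reads them from the process environment and fails early when a required key is missing. It is not the actual config.py: the Settings class, the choice of required keys, and the defaults (copied from the example values above) are assumptions, and it expects .env to have been loaded into the environment already, for instance with python-dotenv.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names mirror the .env keys."""
    google_api_key: str
    mongo_uri: str
    jwt_secret: str
    db_name: str = "MasterRAG"
    gemini_model: str = "models/gemini-2.5-flash"
    embedding_model: str = "models/gemini-embedding-001"
    tavily_api_key: str = ""  # assumed optional, since fact-checking is optional


# Assumption: these three keys are the ones the app cannot run without.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "MONGO_URI", "JWT_SECRET")


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    defaults = Settings(google_api_key="", mongo_uri="", jwt_secret="")
    return Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        mongo_uri=env["MONGO_URI"],
        jwt_secret=env["JWT_SECRET"],
        db_name=env.get("DB_NAME", defaults.db_name),
        gemini_model=env.get("GEMINI_MODEL", defaults.gemini_model),
        embedding_model=env.get("EMBEDDING_MODEL", defaults.embedding_model),
        tavily_api_key=env.get("TAVILY_API_KEY", defaults.tavily_api_key),
    )
```

Validating at startup turns a missing key into one clear error message instead of a failure deep inside an ingestion or query call.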

3. Run the API server

python server.py

Server starts at http://localhost:8000.

4. Open the UI

Open http://localhost:8000 in a browser to use the UI.

CLI Usage

# Ingest a PDF
python main.py ingest path/to/document.pdf --user-id user123

# Query
python main.py query "What is the revenue growth?" --user-id user123
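The two commands above follow a subcommand pattern. As a rough sketch of how such a CLI can be parsed with argparse (not the actual main.py; the argument names pdf_path and question, and the choice to make --user-id required, are assumptions):

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the `ingest` and `query` commands."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    ingest = sub.add_parser("ingest", help="Ingest a PDF")
    ingest.add_argument("pdf_path", help="Path to the PDF file")
    ingest.add_argument("--user-id", required=True, help="Owner of the document")

    query = sub.add_parser("query", help="Ask a question")
    query.add_argument("question", help="Question text")
    query.add_argument("--user-id", required=True, help="User whose documents are searched")
    return parser


# Parse the query example from above without touching sys.argv.
args = build_parser().parse_args(
    ["query", "What is the revenue growth?", "--user-id", "user123"]
)
print(args.command, args.user_id)  # query user123
```

Passing the user ID on every command matches the per-user document isolation described under Features.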

API Endpoints

| Method | Endpoint | Auth | Description |
| --- | --- | --- | --- |
| POST | `/api/register` | No | Create account |
| POST | `/api/login` | No | Get JWT token |
| POST | `/api/documents/upload` | Bearer | Upload & ingest PDF |
| GET | `/api/documents` | Bearer | List user's documents |
| POST | `/api/query` | Bearer | Ask a question (RAG) |
| GET | `/api/history` | Bearer | Get query history |
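A typical client flow is to log in, keep the JWT, and send it as a Bearer token on later calls. The sketch below uses only the standard library. The endpoint paths come from the table above, but the JSON field names (username, password, access_token, question) are assumptions, so check server.py for the real request and response schemas.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_json_request(path, payload, token=None):
    """Build a POST request with a JSON body and an optional Bearer token."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def post_json(path, payload, token=None):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_json_request(path, payload, token)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def ask(username, password, question):
    """Log in, then query. All field names here are hypothetical."""
    login = post_json("/api/login", {"username": username, "password": password})
    token = login["access_token"]  # assumed response key
    return post_json("/api/query", {"question": question}, token=token)
```

With the server running, `ask("alice", "secret", "What is the revenue growth?")` would perform both calls, provided the assumed field names match the server's.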

🧠 Extraction Pipeline Deep Dive

For a comprehensive technical deep dive into how our multimodal extraction pipeline works (incorporating YOLO11, Gemini Vision, and LangGraph), please refer to:

👉 extraction_pipeline_deep_dive.md

📸 Screenshots

License

MIT License
