RAG Query System

A FastAPI backend for document processing and question answering using Retrieval-Augmented Generation (RAG). Built for HackRx 6.0 Hackathon.

Overview

RAG Query System enables accurate question answering over documents through retrieval, query decomposition, and reranking. It also supports hybrid search and equips the LLM with web search and language translation tools.
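
As a concrete illustration of the decomposition step, the sketch below shows how a complex question could be split into sub-queries with a small, fast model before retrieval. This is a minimal sketch using the google-genai SDK, not the repository's actual code; the prompt wording and helper function are assumptions.

    # Minimal query-decomposition sketch (illustrative, not the repo's code).
    # Assumes the google-genai SDK and a GEMINI_API_KEY in the environment.
    from google import genai

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment

    def decompose_query(question: str) -> list[str]:
        """Ask a lightweight model to split a complex question into sub-queries."""
        prompt = (
            "Split the following question into independent sub-questions, "
            "one per line. If it is already simple, return it unchanged.\n\n"
            f"Question: {question}"
        )
        response = client.models.generate_content(
            model="gemini-2.0-flash-lite",  # the decomposition model listed below
            contents=prompt,
        )
        return [line.strip() for line in response.text.splitlines() if line.strip()]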

Features

  • Hybrid Search: Combines semantic and keyword search for improved accuracy (see the fusion sketch after this list).
  • Query Decomposition: Splits complex questions into simpler sub-queries for better retrieval.
  • Reranking: Selects the most relevant document chunks for precise answers.
  • Web Browser & Translation: Gives the LLM access to web search and language translation tools.
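
How the semantic and keyword rankings are merged is not documented in this README; a common choice is reciprocal rank fusion (RRF). The sketch below illustrates RRF under that assumption and is not necessarily the fusion method this project uses.

    # Reciprocal rank fusion (RRF) sketch -- an assumed fusion method,
    # not confirmed to be what this repository implements.
    def rrf_fuse(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
        """Merge two ranked lists of chunk IDs, favoring chunks that rank
        highly in either the semantic or the keyword results."""
        scores: dict[str, float] = {}
        for ranking in (dense_ids, sparse_ids):
            for rank, chunk_id in enumerate(ranking):
                scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)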

Tech Stack

  • FastAPI (backend API)
  • unstructured.io & PyMuPDF4LLM (document parsing and chunking)
  • PostgreSQL + SQLAlchemy (database via Supabase)
  • Pinecone (vector database plus hosted embedding and reranking models, provided through the GCP Marketplace)
  • Google Gemini (LLM)
  • Google Cloud Platform (hosting, on the GCP Free Trial)

System Dependencies

The following system dependencies are installed via the Dockerfile:

  • curl
  • ca-certificates
  • libgl1
  • libmagic-dev
  • libglib2.0-0
  • poppler-utils
  • tesseract-ocr
  • tesseract-ocr-eng
  • tesseract-ocr-mal
  • libreoffice
  • wget
  • pandoc (installed from official release)
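
For reference, an install layer for these packages might look like the Dockerfile fragment below. This is an illustrative sketch assuming a Debian-based base image, not a copy of the project's actual Dockerfile; the pandoc release step is summarized in a comment because the pinned release URL is not reproduced here.

    # Illustrative Dockerfile fragment (assumes a Debian-based image);
    # see the repository's Dockerfile for the authoritative version.
    RUN apt-get update && apt-get install -y --no-install-recommends \
            curl ca-certificates libgl1 libmagic-dev libglib2.0-0 \
            poppler-utils tesseract-ocr tesseract-ocr-eng tesseract-ocr-mal \
            libreoffice wget \
        && rm -rf /var/lib/apt/lists/*
    # pandoc is installed separately from its official release (URL omitted here).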

Prerequisites

  • Python 3.12
  • uv (dependency management)
  • Access to required API keys (see .env.example)

Quick Start

  1. Install uv: Follow the uv installation guide.

  2. Clone the repository:

     git clone https://github.com/heshinth/rag-query-system
     cd rag-query-system

  3. Install dependencies:

     uv sync

  4. Configure environment variables:

     • Copy .env.example to .env and fill in your credentials.

  5. Run the application:

     fastapi run ./app/main.py

API Endpoints

  • GET /api/v1/health — Health check
  • POST /api/v1/hackrx/run — Process documents and answer questions
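
For illustration, a call to the run endpoint might look like the sketch below. The request and response shapes follow the typical HackRx format (a document URL plus a list of questions, answered as a list of strings); the field names, bearer token, and base URL are assumptions, so verify them against the source.

    # Illustrative client call -- field names and auth are assumptions,
    # based on the typical HackRx request format.
    import requests

    resp = requests.post(
        "http://localhost:8000/api/v1/hackrx/run",
        headers={"Authorization": "Bearer <your-token>"},  # placeholder token
        json={
            "documents": "https://example.com/policy.pdf",  # placeholder document URL
            "questions": ["What is the grace period for premium payment?"],
        },
        timeout=120,
    )
    print(resp.json())  # expected shape: {"answers": [...]}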

LLM Models Used

Model                        Provider          Purpose
Gemini 2.5 Flash             Google AI Studio  Main LLM
Gemini 2.0 Flash Lite        Google AI Studio  Query decomposition
Cohere Rerank 3.5            Pinecone          Reranking
llama-text-embed-v2          Pinecone          Dense embedding & semantic search
pinecone-sparse-english-v0   Pinecone          Sparse embedding & lexical search
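
The Pinecone-hosted models above are served through Pinecone's inference API. The sketch below shows how they might be invoked with the standard pinecone Python SDK; the inputs are placeholders and this is not the repository's actual code.

    # Minimal sketch of calling the Pinecone-hosted models (placeholder inputs,
    # not the repository's actual code). Assumes PINECONE_API_KEY is set.
    import os
    from pinecone import Pinecone

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

    # Dense embedding with llama-text-embed-v2
    dense = pc.inference.embed(
        model="llama-text-embed-v2",
        inputs=["What is the grace period for premium payment?"],
        parameters={"input_type": "query"},
    )

    # Rerank retrieved chunks with Cohere Rerank 3.5
    reranked = pc.inference.rerank(
        model="cohere-rerank-3.5",
        query="What is the grace period for premium payment?",
        documents=["chunk one ...", "chunk two ..."],
        top_n=2,
    )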

License

This project is licensed under the MIT License - see the LICENSE file for details.
