An intelligent document assistant with agentic retrieval, hybrid search, and multi-format support.
Built with FastAPI, React, PostgreSQL/pgvector, and LangChain.
Screenshots • Features • Quick Start • Docker • Architecture • Configuration • API • License
Agentic RAG is a production-ready Retrieval-Augmented Generation system that goes beyond simple document Q&A. It combines semantic vector search with SQL-based tabular analysis, using an agentic architecture that autonomously decides how to answer your questions.
Upload your documents — PDFs, Word files, spreadsheets, CSVs — and have a natural conversation with an AI that retrieves, analyzes, and cross-references your data intelligently.
- 🧠 Agentic Semantic Chunking — Instead of fixed-size chunks, an LLM analyzes your text and splits it at natural topic boundaries, preserving semantic coherence
- 🔀 Hybrid Search — Combines BM25 keyword matching with vector semantic search via Reciprocal Rank Fusion, delivering the best of both approaches
- 📊 Structured + Unstructured — Text documents are vectorized for semantic search; tabular data (CSV, Excel, JSON) is stored for SQL queries. The agent picks the right tool automatically
- 💬 Multi-Channel — Chat via the web UI, Telegram bot, or WhatsApp
- 🏭 Production-Ready — Automatic backups, health checks, audit logging, rate limiting, security headers, and async document processing
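The agentic chunking idea can be sketched in a few lines of Python (a simplified illustration, not the project's actual splitter — the `llm` callable and the prompt wording are stand-ins):

```python
def agentic_chunk(text: str, llm) -> list[str]:
    """Split text at topic boundaries proposed by an LLM.

    `llm` is any callable that receives a numbered paragraph list and
    returns the 1-based paragraph indices where a new topic begins.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    if len(paragraphs) <= 1:
        return paragraphs
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(paragraphs, start=1))
    answer = llm(f"List the paragraph numbers where a new topic starts:\n{numbered}")
    # Parse the indices out of the model's reply (digits only, deduplicated).
    boundaries = sorted({int(t) for t in answer.replace(",", " ").split() if t.isdigit()})
    chunks, start = [], 0
    for b in boundaries:
        if 0 < b - 1 < len(paragraphs) and b - 1 > start:
            chunks.append("\n\n".join(paragraphs[start:b - 1]))
            start = b - 1
    chunks.append("\n\n".join(paragraphs[start:]))
    return chunks
```

Unlike fixed-size splitting, the chunk boundaries here track topic shifts, so each chunk embeds as one coherent idea.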
AI-powered conversational interface with source attribution and tool selection badges.
Upload, organize, and manage documents across multiple formats with chunk tracking.
Configure API keys, select models, auto-detect Ollama — all from the UI.
System analytics with real-time metrics, resource monitoring, and embedding coverage.
- Upload PDF, TXT, Word (.docx), Markdown, CSV, Excel (.xlsx), JSON
- Organize documents into collections
- Preview content, add names and notes
- Duplicate detection and file size limits (100MB)
- Background async processing queue with progress indicators
- Soft delete with recovery
- Semantic vector search via PostgreSQL pgvector
- BM25 keyword search for exact terms, acronyms, and technical jargon
- Hybrid Reciprocal Rank Fusion combining both approaches
- Re-ranking with Cohere API or local Cross-Encoder models
- Agentic chunking — LLM-based semantic splitting instead of fixed-size chunks
- Anti-hallucination guardrails — the system knows when it doesn't know
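The Reciprocal Rank Fusion step is small enough to sketch (`k = 60` is the constant from the original RRF paper; the document ids and list shapes below are illustrative):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over every list that
    contains it, so items ranked highly by both BM25 and the vector
    search rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

In this system the two input rankings would come from BM25 and pgvector similarity search; the fused list then goes to the optional re-ranking stage.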
- ReAct Agent architecture with tool calling
- Automatic selection between RAG (text search) and SQL (tabular queries)
- Streaming responses via WebSocket
- Conversational context — query rewriting for follow-up questions
- Suggested follow-up questions after each response
- Response caching to reduce API costs
- Bilingual support (Italian & English)
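The automatic RAG-vs-SQL selection boils down to a dispatch loop like this sketch (the tool names and the keyword-based `decide` stub are illustrative stand-ins for the real LLM tool call):

```python
def run_agent(question: str, decide, tools: dict) -> str:
    """One step of a ReAct-style loop: `decide` (normally the LLM's
    tool call) picks a tool and its input; the tool produces the
    observation the agent answers from."""
    tool_name, tool_input = decide(question, list(tools))
    return tools[tool_name](tool_input)

TOOLS = {
    "rag_search": lambda q: f"[chunks matching: {q}]",   # hybrid text search
    "sql_query": lambda q: f"[rows for: {q}]",           # tabular analysis
}

def decide(question: str, available: list[str]) -> tuple[str, str]:
    # Stand-in for the LLM's decision: aggregate wording routes to SQL.
    wants_sql = any(w in question.lower() for w in ("average", "total", "count"))
    return ("sql_query" if wants_sql else "rag_search"), question
```

In the actual system the LLM makes this choice via tool calling, so no hand-written keyword rules are involved.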
- OpenAI — GPT-4o, GPT-4o-mini (default)
- OpenRouter — Access 100+ models (Llama, Mistral, Claude, etc.)
- Ollama — Run models locally with auto-detection of installed models
- All configurable from the Settings UI — no code changes needed
- Telegram Bot — Full document Q&A via Telegram, with document upload support
- WhatsApp Bot — Integration via Twilio API
- Ngrok tunneling — For webhook testing during local development
- Automatic daily backups with configurable retention
- Health checks — `/api/health`, `/api/ready`, `/api/live`, embedding integrity
- Audit logging for document operations
- Rate limiting via SlowAPI
- Security headers and request tracing
- File integrity verification scheduler
- Structured JSON logging
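The backup retention rule (keep everything newer than `BACKUP_RETENTION_DAYS`, delete the rest) can be sketched as follows — a stdlib illustration, not the project's actual backup service:

```python
from datetime import datetime, timedelta

def backups_to_delete(backups: dict, retention_days: int = 30, now=None) -> list[str]:
    """Given {backup_name: created_at}, return names past retention."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=retention_days)
    return [name for name, created in backups.items() if created < cutoff]
```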
- Dashboard with system analytics
- Maintenance panel — re-embedding, soft delete management, integrity checks
- Feedback system — rate AI responses and individual chunks
- Export conversations and analysis results
- Notes — personal annotations on documents
| Layer | Technology |
|---|---|
| Backend | Python 3.11+, FastAPI, LangChain/LangGraph |
| Frontend | React 18, TypeScript 5, Tailwind CSS, Zustand, React Query |
| Database | PostgreSQL 16 with pgvector extension |
| LLM | OpenAI / OpenRouter / Ollama |
| Embeddings | OpenAI text-embedding-3-small / Ollama / OpenRouter |
| Re-ranking | Cohere / Cross-Encoder |
| Deployment | Docker Compose (4 services) |
| Messaging | Telegram Bot API, Twilio (WhatsApp) |
- Python 3.11+
- Node.js 18+
- PostgreSQL 15+ with pgvector extension
- OpenAI API key (or Ollama for local models)
```bash
git clone https://github.com/logfab-stack/agentic-rag.git
cd agentic-rag
./init.sh
```

The setup script will:
- ✅ Check prerequisites (Python, Node, PostgreSQL, pgvector)
- ✅ Create Python virtual environment and install dependencies
- ✅ Install frontend dependencies
- ✅ Start both backend and frontend servers
Click to expand manual setup instructions
1. Database

```bash
# Create PostgreSQL database
createdb agentic_rag

# Enable pgvector extension
psql -d agentic_rag -c "CREATE EXTENSION IF NOT EXISTS vector;"
```

2. Backend
```bash
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your API keys and database URL

# Run database migrations
alembic upgrade head

# Start server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

3. Frontend
```bash
cd frontend
npm install
npm run dev
```

| Service | URL |
|---|---|
| 🌐 Web UI | http://localhost:3000 |
| 🔌 API | http://localhost:8000 |
| 📚 API Docs (Swagger) | http://localhost:8000/docs |
- Open Settings → enter your OpenAI API key
- Upload a document (PDF, Word, CSV, etc.)
- Start chatting — ask questions about your documents!
The recommended way to run in production:
```bash
# Copy and edit environment config
cp .env.docker.example .env

# Start all services
docker compose -f docker-compose.prod.yml up -d
```

| Service | Port | Description |
|---|---|---|
| PostgreSQL | 5432 | Database with pgvector |
| Backend | 8000 | FastAPI application |
| Frontend | 3000 | React app served via Nginx |
| Ollama (optional) | 11434 | Local LLM inference |
```bash
# Include Ollama with GPU support
docker compose -f docker-compose.prod.yml --profile ollama up -d

# HTTPS with SSL certificates
docker compose -f docker-compose.prod.yml -f docker-compose.ssl.yml up -d
```

All data is stored in Docker volumes:
- `postgres_data` — Database
- `backend_uploads` — Uploaded documents
- `backend_backups` — Automatic backups
- `backend_logs` — Application logs
```
┌─────────────────────────────────────────────────────────────┐
│                      React Frontend                         │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐   │
│  │   Chat   │ │Documents │ │Dashboard │ │   Settings   │   │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────┬───────┘   │
└───────┼─────────────┼────────────┼──────────────┼───────────┘
        │  REST API + WebSocket    │              │
┌───────┼─────────────┼────────────┼──────────────┼───────────┐
│       ▼             ▼            ▼              ▼           │
│                    FastAPI Backend                          │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │              ReAct Agent (LangChain)                │   │
│  │                                                     │   │
│  │  ┌─────────────┐ ┌──────────────┐ ┌───────────┐    │   │
│  │  │  RAG Tool   │ │   SQL Tool   │ │ Chat Tool │    │   │
│  │  │ (text docs) │ │(tabular data)│ │ (general) │    │   │
│  │  └──────┬──────┘ └──────┬───────┘ └───────────┘    │   │
│  └─────────┼────────────────┼──────────────────────────┘   │
│            │                │                               │
│  ┌─────────▼────────┐  ┌───▼──────────────┐                │
│  │  Hybrid Search   │  │  SQL Generation  │                │
│  │  Vector + BM25   │  │  (Pandas/SQL)    │                │
│  │  + Re-ranking    │  │                  │                │
│  └─────────┬────────┘  └───┬──────────────┘                │
└────────────┼────────────────┼───────────────────────────────┘
             │                │
    ┌────────▼────────────────▼────────┐
    │     PostgreSQL + pgvector        │
    │  ┌────────────┐ ┌────────────┐   │
    │  │  Vectors   │ │ Structured │   │
    │  │ (embeddings│ │  (rows,    │   │
    │  │  + chunks) │ │  metadata) │   │
    │  └────────────┘ └────────────┘   │
    └──────────────────────────────────┘
```
```
Document Upload
      │
      ├── Text (PDF, DOCX, TXT, MD)
      │        │
      │        ▼
      │   Agentic Semantic Chunking
      │        │  (LLM detects topic changes)
      │        ▼
      │   Generate Embeddings
      │        │
      │        ▼
      │   Store in pgvector
      │
      └── Tabular (CSV, XLSX, JSON)
               │
               ▼
          Parse with Pandas
               │
               ▼
          Extract Schema + Rows
               │
               ▼
          Store as JSONB in PostgreSQL
```
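The tabular branch above can be sketched with pandas (a simplified illustration; the real pipeline stores the payload as JSONB through the database layer):

```python
import io
import json

import pandas as pd

def ingest_tabular(csv_text: str) -> str:
    """Parse CSV with pandas, extract a schema, and serialize rows —
    mirroring the JSONB payload stored in PostgreSQL."""
    df = pd.read_csv(io.StringIO(csv_text))
    payload = {
        # Column name -> inferred dtype, so the SQL tool knows the schema.
        "schema": {col: str(dtype) for col, dtype in df.dtypes.items()},
        # to_json round-trip guarantees JSON-native row values.
        "rows": json.loads(df.to_json(orient="records")),
    }
    return json.dumps(payload)
```

Keeping the inferred schema alongside the rows is what lets the SQL tool generate typed queries later without re-reading the original file.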
Copy the example and customize:

```bash
cp backend/.env.example backend/.env
```

Key variables:
| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | `postgresql+asyncpg://postgres:postgres@localhost:5432/agentic_rag` |
| `OPENAI_API_KEY` | OpenAI API key | (set via UI) |
| `OPENROUTER_API_KEY` | OpenRouter API key | (optional) |
| `COHERE_API_KEY` | Cohere re-ranking API key | (optional) |
| `OLLAMA_BASE_URL` | Ollama endpoint | `http://localhost:11434` |
| `TELEGRAM_BOT_TOKEN` | Telegram bot token | (optional) |
| `BACKUP_ENABLED` | Enable automatic backups | `true` |
| `BACKUP_RETENTION_DAYS` | Days to keep backups | `30` |
💡 Most settings can be configured from the Settings UI — no need to edit files manually.
Configure via Settings UI or environment:
| Purpose | Options |
|---|---|
| Chat LLM | GPT-4o, GPT-4o-mini, Ollama models, OpenRouter models |
| Embeddings | OpenAI text-embedding-3-small, Ollama, OpenRouter |
| Re-ranking | Cohere, Cross-Encoder (local) |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/chat` | Send message (streaming via WebSocket) |
| GET | `/api/documents` | List all documents |
| POST | `/api/documents/upload` | Upload and process document |
| GET | `/api/documents/{id}` | Get document details |
| DELETE | `/api/documents/{id}` | Delete document |
| GET | `/api/collections` | List collections |
| POST | `/api/collections` | Create collection |
| GET | `/api/conversations` | List conversations |
| GET | `/api/settings` | Get configuration |
| PATCH | `/api/settings` | Update configuration |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/health` | Detailed health check |
| GET | `/api/ready` | Startup readiness probe |
| GET | `/api/live` | Liveness probe |
| GET | `/api/embeddings/health-check` | Embedding service health |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/admin/maintenance/dashboard` | System dashboard |
| POST | `/api/admin/maintenance/reembed` | Re-embed documents |
| POST | `/api/backup` | Create manual backup |
Full API documentation available at /docs (Swagger UI) when running.
```
agentic-rag/
├── backend/
│   ├── api/                  # REST API route handlers (17 modules)
│   ├── core/                 # Config, database, middleware, errors
│   ├── models/               # Pydantic + SQLAlchemy models
│   ├── services/             # Business logic (23 services)
│   │   ├── ai_service.py         # LLM orchestration, RAG pipeline
│   │   ├── agentic_splitter.py   # Semantic chunking engine
│   │   ├── bm25_service.py       # BM25 keyword search
│   │   ├── embedding_store.py    # Vector storage & retrieval
│   │   ├── telegram_service.py   # Telegram bot handler
│   │   └── ...
│   ├── alembic/              # Database migrations
│   ├── utils/                # Helper functions
│   ├── main.py               # FastAPI app entry point
│   ├── Dockerfile            # Backend container
│   └── requirements.txt      # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── components/       # React components (31 files)
│   │   ├── hooks/            # Custom React hooks
│   │   ├── services/         # API client functions
│   │   ├── types/            # TypeScript interfaces
│   │   └── App.tsx           # Root component & router
│   ├── nginx.conf            # Reverse proxy config
│   ├── Dockerfile            # Frontend container
│   └── package.json          # Node dependencies
├── prompts/                  # AI system prompts
├── docker-compose.prod.yml   # Production deployment
├── docker-compose.ssl.yml    # SSL/TLS overlay
├── .env.docker.example       # Docker env template
├── init.sh                   # One-command local setup
└── README.md
```
```bash
cd backend
source venv/bin/activate
pytest
```

```bash
cd backend
alembic upgrade head                              # Apply all migrations
alembic revision --autogenerate -m "description"  # Create new migration
```

To add a new document format:
- Add parser in `backend/services/`
- Register in the ingestion pipeline (`backend/api/documents.py`)
- Update accepted MIME types in frontend upload component
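The parser steps can be illustrated with a hypothetical registry (the real pipeline in `backend/api/documents.py` may be structured differently; `.rtf` and the function names are made up for the example):

```python
# Hypothetical parser registry illustrating the extension steps above.
PARSERS: dict = {}

def register_parser(extension: str):
    """Decorator that maps a file extension to a parser function."""
    def wrap(fn):
        PARSERS[extension] = fn
        return fn
    return wrap

@register_parser(".rtf")
def parse_rtf(data: bytes) -> str:
    # Step 1: turn raw bytes into plain text for the chunking stage.
    return data.decode("utf-8", errors="ignore")

def extract_text(filename: str, data: bytes) -> str:
    # Step 2: the ingestion pipeline dispatches on file extension.
    ext = "." + filename.rsplit(".", 1)[-1].lower()
    if ext not in PARSERS:
        raise ValueError(f"Unsupported format: {ext}")
    return PARSERS[ext](data)
```

Step 3 (accepted MIME types) lives in the frontend upload component and only affects which files the browser lets users pick.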
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'feat: Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- LangChain — LLM framework
- pgvector — Vector similarity search for PostgreSQL
- FastAPI — Modern Python web framework
- OpenWebUI — UI inspiration
This project is licensed under the MIT License — see the LICENSE file for details.



