Enterprise-grade RAG platform with multi-provider LLM support, document intelligence, and real-time streaming
- Overview
- Demo
- Features
- Architecture
- Tech Stack
- Installation
- Configuration
- API Documentation
- Usage Guide
- Development
- Contributing
- License
RAG Multi-Expert is a production-ready Retrieval-Augmented Generation platform that combines the power of vector databases, intelligent document processing, and multi-provider LLM integration. Built for enterprises and developers who need:
- Intelligent Document Search: ChromaDB-powered semantic search with adjustable retrieval parameters
- Multi-Provider LLM Support: Seamless integration with 13+ providers, including OpenAI, Claude, Gemini, and Ollama
- Project-Based Organization: Isolate document collections and conversations by project
- Real-Time Streaming: Server-Sent Events for progressive responses
- Fine-Grained Control: Per-conversation temperature, top-k, and context window management
- Enterprise Security: JWT authentication, AES-256 encryption, GDPR compliance
| Feature | Traditional RAG | This Platform |
|---|---|---|
| Context Window | Fixed 4K-8K | Adaptive 8K-2M (auto-detects model) |
| Chunking | Static 512 tokens | Smart chunking with overlap (100-2000 tokens) |
| History Management | No truncation | Intelligent truncation with recency bias |
| Provider Support | Single provider | 13+ providers with fallback |
| Real-Time UX | Blocking requests | SSE streaming with auto-reconnect |
*Real-time streaming with source attribution and intelligent context management*
*Repo/Manual upload with automatic chunking and vectorization*
*Multi-project organization with statistics and document management*
*Graphical configuration for 13+ LLM providers*
- Supported Formats: PDF, DOCX, TXT, MD, HTML, CSV, JSON (50+ file types)
- Smart Chunking: Adaptive chunk size (100-2000 tokens) with configurable overlap (see the sketch after this list)
- Metadata Extraction: Automatic filename, page number, and document type tagging
- Token Tracking: Real-time token counting for cost estimation
- Batch Processing: Background async processing with progress tracking
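Conceptually, the smart chunking above is a sliding token window. The sketch below is illustrative only (the defaults match the project-creation example later in this README); it is not the platform's actual implementation:

```python
# Illustrative token-window chunker with overlap; not the project's real code.
from typing import List

def chunk_tokens(tokens: List[str], chunk_size: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split a token sequence into fixed-size windows that overlap,
    so content cut at a boundary still appears in the next chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

# Example: 500-token chunks with 50-token overlap (the defaults used below)
tokens = "some long document text ...".split()
chunks = chunk_tokens(tokens, chunk_size=500, overlap=50)
```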
- Vector Database: ChromaDB with HNSW indexing
- Embedding Models: sentence-transformers/all-MiniLM-L6-v2 (default), OpenAI embeddings
- Adjustable top-k: Dynamic retrieval (1-20 chunks) based on model context
- Distance Scoring: Cosine similarity with configurable threshold
- Metadata Filtering: Filter by document type, date, or custom tags (query example below)
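A minimal retrieval sketch using the ChromaDB 0.4+ client API. The collection name, storage path, and metadata keys are assumptions for illustration, not the project's actual schema:

```python
# Illustrative only: top-k retrieval with cosine distance and metadata filtering.
import chromadb

client = chromadb.PersistentClient(path="./data/chromadb")
collection = client.get_or_create_collection(
    name="company_docs",                 # assumed name, not the real schema
    metadata={"hnsw:space": "cosine"},   # cosine distance, matching the scoring above
)

results = collection.query(
    query_texts=["What is the vacation policy?"],
    n_results=5,                    # adjustable top-k (1-20)
    where={"doc_type": "pdf"},      # metadata filtering (assumed key)
)
# results["distances"] holds cosine distances you can threshold on
```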
| Provider | Models | Context Window | Streaming | Temperature |
|---|---|---|---|---|
| OpenAI | GPT-4, GPT-4-turbo, o1-preview | 8K-128K | ✅ | 0.0-2.0 |
| Anthropic | Claude 3.5 Sonnet, Opus, Haiku | 200K | ✅ | 0.0-1.0 |
| Google | Gemini 1.5 Pro/Flash, 2.0 | 2M | ✅ | 0.0-2.0 |
| Ollama | Llama 3.1, Mistral, Phi-3 | 8K-128K | ✅ | 0.0-2.0 |
| Groq | Llama 3, Mixtral | 32K | ✅ | 0.0-2.0 |
| OpenRouter | 200+ models | Varies | ✅ | 0.0-2.0 |
| HuggingFace | Custom models | Varies | ✅ | 0.0-2.0 |
Features:
- ✅ Auto-detect model context limits
- ✅ Dynamic top-k adjustment (more chunks for large context models)
- ✅ Intelligent history truncation (keeps recent messages + summary)
- ✅ RAG context capped at 50% of total context (see the budgeting sketch below)
- ✅ Real-time token counting with overflow protection
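How these features fit together can be pictured as a simple token budget. The helper name and numbers below are illustrative assumptions, not the project's code:

```python
# Illustrative context budgeting: cap RAG at ~50%, derive a dynamic top-k.
def plan_context(context_window: int, avg_chunk_tokens: int = 400,
                 reserved_for_answer: int = 1024) -> dict:
    usable = context_window - reserved_for_answer
    rag_budget = usable // 2              # RAG context capped at ~50% of total
    history_budget = usable - rag_budget  # remainder goes to conversation history
    top_k = max(1, min(20, rag_budget // avg_chunk_tokens))  # dynamic top-k, clamped to 1-20
    return {"rag_budget": rag_budget, "history_budget": history_budget, "top_k": top_k}

print(plan_context(8_000))      # small model: only a few chunks fit
print(plan_context(2_000_000))  # Gemini-class window: top-k clamps at 20
```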
- Multi-Conversation: Unlimited conversations per project
- History Storage: PostgreSQL with full conversation replay
- Smart Truncation: Keeps recent messages, summarizes older ones
- Export: JSON, Markdown, HTML formats
- Search: Full-text search across all conversations
- Temperature: Per-conversation creativity control (0.0-2.0)
- Top-k: Adjustable chunk retrieval (1-20); see the API example after this list
- Chunk Size: Configurable per project (100-2000 tokens)
- Overlap: Smart overlap to preserve context (10-500 tokens)
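As a hypothetical example of per-conversation control, settings like these might be adjusted over the REST API. The endpoint path and field names are assumptions; check the live /docs schema:

```python
# Hypothetical endpoint and payload; verify against http://localhost:8000/docs
import requests

resp = requests.patch(
    "http://localhost:8000/api/conversations/42",  # conversation id is illustrative
    headers={"Authorization": "Bearer {token}"},
    json={"temperature": 0.2, "top_k": 8},
)
resp.raise_for_status()
```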
- Real-time token usage tracking
- Per-provider cost estimation (sketch below)
- Document processing statistics
- Conversation metrics (latency, token count)
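Cost estimation from the tracked token counts reduces to a small lookup. The per-1K prices below are placeholders for illustration, not real or current provider rates:

```python
# Illustrative cost estimation; prices are placeholders, not real rates.
PRICES_PER_1K = {
    "openai/gpt-4": (0.03, 0.06),                   # (prompt, completion) per 1K tokens
    "anthropic/claude-3-5-sonnet": (0.003, 0.015),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    prompt_rate, completion_rate = PRICES_PER_1K[model]
    return prompt_tokens / 1000 * prompt_rate + completion_tokens / 1000 * completion_rate

print(f"${estimate_cost('openai/gpt-4', 1200, 300):.4f}")
```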
- JWT-based authentication
- AES-256 API key encryption (see the sketch below)
- CORS protection
- Rate limiting (60 req/min)
- GDPR-compliant data storage
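Provider API keys can be encrypted at rest with the ENCRYPTION_KEY generated in step 3 of the quick start. A minimal sketch using the cryptography library's Fernet API; the function names are illustrative, not the project's actual code:

```python
# Sketch of API key encryption at rest; function names are illustrative.
import os
from cryptography.fernet import Fernet

# ENCRYPTION_KEY is the Fernet key generated during setup (backend/.env)
fernet = Fernet(os.environ["ENCRYPTION_KEY"])

def encrypt_api_key(plaintext: str) -> str:
    """Encrypt a provider API key before persisting it."""
    return fernet.encrypt(plaintext.encode()).decode()

def decrypt_api_key(token: str) -> str:
    """Decrypt a stored API key just before use."""
    return fernet.decrypt(token.encode()).decode()
```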
- Framework: FastAPI 0.104+ (async, type-safe)
- Vector DB: ChromaDB 0.4+ (persistent, HNSW indexing)
- Database: PostgreSQL 15+ (production) / SQLite (dev)
- ORM: SQLAlchemy 2.0+ (async support)
- Embeddings: sentence-transformers (all-MiniLM-L6-v2)
- Document Processing: PyPDF2, python-docx, beautifulsoup4
- Auth: JWT (PyJWT), AES-256 (cryptography)
- Streaming: SSE (Server-Sent Events), sketched below
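SSE streaming in FastAPI reduces to a StreamingResponse over an async generator. A minimal sketch; the route and token source are illustrative, not the project's actual endpoint:

```python
# Minimal SSE sketch with FastAPI; route and tokens are illustrative.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def token_stream():
    for token in ["Retrieval-", "Augmented ", "Generation"]:
        yield f"data: {token}\n\n"   # SSE frame: "data: ..." plus a blank line
    yield "data: [DONE]\n\n"

@app.get("/api/chat/stream")
async def stream_chat():
    return StreamingResponse(token_stream(), media_type="text/event-stream")
```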
- Framework: React 18.2 (Vite)
- State Management: Zustand
- Routing: React Router v6
- UI: Tailwind CSS 3.4
- Icons: Lucide React
- HTTP: Axios with interceptors
- Markdown: react-markdown, react-syntax-highlighter
- Containerization: Docker + Docker Compose
- Reverse Proxy: Nginx (production)
- Deployment: Single `docker-compose up`
# Required
- Docker 24.0+
- Docker Compose 2.20+
- 8GB RAM minimum (16GB recommended)
- 10GB disk space
# Optional (for local development)
- Python 3.11+
- Node.js 18+
- PostgreSQL 15+
Helm Chart + Documentation
📚 Official Kubernetes documentation →
https://iwebbo.github.io/RAG.io/
# 1. Clone repository
git clone https://github.com/iwebbo/rag.io.git
cd rag.io
# 2. Copy config files
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env
# 3. Generate secrets
python3 -c "import secrets; print(secrets.token_hex(32))" # SECRET_KEY
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())" # ENCRYPTION_KEY
# 4. Edit backend/.env with your secrets
# 4.1 Edit backend/.env if running from a VM or production server:
#     replace localhost with your hostname/FQDN or IP
CORS_ORIGINS=http://localhost:5173,http://localhost:3000,http://localhost,http://localhost:80
# 4.2 Edit frontend/.env if running from a VM or production server:
#     replace localhost with your hostname/FQDN or IP
VITE_API_URL=http://localhost:8000
# Workaround (will be solved in 1.1.0): pre-create the ChromaDB data directory
cd backend/
mkdir -p /app/data/chromadb
chmod -R 777 /app/data
# 5. Start application
docker-compose up -d
# 6. Check status
docker-compose ps
# 7. Access application
# Frontend: http://localhost:3000
# Backend: http://localhost:8000
# API Docs: http://localhost:8000/docs
First-time setup:
# Create first user
curl -X POST http://localhost:8000/api/auth/register \
-H "Content-Type: application/json" \
-d '{"username":"admin","email":"admin@example.com","password":"SecurePass123!"}'
# Login
curl -X POST http://localhost:8000/api/auth/login \
-d "username=admin&password=SecurePass123!"cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Setup database
python -c "from app.database import init_db; init_db()"
# Run development server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
cd frontend
# Install dependencies
npm install
# Configure API URL
echo "VITE_API_URL=http://localhost:8000" > .env.local
# Run development server
npm run dev
API Docs: http://localhost:8000/docs
# Via API
curl -X POST http://localhost:8000/api/projects/ \
-H "Authorization: Bearer {token}" \
-H "Content-Type: application/json" \
-d '{
"name": "Company Docs",
"description": "Internal documentation",
"chunk_size": 500,
"chunk_overlap": 50
}'
Or via UI: Projects → New Project
# Via API
curl -X POST http://localhost:8000/api/documents/projects/{project_id}/upload \
-H "Authorization: Bearer {token}" \
-F "file=@employee_handbook.pdf"Or via UI: Project → Documents → Upload
Processing Status:
- `processing`: Document is being chunked and vectorized
- `completed`: Ready for RAG queries
- `failed`: Check the `error_message` field
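A hypothetical polling loop over these states; the endpoint path and response fields are assumptions, so check /docs for the real schema:

```python
# Hypothetical status polling; endpoint and fields are assumptions.
import time
import requests

def wait_for_document(doc_id: int, token: str, timeout: float = 300.0) -> dict:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        doc = requests.get(
            f"http://localhost:8000/api/documents/{doc_id}",
            headers={"Authorization": f"Bearer {token}"},
        ).json()
        if doc["status"] == "completed":
            return doc
        if doc["status"] == "failed":
            raise RuntimeError(doc.get("error_message"))
        time.sleep(2)  # still "processing"
    raise TimeoutError(f"document {doc_id} not ready after {timeout}s")
```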
Check Projects → Statistics for:
- Document processing status
- Total chunks and tokens
- Conversation metrics
- Provider usage statistics
This project is licensed under the MIT License. See LICENSE file for details.
Built with ❤️ for the community
⭐ Star us on GitHub if you find this useful!