A production-ready Enterprise RAG (Retrieval-Augmented Generation) system that enables organizations to efficiently query and interact with their document collections through intelligent search and conversational AI capabilities.
This Enterprise RAG System covers document management, semantic search, and AI-powered question answering. Built on a microservices architecture, it offers scalable document processing, hybrid search, and a chat interface for interacting with your knowledge base.
Demo: https://huggingface.co/spaces/Phoenixak99/rag-demo
- Multi-format Document Processing - PDF, DOCX, TXT, CSV support with robust text extraction
- Hybrid Search Engine - Semantic + keyword search with cross-encoder re-ranking
- Conversational AI Interface - Chat-based document interaction with context management
- Real-time Document Management - Upload, process, and search documents instantly
- Advanced Text Processing - Smart chunking with overlap and deduplication
- Production-Ready Architecture - Microservices with clear separation of concerns
- JWT Authentication - Role-based access control with secure token management
- Vector Database - ChromaDB integration with persistent storage
- Caching Layer - Redis-based response and embedding caching
- Health Monitoring - Comprehensive health checks and observability
- Docker Deployment - Container-based deployment with orchestration
- Async Processing - Non-blocking document processing with background queues
- Batch Operations - Efficient embedding generation and search operations
- Error Recovery - Graceful fallback mechanisms and comprehensive error handling
- API Documentation - Auto-generated OpenAPI specifications
- Development Tools - Integrated development environment with debugging support
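To make the "smart chunking with overlap and deduplication" feature concrete, here is a minimal sketch. It assumes character-based windows sized like the CHUNK_SIZE and CHUNK_OVERLAP settings in the configuration section, and hash-based removal of exact duplicates after whitespace and case normalization. The function names `chunk_text` and `deduplicate` are hypothetical, and the project's actual chunker may work differently (for example, on sentence or token boundaries).

```python
import hashlib


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def deduplicate(chunks: list[str]) -> list[str]:
    """Keep the first occurrence of each chunk, comparing normalized text."""
    seen: set[str] = set()
    unique = []
    for chunk in chunks:
        # Normalize case and whitespace so trivially different copies collide
        normalized = " ".join(chunk.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```

With the defaults, each window starts 800 characters after the previous one, so neighbouring chunks share 200 characters of context.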
The system follows a microservices architecture with the following components:
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Streamlit UI   │    │  FastAPI        │    │  ChromaDB       │
│  (Port 8501)    │◄──►│  Backend        │◄──►│  Vector Store   │
│                 │    │  (Port 8000)    │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │                      ▼                      │
         │             ┌─────────────────┐             │
         │             │      Redis      │             │
         └─────────────┤      Cache      ├─────────────┘
                       │                 │
                       └─────────────────┘
- Frontend Service - Streamlit-based user interface with chat, document management, and admin panels
- API Gateway - FastAPI backend with authentication, rate limiting, and comprehensive middleware
- Document Processing - Multi-format text extraction, chunking, and embedding generation
- Search Engine - Hybrid semantic/keyword search with re-ranking capabilities
- Vector Store - ChromaDB for persistent vector storage and similarity search
- Cache Layer - Redis for response caching and performance optimization
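The component list does not say how the search engine merges semantic and keyword results. One common option is reciprocal rank fusion (RRF), sketched below as an illustration only; the function name, the document IDs, and the constant `k = 60` are assumptions rather than details taken from this project. A cross-encoder re-ranker would then typically re-score the top fused candidates, which is not shown here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    so documents ranked highly by several retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic_hits = ["doc-a", "doc-b", "doc-c"]   # hypothetical vector-search order
keyword_hits = ["doc-b", "doc-d", "doc-a"]    # hypothetical keyword-search order
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

RRF only needs rank positions, so it avoids having to normalize similarity scores and keyword scores onto a common scale.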
# Clone the repository
git clone https://github.com/phoenixak/enterprise-rag-system.git
cd enterprise-rag-system
# Install dependencies
pip install -r requirements.txt
# Set up environment
cp .env.example .env
# Start the complete system
python run.py
Access the system:
- Frontend: http://localhost:8501
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
- Admin: admin@example.com / admin123!
- User: user@example.com / password123!
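After starting the system it can take a while for the backend to come up, for instance while models are downloaded on first run. The sketch below polls the health check URL listed above until it answers. It assumes that a healthy backend returns HTTP 200 from /health; the function name and its parameters are hypothetical, and the `fetch` argument exists only so the polling logic can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url="http://localhost:8000/health", timeout=60.0,
                       interval=2.0, fetch=None) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have passed."""

    def default_fetch(target: str) -> int:
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx responses
        with urllib.request.urlopen(target, timeout=5) as response:
            return response.status

    fetch = fetch if fetch is not None else default_fetch
    deadline = time.monotonic() + timeout
    while True:
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # backend not reachable yet; retry after the interval
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_healthy()` with no arguments checks the default local backend and returns `False` if it does not become reachable within a minute.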
- Python: 3.11 or higher
- Memory: Minimum 4GB RAM (8GB+ recommended for production)
- Storage: 2GB available space
- Network: Internet connection for initial model downloads
fastapi>=0.104.1
streamlit>=1.28.1
chromadb>=0.4.15
sentence-transformers>=2.2.2
redis>=5.0.1
python-multipart>=0.0.6
python-jose[cryptography]>=3.3.0
passlib[bcrypt]>=1.7.4
- Clone and Setup
  git clone https://github.com/phoenixak/enterprise-rag-system.git
  cd enterprise-rag-system
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Dependencies
  pip install -r requirements.txt
- Configure Environment
  cp .env.example .env
  # Edit .env with your configurations
- Initialize Services
  python scripts/dev.py setup
# Start development environment with hot reload
python scripts/dev.py start
# Run backend only
python run.py backend
# Run frontend only
python run.py frontend
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
# Build production image
docker build -t enterprise-rag:latest .
# Run with environment variables
docker run -d \
  -p 8000:8000 \
  -p 8501:8501 \
  -e DATABASE_URL="your-db-url" \
  -e REDIS_URL="your-redis-url" \
  enterprise-rag:latest
Create a .env file with the following configurations:
# Database Configuration
DATABASE_URL=postgresql://user:password@localhost:5432/rag_db
REDIS_URL=redis://localhost:6379/0
# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
FRONTEND_PORT=8501
# Document Processing
MAX_FILE_SIZE_MB=50
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
# Vector Database
CHROMA_PERSIST_DIRECTORY=./chroma_db
COLLECTION_NAME=documents
# Caching
CACHE_TTL_SECONDS=3600
EMBEDDING_CACHE_SIZE=1000
For production deployment, ensure these additional settings:
# Security
CORS_ORIGINS=["https://yourdomain.com"]
ALLOWED_HOSTS=["yourdomain.com"]
# Performance
WORKERS=4
MAX_CONNECTIONS=100
POOL_SIZE=20
# Monitoring
LOG_LEVEL=INFO
METRICS_ENABLED=true
HEALTH_CHECK_INTERVAL=30
POST /api/v1/auth/login
POST /api/v1/auth/refresh
POST /api/v1/auth/logout
POST /api/v1/documents/upload # Upload documents
GET /api/v1/documents # List documents
GET /api/v1/documents/{id} # Get document details
DELETE /api/v1/documents/{id} # Delete document
GET /api/v1/documents/{id}/chunks # Get document chunks
POST /api/v1/search/semantic # Semantic search
POST /api/v1/search/hybrid # Hybrid search
POST /api/v1/search/contextual # Contextual search
GET /api/v1/search/similar/{doc_id} # Find similar documents
POST /api/v1/conversations # Start conversation
GET /api/v1/conversations/{id} # Get conversation
POST /api/v1/conversations/{id}/query # Send query
import requests
# Access token obtained from POST /api/v1/auth/login (placeholder value)
token = "<your-access-token>"
headers = {"Authorization": f"Bearer {token}"}
# Upload a document; the context manager closes the file after the request
with open("document.pdf", "rb") as document:
    response = requests.post(
        "http://localhost:8000/api/v1/documents/upload",
        files={"file": document},
        headers=headers,
    )
response.raise_for_status()  # Fail loudly on 4xx/5xx responses
# Search documents
search_query = {
    "query": "artificial intelligence",
    "limit": 10,
    "similarity_threshold": 0.7,
}
response = requests.post(
    "http://localhost:8000/api/v1/search/semantic",
    json=search_query,
    headers=headers,
)
response.raise_for_status()
# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test categories
pytest tests/test_api.py # API tests
pytest tests/test_documents.py # Document processing tests
pytest tests/test_search.py # Search functionality tests
pytest tests/test_integration.py # Integration tests
- Unit Tests - Individual component testing
- Integration Tests - Service interaction testing
- API Tests - Endpoint validation and authentication
- Performance Tests - Load testing and benchmarking
- End-to-End Tests - Complete workflow validation
# Start test environment
python scripts/dev.py test
# Run specific test file
pytest tests/test_search.py -v
# Run tests with debugging
pytest tests/test_api.py -s --pdb
PostgreSQL (recommended for production):
CREATE DATABASE rag_system;
CREATE USER rag_user WITH PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON DATABASE rag_system TO rag_user;
Redis configuration:
maxmemory 2gb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
- Enable Caching
  REDIS_URL=redis://production-redis:6379/0
  CACHE_TTL_SECONDS=3600
- Configure Workers
  uvicorn src.main:app --workers 4 --host 0.0.0.0 --port 8000
- Database Connection Pooling
  DATABASE_POOL_SIZE=20
  DATABASE_MAX_OVERFLOW=30
# Enable structured logging
LOG_FORMAT=json
LOG_LEVEL=INFO
# Health check endpoints
GET /health # Basic health status
GET /health/detailed # Detailed component status
GET /health/ready # Kubernetes readiness probe
GET /health/live # Kubernetes liveness probe
- JWT Token-based Authentication with configurable expiration
- Role-based Access Control (Admin, User roles)
- Secure Password Hashing using bcrypt
- Token Refresh Mechanism for extended sessions
- Input Validation using Pydantic models
- SQL Injection Protection via SQLAlchemy ORM
- File Upload Security with type validation and size limits
- CORS Configuration for cross-origin request control
- Rate Limiting to prevent abuse
- Request Size Limits for file uploads
- Security Headers (HSTS, CSP, X-Frame-Options)
- Error Handling without information disclosure
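The list above mentions rate limiting without naming an algorithm. A token bucket is one widely used approach and is sketched here purely as an illustration; the class name, the parameters, and the idea of keeping one bucket per client are assumptions, not a description of this project's middleware.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock              # injectable so the logic can be tested
        self._tokens = float(capacity)   # start with a full bucket
        self._updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume `cost` tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._updated
        self._updated = now
        # Refill for the elapsed time, never exceeding the bucket capacity
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

In an API gateway, a request for which `allow()` returns `False` would typically be answered with HTTP 429 (Too Many Requests).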
- Change default JWT secret key
- Configure HTTPS/TLS certificates
- Set up database access controls
- Enable audit logging
- Configure firewall rules
- Set up monitoring and alerting
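To illustrate what the JWT settings (HS256, a secret key, a 30-minute expiry) control, the sketch below signs and verifies a token using only the Python standard library. It is a teaching aid, not the project's implementation: the dependency list suggests python-jose handles this in the real system, and the `role` claim and all function names here are assumptions. It also shows one way to generate a strong replacement for the default secret key.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


def generate_secret() -> str:
    """Return a random URL-safe secret suitable for JWT_SECRET_KEY."""
    return secrets.token_urlsafe(64)


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(subject: str, role: str, secret: str, expires_minutes: int = 30, now=None) -> str:
    """Build an HS256-signed JWT with sub, role, iat and exp claims."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role, "iat": issued, "exp": issued + expires_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8")) for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now=None) -> dict:
    """Return the claims if the signature and expiry are valid, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

Because HS256 is symmetric, anyone who knows the secret can mint valid tokens, which is why the checklist starts with changing the default key.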
| Operation | Performance | Notes |
|---|---|---|
| Document Upload | ~2 MB/s | PDF processing with OCR |
| Text Extraction | ~50 pages/s | Standard PDF documents |
| Embedding Generation | ~1000 chunks/min | sentence-transformers model |
| Semantic Search | <100 ms | ChromaDB vector similarity |
| Hybrid Search | <200 ms | Combined semantic + keyword |
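Figures like these depend on hardware, model, and corpus size, so it is worth measuring latency on your own deployment. The helper below is a minimal sketch for doing that; `measure_latency` is a hypothetical name, and `operation` stands for any zero-argument callable, for example a function that sends one request to /api/v1/search/semantic.

```python
import statistics
import time


def measure_latency(operation, runs: int = 50) -> dict[str, float]:
    """Call `operation` `runs` times and summarize wall-clock latency in milliseconds."""
    if runs < 2:
        raise ValueError("runs must be at least 2 to compute percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points for n=100; index 94 is the 95th percentile
    percentiles = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": percentiles[94],
        "max_ms": max(samples),
    }
```

Reporting the 95th percentile alongside the median helps show whether a "<100 ms" target holds for slow requests as well as typical ones.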
- Embedding Caching
  EMBEDDING_CACHE_SIZE=1000   # Cache frequently used embeddings
  CACHE_TTL_SECONDS=3600      # 1-hour cache TTL
- Batch Processing
  BATCH_SIZE=32               # Optimal embedding batch size
  MAX_CONCURRENT_UPLOADS=5    # Limit concurrent processing
- Vector Database Tuning
  CHROMA_COLLECTION_METADATA={
    "hnsw:space": "cosine",
    "hnsw:M": 16,
    "hnsw:ef_construction": 200
  }
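The first two settings interact: cached embeddings are served directly, and only cache misses need to go through the model in batches. The sketch below shows that interaction under stated assumptions. `CachedBatchEmbedder` is a hypothetical class, `embed_batch` stands in for whatever function calls the embedding model on a list of texts, and the cache is an in-process LRU without the TTL that CACHE_TTL_SECONDS implies (the real system may use Redis instead).

```python
from collections import OrderedDict
from typing import Callable, Sequence


class CachedBatchEmbedder:
    """Embed texts in batches, reusing vectors for texts seen recently."""

    def __init__(self, embed_batch: Callable[[list[str]], list[list[float]]],
                 batch_size: int = 32, cache_size: int = 1000):
        self._embed_batch = embed_batch
        self._batch_size = batch_size
        self._cache_size = cache_size
        self._cache: OrderedDict[str, list[float]] = OrderedDict()

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        # Each distinct uncached text is embedded once, in input order
        missing = list(dict.fromkeys(t for t in texts if t not in self._cache))
        fresh: dict[str, list[float]] = {}
        for start in range(0, len(missing), self._batch_size):
            batch = missing[start:start + self._batch_size]
            fresh.update(zip(batch, self._embed_batch(batch)))
        # Resolve all results before touching the cache so eviction cannot drop them
        results = [fresh[t] if t in fresh else self._cache[t] for t in texts]
        for text, vector in zip(texts, results):
            self._remember(text, vector)
        return results

    def _remember(self, text: str, vector: list[float]) -> None:
        self._cache[text] = vector
        self._cache.move_to_end(text)          # mark as most recently used
        while len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)    # evict the least recently used entry
```

Repeated queries and re-uploaded documents then cost a dictionary lookup instead of a model call.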
enterprise-rag-system/
├── src/                  # Core application code
│   ├── api/              # FastAPI endpoints
│   ├── auth/             # Authentication system
│   ├── documents/        # Document processing
│   ├── search/           # Search functionality
│   └── vector_store/     # Vector database integration
├── frontend/             # Streamlit UI components
├── tests/                # Test suite
├── scripts/              # Development scripts
├── docs/                 # Documentation
└── docker-compose.yml    # Development environment
- Start Development Environment
  python scripts/dev.py start
- Code Quality Tools
  black src/    # Code formatting
  isort src/    # Import sorting
  flake8 src/   # Linting
  mypy src/     # Type checking
- Testing
  pytest tests/ -v    # Run tests
  pytest --cov=src/   # Coverage report
python scripts/dev.py setup # Initialize development environment
python scripts/dev.py start # Start all services
python scripts/dev.py test # Run test suite
python scripts/dev.py clean # Clean temporary files
python scripts/dev.py docs # Generate documentation
The system provides comprehensive health monitoring:
# Basic health status
curl http://localhost:8000/health
# Detailed component status
curl http://localhost:8000/health/detailed
# Kubernetes probes
curl http://localhost:8000/health/ready # Readiness probe
curl http://localhost:8000/health/live # Liveness probe
- Structured Logging with JSON format for production
- Performance Metrics for response times and throughput
- Error Tracking with detailed stack traces
- Resource Monitoring for memory and CPU usage
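As an illustration of structured JSON logging, the sketch below configures Python's standard `logging` module to emit one JSON object per record. The field names and the `configure_logging` helper are assumptions; the project's actual log schema, and how it reads LOG_FORMAT and LOG_LEVEL, may differ.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the stack trace so errors can be tracked from the log alone
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level: str = "INFO", stream=None) -> None:
    """Replace the root logger's handlers with one JSON handler."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

One JSON object per line is easy for log shippers and aggregators to parse without custom patterns.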
# Example Prometheus configuration
scrape_configs:
- job_name: 'enterprise-rag'
static_configs:
- targets: ['localhost:8000']
metrics_path: '/metrics'
scrape_interval: 30s
We welcome contributions to the Enterprise RAG System! Please follow these guidelines:
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature
- Install development dependencies: pip install -r requirements-dev.txt
- Make your changes with tests
- Run the test suite: pytest
- Submit a pull request
- Follow PEP 8 style guidelines
- Add type hints for all functions
- Write comprehensive tests for new features
- Update documentation for API changes
- Use meaningful commit messages
- Ensure all tests pass
- Update README.md if needed
- Add your changes to CHANGELOG.md
- Request review from maintainers
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2025
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
This is a proof-of-concept project designed to demonstrate enterprise RAG system capabilities and modern software architecture patterns. It includes production-ready features such as:
- Comprehensive authentication and authorization
- Scalable microservices architecture
- Enterprise-grade security measures
- Production deployment configurations
- Extensive testing and monitoring
For production use, please consider:
- Conducting thorough security audits
- Performance testing with your specific data and load requirements
- Implementing additional monitoring and alerting systems
- Customizing the system for your specific enterprise requirements
- Ensuring compliance with your organization's data governance policies
The system serves as an excellent foundation for building production enterprise RAG solutions, but should be properly evaluated and customized for specific production environments.
Repository: https://github.com/phoenixak/enterprise-rag-system
Contact: For questions or support, please open an issue on GitHub.
Star this repository if you find it useful for your projects!