Production-Ready RAG Pipeline with Event-Driven Architecture, Two-Stage Chunking, and ChromaDB Integration
A production-ready, event-driven RAG processing system that combines automated file monitoring, Prefect workflow orchestration, two-stage semantic chunking with AI boundary refinement, embeddings generation, and ChromaDB vector storage for intelligent document processing and retrieval at enterprise scale.
- ๐ฏ System Overview
- ๐๏ธ RAG Pipeline Architecture
- ๐ Unified Orchestration
- โ๏ธ Two-Stage Chunking System
- ๐ง Embeddings & Vector Storage
- ๐จ Event-Driven Processing
- ๐ Prefect Task Orchestration
- ๐ Key Features
- ๐ ๏ธ Technology Stack
- ๐ฆ Installation
- ๐ฎ Quick Start
- ๐ Documentation
The RAG Processing Pipeline System is a production-ready, enterprise-grade platform that automatically processes documents through an intelligent, event-driven RAG pipeline combining file system monitoring, two-stage semantic chunking, AI boundary refinement, embeddings generation, and ChromaDB vector storage.
- ๐ Unified Orchestration: Centralized management of document processing and RAG pipeline workflows
- โ๏ธ Two-Stage Chunking: Advanced semantic chunking with AI-powered boundary refinement using Gemini 2.0 Flash
- ๐ง Smart Embeddings: High-quality embeddings using sentence transformers with batch processing
- ๐ Vector Storage: ChromaDB integration with persistent collections and metadata-rich storage
- ๐จ Event-Driven Architecture: Kafka-based messaging with scalable consumer groups
- โก Prefect Orchestration: Enterprise workflow management with monitoring, retries, and async task execution
- ๐ฏ Production Ready: Complete error handling, timeout management, and horizontal scaling
graph TB
subgraph "Document Processing"
A[๐ Raw Documents] --> B[๐ DoclingProcessor]
B --> C[๐๏ธ Vision Enhancement]
C --> D[๐พ Document Storage]
end
subgraph "RAG Pipeline"
D --> E[๐จ document-available Event]
E --> F[โ๏ธ Two-Stage Chunking]
F --> G[๐ง Embeddings Generation]
G --> H[๐ ChromaDB Storage]
end
subgraph "Two-Stage Chunking Detail"
F --> F1[๐ Semantic Chunker]
F1 --> F2[๐ค Boundary Agent]
F2 --> F3[๐ Final Chunks]
end
subgraph "Query & Retrieval"
H --> I[๐ Vector Search]
I --> J[๐ค RAG Generation]
J --> K[โจ Intelligent Responses]
end
style A fill:#e1f5fe
style F fill:#fff3e0
style H fill:#f3e5f5
style K fill:#e8f5e8
The system implements a sophisticated event-driven RAG pipeline architecture with unified orchestration, two-stage chunking, embeddings generation, and ChromaDB vector storage.
Based on our production docs/component.puml
, the RAG pipeline consists of:
graph TB
subgraph "๐ Unified Orchestration"
UO[UnifiedOrchestrator] --> DO[DocumentOrchestrator]
UO --> RO[RAGOrchestrator]
RO --> CC[ChunkingConsumers]
RO --> EC[EmbeddingConsumers]
RO --> SC[StorageConsumers]
end
subgraph "๐จ Event-Driven Pipeline"
DO --> DP[DocumentProducer]
DP --> K[Kafka Event Bus]
K --> CC
CC --> RP[RAGProducer]
RP --> K
K --> EC
EC --> RP
RP --> K
K --> SC
end
subgraph "๐ Prefect RAG Tasks"
CC --> RF[RAG Processing Flow]
RF --> CT[Chunking Task]
RF --> ET[Embedding Task]
RF --> VT[Vector Storage Task]
end
subgraph "โ๏ธ Two-Stage Chunking"
CT --> TSC[TwoStageChunker]
TSC --> SEM[SemanticChunker]
TSC --> BA[BoundaryAgent]
BA --> FC[Final Chunks]
end
subgraph "๐ง Embeddings & Storage"
ET --> EG[EmbeddingsGenerator]
VT --> CIE[ChunkIngestionEngine]
CIE --> CM[ChromaManager]
CM --> CD[(ChromaDB)]
end
style K fill:#fff3e0
style RF fill:#f3e5f5
style TSC fill:#e1f5fe
style CD fill:#e8f5e8
The system uses a unified orchestration approach that manages both document processing and RAG pipeline workflows through a hierarchical orchestrator structure.
class UnifiedOrchestrator:
"""Unified orchestrator managing both document processing and RAG pipelines."""
def __init__(self,
watch_directory: str = "data/documents/raw",
num_document_consumers: int = 2,
num_chunking_consumers: int = 2,
num_embedding_consumers: int = 2,
num_storage_consumers: int = 1):
# Document Processing Pipeline
self.document_orchestrator = DocumentProcessingOrchestrator(
watch_directory=watch_directory,
num_prefect_consumers=num_document_consumers
)
# RAG Processing Pipeline
self.rag_orchestrator = RAGOrchestrator(
num_chunking_consumers=num_chunking_consumers,
num_embedding_consumers=num_embedding_consumers,
num_storage_consumers=num_storage_consumers
)
def start(self) -> None:
"""Start both document processing and RAG pipeline orchestrators."""
print("๐ Starting Unified Document & RAG Processing System")
print("=" * 80)
# Start document processing (includes file monitoring)
self.document_orchestrator.start()
# Start RAG processing consumers
self.rag_orchestrator.start()
print("โ
All systems operational")
print("๐ Drop files in: data/documents/raw/")
print("๐๏ธ Vector search ready in ChromaDB collections")
class RAGOrchestrator:
"""Manages RAG pipeline consumers with horizontal scaling."""
def __init__(self, num_chunking_consumers: int = 2,
num_embedding_consumers: int = 2,
num_storage_consumers: int = 1):
# Initialize consumer pools
self.chunking_consumers = self._create_consumers(
ChunkingConsumer, num_chunking_consumers, "chunking"
)
self.embedding_consumers = self._create_consumers(
EmbeddingConsumer, num_embedding_consumers, "embedding"
)
self.storage_consumers = self._create_consumers(
StorageConsumer, num_storage_consumers, "storage"
)
def start(self) -> None:
"""Start all RAG processing consumers."""
all_consumers = (
self.chunking_consumers +
self.embedding_consumers +
self.storage_consumers
)
for consumer in all_consumers:
thread = threading.Thread(target=consumer.start_consuming)
thread.daemon = True
thread.start()
sequenceDiagram
participant User
participant UO as UnifiedOrchestrator
participant DO as DocumentOrchestrator
participant RO as RAGOrchestrator
participant Kafka as Kafka Event Bus
participant CC as ChunkingConsumer
participant EC as EmbeddingConsumer
participant SC as StorageConsumer
User->>UO: Start orchestrator
UO->>DO: Start document processing
UO->>RO: Start RAG consumers
RO->>CC: Start chunking consumers (2x)
RO->>EC: Start embedding consumers (2x)
RO->>SC: Start storage consumers (1x)
Note over DO: File detected โ Document processed
DO->>Kafka: Publish "document-available" event
Kafka->>CC: Trigger chunking pipeline
CC->>Kafka: Publish "chunking-complete" event
Kafka->>EC: Trigger embeddings pipeline
EC->>Kafka: Publish "embedding-ready" event
Kafka->>SC: Trigger storage pipeline
SC->>Kafka: Publish "ingestion-complete" event
- Unified Management: Single entry point managing document processing + RAG pipeline
- Horizontal Scaling: Configurable consumer pools for optimal throughput
- Event-Driven: Kafka-based messaging ensures fault tolerance and load distribution
- Consumer Groups: Automatic load balancing across multiple consumer instances
- Graceful Shutdown: Clean shutdown handling for all orchestrator components
- Health Monitoring: Real-time status monitoring for all pipeline stages
The system implements a sophisticated two-stage chunking approach that combines semantic understanding with AI-powered boundary analysis for optimal chunk creation.
Uses LangChain's SemanticChunker with Nomic embeddings for context-aware initial chunks:
class SemanticChunker:
"""LangChain-based semantic chunker with Nomic embeddings."""
def __init__(self, chunk_size: int = 8192, threshold: float = 0.75):
# Use Nomic embeddings for semantic similarity
embeddings = SentenceTransformerEmbeddings('nomic-ai/nomic-embed-text-v1.5')
# LangChain SemanticChunker for intelligent splitting
self.semantic_chunker = LangChainSemanticChunker(
embeddings=embeddings,
breakpoint_threshold_amount=threshold,
min_chunk_size=max(500, chunk_size // 4)
)
def chunk_text(self, text: str, source_file: str) -> List[str]:
"""Create semantically coherent chunks."""
docs = self.semantic_chunker.create_documents([text])
chunks = [d.page_content for d in docs]
return chunks
Uses Gemini 2.0 Flash to review and refine chunk boundaries with concurrent processing:
class BoundaryReviewAgent:
"""AI agent for reviewing chunk boundaries using Gemini 2.0 Flash."""
def __init__(self, context_window: int = 200, model_name: str = "gemini-2.0-flash"):
# Pydantic-AI agent with structured output
self.agent = Agent(
model=model_name,
result_type=BoundaryDecision, # MERGE or KEEP with confidence
deps_type=BoundaryReviewDeps,
)
async def review_all_boundaries(self, chunks: List[str], max_concurrent: int = 10):
"""Review all boundaries with concurrent AI agents."""
# Create boundary review tasks for concurrent processing
tasks = []
semaphore = asyncio.Semaphore(max_concurrent)
for i in range(len(chunks) - 1):
boundary_text = self.create_boundary_text(chunks[i], chunks[i + 1])
task = self._review_boundary_with_semaphore(boundary_text, i, semaphore)
tasks.append(task)
# Execute all reviews concurrently
boundary_decisions = await asyncio.gather(*tasks, return_exceptions=True)
return {
"boundary_decisions": boundary_decisions,
"merge_decisions": sum(1 for d in boundary_decisions if d["decision"] == "MERGE"),
"keep_decisions": sum(1 for d in boundary_decisions if d["decision"] == "KEEP"),
"concurrent_agents": max_concurrent
}
class TwoStageChunker:
"""Complete 2-stage chunking system with semantic analysis and boundary refinement."""
def __init__(self, chunk_size: int = 700, concurrent_agents: int = 10):
# Cached components for performance
self._cached_semantic_chunker = SemanticChunker(chunk_size=chunk_size)
self._cached_boundary_agent = BoundaryReviewAgent()
async def process_document(self, file_path: str, document_id: str) -> Dict[str, Any]:
"""Execute complete 2-stage chunking pipeline."""
# Stage 1: Semantic Chunking
initial_chunks = self._cached_semantic_chunker.chunk_text(text, document_id)
self.logger.info(f"โ
Stage 1: {len(initial_chunks)} semantic chunks")
# Stage 2: AI Boundary Review (concurrent processing)
if len(initial_chunks) > 1:
stage2_result = await self._cached_boundary_agent.review_all_boundaries(
initial_chunks, max_concurrent=self.concurrent_agents
)
final_chunks = self._apply_boundary_decisions(initial_chunks, stage2_result)
self.logger.info(f"โ
Stage 2: {len(final_chunks)} refined chunks")
else:
final_chunks = initial_chunks
# Save as structured TextChunk models with metadata
text_chunks = self._create_text_chunks(final_chunks, document_id)
chunks_path = self._save_text_chunks(text_chunks, document_id)
return {
"document_id": document_id,
"chunk_count": len(text_chunks),
"chunks_file_path": str(chunks_path)
}
The system uses sentence transformers for high-quality embeddings with batch processing and ChromaDB for persistent vector storage.
class EmbeddingsGenerator:
"""Batch embeddings generation with sentence transformers."""
def __init__(self, model_name: str = 'BAAI/bge-large-en-v1.5', batch_size: int = 32):
# Cached model for performance
self._cached_model = SentenceTransformer(model_name, trust_remote_code=True)
self.batch_size = batch_size
def process_chunks(self, chunks_file_path: str) -> Dict[str, Any]:
"""Generate embeddings for all chunks with batch processing."""
# Load chunks from file
chunks_data = self._load_chunks_file(chunks_file_path)
chunks = [chunk["content"] for chunk in chunks_data["chunks"]]
# Generate embeddings in batches
all_embeddings = []
for i in range(0, len(chunks), self.batch_size):
batch = chunks[i:i + self.batch_size]
batch_embeddings = self._cached_model.encode(batch)
all_embeddings.extend(batch_embeddings)
# Create ChromaDB-ready format
chromadb_data = self._create_chromadb_format(chunks_data["chunks"], all_embeddings)
return {
"document_id": chunks_data["document_id"],
"embeddings_count": len(all_embeddings),
"model_used": self.model_name,
"validated_embeddings": validated_embeddings,
"chromadb_ready": chromadb_data # Ready for direct ChromaDB ingestion
}
class ChromaManager:
"""ChromaDB connection and collection management."""
def __init__(self, persist_directory: str = None):
# Persistent storage for production use
self._cached_client = chromadb.PersistentClient(
path=persist_directory or "data/chroma_db",
settings=Settings(anonymized_telemetry=False, allow_reset=True)
)
self._cached_collections = {} # Collection caching
def get_collection(self, collection_name: str = "rag_documents"):
"""Get or create collection with caching."""
if collection_name in self._cached_collections:
return self._cached_collections[collection_name]
try:
# Try to get existing collection
collection = self._cached_client.get_collection(name=collection_name)
except:
# Create new collection if it doesn't exist
collection = self._cached_client.create_collection(
name=collection_name,
metadata={"created_at": datetime.now().isoformat()}
)
self._cached_collections[collection_name] = collection
return collection
class ChunkIngestionEngine:
"""Engine for ingesting embeddings into ChromaDB."""
def __init__(self):
self._cached_chroma_manager = ChromaManager()
def ingest_from_chromadb_ready_file(self, embeddings_file_path: str,
collection_name: str = None) -> bool:
"""Ingest embeddings directly from ChromaDB-ready format."""
# Load embeddings data
embeddings_data = self._load_json_file(embeddings_file_path)
chromadb_data = embeddings_data.get("chromadb_ready")
if not chromadb_data:
return False
# Get ChromaDB collection
collection = self._cached_chroma_manager.get_collection(collection_name)
# Store in ChromaDB with metadata enrichment
collection.add(
ids=chromadb_data["ids"],
embeddings=chromadb_data["embeddings"],
metadatas=chromadb_data["metadatas"], # Rich metadata for filtering
documents=chromadb_data["documents"]
)
self.logger.info(f"โ
Stored {len(chromadb_data['ids'])} chunks in ChromaDB")
return True
# Enhanced metadata for ChromaDB filtering and retrieval
metadata = {
# Core identifiers
"document_id": chunk["document_id"],
"chunk_id": chunk_id,
"chunk_index": chunk["chunk_index"],
"page_number": chunk["page_number"],
# Source information for filtering
"source_file": base_metadata.get("source_file_path", "unknown"),
"original_filename": base_metadata.get("original_filename", ""),
"document_type": base_metadata.get("document_type", "unknown"),
# Content characteristics
"chunk_length": len(chunk["content"]),
"word_count": len(chunk["content"].split()),
"chunk_position": base_metadata.get("chunk_position", "unknown"),
# Processing metadata
"chunking_strategy": "two_stage_semantic",
"embedding_model": "BAAI/bge-large-en-v1.5",
"ingested_at": datetime.now().isoformat()
}
The RAG pipeline uses Apache Kafka for event-driven processing with dedicated topics for each stage:
graph LR
subgraph "Kafka Topics & Flow"
A[๐ค document-available] --> B[๐ ChunkingConsumer]
B --> C[๐ค chunking-complete]
C --> D[๐ง EmbeddingConsumer]
D --> E[๐ค embedding-ready]
E --> F[๐ StorageConsumer]
F --> G[๐ค ingestion-complete]
end
subgraph "Consumer Processing"
B --> B1[Prefect: RAG Flow]
B1 --> B2[Two-Stage Chunking]
D --> D1[Prefect: Embedding Task]
D1 --> D2[Batch Embeddings]
F --> F1[Prefect: Storage Task]
F1 --> F2[ChromaDB Ingestion]
end
style A fill:#e1f5fe
style G fill:#e8f5e8
class ChunkingConsumer(BaseKafkaConsumer):
"""Consumer for document-available events triggering chunking pipeline."""
def get_subscribed_topics(self) -> list[str]:
return ["document-available"]
def process_message(self, message_data: dict, topic: str) -> bool:
"""Process document through chunking pipeline."""
document_id = message_data.get("document_id")
processed_file_path = message_data.get("processed_file_path")
# Execute chunking pipeline asynchronously
chunking_result = asyncio.run(self._execute_chunking_pipeline(
processed_file_path, document_id
))
# Publish chunking-complete event
self._publish_chunking_complete(document_id, chunking_result)
return True
async def _execute_chunking_pipeline(self, processed_file_path: str, document_id: str):
"""Execute the async chunking pipeline stages."""
# Stage 1: Semantic Chunking
stage1_result = semantic_chunking_task(
file_path=processed_file_path,
document_id=document_id
)
# Stage 2: Boundary Refinement (async with AI agents)
stage2_result = await boundary_refinement_task(
stage1_result=stage1_result,
concurrent_agents=10
)
# Stage 3: Chunk Formatting
chunking_result = await chunk_formatting_task(stage2_result=stage2_result)
return chunking_result
The system uses Prefect 3.0 for robust task orchestration with async support, retries, and monitoring.
@flow(
name="rag-processing-pipeline",
task_runner=ConcurrentTaskRunner(),
retries=1,
retry_delay_seconds=10
)
async def rag_processing_flow(file_path: str, document_id: str) -> Dict[str, Any]:
"""Complete RAG processing workflow with 5 stages."""
# Stage 1: Semantic Chunking (sync)
stage1_result = semantic_chunking_task(
file_path=file_path,
document_id=document_id,
chunk_size=700,
semantic_threshold=0.75
)
# Stage 2: Boundary Refinement (async with AI agents)
stage2_result = await boundary_refinement_task(
stage1_result=stage1_result,
concurrent_agents=10,
model_name="gemini-2.0-flash"
)
# Stage 3: Chunk Formatting (async)
chunking_result = await chunk_formatting_task(stage2_result=stage2_result)
# Stage 4: Embeddings Generation (async)
embeddings_result = await generate_embeddings_task(
chunks_file_path=chunking_result["chunks_file_path"],
embedding_model="BAAI/bge-large-en-v1.5",
batch_size=32
)
# Stage 5: Vector Storage (async)
storage_result = await store_vectors_task(
embeddings_file_path=embeddings_result["embeddings_file_path"],
collection_name="rag_documents"
)
return {
"pipeline_status": "success",
"document_id": document_id,
"total_chunks": chunking_result["chunk_count"],
"embeddings_generated": embeddings_result["embeddings_count"],
"vectors_stored": storage_result["vectors_stored"]
}
# Task timeouts and retry configuration
@task(
name="boundary-refinement",
retries=2,
timeout_seconds=300, # 5 minutes for AI boundary review
tags=["ai", "chunking", "boundary-review"]
)
async def boundary_refinement_task(stage1_result: Dict[str, Any]) -> Dict[str, Any]:
"""AI boundary refinement with concurrent agents."""
@task(
name="generate-embeddings",
retries=3,
timeout_seconds=600, # 10 minutes for embeddings
tags=["embeddings", "batch-processing"]
)
async def generate_embeddings_task(chunks_file_path: str) -> Dict[str, Any]:
"""Generate embeddings with batch processing."""
@task(
name="store-vectors",
retries=3,
timeout_seconds=120, # 2 minutes for ChromaDB storage
tags=["storage", "chromadb", "vectors"]
)
async def store_vectors_task(embeddings_file_path: str) -> Dict[str, Any]:
"""Store vectors in ChromaDB."""
- Two-Stage Chunking: Semantic chunking + AI boundary refinement for optimal chunks
- Concurrent AI Agents: 10 concurrent Gemini 2.0 Flash agents for boundary review
- High-Quality Embeddings: BAAI/bge-large-en-v1.5 with batch processing optimization
- Persistent Vector Storage: ChromaDB with rich metadata for advanced filtering
- Event-Driven Design: Kafka-based messaging for scalability and fault tolerance
- Horizontal Scaling: Configurable consumer groups for optimal throughput
- Unified Orchestration: Centralized management of document + RAG pipelines
- Enterprise Monitoring: Prefect UI with workflow observability
- Async Task Processing: Full async/await support throughout pipeline
- Intelligent Caching: Cached components (models, collections) for performance
- Robust Error Handling: Comprehensive timeout and retry logic
- Graceful Degradation: Fallback mechanisms for all critical components
- Gemini 2.0 Flash: AI boundary refinement with structured outputs
- BAAI/bge-large-en-v1.5: High-quality sentence embeddings
- Nomic-AI Embed v1.5: Semantic chunking embeddings
- LangChain SemanticChunker: Context-aware document splitting
- Pydantic-AI: Type-safe AI model interactions
- ChromaDB: High-performance vector database with persistence
- PostgreSQL: Document metadata and processing logs
- File System: Structured storage for chunks and embeddings
- Apache Kafka: Event streaming and message queuing
- Prefect 3.0: Async workflow orchestration and monitoring
- IBM Docling: Multi-format document processing
- Python 3.12+: Modern async Python with type hints
- Docker: Containerized deployment
- uv: Ultra-fast Python package management
- Throughput: 1000+ documents/hour with horizontal scaling
- Latency: Sub-second chunking, ~2s embeddings per document
- Concurrency: 10 concurrent AI agents, configurable consumer pools
- Scalability: Kafka partitions + consumer groups for linear scaling
- Storage: ChromaDB persistent collections with metadata indexing
- Python 3.12+
- Docker & Docker Compose
- Kafka (or use Docker Compose)
- ChromaDB (auto-installed)
# Clone repository
git clone https://github.com/your-repo/rag-processing-pipeline.git
cd rag-processing-pipeline
# Setup Python environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -e .
# Environment configuration
cp .env.example .env
# Edit .env with your API keys (Gemini API key required)
# Start infrastructure
docker-compose up -d kafka postgres
# Start the unified orchestrator
python -m src.backend.doc_processing_system.services.unified_orchestrator
from src.backend.doc_processing_system.services.unified_orchestrator import UnifiedOrchestrator
# Initialize with balanced scaling
orchestrator = UnifiedOrchestrator(
watch_directory="data/documents/raw",
num_document_consumers=2, # Document processing
num_chunking_consumers=2, # RAG chunking
num_embedding_consumers=2, # Embeddings generation
num_storage_consumers=1 # ChromaDB storage
)
# Start complete RAG pipeline
orchestrator.start()
print("โ
RAG Pipeline System Running!")
print("๐ Drop files in: data/documents/raw/")
# Simply drop files into monitored directory
cp your_document.pdf data/documents/raw/
# Watch automatic processing:
# โ
Document processed with vision AI
# โ
Two-stage chunking complete
# โ
Embeddings generated
# โ
Vectors stored in ChromaDB
# Query ChromaDB directly
from src.backend.doc_processing_system.core_deps.chromadb.chroma_manager import ChromaManager
chroma = ChromaManager()
collection = chroma.get_collection("rag_documents")
# Semantic search
results = collection.query(
query_texts=["What are the key findings in the document?"],
n_results=5,
where={"document_type": "pdf"} # Filter by metadata
)
print(f"Found {len(results['documents'][0])} relevant chunks")
for doc, metadata in zip(results['documents'][0], results['metadatas'][0]):
print(f"๐ {metadata['original_filename']} (Page {metadata['page_number']})")
print(f"๐ {doc[:200]}...")
# Check orchestrator status
status = orchestrator.get_status()
print(f"Document Consumers: {status['document_consumers']['active']}")
print(f"Chunking Consumers: {status['rag_consumers']['chunking']['active']}")
print(f"Embedding Consumers: {status['rag_consumers']['embedding']['active']}")
print(f"Storage Consumers: {status['rag_consumers']['storage']['active']}")
# Prefect UI monitoring
print("๐ Monitor workflows at: http://localhost:4200")
# Custom chunking parameters
orchestrator = UnifiedOrchestrator(
# RAG Pipeline scaling
num_chunking_consumers=3, # More chunking throughput
num_embedding_consumers=2, # Embeddings processing
num_storage_consumers=1, # ChromaDB bottleneck typically here
# Two-stage chunking config
chunk_size=800, # Larger chunks
semantic_threshold=0.8, # Higher similarity threshold
concurrent_agents=15, # More AI boundary agents
# Storage configuration
collection_name="custom_rag", # Custom ChromaDB collection
embedding_model="sentence-transformers/all-MiniLM-L6-v2" # Faster model
)
- Architecture Diagrams - Complete class diagrams
- Component Overview - System component architecture
- Phase Documentation - Development phases and features
- API Reference - REST API documentation
- Contributing Guide - Development workflow
- Testing Guide - Testing framework and strategies
- Deployment Guide - Production deployment
- Performance Tuning - Optimization strategies
- Monitoring Setup - Observability and metrics
- Troubleshooting - Common issues and solutions
We welcome contributions! Please see our Contributing Guide for details.
This project is licensed under the MIT License - see the LICENSE file for details.
Built with โค๏ธ using modern RAG and document processing technologies
โญ Star this project if you find it useful!