RAGForge is a Kubernetes-native Hybrid Retrieval-Augmented Generation (RAG) system built using official Kubernetes documentation as a real-world technical corpus.
This project is designed with AI Infrastructure principles from day one:
- Container-first architecture
- Kubernetes deployment (Minikube for lab)
- Modular RAG design
- Production-ready mindset
- Observability-ready structure
Goal: Build a fully containerized Hybrid RAG system running on Kubernetes (Minikube) using:
📄 Official Kubernetes Basics Documentation
Source:
https://kubernetes.io/docs/tutorials/kubernetes-basics/_print/
Initial Dataset: Printable version (~24 pages)
Why start with this dataset?
- Controlled size
- Structured technical content
- DevOps-aligned domain
- Ideal for parameter tuning
Flow:
User
↓
Ingress (Minikube)
↓
Service (ClusterIP / LoadBalancer)
↓
RAGForge API Pod (FastAPI)
├── Retriever
├── RAG Logic
└── LLM Client
↓
Vector DB Pod (Chroma / FAISS)
↓
External LLM API
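The request path through the API pod can be sketched as three plain-Python components (all names and the toy corpus below are illustrative; the real service would run this logic behind FastAPI endpoints and call the vector DB and LLM over the network):

```python
class Retriever:
    """Stand-in for the component that queries the vector DB pod."""

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        # In the real system this would be a Chroma/FAISS similarity search.
        corpus = {
            "pod": "A Pod is the smallest deployable unit in Kubernetes.",
            "service": "A Service exposes Pods behind a stable endpoint.",
        }
        return [text for key, text in corpus.items() if key in query.lower()][:top_k]


class LLMClient:
    """Stand-in for the external LLM API call."""

    def complete(self, prompt: str) -> str:
        return f"[answer grounded in {prompt.count('CONTEXT')} context block(s)]"


def rag_answer(query: str, retriever: Retriever, llm: LLMClient) -> str:
    """RAG logic: retrieve chunks, inject them as context, call the LLM."""
    chunks = retriever.retrieve(query)
    context = "\n".join(f"CONTEXT: {c}" for c in chunks)
    prompt = f"{context}\nQUESTION: {query}"
    return llm.complete(prompt)
```

The separation mirrors the diagram: swapping Chroma for FAISS, or one LLM provider for another, only touches one component.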
Observability (Optional – Stage 4+):
- Prometheus
- Grafana
- Structured Logging
Why containers and Kubernetes:
- Clean isolation from the local OS
- Portability across environments
- Easy scaling (HPA later)
- Future-ready production design
- Strong alignment with AI Infra / MLOps roles
Minikube provides:
- Local cluster
- Safe experimentation
- No pollution of host system
- Real Kubernetes workflow
```
ragforge/
│
├── README.md
├── ROADMAP.md
├── requirements.txt
├── .env.example
├── .gitignore
│
├── data/
│   ├── raw/
│   │   └── kubernetes_basics.pdf
│   └── processed/
│
├── src/
│   ├── ingestion/
│   ├── embeddings/
│   ├── retriever/
│   ├── generation/
│   ├── evaluation/
│   └── config/
│
├── k8s/
│   ├── namespace.yaml
│   ├── ragforge-deployment.yaml
│   ├── ragforge-service.yaml
│   ├── vector-db-deployment.yaml
│   ├── vector-db-service.yaml
│   └── ingress.yaml
│
└── tests/
```
Ingestion pipeline:
- Extract text from the PDF
- Clean formatting artifacts
- Normalize text
- Remove duplicated headers
- Section-aware chunking
Key Parameters:
- chunk_size
- chunk_overlap
- section tagging
- page metadata
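A minimal sketch of how `chunk_size` and `chunk_overlap` interact (character-based windows for simplicity; the real pipeline would also attach section tags and page metadata extracted during PDF parsing):

```python
def chunk_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[dict]:
    """Split text into overlapping windows, keeping each chunk's start offset.

    Illustrative only: a section-aware chunker would split on headings first
    and carry section/page metadata alongside the text.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # stride between window starts
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append({"text": piece, "start": start})
        if start + chunk_size >= len(text):
            break  # last window already covers the end of the text
    return chunks
```

Larger `chunk_overlap` repeats more text between neighbouring chunks, which preserves context continuity at the cost of a bigger index.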
Embedding pipeline:
- Generate vector representations
- Batch processing
- Cache embeddings
- Normalize vectors
Key Concepts:
- Cosine similarity
- Embedding dimensionality
- Semantic proximity
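The two core operations above, normalization and cosine similarity, in a few lines of stdlib Python (a sketch of the math, not the production embedding code):

```python
import math

def normalize(vec: list[float]) -> list[float]:
    """L2-normalize a vector so that a plain dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

Normalizing vectors once at indexing time is why many vector DBs can rank by dot product alone.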
Retrieval:
- Top-K similarity search
- Similarity threshold filtering
- Metadata filtering
- Section-aware retrieval
Why: Retrieval precision > model size.
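The three retrieval steps combined in one illustrative function (the `index` shape, field names, and defaults are assumptions for the sketch; the real retriever would delegate the similarity search to Chroma/FAISS):

```python
import math

def top_k_search(query_vec, index, top_k=3, similarity_threshold=0.2, section=None):
    """Rank chunks by cosine similarity, then apply threshold and optional
    metadata (section) filtering.

    `index` is a list of dicts: {"vec": [...], "text": ..., "section": ...}.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    # Metadata filter first: restricts the candidate set before ranking.
    candidates = index if section is None else [c for c in index if c["section"] == section]
    scored = sorted(((cos(query_vec, c["vec"]), c) for c in candidates),
                    key=lambda pair: pair[0], reverse=True)
    # Threshold filter last: drops low-similarity noise even inside the top-k.
    return [(score, c["text"]) for score, c in scored[:top_k] if score >= similarity_threshold]
```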
Generation:
- Grounded prompt template
- Context injection
- Citation enforcement
- Anti-hallucination rules
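A possible shape for the grounded template (the wording and chunk fields are illustrative; the point is that context injection, citation enforcement, and the anti-hallucination rule all live in the template):

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt: numbered context blocks, a citation rule,
    and an instruction to refuse rather than invent an answer."""
    context = "\n".join(f"[{i}] {c['text']} (page {c['page']})"
                        for i, c in enumerate(chunks, start=1))
    return (
        "Answer ONLY from the context below. If the context does not contain "
        "the answer, say so. Cite sources as [n] after each claim.\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"QUESTION: {question}\n"
    )
```

Numbering the context blocks is what makes citation enforcement checkable: the evaluation stage can verify that every `[n]` in the answer maps to a retrieved chunk.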
Security:
- Prompt injection mitigation
- Input validation
- Context isolation
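A minimal sketch of the validation layer (the limit and the patterns are illustrative; pattern screens catch only naive injection phrasing and would be one layer among several):

```python
import re

MAX_QUERY_CHARS = 500  # illustrative limit, tuned per deployment

# Illustrative patterns; a real mitigation layer would be broader.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"system prompt",
]

def validate_query(query: str) -> str:
    """Input validation: length control plus a naive prompt-injection screen."""
    query = query.strip()
    if not query:
        raise ValueError("empty query")
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError("query too long")
    lowered = query.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            raise ValueError("query rejected by injection filter")
    return query
```

Context isolation is the complementary defense: retrieved chunks are injected as clearly delimited data (as in the grounded template), never as instructions.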
Metrics:
- Context relevance
- Faithfulness
- Latency
- Token usage
An MLOps mindset is applied from the start.
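Latency and token usage are the two metrics that need no judge model, so they can be captured from day one. A small stdlib sketch (field names are illustrative; Stage 4+ would export these to Prometheus instead of holding them in a dataclass):

```python
import time
from dataclasses import dataclass

@dataclass
class QueryMetrics:
    """Per-request metrics the evaluation stage records."""
    latency_s: float = 0.0
    prompt_tokens: int = 0
    completion_tokens: int = 0

def measure(fn, *args, **kwargs):
    """Run a callable and capture wall-clock latency alongside its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Context relevance and faithfulness, by contrast, require comparing answers against retrieved chunks and are scored offline.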
- No secrets in Git
- .env excluded via .gitignore
- Kubernetes Secrets for API keys
- Input length control
- Prompt injection mitigation
- Mandatory citation policy
| Parameter | Impact |
|---|---|
| chunk_size | Retrieval granularity |
| chunk_overlap | Context continuity |
| top_k | Retrieval depth |
| similarity_threshold | Noise filtering |
| temperature | Determinism |
| max_tokens | Cost control |
| embedding_model | Vector quality |
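The parameters above naturally collect into one config object; a sketch with frozen defaults (the values and the model name are illustrative starting points, not recommendations):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RAGConfig:
    """Tuning parameters in one place, so experiments vary one field at a time."""
    chunk_size: int = 500                 # retrieval granularity
    chunk_overlap: int = 100              # context continuity
    top_k: int = 4                        # retrieval depth
    similarity_threshold: float = 0.3     # noise filtering
    temperature: float = 0.0              # determinism
    max_tokens: int = 512                 # cost control
    embedding_model: str = "all-MiniLM-L6-v2"  # vector quality (example model)
```

Freezing the dataclass keeps a given experiment's config immutable, which makes runs reproducible and easy to log.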
Deployment workflow:
- Start Minikube
- Build Docker image
- Apply namespace
- Deploy vector DB
- Deploy RAGForge API
- Expose via Ingress
- Query via browser / curl
This simulates a real production deployment workflow.
Planned enhancements:
- Hybrid Search (BM25 + Vector)
- Re-ranking layer
- HPA autoscaling
- Prometheus metrics export
- Grafana dashboards
- CI/CD with GitHub Actions
- Helm chart
- Multi-environment setup (dev/stage/prod)
After completion, this project demonstrates:
- Kubernetes-native AI system design
- Hybrid RAG architecture
- AI Infrastructure thinking
- Observability awareness
- Production-oriented DevOps skills
Target Roles:
- AI Infrastructure Engineer
- MLOps Engineer
- LLMOps Engineer
- Platform Engineer (AI Focus)