A scalable, microservice-based Retrieval‑Augmented Generation system with Valkey‑powered queues, LangGraph orchestration, LangSmith tracing, and AWS EC2 + Load Balancer deployment.
MicroRAG is a production‑grade platform that analyzes Resumes and Job Descriptions (JDs), generates gap analyses & improvement suggestions, and exposes the workflow via APIs and a minimal UI. The system is designed to be horizontally scalable, event‑driven, and observability‑first.
Core ideas:
- Microservices for separation of concerns and independent scaling.
- RAG pipeline for factual, source‑grounded responses.
- Valkey as the queue backbone for decoupling producers/consumers.
- LangGraph for deterministic, recoverable orchestration.
- LangSmith for tracing, evaluation, and monitoring.
- AWS EC2 + ALB for simple, flexible deployment.
- Upload JD and Resume (PDF) → Enqueue in Valkey → Generate file ID (sketched below).
- Storage: Raw PDF stored in mounted Docker volume (`/mnt/volume`).
- Worker Step 1: Pick up file → Convert PDF pages into images → Save in Docker volume.
- Worker Step 2: Send images to OpenAI Vision/Text API → Extract text.
- Processing: Resume text + JD passed to LangGraph pipeline.
- LangGraph Flow:
  - Rewrite JD (clean, concise, ATS‑friendly).
  - Perform analysis on rewritten JD.
  - Generate suggestions & gap analysis.
- Result Delivery: JSON/PDF report available for download.
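A minimal sketch of the upload/enqueue step, assuming a FastAPI gateway, the Redis‑compatible `redis` client pointed at a `valkey` host, and an illustrative `resume_jobs` queue name (the real queue name and payload shape live in the service code):

```python
# api-gateway sketch: accept the PDFs, persist them to the shared volume,
# generate a job ID, and enqueue a job in Valkey (Redis-compatible, so
# redis-py works unchanged). Queue name and payload fields are assumptions.
import json
import uuid
from pathlib import Path

import redis
from fastapi import FastAPI, UploadFile

app = FastAPI()
valkey = redis.Redis(host="valkey", port=6379, decode_responses=True)
VOLUME = Path("/mnt/volume")

@app.post("/upload")
async def upload(resume: UploadFile, jd: UploadFile):
    job_id = str(uuid.uuid4())                        # file/job ID returned to the client
    job_dir = VOLUME / job_id
    job_dir.mkdir(parents=True, exist_ok=True)

    for name, upload_file in (("resume.pdf", resume), ("jd.pdf", jd)):
        (job_dir / name).write_bytes(await upload_file.read())  # raw PDFs on the volume

    # Push the job onto the list the workers block on.
    valkey.rpush("resume_jobs", json.dumps({"job_id": job_id}))
    return {"job_id": job_id}
```

Returning the `job_id` immediately keeps the upload path fast; all heavy work happens asynchronously in the workers.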

- `api-gateway` (FastAPI)
  - Upload JD + Resume
  - Enqueue jobs in Valkey
  - Returns job ID
- `pdf-worker` (see the worker sketch after this list)
  - Reads jobs from Valkey
  - Converts PDF pages → Images
  - Stores in mounted Docker volume
  - Reads images
  - Sends to OpenAI API for text extraction
  - Stores extracted text
- `rag-orchestrator` (LangGraph; see the graph sketch after this list)
  - Rewrite JD
  - Compare resume & rewritten JD
  - Generate improvement suggestions
- `reporter`
  - Consolidates analysis
  - Outputs JSON/PDF reports
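A condensed sketch of the `pdf-worker` loop, assuming `pdf2image` for page rasterization and a vision‑capable OpenAI model for extraction (the library, model name, and queue name are illustrative, not the pinned dependencies):

```python
# pdf-worker sketch: block on the Valkey queue, rasterize PDF pages to images
# on the shared volume, then ask a vision-capable OpenAI model to extract text.
import base64
import json
from pathlib import Path

import redis
from openai import OpenAI
from pdf2image import convert_from_path

valkey = redis.Redis(host="valkey", port=6379, decode_responses=True)
client = OpenAI()  # reads OPENAI_API_KEY from the environment
VOLUME = Path("/mnt/volume")

def extract_text(image_path: Path) -> str:
    b64 = base64.b64encode(image_path.read_bytes()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract all text from this page."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

while True:
    _, raw = valkey.blpop("resume_jobs")              # wait for the next job
    job_dir = VOLUME / json.loads(raw)["job_id"]

    for pdf in ("resume.pdf", "jd.pdf"):
        pages = convert_from_path(job_dir / pdf)      # PDF pages -> PIL images
        extracted = []
        for i, page in enumerate(pages):
            img_path = job_dir / f"{pdf}.page{i}.png"
            page.save(img_path)                       # intermediate image on the volume
            extracted.append(extract_text(img_path))
        (job_dir / f"{pdf}.txt").write_text("\n".join(extracted))
```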
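The `rag-orchestrator` step can be pictured as a small LangGraph `StateGraph` with one node per stage of the flow; node names, prompts, and the model are illustrative placeholders for what actually lives in `graph.py`:

```python
# rag-orchestrator sketch: a three-node LangGraph pipeline mirroring
# rewrite JD -> compare against resume -> generate suggestions.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    # Single-turn helper; real prompts and model choice live in graph.py.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

class RAGState(TypedDict, total=False):
    jd: str
    resume: str
    rewritten_jd: str
    analysis: str
    suggestions: str

def rewrite_jd(state: RAGState) -> RAGState:
    return {"rewritten_jd": llm(f"Rewrite this JD to be clean, concise, ATS-friendly:\n{state['jd']}")}

def analyze(state: RAGState) -> RAGState:
    return {"analysis": llm(f"Compare this resume to the JD and list gaps.\nJD:\n{state['rewritten_jd']}\nResume:\n{state['resume']}")}

def suggest(state: RAGState) -> RAGState:
    return {"suggestions": llm(f"Give concrete resume improvement suggestions based on:\n{state['analysis']}")}

graph = StateGraph(RAGState)
graph.add_node("rewrite_jd", rewrite_jd)
graph.add_node("analyze", analyze)
graph.add_node("suggest", suggest)
graph.add_edge(START, "rewrite_jd")
graph.add_edge("rewrite_jd", "analyze")
graph.add_edge("analyze", "suggest")
graph.add_edge("suggest", END)
pipeline = graph.compile()

# result = pipeline.invoke({"jd": jd_text, "resume": resume_text})
```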
- Languages: Python (FastAPI, workers)
- Orchestration: LangGraph
- Observability: LangSmith, CloudWatch
- Queue: Valkey (Redis‑compatible)
- Database: MongoDB (via devcontainer for metadata & tracing)
- Infra: Docker, Docker Compose, AWS EC2, ALB
- CI/CD: GitHub Actions → EC2 deploy via SSH/scp
- Dev Tools: `.devcontainer` for VSCode, Dockerfiles for all services
scalable-rag/
├─ app/
│  ├─ main.py
│  ├─ server.py
│  ├─ graph.py
│  ├─ db/
│  │  ├─ client.py
│  │  ├─ db.py
│  │  └─ collections/
│  ├─ queue/
│  └─ utils/
│     └─ file.py
├─ docker-compose.yaml
├─ docker-compose.prod.yaml
├─ Dockerfile
├─ requirements.txt
├─ freeze.sh
├─ run.sh
├─ start_worker.sh
├─ venv/
├─ .devcontainer/
└─ docs/
   └─ architecture.png
This project ships with a ready‑to‑use VSCode Dev Container for consistent local development.
# 1. Clone the repo
git clone https://github.com/mukulpythondev/MicroRAG.git
cd MicroRAG
# 2. Open in VSCode with Dev Containers extension installed
# VSCode will detect `.devcontainer` and build environment automatically.
# 3. Start services inside container
docker compose up -d
# 4. Run API service (FastAPI)
docker compose run app python app/server.py
# 5. Start a worker (PDF→Image, OCR, LangGraph orchestration)
docker compose run app bash start_worker.sh
Copy `.env`:
OPENAI_API_KEY=
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=
LANGSMITH_API_KEY=
LANGSMITH_PROJECT=
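With `LANGSMITH_TRACING=true` and a valid API key set, LangChain/LangGraph invocations are traced automatically under `LANGSMITH_PROJECT`; plain helper functions can opt in with the `langsmith` SDK's `traceable` decorator. A minimal, illustrative example (the function and run name are hypothetical):

```python
# LangSmith sketch: LangGraph/LangChain calls are traced automatically once
# LANGSMITH_TRACING is enabled; other functions can opt in explicitly.
from langsmith import traceable

@traceable(name="extract_text")   # appears as a run under LANGSMITH_PROJECT
def extract_text(page_image_path: str) -> str:
    # ... OCR / OpenAI call goes here ...
    return "extracted text"
```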
- Provision EC2 instances (Ubuntu 22.04+)
- Install Docker + Docker Compose
- Clone repo → Pull environment secrets from AWS SSM/Secrets Manager
- Run services with `docker compose -f docker-compose.prod.yaml up -d`
- Attach EC2 instances to AWS Application Load Balancer (ALB)
- Scaling: Add/remove EC2 instances behind ALB
- Store intermediate files in Docker volume (fast local access)
- Parallelize workers by increasing consumer count
- Batch OCR requests to OpenAI where possible (see the sketch below)
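For the last point, one hedged way to batch is to attach several page images to a single vision request instead of calling the API once per page (model name and the per‑request page cap are assumptions):

```python
# Batching sketch: attach several page images to one vision request instead
# of one request per page. Model name and batch size are assumptions.
import base64
from pathlib import Path

from openai import OpenAI

client = OpenAI()

def extract_text_batched(image_paths: list[Path], batch_size: int = 5) -> str:
    chunks = []
    for i in range(0, len(image_paths), batch_size):
        content = [{"type": "text",
                    "text": "Extract the text of each page, in order, separated by '---'."}]
        for path in image_paths[i:i + batch_size]:
            b64 = base64.b64encode(path.read_bytes()).decode()
            content.append({"type": "image_url",
                            "image_url": {"url": f"data:image/png;base64,{b64}"}})
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": content}],
        )
        chunks.append(resp.choices[0].message.content)
    return "\n---\n".join(chunks)
```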