Distributed RAG Core is a production-style, microservice Retrieval-Augmented Generation platform for multimodal ingestion (text, PDF, image, audio, video), semantic retrieval, and model-flexible answering via Groq, Ollama, and Gemini.
- Modular service boundaries for independent scaling and fault isolation.
- Async ingestion and embedding queues for stable throughput under load.
- Retrieval pipeline with reranking and answer constraints that keep responses grounded in the retrieved context.
- Full-stack experience with a modern Next.js UI and operational observability endpoints.
flowchart LR
UI[Frontend\nNext.js] --> GW[Gateway\nFastAPI]
GW --> IQ[Redis ingestion:queue]
IQ --> ING[Ingestion Worker]
ING --> EQ[Redis embedding:queue]
EQ --> EMB[Embedding Worker]
EMB --> VS[Vector Store Service]
VS --> QD[(Qdrant)]
GW --> QS[Query Service]
QS --> EMB
QS --> VS
GW <--> PG[(PostgreSQL)]
ING <--> PG
EMB <--> PG
QS <--> PG
VS <--> PG
- frontend - Next.js App Router UI (query, documents, scrape, ollama, system tabs).
- gateway - API ingress, upload and scrape endpoints, query proxy, health and metrics aggregation.
- ingestion-worker - file extraction and chunking pipeline, pushes embedding jobs.
- embedding-worker - embedding generation and vector upsert orchestration.
- vector-store - single ownership over Qdrant reads/writes plus search cache.
- query-service - retrieval, reranking, prompt orchestration, and provider dispatch.
- shared - shared SQL schema bootstrap.
- scripts - operational reset and cleanup utilities.
- docker-compose.yml - full local stack.
- docker-compose.worker.yml - optional remote worker scaling.
- .env.example - complete runtime configuration and tuning reference.
frontend -> gateway -> ingestion:queue -> ingestion-worker -> embedding:queue -> embedding-worker -> vector-store -> qdrant
frontend -> gateway /api/websites/scrape -> document pipeline (same as upload)
frontend -> gateway /api/query -> query-service -> embedding-worker (/api/embed/text) -> vector-store (/api/vectors/search) -> provider (groq|ollama|gemini)
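The two-queue hand-off in the ingestion flow above can be illustrated with a small in-memory sketch. This is not the project's worker code: the `deque` objects stand in for the Redis lists `ingestion:queue` and `embedding:queue`, and the job payload fields (`doc_id`, `text`, `chunk`), the function names, and the fixed-size chunking are all hypothetical choices made for illustration.

```python
from collections import deque

# In-memory stand-ins for the two Redis queues named in the flow above.
queues = {"ingestion:queue": deque(), "embedding:queue": deque()}


def enqueue_document(doc_id: str, text: str) -> None:
    """Gateway step: push one ingestion job (payload shape is hypothetical)."""
    queues["ingestion:queue"].append({"doc_id": doc_id, "text": text})


def run_ingestion_worker(chunk_size: int = 20) -> int:
    """Drain ingestion jobs, split each text into fixed-size chunks,
    and push one embedding job per chunk. Returns the number of jobs pushed."""
    pushed = 0
    while queues["ingestion:queue"]:
        job = queues["ingestion:queue"].popleft()
        text = job["text"]
        for start in range(0, len(text), chunk_size):
            queues["embedding:queue"].append(
                {"doc_id": job["doc_id"], "chunk": text[start:start + chunk_size]}
            )
            pushed += 1
    return pushed
```

Because each stage only reads from one queue and writes to the next, a slow embedding stage shows up as a growing `embedding:queue` rather than as blocked uploads, which is the property the async design is aiming for.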
- Backend: FastAPI + Uvicorn (Python microservices)
- Frontend: Next.js 15 + React 19 + TypeScript
- Data and infra: PostgreSQL, Redis, Qdrant
- AI providers:
  - Embeddings: Gemini
  - Query-time LLMs: Groq, Ollama, Gemini
- Multimodal extraction support: Whisper, OCR, optional vision models
- Docker Desktop (WSL2 backend recommended on Windows)
- Git
- Optional: Ollama if you want local inference/vision
Copy-Item .env.example .env
Set at minimum:
GEMINI_API_KEY=your_key_here
Common optional variables:
PUBLIC_HOST=localhost
GROQ_API_KEY=your_key_here
OLLAMA_URL=http://host.docker.internal:11434
docker compose up -d --build
docker compose ps
- Frontend: http://localhost:3000
- Gateway health: http://localhost:8000/health
gateway:
- GET /health
- POST /api/documents/upload
- GET /api/documents
- GET /api/documents/{doc_id}/status
- DELETE /api/documents/{doc_id}
- POST /api/websites/scrape
- POST /api/query
- GET /api/query/history
- GET /api/system/status
- GET /api/stats/queue
- GET /api/stats/overview
- POST /api/client/heartbeat
- GET /api/ollama/models
- POST /api/ollama/pull
- DELETE /api/ollama/models/{model_name}
- POST /api/ollama/test

embedding-worker:
- GET /health
- POST /api/embed/text

vector-store:
- GET /health
- GET /api/collections/info
- POST /api/vectors/upsert
- POST /api/vectors/search
- DELETE /api/vectors/{doc_id}
- GET /api/stats

query-service:
- GET /health
- GET /api/models
- POST /api/query
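As an illustration of how a client might use the document status endpoint, here is a minimal polling sketch. The response shape is an assumption: the code expects a JSON object with a `status` field, which this README does not confirm. `GATEWAY_URL`, `fetch_status`, and `wait_for_document` are hypothetical names, and the `fetch` parameter exists only so the loop can be exercised without a running stack.

```python
import json
import time
import urllib.request
from typing import Callable

GATEWAY_URL = "http://localhost:8000"  # gateway address used elsewhere in this README


def fetch_status(doc_id: str) -> str:
    """Call GET /api/documents/{doc_id}/status.
    Assumes the body is JSON with a "status" key (not confirmed by the README)."""
    url = f"{GATEWAY_URL}/api/documents/{doc_id}/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["status"]


def wait_for_document(
    doc_id: str,
    fetch: Callable[[str], str] = fetch_status,
    interval: float = 2.0,
    max_polls: int = 60,
) -> str:
    """Poll until the document reaches a terminal status ("done" or "failed")."""
    for _ in range(max_polls):
        status = fetch(doc_id)
        if status in ("done", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"document {doc_id} not finished after {max_polls} polls")
```

A caller would typically upload a file, take the returned document id, and call `wait_for_document(doc_id)` before issuing queries against it.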
The database schema is defined in shared/init.sql.
- documents - ingestion lifecycle and status tracking.
- chunks - extracted chunk payloads with metadata.
- embedding_jobs - async embedding work state.
- query_logs - query execution and model usage metadata.
- api_keys - reserved table for API key handling.
Document lifecycle:
queued -> extracting -> chunking -> embedding -> done
Failure path:
failed with error_msg.
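The lifecycle above amounts to a small linear state machine. The sketch below is one way to express it, not the code the services run: `LIFECYCLE`, `next_status`, and `fail` are hypothetical names, and the document is modeled as a plain dict rather than a row in the documents table.

```python
LIFECYCLE = ["queued", "extracting", "chunking", "embedding", "done"]


def next_status(current: str) -> str:
    """Return the status that follows `current` on the success path.
    Raises ValueError for unknown statuses (including "failed") and for "done"."""
    index = LIFECYCLE.index(current)
    if index == len(LIFECYCLE) - 1:
        raise ValueError("'done' is terminal")
    return LIFECYCLE[index + 1]


def fail(document: dict, reason: str) -> dict:
    """Return a copy of the document marked failed, with error_msg set."""
    return {**document, "status": "failed", "error_msg": reason}
```

For example, `next_status("chunking")` returns `"embedding"`, and `fail(doc, "OCR timeout")` yields a copy of `doc` whose `status` is `"failed"` and whose `error_msg` carries the reason.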
Use docker-compose.worker.yml to add ingestion/embedding capacity on other machines.
Key requirements:
- PUBLIC_HOST points to the primary host running Redis/Postgres/Vector Store.
- SHARED_UPLOADS_DIR maps a path visible to remote ingestion workers at /uploads.
- Keep API keys and model settings aligned with the main stack.
Run:
docker compose -f docker-compose.worker.yml up -d --build
From frontend:
npm install
npm run dev
Other scripts:
npm run build
npm run start
npm run lint
docker compose logs -f
python scripts/clear_database.py --yes
python scripts/clear_vector_db.py --yes
- Ingestion file access failures:
  - Verify uploads volume/path mapping, especially with remote workers.
  - Tune INGESTION_FILE_RETRY_MAX and INGESTION_FILE_RETRY_DELAY_SEC.
- Growing embedding backlog:
  - Increase WORKER_CONCURRENCY and/or run additional workers.
- Query latency spikes:
  - Increase QUERY_EMBED_CONCURRENCY.
  - Review provider timeouts and retry settings.
- Ollama model list empty:
  - Confirm OLLAMA_URL and model availability.
- LAN connectivity issues:
  - Verify PUBLIC_HOST, firewall rules, and reachable ports (3000, 8000, and infra ports as needed).
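To make the two ingestion retry settings concrete, the sketch below shows one plausible way a worker could apply them when an uploaded file is not yet visible on a shared volume. The variable names come from the list above, but the default values (5 attempts, 1.0 second), the `read_with_retry` helper, and the fixed-delay strategy are assumptions; the actual worker may behave differently.

```python
import os
import time
from pathlib import Path

# Defaults here are illustrative assumptions, not the project's real defaults.
RETRY_MAX = int(os.getenv("INGESTION_FILE_RETRY_MAX", "5"))
RETRY_DELAY_SEC = float(os.getenv("INGESTION_FILE_RETRY_DELAY_SEC", "1.0"))


def read_with_retry(
    path: str, retries: int = RETRY_MAX, delay: float = RETRY_DELAY_SEC
) -> bytes:
    """Read a file, retrying on OSError with a fixed delay between attempts."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return Path(path).read_bytes()
        except OSError as exc:  # includes FileNotFoundError and PermissionError
            last_error = exc
            if attempt < retries:
                time.sleep(delay)
    raise OSError(f"could not read {path} after {retries} attempts") from last_error
```

Under this model, a larger retry count or delay gives slow network mounts more time to expose a new upload, at the cost of holding a worker slot longer for files that are genuinely missing.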
- All services are healthy in docker compose ps.
- Uploaded files reach the final done status.
- Scrape creates documents that become queryable.
- Query succeeds with at least one provider.
- System tab reflects queue depth and service health.
SETUP.md provides detailed operations guidance and extended setup scenarios.