A production-oriented RAG application built to showcase how the same product can be implemented with both Go and Python/FastAPI backends while sharing the frontend contract, retrieval architecture, evaluation tooling, and deployment shape.
Built end-to-end with:
- Go backend (`/chat`, `/chat/stream`, `/ingest/text`, eval APIs)
- Python/FastAPI backend for the same product surface
- Postgres + `pgvector` retrieval
- React UI (chat + eval dashboard)
- Eval runner + scoring + regression gate
- Observability (structured logs + request/span tracing)
- GCP deployment (Cloud Run + Cloud SQL + Secret Manager)
- Firebase Hosting frontend
- Frontend (Firebase): https://multi-model-rag-5713b.web.app
- Backend API (Cloud Run): https://multi-model-rag-api-ozzmnn5qja-uc.a.run.app
- Demo video: https://youtu.be/iM-_bXSPgOQ
Note:
- I pause Cloud SQL outside demo/testing windows to control billing on a personal GCP account.
- If the DB is paused, DB-backed features (`/ingest/text`, `/evals/*`, `rag=true`) will fail until Cloud SQL is resumed.
- The repository contains both backend implementations. Go is the current primary runtime path; the FastAPI backend remains available for comparison, reference, and parity checks.
Build a production-shaped, production-grade RAG system that demonstrates:
- model abstraction (multi-provider design)
- retrieval + grounding + citations
- evaluation and regression safety
- cloud deployment and operations
- observability and debugging discipline
- Core FastAPI-to-Go migration plan: 11/11 phases complete
- Remaining work: hardening, parity review, real provider integrations, and deciding the long-term coexistence or cleanup path for the two backend implementations
- Go backend endpoints: `GET /`, `GET /health`, `POST /chat`, `POST /chat/stream`, `POST /ingest/text`, `GET /evals/runs`, `GET /evals/runs/{id}`, `GET /auth/demo-status`, `POST /auth/demo-unlock`, `POST /auth/demo-lock` (see the request sketch after this list)
- RAG ingestion and retrieval via Postgres + `pgvector`
- Citation-grounded responses
- Demo auth cookie flow
- React UI with:
  - JSON + SSE chat modes
  - provider selection
  - RAG/debug toggles
  - citations panel
  - retrieved chunks debug panel
  - eval dashboard (runs + case-level scores)
- Eval dataset runner (`jsonl`)
- Judge scoring:
  - correctness
  - groundedness
  - hallucination
- Regression gating against baseline
- Eval persistence in DB (`eval_run`, `eval_run_case`)
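To make the frontend contract concrete, below is a minimal Go sketch of calling `POST /chat`. The request/response field names (`message`, `provider`, `rag`, `answer`, `citations`) are illustrative assumptions, not the exact wire format handled by `internal/httpapi`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatRequest/chatResponse are illustrative shapes; field names are assumed,
// not copied from the actual API contract.
type chatRequest struct {
	Message  string `json:"message"`
	Provider string `json:"provider"` // e.g. "gemini", "openai", "grok"
	RAG      bool   `json:"rag"`
}

type chatResponse struct {
	Answer    string   `json:"answer"`
	Citations []string `json:"citations"`
}

func main() {
	body, _ := json.Marshal(chatRequest{
		Message:  "What does the ingestion pipeline store in pgvector?",
		Provider: "gemini",
		RAG:      true,
	})

	resp, err := http.Post("http://localhost:8080/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Answer, out.Citations)
}
```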
- Go server runs from `cmd/api` with packages under `internal/*` (a boot sketch follows this list)
- FastAPI backend also exists under `apps/api`
- Dockerfile builds and runs the Go backend
- Cloud SQL migrations remain SQL-first via `psql`
- Frontend remains unchanged in `apps/web`
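To illustrate that layout, here is a hedged sketch of what a `cmd/api`-style boot sequence could look like; the environment variable names and route registration below are assumptions, not the actual wiring.

```go
// An illustrative boot sequence for a server like cmd/api; the actual wiring
// (config loading, route registration via internal/httpapi) differs.
package main

import (
	"database/sql"
	"log"
	"net/http"
	"os"

	_ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" driver for database/sql
)

func main() {
	// Env-based config; these variable names are assumptions, not the real ones.
	dsn := os.Getenv("DATABASE_URL")
	addr := os.Getenv("LISTEN_ADDR")
	if addr == "" {
		addr = ":8080"
	}

	db, err := sql.Open("pgx", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"status":"ok"}`))
	})
	// /chat, /chat/stream, /ingest/text, /evals/* and /auth/* handlers would be
	// registered here by the real internal/httpapi package.

	log.Printf("listening on %s", addr)
	log.Fatal(http.ListenAndServe(addr, mux))
}
```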
- Go backend currently implements deterministic stub behavior for `gemini`, `openai`, and `grok`
- Default model names are wired for those providers, but real provider HTTP integrations in Go are still pending
- Real Gemini/OpenAI/Grok execution is post-migration follow-up work, not part of the completed core replacement
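A minimal sketch of what deterministic stubbing behind a provider abstraction can look like in Go; the `Provider` interface, constructor, and model names below are hypothetical, not the actual `internal/llm` API.

```go
package llm

import (
	"context"
	"fmt"
)

// Provider is a hypothetical abstraction over chat backends; the real
// internal/llm package may differ in shape and naming.
type Provider interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// stubProvider returns deterministic output so eval runs and frontend
// behavior are reproducible before real HTTP integrations land.
type stubProvider struct {
	name  string // "gemini", "openai", or "grok"
	model string // default model name wired for the provider
}

func (s stubProvider) Complete(_ context.Context, prompt string) (string, error) {
	// Deterministic: the same prompt always yields the same response.
	return fmt.Sprintf("[%s/%s stub] %s", s.name, s.model, prompt), nil
}

// New routes a provider name to an implementation; only stubs exist today.
func New(name string) (Provider, error) {
	switch name {
	case "gemini":
		return stubProvider{name: name, model: "gemini-default"}, nil
	case "openai":
		return stubProvider{name: name, model: "openai-default"}, nil
	case "grok":
		return stubProvider{name: name, model: "grok-default"}, nil
	default:
		return nil, fmt.Errorf("unknown provider %q", name)
	}
}
```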
High-level flow:
- `apps/web` sends requests to the Go API
- `cmd/api` boots the server and loads env-based config
- `internal/httpapi` handles routes, decoding, validation, and HTTP responses
- `internal/service` orchestrates chat, ingest, and eval behavior
- `internal/llm` provides provider routing and stub responses
- `internal/rag` handles chunking, embeddings, grounding, citations, and prompt construction
- `internal/store` accesses Postgres + `pgvector` via `database/sql` and `github.com/jackc/pgx/v5/stdlib`
- `internal/observability` emits structured request and span logs
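For intuition on the `internal/store` step, here is a hedged retrieval sketch using `database/sql` with the `pgx/v5/stdlib` driver and a `pgvector` cosine-distance query; the table and column names (`chunks`, `embedding`, `content`) are assumptions rather than the project's actual schema.

```go
package store

import (
	"context"
	"database/sql"
	"fmt"
	"strings"

	_ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" database/sql driver
)

// Chunk is an illustrative retrieval result; the real schema may differ.
type Chunk struct {
	ID      int64
	Content string
}

// TopChunks runs a pgvector cosine-distance search (the <=> operator).
// Table and column names here are hypothetical; the query embedding is
// passed as a vector literal and cast server-side.
func TopChunks(ctx context.Context, db *sql.DB, embedding []float32, k int) ([]Chunk, error) {
	parts := make([]string, len(embedding))
	for i, v := range embedding {
		parts[i] = fmt.Sprintf("%f", v)
	}
	vec := "[" + strings.Join(parts, ",") + "]"

	rows, err := db.QueryContext(ctx,
		`SELECT id, content FROM chunks ORDER BY embedding <=> $1::vector LIMIT $2`,
		vec, k)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var out []Chunk
	for rows.Next() {
		var c Chunk
		if err := rows.Scan(&c.ID, &c.Content); err != nil {
			return nil, err
		}
		out = append(out, c)
	}
	return out, rows.Err()
}
```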
- `cmd/api` + `internal/*`: Go backend
- `apps/api`: Python/FastAPI backend for the same product surface
- `apps/web`: React + Vite frontend
- `packages/llm`: provider abstraction
- `packages/rag`: ingestion/retrieval/citations
- `packages/evals`: eval runner/scoring/gating
- `packages/observability`: logging/tracing helpers
- `infra/`: deployment scripts
- `datasets/`: eval datasets + baselines
- End-to-end RAG pipeline works in a hosted environment (Firebase + Cloud Run + Cloud SQL)
- Retrieval and citations are visible in the UI and API
- Eval and regression tooling can catch behavior changes between stub and real providers
- The same RAG product can be implemented in both Go and Python/FastAPI without changing the frontend contract
- Cloud operations/debugging workflows are documented and reproducible
- Go backend real provider execution is not implemented yet; current Go behavior is stub-first
- Exact FastAPI validation and edge-case parity still needs a final review pass
- Maintaining both Go and FastAPI backends increases documentation and maintenance overhead
- CI is implemented; CD is still manual (deploy scripts + commands)
- Tracing is currently log-based spans (not full OpenTelemetry exporter)
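For context on the log-based spans mentioned above, here is a minimal sketch of the idea (span boundaries emitted as structured log records rather than exported to an OpenTelemetry backend); the names and fields are illustrative, not the actual `internal/observability` API.

```go
package observability

import (
	"log/slog"
	"time"
)

// Span emits structured start/end log records instead of exporting to an
// OpenTelemetry backend. Field names are illustrative.
type Span struct {
	name      string
	requestID string
	start     time.Time
}

// StartSpan logs the start of a unit of work tied to a request ID.
func StartSpan(requestID, name string) *Span {
	slog.Info("span.start", "request_id", requestID, "span", name)
	return &Span{name: name, requestID: requestID, start: time.Now()}
}

// End logs the span's duration and whether it ended in error.
func (s *Span) End(err error) {
	slog.Info("span.end",
		"request_id", s.requestID,
		"span", s.name,
		"duration_ms", time.Since(s.start).Milliseconds(),
		"error", err != nil,
	)
}
```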
- Real Gemini, OpenAI, and Grok provider integrations in Go
- Parity gap review against remaining FastAPI edge cases
- Cutover cleanup and eventual deprecation of `apps/api`
- CI/CD automation:
  - backend deploy to Cloud Run
  - frontend deploy to Firebase Hosting
- Real-mode baseline management (separate baselines per provider/model); see the gate sketch after this list
- Further RAG quality improvements (retrieval tuning/reranking)
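As a sketch of how regression gating against a baseline can work (relevant to the per-provider baseline item above), here is a hedged Go example; the metric names, scores, and tolerance are made up for illustration, and the real gate lives in `packages/evals` and may be implemented differently.

```go
package main

import "fmt"

// gate fails if any metric drops more than tol below its baseline.
// In this sketch all scores are "higher is better"; names and values
// are illustrative, not the project's actual config.
func gate(baseline, current map[string]float64, tol float64) error {
	for metric, base := range baseline {
		cur, ok := current[metric]
		if !ok {
			return fmt.Errorf("metric %q missing from current run", metric)
		}
		if cur < base-tol {
			return fmt.Errorf("regression on %q: %.3f < baseline %.3f (tol %.2f)", metric, cur, base, tol)
		}
	}
	return nil
}

func main() {
	baseline := map[string]float64{"correctness": 0.86, "groundedness": 0.91, "hallucination": 0.95}
	current := map[string]float64{"correctness": 0.84, "groundedness": 0.92, "hallucination": 0.95}
	if err := gate(baseline, current, 0.05); err != nil {
		fmt.Println("gate failed:", err)
		return
	}
	fmt.Println("gate passed")
}
```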
```bash
cp .env.example .env
make up
make migrate
make api
```

Python/FastAPI backend:

```bash
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[dev]"
uv run uvicorn apps.api.main:app --reload --host 0.0.0.0 --port 8000
```

Backend runtime summary:
- Go API (primary): http://localhost:8080
- FastAPI API: http://localhost:8000
The Go backend is the primary local and container runtime path:

```bash
make api
```

The FastAPI backend remains available for comparison and parity checks:

```bash
uv run uvicorn apps.api.main:app --reload --port 8000
```

Frontend:

```bash
cd apps/web
npm install
npm run dev
```

Open http://localhost:5173
Default local ports:
- Go API: http://localhost:8080
- Vite frontend: http://localhost:5173
```bash
uv run ruff check .
uv run ruff format --check .
uv run mypy apps/api tests packages scripts
uv run pytest -q
GOCACHE=/tmp/go-build-cache go test ./cmd/... ./internal/...
```
```bash
docker build -t multi-model-rag-api:local .
```

Use these docs for setup, troubleshooting, and reproducible cloud commands:
- `docs/gcp-commands.md` - reusable GCP command reference (Cloud Run, Cloud SQL, Secret Manager, logs, eval commands)
- `docs/gcp-setup-steps.md` - chronological cloud setup + deployment steps actually performed
- `docs/troubleshooting.md` - GCP/cloud issues encountered and how they were fixed
- `docs/RAG-TEST.md` - positive/negative RAG test plan and proof criteria
- `docs/database_design.md` - schema/data model notes, `pgvector` design, embedding dimension decisions
- The hosted demo is real cloud infrastructure (not localhost).
- If a live demo endpoint fails, it may be because Cloud SQL is paused to control costs.
- I can resume the DB and run a live walkthrough if needed.
This project was built iteratively (small slices with tests + proof commands + pushes), and the detailed planning artifacts remain in:
`docs/project_plan.md`