v1.0.0 — Production-Grade Trust-Verified RAG
Release date: 2026-04-24
A 13-night high-intensity sprint took TrustRAG from a single-app demo
to a production-deployed, ecosystem-integrated platform with measured
quality, three published PyPI packages, and Claude Desktop MCP
integration verified end-to-end.
🎯 Highlights
- ✅ WebSocket streaming with cancellation, multi-stage status,
  and error frames (TTFT < 500ms target)
- ✅ Hybrid retrieval (pgvector cosine + Postgres tsvector +
  Reciprocal Rank Fusion k=60), benchmarked vs semantic-only
  baseline (see "Measured Quality" below)
- ✅ 3 PyPI packages:
  - trustrag-langchain: Retriever, Tool, LangGraph multi-hop agent with trust budget
  - trustrag-mcp: MCP server, 3 tools (query / upload / audit), stdio
  - trustrag-eval: RAGAS pipeline with Groq / Gemini judge variants, deterministic substring-hit metric, CLI runner
- ✅ MCP in Claude Desktop verified end-to-end with production
  Railway backend (see docs/releases/v0.5.0-mcp.md)
- ✅ n8n workflow templates (3): doc ingestion, Slack trust gate,
  daily low-confidence digest
- ✅ Live deployment: Vercel frontend + Railway backend with
  pgvector + UptimeRobot keep-alive, $0/month infrastructure
- ✅ Latency engineered for free-tier hardware: 30-60s → 5-10s
  cache miss / sub-300ms cache hit via embedding cleanup, merged
  generation+self-check prompt, and Postgres-backed query cache
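The hybrid retrieval highlight names Reciprocal Rank Fusion with k=60. As a rough illustration of how two ranked lists can be fused this way, here is a minimal sketch. The function name `reciprocal_rank_fusion` and the document ids are hypothetical and are not taken from the TrustRAG codebase; only the standard RRF scoring rule (sum of 1 / (k + rank) across lists) is assumed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. Hypothetical helper, not TrustRAG's actual code.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative inputs: one list from vector search, one from full-text search.
semantic_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits], k=60)
print(fused)
```

Documents that appear in both lists (here `doc_a` and `doc_c`) accumulate two terms and rise above documents found by only one retriever, which is the intended effect of the fusion step.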
📊 Measured Quality (15q synthetic, 8B-pipeline + Groq judge)
| Metric | Semantic | Hybrid | Δ |
|---|---|---|---|
| Faithfulness (RAGAS) | 0.241 | 0.377 | +13.6pp ✓ |
| Substring Hit (overall) | 0.333 | 0.357 | +2.4pp ✓ |
| ↳ Semantic queries | 0.300 | 0.400 | +10pp ✓ |
| ↳ Keyword queries | 0.400 | 0.200 | -20pp |
| Answer Relevancy | 0.729 | 0.596 | -13.3pp |
| Context Precision | 0.128 | 0.101 | -2.7pp |
| Context Recall | 0.377 | 0.273 | -10.4pp |
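On this 15-question synthetic set, hybrid retrieval raises RAGAS
faithfulness by 13.6pp (0.241 → 0.377, i.e. fewer unsupported claims)
and substring hit on semantic queries by 10pp. It regresses on
keyword-query substring hit (-20pp), answer relevancy, context
precision, and context recall; these regressions are attributed to
llama-3.1-8b-instant's weaker synthesis over the broader RRF context.
A 70B re-run is a planned follow-up. Full methodology and analysis
are in docs/releases/v0.3.0-hybrid.md.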
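To make the table's "Substring Hit" and Δ columns concrete, the sketch below shows one plausible way to compute them. The exact definition used by trustrag-eval is not given in these notes, so this assumes a hit means "any expected substring occurs in the answer, case-insensitively". The function names and sample rows are hypothetical.

```python
def substring_hit(answer, expected_substrings):
    """Return 1.0 if any expected substring occurs in the answer, else 0.0.

    Matching is case-insensitive. This definition is an assumption; the
    trustrag-eval metric may differ in detail.
    """
    haystack = answer.casefold()
    return 1.0 if any(s.casefold() in haystack for s in expected_substrings) else 0.0


def substring_hit_rate(rows):
    """Average substring_hit over (answer, expected_substrings) pairs."""
    if not rows:
        return 0.0
    return sum(substring_hit(answer, expected) for answer, expected in rows) / len(rows)


def delta_pp(baseline, candidate):
    """Difference between two 0..1 scores in percentage points, one decimal."""
    return round((candidate - baseline) * 100, 1)


# Hypothetical sample rows, not benchmark data.
rows = [
    ("Harnesses must be inspected before each use.", ["inspected before each use"]),
    ("The document does not say.", ["6 feet"]),
]
print(substring_hit_rate(rows))
print(delta_pp(0.241, 0.377))  # faithfulness row from the table
```

Because the metric is a plain string check, it is deterministic and needs no LLM judge, which makes it a useful cross-check against the RAGAS scores.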
⚡ Latency Profile (Railway production)
| Path | Latency |
|---|---|
| Cache hit | ~300ms (p95 < 500ms) |
| Cache miss, merged HTTP | 5-10s |
| Streaming TTFT | < 500ms (Llama 70B + Groq) |
| Cold start | None (UptimeRobot 5-min ping keeps the service awake) |
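The gap between the cache-hit and cache-miss rows comes from skipping the retrieval and generation pipeline on repeated queries. The sketch below illustrates that lookup path under stated assumptions: the cache key is a SHA-256 hash of the whitespace- and case-normalized query, and an in-memory dict stands in for the Postgres table. The schema and key derivation of the real cache are not described here, and `QueryCache`, `answer_with_cache`, and `fake_pipeline` are hypothetical names.

```python
import hashlib


class QueryCache:
    """Toy query cache; a dict stands in for the Postgres-backed table."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key_for(query):
        # Assumed normalization: lowercase and collapse runs of whitespace.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, query):
        return self._store.get(self.key_for(query))

    def put(self, query, answer):
        self._store[self.key_for(query)] = answer


def answer_with_cache(cache, query, run_pipeline):
    """Return (answer, was_cache_hit); call the slow pipeline only on a miss."""
    cached = cache.get(query)
    if cached is not None:
        return cached, True
    answer = run_pipeline(query)
    cache.put(query, answer)
    return answer, False


calls = []


def fake_pipeline(query):
    # Stand-in for the multi-second retrieval + generation path.
    calls.append(query)
    return "answer for " + query


cache = QueryCache()
first, first_hit = answer_with_cache(cache, "What is RRF?", fake_pipeline)
second, second_hit = answer_with_cache(cache, "  what is   rrf? ", fake_pipeline)
print(first_hit, second_hit, len(calls))
```

Normalizing before hashing lets trivially different spellings of the same question share one cache entry, at the cost of treating case-sensitive queries as identical.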
📦 PyPI Packages
pip install trustrag-langchain # 0.1.0
pip install trustrag-mcp # 0.1.2
pip install trustrag-eval # 0.1.0
🌐 Live URLs
- Frontend: https://trustrag.vercel.app
- Backend: https://trustrag-production.up.railway.app
- Health: https://trustrag-production.up.railway.app/health
🛡️ Architectural Tradeoffs Disclosed
- Merged-prompt HTTP path uses in-prompt LLM self-check (single
  Groq call returns {answer, self_check.unsupported_claims}).
  Known ~5-10% bias since the same model checks its own answer.
  RAGAS faithfulness (independent evaluation) is the bias-free
  reference. SIGN-112 in plans/guardrails.md.
- Streaming WebSocket path keeps 2-call architecture (separate
  hallucination check) for stricter fact-checking under the
  token-flow UX where the second call's latency is hidden.
- Railway free tier (1GB RAM / 0.5 vCPU): UptimeRobot keep-alive
  prevents cold sleeps but doesn't buy more CPU. Embedding query
  stays on the critical path at ~2-5s.
- Benchmarks ran on llama-3.1-8b-instant because the 70B daily
  token quota was exhausted. Production reverts to
  llama-3.3-70b-versatile for both pipeline and (when needed)
  trust-verification calls.
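The merged-prompt path described in the first tradeoff returns the answer and the self-check in a single model response. The sketch below shows how such a response might be parsed, assuming the model emits JSON nested as `{"answer": ..., "self_check": {"unsupported_claims": [...]}}`. That nesting is inferred from the shorthand in the bullet; the helper name `parse_merged_response` and the `passed` field are hypothetical and not part of the documented API.

```python
import json


def parse_merged_response(raw):
    """Split a merged generation+self-check response into its parts.

    Assumes the JSON shape {"answer": str, "self_check":
    {"unsupported_claims": [str, ...]}}; the real prompt contract may differ.
    """
    payload = json.loads(raw)
    claims = payload.get("self_check", {}).get("unsupported_claims", [])
    return {
        "answer": payload["answer"],
        "unsupported_claims": list(claims),
        # Hypothetical convenience flag: True when the model flagged nothing.
        "passed": len(claims) == 0,
    }


# Illustrative model output, not a captured production response.
raw_response = json.dumps(
    {
        "answer": "Guardrails are required above 6 feet.",
        "self_check": {"unsupported_claims": ["required above 6 feet"]},
    }
)
result = parse_merged_response(raw_response)
print(result["passed"], result["unsupported_claims"])
```

Since the same model writes and checks the answer in one call, an empty `unsupported_claims` list is weaker evidence than a separate verification call, which matches the self-check bias the bullet discloses.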
🔄 Breaking Changes
None. v0.1's API contract is preserved — QueryResponse.hallucination_check.flags,
/api/query/, WebSocket message shapes — all unchanged.
🗺️ Roadmap
- v1.1: DOCX + HTML ingestion
- v1.2: Session auth + per-user rate limits
- v1.3: Cross-encoder rerank between RRF and top-5 (addresses
  keyword-query regression observed in v0.3.0 benchmark)
- v2.0: Multi-tenant + usage quotas
Commits in this release (since v0.4.0-langchain)
Full git log:
- Spec & plan: 7c85b69, f5b7217
- WS1 backend opt: c31af8f (embedding cleanup) + ed9e64b (ruff + /health HEAD)
- Cache: ef908cd + 2b643f7
- Merged prompt: fe50642
- Eval (Gemini): ebc07a8 + 859b05b + ef01285
- Eval (Groq judge + tuning): 7bfa775 + 75f460d
- Benchmarks: 75f460d (semantic) + f262f93 (hybrid)
- v0.3.0-hybrid release: 5f96b12
- v0.5.0-mcp draft: 07da34f
🙏 Credits
- Engineering & Architecture: Jigang Zhou (Harry) — github.com/jigangz
- Pair-programming partner: Claude Code (Anthropic)
Built during the 2026-04 sprint as a portfolio project for SWE / ML
engineer / Founding engineer roles. Production-grade decisions made
under realistic free-tier constraints — every architecture choice is
documented in docs/superpowers/specs/.
Install now:
pip install trustrag-langchain trustrag-mcp trustrag-eval
Try the demo: https://trustrag.vercel.app