v1.0.0 — Production-Grade Trust-Verified RAG

Release date: 2026-04-24

A 13-night high-intensity sprint took TrustRAG from a single-app demo
to a production-deployed, ecosystem-integrated platform with measured
quality, three published PyPI packages, and Claude Desktop MCP
integration verified end-to-end.

🎯 Highlights

✅ WebSocket streaming with cancellation, multi-stage status,
and error frames (TTFT < 500ms target)
✅ Hybrid retrieval (pgvector cosine + Postgres tsvector +
Reciprocal Rank Fusion k=60), benchmarked vs semantic-only
baseline — see "Measured Quality" below
✅ 3 PyPI packages:
- trustrag-langchain — Retriever, Tool, LangGraph multi-hop agent with trust budget
- trustrag-mcp — MCP server, 3 tools (query / upload / audit), stdio
- trustrag-eval — RAGAS pipeline with Groq / Gemini judge variants, deterministic substring-hit metric, CLI runner
✅ MCP in Claude Desktop verified end-to-end with production
Railway backend — see docs/releases/v0.5.0-mcp.md
✅ n8n workflow templates (3) — doc ingestion, Slack trust gate,
daily low-confidence digest
✅ Live deployment: Vercel frontend + Railway backend with
pgvector + UptimeRobot keep-alive — $0/month infrastructure
✅ Latency engineered for free-tier hardware: 30-60s → 5-10s
cache miss / sub-300ms cache hit via embedding cleanup, merged
generation+self-check prompt, and Postgres-backed query cache
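
The fusion step in the hybrid retrieval highlight can be illustrated with the standard Reciprocal Rank Fusion formula, score(d) = sum of 1 / (k + rank) over each ranked list, with k=60 as stated above. This is a minimal sketch, not TrustRAG's actual code: the function name and the chunk IDs are hypothetical, and the two input lists stand in for the pgvector cosine ranking and the tsvector ranking.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists with RRF: score(d) = sum of 1 / (k + rank), rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs, best match first in each list.
semantic_ranking = ["chunk_a", "chunk_b", "chunk_c"]  # stand-in for pgvector cosine order
keyword_ranking = ["chunk_c", "chunk_a", "chunk_d"]   # stand-in for tsvector rank order

fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
print(fused)
```

A chunk that appears in both lists (chunk_a, chunk_c) outranks one that appears in only one, which is the property that lets RRF combine rankings without calibrating the two scoring scales against each other.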

📊 Measured Quality (15 synthetic questions, llama-3.1-8b-instant pipeline, Groq judge)

Metric	Semantic	Hybrid	Δ
Faithfulness (RAGAS)	0.241	0.377	+13.6pp ✓
Substring Hit (overall)	0.333	0.357	+2.4pp ✓
↳ Semantic queries	0.300	0.400	+10pp ✓
↳ Keyword queries	0.400	0.200	-20pp
Answer Relevancy	0.729	0.596	-13.3pp
Context Precision	0.128	0.101	-2.7pp
Context Recall	0.377	0.273	-10.4pp

On this 15-question set, hybrid raises faithfulness by 13.6pp
(0.241 to 0.377, i.e. fewer unsupported claims) and substring hit on
semantic queries by 10pp. Answer relevancy, context precision, context
recall, and substring hit on keyword queries all regress, likely
reflecting 8B-instant's weaker synthesis over the broader RRF context;
a 70B re-run is a planned follow-up. Full methodology and analysis are
in docs/releases/v0.3.0-hybrid.md.
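
The Δ column is the hybrid score minus the semantic score, expressed in percentage points on the 0-1 metric scale. The short check below recomputes it from the table values; the dictionary and helper names are illustrative only.

```python
# (semantic, hybrid) scores copied from the table above.
rows = {
    "Faithfulness (RAGAS)": (0.241, 0.377),
    "Substring Hit (overall)": (0.333, 0.357),
    "Substring Hit (semantic queries)": (0.300, 0.400),
    "Substring Hit (keyword queries)": (0.400, 0.200),
    "Answer Relevancy": (0.729, 0.596),
    "Context Precision": (0.128, 0.101),
    "Context Recall": (0.377, 0.273),
}


def delta_pp(semantic, hybrid):
    """Difference in percentage points, rounded to one decimal place."""
    return round((hybrid - semantic) * 100, 1)


deltas = {name: delta_pp(s, h) for name, (s, h) in rows.items()}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}pp")
```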

⚡ Latency Profile (Railway production)

Path	Latency
Cache hit	~300ms (p95 < 500ms)
Cache miss, merged HTTP	5-10s
Streaming TTFT	< 500ms (Llama 70B + Groq)
Cold start	Avoided (UptimeRobot 5-min ping)
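
The gap between the cache-hit and cache-miss rows comes from skipping the whole pipeline when a question has been answered before. The sketch below shows that lookup pattern under several assumptions: sqlite3 stands in for Postgres so the example runs anywhere, the table layout and the lowercase/whitespace normalization are hypothetical, and `run_pipeline` is a placeholder for the real retrieval and generation call.

```python
import hashlib
import json
import sqlite3


def cache_key(question):
    """Hash a case- and whitespace-normalized question (the normalization is an assumption)."""
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class QueryCache:
    """Table-backed response cache; sqlite3 is a stand-in for the Postgres table."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS query_cache "
            "(key TEXT PRIMARY KEY, response TEXT NOT NULL)"
        )

    def get(self, question):
        row = self.conn.execute(
            "SELECT response FROM query_cache WHERE key = ?", (cache_key(question),)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, question, response):
        self.conn.execute(
            "INSERT OR REPLACE INTO query_cache (key, response) VALUES (?, ?)",
            (cache_key(question), json.dumps(response)),
        )
        self.conn.commit()


def answer(question, cache, run_pipeline):
    """Return (response, cache_hit); the pipeline only runs on a miss."""
    cached = cache.get(question)
    if cached is not None:
        return cached, True
    response = run_pipeline(question)
    cache.put(question, response)
    return response, False
```

A production version would also need an eviction or expiry policy and invalidation when documents change; neither is described in this release, so both are left out here.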

📦 PyPI Packages

pip install trustrag-langchain  # 0.1.0
pip install trustrag-mcp         # 0.1.2
pip install trustrag-eval        # 0.1.0

🌐 Live URLs

Frontend: https://trustrag.vercel.app
Backend: https://trustrag-production.up.railway.app
Health: https://trustrag-production.up.railway.app/health
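
A quick way to probe the health URL from a script is a HEAD request, which the commit list below suggests the endpoint accepts. This is a sketch using only the standard library; the helper names are hypothetical, and treating any 2xx status as healthy is an assumption about the endpoint's behavior.

```python
import urllib.request

HEALTH_URL = "https://trustrag-production.up.railway.app/health"


def build_health_probe(url=HEALTH_URL):
    """Build a HEAD request so no response body is transferred."""
    return urllib.request.Request(url, method="HEAD")


def is_healthy(url=HEALTH_URL, timeout=10.0):
    """Return True on a 2xx response; network and HTTP errors count as unhealthy."""
    try:
        with urllib.request.urlopen(build_health_probe(url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False
```

Calling `is_healthy()` performs the network request; nothing is sent when the module is merely imported.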

🛡️ Architectural Tradeoffs Disclosed

- Merged-prompt HTTP path: uses an in-prompt LLM self-check (a single
Groq call returns {answer, self_check.unsupported_claims}). This
carries a known ~5-10% bias because the same model checks its own
answer; RAGAS faithfulness, which is evaluated independently, is the
reference measure. See SIGN-112 in plans/guardrails.md.
- Streaming WebSocket path: keeps the 2-call architecture (separate
hallucination check) for stricter fact-checking; under the token-flow
UX the second call's latency is hidden.
- Railway free tier (1GB RAM / 0.5 vCPU): UptimeRobot keep-alive
prevents cold sleeps but doesn't buy more CPU. Query embedding stays
on the critical path at ~2-5s.
- Benchmark model: benchmarks ran on llama-3.1-8b-instant because the
70B daily token quota was exhausted. Production reverts to
llama-3.3-70b-versatile for both pipeline and (when needed)
trust-verification calls.
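
To make the merged-prompt tradeoff concrete, the sketch below parses a response of the shape {answer, self_check.unsupported_claims} and exposes the self-check result. The exact JSON nesting, the `MergedResult` class, and the sample payload are assumptions made for illustration; the release only names the two fields.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MergedResult:
    answer: str
    unsupported_claims: list = field(default_factory=list)

    @property
    def passed_self_check(self):
        """True when the model reported no unsupported claims in its own answer."""
        return not self.unsupported_claims


def parse_merged_response(raw):
    """Parse the single-call JSON output; a missing self_check is treated as no claims."""
    payload = json.loads(raw)
    self_check = payload.get("self_check") or {}
    claims = self_check.get("unsupported_claims") or []
    return MergedResult(answer=payload["answer"], unsupported_claims=list(claims))


# Hypothetical model output, for illustration only.
raw = json.dumps({
    "answer": "The policy covers remote work.",
    "self_check": {"unsupported_claims": ["covers remote work"]},
})
result = parse_merged_response(raw)
print(result.passed_self_check, result.unsupported_claims)
```

Because the same call produces both the answer and the claim list, an empty list here is weaker evidence than a pass from the separate hallucination check on the streaming path.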

🔄 Breaking Changes

None. v0.1's API contract is preserved — QueryResponse.hallucination_check.flags,
/api/query/, WebSocket message shapes — all unchanged.

🗺️ Roadmap

v1.1: DOCX + HTML ingestion
v1.2: Session auth + per-user rate limits
v1.3: Cross-encoder rerank between RRF and top-5 (addresses
keyword-query regression observed in v0.3.0 benchmark)
v2.0: Multi-tenant + usage quotas

Commits in this release (since v0.4.0-langchain)

Full git log:

Spec & plan: 7c85b69, f5b7217
WS1 backend opt: c31af8f (embedding cleanup) + ed9e64b (ruff + /health HEAD)
Cache: ef908cd + 2b643f7
Merged prompt: fe50642
Eval (Gemini): ebc07a8 + 859b05b + ef01285
Eval (Groq judge + tuning): 7bfa775 + 75f460d
Benchmarks: 75f460d (semantic) + f262f93 (hybrid)
v0.3.0-hybrid release: 5f96b12
v0.5.0-mcp draft: 07da34f

🙏 Credits

Engineering & Architecture: Jigang Zhou (Harry) — github.com/jigangz
Pair-programming partner: Claude Code (Anthropic)

Built during the 2026-04 sprint as a portfolio project for SWE, ML
engineer, and founding engineer roles. Production-grade decisions were
made under realistic free-tier constraints; every architecture choice
is documented in docs/superpowers/specs/.

Install now:

pip install trustrag-langchain trustrag-mcp trustrag-eval

Try the demo: https://trustrag.vercel.app
