Feature Request: Pluggable Retrieval (RAG/Graph‑RAG) Backends With Upstream‑Compatible Default
- Summary: Allow admins to choose and control the RAG backend and LLM provider while keeping the default (builtin) behavior unchanged. Add optional Graph‑RAG and fine‑grained RAG controls, with minimal, generic hooks so upstream remains stable and easy to maintain.
Status of own contribution and PR
- I’ve built a working fork that introduces a pluggable RAG backend interface (keeping `builtin` as the default) and an optional Graph‑RAG adapter, while preserving upstream APIs and behavior by default.
- The changes are intentionally minimal and generic to keep diffs small and upstream‑friendly; when `RAG_BACKEND=builtin`, the backend behaves exactly as today.
- Before opening a formal PR, I’d appreciate maintainer review and community feedback to ensure the approach, naming, and hooks align with project expectations. I’m happy to rework code to fit conventions, split into smaller PRs, and add or adjust tests/docs as requested.
- I have emailed two of the leading repository developers to share context and request guidance, but have not yet received a response. I’ll proceed at the project’s preferred pace and can provide branch/commit references on request.
- The fork is available at https://github.com/ga-it/context_chat_backend
- This is intended as a working proof of concept; I do not intend to maintain the fork long‑term.
Choice of SciPhi R2R as the Pilot
- Open, production‑ready stack: R2R is open source (MIT) with a mature REST API, hybrid search, and optional Graph‑RAG (entities/relationships, graph search, deduplication), aligning with real‑world needs for complex corpora.
- Upstream‑friendly integration: R2R slots in behind a small adapter boundary; CCBE’s endpoints/shapes remain unchanged and `builtin` stays the default (`RAG_BACKEND=builtin`).
- Model neutrality: Via LiteLLM, R2R supports bring‑your‑own model/vendor (OpenAI‑compatible, Ollama, Anthropic, Mistral, etc.) with simple prefix routing, satisfying “own choice and control of LLM backend.”
- Fine‑grained RAG control: Tunables for chunking/overlap/strategy, summarization, embeddings (model, batch), re‑ranking, and concurrency; safe per‑request overrides and per‑collection defaults.
- Operational fit: Postgres + pgvector and S3/MinIO storage align with common ops stacks; optional Hatchet orchestration handles scalable ingestion without being mandatory.
- Permissions alignment: Collections and filters map cleanly to Nextcloud users/groups scoped by CCBE, preserving access control and tenant boundaries.
- Low coupling, easy rollback: R2R remains out‑of‑tree; switching back to upstream behavior is a single env flip to `builtin`.
Pilot Scope and Evaluation
- Compatibility: Verify parity for create/list/retrieve/chunk/search/RAG/delete flows; ensure CCBE response shapes and status codes are identical behind the adapter.
- Performance & scale: Measure ingestion throughput, query latency, and index build times on representative datasets; validate batch settings and concurrency controls.
- Quality: Assess Graph‑RAG uplift on targeted collections (answer quality, citation relevance) versus plain semantic search.
- Security & compliance: Confirm collection‑scoped filtering, credential isolation, object store lifecycle, and export/delete coverage.
- Ops readiness: Exercise logs/metrics, backups, and maintenance tasks (index, vacuum); document runbooks and failure modes.
- Rollback path: Maintain immediate fallback to `builtin` by env change; no data model changes required in CCBE.
Risks and Mitigations
- API drift: Introduce a small provider contract with conformance tests to catch regressions; keep the adapter minimal and generic.
- Resource footprint: Start with conservative defaults; document tunables (batch sizes, concurrency, index methods) for different hardware tiers.
- Vendor lock‑in: R2R is optional; CCBE retains `builtin` as the default. LiteLLM routing preserves multi‑vendor optionality.
Why R2R over Alternatives (for the pilot)
- Combines hybrid search and Graph‑RAG in one coherent API with strong admin controls, reducing glue code versus stitching separate vector DB + graph engines.
- Active ecosystem and clear configuration story for LLM neutrality and per‑deployment tuning.
Note: This pilot selection does not imply endorsement or coupling. The goal is to validate the adapter pattern with a capable backend while keeping upstream defaults intact.
Motivation
- Own LLM control: Many admins need to route via their preferred LLM gateway (e.g., OpenAI-compatible proxies like LiteLLM), select models per org policy, and change vendors without app upgrades.
- Graph‑RAG: Larger deployments increasingly want entity/relationship graphs to improve retrieval quality on complex corpora.
- Fine‑grained RAG controls: Sites want to tune chunking/overlap, deduplication, summarization, embedding model, re‑ranking, and concurrency per deployment, per collection, and sometimes per request (with sensible allowlists).
This proposal asks for a small adapter boundary in Context Chat Backend (CCBE) so retrieval can be handled by the builtin engine or an external provider, without any change to CCBE’s public API or default UX.
Goals
- Keep the default behavior identical to upstream (zero config = builtin).
- Make the change small, generic, and diff‑friendly (adapter boundary + env switch).
- Preserve all current endpoints, shapes, and status codes.
- Allow plug‑in backends (e.g., Graph‑RAG systems) without coupling them into CCBE’s codebase.
- Expose fine‑grained RAG controls through configuration and safe per‑request overrides.
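To make the “safe per‑request overrides” goal concrete, here is a minimal sketch of allowlist enforcement. The field names, defaults, and the `merge_settings` helper are illustrative assumptions, not existing CCBE code:

```python
# Hypothetical sketch: apply only allowlisted per-request overrides on
# top of deployment defaults. Names and values are illustrative.

# Fields an advanced user may override per request.
ALLOWED_OVERRIDES = {"chunk_size", "chunk_overlap", "rerank", "top_k"}

# Deployment-wide defaults (would normally come from configuration).
DEFAULTS = {"chunk_size": 512, "chunk_overlap": 64, "rerank": False, "top_k": 5}

def merge_settings(defaults: dict, requested: dict) -> dict:
    """Return defaults with only allowlisted requested fields applied."""
    merged = dict(defaults)
    for key, value in requested.items():
        if key in ALLOWED_OVERRIDES:
            merged[key] = value
        # Anything not on the allowlist is ignored (or logged).
    return merged

settings = merge_settings(DEFAULTS, {"top_k": 10, "embedding_model": "x"})
print(settings)  # top_k is overridden; embedding_model is dropped
```

The same merge can run twice for multi‑tenant deployments: once for per‑collection overrides, then again for the per‑request ones.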
Non‑Goals
- Don’t remove or break the builtin backend.
- Don’t hardwire any third‑party provider into CCBE.
- Don’t change the user‑facing Context Chat app flows.
Proposed Design
- Backend switch:
  - Add a `RAG_BACKEND` env var with default `builtin`. `RAG_BACKEND=builtin` → current behavior, unchanged; `RAG_BACKEND=<provider>` → CCBE calls a provider via a thin adapter.
- Provider interface (contract outline):
  - `ingest(file|chunks, metadata, userIds, collections, settings) -> task/result`
  - `search(query, filters, settings, userIds) -> hits`
  - `rag(question, filters, settings, userIds) -> answer + citations`
  - `documents.{list, get, deleteById, deleteByFilter}()`
  - Optional: `graphs.{extract, deduplicate, enrich, search}()`, `health()`
- Identity and scope:
  - CCBE forwards access scope (e.g., `userIds`, group memberships) so providers filter by authorized collections.
- LLM control (builtin and pass‑through):
  - The builtin backend can honor `LLM_API_BASE`, `LLM_MODEL_*` envs or similar.
  - External providers can handle model selection internally; CCBE needn’t embed vendor logic.
- Fine‑grained RAG settings:
  - Configurable defaults for chunk size/overlap/strategy, summarization (on/off, model), embedding model + batch, re‑ranking, and concurrency.
  - Safe per‑request overrides (allowlisted fields only) for advanced users.
  - Per‑collection overrides for multi‑tenant deployments.
- Backward compatibility:
  - No endpoint/shape changes.
  - If the adapter fails or isn’t configured, CCBE stays on `builtin`.
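As one possible rendering of the contract outline, the adapter boundary could be expressed as a Python `Protocol`; the method names follow the outline above, while the signatures, types, and the `BuiltinBackend` stub are assumptions for illustration, not existing CCBE code:

```python
# Hypothetical sketch of the provider contract as a structural Protocol.
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class RagBackend(Protocol):
    def ingest(self, chunks: list[str], metadata: dict[str, Any],
               user_ids: list[str], collections: list[str],
               settings: dict[str, Any]) -> dict[str, Any]: ...

    def search(self, query: str, filters: dict[str, Any],
               settings: dict[str, Any],
               user_ids: list[str]) -> list[dict]: ...

    def rag(self, question: str, filters: dict[str, Any],
            settings: dict[str, Any],
            user_ids: list[str]) -> dict[str, Any]: ...

    def health(self) -> bool: ...

class BuiltinBackend:
    """Stand-in for the current in-tree engine (illustrative only)."""
    def ingest(self, chunks, metadata, user_ids, collections, settings):
        return {"task": "done", "ingested": len(chunks)}

    def search(self, query, filters, settings, user_ids):
        return []  # the real engine would query its vector index here

    def rag(self, question, filters, settings, user_ids):
        return {"answer": "", "citations": []}

    def health(self):
        return True

# Structural conformance: no inheritance required of providers.
assert isinstance(BuiltinBackend(), RagBackend)
```

Because the check is structural, an out‑of‑tree provider (e.g., an R2R adapter) only has to supply matching methods; it never imports CCBE internals.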
Why This Approach
- Operational independence: Admins can scale/patch their retrieval/graph stack independently of CCBE releases.
- Vendor neutrality: Sites can pick LLMs/vectordbs/graphs without app changes.
- Minimal surface: A small adapter boundary keeps upstream maintainable while addressing frequent admin requests.
Security/Compliance
- Least privilege: CCBE keeps only the secrets needed to call the backend; provider enforces collection scoping.
- Isolation: Retrieval/LLM infra can run on separate hosts/subnets with their own audit/backup/retention policies.
- No data model breakage: The default/builtin behavior remains as‑is.
Alternatives Considered
- Keep a single, in‑tree backend: simpler short‑term, but constrains scale/vendor choice; frequent requests to change models/providers reappear.
- Full plugin framework inside CCBE: heavier lift and maintenance; an adapter boundary via env is enough to unblock real deployments.
Acceptance Criteria
- Default behavior identical to current upstream when `RAG_BACKEND` is unset or set to `builtin`.
- A provider contract doc (methods, status, auth/identity, filter semantics).
- Basic conformance tests that run CCBE against the builtin and one external provider, asserting identical response shapes/status codes for ingest/search/rag/delete.
- Clear admin docs showing:
  - How to keep `builtin`.
  - How to enable a provider.
  - Which fine‑grained settings are supported and how to override safely.
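The shape‑equality assertion at the heart of the conformance tests could look like the sketch below. The sample responses and the `response_shape` helper are illustrative; real tests would call CCBE’s actual endpoints against both backends:

```python
# Hypothetical conformance-test sketch: two backends must return
# identically *shaped* responses (same keys, same value types), even
# though the answer content naturally differs.

def response_shape(obj):
    """Reduce a response to its structural shape."""
    if isinstance(obj, dict):
        return {k: response_shape(v) for k, v in sorted(obj.items())}
    if isinstance(obj, list):
        return [response_shape(obj[0])] if obj else []
    return type(obj).__name__

# Stand-ins for responses from the builtin and external backends.
builtin_resp = {"answer": "x", "citations": [{"id": "1", "score": 0.9}]}
provider_resp = {"answer": "y", "citations": [{"id": "2", "score": 0.5}]}

assert response_shape(builtin_resp) == response_shape(provider_resp)
```

The same helper would be applied to the ingest/search/delete flows, alongside direct assertions on HTTP status codes.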
Open Questions
- Would maintainers prefer env‑only selection, or also allow a config file entry?
- Where should the provider contract doc live (docs/ vs. wiki)?
- Versioning of the provider interface (semantic changes over time)?
- Any preferences for naming (e.g., `RAG_BACKEND_PROVIDER` vs `RAG_BACKEND`)?
Proposed Minimal Config Examples
- Keep defaults (no change): `RAG_BACKEND=builtin`
- External provider (example): `RAG_BACKEND=r2r`, `R2R_BASE_URL=https://r2r.example.com`, `R2R_API_KEY=...` (sent as `X-API-Key`)
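The fallback behavior these examples rely on (unset, unknown, or incompletely configured providers all keep `builtin`) can be sketched as a small selection helper; the function name and the requirement that `R2R_BASE_URL` be present are illustrative assumptions:

```python
# Hypothetical sketch: env-driven backend selection with a safe
# builtin fallback, so a misconfigured provider never breaks CCBE.

def select_backend(env: dict) -> str:
    backend = env.get("RAG_BACKEND", "builtin")
    if backend == "r2r" and env.get("R2R_BASE_URL"):
        return "r2r"
    # Unset, unknown, or missing required config: stay on builtin.
    return "builtin"

print(select_backend({}))                      # builtin
print(select_backend({"RAG_BACKEND": "r2r"}))  # builtin (no base URL)
print(select_backend({"RAG_BACKEND": "r2r",
                      "R2R_BASE_URL": "https://r2r.example.com"}))  # r2r
```

In CCBE itself the dictionary would be `os.environ`, and the returned name would pick the adapter instance to instantiate.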