hydrag-core

HydRAG — Multi-Headed Retrieval-Augmented Generation with CRAG supervision.

A standalone, domain-agnostic retrieval pipeline that fuses multiple retrieval heads via Reciprocal Rank Fusion (RRF) and uses a Corrective RAG (CRAG) supervisor to judge context sufficiency before triggering fallback strategies.

Features

Multi-headed retrieval — Five pipeline heads: BM25 fast-path → Primary (hybrid / code-aware) → CRAG supervisor → Semantic fallback → Web fallback
Per-head control — Enable or disable individual heads at runtime via config or environment variables
Domain-agnostic by default — prose profile works with any corpus (documentation, legal, medical, etc.)
Code-aware opt-in — code profile detects symbols (CamelCase, snake_case, dotted paths) and routes to code-aware search
CRAG supervisor — LLM-graded context sufficiency with streaming verdict parsing and confidence-gated fast-path skip
CRAG classifier — Optional distilled DistilBERT binary classifier (<15 ms inference) replaces LLM-based CRAG via teacher-student training pipeline
Pluggable adapters — Implement the VectorStoreAdapter protocol to connect any vector store
Multi-provider LLM support — Select built-in ollama, huggingface, or openai_compat via config, or inject any custom LLMProvider
Zero required dependencies — stdlib-only core; optional extras for ChromaDB, Firecrawl, fine-tuning, and dev tools
Fully typed — PEP 561 compatible with py.typed marker; strict mypy passes

Installation

pip install hydrag-core

With optional extras:

pip install hydrag-core[chromadb]     # ChromaDB adapter support
pip install hydrag-core[firecrawl]    # Web fallback via Firecrawl
pip install hydrag-core[tune]         # CRAG classifier fine-tuning (transformers, torch, onnxruntime)
pip install hydrag-core[dev]          # Development tools (pytest, ruff, mypy)

Quick Start

from hydrag import HydRAG, HydRAGConfig

# 1. Implement the VectorStoreAdapter protocol for your store
class MyAdapter:
    def semantic_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def keyword_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def hybrid_search(self, query: str, n_results: int = 5) -> list[str]: ...

# 2. Create engine and search
adapter = MyAdapter()
engine = HydRAG(adapter)
results = engine.search("How do I configure logging?")

for r in results:
    print(f"[{r.head_origin}] score={r.score:.4f}: {r.text[:80]}")

Code Profile

config = HydRAGConfig(profile="code")
engine = HydRAG(adapter, config=config)
# Queries with symbols (e.g. "HttpRequest class") trigger code-aware retrieval
results = engine.search("How does `HttpRequest` handle timeouts?")

Custom LLM Provider

from hydrag import LLMProvider

class MyLLM:
    def generate(self, prompt: str, model: str = "", timeout: int = 30) -> str | None:
        # Your LLM inference here
        ...

engine = HydRAG(adapter, llm=MyLLM())

Built-In LLM Providers

HydRAG can construct the LLM provider from config without custom wiring.

from hydrag import HydRAG, HydRAGConfig

# Default: Ollama
cfg = HydRAGConfig(llm_provider="ollama", ollama_host="http://localhost:11434")
engine = HydRAG(adapter, config=cfg)

# Hugging Face TGI-compatible endpoint
cfg = HydRAGConfig(
    llm_provider="huggingface",
    hf_api_base="http://localhost:8080",
    hf_model_id="meta-llama/Llama-3.1-8B-Instruct",
)
engine = HydRAG(adapter, config=cfg)

# OpenAI-compatible endpoint (vLLM, LM Studio, OpenRouter-compatible gateway, etc.)
cfg = HydRAGConfig(
    llm_provider="openai_compat",
    openai_compat_api_base="http://localhost:8000",
    openai_compat_model="Qwen/Qwen2.5-7B-Instruct",
)
engine = HydRAG(adapter, config=cfg)

Tokens are read from environment by default:

HYDRAG_HF_API_TOKEN for llm_provider="huggingface"
HYDRAG_OPENAI_COMPAT_API_KEY for llm_provider="openai_compat"

Streaming CRAG Provider

For lower-latency CRAG verdict parsing, implement StreamingLLMProvider:

from hydrag import StreamingLLMProvider

class MyStreamingLLM:
    def generate(self, prompt: str, model: str = "", timeout: int = 30) -> str | None: ...
    def generate_stream(self, prompt: str, model: str = "", timeout: int = 30) -> str | None:
        # Return full response; first token is parsed for early verdict
        ...

engine = HydRAG(adapter, llm=MyStreamingLLM())

Per-Head Control

# Run only BM25 fast-path + primary retrieval (skip CRAG, semantic, web)
config = HydRAGConfig(
    enable_head_0=True,
    enable_head_1=True,
    enable_head_2_crag=False,
    enable_head_3a_semantic=False,
    enable_head_3b_web=False,
)
engine = HydRAG(adapter, config=config)
results = engine.search("quick keyword lookup")

CRAG Classifier (Optional)

from hydrag import tune, CRAGClassifier

# Fine-tune a fast classifier from your corpus
tune(
    adapter=adapter,
    llm=llm,
    output_dir="models/crag_classifier",
    n_samples=500,
)

# Use it — auto-detected when crag_mode="auto" and path is set
config = HydRAGConfig(
    crag_mode="auto",
    crag_classifier_path="models/crag_classifier",
)
engine = HydRAG(adapter, config=config)

Configuration

All settings can be set via environment variables (HYDRAG_ prefix) or passed directly:

Setting	Env Var	Default	Description
`profile`	`HYDRAG_PROFILE`	`"prose"`	`"prose"` or `"code"`
`embedding_model`	`HYDRAG_EMBEDDING_MODEL`	`"Alibaba-NLP/gte-Qwen2-7B-instruct"`	Embedding model name
`crag_model`	`CRAG_MODEL`	`"qwen3:4b"`	Model for CRAG supervisor
`crag_timeout`	`CRAG_TIMEOUT`	`30`	Timeout (seconds) for CRAG calls
`ollama_host`	`OLLAMA_HOST`	`"http://localhost:11434"`	Ollama API endpoint
`llm_provider`	`HYDRAG_LLM_PROVIDER`	`"ollama"`	LLM backend selector: `ollama`, `huggingface`, `openai_compat`
`hf_model_id`	`HYDRAG_HF_MODEL_ID`	`""`	Optional model id for Hugging Face provider
`hf_api_base`	`HYDRAG_HF_API_BASE`	`""`	Hugging Face/TGI base URL (required when `llm_provider="huggingface"`)
`hf_timeout`	`HYDRAG_HF_TIMEOUT`	`30`	Timeout (seconds) for Hugging Face provider calls
`openai_compat_api_base`	`HYDRAG_OPENAI_COMPAT_API_BASE`	`""`	OpenAI-compatible API base URL (required when `llm_provider="openai_compat"`)
`openai_compat_model`	`HYDRAG_OPENAI_COMPAT_MODEL`	`""`	Model name for OpenAI-compatible provider (required when `llm_provider="openai_compat"`)
`openai_compat_timeout`	`HYDRAG_OPENAI_COMPAT_TIMEOUT`	`30`	Timeout (seconds) for OpenAI-compatible provider calls
`openai_compat_endpoint`	`HYDRAG_OPENAI_COMPAT_ENDPOINT`	`"/v1/chat/completions"`	Override OpenAI-compatible chat completions endpoint path
`enable_web_fallback`	`HYDRAG_ENABLE_WEB_FALLBACK`	`false`	Enable Firecrawl web fallback
`allow_web_on_empty_primary`	`HYDRAG_ALLOW_WEB_ON_EMPTY`	`false`	Allow web fallback when Head 1 returns empty
`allow_markdown_in_web_fallback`	`HYDRAG_ALLOW_MARKDOWN_WEB`	`false`	Preserve markdown in web fallback content
`rrf_k`	`HYDRAG_RRF_K`	`60`	RRF smoothing constant
`min_candidate_pool`	`HYDRAG_MIN_CANDIDATE_POOL`	`8`	Minimum candidates per head
`web_chunk_limit`	`HYDRAG_WEB_CHUNK_LIMIT`	`3000`	Max chars per web chunk after sanitization
`crag_min_relevance`	`HYDRAG_CRAG_MIN_RELEVANCE`	`0.12`	Minimum relevance threshold for CRAG
`crag_context_chunks`	`HYDRAG_CRAG_CONTEXT_CHUNKS`	`5`	Chunks sent to CRAG supervisor
`crag_char_limit`	`HYDRAG_CRAG_CHAR_LIMIT`	`1500`	Per-chunk character limit in CRAG prompt
`enable_fast_path`	`HYDRAG_ENABLE_FAST_PATH`	`true`	Head 0 BM25 fast-path
`fast_path_bm25_threshold`	`HYDRAG_FAST_PATH_BM25_THRESHOLD`	`0.6`	BM25 score threshold for fast-path
`fast_path_confidence_threshold`	`HYDRAG_FAST_PATH_CONFIDENCE_THRESHOLD`	`0.7`	Score threshold to skip CRAG entirely
`crag_stream`	`HYDRAG_CRAG_STREAM`	`true`	Parse first token for early CRAG verdict
`crag_mode`	`HYDRAG_CRAG_MODE`	`"auto"`	`"auto"`, `"llm"`, or `"classifier"`
`crag_classifier_path`	`HYDRAG_CRAG_CLASSIFIER_PATH`	`""`	Path to ONNX classifier model
`enable_head_0`	`HYDRAG_ENABLE_HEAD_0`	`true`	BM25 fast-path head
`enable_head_1`	`HYDRAG_ENABLE_HEAD_1`	`true`	Primary retrieval head
`enable_head_2_crag`	`HYDRAG_ENABLE_HEAD_2_CRAG`	`true`	CRAG supervisor head
`enable_head_3a_semantic`	`HYDRAG_ENABLE_HEAD_3A_SEMANTIC`	`true`	Semantic fallback head
`enable_head_3b_web`	`HYDRAG_ENABLE_HEAD_3B_WEB`	`false`	Web fallback head
`fallback_timeout_s`	`HYDRAG_FALLBACK_TIMEOUT_S`	`30.0`	Timeout for fallback head futures
`rrf_head_weights`	`HYDRAG_RRF_HEAD_WEIGHTS`	(JSON)	Per-head RRF weight map

# From environment
config = HydRAGConfig.from_env()

# Direct
config = HydRAGConfig(profile="code", rrf_k=100, enable_web_fallback=True)

Adapter Protocol

Required methods (must implement all three):

class VectorStoreAdapter(Protocol):
    def semantic_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def keyword_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def hybrid_search(self, query: str, n_results: int = 5) -> list[str]: ...

Optional methods (gracefully handled if missing):

    def crag_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def graph_search(self, query: str, n_results: int = 5) -> list[str]: ...
    def rewrite_query(self, query: str) -> str: ...

Pipeline Architecture

Query → [Profile Router]
           │
           ├─ Head 0: BM25 Fast-Path
           │    └─ score ≥ threshold? → early return
           │    └─ score ≥ confidence threshold? → skip CRAG
           │
           ├─ Head 1: Primary Retrieval
           │    ├─ prose → hybrid_search()
           │    └─ code + symbols → code-aware (semantic + keyword RRF)
           │
           ├─ Head 2: CRAG Supervisor
           │    ├─ classifier (ONNX, <15ms) ← crag_mode="auto"|"classifier"
           │    └─ LLM (streaming verdict)  ← crag_mode="auto"|"llm"
           │         ├─ SUFFICIENT → return Head 1 results
           │         └─ INSUFFICIENT ↓
           │
           ├─ Head 3a: Semantic Fallback
           │    └─ rewrite + CRAG search + keyword + graph → RRF
           │
           └─ Head 3b: Web Fallback (optional)
                └─ Firecrawl search → sanitize → RRF

           Final: RRF Fusion across all active heads → list[RetrievalResult]

Development

# Clone and install in development mode
git clone https://github.com/studio-playbook/hydrag-core.git
cd hydrag-core
pip install -e ".[dev]"

# Run tests
python -m pytest tests/

# Lint
ruff check src/ tests/

# Type check
mypy src/

Versioning

HydRAG uses SemVer. Version sources that must stay in sync:

pyproject.toml ([project].version)
src/hydrag/_version.py (__version__)

Type Checking

This package ships a PEP 561 py.typed marker. Downstream projects using mypy --strict will get full type coverage out of the box.

License

Apache-2.0 — see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github/workflows		.github/workflows
docs		docs
src/hydrag		src/hydrag
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

hydrag-core

Features

Installation

Quick Start

Code Profile

Custom LLM Provider

Built-In LLM Providers

Streaming CRAG Provider

Per-Head Control

CRAG Classifier (Optional)

Configuration

Adapter Protocol

Pipeline Architecture

Development

Versioning

Type Checking

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Languages

Folders and files

Latest commit

History

Repository files navigation

hydrag-core

Features

Installation

Quick Start

Code Profile

Custom LLM Provider

Built-In LLM Providers

Streaming CRAG Provider

Per-Head Control

CRAG Classifier (Optional)

Configuration

Adapter Protocol

Pipeline Architecture

Development

Versioning

Type Checking

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 0

Languages

Packages

Contributors