Hybrid Retrieval

Hybrid Retrieval (v7.4)

CKB v7.4 introduces hybrid retrieval, which combines graph-based ranking with traditional text search to improve search quality: Recall@10 rises from 62.1% to 100% and MRR from 0.546 to 0.914.

Overview

Traditional code search relies on full-text search (FTS), which finds symbols by name but does not understand how code is connected. Hybrid retrieval adds Personalized PageRank (PPR) over the symbol graph to boost results that are structurally related to your query.

Results

Metric	Before	After	Relative change
Recall@10	62.1%	100%	+61%
MRR	0.546	0.914	+67%
Latency	29.4ms	4.5ms	-85%

How It Works

1. Initial Search (FTS)

When you search for a symbol, CKB first uses SQLite FTS5 for fast text matching:

Query: "Engine"
FTS Results: Engine, Engine#logger, Engine#config, EngineMock, ...

2. Graph-Based Re-ranking (PPR)

CKB then builds a symbol graph from SCIP edges and runs Personalized PageRank:

Seeds: Top FTS hits (Engine, Engine#logger, ...)
Graph: Call edges, reference edges, type edges
PPR: Propagate importance through graph
Output: Re-ranked by graph proximity + FTS score

3. Fusion Scoring

Multiple signals are combined with learned weights:

Signal	Weight	Description
FTS score	0.40	Text match quality
PPR score	0.30	Graph proximity
Hotspot	0.15	Recent code churn
Recency	0.10	File modification time
Exact match	0.05	Name equality bonus

Final score = weighted sum of the normalized signals (the weights sum to 1.0).
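The fusion step can be sketched in Go as below. This is a minimal illustration, not CKB's code: the Candidate type and the Fuse and minMax helpers are hypothetical, and since the table does not say how signals are normalized, the sketch assumes min-max scaling to [0, 1] across the candidate set, with the exact-match bonus treated as a 0/1 flag.

```go
package main

import "sort"

// Candidate is a hypothetical record holding one search hit's raw signals.
type Candidate struct {
	Name       string
	FTS        float64 // text-match score, higher = better (FTS5's bm25() is lower = better, so negate it first)
	PPR        float64 // graph proximity from PPR
	Hotspot    float64 // recent code churn
	Recency    float64 // file modification time, e.g. Unix seconds
	ExactMatch bool    // symbol name equals the query
	Score      float64 // fused score, filled in by Fuse
}

// Weights from the fusion table (they sum to 1.0).
const (
	wFTS, wPPR, wHotspot, wRecency, wExact = 0.40, 0.30, 0.15, 0.10, 0.05
)

// minMax rescales vals into [0, 1] across the candidate set. If every value is
// equal the signal cannot distinguish candidates, so it maps to 0 (assumed normalization).
func minMax(vals []float64) []float64 {
	out := make([]float64, len(vals))
	if len(vals) == 0 {
		return out
	}
	lo, hi := vals[0], vals[0]
	for _, v := range vals {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	if hi == lo {
		return out
	}
	for i, v := range vals {
		out[i] = (v - lo) / (hi - lo)
	}
	return out
}

// Fuse normalizes each signal, computes the weighted sum, and returns a copy of
// the candidates sorted by descending fused score.
func Fuse(cands []Candidate) []Candidate {
	n := len(cands)
	fts, ppr, hot, rec := make([]float64, n), make([]float64, n), make([]float64, n), make([]float64, n)
	for i, c := range cands {
		fts[i], ppr[i], hot[i], rec[i] = c.FTS, c.PPR, c.Hotspot, c.Recency
	}
	fts, ppr, hot, rec = minMax(fts), minMax(ppr), minMax(hot), minMax(rec)

	out := append([]Candidate(nil), cands...) // copy so the caller's slice is untouched
	for i := range out {
		exact := 0.0
		if out[i].ExactMatch {
			exact = 1.0
		}
		out[i].Score = wFTS*fts[i] + wPPR*ppr[i] + wHotspot*hot[i] + wRecency*rec[i] + wExact*exact
	}
	sort.SliceStable(out, func(a, b int) bool { return out[a].Score > out[b].Score })
	return out
}
```

Min-max scaling keeps a signal with a large raw range (such as modification timestamps) from swamping the others; a different normalization could change the ordering, so treat this choice as illustrative.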

Eval Suite

CKB includes an evaluation framework to measure retrieval quality.

Running Eval

# Run built-in tests
ckb eval

# Custom fixtures
ckb eval --fixtures=./my-tests.json

# JSON output
ckb eval --format=json

Test Types

Needle tests - Find at least one expected symbol in top-K:

{
  "id": "find-engine",
  "type": "needle",
  "query": "Engine",
  "expectedSymbols": ["Engine", "query.Engine"],
  "topK": 10
}

Ranking tests - Verify that the expected symbol ranks near the top (within topK):

{
  "id": "engine-first",
  "type": "ranking",
  "query": "query engine",
  "expectedSymbols": ["Engine"],
  "topK": 3
}

Expansion tests - Check graph connectivity, i.e. that structurally connected symbols appear within topK:

{
  "id": "engine-connects-backends",
  "type": "expansion",
  "query": "Engine",
  "expectedSymbols": ["Engine", "Orchestrator", "SCIPAdapter"],
  "topK": 20
}
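As a sketch of how these fixtures might be loaded, the Go code below decodes them and checks the needle condition. It assumes a fixture file is a JSON array of test objects with exactly the fields shown above; EvalTest, ParseFixtures, and NeedlePasses are hypothetical names. The ranking and expansion checks are left out because their exact pass criteria are not spelled out here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EvalTest mirrors the fixture fields shown above (hypothetical type).
type EvalTest struct {
	ID              string   `json:"id"`
	Type            string   `json:"type"` // "needle", "ranking", or "expansion"
	Query           string   `json:"query"`
	ExpectedSymbols []string `json:"expectedSymbols"`
	TopK            int      `json:"topK"`
}

// ParseFixtures decodes a fixture file, assumed to be a JSON array of tests,
// and rejects unknown test types or non-positive topK values.
func ParseFixtures(data []byte) ([]EvalTest, error) {
	var tests []EvalTest
	if err := json.Unmarshal(data, &tests); err != nil {
		return nil, fmt.Errorf("parse fixtures: %w", err)
	}
	for _, t := range tests {
		switch t.Type {
		case "needle", "ranking", "expansion":
		default:
			return nil, fmt.Errorf("test %q: unknown type %q", t.ID, t.Type)
		}
		if t.TopK <= 0 {
			return nil, fmt.Errorf("test %q: topK must be positive", t.ID)
		}
	}
	return tests, nil
}

// NeedlePasses reports whether at least one expected symbol appears among the
// first TopK results.
func NeedlePasses(t EvalTest, results []string) bool {
	expected := make(map[string]bool, len(t.ExpectedSymbols))
	for _, s := range t.ExpectedSymbols {
		expected[s] = true
	}
	for i, r := range results {
		if i >= t.TopK {
			break
		}
		if expected[r] {
			return true
		}
	}
	return false
}
```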

Metrics

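Recall@K - Percentage of tests where at least one expected symbol appears in the top K results
MRR - Mean Reciprocal Rank: the average over all tests of 1/rank of the first expected symbol (0 if none is found); higher means expected symbols appear earlier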
Latency - Average query time
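Both ranking metrics reduce to simple arithmetic over the rank of the first expected symbol in each result list. A minimal Go sketch follows; FirstHitRank and RecallAndMRR are hypothetical helpers, and computing MRR over the full result list (rather than only the top K) is an assumption.

```go
package main

// FirstHitRank returns the 1-based rank of the first result that appears in
// expected, or 0 if no expected symbol was returned.
func FirstHitRank(expected, results []string) int {
	want := make(map[string]bool, len(expected))
	for _, s := range expected {
		want[s] = true
	}
	for i, r := range results {
		if want[r] {
			return i + 1
		}
	}
	return 0
}

// RecallAndMRR aggregates per-test first-hit ranks (0 = miss) into
// Recall@K (fraction of tests hit within the top k) and MRR.
func RecallAndMRR(ranks []int, k int) (recall, mrr float64) {
	if len(ranks) == 0 {
		return 0, 0
	}
	hits, sumRR := 0, 0.0
	for _, r := range ranks {
		if r >= 1 && r <= k {
			hits++
		}
		if r >= 1 {
			sumRR += 1 / float64(r) // a miss contributes 0
		}
	}
	n := float64(len(ranks))
	return float64(hits) / n, sumRR / n
}
```

For example, first-hit ranks of 1, 2, a miss, and 5 give Recall@3 = 2/4 = 0.5 and MRR = (1 + 1/2 + 0 + 1/5) / 4 = 0.425.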

PPR Algorithm

Personalized PageRank computes importance scores relative to seed nodes.

Algorithm

Input:
  - seeds: FTS hit symbol IDs
  - graph: SCIP call/reference edges
  - damping: 0.85 (probability of following edge)
  - iterations: 20 (max power iterations)

Process:
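  1. Initialize scores: each of the n seeds gets 1/n, all other nodes get 0
  2. Iterate: score[i] = damping * Σ(edge_weight * score[neighbor])
                        + (1-damping) * teleport[i]
     (neighbors are nodes with an edge into i; teleport[i] is 1/n for seeds, 0 otherwise)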
  3. Stop when converged or max iterations

Output:
  - Ranked nodes with scores
  - Backtracked paths explaining "why"
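The steps above can be sketched as a power iteration in Go. This is an illustration under stated assumptions rather than CKB's implementation: Edge, Scored, and PPR are hypothetical names; each node's outgoing edge weights are normalized to sum to 1 (the standard random-walk form, which keeps the total score at 1); score held by nodes with no outgoing edges is sent back to the seeds; and convergence is measured as the L1 change between iterations against a caller-chosen tolerance. The backtracked "why" paths are not included.

```go
package main

import (
	"math"
	"sort"
)

// Edge is a directed, weighted edge in the symbol graph (hypothetical type).
type Edge struct {
	From, To string
	Weight   float64
}

// Scored pairs a node with its PPR score.
type Scored struct {
	Node  string
	Score float64
}

// PPR runs personalized PageRank by power iteration and returns nodes sorted
// by descending score (ties broken by name).
func PPR(edges []Edge, seeds []string, damping float64, maxIter int, tol float64) []Scored {
	nodes := map[string]bool{}
	outW := map[string]float64{} // total outgoing weight per node
	valid := make([]Edge, 0, len(edges))
	for _, e := range edges {
		nodes[e.From], nodes[e.To] = true, true
		if e.Weight > 0 { // ignore non-positive weights to avoid division by zero
			valid = append(valid, e)
			outW[e.From] += e.Weight
		}
	}
	seedSet := map[string]bool{}
	for _, s := range seeds {
		seedSet[s], nodes[s] = true, true
	}
	if len(seedSet) == 0 {
		return nil
	}
	// Teleport vector: 1/n on each distinct seed, 0 elsewhere.
	teleport := map[string]float64{}
	for s := range seedSet {
		teleport[s] = 1.0 / float64(len(seedSet))
	}
	// Step 1: seeds start at 1/n, all other nodes at 0.
	score := map[string]float64{}
	for n := range nodes {
		score[n] = teleport[n]
	}
	for iter := 0; iter < maxIter; iter++ {
		dangling := 0.0 // mass on nodes with no outgoing edges
		for n, s := range score {
			if outW[n] == 0 {
				dangling += s
			}
		}
		// Step 2: spread score along edges in proportion to normalized weight,
		// then teleport (plus dangling mass) back to the seeds.
		next := map[string]float64{}
		for _, e := range valid {
			next[e.To] += damping * score[e.From] * e.Weight / outW[e.From]
		}
		for n := range nodes {
			next[n] += ((1 - damping) + damping*dangling) * teleport[n]
		}
		// Step 3: stop once the L1 change drops below tol.
		delta := 0.0
		for n := range nodes {
			delta += math.Abs(next[n] - score[n])
		}
		score = next
		if delta < tol {
			break
		}
	}
	out := make([]Scored, 0, len(score))
	for n, s := range score {
		out = append(out, Scored{n, s})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Node < out[j].Node
	})
	return out
}
```

Called as PPR(edges, seeds, 0.85, 20, 1e-6), it uses the defaults listed above. With a chain Engine → Orchestrator → SCIPAdapter and Engine as the only seed, the converged scores fall off along the chain (about 0.39, 0.33, and 0.28), while nodes unreachable from the seeds stay at 0.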

Edge Weights

Edge Type	Weight	Meaning
Call	1.0	Function calls function
Definition	0.9	Reference to definition
Reference	0.8	General reference
Implements	0.7	Type implements interface
Type-of	0.6	Instance of type
Same-module	0.3	Co-located symbols
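The table can be expressed as a lookup used when building the weighted graph that PPR walks. In this Go sketch the EdgeKind constants, TypedEdge, and WeightedAdjacency are hypothetical, and when several edge types link the same pair of symbols it keeps the largest weight; the page does not say whether CKB takes the maximum, sums them, or keeps parallel edges.

```go
package main

// EdgeKind enumerates the edge types from the table (hypothetical names).
type EdgeKind int

const (
	Call EdgeKind = iota
	Definition
	Reference
	Implements
	TypeOf
	SameModule
)

// edgeWeight maps each edge type to its weight from the table.
var edgeWeight = map[EdgeKind]float64{
	Call:       1.0,
	Definition: 0.9,
	Reference:  0.8,
	Implements: 0.7,
	TypeOf:     0.6,
	SameModule: 0.3,
}

// TypedEdge is a directed edge labeled with its kind.
type TypedEdge struct {
	From, To string
	Kind     EdgeKind
}

// WeightedAdjacency collapses typed edges into one weight per (from, to) pair,
// keeping the strongest weight when several kinds connect the same pair.
// Edges with an unknown kind are skipped.
func WeightedAdjacency(edges []TypedEdge) map[string]map[string]float64 {
	adj := map[string]map[string]float64{}
	for _, e := range edges {
		w, ok := edgeWeight[e.Kind]
		if !ok {
			continue
		}
		if adj[e.From] == nil {
			adj[e.From] = map[string]float64{}
		}
		if w > adj[e.From][e.To] {
			adj[e.From][e.To] = w
		}
	}
	return adj
}
```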

Export Organizer

The exportForLLM tool now includes an organizer step that structures output for better LLM comprehension.

Before (v7.3)

## internal/query/
  ! engine.go
    $ Engine
    # SearchSymbols()
    # GetSymbol()
  ! symbols.go
    # rankSearchResults()

After (v7.4)

## Module Map

| Module | Symbols | Files | Key Exports |
|--------|---------|-------|-------------|
| internal/query | 150 | 12 | Engine, SearchSymbols |
| internal/backends | 80 | 8 | Orchestrator, SCIPAdapter |

## Cross-Module Connections

- internal/query → internal/backends
- internal/mcp → internal/query

## Module Details

### internal/query/

**engine.go**
  $ Engine
  # SearchSymbols() [c=12] ★★
  # GetSymbol() [c=5] ★

Benefits

Module Map - Overview of codebase structure at a glance
Cross-Module Bridges - Key integration points highlighted
Importance Ordering - Most important symbols first
Context Efficiency - LLMs understand structure before details
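The Module Map table can be sketched as a grouping pass over exported symbols. The Go code below is illustrative only: Symbol and ModuleMap are hypothetical, modules are ordered by symbol count, and key exports are taken to be the topN symbols by an importance score (for instance a PPR or fused score), which is an assumption about how the organizer picks them. Cross-module connections and the per-file details are not covered.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Symbol is a hypothetical record for one exported symbol.
type Symbol struct {
	Name       string
	Module     string  // e.g. "internal/query"
	File       string  // e.g. "engine.go"
	Importance float64 // any ranking signal, e.g. a PPR or fused score (assumption)
}

// ModuleMap renders a markdown table with one row per module: modules with the
// most symbols come first, and the topN most important symbols are listed as
// key exports.
func ModuleMap(symbols []Symbol, topN int) string {
	type agg struct {
		files map[string]bool
		syms  []Symbol
	}
	mods := map[string]*agg{}
	for _, s := range symbols {
		a := mods[s.Module]
		if a == nil {
			a = &agg{files: map[string]bool{}}
			mods[s.Module] = a
		}
		a.files[s.File] = true
		a.syms = append(a.syms, s)
	}

	names := make([]string, 0, len(mods))
	for m := range mods {
		names = append(names, m)
	}
	sort.Slice(names, func(i, j int) bool {
		ni, nj := len(mods[names[i]].syms), len(mods[names[j]].syms)
		if ni != nj {
			return ni > nj
		}
		return names[i] < names[j]
	})

	var b strings.Builder
	b.WriteString("| Module | Symbols | Files | Key Exports |\n")
	b.WriteString("|--------|---------|-------|-------------|\n")
	for _, m := range names {
		a := mods[m]
		sort.SliceStable(a.syms, func(i, j int) bool { return a.syms[i].Importance > a.syms[j].Importance })
		keys := []string{}
		for i := 0; i < len(a.syms) && i < topN; i++ {
			keys = append(keys, a.syms[i].Name)
		}
		fmt.Fprintf(&b, "| %s | %d | %d | %s |\n", m, len(a.syms), len(a.files), strings.Join(keys, ", "))
	}
	return b.String()
}
```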

Configuration

No configuration required. Hybrid retrieval is automatic when:

SCIP index is available (ckb index was run)
Search returns more than 3 results
Symbol graph has nodes

Disabling PPR

To disable PPR re-ranking (not recommended), set queryPolicy.enablePPR to false in .ckb/config.json:

{
  "queryPolicy": {
    "enablePPR": false
  }
}
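One way to honor this setting while keeping PPR on by default is to decode enablePPR into a pointer, so a missing key can be told apart from an explicit false. This Go sketch is hypothetical (Config and PPREnabled are not CKB's types) and models only the queryPolicy.enablePPR key.

```go
package main

import "encoding/json"

// Config models only the queryPolicy.enablePPR key (hypothetical type).
type Config struct {
	QueryPolicy struct {
		// Pointer so an absent key (nil) differs from an explicit false.
		EnablePPR *bool `json:"enablePPR"`
	} `json:"queryPolicy"`
}

// PPREnabled reports whether PPR re-ranking should run: on by default, off only
// when the config explicitly sets enablePPR to false. On a parse error it
// returns the default (enabled) together with the error.
func PPREnabled(raw []byte) (bool, error) {
	if len(raw) == 0 {
		return true, nil // no config file: defaults apply
	}
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return true, err
	}
	if cfg.QueryPolicy.EnablePPR == nil {
		return true, nil
	}
	return *cfg.QueryPolicy.EnablePPR, nil
}
```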

Research Basis

Hybrid retrieval is based on 2024-2025 research:

Paper	Key Insight
HippoRAG 2 (ICML 2025)	PPR over knowledge graphs improves associative retrieval
CodeRAG (Sep 2025)	Multi-path retrieval + reranking beats single-path
GraphCoder (Jun 2024)	Code context graphs for repo-level retrieval
GraphRAG surveys	Explicit organizer step improves context packing

What's NOT Included

Per CKB's "structured over semantic" principle:

Feature	Why Skipped
Embeddings	Adds complexity, PPR sufficient for code navigation
Learned reranker	Deterministic scoring works well
External vector DB	Violates single-binary principle

Troubleshooting

Low Recall@K

Index freshness - Run ckb index to rebuild
FTS population - Check ckb status for FTS symbol count
Query specificity - More specific queries work better

Slow Queries

Graph size - Very large codebases may need graph pruning
PPR iterations - Default 20 is usually sufficient
Cache - Subsequent queries benefit from caching

Debugging

# Check index status
ckb status

# Run diagnostics
ckb doctor

# Verbose eval output
ckb eval --verbose
