Architecture
Lisa edited this page Dec 16, 2025
CKB (Code Knowledge Backend) is designed as a layered system that abstracts multiple code intelligence backends behind a unified query interface.
┌─────────────────────────────────────────────────────────┐
│ Interfaces │
│ ┌─────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ CLI │ │ HTTP API │ │ MCP Server │ │
│ └────┬────┘ └──────┬──────┘ └──────────┬──────────┘ │
└───────┼──────────────┼────────────────────┼─────────────┘
│ │ │
└──────────────┼────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ Query Engine │
│ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │
│ │ Router │ │ Merger │ │ Compressor │ │
│ └────────────┘ └────────────┘ └────────────────────┘ │
└─────────────────────────┬───────────────────────────────┘
│
┌─────────────────────────┼───────────────────────────────┐
│ Backend Layer │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────┐ │
│ │ SCIP │ │ LSP │ │ Git │ │ (Glean) │ │
│ └─────────┘ └─────────┘ └─────────┘ └───────────┘ │
└─────────────────────────┬───────────────────────────────┘
│
┌─────────────────────────┼───────────────────────────────┐
│ Storage Layer │
│ ┌────────────────┐ ┌────────────────────────────────┐ │
│ │ SQLite │ │ Cache Tiers │ │
│ │ (Symbols, │ │ Query │ View │ Negative │ │
│ │ Aliases) │ │ Cache │ Cache│ Cache │ │
│ └────────────────┘ └────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
CLI:
- Cobra-based command structure
- Human-readable output
- Interactive commands

HTTP API:
- REST endpoints
- JSON responses
- OpenAPI specification
- Middleware (logging, CORS, recovery)

MCP Server:
- Model Context Protocol implementation
- Tool definitions for AI assistants
- Streaming support
The Router routes each query to the appropriate backends based on:
- Query type (definition, references, search)
- Backend availability
- Query policy configuration
The Merger combines results from multiple backends using one of two strategies:
- prefer-first: Use first successful response
- union: Merge all responses, deduplicate
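The two merge strategies above can be sketched as follows. This is a minimal illustration, not CKB's actual code: the `Result` and `Backend` types, the function names, and the choice to skip failed backends in union mode are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a hypothetical, simplified backend response: one string per location.
type Result struct {
	Backend   string
	Locations []string
}

// Backend is a hypothetical query function. It returns an error when the
// backend is unavailable or cannot answer.
type Backend func(query string) (Result, error)

var errIndexMissing = errors.New("scip: index missing")

// preferFirst walks the ladder in order and returns the first successful response.
func preferFirst(ladder []Backend, query string) (Result, error) {
	lastErr := errors.New("no backends configured")
	for _, b := range ladder {
		res, err := b(query)
		if err == nil {
			return res, nil
		}
		lastErr = err
	}
	return Result{}, lastErr
}

// union queries every backend and merges all locations, dropping duplicates
// while keeping first-seen order. Failed backends contribute nothing.
func union(ladder []Backend, query string) Result {
	seen := make(map[string]bool)
	merged := Result{Backend: "union"}
	for _, b := range ladder {
		res, err := b(query)
		if err != nil {
			continue
		}
		for _, loc := range res.Locations {
			if !seen[loc] {
				seen[loc] = true
				merged.Locations = append(merged.Locations, loc)
			}
		}
	}
	return merged
}

func main() {
	scip := func(string) (Result, error) { return Result{}, errIndexMissing }
	lsp := func(string) (Result, error) {
		return Result{Backend: "lsp", Locations: []string{"a.go:10", "b.go:3"}}, nil
	}
	git := func(string) (Result, error) {
		return Result{Backend: "git", Locations: []string{"b.go:3", "c.go:7"}}, nil
	}
	ladder := []Backend{scip, lsp, git}

	first, _ := preferFirst(ladder, "refs Foo")
	fmt.Println(first.Backend, first.Locations)
	fmt.Println(union(ladder, "refs Foo").Locations)
}
```

In this sketch prefer-first stops at the first backend that answers, so it is cheaper but only as complete as that backend, while union pays for every backend in exchange for broader coverage.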
The Compressor optimizes responses for LLM consumption:
- Enforces response budgets
- Truncates with drilldown suggestions
- Deduplicates results
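A rough sketch of the compression step, assuming a single item budget and a free-text drilldown suggestion. The `Compressed` type, the `compress` function, and the wording of the drilldown are hypothetical; the real budgets are listed in the configuration further down the page.

```go
package main

import "fmt"

// Compressed is a hypothetical response envelope.
type Compressed struct {
	Items      []string
	Truncated  bool
	Drilldowns []string
}

// compress deduplicates items, keeps at most maxItems of them, and suggests
// a follow-up query whenever something was cut.
func compress(items []string, maxItems int) Compressed {
	if maxItems < 0 {
		maxItems = 0
	}
	seen := make(map[string]bool)
	var unique []string
	for _, it := range items {
		if !seen[it] {
			seen[it] = true
			unique = append(unique, it)
		}
	}
	out := Compressed{Items: unique}
	if len(unique) > maxItems {
		out.Items = unique[:maxItems]
		out.Truncated = true
		out.Drilldowns = []string{
			fmt.Sprintf("show remaining %d items", len(unique)-maxItems),
		}
	}
	return out
}

func main() {
	res := compress([]string{"a", "b", "a", "c", "d"}, 3)
	fmt.Printf("%+v\n", res)
}
```

Deduplicating before truncating matters here: otherwise duplicates would consume budget that distinct results could have used.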
SCIP:
- Reads pre-computed SCIP indexes
- Fastest and most accurate
- Requires index generation

LSP:
- Communicates with language servers
- Real-time analysis
- May require workspace initialization

Git:
- Fallback for basic operations
- File listing, blame, history
- Always available in Git repositories
Tables:
- `symbol_mappings` - Stable ID to backend ID mappings
- `symbol_aliases` - Redirect mappings for renamed symbols
- `modules` - Detected modules cache
- `dependency_edges` - Module dependency graph
| Tier | TTL | Key Contains | Use Case |
|---|---|---|---|
| Query Cache | 5 min | headCommit | Frequent queries |
| View Cache | 1 hour | repoStateId | Expensive computations |
| Negative Cache | 5-60s | repoStateId | Avoid repeated failures |
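The point of the "Key Contains" column is that invalidation falls out of the key itself: once the commit or repo state changes, old entries can no longer be looked up. A minimal sketch, assuming a colon-separated key layout and using the upper bound (60s) for the negative tier; the `Tier` type and both helper functions are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Tier describes one row of the cache tier table.
type Tier struct {
	Name     string
	TTL      time.Duration // for the negative cache this is the upper bound
	KeyField string        // which piece of repository state is baked into the key
}

var (
	queryTier    = Tier{Name: "query", TTL: 5 * time.Minute, KeyField: "headCommit"}
	viewTier     = Tier{Name: "view", TTL: time.Hour, KeyField: "repoStateId"}
	negativeTier = Tier{Name: "negative", TTL: 60 * time.Second, KeyField: "repoStateId"}
)

// cacheKey embeds the state value in the key, so entries written against an
// older commit or repo state are never hit again.
func cacheKey(t Tier, stateValue, query string) string {
	return fmt.Sprintf("%s:%s:%s", t.Name, stateValue, query)
}

// expired reports whether an entry of the given age is past the tier's TTL.
func expired(t Tier, age time.Duration) bool {
	return age > t.TTL
}

func main() {
	fmt.Println(cacheKey(queryTier, "3f2a9c1", "refs:Foo"))
	fmt.Println(expired(queryTier, 6*time.Minute)) // past the 5 minute TTL
	fmt.Println(expired(viewTier, 6*time.Minute))  // still within 1 hour
	fmt.Println(negativeTier.KeyField)
}
```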
The symbol identity system provides stable symbol identification across refactors.
┌─────────────────────────────────────────┐
│ Symbol Identity │
│ │
│ Stable ID: ckb:repo:sym:<fingerprint> │
│ │
│ Fingerprint = hash( │
│ container + name + kind + signature │
│ ) │
└─────────────────────────────────────────┘
Alias Resolution:
Old ID ──alias──> New ID ──alias──> Current ID
│ │
└── max depth: 3 ─┘
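To make the fingerprint concrete, here is a small sketch. The page only says the fingerprint is a hash of container, name, kind, and signature; the choice of SHA-256, the NUL separator, the 16-character truncation, and the literal reading of the `ckb:repo:sym:` prefix are assumptions for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// idPrefix is taken literally from the diagram above (an assumption).
const idPrefix = "ckb:repo:sym:"

// Symbol holds the four fields the diagram lists as fingerprint inputs.
type Symbol struct {
	Container, Name, Kind, Signature string
}

// stableID hashes the four fields. The NUL separator keeps ("ab", "c") and
// ("a", "bc") from producing the same input string.
func stableID(s Symbol) string {
	joined := strings.Join([]string{s.Container, s.Name, s.Kind, s.Signature}, "\x00")
	sum := sha256.Sum256([]byte(joined))
	return idPrefix + hex.EncodeToString(sum[:8])
}

func main() {
	get := Symbol{
		Container: "pkg/cache",
		Name:      "Get",
		Kind:      "method",
		Signature: "func(key string) ([]byte, bool)",
	}
	fmt.Println(stableID(get))

	// Renaming the symbol changes the fingerprint, which is the case the
	// alias table exists to cover.
	renamed := get
	renamed.Name = "Fetch"
	fmt.Println(stableID(renamed) != stableID(get))
}
```

Because line numbers and file offsets are not among the hash inputs, moving a declaration within its container leaves the ID unchanged in this sketch.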
Impact analysis estimates the blast radius of a code change.
┌─────────────────────────────────────────┐
│ Impact Analysis │
│ │
│ 1. Derive Visibility │
│ - SCIP modifiers (0.95 confidence) │
│ - Reference patterns (0.7-0.9) │
│ - Naming conventions (0.5-0.7) │
│ │
│ 2. Classify References │
│ - direct-caller │
│ - transitive-caller │
│ - type-dependency │
│ - test-dependency │
│ │
│ 3. Compute Risk Score │
│ - Visibility (30%) │
│ - Direct callers (35%) │
│ - Module spread (25%) │
│ - Impact kind (10%) │
└─────────────────────────────────────────┘
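Step 3 is a weighted sum, which can be written out directly. The weights come from the diagram above; the assumption that every factor is already normalized to the range 0 to 1, and the `normalizeCount` helper with its saturation limit, are hypothetical.

```go
package main

import "fmt"

// Factors holds the four risk inputs, each assumed to be normalized to [0, 1].
type Factors struct {
	Visibility, DirectCallers, ModuleSpread, ImpactKind float64
}

// riskScore applies the weights from the diagram: 30/35/25/10 percent.
func riskScore(f Factors) float64 {
	return 0.30*f.Visibility +
		0.35*f.DirectCallers +
		0.25*f.ModuleSpread +
		0.10*f.ImpactKind
}

// normalizeCount maps a raw count onto [0, 1], saturating at limit.
// The saturation limit is an illustrative choice, not a documented value.
func normalizeCount(n, limit int) float64 {
	if limit <= 0 || n <= 0 {
		return 0
	}
	if n >= limit {
		return 1
	}
	return float64(n) / float64(limit)
}

func main() {
	f := Factors{
		Visibility:    1.0,                    // public symbol
		DirectCallers: normalizeCount(12, 20), // 12 callers, saturating at 20
		ModuleSpread:  normalizeCount(3, 10),  // callers spread over 3 modules
		ImpactKind:    0.5,
	}
	fmt.Printf("risk = %.3f\n", riskScore(f))
}
```

Since the weights sum to 100%, the score stays within 0 to 1 as long as each input does.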
Deterministic output ensures that identical queries produce byte-identical responses.
Guarantees:
- Stable key ordering (alphabetical)
- Float precision (6 decimals)
- Consistent sorting (multi-field, stable)
- Nil/empty field omission
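One way to get these guarantees with Go's standard `encoding/json` is sketched below. The `Ref` type and its fields are hypothetical. The sketch relies on two documented behaviors of `encoding/json`: struct fields are emitted in declaration order (so they are declared alphabetically here), and `omitempty` drops empty values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"sort"
)

// Ref declares its fields in alphabetical order of their JSON keys, because
// encoding/json emits struct fields in declaration order.
type Ref struct {
	Confidence float64 `json:"confidence"`
	File       string  `json:"file"`
	Line       int     `json:"line"`
	Note       string  `json:"note,omitempty"` // omitted when empty
}

// round6 limits a float to 6 decimal places.
func round6(x float64) float64 { return math.Round(x*1e6) / 1e6 }

// encode rounds floats, sorts by (file, line) with a stable sort, and
// marshals, so the same set of refs always yields the same bytes.
func encode(refs []Ref) (string, error) {
	out := make([]Ref, len(refs))
	copy(out, refs) // leave the caller's slice untouched
	for i := range out {
		out[i].Confidence = round6(out[i].Confidence)
	}
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].File != out[j].File {
			return out[i].File < out[j].File
		}
		return out[i].Line < out[j].Line
	})
	b, err := json.Marshal(out)
	return string(b), err
}

func main() {
	out, err := encode([]Ref{
		{Confidence: 0.123456789, File: "b.go", Line: 2},
		{Confidence: 0.9, File: "a.go", Line: 5},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

If maps are used instead of structs, `encoding/json` sorts map keys when marshaling, which gives the same alphabetical ordering.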
The repository state ID (RepoStateID) tracks repository state for cache invalidation.
RepoStateID = hash(
headCommit +
stagedDiffHash +
workingTreeDiffHash +
untrackedListHash
)
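The formula above can be sketched in a few lines. The four inputs are the ones named in the formula; SHA-256, the NUL separator between parts, and the `RepoState` type are assumptions for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// RepoState carries the four inputs named in the RepoStateID formula.
type RepoState struct {
	HeadCommit          string
	StagedDiffHash      string
	WorkingTreeDiffHash string
	UntrackedListHash   string
}

// ID hashes the four parts in a fixed order. A NUL byte after each part
// keeps adjacent fields from running together.
func (s RepoState) ID() string {
	h := sha256.New()
	parts := []string{
		s.HeadCommit,
		s.StagedDiffHash,
		s.WorkingTreeDiffHash,
		s.UntrackedListHash,
	}
	for _, part := range parts {
		h.Write([]byte(part))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	clean := RepoState{HeadCommit: "3f2a9c1"}
	dirty := clean
	dirty.WorkingTreeDiffHash = "e4d909c2" // an uncommitted edit

	fmt.Println(clean.ID())
	fmt.Println(clean.ID() != dirty.ID())
}
```

Including the staged, working-tree, and untracked hashes means an uncommitted edit produces a new ID, so caches keyed on `repoStateId` miss even though `headCommit` has not moved.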
Query flow:
1. Request arrives (CLI/HTTP/MCP)
│
▼
2. Parse parameters, validate
│
▼
3. Check cache (query/view/negative)
│
┌────┴────┐
│ cached? │
└────┬────┘
│
yes ──┴── no
│ │
▼ ▼
4. Return 5. Route to backends
cached │
┌┴┐
│ │ (parallel or sequential)
└┬┘
│
▼
6. Merge results
│
▼
7. Compress (apply budget)
│
▼
8. Generate drilldowns
│
▼
9. Cache result
│
▼
10. Return response
Symbol resolution flow:
1. Receive symbol ID
│
▼
2. Check if alias exists
│
┌────┴────┐
│ alias? │
└────┬────┘
│
yes ──┴── no
│ │
▼ │
3. Follow │
chain │
(max 3) │
│ │
└────┬───┘
│
▼
4. Return resolved symbol
(with redirect info if aliased)
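The resolution steps above can be sketched as a bounded loop over an alias map. The `Resolved` type and the map-based alias store are hypothetical (the page stores aliases in the `symbol_aliases` table), and returning an error when the chain is longer than 3 hops is an assumption; the page does not say what happens in that case.

```go
package main

import (
	"errors"
	"fmt"
)

const maxAliasDepth = 3

var errAliasTooDeep = errors.New("alias chain exceeds max depth")

// Resolved is a hypothetical result type carrying redirect info.
type Resolved struct {
	ID             string
	RedirectedFrom string // empty when the requested ID was not an alias
}

// resolve follows at most maxAliasDepth alias hops. If the ID reached after
// that many hops is still an alias (a long chain or a cycle), it gives up.
func resolve(aliases map[string]string, id string) (Resolved, error) {
	current := id
	for depth := 0; depth < maxAliasDepth; depth++ {
		next, ok := aliases[current]
		if !ok {
			break
		}
		current = next
	}
	if _, stillAlias := aliases[current]; stillAlias {
		return Resolved{}, errAliasTooDeep
	}
	res := Resolved{ID: current}
	if current != id {
		res.RedirectedFrom = id
	}
	return res, nil
}

func main() {
	aliases := map[string]string{
		"sym:old": "sym:mid",
		"sym:mid": "sym:new",
	}
	fmt.Println(resolve(aliases, "sym:old"))
	fmt.Println(resolve(aliases, "sym:new"))
}
```

The depth bound also protects against accidental cycles in the alias table, since a cycle never reaches a non-alias ID.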
{
"queryPolicy": {
"backendLadder": ["scip", "lsp", "git"],
"mergeStrategy": "prefer-first"
}
}

{
"budget": {
"maxModules": 10,
"maxSymbolsPerModule": 5,
"maxImpactItems": 20,
"maxDrilldowns": 5,
"estimatedMaxTokens": 4000
}
}

{
"backendLimits": {
"maxRefsPerQuery": 10000,
"maxSymbolsPerSearch": 1000,
"maxFilesScanned": 5000,
"maxUnionModeTimeMs": 60000
}
}

All errors include:
- Error code (machine-readable)
- Message (human-readable)
- Details (context-specific)
- Suggested fixes
- Drilldown queries
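As an illustration, the five parts listed above map naturally onto a Go error type with JSON tags. The type name, field names, JSON keys, and the sample values are all hypothetical; only the error code `symbol-not-found` appears on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// QueryError is a hypothetical structured error with the five listed parts.
type QueryError struct {
	Code           string            `json:"code"`                     // machine-readable
	Message        string            `json:"message"`                  // human-readable
	Details        map[string]string `json:"details,omitempty"`        // context-specific
	SuggestedFixes []string          `json:"suggestedFixes,omitempty"` // what the caller can try
	Drilldowns     []string          `json:"drilldowns,omitempty"`     // follow-up queries
}

// Error implements the error interface so the value can be returned as a normal Go error.
func (e *QueryError) Error() string { return e.Code + ": " + e.Message }

func main() {
	var err error = &QueryError{
		Code:           "symbol-not-found",
		Message:        "no symbol matches the given ID",
		Details:        map[string]string{"symbolId": "ckb:repo:sym:0123456789abcdef"},
		SuggestedFixes: []string{"search by name instead of ID"},
		Drilldowns:     []string{"search symbols named Foo"},
	}
	fmt.Println(err)

	body, marshalErr := json.MarshalIndent(err, "", "  ")
	if marshalErr != nil {
		panic(marshalErr)
	}
	fmt.Println(string(body))
}
```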
Failed queries are cached to avoid repeated failures:
| Error Type | TTL | Triggers Warmup |
|---|---|---|
| symbol-not-found | 60s | No |
| backend-unavailable | 15s | No |
| workspace-not-ready | 10s | Yes |
| timeout | 5s | No |
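The table above translates directly into a lookup from error type to caching policy. The values are copied from the table; the `negativePolicy` type and `policyFor` function are hypothetical names for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// negativePolicy says how long a failure is remembered and whether it
// should kick off a warmup.
type negativePolicy struct {
	TTL           time.Duration
	TriggerWarmup bool
}

// negativePolicies mirrors the table above, row by row.
var negativePolicies = map[string]negativePolicy{
	"symbol-not-found":    {TTL: 60 * time.Second, TriggerWarmup: false},
	"backend-unavailable": {TTL: 15 * time.Second, TriggerWarmup: false},
	"workspace-not-ready": {TTL: 10 * time.Second, TriggerWarmup: true},
	"timeout":             {TTL: 5 * time.Second, TriggerWarmup: false},
}

// policyFor returns the policy for an error type. The second result is false
// for unknown types, which a caller would simply not cache.
func policyFor(errorType string) (negativePolicy, bool) {
	p, ok := negativePolicies[errorType]
	return p, ok
}

func main() {
	for _, code := range []string{"workspace-not-ready", "timeout", "unknown"} {
		p, ok := policyFor(code)
		fmt.Println(code, ok, p.TTL, p.TriggerWarmup)
	}
}
```

The TTLs in the table are shorter for failures that are likely to clear up quickly (a timeout) and longer for ones that usually need a code or index change (a missing symbol).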
To add a backend:
- Implement backend interface in `internal/backends/`
- Register in backend factory
- Add to configuration schema
- Update backend ladder options

To add an API endpoint:
- Add handler in `internal/api/handlers.go`
- Register route in `internal/api/routes.go`
- Add MCP tool definition in `internal/mcp/`
- Update OpenAPI spec

To add a storage table or cache:
- Add table in `internal/storage/schema.go`
- Implement cache methods in `internal/storage/cache.go`
- Define invalidation triggers