Architecture
CKB (Code Knowledge Backend) is designed as a layered system that abstracts multiple code intelligence backends behind a unified query interface. v6.0 adds an Architectural Memory layer for persistent knowledge.
┌─────────────────────────────────────────────────────────┐
│                       Interfaces                        │
│  ┌─────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   CLI   │  │  HTTP API   │  │     MCP Server      │  │
│  └────┬────┘  └──────┬──────┘  └──────────┬──────────┘  │
└───────┼──────────────┼────────────────────┼─────────────┘
        │              │                    │
        └──────────────┼────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────────┐
│                      Query Engine                       │
│  ┌────────────┐  ┌────────────┐  ┌────────────────────┐ │
│  │   Router   │  │   Merger   │  │     Compressor     │ │
│  └────────────┘  └────────────┘  └────────────────────┘ │
└─────────────────────────┬───────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────┐
│               Architectural Memory (v6.0)               │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌──────┐  │
│  │  Modules  │  │ Ownership │  │ Hotspots  │  │ ADRs │  │
│  │ Registry  │  │ Registry  │  │  Tracker  │  │      │  │
│  └───────────┘  └───────────┘  └───────────┘  └──────┘  │
└─────────────────────────┬───────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────┐
│                      Backend Layer                      │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌───────────┐   │
│  │  SCIP   │  │   LSP   │  │   Git   │  │  (Glean)  │   │
│  └─────────┘  └─────────┘  └─────────┘  └───────────┘   │
└─────────────────────────┬───────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────┐
│                      Storage Layer                      │
│  ┌────────────────┐  ┌────────────────────────────────┐ │
│  │     SQLite     │  │          Cache Tiers           │ │
│  │  (Symbols,     │  │  Query │ View  │ Negative      │ │
│  │   Aliases,     │  │  Cache │ Cache │ Cache         │ │
│  │   Ownership,   │  │                                │ │
│  │   Decisions)   │  │                                │ │
│  └────────────────┘  └────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
CLI:
- Cobra-based command structure
- Human-readable output
- Interactive commands
HTTP API:
- REST endpoints
- JSON responses
- OpenAPI specification
- Middleware (logging, CORS, recovery)
MCP Server:
- Model Context Protocol implementation
- Tool definitions for AI assistants
- Streaming support
The Router sends each query to the appropriate backends based on:
- Query type (definition, references, search)
- Backend availability
- Query policy configuration
The Merger combines results from multiple backends using one of two strategies:
- prefer-first: Use first successful response
- union: Merge all responses, deduplicate
The Compressor optimizes responses for LLM consumption:
- Enforces response budgets
- Truncates with drilldown suggestions
- Deduplicates results
SCIP:
- Reads pre-computed SCIP indexes
- Fastest and most accurate
- Requires index generation
LSP:
- Communicates with language servers
- Real-time analysis
- May require workspace initialization
Git:
- Fallback for basic operations
- File listing, blame, history
- Always available in git repos
v6.0 introduces persistent architectural knowledge that survives across sessions.
Modules Registry:
- Tracks module boundaries from MODULES.toml or inference
- Stores responsibilities, ownership, and tags
- Supports declared (explicit) and inferred (automatic) modules
Ownership Registry:
- Parses CODEOWNERS files (confidence: 1.0)
- Computes git-blame ownership (confidence: 0.79)
- Tracks ownership history over time
- Merges sources with priority: CODEOWNERS > blame > heuristic
Hotspots Tracker:
- Stores historical hotspot snapshots (append-only)
- Computes trends (increasing/stable/decreasing)
- Projects future scores based on velocity
ADRs:
- Parses ADR markdown files
- Indexes decisions for search
- Links decisions to affected modules
Core Tables (v5.x):
- `symbol_mappings` - Stable ID to backend ID mappings
- `symbol_aliases` - Redirect mappings for renamed symbols
- `modules` - Detected modules cache
- `dependency_edges` - Module dependency graph
Architectural Memory Tables (v6.0):
- `ownership` - Ownership rules with source and confidence
- `ownership_history` - Ownership changes over time (append-only)
- `hotspot_snapshots` - Historical hotspot metrics (append-only)
- `responsibilities` - Module/file responsibility descriptions
- `decisions` - ADR metadata (content in markdown files)
- `module_renames` - Tracks module ID changes across renames
Full-Text Search:
- `decisions_fts` - FTS5 index for decision search
- `responsibilities_fts` - FTS5 index for responsibility search
| Tier | TTL | Key Contains | Use Case |
|---|---|---|---|
| Query Cache | 5 min | headCommit | Frequent queries |
| View Cache | 1 hour | repoStateId | Expensive computations |
| Negative Cache | 5-60s | repoStateId | Avoid repeated failures |
~/.ckb/
├── config.toml                 # global config
└── repos/
    └── <repo-hash>/
        ├── ckb.db              # unified SQLite database
        ├── decisions/          # ADR markdown files (canonical)
        │   ├── ADR-001-*.md
        │   └── ...
        └── index.scip          # SCIP index
Data Classification:
| Data Type | Classification | Rebuild Behavior |
|---|---|---|
| Declared modules | Canonical | Preserved |
| Inferred modules | Derived | Regenerated |
| CODEOWNERS rules | Canonical | Reparsed from file |
| Git-blame ownership | Derived | Regenerated |
| Hotspot snapshots | Derived (append-only) | Kept; new appended |
| ADR files | Canonical | Never rebuilt |
| ADR index | Derived | Regenerated from files |
The symbol identity system provides stable symbol IDs that survive refactors.
┌─────────────────────────────────────────┐
│ Symbol Identity │
│ │
│ Stable ID: ckb:repo:sym:<fingerprint> │
│ │
│ Fingerprint = hash( │
│ container + name + kind + signature │
│ ) │
└─────────────────────────────────────────┘
Alias Resolution:
Old ID ──alias──> New ID ──alias──> Current ID
│ │
└── max depth: 3 ─┘
Impact analysis estimates the blast radius of a code change.
┌─────────────────────────────────────────┐
│ Impact Analysis │
│ │
│ 1. Derive Visibility │
│ - SCIP modifiers (0.95 confidence) │
│ - Reference patterns (0.7-0.9) │
│ - Naming conventions (0.5-0.7) │
│ │
│ 2. Classify References │
│ - direct-caller │
│ - transitive-caller │
│ - type-dependency │
│ - test-dependency │
│ │
│ 3. Compute Risk Score │
│ - Visibility (30%) │
│ - Direct callers (35%) │
│ - Module spread (25%) │
│ - Impact kind (10%) │
└─────────────────────────────────────────┘
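Step 3 is a weighted sum, which the following sketch makes concrete. The `RiskFactors` type is hypothetical, and it assumes each factor has already been normalized to the range 0 to 1; how CKB normalizes them is not stated here.

```go
package main

import "fmt"

// RiskFactors holds the four inputs, each assumed to be normalized to [0, 1].
type RiskFactors struct {
	Visibility    float64
	DirectCallers float64
	ModuleSpread  float64
	ImpactKind    float64
}

// RiskScore applies the documented weights: 30%, 35%, 25%, and 10%.
func RiskScore(f RiskFactors) float64 {
	return 0.30*f.Visibility +
		0.35*f.DirectCallers +
		0.25*f.ModuleSpread +
		0.10*f.ImpactKind
}

func main() {
	// A public symbol with many callers concentrated in a few modules.
	score := RiskScore(RiskFactors{
		Visibility:    1.0,
		DirectCallers: 0.8,
		ModuleSpread:  0.2,
		ImpactKind:    0.5,
	})
	fmt.Printf("%.2f\n", score)
}
```

Since the weights sum to 1, the score stays in the same 0 to 1 range as its inputs.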
Deterministic output ensures that identical queries produce byte-identical responses.
Guarantees:
- Stable key ordering (alphabetical)
- Float precision (6 decimals)
- Consistent sorting (multi-field, stable)
- Nil/empty field omission
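These guarantees map fairly directly onto Go's `encoding/json`. The sketch below is one possible approach, not CKB's encoder: the `Ref` type is hypothetical, its fields are declared in alphabetical order because `json.Marshal` emits struct fields in declaration order, and `omitempty` handles empty-field omission.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"sort"
)

// Ref is a hypothetical result item; fields are declared alphabetically.
type Ref struct {
	File  string  `json:"file"`
	Line  int     `json:"line"`
	Note  string  `json:"note,omitempty"` // omitted when empty
	Score float64 `json:"score"`
}

// roundTo6 fixes float precision at six decimal places.
func roundTo6(x float64) float64 { return math.Round(x*1e6) / 1e6 }

// Encode sorts a copy of refs by (file, line) and marshals it, so the output
// bytes do not depend on the order in which backends returned results.
func Encode(refs []Ref) ([]byte, error) {
	out := make([]Ref, len(refs))
	copy(out, refs)
	for i := range out {
		out[i].Score = roundTo6(out[i].Score)
	}
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].File != out[j].File {
			return out[i].File < out[j].File
		}
		return out[i].Line < out[j].Line
	})
	return json.Marshal(out)
}

func main() {
	b, err := Encode([]Ref{
		{File: "b.go", Line: 2, Score: 0.12345678},
		{File: "a.go", Line: 9, Score: 0.5},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

For map-typed payloads, `json.Marshal` already sorts keys, so alphabetical ordering comes for free there.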
The ownership algorithm computes code ownership from git blame with time decay.
┌─────────────────────────────────────────┐
│ Ownership Algorithm │
│ │
│ 1. Run git blame on file │
│ 2. Filter out bots + merge commits │
│ 3. Apply time decay: │
│ weight = 0.5 ^ (age / 90 days) │
│ 4. Normalize weights to 0-1 │
│ 5. Assign scope: │
│ >= 50% → maintainer │
│ >= 20% → reviewer │
│ >= 5% → contributor │
└─────────────────────────────────────────┘
Source Priority:
- CODEOWNERS file (confidence: 1.0)
- Git blame (confidence: 0.79)
- Heuristics (confidence: 0.59)
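One way to apply this priority when several sources disagree is to rank the sources and keep the best-ranked claim. The `OwnershipClaim` type, the source labels, and `PickOwner` are hypothetical names for illustration.

```go
package main

import "fmt"

// OwnershipClaim is a hypothetical record of one source's answer.
type OwnershipClaim struct {
	Owner      string
	Source     string // "codeowners", "blame", or "heuristic"
	Confidence float64
}

// sourceRank encodes CODEOWNERS > blame > heuristic; a lower rank wins.
var sourceRank = map[string]int{"codeowners": 0, "blame": 1, "heuristic": 2}

// PickOwner returns the claim from the highest-priority known source.
// The boolean is false when no claim comes from a known source.
func PickOwner(claims []OwnershipClaim) (OwnershipClaim, bool) {
	var best OwnershipClaim
	found := false
	for _, c := range claims {
		rank, known := sourceRank[c.Source]
		if !known {
			continue
		}
		if !found || rank < sourceRank[best.Source] {
			best, found = c, true
		}
	}
	return best, found
}

func main() {
	owner, ok := PickOwner([]OwnershipClaim{
		{Owner: "bob", Source: "blame", Confidence: 0.79},
		{Owner: "@platform-team", Source: "codeowners", Confidence: 1.0},
	})
	fmt.Println(owner.Owner, ok)
}
```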
Architectural data can become stale:
| Staleness | Condition | Action |
|---|---|---|
| fresh | < 7 days, < 50 commits | Use as-is |
| aging | 7-30 days or 50-200 commits | Use with warning |
| stale | 30-90 days or 200-500 commits | Suggest refresh |
| obsolete | > 90 days or > 500 commits | Require refresh |
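The table can be read as "classify by age, classify by commit count, and take the worse of the two". The sketch below follows that reading. The function names are hypothetical, and because the table's ranges share endpoints, it assumes exactly 30 days (or 200 commits) counts as aging and exactly 90 days (or 500 commits) counts as stale.

```go
package main

import "fmt"

// Staleness levels, ordered from best to worst.
const (
	Fresh = iota
	Aging
	Stale
	Obsolete
)

var levelNames = [...]string{"fresh", "aging", "stale", "obsolete"}

// bucket classifies one dimension against its three thresholds.
func bucket(v, freshBelow, agingMax, staleMax int) int {
	switch {
	case v < freshBelow:
		return Fresh
	case v <= agingMax:
		return Aging
	case v <= staleMax:
		return Stale
	default:
		return Obsolete
	}
}

// Classify returns the worse of the age-based and commit-based levels, so
// data is fresh only when it is both under 7 days old and under 50 commits behind.
func Classify(ageDays, commitsSince int) string {
	byAge := bucket(ageDays, 7, 30, 90)
	byCommits := bucket(commitsSince, 50, 200, 500)
	if byCommits > byAge {
		return levelNames[byCommits]
	}
	return levelNames[byAge]
}

func main() {
	fmt.Println(Classify(3, 10), Classify(3, 60), Classify(40, 10), Classify(100, 0))
}
```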
RepoStateID tracks repository state for cache invalidation.
RepoStateID = hash(
headCommit +
stagedDiffHash +
workingTreeDiffHash +
untrackedListHash
)
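A composite hash like this might be computed as follows. SHA-256 and the NUL separator between components are assumptions; the source only lists the four inputs. The four arguments are taken as already-computed strings.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// RepoStateID hashes the four state components into one identifier.
// A separator byte after each part keeps adjacent parts from blending together.
func RepoStateID(headCommit, stagedDiffHash, workingTreeDiffHash, untrackedListHash string) string {
	h := sha256.New()
	for _, part := range []string{headCommit, stagedDiffHash, workingTreeDiffHash, untrackedListHash} {
		h.Write([]byte(part))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	clean := RepoStateID("abc123", "", "", "")
	dirty := RepoStateID("abc123", "", "9f2c", "")
	fmt.Println(clean != dirty) // an uncommitted edit yields a different state ID
}
```

Because the working-tree and untracked-file hashes are part of the input, caches keyed on this ID are invalidated by uncommitted edits as well as by new commits.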
1. Request arrives (CLI/HTTP/MCP)
│
▼
2. Parse parameters, validate
│
▼
3. Check cache (query/view/negative)
│
┌────┴────┐
│ cached? │
└────┬────┘
│
yes ──┴── no
│ │
▼ ▼
4. Return 5. Route to backends
cached │
┌┴┐
│ │ (parallel or sequential)
└┬┘
│
▼
6. Merge results
│
▼
7. Compress (apply budget)
│
▼
8. Generate drilldowns
│
▼
9. Cache result
│
▼
10. Return response
1. Receive symbol ID
│
▼
2. Check if alias exists
│
┌────┴────┐
│ alias? │
└────┬────┘
│
yes ──┴── no
│ │
▼ │
3. Follow │
chain │
(max 3) │
│ │
└────┬───┘
│
▼
4. Return resolved symbol
(with redirect info if aliased)
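The resolution flow above amounts to following at most three alias hops. In this sketch the alias table is a plain map standing in for the `symbol_aliases` table, and returning an error when the chain is longer than three hops (which also covers cycles) is an assumed policy.

```go
package main

import (
	"errors"
	"fmt"
)

const maxAliasDepth = 3

// ErrAliasChainTooDeep is returned when more than maxAliasDepth hops are needed.
var ErrAliasChainTooDeep = errors.New("alias chain exceeds max depth")

// ResolveSymbol follows aliases from id and reports whether a redirect occurred.
func ResolveSymbol(aliases map[string]string, id string) (resolved string, redirected bool, err error) {
	current := id
	for depth := 0; depth < maxAliasDepth; depth++ {
		next, ok := aliases[current]
		if !ok {
			return current, current != id, nil
		}
		current = next
	}
	// Three hops were followed; if another alias remains, the chain is too long.
	if _, ok := aliases[current]; ok {
		return "", false, ErrAliasChainTooDeep
	}
	return current, current != id, nil
}

func main() {
	aliases := map[string]string{"old": "mid", "mid": "new"}
	resolved, redirected, err := ResolveSymbol(aliases, "old")
	fmt.Println(resolved, redirected, err)
}
```

The depth cap bounds lookup cost and guarantees termination even if the alias table accidentally contains a cycle.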
{
"queryPolicy": {
"backendLadder": ["scip", "lsp", "git"],
"mergeStrategy": "prefer-first"
}
}

{
"budget": {
"maxModules": 10,
"maxSymbolsPerModule": 5,
"maxImpactItems": 20,
"maxDrilldowns": 5,
"estimatedMaxTokens": 4000
}
}

{
"backendLimits": {
"maxRefsPerQuery": 10000,
"maxSymbolsPerSearch": 1000,
"maxFilesScanned": 5000,
"maxUnionModeTimeMs": 60000
}
}

All errors include:
- Error code (machine-readable)
- Message (human-readable)
- Details (context-specific)
- Suggested fixes
- Drilldown queries
Failed queries are cached to avoid repeated failures:
| Error Type | TTL | Triggers Warmup |
|---|---|---|
| symbol-not-found | 60s | No |
| backend-unavailable | 15s | No |
| workspace-not-ready | 10s | Yes |
| timeout | 5s | No |
To add a backend:
- Implement backend interface in `internal/backends/`
- Register in backend factory
- Add to configuration schema
- Update backend ladder options
To add an API endpoint or MCP tool:
- Add handler in `internal/api/handlers.go`
- Register route in `internal/api/routes.go`
- Add MCP tool definition in `internal/mcp/`
- Update OpenAPI spec
To add a storage table or cache:
- Add table in `internal/storage/schema.go`
- Implement cache methods in `internal/storage/cache.go`
- Define invalidation triggers