Architecture
CKB (Code Knowledge Backend) is designed as a layered system that abstracts multiple code intelligence backends behind a unified query interface. The architecture has evolved through several versions:
- v6.0 — Architectural Memory (persistent knowledge)
- v6.1 — Background Jobs & CI/CD Integration
- v6.2 — Federation (cross-repo queries)
- v6.2.1 — Daemon Mode (always-on service)
- v6.2.2 — Tree-sitter Complexity
- v6.3 — Contract-Aware Impact Analysis
- v6.4 — Runtime Telemetry
┌─────────────────────────────────────────────────────────┐
│ Interfaces │
│ ┌─────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ CLI │ │ HTTP API │ │ MCP Server │ │
│ └────┬────┘ └──────┬──────┘ └──────────┬──────────┘ │
└───────┼──────────────┼────────────────────┼─────────────┘
│ │ │
└──────────────┼────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ Query Engine │
│ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │
│ │ Router │ │ Merger │ │ Compressor │ │
│ └────────────┘ └────────────┘ └────────────────────┘ │
└─────────────────────────┬───────────────────────────────┘
│
┌─────────────────────────┼───────────────────────────────┐
│ Architectural Memory (v6.0) │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────┐ │
│ │ Modules │ │ Ownership │ │ Hotspots │ │ ADRs │ │
│ │ Registry │ │ Registry │ │ Tracker │ │ │ │
│ └───────────┘ └───────────┘ └───────────┘ └──────┘ │
└─────────────────────────┬───────────────────────────────┘
│
┌─────────────────────────┼───────────────────────────────┐
│ Cross-Cutting Concerns (v6.1-v6.4) │
│ ┌─────────┐ ┌──────────┐ ┌───────────┐ ┌────────────┐ │
│ │ Jobs │ │Federation│ │ Complexity│ │ Telemetry │ │
│ │ (v6.1) │ │ (v6.2) │ │ (v6.2.2) │ │ (v6.4) │ │
│ └─────────┘ └──────────┘ └───────────┘ └────────────┘ │
│ ┌─────────┐ ┌──────────┐ ┌───────────────────────────┐ │
│ │ Daemon │ │Contracts │ │ (v6.2.1 Services) │ │
│ │ (v6.2.1)│ │ (v6.3) │ │ Scheduler│Watcher│Webhooks│ │
│ └─────────┘ └──────────┘ └───────────────────────────┘ │
└─────────────────────────┬───────────────────────────────┘
│
┌─────────────────────────┼───────────────────────────────┐
│ Backend Layer │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────┐ │
│ │ SCIP │ │ LSP │ │ Git │ │Tree-sitter│ │
│ └─────────┘ └─────────┘ └─────────┘ └───────────┘ │
└─────────────────────────┬───────────────────────────────┘
│
┌─────────────────────────┼───────────────────────────────┐
│ Storage Layer │
│ ┌────────────────┐ ┌────────────────────────────────┐ │
│ │ SQLite │ │ Cache Tiers │ │
│ │ (Symbols, │ │ Query │ View │ Negative │ │
│ │ Ownership, │ │ Cache │ Cache│ Cache │ │
│ │ Decisions, │ │ │ │
│ │ Telemetry) │ │ │ │
│ └────────────────┘ └────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
CLI:
- Cobra-based command structure
- Human-readable output
- Interactive commands

HTTP API:
- REST endpoints
- JSON responses
- OpenAPI specification
- Middleware (logging, CORS, recovery)

MCP Server:
- Model Context Protocol implementation
- Tool definitions for AI assistants
- Streaming support
The Router routes queries to appropriate backends based on:
- Query type (definition, references, search)
- Backend availability
- Query policy configuration
The Merger combines results from multiple backends using one of two strategies:
- prefer-first: Use first successful response
- union: Merge all responses, deduplicate
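
The two merge strategies above can be sketched as follows. This is a minimal illustration, not CKB's actual merger: the `Result` and `BackendResponse` types, the `errUnavailable` value, and the rule that "successful" means "returned no error" are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a hypothetical, simplified backend result: one symbol location.
type Result struct {
	File string
	Line int
}

// BackendResponse pairs a backend name with its results; Err is non-nil on failure.
type BackendResponse struct {
	Backend string
	Results []Result
	Err     error
}

// errUnavailable stands in for any backend failure in this sketch.
var errUnavailable = errors.New("backend unavailable")

// preferFirst returns the results of the first backend that did not fail.
func preferFirst(responses []BackendResponse) []Result {
	for _, r := range responses {
		if r.Err == nil {
			return r.Results
		}
	}
	return nil
}

// union merges all successful responses and drops duplicates,
// preserving first-seen order.
func union(responses []BackendResponse) []Result {
	seen := make(map[Result]bool)
	var merged []Result
	for _, r := range responses {
		if r.Err != nil {
			continue
		}
		for _, res := range r.Results {
			if !seen[res] {
				seen[res] = true
				merged = append(merged, res)
			}
		}
	}
	return merged
}

func main() {
	responses := []BackendResponse{
		{Backend: "scip", Err: errUnavailable},
		{Backend: "lsp", Results: []Result{{"a.go", 10}, {"b.go", 3}}},
		{Backend: "git", Results: []Result{{"b.go", 3}, {"c.go", 7}}},
	}
	fmt.Println(len(preferFirst(responses)), len(union(responses))) // 2 3
}
```

With prefer-first, the order of `responses` plays the role of the backend ladder: the earliest backend that answers wins.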
The Compressor optimizes responses for LLM consumption:
- Enforces response budgets
- Truncates with drilldown suggestions
- Deduplicates results
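
A rough sketch of the three compressor steps above, assuming a flat list of string items and a single item budget. The `Compressed` type, the `compress` function, and the wording of the drilldown suggestion are hypothetical.

```go
package main

import "fmt"

// Compressed is a hypothetical response envelope.
type Compressed struct {
	Items      []string
	Truncated  bool
	Drilldowns []string // suggested follow-up queries
}

// compress deduplicates items, enforces the maxItems budget, and adds a
// drilldown suggestion when results were cut. maxItems must be >= 0.
func compress(items []string, maxItems int) Compressed {
	seen := make(map[string]bool, len(items))
	unique := make([]string, 0, len(items))
	for _, it := range items {
		if !seen[it] {
			seen[it] = true
			unique = append(unique, it)
		}
	}
	out := Compressed{Items: unique}
	if len(unique) > maxItems {
		out.Items = unique[:maxItems]
		out.Truncated = true
		out.Drilldowns = []string{
			fmt.Sprintf("show remaining %d results", len(unique)-maxItems),
		}
	}
	return out
}

func main() {
	c := compress([]string{"a", "b", "a", "c", "d"}, 2)
	fmt.Println(c.Items, c.Truncated, c.Drilldowns)
}
```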
SCIP:
- Reads pre-computed SCIP indexes
- Fastest and most accurate
- Requires index generation

LSP:
- Communicates with language servers
- Real-time analysis
- May require workspace initialization

Git:
- Fallback for basic operations
- File listing, blame, history
- Always available in git repos
v6.0 introduces persistent architectural knowledge that survives across sessions.
Modules Registry:
- Tracks module boundaries from MODULES.toml or inference
- Stores responsibilities, ownership, and tags
- Supports declared (explicit) and inferred (automatic) modules

Ownership Registry:
- Parses CODEOWNERS files (confidence: 1.0)
- Computes git-blame ownership (confidence: 0.79)
- Tracks ownership history over time
- Merges sources with priority: CODEOWNERS > blame > heuristic

Hotspots Tracker:
- Stores historical hotspot snapshots (append-only)
- Computes trends (increasing/stable/decreasing)
- Projects future scores based on velocity

ADRs:
- Parses ADR markdown files
- Indexes decisions for search
- Links decisions to affected modules
Core Tables (v5.x):
- `symbol_mappings`: Stable ID to backend ID mappings
- `symbol_aliases`: Redirect mappings for renamed symbols
- `modules`: Detected modules cache
- `dependency_edges`: Module dependency graph

Architectural Memory Tables (v6.0):
- `ownership`: Ownership rules with source and confidence
- `ownership_history`: Ownership changes over time (append-only)
- `hotspot_snapshots`: Historical hotspot metrics (append-only)
- `responsibilities`: Module/file responsibility descriptions
- `decisions`: ADR metadata (content in markdown files)
- `module_renames`: Tracks module ID changes across renames

Full-Text Search:
- `decisions_fts`: FTS5 index for decision search
- `responsibilities_fts`: FTS5 index for responsibility search

| Tier | TTL | Key Contains | Use Case |
|---|---|---|---|
| Query Cache | 5 min | headCommit | Frequent queries |
| View Cache | 1 hour | repoStateId | Expensive computations |
| Negative Cache | 5-60s | repoStateId | Avoid repeated failures |
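
To illustrate why each tier's key contains a repository state value, here is a small in-memory sketch. `TierCache`, `cacheKey`, and `fakeClock` are hypothetical names; only the 5-minute TTL comes from the table above.

```go
package main

import (
	"fmt"
	"time"
)

// QueryCacheTTL mirrors the 5-minute Query Cache row in the table above.
const QueryCacheTTL = 5 * time.Minute

type entry struct {
	value   string
	expires time.Time
}

// fakeClock is a controllable clock so expiry can be shown without sleeping.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time { return f.t }

// TierCache is a hypothetical in-memory cache tier with a fixed TTL.
type TierCache struct {
	ttl     time.Duration
	now     func() time.Time
	entries map[string]entry
}

func NewTierCache(ttl time.Duration, now func() time.Time) *TierCache {
	return &TierCache{ttl: ttl, now: now, entries: make(map[string]entry)}
}

// cacheKey embeds the repository state (headCommit or repoStateId), so entries
// written under an older state are simply never looked up again.
func cacheKey(state, query string) string { return state + "|" + query }

func (c *TierCache) Put(state, query, value string) {
	c.entries[cacheKey(state, query)] = entry{value: value, expires: c.now().Add(c.ttl)}
}

func (c *TierCache) Get(state, query string) (string, bool) {
	e, ok := c.entries[cacheKey(state, query)]
	if !ok || !c.now().Before(e.expires) {
		return "", false
	}
	return e.value, true
}

func main() {
	clock := &fakeClock{t: time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)}
	cache := NewTierCache(QueryCacheTTL, clock.Now)
	cache.Put("abc123", "refs:Foo", "12 references")

	_, hit := cache.Get("abc123", "refs:Foo")
	_, hitNewCommit := cache.Get("def456", "refs:Foo")
	clock.t = clock.t.Add(QueryCacheTTL + time.Second)
	_, hitExpired := cache.Get("abc123", "refs:Foo")
	fmt.Println(hit, hitNewCommit, hitExpired) // true false false
}
```

Because the state is part of the key, a new commit invalidates old entries implicitly; the TTL only bounds how long an entry for an unchanged state is trusted.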
~/.ckb/
├── config.toml # global config
└── repos/
└── <repo-hash>/
├── ckb.db # unified SQLite database
├── decisions/ # ADR markdown files (canonical)
│ ├── ADR-001-*.md
│ └── ...
└── index.scip # SCIP index
Data Classification:
| Data Type | Classification | Rebuild Behavior |
|---|---|---|
| Declared modules | Canonical | Preserved |
| Inferred modules | Derived | Regenerated |
| CODEOWNERS rules | Canonical | Reparsed from file |
| Git-blame ownership | Derived | Regenerated |
| Hotspot snapshots | Derived (append-only) | Kept; new appended |
| ADR files | Canonical | Never rebuilt |
| ADR index | Derived | Regenerated from files |
Stable symbol IDs identify the same symbol across refactors.
┌─────────────────────────────────────────┐
│ Symbol Identity │
│ │
│ Stable ID: ckb:repo:sym:<fingerprint> │
│ │
│ Fingerprint = hash( │
│ container + name + kind + signature │
│ ) │
└─────────────────────────────────────────┘
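
A minimal sketch of the fingerprint scheme in the box above. The source does not name the hash function or how the fields are joined, so SHA-256, the NUL separator, the 16-character truncation, and reading `repo` as a repository identifier are all assumptions of this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Symbol holds the four fields the fingerprint is derived from.
type Symbol struct {
	Container string // e.g. package or enclosing type
	Name      string
	Kind      string // e.g. "function", "method"
	Signature string
}

// stableID builds "ckb:<repo>:sym:<fingerprint>". The hash choice, separator,
// and truncation length are assumptions, not CKB's documented behavior.
func stableID(repo string, s Symbol) string {
	joined := strings.Join([]string{s.Container, s.Name, s.Kind, s.Signature}, "\x00")
	sum := sha256.Sum256([]byte(joined))
	return fmt.Sprintf("ckb:%s:sym:%s", repo, hex.EncodeToString(sum[:])[:16])
}

func main() {
	sym := Symbol{Container: "internal/query", Name: "Route", Kind: "function", Signature: "func(Query) Backend"}
	fmt.Println(stableID("myrepo", sym))
}
```

Since file path and line number are not inputs, moving a symbol within its container leaves the ID unchanged; renaming it or changing its signature produces a new ID, which is where aliases come in.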
Alias Resolution:
Old ID ──alias──> New ID ──alias──> Current ID
│ │
└── max depth: 3 ─┘
Impact analysis estimates the blast radius of code changes.
┌─────────────────────────────────────────┐
│ Impact Analysis │
│ │
│ 1. Derive Visibility │
│ - SCIP modifiers (0.95 confidence) │
│ - Reference patterns (0.7-0.9) │
│ - Naming conventions (0.5-0.7) │
│ │
│ 2. Classify References │
│ - direct-caller │
│ - transitive-caller │
│ - type-dependency │
│ - test-dependency │
│ │
│ 3. Compute Risk Score │
│ - Visibility (30%) │
│ - Direct callers (35%) │
│ - Module spread (25%) │
│ - Impact kind (10%) │
└─────────────────────────────────────────┘
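
The risk score in step 3 reads as a weighted sum. The sketch below assumes each factor has already been normalized to the range 0 to 1; how CKB derives those normalized values is not stated here, and the `Factors` type is hypothetical.

```go
package main

import "fmt"

// Factors holds the four inputs, each assumed to be normalized to [0, 1].
type Factors struct {
	Visibility    float64
	DirectCallers float64
	ModuleSpread  float64
	ImpactKind    float64
}

// riskScore applies the weights from the diagram: 30%, 35%, 25%, 10%.
func riskScore(f Factors) float64 {
	return 0.30*f.Visibility +
		0.35*f.DirectCallers +
		0.25*f.ModuleSpread +
		0.10*f.ImpactKind
}

func main() {
	// A public symbol with a moderate number of callers in a few modules.
	f := Factors{Visibility: 1.0, DirectCallers: 0.4, ModuleSpread: 0.2, ImpactKind: 0.5}
	fmt.Printf("%.2f\n", riskScore(f))
}
```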
Deterministic output ensures that identical queries produce identical bytes.
Guarantees:
- Stable key ordering (alphabetical)
- Float precision (6 decimals)
- Consistent sorting (multi-field, stable)
- Nil/empty field omission
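
One way to get these guarantees in Go is sketched below. The `Ref` type and its fields are hypothetical. The sketch relies on `encoding/json` emitting struct fields in declaration order (so the fields are declared alphabetically) and on `omitempty` to drop empty values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"sort"
)

// Ref is a hypothetical result item. Fields are declared alphabetically
// because encoding/json emits struct fields in declaration order.
type Ref struct {
	File  string  `json:"file"`
	Line  int     `json:"line"`
	Note  string  `json:"note,omitempty"` // omitted when empty
	Score float64 `json:"score"`
}

// round6 limits a float to 6 decimal places.
func round6(x float64) float64 { return math.Round(x*1e6) / 1e6 }

// encode returns the same bytes for the same set of refs, whatever their order.
func encode(refs []Ref) (string, error) {
	out := make([]Ref, len(refs))
	copy(out, refs)
	for i := range out {
		out[i].Score = round6(out[i].Score)
	}
	// Multi-field stable sort: by file, then by line.
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].File != out[j].File {
			return out[i].File < out[j].File
		}
		return out[i].Line < out[j].Line
	})
	b, err := json.Marshal(out)
	return string(b), err
}

func main() {
	a, _ := encode([]Ref{{File: "b.go", Line: 2, Score: 0.1234567}, {File: "a.go", Line: 9, Score: 0.5}})
	b, _ := encode([]Ref{{File: "a.go", Line: 9, Score: 0.5}, {File: "b.go", Line: 2, Score: 0.1234567}})
	fmt.Println(a == b)
	fmt.Println(a)
}
```

For map-typed payloads, `encoding/json` already sorts keys, so the alphabetical-ordering guarantee holds there without extra work.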
Ownership is computed from git blame with time decay.
┌─────────────────────────────────────────┐
│ Ownership Algorithm │
│ │
│ 1. Run git blame on file │
│ 2. Filter out bots + merge commits │
│ 3. Apply time decay: │
│ weight = 0.5 ^ (age / 90 days) │
│ 4. Normalize weights to 0-1 │
│ 5. Assign scope: │
│ >= 50% → maintainer │
│ >= 20% → reviewer │
│ >= 5% → contributor │
└─────────────────────────────────────────┘
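
Steps 3 to 5 of the algorithm above can be sketched as follows. The input is a hypothetical pre-filtered list of blame lines (bots and merge commits already removed); "normalize to 0-1" is read here as per-author shares that sum to 1, which is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// BlameLine is a hypothetical, pre-filtered blame record.
type BlameLine struct {
	Author  string
	AgeDays float64
}

const halfLifeDays = 90.0

// ownershipShares applies weight = 0.5^(age/90 days) per line and normalizes
// the per-author totals so they sum to 1.
func ownershipShares(lines []BlameLine) map[string]float64 {
	weights := make(map[string]float64)
	total := 0.0
	for _, l := range lines {
		w := math.Pow(0.5, l.AgeDays/halfLifeDays)
		weights[l.Author] += w
		total += w
	}
	if total == 0 {
		return weights
	}
	for author := range weights {
		weights[author] /= total
	}
	return weights
}

// scope maps a normalized share to the role thresholds listed above.
func scope(share float64) string {
	switch {
	case share >= 0.50:
		return "maintainer"
	case share >= 0.20:
		return "reviewer"
	case share >= 0.05:
		return "contributor"
	default:
		return "" // below 5%: no role in this sketch
	}
}

func main() {
	lines := []BlameLine{
		{"alice", 0}, {"alice", 0}, {"alice", 0}, // fresh lines, weight 1 each
		{"bob", 90}, {"bob", 90}, // one half-life old, weight 0.5 each
		{"carol", 180}, // two half-lives old, weight 0.25
	}
	shares := ownershipShares(lines)
	for _, a := range []string{"alice", "bob", "carol"} {
		fmt.Printf("%s %.3f %s\n", a, shares[a], scope(shares[a]))
	}
}
```

In this example the total weight is 4.25, so alice holds about 71% (maintainer), bob about 24% (reviewer), and carol about 6% (contributor).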
Source Priority:
- CODEOWNERS file (confidence: 1.0)
- Git blame (confidence: 0.79)
- Heuristics (confidence: 0.59)
Architectural data can become stale:
| Staleness | Condition | Action |
|---|---|---|
| fresh | < 7 days, < 50 commits | Use as-is |
| aging | 7-30 days or 50-200 commits | Use with warning |
| stale | 30-90 days or 200-500 commits | Suggest refresh |
| obsolete | > 90 days or > 500 commits | Require refresh |
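
The table can be read as "classify by age, classify by commit count, take the worse of the two". The sketch below follows that reading. The table's ranges share their endpoints (30 days appears in two rows), so treating each upper bound as inclusive is an assumption, as are the type and function names.

```go
package main

import "fmt"

// Staleness levels, ordered from best to worst.
type Staleness int

const (
	Fresh Staleness = iota
	Aging
	Stale
	Obsolete
)

func (s Staleness) String() string {
	return [...]string{"fresh", "aging", "stale", "obsolete"}[s]
}

// level classifies one dimension: below freshBelow is fresh, up to agingMax
// is aging, up to staleMax is stale, anything larger is obsolete.
func level(v, freshBelow, agingMax, staleMax int) Staleness {
	switch {
	case v < freshBelow:
		return Fresh
	case v <= agingMax:
		return Aging
	case v <= staleMax:
		return Stale
	default:
		return Obsolete
	}
}

// classify takes the worse of the age-based and commit-based levels.
func classify(ageDays, commitsBehind int) Staleness {
	byAge := level(ageDays, 7, 30, 90)
	byCommits := level(commitsBehind, 50, 200, 500)
	if byCommits > byAge {
		return byCommits
	}
	return byAge
}

func main() {
	fmt.Println(classify(3, 10), classify(3, 60), classify(45, 10), classify(1, 501))
	// fresh aging stale obsolete
}
```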
RepoStateID tracks repository state for cache invalidation.
RepoStateID = hash(
headCommit +
stagedDiffHash +
workingTreeDiffHash +
untrackedListHash
)
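
As a sketch, the composite hash above could be computed like this. SHA-256 and the newline delimiter between parts are assumptions; the source only says the four values are hashed together.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// repoStateID hashes the four state components in a fixed order. The
// delimiter keeps ("ab", "c") and ("a", "bc") from colliding.
func repoStateID(headCommit, stagedDiffHash, workingTreeDiffHash, untrackedListHash string) string {
	h := sha256.New()
	for _, part := range []string{headCommit, stagedDiffHash, workingTreeDiffHash, untrackedListHash} {
		h.Write([]byte(part))
		h.Write([]byte{'\n'})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	clean := repoStateID("abc123", "", "", "")
	dirty := repoStateID("abc123", "", "d41d8c", "")
	fmt.Println(clean != dirty) // true: an unstaged edit changes the state ID
}
```

This is why the View and Negative caches key on `repoStateId` rather than `headCommit`: uncommitted edits change the ID even when HEAD does not move.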
1. Request arrives (CLI/HTTP/MCP)
│
▼
2. Parse parameters, validate
│
▼
3. Check cache (query/view/negative)
│
┌────┴────┐
│ cached? │
└────┬────┘
│
yes ──┴── no
│ │
▼ ▼
4. Return 5. Route to backends
cached │
┌┴┐
│ │ (parallel or sequential)
└┬┘
│
▼
6. Merge results
│
▼
7. Compress (apply budget)
│
▼
8. Generate drilldowns
│
▼
9. Cache result
│
▼
10. Return response
1. Receive symbol ID
│
▼
2. Check if alias exists
│
┌────┴────┐
│ alias? │
└────┬────┘
│
yes ──┴── no
│ │
▼ │
3. Follow │
chain │
(max 3) │
│ │
└────┬───┘
│
▼
4. Return resolved symbol
(with redirect info if aliased)
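
The alias-following flow above can be sketched with a plain map from old ID to new ID. Returning an error when the chain is longer than three hops (which also catches cycles) is an assumption; the source only states the maximum depth.

```go
package main

import (
	"errors"
	"fmt"
)

const maxAliasDepth = 3

var errAliasChainTooDeep = errors.New("alias chain exceeds max depth")

// resolve follows aliases from id for at most maxAliasDepth hops.
// redirected reports whether any alias was followed.
func resolve(aliases map[string]string, id string) (resolved string, redirected bool, err error) {
	current := id
	for depth := 0; depth < maxAliasDepth; depth++ {
		next, ok := aliases[current]
		if !ok {
			return current, current != id, nil
		}
		current = next
	}
	// Three hops were followed; a further alias means the chain is too long
	// or cyclic.
	if _, more := aliases[current]; more {
		return "", false, errAliasChainTooDeep
	}
	return current, true, nil
}

func main() {
	aliases := map[string]string{
		"sym:old": "sym:mid",
		"sym:mid": "sym:new",
	}
	fmt.Println(resolve(aliases, "sym:old")) // sym:new true <nil>
	fmt.Println(resolve(aliases, "sym:new")) // sym:new false <nil>
}
```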
{
  "queryPolicy": {
    "backendLadder": ["scip", "lsp", "git"],
    "mergeStrategy": "prefer-first"
  }
}

{
  "budget": {
    "maxModules": 10,
    "maxSymbolsPerModule": 5,
    "maxImpactItems": 20,
    "maxDrilldowns": 5,
    "estimatedMaxTokens": 4000
  }
}

{
  "backendLimits": {
    "maxRefsPerQuery": 10000,
    "maxSymbolsPerSearch": 1000,
    "maxFilesScanned": 5000,
    "maxUnionModeTimeMs": 60000
  }
}

All errors include:
- Error code (machine-readable)
- Message (human-readable)
- Details (context-specific)
- Suggested fixes
- Drilldown queries
Failed queries are cached to avoid repeated failures:
| Error Type | TTL | Triggers Warmup |
|---|---|---|
| symbol-not-found | 60s | No |
| backend-unavailable | 15s | No |
| workspace-not-ready | 10s | Yes |
| timeout | 5s | No |
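
The table above maps naturally onto a lookup from error code to caching policy. The sketch below encodes the four rows; the type and function names, and the `parse-error` code used to show an uncached error, are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// negativePolicy describes how long a failure is cached and whether it
// should trigger a backend warmup.
type negativePolicy struct {
	TTL            time.Duration
	TriggersWarmup bool
}

// negativePolicies encodes the rows of the table above.
var negativePolicies = map[string]negativePolicy{
	"symbol-not-found":    {TTL: 60 * time.Second},
	"backend-unavailable": {TTL: 15 * time.Second},
	"workspace-not-ready": {TTL: 10 * time.Second, TriggersWarmup: true},
	"timeout":             {TTL: 5 * time.Second},
}

// policyFor returns the policy for an error code; ok is false for errors
// that are not negatively cached.
func policyFor(code string) (policy negativePolicy, ok bool) {
	policy, ok = negativePolicies[code]
	return policy, ok
}

func main() {
	for _, code := range []string{"workspace-not-ready", "timeout", "parse-error"} {
		if p, ok := policyFor(code); ok {
			fmt.Printf("%s: cache for %s, warmup=%t\n", code, p.TTL, p.TriggersWarmup)
		} else {
			fmt.Printf("%s: not negatively cached\n", code)
		}
	}
}
```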
To add a backend:
- Implement backend interface in `internal/backends/`
- Register in backend factory
- Add to configuration schema
- Update backend ladder options

To add an API endpoint or tool:
- Add handler in `internal/api/handlers.go`
- Register route in `internal/api/routes.go`
- Add MCP tool definition in `internal/mcp/`
- Update OpenAPI spec

To add a cache tier:
- Add table in `internal/storage/schema.go`
- Implement cache methods in `internal/storage/cache.go`
- Define invalidation triggers

Background jobs (v6.1) provide async execution for long-running operations.
┌─────────────────────────────────────────┐
│ Job Queue │
│ │
│ States: queued → running → completed │
│ ↓ │
│ failed │
│ │
│ Job Types: │
│ - refresh_architecture │
│ - analyze_impact │
│ - federation_sync │
│ - export │
└─────────────────────────────────────────┘
Features:
- SQLite-backed persistence
- Progress tracking with percentage
- Cancellation support
- Result storage
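
The job lifecycle in the diagram is a small state machine. Here is a minimal in-memory sketch of the allowed transitions; persistence and cancellation are left out, and the `Job` type and `Transition` method are hypothetical.

```go
package main

import "fmt"

type JobState string

const (
	Queued    JobState = "queued"
	Running   JobState = "running"
	Completed JobState = "completed"
	Failed    JobState = "failed"
)

// allowed lists the transitions shown in the diagram: queued -> running,
// then running -> completed or running -> failed.
var allowed = map[JobState][]JobState{
	Queued:  {Running},
	Running: {Completed, Failed},
}

// Job is a hypothetical in-memory job record.
type Job struct {
	Type     string // e.g. "refresh_architecture"
	State    JobState
	Progress int // percentage, 0-100
}

// Transition moves the job to a new state if the diagram allows it.
func (j *Job) Transition(to JobState) error {
	for _, next := range allowed[j.State] {
		if next == to {
			j.State = to
			if to == Completed {
				j.Progress = 100
			}
			return nil
		}
	}
	return fmt.Errorf("invalid transition %s -> %s", j.State, to)
}

func main() {
	job := &Job{Type: "refresh_architecture", State: Queued}
	fmt.Println(job.Transition(Running), job.Transition(Completed), job.Progress)
	fmt.Println(job.Transition(Running)) // completed is terminal, so this fails
}
```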
Federation (v6.2) aggregates queries across repositories.
┌─────────────────────────────────────────┐
│ Federation │
│ │
│ ~/.ckb/federation/<name>/ │
│ ├── config.toml (repo list) │
│ └── index.db (aggregated data) │
│ │
│ Indexed Data: │
│ - Modules (top N per repo) │
│ - Ownership patterns │
│ - Hotspots (top 20 per repo) │
│ - Decisions (all) │
│ - Contracts (v6.3) │
└─────────────────────────────────────────┘
Staleness Model:
- fresh: all repos synced < 24h
- aging: some repos 1-7 days old
- stale: some repos 7-30 days old
- obsolete: some repos > 30 days old
Daemon mode (v6.2.1) runs CKB as an always-on background service.
┌─────────────────────────────────────────┐
│ Daemon Process │
│ │
│ ┌───────────┐ ┌───────────────────┐ │
│ │ HTTP API │ │ Components │ │
│ │ :9120 │ │ │ │
│ └───────────┘ │ ┌─────────────┐ │ │
│ │ │ Scheduler │ │ │
│ Storage: │ │ (cron/int) │ │ │
│ ~/.ckb/daemon/ │ ├─────────────┤ │ │
│ ├── daemon.pid │ │ Watcher │ │ │
│ ├── daemon.log │ │ (fsnotify) │ │ │
│ └── daemon.db │ ├─────────────┤ │ │
│ │ │ Webhooks │ │ │
│ │ │ (outbound) │ │ │
│ │ └─────────────┘ │ │
└─────────────────────────────────────────┘
Scheduler (internal/scheduler/):
- Cron expressions: `*/5 * * * *`
- Interval syntax: `every 30m`, `daily at 02:00`
- Task types: refresh, federation_sync, cleanup, health_check
Watcher (internal/watcher/):
- Monitors `.git/HEAD` and `.git/index`
- Debounced refresh (default 5s)
- Configurable ignore patterns
Webhooks (internal/webhooks/):
- Formats: JSON, Slack, PagerDuty, Discord
- HMAC-SHA256 signing
- Exponential backoff retry (5 attempts)
- Dead letter queue
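
A sketch of the signing and retry pieces listed above, using Go's standard `crypto/hmac` package. The hex encoding of the signature, the 1-second base delay, the doubling factor, and the example event name are assumptions; the source specifies only HMAC-SHA256 and 5 attempts.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// sign returns the hex-encoded HMAC-SHA256 of payload under secret.
func sign(secret, payload []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares in constant time, as a receiver would.
func verify(secret, payload []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hmac.Equal(got, mac.Sum(nil))
}

// backoffSchedule returns one delay per attempt, doubling each time.
func backoffSchedule(base time.Duration, attempts int) []time.Duration {
	delays := make([]time.Duration, attempts)
	for i := range delays {
		delays[i] = base * time.Duration(1<<i)
	}
	return delays
}

func main() {
	secret := []byte("shared-secret")
	payload := []byte(`{"event":"refresh_completed"}`) // hypothetical event name
	sig := sign(secret, payload)
	fmt.Println(verify(secret, payload, sig), verify(secret, []byte("tampered"), sig))
	fmt.Println(backoffSchedule(time.Second, 5)) // [1s 2s 4s 8s 16s]
}
```

A delivery that still fails after the last scheduled attempt would then be moved to the dead letter queue.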
Tree-sitter complexity (v6.2.2) provides language-agnostic code complexity metrics.
┌─────────────────────────────────────────┐
│ Complexity Analysis │
│ │
│ Supported: Go, JS, TS, Python, │
│ Rust, Java, Kotlin │
│ │
│ Metrics: │
│ ┌─────────────────────────────────┐ │
│ │ Cyclomatic = Σ decision points │ │
│ │ (if, for, while, switch, │ │
│ │ case, &&, ||, catch, ?:) │ │
│ ├─────────────────────────────────┤ │
│ │ Cognitive = Σ (nesting × cost) │ │
│ │ Penalizes deep nesting │ │
│ └─────────────────────────────────┘ │
│ │
│ Integration: feeds getHotspots risk │
└─────────────────────────────────────────┘
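
To make the cyclomatic formula concrete, the sketch below counts decision points in a Go snippet. It swaps Tree-sitter for Go's standard `go/ast` package so that it runs without third-party bindings, which means it handles Go source only; the node-to-decision-point mapping follows the list in the box, restricted to the constructs Go has.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sampleSrc contains: range, if, &&, ||, switch, and one non-default case.
const sampleSrc = `package p

func f(xs []int, a, b bool) int {
	n := 0
	for _, x := range xs {
		if x > 0 && a || b {
			n++
		}
	}
	switch n {
	case 0:
		return -1
	default:
	}
	return n
}
`

// decisionPoints sums if, for/range, switch, case, && and || nodes in src.
func decisionPoints(src string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "sample.go", src, 0)
	if err != nil {
		return 0, err
	}
	count := 0
	ast.Inspect(file, func(n ast.Node) bool {
		switch x := n.(type) {
		case *ast.IfStmt, *ast.ForStmt, *ast.RangeStmt, *ast.SwitchStmt, *ast.TypeSwitchStmt:
			count++
		case *ast.CaseClause:
			if x.List != nil { // "default" has a nil List and is not a decision
				count++
			}
		case *ast.BinaryExpr:
			if x.Op == token.LAND || x.Op == token.LOR {
				count++
			}
		}
		return true
	})
	return count, nil
}

func main() {
	n, err := decisionPoints(sampleSrc)
	fmt.Println(n, err) // 6 <nil>
}
```

The classic McCabe number adds 1 to this sum for the function's base path; the box above states the metric as the plain sum, so the sketch does the same.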
Contract-aware impact analysis (v6.3) tracks API contracts across repositories.
┌─────────────────────────────────────────┐
│ Contract Analysis │
│ │
│ Contract Types: │
│ - proto (.proto files) │
│ - openapi (.yaml/.json with openapi) │
│ │
│ Visibility Classification: │
│ - public: api/, proto/, versioned │
│ - internal: internal/, testdata/ │
│ - unknown: no clear signals │
│ │
│ Evidence Tiers: │
│ ┌────────────────────────────────┐ │
│ │ Tier 1 (declared): buf.yaml, │ │
│ │ proto imports, *.pb.go │ │
│ │ Confidence: 1.0 │ │
│ ├────────────────────────────────┤ │
│ │ Tier 2 (derived): type match, │ │
│ │ package refs │ │
│ │ Confidence: 0.7-0.9 │ │
│ ├────────────────────────────────┤ │
│ │ Tier 3 (heuristic): naming │ │
│ │ patterns (hidden by default) │ │
│ │ Confidence: ≤0.5 │ │
│ └────────────────────────────────┘ │
│ │
│ Risk Assessment: │
│ - Low: ≤2 consumers, internal │
│ - Medium: 3-5 consumers │
│ - High: >5 consumers, public, services│
└─────────────────────────────────────────┘
Runtime telemetry (v6.4) records observed usage from production runtime.
┌─────────────────────────────────────────┐
│ Telemetry Integration │
│ │
│ Ingest: │
│ ┌─────────────────────────────────┐ │
│ │ OTLP (/v1/metrics) │ │
│ │ JSON (/api/v1/ingest/json) │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ Symbol Matching │ │
│ │ │ │
│ │ Exact: file + func + line │ │
│ │ → confidence 0.95 │ │
│ │ Strong: file + func │ │
│ │ → confidence 0.85 │ │
│ │ Weak: namespace + func │ │
│ │ → confidence 0.60 │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ Storage (SQLite) │ │
│ │ - observed_usage table │ │
│ │ - Weekly/monthly buckets │ │
│ │ - 365-day retention │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ Coverage Model │ │
│ │ │ │
│ │ attribute: % events w/ attrs │ │
│ │ match: % matched to symbols │ │
│ │ service: % repos w/ telemetry │ │
│ │ overall: weighted average │ │
│ │ │ │
│ │ Levels: │ │
│ │ high (≥0.8): full features │ │
│ │ medium (≥0.6): with warnings │ │
│ │ low (≥0.4): usage only │ │
│ │ insufficient (<0.4): disabled │ │
│ └─────────────────────────────────┘ │
└─────────────────────────────────────────┘
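
The match tiers and coverage levels in the diagram reduce to two small classification functions, sketched below. The weights of the overall average are not given in the source, so they are parameters here; the `Event` and `Coverage` types, the `"none"` tier, and the simplification that a match tier depends only on which fields are present are assumptions.

```go
package main

import "fmt"

// Event is a hypothetical telemetry event after attribute extraction.
type Event struct {
	File      string
	Namespace string
	Func      string
	Line      int
}

// matchConfidence picks the strongest tier the event's fields allow.
func matchConfidence(e Event) (quality string, confidence float64) {
	switch {
	case e.File != "" && e.Func != "" && e.Line > 0:
		return "exact", 0.95
	case e.File != "" && e.Func != "":
		return "strong", 0.85
	case e.Namespace != "" && e.Func != "":
		return "weak", 0.60
	default:
		return "none", 0
	}
}

// Coverage holds the three component ratios, each in [0, 1].
type Coverage struct{ Attribute, Match, Service float64 }

// overall is a weighted average; the caller supplies the weights.
func overall(c Coverage, wAttr, wMatch, wService float64) float64 {
	total := wAttr + wMatch + wService
	if total == 0 {
		return 0
	}
	return (c.Attribute*wAttr + c.Match*wMatch + c.Service*wService) / total
}

// coverageLevel applies the thresholds from the diagram.
func coverageLevel(score float64) string {
	switch {
	case score >= 0.8:
		return "high"
	case score >= 0.6:
		return "medium"
	case score >= 0.4:
		return "low"
	default:
		return "insufficient"
	}
}

func main() {
	q, conf := matchConfidence(Event{File: "api/user.go", Func: "GetUser"})
	fmt.Println(q, conf) // strong 0.85
	c := Coverage{Attribute: 0.9, Match: 0.7, Service: 0.5}
	fmt.Println(coverageLevel(overall(c, 1, 1, 1))) // medium (equal weights assumed)
}
```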
Dead Code Detection:
- Requires medium+ coverage
- Only exact/strong matches
- Confidence capped at 0.90
- Configurable exclusions (tests, migrations, scheduled jobs)
Impact Enrichment:
- Adds `observedImpact` to `analyzeImpact`
- Shows observed callers not found in static analysis
- Comparison: staticOnly vs observedOnly vs both
Hotspot Enhancement:
- Adds `observedUsage` field to hotspots
- Usage weight (0.20) in scoring formula