CKB Architecture

Overview

CKB (Code Knowledge Backend) is designed as a layered system that abstracts multiple code intelligence backends behind a unified query interface. The architecture has evolved through several versions:

v6.0 — Architectural Memory (persistent knowledge)
v6.1 — Background Jobs & CI/CD Integration
v6.2 — Federation (cross-repo queries)
v6.2.1 — Daemon Mode (always-on service)
v6.2.2 — Tree-sitter Complexity
v6.3 — Contract-Aware Impact Analysis
v6.4 — Runtime Telemetry

┌─────────────────────────────────────────────────────────┐
│                    Interfaces                            │
│  ┌─────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   CLI   │  │  HTTP API   │  │     MCP Server      │  │
│  └────┬────┘  └──────┬──────┘  └──────────┬──────────┘  │
└───────┼──────────────┼────────────────────┼─────────────┘
        │              │                    │
        └──────────────┼────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────────┐
│                   Query Engine                           │
│  ┌────────────┐  ┌────────────┐  ┌────────────────────┐ │
│  │   Router   │  │  Merger    │  │    Compressor      │ │
│  └────────────┘  └────────────┘  └────────────────────┘ │
└─────────────────────────┬───────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────┐
│              Architectural Memory (v6.0)                 │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌──────┐ │
│  │  Modules  │  │ Ownership │  │ Hotspots  │  │ ADRs │ │
│  │  Registry │  │  Registry │  │  Tracker  │  │      │ │
│  └───────────┘  └───────────┘  └───────────┘  └──────┘ │
└─────────────────────────┬───────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────┐
│            Cross-Cutting Concerns (v6.1-v6.4)            │
│  ┌─────────┐ ┌──────────┐ ┌───────────┐ ┌────────────┐ │
│  │  Jobs   │ │Federation│ │ Complexity│ │ Telemetry  │ │
│  │ (v6.1)  │ │  (v6.2)  │ │  (v6.2.2) │ │   (v6.4)   │ │
│  └─────────┘ └──────────┘ └───────────┘ └────────────┘ │
│  ┌─────────┐ ┌──────────┐ ┌───────────────────────────┐ │
│  │ Daemon  │ │Contracts │ │      (v6.2.1 Services)    │ │
│  │ (v6.2.1)│ │  (v6.3)  │ │ Scheduler│Watcher│Webhooks│ │
│  └─────────┘ └──────────┘ └───────────────────────────┘ │
└─────────────────────────┬───────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────┐
│                   Backend Layer                          │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌───────────┐  │
│  │  SCIP   │  │   LSP   │  │   Git   │  │Tree-sitter│  │
│  └─────────┘  └─────────┘  └─────────┘  └───────────┘  │
└─────────────────────────┬───────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────┐
│                   Storage Layer                          │
│  ┌────────────────┐  ┌────────────────────────────────┐ │
│  │    SQLite      │  │         Cache Tiers            │ │
│  │  (Symbols,     │  │  Query │ View │ Negative       │ │
│  │   Ownership,   │  │  Cache │ Cache│ Cache          │ │
│  │   Decisions,   │  │                                │ │
│  │   Telemetry)   │  │                                │ │
│  └────────────────┘  └────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

Core Components

1. Interface Layer

CLI (`cmd/ckb/`)

Cobra-based command structure
Human-readable output
Interactive commands

HTTP API (`internal/api/`)

REST endpoints
JSON responses
OpenAPI specification
Middleware (logging, CORS, recovery)

MCP Server (`internal/mcp/`)

Model Context Protocol implementation
Tool definitions for AI assistants
Streaming support

2. Query Engine

Router

Routes queries to appropriate backends based on:

Query type (definition, references, search)
Backend availability
Query policy configuration

Merger

Combines results from multiple backends:

prefer-first: Use first successful response
union: Merge all responses, deduplicate
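The two strategies can be sketched as follows. This is a minimal illustration, not CKB's merger: the `Result` type and `mergeResults` function are hypothetical, and "successful" is assumed to mean "returned at least one symbol".

```go
package main

import "fmt"

// Result is a hypothetical per-backend response used only for this sketch.
type Result struct {
	Backend string
	Symbols []string
}

// mergeResults combines per-backend results, given in ladder order.
// "prefer-first" returns the first non-empty response; "union" merges all
// responses and drops duplicates while keeping first-seen order.
func mergeResults(strategy string, results []Result) ([]string, error) {
	switch strategy {
	case "prefer-first":
		for _, r := range results {
			if len(r.Symbols) > 0 {
				return r.Symbols, nil
			}
		}
		return nil, nil
	case "union":
		seen := make(map[string]bool)
		var merged []string
		for _, r := range results {
			for _, s := range r.Symbols {
				if !seen[s] {
					seen[s] = true
					merged = append(merged, s)
				}
			}
		}
		return merged, nil
	default:
		return nil, fmt.Errorf("unknown merge strategy %q", strategy)
	}
}

func main() {
	results := []Result{
		{Backend: "scip", Symbols: []string{"pkg.Foo", "pkg.Bar"}},
		{Backend: "lsp", Symbols: []string{"pkg.Bar", "pkg.Baz"}},
	}
	first, _ := mergeResults("prefer-first", results)
	union, _ := mergeResults("union", results)
	fmt.Println(first) // [pkg.Foo pkg.Bar]
	fmt.Println(union) // [pkg.Foo pkg.Bar pkg.Baz]
}
```

prefer-first stops at the first backend that answers, so it is cheaper; union queries every backend and costs more, which is presumably why a separate time limit exists for union mode under Backend Limits.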

Compressor (`internal/compression/`)

Optimizes responses for LLM consumption:

Enforces response budgets
Truncates with drilldown suggestions
Deduplicates results
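A rough sketch of the three behaviors above, assuming a single item budget. The `Compressed` type, the `compress` function, and the wording of the drilldown hint are all hypothetical; the real compressor works with the richer budget shown under Configuration.

```go
package main

import "fmt"

// Compressed is a hypothetical response shape: the kept items plus hints
// telling the caller how to reach what was cut.
type Compressed struct {
	Items      []string
	Truncated  int
	Drilldowns []string
}

// compress deduplicates items, keeps at most maxItems of them, and records a
// drilldown suggestion when anything was dropped.
func compress(items []string, maxItems int) Compressed {
	seen := make(map[string]bool)
	var unique []string
	for _, it := range items {
		if !seen[it] {
			seen[it] = true
			unique = append(unique, it)
		}
	}
	out := Compressed{Items: unique}
	if maxItems >= 0 && len(unique) > maxItems {
		out.Items = unique[:maxItems]
		out.Truncated = len(unique) - maxItems
		out.Drilldowns = []string{
			fmt.Sprintf("%d more results omitted; narrow the query to see them", out.Truncated),
		}
	}
	return out
}

func main() {
	c := compress([]string{"a", "b", "a", "c", "d"}, 2)
	fmt.Println(c.Items, c.Truncated) // [a b] 2
}
```

Deduplicating before truncating means the budget is spent only on distinct results.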

3. Backend Layer

SCIP Backend

Reads pre-computed SCIP indexes
Fastest and most accurate
Requires index generation

LSP Backend

Communicates with language servers
Real-time analysis
May require workspace initialization

Git Backend

Fallback for basic operations
File listing, blame, history
Always available in git repos

4. Architectural Memory Layer (v6.0)

v6.0 introduces architectural knowledge that persists across sessions.

Module Registry (`internal/modules/`)

Tracks module boundaries from MODULES.toml or inference
Stores responsibilities, ownership, and tags
Supports declared (explicit) and inferred (automatic) modules

Ownership Registry (`internal/ownership/`)

Parses CODEOWNERS files (confidence: 1.0)
Computes git-blame ownership (confidence: 0.79)
Tracks ownership history over time
Merges sources with priority: CODEOWNERS > blame > heuristic

Hotspot Tracker

Stores historical hotspot snapshots (append-only)
Computes trends (increasing/stable/decreasing)
Projects future scores based on velocity

Decision Log (`internal/decisions/`)

Parses ADR markdown files
Indexes decisions for search
Links decisions to affected modules

5. Storage Layer

SQLite Database (`.ckb/ckb.db`)

Core Tables (v5.x):

symbol_mappings - Stable ID to backend ID mappings
symbol_aliases - Redirect mappings for renamed symbols
modules - Detected modules cache
dependency_edges - Module dependency graph

Architectural Memory Tables (v6.0):

ownership - Ownership rules with source and confidence
ownership_history - Ownership changes over time (append-only)
hotspot_snapshots - Historical hotspot metrics (append-only)
responsibilities - Module/file responsibility descriptions
decisions - ADR metadata (content in markdown files)
module_renames - Tracks module ID changes across renames

Full-Text Search:

decisions_fts - FTS5 index for decision search
responsibilities_fts - FTS5 index for responsibility search

Cache Tiers

Tier	TTL	Key Contains	Use Case
Query Cache	5 min	headCommit	Frequent queries
View Cache	1 hour	repoStateId	Expensive computations
Negative Cache	5-60s	repoStateId	Avoid repeated failures

Persistence Model (v6.0)

~/.ckb/
├── config.toml              # global config
└── repos/
    └── <repo-hash>/
        ├── ckb.db            # unified SQLite database
        ├── decisions/        # ADR markdown files (canonical)
        │   ├── ADR-001-*.md
        │   └── ...
        └── index.scip        # SCIP index

Data Classification:

Data Type	Classification	Rebuild Behavior
Declared modules	Canonical	Preserved
Inferred modules	Derived	Regenerated
CODEOWNERS rules	Canonical	Reparsed from file
Git-blame ownership	Derived	Regenerated
Hotspot snapshots	Derived (append-only)	Kept; new appended
ADR files	Canonical	Never rebuilt
ADR index	Derived	Regenerated from files

Key Subsystems

Identity System (`internal/identity/`)

Provides stable symbol identification across refactors.

┌─────────────────────────────────────────┐
│           Symbol Identity               │
│                                         │
│  Stable ID: ckb:repo:sym:<fingerprint>  │
│                                         │
│  Fingerprint = hash(                    │
│    container + name + kind + signature  │
│  )                                      │
└─────────────────────────────────────────┘
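The fingerprint can be illustrated with a short sketch. SHA-256, the NUL separator, the 16-character truncation, and treating `repo` in the ID pattern as a placeholder are all assumptions made here for illustration; the diagram only states which fields are hashed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// fingerprint hashes the four identity fields from the diagram.
// Hash function, separator, and truncation length are assumptions.
func fingerprint(container, name, kind, signature string) string {
	joined := strings.Join([]string{container, name, kind, signature}, "\x00")
	sum := sha256.Sum256([]byte(joined))
	return hex.EncodeToString(sum[:])[:16]
}

// stableID formats an ID in the ckb:<repo>:sym:<fingerprint> pattern.
func stableID(repo, fp string) string {
	return fmt.Sprintf("ckb:%s:sym:%s", repo, fp)
}

func main() {
	fp := fingerprint("internal/query", "Engine.Search", "method", "func(q string) ([]Symbol, error)")
	fmt.Println(stableID("myrepo", fp))
}
```

Line numbers are not among the hashed fields, so edits that only shift a symbol within its file leave the ID unchanged. A rename changes the fingerprint, which is the case alias resolution handles.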

Alias Resolution:

Old ID ──alias──> New ID ──alias──> Current ID
         │                 │
         └── max depth: 3 ─┘

Impact Analysis (`internal/impact/`)

Analyzes the blast radius of code changes.

┌─────────────────────────────────────────┐
│           Impact Analysis               │
│                                         │
│  1. Derive Visibility                   │
│     - SCIP modifiers (0.95 confidence)  │
│     - Reference patterns (0.7-0.9)      │
│     - Naming conventions (0.5-0.7)      │
│                                         │
│  2. Classify References                 │
│     - direct-caller                     │
│     - transitive-caller                 │
│     - type-dependency                   │
│     - test-dependency                   │
│                                         │
│  3. Compute Risk Score                  │
│     - Visibility (30%)                  │
│     - Direct callers (35%)              │
│     - Module spread (25%)               │
│     - Impact kind (10%)                 │
└─────────────────────────────────────────┘
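Step 3 is a weighted sum. The sketch below assumes each factor has already been normalized to the range 0 to 1; how CKB derives those normalized values is not shown here, and the `RiskFactors` type is hypothetical.

```go
package main

import "fmt"

// RiskFactors holds the four inputs, each assumed to be normalized to [0, 1].
type RiskFactors struct {
	Visibility    float64
	DirectCallers float64
	ModuleSpread  float64
	ImpactKind    float64
}

// riskScore applies the documented weights: 30% / 35% / 25% / 10%.
func riskScore(f RiskFactors) float64 {
	return 0.30*f.Visibility + 0.35*f.DirectCallers + 0.25*f.ModuleSpread + 0.10*f.ImpactKind
}

func main() {
	score := riskScore(RiskFactors{Visibility: 1.0, DirectCallers: 0.5, ModuleSpread: 0.2, ImpactKind: 0.0})
	fmt.Printf("%.3f\n", score) // 0.525
}
```

Because the weights sum to 1, the score stays in the same 0 to 1 range as its inputs.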

Deterministic Output (`internal/output/`)

Ensures identical queries produce identical bytes.

Guarantees:

Stable key ordering (alphabetical)
Float precision (6 decimals)
Consistent sorting (multi-field, stable)
Nil/empty field omission
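One way to get these guarantees with Go's standard library is sketched below. The `Ref` type and the `(file, line)` sort key are hypothetical. The sketch relies on two documented `encoding/json` behaviors: struct fields are emitted in declaration order (declared alphabetically here), and `omitempty` drops zero-valued fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"sort"
)

// Ref is a hypothetical result row. Fields are declared in alphabetical
// order of their JSON keys so the emitted key order is alphabetical.
type Ref struct {
	File  string  `json:"file"`
	Line  int     `json:"line"`
	Note  string  `json:"note,omitempty"`
	Score float64 `json:"score"`
}

// round6 rounds to 6 decimal places so float noise does not change the bytes.
func round6(v float64) float64 {
	return math.Round(v*1e6) / 1e6
}

// encode copies the rows, sorts them by (file, line) with a stable sort,
// rounds the floats, and marshals the result.
func encode(refs []Ref) ([]byte, error) {
	sorted := make([]Ref, len(refs))
	copy(sorted, refs)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].File != sorted[j].File {
			return sorted[i].File < sorted[j].File
		}
		return sorted[i].Line < sorted[j].Line
	})
	for i := range sorted {
		sorted[i].Score = round6(sorted[i].Score)
	}
	return json.Marshal(sorted)
}

func main() {
	a, _ := encode([]Ref{{File: "b.go", Line: 3, Score: 0.1234567891}, {File: "a.go", Line: 9, Score: 1}})
	b, _ := encode([]Ref{{File: "a.go", Line: 9, Score: 1}, {File: "b.go", Line: 3, Score: 0.1234568}})
	fmt.Println(string(a))
	fmt.Println(string(a) == string(b)) // true
}
```

The two inputs differ in row order and in float noise beyond the sixth decimal, yet encode to the same bytes.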

Ownership Algorithm (v6.0)

Computes code ownership from git blame with time decay.

┌─────────────────────────────────────────┐
│           Ownership Algorithm           │
│                                         │
│  1. Run git blame on file               │
│  2. Filter out bots + merge commits     │
│  3. Apply time decay:                   │
│     weight = 0.5 ^ (age / 90 days)      │
│  4. Normalize weights to 0-1            │
│  5. Assign scope:                       │
│     >= 50% → maintainer                 │
│     >= 20% → reviewer                   │
│     >= 5%  → contributor                │
└─────────────────────────────────────────┘
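Steps 3 to 5 can be sketched as below. The `BlameLine` type is hypothetical and stands in for blame output that has already been filtered (step 2). Normalization is read here as "shares that sum to 1", and authors below 5% are assumed to get no scope.

```go
package main

import (
	"fmt"
	"math"
)

// BlameLine is a hypothetical, already-filtered blame record: one line
// attributed to an author, with the age of its last change in days.
type BlameLine struct {
	Author  string
	AgeDays float64
}

// decayWeight halves a line's weight every 90 days.
func decayWeight(ageDays float64) float64 {
	return math.Pow(0.5, ageDays/90)
}

// ownershipShares sums decayed weights per author and normalizes them so
// that the shares add up to 1.
func ownershipShares(lines []BlameLine) map[string]float64 {
	totals := make(map[string]float64)
	var sum float64
	for _, l := range lines {
		w := decayWeight(l.AgeDays)
		totals[l.Author] += w
		sum += w
	}
	if sum == 0 {
		return totals
	}
	for author := range totals {
		totals[author] /= sum
	}
	return totals
}

// scope maps a normalized share to the documented thresholds.
// Returning "" below 5% is an assumption of this sketch.
func scope(share float64) string {
	switch {
	case share >= 0.50:
		return "maintainer"
	case share >= 0.20:
		return "reviewer"
	case share >= 0.05:
		return "contributor"
	default:
		return ""
	}
}

func main() {
	lines := []BlameLine{
		{Author: "alice", AgeDays: 0},
		{Author: "alice", AgeDays: 90},
		{Author: "bob", AgeDays: 180},
	}
	for author, share := range ownershipShares(lines) {
		fmt.Printf("%s %.3f %s\n", author, share, scope(share))
	}
}
```

In this example alice's two lines weigh 1.0 and 0.5 and bob's 180-day-old line weighs 0.25, so alice holds about 86% (maintainer) and bob about 14% (contributor).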

Source Priority:

CODEOWNERS file (confidence: 1.0)
Git blame (confidence: 0.79)
Heuristics (confidence: 0.59)

Staleness Model (v6.0)

Architectural data can become stale:

Staleness	Condition	Action
fresh	< 7 days and < 50 commits	Use as-is
aging	7-30 days or 50-200 commits	Use with warning
stale	30-90 days or 200-500 commits	Suggest refresh
obsolete	> 90 days or > 500 commits	Require refresh
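The table can be read as "classify by age, classify by commit count, take the worse of the two". The sketch below follows that reading. The table's ranges share their endpoints (30 days, 90 days, 200 and 500 commits), so the boundary handling here is an assumption, as are the function names.

```go
package main

import "fmt"

var levels = []string{"fresh", "aging", "stale", "obsolete"}

// level ranks a value against three thresholds: 0 below agingAt, 1 from
// agingAt, 2 from staleAt, and 3 strictly above obsoleteAbove.
func level(v, agingAt, staleAt, obsoleteAbove int) int {
	switch {
	case v > obsoleteAbove:
		return 3
	case v >= staleAt:
		return 2
	case v >= agingAt:
		return 1
	default:
		return 0
	}
}

// staleness returns the worse of the age-based and commit-based levels, so
// data is "fresh" only when both conditions in the table hold.
func staleness(ageDays, commitsBehind int) string {
	byAge := level(ageDays, 7, 30, 90)
	byCommits := level(commitsBehind, 50, 200, 500)
	if byCommits > byAge {
		byAge = byCommits
	}
	return levels[byAge]
}

func main() {
	fmt.Println(staleness(3, 120)) // aging
	fmt.Println(staleness(45, 10)) // stale
}
```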

Repository State (`internal/repostate/`)

Tracks repository state for cache invalidation.

RepoStateID = hash(
  headCommit +
  stagedDiffHash +
  workingTreeDiffHash +
  untrackedListHash
)
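A sketch of how such an ID could be computed. The hash function (SHA-256) and the newline separator are assumptions; the formula above only names the four inputs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// RepoState holds the four inputs named in the formula.
type RepoState struct {
	HeadCommit          string
	StagedDiffHash      string
	WorkingTreeDiffHash string
	UntrackedListHash   string
}

// ID hashes the four components. Each is followed by a newline so adjacent
// fields cannot run together; SHA-256 and the separator are assumptions.
func (s RepoState) ID() string {
	h := sha256.New()
	for _, part := range []string{s.HeadCommit, s.StagedDiffHash, s.WorkingTreeDiffHash, s.UntrackedListHash} {
		h.Write([]byte(part))
		h.Write([]byte{'\n'})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	clean := RepoState{HeadCommit: "abc123", StagedDiffHash: "-", WorkingTreeDiffHash: "-", UntrackedListHash: "-"}
	dirty := clean
	dirty.WorkingTreeDiffHash = "f00d"
	fmt.Println(clean.ID() == dirty.ID()) // false
}
```

Because uncommitted changes feed into the ID, caches keyed on repoStateId are invalidated by local edits, while caches keyed only on headCommit are not.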

Data Flow

Query Flow

1. Request arrives (CLI/HTTP/MCP)
           │
           ▼
2. Parse parameters, validate
           │
           ▼
3. Check cache (query/view/negative)
           │
      ┌────┴────┐
      │ cached? │
      └────┬────┘
           │
     yes ──┴── no
      │        │
      ▼        ▼
4. Return   5. Route to backends
   cached      │
              ┌┴┐
              │ │ (parallel or sequential)
              └┬┘
               │
               ▼
6. Merge results
               │
               ▼
7. Compress (apply budget)
               │
               ▼
8. Generate drilldowns
               │
               ▼
9. Cache result
               │
               ▼
10. Return response

Symbol Resolution Flow

1. Receive symbol ID
         │
         ▼
2. Check if alias exists
         │
    ┌────┴────┐
    │ alias?  │
    └────┬────┘
         │
   yes ──┴── no
    │        │
    ▼        │
3. Follow   │
   chain    │
   (max 3)  │
    │        │
    └────┬───┘
         │
         ▼
4. Return resolved symbol
   (with redirect info if aliased)
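The flow above amounts to a bounded chain walk. In this sketch the alias table is a plain map and all names are hypothetical; returning an error when the chain exceeds three hops is an assumption, since the flow does not say what happens past the limit.

```go
package main

import (
	"errors"
	"fmt"
)

const maxAliasDepth = 3

var errAliasTooDeep = errors.New("alias chain exceeds max depth")

// resolve follows alias links from id until it reaches an ID with no alias.
// It returns the resolved ID and the IDs it was redirected through.
// The depth limit also bounds accidental alias cycles.
func resolve(aliases map[string]string, id string) (string, []string, error) {
	var redirects []string
	current := id
	for {
		next, ok := aliases[current]
		if !ok {
			return current, redirects, nil
		}
		if len(redirects) == maxAliasDepth {
			return "", redirects, errAliasTooDeep
		}
		redirects = append(redirects, current)
		current = next
	}
}

func main() {
	aliases := map[string]string{"sym:v1": "sym:v2", "sym:v2": "sym:v3"}
	id, redirects, err := resolve(aliases, "sym:v1")
	fmt.Println(id, redirects, err) // sym:v3 [sym:v1 sym:v2] <nil>
}
```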

Configuration

Query Policy

{
  "queryPolicy": {
    "backendLadder": ["scip", "lsp", "git"],
    "mergeStrategy": "prefer-first"
  }
}
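To illustrate how the ladder might be consumed, the sketch below loads this fragment and picks the first available backend. The struct and function names are hypothetical, and availability is modeled as a simple map.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// QueryPolicy mirrors the JSON fragment above.
type QueryPolicy struct {
	BackendLadder []string `json:"backendLadder"`
	MergeStrategy string   `json:"mergeStrategy"`
}

type config struct {
	QueryPolicy QueryPolicy `json:"queryPolicy"`
}

// loadPolicy extracts the queryPolicy section from a JSON document.
func loadPolicy(raw []byte) (QueryPolicy, error) {
	var cfg config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return QueryPolicy{}, err
	}
	return cfg.QueryPolicy, nil
}

// pickBackend walks the ladder in order and returns the first backend that
// is currently available.
func pickBackend(p QueryPolicy, available map[string]bool) (string, error) {
	for _, name := range p.BackendLadder {
		if available[name] {
			return name, nil
		}
	}
	return "", errors.New("no backend in the ladder is available")
}

func main() {
	raw := []byte(`{"queryPolicy":{"backendLadder":["scip","lsp","git"],"mergeStrategy":"prefer-first"}}`)
	policy, err := loadPolicy(raw)
	if err != nil {
		panic(err)
	}
	// No SCIP index is available here, so the ladder falls through to LSP.
	name, _ := pickBackend(policy, map[string]bool{"lsp": true, "git": true})
	fmt.Println(name) // lsp
}
```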

Response Budget

{
  "budget": {
    "maxModules": 10,
    "maxSymbolsPerModule": 5,
    "maxImpactItems": 20,
    "maxDrilldowns": 5,
    "estimatedMaxTokens": 4000
  }
}

Backend Limits

{
  "backendLimits": {
    "maxRefsPerQuery": 10000,
    "maxSymbolsPerSearch": 1000,
    "maxFilesScanned": 5000,
    "maxUnionModeTimeMs": 60000
  }
}

Error Handling

Error Taxonomy (`internal/errors/`)

All errors include:

Error code (machine-readable)
Message (human-readable)
Details (context-specific)
Suggested fixes
Drilldown queries
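An error carrying these five parts could look like the following in Go. The type name, field names, and JSON keys are hypothetical, as are the example fix and drilldown strings; only the five parts themselves come from the list above.

```go
package main

import (
	"errors"
	"fmt"
)

// CkbError is a hypothetical error shape with the five documented parts.
type CkbError struct {
	Code           string            `json:"code"`
	Message        string            `json:"message"`
	Details        map[string]string `json:"details,omitempty"`
	SuggestedFixes []string          `json:"suggestedFixes,omitempty"`
	Drilldowns     []string          `json:"drilldowns,omitempty"`
}

// Error implements the error interface as "<code>: <message>".
func (e *CkbError) Error() string {
	return fmt.Sprintf("%s: %s", e.Code, e.Message)
}

func main() {
	var err error = &CkbError{
		Code:           "symbol-not-found",
		Message:        "no symbol matches the given ID",
		Details:        map[string]string{"symbolId": "ckb:repo:sym:abc"},
		SuggestedFixes: []string{"regenerate the index and retry"},
		Drilldowns:     []string{"search for symbols with a similar name"},
	}
	wrapped := fmt.Errorf("query failed: %w", err)

	// Callers can recover the structured error from a wrapped chain.
	var ce *CkbError
	if errors.As(wrapped, &ce) {
		fmt.Println(ce.Code, len(ce.SuggestedFixes)) // symbol-not-found 1
	}
}
```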

Negative Caching

Failed queries are cached to avoid repeated failures:

Error Type	TTL	Triggers Warmup
symbol-not-found	60s	No
backend-unavailable	15s	No
workspace-not-ready	10s	Yes
timeout	5s	No
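The table translates directly into a lookup of TTL and warmup flag per error type. The in-memory cache below is a hypothetical sketch (CKB's cache tiers are backed by storage tables); the current time is passed in explicitly to keep the example deterministic.

```go
package main

import (
	"fmt"
	"time"
)

type policy struct {
	TTL    time.Duration
	Warmup bool
}

// policies mirrors the table above.
var policies = map[string]policy{
	"symbol-not-found":    {TTL: 60 * time.Second, Warmup: false},
	"backend-unavailable": {TTL: 15 * time.Second, Warmup: false},
	"workspace-not-ready": {TTL: 10 * time.Second, Warmup: true},
	"timeout":             {TTL: 5 * time.Second, Warmup: false},
}

type entry struct {
	errType string
	expires time.Time
}

// NegativeCache is a hypothetical in-memory stand-in for the negative tier.
type NegativeCache struct {
	entries map[string]entry
}

func NewNegativeCache() *NegativeCache {
	return &NegativeCache{entries: make(map[string]entry)}
}

// Put records a failure for key and reports whether a warmup should start.
// Unknown error types are not cached.
func (c *NegativeCache) Put(key, errType string, now time.Time) bool {
	p, ok := policies[errType]
	if !ok {
		return false
	}
	c.entries[key] = entry{errType: errType, expires: now.Add(p.TTL)}
	return p.Warmup
}

// Get returns the cached error type while the entry is still live.
func (c *NegativeCache) Get(key string, now time.Time) (string, bool) {
	e, ok := c.entries[key]
	if !ok || !now.Before(e.expires) {
		delete(c.entries, key)
		return "", false
	}
	return e.errType, true
}

func main() {
	c := NewNegativeCache()
	now := time.Now()
	fmt.Println(c.Put("refs:Foo", "workspace-not-ready", now)) // true
	_, hit := c.Get("refs:Foo", now.Add(11*time.Second))
	fmt.Println(hit) // false
}
```

Short TTLs for transient conditions (timeout, workspace-not-ready) let queries recover quickly, while a missing symbol is unlikely to appear within a minute.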

Extension Points

Adding a New Backend

Implement the backend interface in `internal/backends/`
Register it in the backend factory
Add it to the configuration schema
Update the backend ladder options

Adding a New Tool

Add a handler in `internal/api/handlers.go`
Register the route in `internal/api/routes.go`
Add the MCP tool definition in `internal/mcp/`
Update the OpenAPI spec

Adding a New Cache Tier

Add a table in `internal/storage/schema.go`
Implement the cache methods in `internal/storage/cache.go`
Define the invalidation triggers

Cross-Cutting Subsystems (v6.1-v6.4)

Background Jobs (`internal/jobs/`) — v6.1

Async job execution for long-running operations.

┌─────────────────────────────────────────┐
│            Job Queue                     │
│                                         │
│  States: queued → running → completed   │
│                    ↓                    │
│                  failed                 │
│                                         │
│  Job Types:                             │
│  - refresh_architecture                 │
│  - analyze_impact                       │
│  - federation_sync                      │
│  - export                               │
└─────────────────────────────────────────┘

Features:

SQLite-backed persistence
Progress tracking with percentage
Cancellation support
Result storage
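The state diagram can be expressed as a small transition table. This sketch covers only the four states drawn above; cancellation is listed as a feature, but its state is not shown in the diagram and is left out. The type and method names are hypothetical.

```go
package main

import "fmt"

type JobState string

const (
	Queued    JobState = "queued"
	Running   JobState = "running"
	Completed JobState = "completed"
	Failed    JobState = "failed"
)

// transitions lists the moves drawn in the diagram: queued → running, then
// running → completed or running → failed. Terminal states have no entry.
var transitions = map[JobState][]JobState{
	Queued:  {Running},
	Running: {Completed, Failed},
}

// Job is a hypothetical job record with a progress percentage.
type Job struct {
	Type     string
	State    JobState
	Progress int // 0-100
}

// Advance moves the job to next if the diagram allows it.
func (j *Job) Advance(next JobState) error {
	for _, allowed := range transitions[j.State] {
		if allowed == next {
			j.State = next
			if next == Completed {
				j.Progress = 100
			}
			return nil
		}
	}
	return fmt.Errorf("invalid transition %s -> %s", j.State, next)
}

func main() {
	job := &Job{Type: "refresh_architecture", State: Queued}
	fmt.Println(job.Advance(Running), job.State)   // <nil> running
	fmt.Println(job.Advance(Completed), job.State) // <nil> completed
	fmt.Println(job.Advance(Running) != nil)       // true
}
```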

Federation (`internal/federation/`) — v6.2

Cross-repository query aggregation.

┌─────────────────────────────────────────┐
│           Federation                     │
│                                         │
│  ~/.ckb/federation/<name>/              │
│  ├── config.toml   (repo list)          │
│  └── index.db      (aggregated data)    │
│                                         │
│  Indexed Data:                          │
│  - Modules (top N per repo)             │
│  - Ownership patterns                   │
│  - Hotspots (top 20 per repo)           │
│  - Decisions (all)                      │
│  - Contracts (v6.3)                     │
└─────────────────────────────────────────┘

Staleness Model:

fresh: all repos synced < 24h
aging: some repos 1-7 days old
stale: some repos 7-30 days old
obsolete: some repos > 30 days old

Daemon Mode (`internal/daemon/`) — v6.2.1

Always-on background service.

┌─────────────────────────────────────────┐
│           Daemon Process                 │
│                                         │
│  ┌───────────┐  ┌───────────────────┐  │
│  │ HTTP API  │  │    Components     │  │
│  │ :9120     │  │                   │  │
│  └───────────┘  │  ┌─────────────┐  │  │
│                 │  │  Scheduler  │  │  │
│  Storage:       │  │  (cron/int) │  │  │
│  ~/.ckb/daemon/ │  ├─────────────┤  │  │
│  ├── daemon.pid │  │   Watcher   │  │  │
│  ├── daemon.log │  │  (fsnotify) │  │  │
│  └── daemon.db  │  ├─────────────┤  │  │
│                 │  │  Webhooks   │  │  │
│                 │  │  (outbound) │  │  │
│                 │  └─────────────┘  │  │
└─────────────────────────────────────────┘

Scheduler (`internal/scheduler/`):

Cron expressions: */5 * * * *
Interval syntax: every 30m, daily at 02:00
Task types: refresh, federation_sync, cleanup, health_check

Watcher (`internal/watcher/`):

Monitors .git/HEAD and .git/index
Debounced refresh (default 5s)
Configurable ignore patterns

Webhooks (`internal/webhooks/`):

Formats: JSON, Slack, PagerDuty, Discord
HMAC-SHA256 signing
Exponential backoff retry (5 attempts)
Dead letter queue
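Signing and retry timing can be sketched with the standard library. The hex encoding of the signature, the 1-second base delay, and the doubling schedule are assumptions; the list above specifies only HMAC-SHA256, exponential backoff, and five attempts.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

const maxAttempts = 5

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time.
func verify(secret, body []byte, signature string) bool {
	expected, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), expected)
}

// backoff returns the delay before the given attempt (1-based), doubling
// from base on each retry.
func backoff(base time.Duration, attempt int) time.Duration {
	if attempt < 1 {
		attempt = 1
	}
	return base << (attempt - 1)
}

func main() {
	secret := []byte("shared-secret")
	body := []byte(`{"event":"refresh_completed"}`)
	sig := sign(secret, body)
	fmt.Println(verify(secret, body, sig)) // true

	for attempt := 1; attempt <= maxAttempts; attempt++ {
		fmt.Println(attempt, backoff(time.Second, attempt))
	}
}
```

A receiver runs the same `verify` step with the shared secret to confirm the payload was not altered in transit. A delivery that still fails after the last attempt would go to the dead letter queue.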

Tree-sitter Complexity (`internal/complexity/`) — v6.2.2

Language-agnostic code complexity metrics.

┌─────────────────────────────────────────┐
│        Complexity Analysis               │
│                                         │
│  Supported: Go, JS, TS, Python,         │
│             Rust, Java, Kotlin          │
│                                         │
│  Metrics:                               │
│  ┌─────────────────────────────────┐   │
│  │ Cyclomatic = Σ decision points  │   │
│  │   (if, for, while, switch,      │   │
│  │    case, &&, ||, catch, ?:)     │   │
│  ├─────────────────────────────────┤   │
│  │ Cognitive = Σ (nesting × cost)  │   │
│  │   Penalizes deep nesting        │   │
│  └─────────────────────────────────┘   │
│                                         │
│  Integration: feeds getHotspots risk   │
└─────────────────────────────────────────┘
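To make the cyclomatic metric concrete, the sketch below counts decision points in Go source. It uses the standard library's `go/parser` and `go/ast` rather than Tree-sitter, so it illustrates the counting rule only and handles Go alone. It counts `case` clauses but not the enclosing `switch`, which is an assumption about how the listed constructs are tallied.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// decisionPoints parses Go source and counts the listed constructs that
// exist in Go: if, for (including range), non-default case clauses,
// && and ||.
func decisionPoints(src string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "snippet.go", src, 0)
	if err != nil {
		return 0, err
	}
	count := 0
	ast.Inspect(file, func(n ast.Node) bool {
		switch node := n.(type) {
		case *ast.IfStmt, *ast.ForStmt, *ast.RangeStmt:
			count++
		case *ast.CaseClause:
			if node.List != nil { // a nil list is the default branch
				count++
			}
		case *ast.BinaryExpr:
			if node.Op == token.LAND || node.Op == token.LOR {
				count++
			}
		}
		return true
	})
	return count, nil
}

func main() {
	src := `package p

func f(a, b int) int {
	if a > 0 && b > 0 {
		return 1
	}
	for i := 0; i < a; i++ {
		switch {
		case i == b:
			return i
		default:
		}
	}
	return 0
}
`
	n, err := decisionPoints(src)
	fmt.Println(n, err) // 4 <nil>
}
```

The four points in the example are the `if`, the `&&`, the `for`, and the one non-default `case`.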

Contract Analysis (`internal/federation/contracts/`) — v6.3

Cross-repo API contract tracking.

┌─────────────────────────────────────────┐
│        Contract Analysis                 │
│                                         │
│  Contract Types:                        │
│  - proto (.proto files)                 │
│  - openapi (.yaml/.json with openapi)   │
│                                         │
│  Visibility Classification:             │
│  - public: api/, proto/, versioned      │
│  - internal: internal/, testdata/       │
│  - unknown: no clear signals            │
│                                         │
│  Evidence Tiers:                        │
│  ┌────────────────────────────────┐    │
│  │ Tier 1 (declared): buf.yaml,   │    │
│  │   proto imports, *.pb.go       │    │
│  │   Confidence: 1.0              │    │
│  ├────────────────────────────────┤    │
│  │ Tier 2 (derived): type match,  │    │
│  │   package refs                 │    │
│  │   Confidence: 0.7-0.9          │    │
│  ├────────────────────────────────┤    │
│  │ Tier 3 (heuristic): naming     │    │
│  │   patterns (hidden by default) │    │
│  │   Confidence: ≤0.5             │    │
│  └────────────────────────────────┘    │
│                                         │
│  Risk Assessment:                       │
│  - Low: ≤2 consumers, internal         │
│  - Medium: 3-5 consumers               │
│  - High: >5 consumers, public, services│
└─────────────────────────────────────────┘

Runtime Telemetry (`internal/telemetry/`) — v6.4

Observed usage from production runtime.

┌─────────────────────────────────────────┐
│        Telemetry Integration             │
│                                         │
│  Ingest:                                │
│  ┌─────────────────────────────────┐   │
│  │ OTLP (/v1/metrics)              │   │
│  │ JSON (/api/v1/ingest/json)      │   │
│  └─────────────────────────────────┘   │
│             │                           │
│             ▼                           │
│  ┌─────────────────────────────────┐   │
│  │ Symbol Matching                 │   │
│  │                                 │   │
│  │ Exact:  file + func + line     │   │
│  │         → confidence 0.95      │   │
│  │ Strong: file + func            │   │
│  │         → confidence 0.85      │   │
│  │ Weak:   namespace + func       │   │
│  │         → confidence 0.60      │   │
│  └─────────────────────────────────┘   │
│             │                           │
│             ▼                           │
│  ┌─────────────────────────────────┐   │
│  │ Storage (SQLite)                │   │
│  │ - observed_usage table          │   │
│  │ - Weekly/monthly buckets        │   │
│  │ - 365-day retention             │   │
│  └─────────────────────────────────┘   │
│             │                           │
│             ▼                           │
│  ┌─────────────────────────────────┐   │
│  │ Coverage Model                  │   │
│  │                                 │   │
│  │ attribute: % events w/ attrs   │   │
│  │ match: % matched to symbols    │   │
│  │ service: % repos w/ telemetry  │   │
│  │ overall: weighted average      │   │
│  │                                 │   │
│  │ Levels:                        │   │
│  │ high (≥0.8): full features     │   │
│  │ medium (≥0.6): with warnings   │   │
│  │ low (≥0.4): usage only         │   │
│  │ insufficient (<0.4): disabled  │   │
│  └─────────────────────────────────┘   │
└─────────────────────────────────────────┘
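The matching tiers and coverage levels reduce to two threshold functions. In this sketch the `Event` type is hypothetical and match quality is decided only by which attributes an event carries; real matching would also need to look the symbol up. The eligibility check follows the dead code rules: medium or better coverage, and exact or strong matches only.

```go
package main

import "fmt"

// Event is a hypothetical telemetry record with optional location attributes.
type Event struct {
	File, Function, Namespace string
	Line                      int
}

// matchQuality maps the available attributes to a tier and its confidence.
func matchQuality(e Event) (string, float64) {
	switch {
	case e.File != "" && e.Function != "" && e.Line > 0:
		return "exact", 0.95
	case e.File != "" && e.Function != "":
		return "strong", 0.85
	case e.Namespace != "" && e.Function != "":
		return "weak", 0.60
	default:
		return "none", 0
	}
}

// coverageLevel maps the overall coverage score to the documented levels.
func coverageLevel(overall float64) string {
	switch {
	case overall >= 0.8:
		return "high"
	case overall >= 0.6:
		return "medium"
	case overall >= 0.4:
		return "low"
	default:
		return "insufficient"
	}
}

// deadCodeEligible reports whether a match may be used for dead code
// detection under the given overall coverage.
func deadCodeEligible(overall float64, match string) bool {
	lvl := coverageLevel(overall)
	return (lvl == "high" || lvl == "medium") && (match == "exact" || match == "strong")
}

func main() {
	kind, conf := matchQuality(Event{File: "api/handler.go", Function: "Serve"})
	fmt.Println(kind, conf)                   // strong 0.85
	fmt.Println(coverageLevel(0.72))          // medium
	fmt.Println(deadCodeEligible(0.72, kind)) // true
}
```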

Dead Code Detection:

Requires medium+ coverage
Only exact/strong matches
Confidence capped at 0.90
Configurable exclusions (tests, migrations, scheduled jobs)

Impact Enrichment:

Adds observedImpact to analyzeImpact
Shows observed callers not found in static analysis
Comparison: staticOnly vs observedOnly vs both

Hotspot Enhancement:

Adds observedUsage field to hotspots
Usage weight (0.20) in scoring formula
