Lisa edited this page Dec 22, 2025 · 59 revisions

CKB - Code Knowledge Backend

A language-agnostic codebase comprehension layer that orchestrates multiple code intelligence backends (SCIP, LSP, Git) and provides semantically compressed, LLM-optimized views with persistent architectural understanding.

CKB analyzes, indexes, and explains your code but never modifies it. It won't refactor, lint, format, auto-fix, or enforce coding standards. Think of it as a librarian who knows everything about the books but never rewrites them.

What is CKB?

CKB (Code Knowledge Backend) is the missing link between your codebase and AI assistants. While AI coding tools like Claude, Cursor, and GitHub Copilot are powerful, they struggle with large codebases because they lack deep structural understanding of your code.

CKB solves this by providing:

  • A unified query layer that abstracts away the complexity of different code intelligence tools
  • Semantic compression that delivers exactly what an LLM needs without overwhelming its context window
  • Stable symbol tracking that survives refactoring, renames, and code moves
  • Architectural memory that maintains persistent knowledge about your codebase structure, ownership, and design decisions

The Problem CKB Solves

AI Assistants Are Blind to Code Structure

When you ask an AI assistant "what calls this function?", it typically:

  1. Searches for text patterns (error-prone)
  2. Reads random files hoping to find context (inefficient)
  3. Gives up and asks you to provide more context (frustrating)

Existing Tools Don't Talk to Each Other

Your codebase has valuable intelligence scattered across:

  • SCIP indexes - Precise symbol information, but requires setup
  • Language servers - Real-time analysis, but slow for large queries
  • Git - History and blame, but no semantic understanding
  • CODEOWNERS - Ownership rules, but no integration with code intelligence

Each tool speaks a different language. None of them are optimized for AI consumption.

Context Windows Are Limited

Even with 100K+ token context windows, you can't just dump your entire codebase into an LLM. You need:

  • Relevant information only
  • Properly compressed responses
  • Smart truncation with follow-up suggestions

How CKB Helps

For AI-Assisted Development

You: "What's the impact of changing the UserService.authenticate() method?"

CKB provides:
├── Symbol details (signature, visibility, location)
├── 12 direct callers across 4 modules
├── Risk score: HIGH (public API, many dependents)
├── Affected modules: auth, api, admin, tests
├── Code owners: @security-team, @api-team
└── Suggested drilldowns for deeper analysis

For Code Understanding

You: "Show me the architecture of this codebase"

CKB provides:
├── Module dependency graph
├── Key symbols per module
├── Module responsibilities and ownership
├── Import/export relationships
└── Compressed to fit LLM context

For Refactoring Safety

You: "Is it safe to rename this function?"

CKB provides:
├── All references (not just text matches)
├── Cross-module dependencies
├── Test coverage of affected code
├── Hotspot risk assessment
└── Breaking change warnings

For Code Review

You: "Who should review changes to internal/api?"

CKB provides:
├── Primary owners from CODEOWNERS
├── Recent contributors from git blame
├── Related architectural decisions
└── Historical hotspot trends

Key Features

Multi-Backend Orchestration

Query SCIP, LSP, and Git through a single interface. CKB automatically:

  • Routes queries to the best available backend
  • Falls back gracefully when backends are unavailable
  • Merges results from multiple sources
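
The routing and fallback behaviour above can be sketched roughly as follows. This is an illustration only, not CKB's implementation: the `Backend` signature, the `query_with_fallback` helper, and the three stand-in backends are all hypothetical names invented for the example.

```python
from typing import Callable, Optional

# Hypothetical backend signature: takes a symbol name and returns a list of
# "file:line" locations, or None when the backend is unavailable.
Backend = Callable[[str], Optional[list]]


def query_with_fallback(symbol: str, backends: list) -> dict:
    """Try backends in priority order; merge and de-duplicate the answers."""
    merged = []
    used = []
    for name, backend in backends:
        try:
            hits = backend(symbol)
        except Exception:
            hits = None  # treat a crashing backend like an unavailable one
        if hits is None:
            continue  # fall back to the next backend
        used.append(name)
        for hit in hits:
            if hit not in merged:
                merged.append(hit)
    return {"locations": merged, "backends": used}


# Stand-in backends (assumptions for the demo, not real integrations).
def scip(symbol):
    return None  # pretend no SCIP index exists yet


def lsp(symbol):
    return ["api/handler.go:42", "auth/service.go:10"]


def git_grep(symbol):
    return ["auth/service.go:10", "docs/auth.md:3"]


result = query_with_fallback(
    "authenticate", [("scip", scip), ("lsp", lsp), ("git", git_grep)]
)
print(result)
```

Here the missing SCIP index is skipped silently, and the duplicate location reported by both remaining backends appears once in the merged result.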

Stable Symbol Identity

Symbols get permanent IDs that survive:

  • Renames (oldName → newName)
  • Moves (pkg/old/ → pkg/new/)
  • Refactoring (extract method, inline, etc.)

Old references automatically redirect to current locations.
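
One way to picture the redirect behaviour is an alias table that maps every name a symbol has ever had to a single stable ID. The sketch below assumes exactly that design; `SymbolRegistry`, the `sym-N` ID format, and the method names are hypothetical and do not describe CKB's actual storage.

```python
class SymbolRegistry:
    """Maps stable IDs to current names, and every past name to a stable ID."""

    def __init__(self):
        self._current = {}  # stable_id -> current fully qualified name
        self._aliases = {}  # any name ever used -> stable_id
        self._next = 1

    def register(self, name):
        stable_id = f"sym-{self._next}"  # hypothetical ID format
        self._next += 1
        self._current[stable_id] = name
        self._aliases[name] = stable_id
        return stable_id

    def rename(self, old_name, new_name):
        stable_id = self._aliases[old_name]
        self._current[stable_id] = new_name
        # The old alias is kept on purpose so old references still resolve.
        self._aliases[new_name] = stable_id

    def resolve(self, name):
        return self._current[self._aliases[name]]


registry = SymbolRegistry()
sid = registry.register("pkg/old.oldName")
registry.rename("pkg/old.oldName", "pkg/old.newName")  # rename
registry.rename("pkg/old.newName", "pkg/new.newName")  # move
print(sid, registry.resolve("pkg/old.oldName"))
```

A reference recorded under the original name still resolves to the symbol's current location after both the rename and the move.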

Smart Compression

Responses are optimized for LLM consumption:

  • Configurable token budgets
  • Intelligent truncation (most relevant first)
  • Drilldown suggestions for deeper exploration
  • Deterministic output for reliable caching
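
A minimal sketch of budgeted truncation, assuming relevance is a numeric score and tokens are approximated by word count. The `compress` function, the item fields, and the `expand:` drilldown format are hypothetical, chosen only to illustrate the four properties listed above.

```python
def compress(items, budget_tokens):
    """Keep the most relevant items that fit the budget; list the rest as drilldowns."""
    # Sort by score (descending), then name, so equal inputs give equal output.
    ranked = sorted(items, key=lambda it: (-it["score"], it["name"]))
    kept, used = [], 0
    cut = len(ranked)
    for i, item in enumerate(ranked):
        cost = len(item["text"].split())  # crude token estimate: word count
        if used + cost > budget_tokens:
            cut = i  # truncate here; everything after becomes a drilldown
            break
        kept.append(item["name"])
        used += cost
    drilldowns = [f"expand:{it['name']}" for it in ranked[cut:]]
    return {"kept": kept, "tokens": used, "drilldowns": drilldowns}


callers = [
    {"name": "tests.AuthTest", "score": 0.1, "text": "unit test"},
    {"name": "api.Login", "score": 0.9,
     "text": "handles POST /login and calls authenticate"},
    {"name": "admin.Reset", "score": 0.4,
     "text": "resets password after authenticate succeeds"},
]
summary = compress(callers, budget_tokens=9)
print(summary)
```

Because the ordering depends only on the input values, the same query yields byte-identical output, which is what makes the response safe to cache.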

Impact Analysis

Understand the blast radius before making changes:

  • Visibility detection (public/private/internal)
  • Risk scoring based on usage patterns
  • Module-level impact summaries
  • Breaking change detection

Three-Tier Caching

Fast responses through intelligent caching:

  • Query cache - Recent query results
  • View cache - Expensive computations
  • Negative cache - Avoid repeated failures

All caches invalidate automatically when code changes.
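
The interaction between the tiers can be sketched as below. The tier names come from the list above; everything else (the `TieredCache` class, keying invalidation on a revision string, using `LookupError` to signal a failed query) is an assumption made for the example.

```python
class TieredCache:
    def __init__(self):
        self.revision = None
        self.tiers = {"query": {}, "view": {}}
        self.negative = set()  # (tier, key) pairs known to fail

    def sync(self, revision):
        """Drop every tier when the code revision changes."""
        if revision != self.revision:
            self.revision = revision
            for store in self.tiers.values():
                store.clear()
            self.negative.clear()

    def get(self, tier, key, compute):
        if (tier, key) in self.negative:
            return None  # known failure: do not retry
        store = self.tiers[tier]
        if key not in store:
            try:
                store[key] = compute()
            except LookupError:
                self.negative.add((tier, key))
                return None
        return store[key]


calls = []


def find_refs():
    calls.append("refs")
    return ["a.go:1"]


def missing_symbol():
    calls.append("missing")
    raise LookupError("no such symbol")


cache = TieredCache()
cache.sync("commit-1")
cache.get("query", "refs:foo", find_refs)
cache.get("query", "refs:foo", find_refs)       # served from the query cache
cache.get("query", "refs:bar", missing_symbol)
cache.get("query", "refs:bar", missing_symbol)  # served from the negative cache
calls_before_change = list(calls)
cache.sync("commit-2")                           # code changed: invalidate all
cache.get("query", "refs:foo", find_refs)        # recomputed
print(calls_before_change, calls)
```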

Architectural Memory

Persistent knowledge that survives across sessions:

  • Module registry - Boundaries, responsibilities, and tags from MODULES.toml or inference
  • Ownership tracking - CODEOWNERS integration + git-blame analysis with time decay
  • Hotspot trends - Historical risk tracking with trend analysis and 30-day projections
  • Decision log - Architectural Decision Records (ADRs) with full-text search

Background Operations (v6.1)

Long-running operations run asynchronously:

  • Job queue - SQLite-backed job persistence in ~/.ckb/jobs.db
  • Async refresh - refreshArchitecture with async: true returns immediately with jobId
  • Progress tracking - Poll getJobStatus for progress and results
  • Job management - List, filter, and cancel running jobs
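
A client-side polling loop for an async job might look like the sketch below. The response shape is not specified on this page, so the `state` and `progress` fields, the terminal state names, and the `wait_for_job` helper are all assumptions; `fake_status` stands in for a real `getJobStatus` call.

```python
import time


def wait_for_job(get_job_status, job_id, interval=0.0, max_polls=50):
    """Poll until the job reaches a terminal state or the poll limit is hit."""
    for _ in range(max_polls):
        status = get_job_status(job_id)
        # Assumed terminal states; the real names may differ.
        if status["state"] in ("completed", "failed", "cancelled"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


# Stand-in for the real tool call: replays a canned sequence of responses.
_responses = iter([
    {"state": "running", "progress": 40},
    {"state": "running", "progress": 80},
    {"state": "completed", "progress": 100},
])


def fake_status(job_id):
    return next(_responses)


final = wait_for_job(fake_status, "job-123")
print(final)
```

In real use a non-zero `interval` avoids hammering the server between polls.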

CI/CD Integration (v6.1)

Built for automated pipelines:

  • PR analysis - summarizePr assesses risk, suggests reviewers, identifies affected modules
  • Ownership drift - getOwnershipDrift compares CODEOWNERS vs actual contributors
  • GitHub Actions - Example workflows in examples/github-actions/

Federation (v6.2)

Cross-repository queries and unified visibility:

  • Multi-repo collections - Group related repositories into named federations
  • Cross-repo search - Search modules, ownership, hotspots, and decisions across repos
  • Stable identity - UUID-based repo identity survives renames
  • Staleness propagation - Federation freshness reflects weakest link

Daemon Mode (v6.2.1)

Always-on service for continuous code intelligence:

  • Background daemon - Long-running process with HTTP API on port 9120
  • Job queue - Async operations with progress tracking and cancellation
  • Scheduler - Cron and interval expressions for automated refresh
  • File watcher - Git change detection with debounced refresh
  • Webhooks - Outbound notifications to Slack, PagerDuty, Discord with retry logic

Tree-sitter Complexity (v6.2.2)

Language-agnostic complexity metrics via tree-sitter:

  • Multi-language support - Go, JavaScript, TypeScript, Python, Rust, Java, Kotlin
  • Cyclomatic complexity - Decision points analysis (if, for, while, switch, &&, ||)
  • Cognitive complexity - Nesting-weighted complexity for maintainability assessment
  • Hotspot integration - Complexity metrics feed into hotspot risk scores
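
To make the cyclomatic metric concrete, the sketch below counts decision points in Python source. It uses Python's built-in `ast` module rather than tree-sitter, so it illustrates the counting rule only and not CKB's parser; `switch` has no equivalent here and is omitted.

```python
import ast

# Branching constructs that each add one path through the function.
DECISION_NODES = (ast.If, ast.For, ast.While)


def cyclomatic_complexity(source):
    """1 + number of decision points (if, for, while, and/or operands)."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths: one per additional operand.
            complexity += len(node.values) - 1
    return complexity


sample = '''
def check(user, roles):
    if user is None or not roles:
        return False
    for role in roles:
        if role == "admin" and user.active:
            return True
    return False
'''
score = cyclomatic_complexity(sample)
print(score)
```

For `sample` the count is 1 (base) + 2 (`if`) + 1 (`for`) + 1 (`or`) + 1 (`and`) = 6.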

Contract-Aware Impact Analysis (v6.3)

Cross-repo intelligence through explicit API boundaries:

  • Contract detection - Automatic discovery of protobuf (.proto) and OpenAPI specs
  • Visibility classification - Public, internal, or unknown based on paths and metadata
  • Consumer detection - Three evidence tiers (declared, derived, heuristic)
  • Impact analysis - "What breaks if I change this shared API?"
  • Risk assessment - Low/medium/high risk with detailed factors
  • Transitive analysis - Follow proto import graphs across repos

Runtime Telemetry (v6.4)

Observed reality through OpenTelemetry integration:

  • OTLP ingest - Accept metrics from OpenTelemetry Collector
  • Symbol matching - Map telemetry to code symbols (exact, strong, weak quality levels)
  • Coverage tracking - Know how much of your code is observed
  • Usage display - See actual call counts for any symbol
  • Dead code detection - Find symbols with zero runtime calls
  • Blended confidence - Combine static analysis with observed reality
  • Impact enrichment - Add observed callers to impact analysis

Developer Intelligence (v6.5)

Go beyond what code does to understand why it exists:

  • Symbol origin - Who wrote it, when, why, and what issues/PRs are linked
  • Evolution timeline - How has this code changed over time
  • Co-change coupling - Find files that historically change together
  • Proactive warnings - Detect temporary code, single-author risk, high coupling, staleness
  • LLM export - Token-efficient codebase summaries with importance ranking
  • Risk audit - 8-factor risk scoring (complexity, coverage, bus factor, security, staleness, errors, coupling, churn)
  • Quick wins - Find high-impact, low-effort refactoring targets
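
Co-change coupling can be approximated by counting how often two files appear in the same commit. The sketch below assumes commits are already available as lists of changed paths; the `co_change` function, the `min_count` threshold, and the strength formula are illustrative choices, not CKB's actual scoring.

```python
from collections import Counter
from itertools import combinations


def co_change(commits, min_count=2):
    """Return (file_a, file_b, shared_commits, strength), strongest first."""
    pair_counts = Counter()
    file_counts = Counter()
    for files in commits:
        unique = sorted(set(files))
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    result = []
    for (a, b), shared in pair_counts.items():
        if shared >= min_count:
            # Strength: share of the less-changed file's commits that touch both.
            strength = shared / min(file_counts[a], file_counts[b])
            result.append((a, b, shared, strength))
    return sorted(result, key=lambda r: (-r[3], r[0], r[1]))


history = [
    ["auth.go", "auth_test.go"],
    ["auth.go", "auth_test.go", "api.go"],
    ["api.go"],
    ["auth.go", "api.go"],
]
pairs = co_change(history)
for pair in pairs:
    print(pair)
```

In this sample, `auth_test.go` never changes without `auth.go`, so that pair ranks first with a strength of 1.0.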

Zero-Friction UX (v7.0)

Get started in seconds without building from source:

  • npm distribution - npm install -g @tastehub/ckb or npx @tastehub/ckb
  • 58 MCP tools - Full code intelligence via Model Context Protocol

Zero-Friction Operation (v7.1)

Code intelligence without requiring a SCIP index upfront:

  • Tree-sitter fallback - Symbol extraction for 8 languages without SCIP
  • ckb index command - Auto-detects language and runs the right indexer
  • Universal MCP docs - Setup instructions for all major AI tools

Multi-Tool Setup & Smart Indexing (v7.2)

  • ckb setup - Interactive wizard for Claude Code, Cursor, Windsurf, VS Code, OpenCode, Claude Desktop
  • Extended languages - Added C/C++, Dart, Ruby, C#, PHP indexer support
  • Smart indexing - Skip-if-fresh, freshness tracking, concurrent lock protection
  • ckb mcp --watch - Auto-reindex mode with 30-second polling
  • Explicit tiers - Control analysis depth with --tier=fast|standard|full
  • ckb doctor --tier - Check tool requirements for each analysis tier

Doc-Symbol Linking (v7.3)

Bridge documentation and code with automatic symbol detection:

  • Backtick detection - Automatically detect Symbol.Name references in markdown
  • Directive support - Explicit <!-- ckb:symbol --> and <!-- ckb:module --> directives
  • Fence scanning - Extract symbols from fenced code blocks (8 languages via tree-sitter)
  • Staleness detection - Find broken references when symbols are renamed or deleted
  • Rename awareness - Suggest new names when documented symbols are renamed
  • CI enforcement - --fail-under flag for documentation coverage thresholds
  • known_symbols - Allow single-segment symbol detection via directive
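
Backtick detection and staleness checking can be illustrated with a regular expression. The pattern below assumes a reference is a backtick span containing at least two dot-separated identifiers, which matches the note above that single-segment names need an explicit directive; the function names and the `indexed` set are hypothetical.

```python
import re

# Assumption: `Type.Member` style references with two or more segments.
SYMBOL_REF = re.compile(r"`([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+)`")


def find_symbol_refs(markdown):
    """Return (line_number, symbol_name) for each backtick reference."""
    return [
        (number, match.group(1))
        for number, line in enumerate(markdown.splitlines(), 1)
        for match in SYMBOL_REF.finditer(line)
    ]


def stale_refs(markdown, indexed):
    """References whose symbol is no longer present in the index."""
    return [(n, name) for n, name in find_symbol_refs(markdown) if name not in indexed]


doc = "Call `UserService.Authenticate` first.\nThen `Session.Start` and `retry`.\n"
refs = find_symbol_refs(doc)
stale = stale_refs(doc, indexed={"UserService.Authenticate"})
print(refs)
print(stale)
```

The single-segment `retry` is ignored, and `Session.Start` is reported as stale because it is missing from the index.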

Remote Index Serving (v7.3)

Serve symbol indexes over HTTP for remote federation clients:

  • Index Server Mode - ckb serve --index-server enables remote index endpoints
  • Multi-Repo Support - Serve multiple repositories from a single CKB instance
  • REST API - 10 endpoints for repos, symbols, refs, callgraph, and search
  • HMAC-Signed Cursors - Secure pagination with tamper-proof cursors
  • Privacy Redaction - Per-repo controls for exposing paths, docs, and signatures
  • TOML Configuration - Configure repos, privacy settings, and pagination limits
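
The idea behind a tamper-proof pagination cursor is to attach an HMAC to the cursor state so the server can reject any token it did not issue. The following is a generic sketch using Python's standard `hmac` module; the token layout, the `SECRET` value, and the state fields are assumptions, not CKB's wire format.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"example-secret"  # assumption: a real server loads this from config


def encode_cursor(state):
    """Serialize the state and prefix it with a SHA-256 HMAC signature."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":")).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(signature + payload).decode()


def decode_cursor(token):
    """Verify the signature before trusting any part of the payload."""
    raw = base64.urlsafe_b64decode(token)
    signature, payload = raw[:32], raw[32:]  # SHA-256 digests are 32 bytes
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("cursor signature mismatch")
    return json.loads(payload)


token = encode_cursor({"offset": 100, "repo": "api"})
state = decode_cursor(token)
print(state)
```

A client that edits the offset inside the token changes the payload without being able to recompute the signature, so `decode_cursor` raises `ValueError`.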

Ownership Intelligence

Know who owns what code:

  • Parse CODEOWNERS files automatically
  • Compute ownership from git blame with time decay
  • Track ownership changes over time
  • Suggest reviewers for pull requests
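
Time-decayed ownership can be sketched with an exponential half-life: a line written long ago counts for less than a line written today. The 180-day half-life and the input format below are assumptions chosen for the example; the page does not state the decay function CKB uses.

```python
from collections import defaultdict


def ownership(blame_lines, half_life_days=180.0):
    """blame_lines: iterable of (author, age_in_days), one entry per line.

    Returns each author's share of the decayed line weight.
    """
    weights = defaultdict(float)
    for author, age_days in blame_lines:
        # A line loses half its weight every `half_life_days`.
        weights[author] += 0.5 ** (age_days / half_life_days)
    total = sum(weights.values())
    return {author: weight / total for author, weight in weights.items()}


blame = [("alice", 0), ("bob", 180), ("bob", 180)]
shares = ownership(blame)
print(shares)
```

Bob wrote twice as many lines, but they are one half-life old, so both authors end up with an equal share.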

Hotspot Detection

Identify volatile areas before they become problems:

  • Track churn metrics over time
  • Compute composite risk scores (churn + coupling + complexity)
  • Detect trends (increasing/stable/decreasing)
  • Project future hotspot scores
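
The pieces above (composite score, trend, projection) can be sketched with a weighted sum and a least-squares slope. The weights, the tolerance, and the linear projection are assumptions for illustration; the page names the inputs but not the formula.

```python
def composite_score(churn, coupling, complexity, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of normalized (0..1) metrics; weights are illustrative."""
    return weights[0] * churn + weights[1] * coupling + weights[2] * complexity


def slope(history):
    """Least-squares slope over evenly spaced snapshots (needs 2+ points)."""
    n = len(history)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def classify(history, tolerance=0.01):
    s = slope(history)
    if s > tolerance:
        return "increasing"
    if s < -tolerance:
        return "decreasing"
    return "stable"


def project(history, steps):
    """Extend the fitted trend `steps` snapshots past the last observation."""
    return history[-1] + slope(history) * steps


scores = [0.2, 0.3, 0.4, 0.5]  # one composite score per snapshot
print(classify(scores), round(project(scores, 3), 3))
```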

Use Cases

| Use Case | Without CKB | With CKB |
|----------|-------------|----------|
| Find all callers | Grep + manual filtering | Precise semantic results |
| Understand function | Read surrounding files | Structured summary with context |
| Safe refactoring | Hope for the best | Impact analysis + risk score |
| Code review | Check changed files only | See downstream effects + owners |
| Onboarding | Read docs + explore | Query architecture instantly |
| Find code owner | Search CODEOWNERS manually | Query ownership for any path |
| Track tech debt | Gut feeling | Hotspot trends with data |

Who Should Use CKB?

  • Developers using AI assistants - Give your AI tools superpowers
  • Teams with large codebases - Navigate complexity efficiently
  • Anyone doing refactoring - Understand impact before changing
  • Code reviewers - See the full picture of changes
  • Tech leads - Track architectural health over time

Table of Contents

  • Quick Start - Step-by-step installation for Windows, macOS, and Linux
  • Prompt Cookbook - Real prompts for real problems (start here if you're new!)
  • Language Support - Which languages work best, SCIP indexers, and support tiers
  • Practical Limits - Accuracy notes, blind spots, and how to validate results
  • User Guide - Getting started, CLI commands, best practices
  • Incremental Indexing - Fast index updates for Go projects, accuracy guarantees (v7.3)
  • Doc-Symbol Linking - Automatic symbol detection in documentation, staleness checking (v7.3)
  • Authentication - API tokens, scopes, rate limiting for index server (v7.3)
  • Federation - Cross-repository queries, contract analysis, and unified visibility (v6.3)
  • Telemetry - Runtime observability, dead code detection, observed usage (v6.4)
  • CI/CD Integration - GitHub Actions workflows, PR analysis, automated refresh
  • API Reference - HTTP API documentation
  • Daemon Mode - Always-on service with scheduler, watcher, and webhooks (v6.2.1)
  • MCP Integration - Claude Desktop / AI assistant setup (71 tools available)
  • Architecture - System design and components
  • Configuration - All configuration options, MODULES.toml format, and ADR workflow
  • Performance - Latency targets and benchmarks
  • Contributing - Development guidelines

Installation

npm (Recommended)

# Install globally
npm install -g @tastehub/ckb

# Or run directly without installing
npx @tastehub/ckb --help

Build from Source

git clone https://github.com/SimplyLiz/CodeMCP.git
cd CodeMCP
go build -o ckb ./cmd/ckb

New to CKB? See the Quick Start guide for detailed instructions.

Quick Start

# Initialize in your project
cd /path/to/your/project
ckb init   # or: npx @tastehub/ckb init

# Generate SCIP index (auto-detects language)
ckb index

# Check status
ckb status

# Configure Claude Code
ckb setup

# Search for symbols
ckb search "myFunction"

# Find references
ckb refs "symbol-id"

# Analyze impact
ckb impact "symbol-id"

# Query ownership
ckb ownership internal/api/handler.go

# View architectural decisions
ckb decisions

# Start MCP server for AI assistants
ckb mcp

MCP Tools (74 Available)

CKB exposes code intelligence through the Model Context Protocol:

v5.1 — Core Navigation

| Tool | Purpose |
|------|---------|
| searchSymbols | Find symbols by name with filtering |
| getSymbol | Get symbol details |
| findReferences | Find all usages |
| explainSymbol | AI-friendly symbol explanation |
| justifySymbol | Keep/investigate/remove verdict |
| getCallGraph | Caller/callee relationships |
| getModuleOverview | Module statistics |
| analyzeImpact | Change risk analysis |
| getStatus | System health |
| doctor | Diagnostics |

v5.2 — Discovery & Flow

| Tool | Purpose |
|------|---------|
| traceUsage | How is this symbol reached? |
| listEntrypoints | System entrypoints (API, CLI, jobs) |
| explainFile | File-level orientation |
| explainPath | Why does this path exist? |
| summarizeDiff | What changed, what might break? |
| getArchitecture | Module dependency overview |
| getHotspots | Volatile areas with trends |
| listKeyConcepts | Domain concepts in codebase |
| recentlyRelevant | What matters now? |

v6.0 — Architectural Memory

| Tool | Purpose |
|------|---------|
| getOwnership | Who owns this code? |
| getModuleResponsibilities | What does this module do? |
| recordDecision | Create an ADR |
| getDecisions | Query architectural decisions |
| annotateModule | Add module metadata |
| refreshArchitecture | Rebuild architectural model |

v6.1 — Production Ready

| Tool | Purpose |
|------|---------|
| getJobStatus | Query background job status |
| listJobs | List jobs with filters |
| cancelJob | Cancel queued/running job |
| summarizePr | PR risk analysis & reviewers |
| getOwnershipDrift | CODEOWNERS vs actual ownership |

v6.2 — Federation

| Tool | Purpose |
|------|---------|
| listFederations | List all federations |
| federationStatus | Get federation status |
| federationRepos | List repos in federation |
| federationSearchModules | Cross-repo module search |
| federationSearchOwnership | Cross-repo ownership search |
| federationGetHotspots | Merged hotspots across repos |
| federationSearchDecisions | Cross-repo decision search |
| federationSync | Sync federation index |

v6.2.1 — Daemon Mode

| Tool | Purpose |
|------|---------|
| daemonStatus | Daemon health and stats |
| listSchedules | List scheduled tasks |
| runSchedule | Run a schedule immediately |
| listWebhooks | List configured webhooks |
| testWebhook | Send test webhook |
| webhookDeliveries | Get delivery history |

v6.2.2 — Tree-sitter Complexity

| Tool | Purpose |
|------|---------|
| getFileComplexity | Cyclomatic/cognitive complexity metrics |

v6.3 — Contract-Aware Impact Analysis

| Tool | Purpose |
|------|---------|
| listContracts | List contracts in federation |
| analyzeContractImpact | Analyze impact of contract changes |
| getContractDependencies | Get contract deps for a repo |
| suppressContractEdge | Suppress false positive edge |
| verifyContractEdge | Verify an edge |
| getContractStats | Contract statistics |

v6.4 — Runtime Telemetry

| Tool | Purpose |
|------|---------|
| getTelemetryStatus | Coverage metrics and sync status |
| getObservedUsage | Observed usage data for a symbol |
| findDeadCodeCandidates | Find symbols with zero runtime calls |

v6.5 — Developer Intelligence

| Tool | Purpose |
|------|---------|
| explainOrigin | Why does this code exist? (origin, evolution, warnings) |
| analyzeCoupling | Find files/symbols that change together |
| exportForLLM | LLM-friendly codebase export with importance ranking |
| auditRisk | Multi-signal risk audit (8 weighted factors) |

v7.0 — Zero-Friction UX

| Feature | Description |
|---------|-------------|
| npm distribution | npm install -g @tastehub/ckb or npx @tastehub/ckb |
| ckb setup | Auto-configure Claude Code integration |
| ckb index | Auto-detect language and run SCIP indexer |
| Analysis tiers | Works without SCIP (basic), better with it (enhanced) |

v7.3 — Doc-Symbol Linking & Production Hardening

| Feature | Description |
|---------|-------------|
| Doc-Symbol Linking | Bridge documentation and code with automatic symbol detection |
| indexDocs | Scan and index documentation |
| getDocsForSymbol | Find docs referencing a symbol |
| getSymbolsInDoc | List symbols in a document |
| getDocsForModule | Find docs linked to a module |
| checkDocStaleness | Check for stale references |
| getDocCoverage | Documentation coverage stats |

v7.3 — Multi-Repo Management

| Tool | Purpose |
|------|---------|
| listRepos | List registered repos with state and active status |
| switchRepo | Switch active repo context for MCP session |
| getActiveRepo | Get information about currently active repo |

| Feature | Description |
|---------|-------------|
| Repo Registry | Global ~/.ckb/repos.json for named repo shortcuts |
| ckb repo add | Register a repository by name |
| ckb repo list | List repos grouped by state |
| ckb mcp --repo | Start MCP with specific repo active |
| Multi-Engine | Up to 5 engines in memory with LRU eviction |
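
LRU eviction of in-memory engines can be sketched with an ordered dictionary: each access moves a repo to the "most recent" end, and loading a new repo at capacity drops the least recently used one. `EngineCache`, its `load` callback, and the repo names are hypothetical; only the capacity of 5 and the LRU policy come from the table above.

```python
from collections import OrderedDict


class EngineCache:
    def __init__(self, load, capacity=5):
        self._load = load              # callback that builds an engine for a repo
        self._capacity = capacity
        self._engines = OrderedDict()  # oldest access first, newest last

    def get(self, repo):
        if repo in self._engines:
            self._engines.move_to_end(repo)         # mark as most recently used
        else:
            if len(self._engines) >= self._capacity:
                self._engines.popitem(last=False)   # evict least recently used
            self._engines[repo] = self._load(repo)
        return self._engines[repo]

    def loaded(self):
        return list(self._engines)


# Capacity of 2 keeps the demo short.
cache = EngineCache(load=lambda repo: f"engine for {repo}", capacity=2)
cache.get("api")
cache.get("web")
cache.get("api")      # "api" becomes most recent, so "web" is now the oldest
cache.get("billing")  # evicts "web"
print(cache.loaded())
```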

Incremental Indexing v4 (Production-grade):

| Feature | Description |
|---------|-------------|
| Delta Artifacts | CI-generated diffs for O(delta) ingestion instead of O(N) |
| FTS5 Search | SQLite FTS5 for instant search (replaces LIKE scans) |
| Compaction Scheduler | Automatic snapshot cleanup and database maintenance |
| Prometheus Metrics | /metrics endpoint for monitoring |
| Load Shedding | Graceful degradation under load with priority endpoints |

Language Quality (v7.3):

| Feature | Description |
|---------|-------------|
| Language Tiers | 4-tier classification based on indexer maturity |
| Quality Assessment | Per-language metrics (ref accuracy, callgraph quality) |
| /meta/languages | Language quality dashboard endpoint |
| /meta/python-env | Python venv detection with recommendations |
| /meta/typescript-monorepo | TypeScript monorepo detection (pnpm, lerna, nx) |

Interfaces

CKB provides three ways to interact:

| Interface | Best For |
|-----------|----------|
| CLI | Quick queries, scripting, CI/CD |
| HTTP API | Web integrations, custom tools |
| MCP Server | Claude Desktop, AI assistants |

License

Free for personal use. Commercial/enterprise use requires a license. See LICENSE for details.
