
Configuration

Lisa edited this page Dec 18, 2025 · 21 revisions

CKB Configuration Guide

Overview

CKB configuration is stored in .ckb/config.json in your repository root. This file is created when you run ckb init.

Note: The v6.0 Architectural Memory features (ownership, decisions, hotspots) are implemented, but their settings are not yet configurable and use built-in defaults. Configurable settings for these features are planned for a future release. See v6.0 Configuration Roadmap below.

Configuration File

Location

your-repo/
└── .ckb/
    ├── config.json    # Configuration
    └── ckb.db         # SQLite database

Full Schema (Current Implementation)

{
  "version": 5,
  "repoRoot": ".",
  "backends": {
    "scip": {
      "enabled": true,
      "indexPath": ".scip/index.scip"
    },
    "lsp": {
      "enabled": true,
      "workspaceStrategy": "repo-root",
      "servers": {}
    },
    "git": {
      "enabled": true
    }
  },
  "queryPolicy": {
    "backendPreferenceOrder": ["scip", "glean", "lsp"],
    "alwaysUse": ["git"],
    "maxInFlightPerBackend": { "scip": 10, "lsp": 3, "git": 5 },
    "coalesceWindowMs": 50,
    "mergeMode": "prefer-first",
    "supplementThreshold": 0.8,
    "timeoutMs": { "scip": 5000, "lsp": 15000, "git": 5000 }
  },
  "lspSupervisor": {
    "maxTotalProcesses": 4,
    "queueSizePerLanguage": 10,
    "maxQueueWaitMs": 200
  },
  "modules": {
    "detection": "auto",
    "roots": [],
    "ignore": ["node_modules", "build", ".dart_tool", "vendor"]
  },
  "importScan": {
    "enabled": true,
    "maxFileSizeBytes": 1000000,
    "scanTimeoutMs": 30000,
    "skipBinary": true,
    "customPatterns": {}
  },
  "cache": {
    "queryTtlSeconds": 300,
    "viewTtlSeconds": 3600,
    "negativeTtlSeconds": 60
  },
  "budget": {
    "maxModules": 10,
    "maxSymbolsPerModule": 5,
    "maxImpactItems": 20,
    "maxDrilldowns": 5,
    "estimatedMaxTokens": 4000
  },
  "backendLimits": {
    "maxRefsPerQuery": 10000,
    "maxFilesScanned": 5000,
    "maxUnionModeTimeMs": 60000
  },
  "privacy": {
    "mode": "normal"
  },
  "logging": {
    "level": "info",
    "format": "human"
  }
}

Configuration Sections

version

Schema version number. Current version is 5.

{
  "version": 5
}

repoRoot

Root directory for the repository. Usually ".".

{
  "repoRoot": "."
}

backends

Configure which backends are enabled and their settings.

backends.scip

SCIP (SCIP Code Intelligence Protocol) backend.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | bool | true | Enable SCIP backend |
| indexPath | string | ".scip/index.scip" | Path to SCIP index file |

{
  "backends": {
    "scip": {
      "enabled": true,
      "indexPath": ".scip/index.scip"
    }
  }
}

backends.lsp

Language Server Protocol backend.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | bool | true | Enable LSP backend |
| workspaceStrategy | string | "repo-root" | How to initialize workspace |
| servers | object | {...} | Language-specific server configs |

Default servers configured:

  • go: gopls
  • typescript: typescript-language-server
  • dart: dart language-server
  • python: pylsp

{
  "backends": {
    "lsp": {
      "enabled": true,
      "workspaceStrategy": "repo-root",
      "servers": {
        "go": {
          "command": "gopls",
          "args": []
        },
        "typescript": {
          "command": "typescript-language-server",
          "args": ["--stdio"]
        },
        "dart": {
          "command": "dart",
          "args": ["language-server"]
        },
        "python": {
          "command": "pylsp",
          "args": []
        }
      }
    }
  }
}

backends.git

Git backend for blame, history, and fallback operations.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | bool | true | Enable Git backend |

queryPolicy

Controls how queries are routed to backends and how results are merged.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| backendPreferenceOrder | string[] | ["scip", "glean", "lsp"] | Backend priority order |
| alwaysUse | string[] | ["git"] | Backends to always query |
| maxInFlightPerBackend | object | {...} | Max concurrent queries per backend |
| coalesceWindowMs | int | 50 | Window for coalescing similar queries (ms) |
| mergeMode | string | "prefer-first" | How to merge results |
| supplementThreshold | float | 0.8 | When to supplement with additional backends |
| timeoutMs | object | {...} | Timeout per backend in milliseconds |

Merge Modes:

  • prefer-first: Use first successful backend response
  • union: Merge all backend responses, deduplicate

{
  "queryPolicy": {
    "backendPreferenceOrder": ["scip", "lsp"],
    "alwaysUse": ["git"],
    "maxInFlightPerBackend": {
      "scip": 10,
      "lsp": 3,
      "git": 5
    },
    "coalesceWindowMs": 50,
    "mergeMode": "prefer-first",
    "supplementThreshold": 0.8,
    "timeoutMs": {
      "scip": 5000,
      "lsp": 15000,
      "git": 5000
    }
  }
}
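
The two merge modes can be illustrated with a minimal Python sketch. It is not CKB's implementation: the backend callables, the convention that a failed backend returns None, and the function name run_query are assumptions made for illustration.

```python
from typing import Callable, Optional

# Hypothetical backend signature: returns a list of results, or None on failure.
Backend = Callable[[str], Optional[list]]


def run_query(query: str, backends: dict, order: list, merge_mode: str = "prefer-first") -> list:
    """Route a query through backends in preference order and merge the results."""
    if merge_mode == "prefer-first":
        # Use the first backend that answers successfully; skip failures.
        for name in order:
            result = backends[name](query)
            if result is not None:
                return result
        return []
    if merge_mode == "union":
        # Collect every backend's results, dropping duplicates but keeping order.
        merged, seen = [], set()
        for name in order:
            for item in backends[name](query) or []:
                if item not in seen:
                    seen.add(item)
                    merged.append(item)
        return merged
    raise ValueError(f"unknown mergeMode: {merge_mode}")
```

With prefer-first, later backends are only consulted when earlier ones fail, which is why it is the cheaper mode; union trades extra backend calls for more complete results.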

lspSupervisor

Controls LSP server lifecycle and resource management.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| maxTotalProcesses | int | 4 | Max LSP processes across all languages |
| queueSizePerLanguage | int | 10 | Max queued requests per language |
| maxQueueWaitMs | int | 200 | Max time to wait in queue (ms) |

{
  "lspSupervisor": {
    "maxTotalProcesses": 4,
    "queueSizePerLanguage": 10,
    "maxQueueWaitMs": 200
  }
}

modules

Module detection settings.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| detection | string | "auto" | Detection strategy |
| roots | string[] | [] | Additional module roots to include |
| ignore | string[] | [...] | Directories to ignore |

Detection Strategies:

  • auto: Detect based on language markers (go.mod, package.json, etc.)
  • manual: Only use specified roots
  • directory: Treat each top-level directory as a module

Default ignore patterns:

  • node_modules
  • build
  • .dart_tool
  • vendor

{
  "modules": {
    "detection": "auto",
    "roots": ["internal/legacy"],
    "ignore": ["node_modules", "build", ".dart_tool", "vendor", "dist"]
  }
}
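
As a rough illustration of how the three detection strategies differ, the sketch below works on a list of repo-relative file paths. The marker set (only go.mod and package.json are named above), the function name detect_modules, and the choice to always include roots are assumptions, not CKB's actual logic.

```python
from pathlib import PurePosixPath

# Only these two markers are named in this guide; the real list is longer ("etc.").
MARKERS = ("go.mod", "package.json")


def detect_modules(files, detection="auto", roots=(), ignore=()):
    """Return sorted module roots for a list of repo-relative file paths."""
    if detection not in ("auto", "manual", "directory"):
        raise ValueError(f"unknown detection strategy: {detection}")
    found = set(roots)  # explicitly configured roots are always included (assumption)
    if detection == "manual":
        return sorted(found)
    for name in files:
        path = PurePosixPath(name)
        # Skip anything located under an ignored directory.
        if any(part in ignore for part in path.parts[:-1]):
            continue
        if detection == "auto" and path.name in MARKERS:
            found.add(path.parent.as_posix())  # "." means the repository root
        elif detection == "directory" and len(path.parts) > 1:
            found.add(path.parts[0])  # each top-level directory is a module
    return sorted(found)
```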

importScan

Import/dependency scanning settings.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | bool | true | Enable import scanning |
| maxFileSizeBytes | int | 1000000 | Skip files larger than this (1 MB) |
| scanTimeoutMs | int | 30000 | Timeout for scanning (30s) |
| skipBinary | bool | true | Skip binary files |
| customPatterns | object | {} | Custom import patterns by language |

{
  "importScan": {
    "enabled": true,
    "maxFileSizeBytes": 1000000,
    "scanTimeoutMs": 30000,
    "skipBinary": true,
    "customPatterns": {}
  }
}

cache

Cache tier configuration.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| queryTtlSeconds | int | 300 | Query cache TTL (5 min) |
| viewTtlSeconds | int | 3600 | View cache TTL (1 hour) |
| negativeTtlSeconds | int | 60 | Negative cache TTL (1 min) |

{
  "cache": {
    "queryTtlSeconds": 300,
    "viewTtlSeconds": 3600,
    "negativeTtlSeconds": 60
  }
}
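
To make the TTL settings concrete, here is a small sketch of a cache that keeps "not found" results for a shorter time than real results. Treating None as the negative marker and the class name TtlCache are assumptions for illustration; CKB's cache tiers are backed by its own storage and may behave differently.

```python
import time


class TtlCache:
    """In-memory cache with separate TTLs for positive and negative entries."""

    def __init__(self, ttl_seconds, negative_ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._negative_ttl = negative_ttl_seconds
        self._clock = clock
        self._entries = {}

    def put(self, key, value):
        # None marks a negative ("nothing found") result and expires sooner.
        ttl = self._negative_ttl if value is None else self._ttl
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key):
        """Return (hit, value); expired entries count as misses."""
        entry = self._entries.get(key)
        if entry is None:
            return (False, None)
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return (False, None)
        return (True, value)
```

A short negative TTL keeps repeated misses cheap while letting newly indexed symbols become visible quickly.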

budget

Response budget limits for LLM optimization.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| maxModules | int | 10 | Max modules in response |
| maxSymbolsPerModule | int | 5 | Max symbols per module |
| maxImpactItems | int | 20 | Max impact items |
| maxDrilldowns | int | 5 | Max drilldown suggestions |
| estimatedMaxTokens | int | 4000 | Target token budget |

{
  "budget": {
    "maxModules": 10,
    "maxSymbolsPerModule": 5,
    "maxImpactItems": 20,
    "maxDrilldowns": 5,
    "estimatedMaxTokens": 4000
  }
}

backendLimits

Hard limits to protect against resource exhaustion.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| maxRefsPerQuery | int | 10000 | Max references per query |
| maxFilesScanned | int | 5000 | Max files to scan |
| maxUnionModeTimeMs | int | 60000 | Max time for union merge (60s) |

{
  "backendLimits": {
    "maxRefsPerQuery": 10000,
    "maxFilesScanned": 5000,
    "maxUnionModeTimeMs": 60000
  }
}

privacy

Privacy settings.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| mode | string | "normal" | Privacy mode |

Privacy Modes:

  • normal: Full output with paths and symbols
  • redacted: Paths and symbol names are hashed

{
  "privacy": {
    "mode": "normal"
  }
}
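
The effect of redacted mode can be sketched as follows. The hash function (SHA-256), the truncation length, and the helper name redact are assumptions; this guide does not specify how CKB hashes paths and symbol names.

```python
import hashlib


def redact(value: str, mode: str = "normal", length: int = 12) -> str:
    """Return the value unchanged in normal mode, or a short stable hash in redacted mode."""
    if mode == "normal":
        return value
    if mode == "redacted":
        # SHA-256 truncated to `length` hex characters (illustrative choice only).
        return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]
    raise ValueError(f"unknown privacy mode: {mode}")
```

Because the hash is deterministic, the same path or symbol always maps to the same token, so redacted output can still be correlated across reports.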

logging

Logging configuration.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| level | string | "info" | Log level (debug, info, warn, error) |
| format | string | "human" | Output format (human, json) |

{
  "logging": {
    "level": "info",
    "format": "human"
  }
}

telemetry (v6.4)

Runtime telemetry integration for observed usage and dead code detection.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | bool | false | Enable telemetry features |
| service_map | object | {} | Static service name → repo ID mapping |
| service_patterns | array | [] | Regex patterns for service mapping |
| aggregation | object | {...} | Aggregation settings |
| dead_code | object | {...} | Dead code detection settings |
| privacy | object | {...} | Privacy settings |

Service Map

Maps service.name from telemetry to repository IDs:

{
  "telemetry": {
    "enabled": true,
    "service_map": {
      "api-gateway": "repo-api",
      "user-service": "repo-users",
      "payment-service": "repo-payments"
    }
  }
}

Service Patterns

Regex patterns for services that follow naming conventions:

{
  "telemetry": {
    "service_patterns": [
      {
        "pattern": "^order-.*$",
        "repo": "repo-orders"
      },
      {
        "pattern": "^inventory-.*$",
        "repo": "repo-inventory"
      }
    ]
  }
}

Resolution Order:

  1. Exact match in service_map
  2. Pattern match in service_patterns (first match wins)
  3. Payload override via ckb_repo_id attribute in telemetry
  4. Unmapped — logged for review
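
The resolution order above can be sketched in a few lines of Python. The function name resolve_repo and the dictionary shapes are illustrative assumptions; only the ordering of the four steps comes from this guide.

```python
import re


def resolve_repo(service_name, attributes, service_map, service_patterns):
    """Map a telemetry service.name to a repo ID, or return None if unmapped."""
    # 1. Exact match in service_map.
    if service_name in service_map:
        return service_map[service_name]
    # 2. Pattern match in service_patterns; the first match wins.
    for entry in service_patterns:
        if re.search(entry["pattern"], service_name):
            return entry["repo"]
    # 3. Payload override via the ckb_repo_id attribute.
    override = attributes.get("ckb_repo_id")
    if override:
        return override
    # 4. Unmapped: the caller would log this for review.
    return None
```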

Aggregation Settings

Control how telemetry data is stored:

{
  "telemetry": {
    "aggregation": {
      "bucket_size": "weekly",
      "retention_days": 365,
      "min_calls_to_store": 1
    }
  }
}

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| bucket_size | string | "weekly" | Aggregation bucket: "weekly" or "monthly" |
| retention_days | int | 365 | Days to retain telemetry data |
| min_calls_to_store | int | 1 | Minimum calls to store (filter noise) |
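
As an illustration of what these settings control, the sketch below groups call counts into weekly or monthly buckets and drops buckets below min_calls_to_store. The bucket key format (ISO week or year-month), the event tuple shape, and the function names are assumptions; CKB's storage format is not described here.

```python
from collections import Counter
from datetime import date


def bucket_key(day: date, bucket_size: str = "weekly") -> str:
    """Return the aggregation bucket a calendar day falls into."""
    if bucket_size == "weekly":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if bucket_size == "monthly":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown bucket_size: {bucket_size}")


def aggregate(events, bucket_size="weekly", min_calls_to_store=1):
    """Sum (function, day, calls) events per bucket and filter out low-volume noise."""
    counts = Counter()
    for function, day, calls in events:
        counts[(function, bucket_key(day, bucket_size))] += calls
    return {key: total for key, total in counts.items() if total >= min_calls_to_store}
```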

Dead Code Detection Settings

Configure dead code detection behavior:

{
  "telemetry": {
    "dead_code": {
      "enabled": true,
      "min_observation_days": 30,
      "exclude_patterns": [
        "**/test/**",
        "**/testdata/**",
        "**/migrations/**",
        "**/mocks/**"
      ],
      "exclude_functions": [
        "*Migration*",
        "Test*",
        "*Scheduled*",
        "*Backup*",
        "*Cron*"
      ]
    }
  }
}

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | bool | true | Enable dead code detection |
| min_observation_days | int | 30 | Minimum days of data before reporting |
| exclude_patterns | string[] | [...] | Path patterns to exclude |
| exclude_functions | string[] | [...] | Function name patterns to exclude |
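
The exclusion lists can be approximated with Python's fnmatch module, as in the sketch below. This is only an approximation: CKB's actual glob semantics are not documented here, and in fnmatch a "*" also matches "/", so "**/test/**" only matches paths that contain a "/test/" segment. The helper name is_excluded is hypothetical.

```python
from fnmatch import fnmatchcase


def is_excluded(path, function, exclude_patterns, exclude_functions):
    """Return True if a symbol should be skipped by dead code detection."""
    # Path patterns: approximate glob matching (see the caveat in the text above).
    if any(fnmatchcase(path, pattern) for pattern in exclude_patterns):
        return True
    # Function patterns: case-sensitive wildcard match on the function name.
    return any(fnmatchcase(function, pattern) for pattern in exclude_functions)
```

Excluding migrations, scheduled jobs, and backups matters because such code runs rarely and would otherwise look dead within a short observation window.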

Privacy Settings

Control telemetry privacy:

{
  "telemetry": {
    "privacy": {
      "redact_caller_names": false,
      "log_unmatched_events": true
    }
  }
}

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| redact_caller_names | bool | false | Redact caller service names in storage |
| log_unmatched_events | bool | true | Log events that couldn't be matched |

Full Telemetry Configuration Example

{
  "telemetry": {
    "enabled": true,
    "service_map": {
      "api-gateway": "repo-api",
      "user-service": "repo-users"
    },
    "service_patterns": [
      {
        "pattern": "^order-.*$",
        "repo": "repo-orders"
      }
    ],
    "aggregation": {
      "bucket_size": "weekly",
      "retention_days": 365,
      "min_calls_to_store": 1
    },
    "dead_code": {
      "enabled": true,
      "min_observation_days": 30,
      "exclude_patterns": ["**/test/**", "**/migrations/**"],
      "exclude_functions": ["*Migration*", "Test*", "*Scheduled*"]
    },
    "privacy": {
      "redact_caller_names": false,
      "log_unmatched_events": true
    }
  }
}

OTEL Collector Configuration

CKB accepts telemetry via OTLP. Configure your OpenTelemetry Collector:

# otel-collector-config.yaml
exporters:
  otlphttp/ckb:
    endpoint: "http://localhost:9120"
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/ckb]

Required Metric: calls counter with these attributes:

  • code.function (required) - Function name
  • code.filepath (recommended) - Source file path
  • code.namespace (recommended) - Package/namespace
  • code.lineno (optional) - Line number for exact matching

Resource Attributes:

  • service.name (required) - Maps to repo via service_map
  • service.version (optional) - For trend analysis
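
Before wiring up a collector, it can help to check that emitted data points carry the attributes listed above. The following sketch validates plain dictionaries; the function name check_calls_point and the message wording are assumptions, and it does not use or replace the OpenTelemetry SDK.

```python
def check_calls_point(resource: dict, attributes: dict):
    """Return (errors, warnings) for one data point of the calls counter."""
    errors = []
    if not resource.get("service.name"):
        errors.append("resource attribute service.name is required")
    if not attributes.get("code.function"):
        errors.append("attribute code.function is required")
    # Recommended attributes improve matching but are not mandatory.
    warnings = [
        f"attribute {name} is recommended"
        for name in ("code.filepath", "code.namespace")
        if not attributes.get(name)
    ]
    return errors, warnings
```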

CLI Command Flags

Each CLI command has its own flags. Use --help to see all options:

ckb --help           # Global help
ckb search --help    # Search command help
ckb serve --help     # Server command help

Common Command Flags

ckb search

| Flag | Default | Description |
|------|---------|-------------|
| --scope | "" | Limit to module ID |
| --kinds | "" | Filter by kinds (comma-separated) |
| --limit | 20 | Max results |
| --format | "json" | Output format (json, human) |

ckb refs

| Flag | Default | Description |
|------|---------|-------------|
| --format | "json" | Output format |

ckb impact

| Flag | Default | Description |
|------|---------|-------------|
| --format | "json" | Output format |

ckb arch

| Flag | Default | Description |
|------|---------|-------------|
| --format | "json" | Output format |
| --depth | 1 | Dependency depth |

ckb serve

| Flag | Default | Description |
|------|---------|-------------|
| --port | 8080 | HTTP port |
| --host | "localhost" | Bind address |

ckb diag

| Flag | Default | Description |
|------|---------|-------------|
| --out | "" | Output file path |
| --anonymize | false | Anonymize paths and symbols |

Environment Variables

Configuration can be overridden with environment variables:

| Variable | Description |
|----------|-------------|
| CKB_LOG_LEVEL | Override log level |
| CKB_LOG_FORMAT | Override log format |
| CKB_CONFIG_PATH | Custom config file path |
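
The precedence implied here (environment variables win over config.json) can be sketched as below. The helper names config_path and apply_env_overrides are hypothetical, and the sketch assumes the two log variables map onto logging.level and logging.format.

```python
import copy
import os
from pathlib import Path


def config_path(repo_root=".", environ=os.environ) -> Path:
    """Return CKB_CONFIG_PATH if set, else the default .ckb/config.json location."""
    custom = environ.get("CKB_CONFIG_PATH")
    return Path(custom) if custom else Path(repo_root) / ".ckb" / "config.json"


def apply_env_overrides(config: dict, environ=os.environ) -> dict:
    """Return a copy of config with logging settings overridden from the environment."""
    result = copy.deepcopy(config)
    logging_cfg = result.setdefault("logging", {})
    if environ.get("CKB_LOG_LEVEL"):
        logging_cfg["level"] = environ["CKB_LOG_LEVEL"]
    if environ.get("CKB_LOG_FORMAT"):
        logging_cfg["format"] = environ["CKB_LOG_FORMAT"]
    return result
```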

Examples

Minimal Configuration

{
  "version": 5,
  "backends": {
    "scip": { "enabled": true },
    "lsp": { "enabled": false },
    "git": { "enabled": true }
  }
}

SCIP-Only (Fastest)

{
  "version": 5,
  "backends": {
    "scip": { "enabled": true },
    "lsp": { "enabled": false },
    "git": { "enabled": true }
  },
  "queryPolicy": {
    "backendPreferenceOrder": ["scip"],
    "alwaysUse": ["git"],
    "mergeMode": "prefer-first"
  }
}

Large Codebase

{
  "version": 5,
  "budget": {
    "maxModules": 20,
    "maxSymbolsPerModule": 10,
    "maxImpactItems": 50,
    "estimatedMaxTokens": 8000
  },
  "backendLimits": {
    "maxRefsPerQuery": 50000,
    "maxFilesScanned": 20000
  },
  "cache": {
    "queryTtlSeconds": 600,
    "viewTtlSeconds": 7200
  }
}

Privacy-Focused

{
  "version": 5,
  "privacy": {
    "mode": "redacted"
  },
  "logging": {
    "level": "warn"
  }
}

MODULES.toml Format

The MODULES.toml file allows you to explicitly declare module boundaries and metadata. Place this file in your repository root.

Note: Module declarations in MODULES.toml take priority over auto-detection and provide higher confidence scores.

Schema Version

# MODULES.toml - Explicit module declarations for CKB
version = 1

Complete Example

version = 1

[[module]]
name = "api"
path = "internal/api"
responsibility = "HTTP API handlers and middleware"
owner = "@api-team"
tags = ["core", "api"]
language = "go"

[module.boundaries]
exports = ["Handler", "Middleware", "Router"]
internal = ["internal/api/internal", "internal/api/helpers"]
allowed_dependencies = ["internal/query", "internal/storage"]

[[module]]
name = "query"
path = "internal/query"
responsibility = "Query engine for code intelligence"
owner = "@platform-team"
tags = ["core", "query"]

[[module]]
name = "storage"
path = "internal/storage"
responsibility = "Database operations and persistence"
owner = "@platform-team"
tags = ["core", "database"]

[module.boundaries]
exports = ["Repository", "Query", "Transaction"]
internal = ["internal/storage/schema"]

Module Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| path | string | Yes | Repo-relative path to the module root |
| name | string | No | Human-readable name (defaults to last path segment) |
| id | string | No | Stable module ID (auto-generated if omitted) |
| responsibility | string | No | One-sentence description of what this module does |
| owner | string | No | Primary owner (@team or user@email.com) |
| tags | string[] | No | Classification tags for filtering |
| language | string | No | Primary language (auto-detected if omitted) |

Boundaries Section

The optional [module.boundaries] section defines the module's API surface:

| Field | Type | Description |
|-------|------|-------------|
| exports | string[] | Symbol names that form the public API |
| internal | string[] | Paths considered internal/private |
| allowed_dependencies | string[] | Modules this module is allowed to depend on |

Minimal Example

For simple cases, you only need the path field:

version = 1

[[module]]
path = "internal/api"

[[module]]
path = "internal/query"

[[module]]
path = "pkg/utils"

Generated Module IDs

When id is not specified, CKB generates a stable ID:

ckb:mod:<hash>

The hash is derived from the normalized module path, ensuring the ID remains stable as long as the path doesn't change.
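
One way to picture a path-derived ID is the sketch below. The normalization rules, the hash function (SHA-256), and the 16-character truncation are assumptions chosen for illustration; the IDs it produces will not match the ones CKB generates.

```python
import hashlib
import posixpath


def module_id(path: str) -> str:
    """Derive a stable ckb:mod:<hash> style ID from a repo-relative module path."""
    # Normalize separators, "./" prefixes, and trailing slashes so that
    # equivalent spellings of the same path produce the same ID.
    normalized = posixpath.normpath(path.replace("\\", "/")).strip("/")
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"ckb:mod:{digest}"
```

Setting id explicitly in MODULES.toml is the way to keep an ID stable across a directory rename, since a path-derived ID changes with the path.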

Benefits of Explicit Declaration

  1. Higher confidence - Declared modules have confidence 1.0 vs 0.5-0.7 for inferred
  2. Preserved on refresh - Declared data is never overwritten by inference
  3. Better ownership - Explicit owners override git-blame heuristics
  4. Clearer boundaries - Public/internal patterns define module contracts
  5. Dependency control - allowed_dependencies enables future dependency violation detection

Validation

Validate your MODULES.toml:

ckb doctor

CKB validates:

  • Required path field is present
  • Paths exist in the repository
  • TOML syntax is correct

Architectural Decision Records (ADRs)

CKB includes a complete ADR system for documenting architectural decisions. ADRs are stored as markdown files and indexed for search.

What is an ADR?

An Architectural Decision Record (ADR) documents a significant decision made about the system architecture, including:

  • The context and problem being addressed
  • The decision made
  • The consequences of that decision
  • Alternatives that were considered

ADR Storage Locations

CKB looks for ADRs in these directories (in priority order):

  1. docs/decisions/
  2. docs/adr/
  3. adr/
  4. decisions/
  5. doc/adr/
  6. doc/decisions/

When creating new ADRs via the API/MCP, they are stored in:

  • ~/.ckb/repos/<repo-hash>/decisions/ (v6.0 global persistence)

ADR File Format

ADRs are markdown files with a specific structure:

# ADR-001: Use PostgreSQL for primary database

**Status:** accepted

**Date:** 2024-12-18

**Author:** @platform-team

## Context

We need a primary database for the application. The database must support:
- ACID transactions
- Complex queries with JOINs
- JSON column types for flexible data

## Decision

We will use PostgreSQL 15 as our primary database.

## Consequences

- PostgreSQL provides robust ACID compliance
- JSON/JSONB columns allow schema flexibility
- Requires PostgreSQL expertise on the team
- Need to manage database migrations

## Affected Modules

- internal/storage
- internal/api
- cmd/migrate

## Alternatives Considered

- MySQL - Less JSON support, weaker transaction isolation
- MongoDB - No ACID, eventual consistency concerns
- SQLite - Not suitable for production multi-user access

ADR Statuses

| Status | Description |
|--------|-------------|
| proposed | Under discussion, not yet accepted |
| accepted | Approved and implemented |
| deprecated | No longer recommended, being phased out |
| superseded | Replaced by a newer decision |

ADR Fields

| Field | Required | Description |
|-------|----------|-------------|
| Title | Yes | Short description of the decision (in # ADR-NNN: Title format) |
| Status | Yes | One of: proposed, accepted, deprecated, superseded |
| Date | No | Decision date (defaults to file modification time) |
| Author | No | Who made or proposed the decision |
| Context | Yes | The problem being addressed |
| Decision | Yes | What was decided |
| Consequences | Yes | List of effects (positive and negative) |
| Affected Modules | No | Which modules are impacted |
| Alternatives Considered | No | Other options that were evaluated |
| Superseded by | No | ADR ID that replaces this one (if superseded) |
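
To show how the required fields relate to the file format, here is a small parser sketch that extracts the title, the bold metadata lines, and the level-two sections, then reports which required fields are missing. It is not CKB's parser; the regular expressions and the function name parse_adr are assumptions based on the example file in this guide.

```python
import re

REQUIRED_SECTIONS = ("Context", "Decision", "Consequences")


def parse_adr(markdown: str) -> dict:
    """Parse an ADR markdown document and list any missing required fields."""
    title = re.search(r"^# (ADR-\d+): (.+)$", markdown, re.MULTILINE)
    # Metadata lines look like "**Status:** accepted".
    fields = dict(re.findall(r"^\*\*([\w ]+):\*\*[ \t]*(.+)$", markdown, re.MULTILINE))
    sections, current = {}, None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    sections = {name: "\n".join(body).strip() for name, body in sections.items()}
    missing = []
    if title is None:
        missing.append("Title")
    if "Status" not in fields:
        missing.append("Status")
    missing += [name for name in REQUIRED_SECTIONS if not sections.get(name)]
    return {
        "id": title.group(1) if title else None,
        "title": title.group(2).strip() if title else None,
        "status": fields.get("Status"),
        "sections": sections,
        "missing": missing,
    }
```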

File Naming Convention

ADR files should follow one of these naming patterns:

ADR-001-use-postgresql-for-database.md
adr-002-adopt-hexagonal-architecture.md
003-implement-caching-layer.md

CKB recognizes these patterns:

  • ADR-NNN-*.md or adr-NNN-*.md
  • NNN-*.md (3-4 digit number prefix)
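
The recognized patterns can be expressed as a single regular expression, as in the sketch below. The exact rules CKB applies (for example, how many digits it accepts after the ADR- prefix) are not stated here, so the expression and the helper name adr_number are assumptions.

```python
import re

# "ADR-NNN-*.md" / "adr-NNN-*.md", or a bare 3-4 digit prefix "NNN-*.md".
ADR_FILENAME = re.compile(r"^(?:(?:ADR|adr)-(\d+)|(\d{3,4}))-.*\.md$")


def adr_number(filename: str):
    """Return the ADR number encoded in a file name, or None if it does not match."""
    match = ADR_FILENAME.match(filename)
    if match is None:
        return None
    return int(match.group(1) or match.group(2))
```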

Creating ADRs

Via CLI

# Create a new decision (opens in $EDITOR or creates with title)
ckb decisions new --title "Use Redis for caching"

# List all decisions
ckb decisions

# Filter by status
ckb decisions --status accepted

# Search decisions
ckb decisions --search "caching"

Via MCP (recordDecision)

{
  "name": "recordDecision",
  "arguments": {
    "title": "Use Redis for session caching",
    "context": "User sessions need fast access with automatic expiration...",
    "decision": "We will use Redis 7 for session storage...",
    "consequences": [
      "Sessions stored in-memory for fast access",
      "Built-in TTL handles expiration",
      "Requires Redis infrastructure"
    ],
    "affectedModules": ["internal/auth", "internal/session"],
    "alternatives": [
      "Database sessions - too slow for auth checks",
      "JWT only - no server-side revocation"
    ],
    "status": "proposed"
  }
}

Returns:

{
  "adrId": "ADR-005",
  "filePath": "~/.ckb/repos/abc123/decisions/adr-005-use-redis-for-session-caching.md",
  "status": "proposed"
}

Via HTTP API

curl -X POST http://localhost:8080/decisions \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Use Redis for session caching",
    "context": "User sessions need fast access...",
    "decision": "We will use Redis 7...",
    "consequences": ["Fast access", "Built-in TTL"],
    "status": "proposed"
  }'

Querying ADRs

Via MCP (getDecisions)

{
  "name": "getDecisions",
  "arguments": {
    "status": ["accepted", "proposed"],
    "search": "database",
    "affectedModule": "internal/storage",
    "limit": 10
  }
}

Via HTTP API

# List all decisions
curl http://localhost:8080/decisions

# Filter by status
curl "http://localhost:8080/decisions?status=accepted"

# Search
curl "http://localhost:8080/decisions?search=caching"

# Filter by affected module
curl "http://localhost:8080/decisions?affectedModule=internal/api"
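
The same HTTP calls can be built from Python's standard library. The sketch below only constructs the URLs and the POST request; it assumes the server from ckb serve is listening on localhost:8080 as in the curl examples, and the helper names are hypothetical. Pass the request to urllib.request.urlopen to actually send it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # matches the curl examples; adjust to your --host/--port


def decisions_url(**filters) -> str:
    """Build a GET URL for /decisions with optional query filters."""
    query = urlencode({k: v for k, v in filters.items() if v is not None}, safe="/")
    return f"{BASE_URL}/decisions" + (f"?{query}" if query else "")


def new_decision_request(payload: dict) -> Request:
    """Build (but do not send) the POST request that records a decision."""
    return Request(
        f"{BASE_URL}/decisions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```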

ADR Workflow

  1. Propose: Create ADR with status proposed
  2. Discuss: Review with team, update context/alternatives
  3. Accept: Change status to accepted
  4. Implement: Build the solution
  5. Supersede (if needed): Create new ADR, mark old as superseded

Best Practices

  1. One decision per ADR - Keep ADRs focused on a single decision
  2. Include context - Future readers need to understand why, not just what
  3. Document alternatives - Show what was considered and why it was rejected
  4. Link affected modules - Helps with impact analysis
  5. Update status - Keep ADRs current as decisions evolve
  6. Don't delete - Supersede instead; history is valuable

Integration with Federation

ADRs are indexed in federation for cross-repo search:

{
  "name": "federationSearchDecisions",
  "arguments": {
    "federation": "platform",
    "query": "authentication",
    "status": ["accepted"]
  }
}

v6.0 Configuration Roadmap

The v6.0 Architectural Memory features are implemented and working, but currently use hardcoded defaults. Future releases will add configurable settings:

Planned: ownership

{
  "ownership": {
    "enabled": true,
    "codeownersPath": ".github/CODEOWNERS",
    "gitBlameEnabled": true,
    "timeDecayHalfLife": 90,
    "excludeBots": true,
    "botPatterns": ["\\[bot\\]$", "^dependabot", "^renovate"]
  }
}

Current defaults:

  • CODEOWNERS: .github/CODEOWNERS or CODEOWNERS
  • Git blame: enabled with 90-day half-life
  • Bot exclusion: enabled
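
A 90-day half-life means a blamed line loses half its weight every 90 days. The sketch below shows one plausible reading of these defaults using exponential decay and the bot patterns from the planned config; the exact formula CKB uses is not documented here, and the function names are assumptions.

```python
import re

BOT_PATTERNS = [r"\[bot\]$", r"^dependabot", r"^renovate"]


def is_bot(author: str, patterns=BOT_PATTERNS) -> bool:
    """Return True if the author name matches any bot pattern."""
    return any(re.search(pattern, author) for pattern in patterns)


def blame_weight(age_days: float, half_life_days: float = 90) -> float:
    """Exponential decay: weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def ownership_scores(lines, half_life_days=90, exclude_bots=True) -> dict:
    """Turn (author, age_days) blame lines into normalized ownership shares."""
    scores = {}
    for author, age_days in lines:
        if exclude_bots and is_bot(author):
            continue
        scores[author] = scores.get(author, 0.0) + blame_weight(age_days, half_life_days)
    total = sum(scores.values())
    return {author: score / total for author, score in scores.items()} if total else {}
```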

Planned: decisions

{
  "decisions": {
    "enabled": true,
    "directories": ["docs/decisions", "docs/adr", "adr", "decisions"],
    "storePath": "~/.ckb/repos/{repo}/decisions"
  }
}

Current defaults:

  • Scans common ADR directories
  • Stores new ADRs in CKB data directory

Planned: staleness

{
  "staleness": {
    "freshDays": 7,
    "freshCommits": 50,
    "staleDays": 30,
    "staleCommits": 200,
    "obsoleteDays": 90,
    "obsoleteCommits": 500
  }
}

Current defaults:

  • Fresh: <7 days or <50 commits
  • Stale: 30-90 days or 200-500 commits
  • Obsolete: >90 days or >500 commits

Validation

Validate your configuration:

ckb doctor

This checks:

  • JSON syntax
  • Schema version compatibility
  • Required fields
  • Value ranges
  • Backend availability
  • MODULES.toml syntax
