Incremental Indexing

Lisa edited this page Dec 22, 2025 · 7 revisions

Incremental indexing makes SCIP index updates O(changed files) instead of O(entire repo). After editing a file, the index updates in seconds instead of requiring a full reindex.

Availability: Go projects only (v7.4+). Other languages fall back to full reindexing.

v1.1 (v7.4): Adds incremental callgraph maintenance—outgoing calls from changed files are always accurate.

v2.0 (v7.4): Adds transitive invalidation—files depending on changed files can be automatically queued for rescanning.

Why Incremental Indexing?

Full SCIP indexing scans your entire codebase, which can take 30+ seconds for large projects. This creates friction:

  • During development: You edit one file but wait 30s for the index to update
  • In CI/CD: Every commit triggers a full reindex even if only one file changed
  • With watch mode: Frequent reindexes burn CPU and slow down your machine

Incremental indexing solves this by only processing changed files.

How It Works

┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│ Change Detection│ ──► │ SCIP Extraction  │ ──► │ Delta Application│ ──► │ Transitive       │
│ (git diff -z)   │     │ (symbols + calls)│     │ (delete+insert)  │     │ Invalidation (v2)│
└─────────────────┘     └──────────────────┘     └──────────────────┘     └──────────────────┘

1. Change Detection

CKB detects changes using git:

git diff --name-status -z <last-indexed-commit> HEAD

The -z flag uses NUL separators, correctly handling paths with spaces or special characters.

Tracked change types:

  • Added - New .go files
  • Modified - Changed .go files
  • Deleted - Removed .go files
  • Renamed - Moved/renamed .go files (tracks old path for cleanup)
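
The NUL-separated output can be parsed along the following lines. This is a minimal sketch, not CKB's actual parser: the `Change` type and `parseNameStatusZ` function are hypothetical names. It relies on the `git diff --name-status -z` format, in which each entry is a status field followed by one path, or two paths (old, then new) for renames and copies.

```go
package main

import (
	"fmt"
	"strings"
)

// Change describes one entry from `git diff --name-status -z`.
// The type and its field names are illustrative, not CKB's own.
type Change struct {
	Status  byte   // 'A', 'M', 'D', 'R', ...
	Path    string // current path (the new path for renames)
	OldPath string // set only for two-path entries (renames, copies)
}

// parseNameStatusZ parses NUL-separated name-status output and keeps
// only entries that touch .go files.
func parseNameStatusZ(out string) ([]Change, error) {
	if out == "" {
		return nil, nil
	}
	fields := strings.Split(strings.TrimSuffix(out, "\x00"), "\x00")
	var changes []Change
	for i := 0; i < len(fields); {
		status := fields[i]
		if status == "" {
			return nil, fmt.Errorf("empty status at field %d", i)
		}
		// Renames and copies carry a score (e.g. "R100") and two paths.
		paths := 1
		if status[0] == 'R' || status[0] == 'C' {
			paths = 2
		}
		if i+paths >= len(fields) {
			return nil, fmt.Errorf("truncated entry for status %q", status)
		}
		c := Change{Status: status[0], Path: fields[i+paths]}
		if paths == 2 {
			c.OldPath = fields[i+1]
		}
		i += paths + 1
		if strings.HasSuffix(c.Path, ".go") || strings.HasSuffix(c.OldPath, ".go") {
			changes = append(changes, c)
		}
	}
	return changes, nil
}

func main() {
	out := "M\x00cmd/main.go\x00R100\x00old name.go\x00new name.go\x00A\x00README.md\x00"
	changes, err := parseNameStatusZ(out)
	if err != nil {
		panic(err)
	}
	for _, c := range changes {
		fmt.Printf("%c %q (old: %q)\n", c.Status, c.Path, c.OldPath)
	}
}
```

Because paths are split on NUL rather than whitespace, `old name.go` survives intact, which is the reason for passing `-z`.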

Fallback: For non-git repos, CKB falls back to hash-based comparison against stored file hashes.
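
A hash-based comparison of this kind might look like the sketch below. The source does not say which hash function CKB uses or how hashes are stored; SHA-256 and the `path -> digest` maps are assumptions, and `hashContent` and `diffHashes` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashContent returns a hex digest of a file's contents.
// SHA-256 is an assumption; the source does not name the hash.
func hashContent(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// diffHashes compares stored hashes (path -> digest) with current ones
// and returns sorted lists of added, modified, and deleted paths.
func diffHashes(stored, current map[string]string) (added, modified, deleted []string) {
	for path, h := range current {
		old, ok := stored[path]
		switch {
		case !ok:
			added = append(added, path)
		case old != h:
			modified = append(modified, path)
		}
	}
	for path := range stored {
		if _, ok := current[path]; !ok {
			deleted = append(deleted, path)
		}
	}
	sort.Strings(added)
	sort.Strings(modified)
	sort.Strings(deleted)
	return added, modified, deleted
}

func main() {
	stored := map[string]string{
		"a.go": hashContent([]byte("package a // v1")),
		"b.go": hashContent([]byte("package b")),
	}
	current := map[string]string{
		"a.go": hashContent([]byte("package a // v2")),
		"c.go": hashContent([]byte("package c")),
	}
	added, modified, deleted := diffHashes(stored, current)
	fmt.Println("added:", added, "modified:", modified, "deleted:", deleted)
}
```

In this sketch a rename shows up as one deleted path plus one added path, since hashes alone carry no rename information.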

2. SCIP Extraction

CKB runs scip-go to regenerate the full SCIP index (protobuf doesn't support partial updates), but then:

  1. Loads the index into memory
  2. Iterates documents, only processing those in the changed set
  3. Extracts symbols, references, and call edges for changed files only
  4. Resolves caller symbols (which function contains each call site)
  5. Skips unchanged documents entirely

This means even though scip-go runs on the full codebase, CKB only does the expensive database work for changed files.
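
The filtering step can be sketched as follows. `Document` here is a simplified stand-in for a SCIP document (which identifies itself by a relative path), and `processChanged` is a hypothetical helper, not a CKB function.

```go
package main

import "fmt"

// Document is a simplified stand-in for a SCIP document.
type Document struct {
	RelativePath string
	Symbols      []string
}

// processChanged calls process only for documents whose path is in the
// changed set and reports how many documents were processed and skipped.
func processChanged(docs []Document, changed map[string]bool, process func(Document)) (processed, skipped int) {
	for _, d := range docs {
		if !changed[d.RelativePath] {
			skipped++ // unchanged: no database work at all
			continue
		}
		process(d)
		processed++
	}
	return processed, skipped
}

func main() {
	docs := []Document{
		{RelativePath: "main.go", Symbols: []string{"main"}},
		{RelativePath: "utils.go", Symbols: []string{"Helper"}},
		{RelativePath: "server.go", Symbols: []string{"Serve"}},
	}
	changed := map[string]bool{"main.go": true}

	processed, skipped := processChanged(docs, changed, func(d Document) {
		fmt.Println("extracting", d.RelativePath, d.Symbols)
	})
	fmt.Printf("processed=%d skipped=%d\n", processed, skipped)
}
```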

Call Edge Extraction (v1.1): For each reference to a callable symbol (function/method), CKB:

  • Detects callables using symbol kind or the (). pattern in symbol IDs
  • Resolves the enclosing function as the caller
  • Stores edges with location info: (caller_file, line, column, callee_id)
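
As a rough illustration of the last two bullets, the sketch below checks for the `().` suffix and picks the innermost function whose line range contains a call site. `FuncRange`, `isCallable`, and `enclosingFunc` are hypothetical names, the symbol strings are made up for the example, and the suffix check ignores the symbol-kind path that CKB also uses.

```go
package main

import (
	"fmt"
	"strings"
)

// isCallable reports whether a symbol ID looks like a function or method,
// using only the "()." suffix heuristic.
func isCallable(symbolID string) bool {
	return strings.HasSuffix(symbolID, "().")
}

// FuncRange is the line span of one function definition in a file.
type FuncRange struct {
	Symbol             string
	StartLine, EndLine int
}

// enclosingFunc returns the innermost function whose range contains line.
func enclosingFunc(funcs []FuncRange, line int) (string, bool) {
	best := -1
	for i, f := range funcs {
		if line < f.StartLine || line > f.EndLine {
			continue
		}
		if best == -1 || f.EndLine-f.StartLine < funcs[best].EndLine-funcs[best].StartLine {
			best = i
		}
	}
	if best == -1 {
		return "", false
	}
	return funcs[best].Symbol, true
}

func main() {
	funcs := []FuncRange{
		{Symbol: "example/pkg/main().", StartLine: 10, EndLine: 30},
		{Symbol: "example/pkg/Serve().", StartLine: 40, EndLine: 60},
	}
	callee := "example/pkg/Helper()."
	if caller, ok := enclosingFunc(funcs, 12); ok && isCallable(callee) {
		fmt.Printf("edge: %s -> %s (line 12)\n", caller, callee)
	}
}
```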

3. Delta Application

For each changed file, CKB applies updates using delete+insert:

Modified file.go:
  1. DELETE FROM file_symbols WHERE file_path = 'file.go'
  2. DELETE FROM indexed_files WHERE path = 'file.go'
  3. DELETE FROM callgraph WHERE caller_file = 'file.go'  -- v1.1
  4. DELETE FROM file_deps WHERE dependent_file = 'file.go'  -- v2
  5. INSERT new symbols, file state, call edges, and dependencies

Renamed old.go → new.go:
  1. DELETE using old path (including callgraph, file_deps)
  2. INSERT using new path

This approach is simple and correct—no complex diffing logic. The caller-owned edges invariant means call edges are always deleted and rebuilt with their owning file.
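
The delete+insert pattern can be modelled in memory as below. This is a sketch under stated assumptions: the real tables live in CKB's database, while the `Index` type and its methods are hypothetical and use maps keyed by the owning file to stand in for them.

```go
package main

import "fmt"

// CallEdge mirrors the stored tuple (caller_file, line, column, callee_id).
type CallEdge struct {
	CallerFile   string
	Line, Column int
	CalleeID     string
}

// Index is an in-memory stand-in for the per-file tables.
type Index struct {
	symbols map[string][]string   // file path -> symbol IDs
	calls   map[string][]CallEdge // caller file -> outgoing edges
}

func NewIndex() *Index {
	return &Index{symbols: map[string][]string{}, calls: map[string][]CallEdge{}}
}

// deleteFile removes everything owned by path.
func (ix *Index) deleteFile(path string) {
	delete(ix.symbols, path)
	delete(ix.calls, path)
}

// ApplyModified replaces all rows owned by path: delete, then insert.
func (ix *Index) ApplyModified(path string, symbols []string, edges []CallEdge) {
	ix.deleteFile(path)
	ix.symbols[path] = symbols
	ix.calls[path] = edges
}

// ApplyRenamed deletes under the old path and inserts under the new one.
func (ix *Index) ApplyRenamed(oldPath, newPath string, symbols []string, edges []CallEdge) {
	ix.deleteFile(oldPath)
	ix.ApplyModified(newPath, symbols, edges)
}

// Callees returns the callee IDs of the edges owned by path.
func (ix *Index) Callees(path string) []string {
	var out []string
	for _, e := range ix.calls[path] {
		out = append(out, e.CalleeID)
	}
	return out
}

func main() {
	ix := NewIndex()
	ix.ApplyModified("old.go", []string{"Run()."}, []CallEdge{
		{CallerFile: "old.go", Line: 5, Column: 2, CalleeID: "Helper()."},
	})
	ix.ApplyRenamed("old.go", "new.go", []string{"Run()."}, []CallEdge{
		{CallerFile: "new.go", Line: 5, Column: 2, CalleeID: "Helper()."},
	})
	fmt.Println(ix.Callees("old.go"), ix.Callees("new.go"))
}
```

Because every row is keyed by the file that owns it, no per-row diffing is needed: whatever the file produced last time is dropped wholesale and replaced.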

4. Transitive Invalidation (v2)

When a file changes, other files that depend on it may have stale references. v2 adds transitive invalidation to track and optionally rescan these dependent files.

File Dependency Tracking:

  • CKB maintains a file_deps table: (dependent_file, defining_file)
  • When a.go references a symbol defined in b.go, CKB records a.go → b.go
  • Only internal dependencies are tracked (not stdlib/external packages)

Rescan Queue:

  • When b.go changes, files depending on it (a.go) are enqueued for rescanning
  • The queue tracks: file path, reason, BFS depth, and attempt count
  • Queue processing respects configurable budgets (max files, max time)
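
The enqueue step amounts to a depth-limited breadth-first walk over reverse dependencies. The sketch below assumes the `file_deps` rows have already been inverted into a `defining_file -> dependents` map; `QueueEntry` and `enqueueDependents` are hypothetical names, and the reason string is invented for the example.

```go
package main

import "fmt"

// QueueEntry holds the fields the rescan queue tracks per file.
type QueueEntry struct {
	Path     string
	Reason   string
	Depth    int
	Attempts int
}

// enqueueDependents walks reverse dependencies breadth-first, starting
// from the changed files, and stops after maxDepth levels.
// dependents maps a defining file to the files that reference it.
func enqueueDependents(dependents map[string][]string, changed []string, maxDepth int) []QueueEntry {
	seen := make(map[string]bool)
	for _, c := range changed {
		seen[c] = true // changed files are already being re-indexed
	}
	frontier := changed
	var queue []QueueEntry
	for depth := 1; depth <= maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, def := range frontier {
			for _, dep := range dependents[def] {
				if seen[dep] {
					continue
				}
				seen[dep] = true
				queue = append(queue, QueueEntry{Path: dep, Reason: "depends on " + def, Depth: depth})
				next = append(next, dep)
			}
		}
		frontier = next
	}
	return queue
}

func main() {
	// a.go references b.go; c.go references a.go.
	dependents := map[string][]string{
		"b.go": {"a.go"},
		"a.go": {"c.go"},
	}
	for _, e := range enqueueDependents(dependents, []string{"b.go"}, 2) {
		fmt.Printf("%s (depth %d, %s)\n", e.Path, e.Depth, e.Reason)
	}
}
```

With a depth of 1, only `a.go` would be queued; raising it to 2 also picks up `c.go`.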

Usage

Default Behavior (Go Projects)

# Incremental by default
ckb index

# Output for incremental update:
Incremental Index Complete
--------------------------
Files:   3 modified, 1 added, 0 deleted
Symbols: 15 added, 8 removed
Refs:    42 updated
Calls:   127 edges updated
Time:    1.2s
Commit:  abc1234 (+dirty)
Pending: 5 files queued for rescan

Accuracy:
  OK  Go to definition     - accurate
  OK  Find refs (forward)  - accurate
  !!  Find refs (reverse)  - may be stale
  OK  Callees (outgoing)   - accurate
  !!  Callers (incoming)   - may be stale

Run 'ckb index --force' for full accuracy (47 files since last full)

Force Full Reindex

# Full reindex (ignores incremental)
ckb index --force

Use --force when:

  • You need 100% accurate reverse references
  • You need accurate caller information (who calls a function)
  • After major refactoring across many files
  • When incremental indexing reports issues
  • To clear the rescan queue and start fresh

Transitive Invalidation Modes (v2)

CKB supports four invalidation modes:

| Mode     | Behavior                                                 |
|----------|----------------------------------------------------------|
| none     | Disabled: no dependency tracking or invalidation         |
| lazy     | Enqueue dependents, drain on next full reindex (default) |
| eager    | Enqueue and drain immediately (with budgets)             |
| deferred | Enqueue and drain periodically in background             |

Lazy Mode (Default)

In lazy mode, dependent files are queued but not immediately rescanned:

  • Low overhead during incremental indexing
  • Queue drains automatically on next ckb index --force
  • Best for development workflows where occasional staleness is acceptable

Eager Mode

In eager mode, CKB rescans dependent files immediately:

  • Higher accuracy after incremental updates
  • Respects budget limits to prevent runaway processing
  • Best when accuracy is critical
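
A budgeted drain could be structured like this sketch. The `drain` function and its parameters are hypothetical. Treating a time budget of 0 as unlimited follows the configuration reference; treating a file budget of 0 as unlimited is an assumption made here for symmetry.

```go
package main

import (
	"fmt"
	"time"
)

// drain rescans queued files until the queue is empty or a budget runs out,
// and returns the files that are still pending.
// A budget of 0 means "unlimited" in this sketch.
func drain(queue []string, maxFiles int, maxTime time.Duration, rescan func(path string)) []string {
	start := time.Now()
	done := 0
	for done < len(queue) {
		if maxFiles > 0 && done >= maxFiles {
			break // file budget exhausted
		}
		if maxTime > 0 && time.Since(start) >= maxTime {
			break // time budget exhausted
		}
		rescan(queue[done])
		done++
	}
	return queue[done:]
}

func main() {
	queue := []string{"a.go", "b.go", "c.go"}
	left := drain(queue, 2, 1500*time.Millisecond, func(p string) {
		fmt.Println("rescan", p)
	})
	fmt.Println("still queued:", left)
}
```

Anything left over stays in the queue, so a drain that hits its budget degrades to the lazy behaviour for the remaining files rather than blocking.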

Configuration

{
  "incremental": {
    "threshold": 50,
    "indexTests": false,
    "excludes": ["vendor", "testdata"]
  },
  "transitive": {
    "enabled": true,
    "mode": "lazy",
    "depth": 1,
    "maxRescanFiles": 200,
    "maxRescanMs": 1500
  }
}

| Setting        | Default | Description                                    |
|----------------|---------|------------------------------------------------|
| enabled        | true    | Enable transitive invalidation                 |
| mode           | lazy    | Invalidation mode: none, lazy, eager, deferred |
| depth          | 1       | BFS cascade depth (1 = direct dependents only) |
| maxRescanFiles | 200     | Max files to rescan per drain run              |
| maxRescanMs    | 1500    | Max time (ms) per drain run (0 = unlimited)    |
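
For readers consuming this configuration from Go, the sketch below decodes it with `encoding/json` and pre-fills the documented `transitive` defaults. The `Config` struct and `loadConfig` function are hypothetical, not CKB's own types. Defaults for the `incremental` block are not listed in this page, so none are assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the JSON keys shown above.
type Config struct {
	Incremental struct {
		Threshold  int      `json:"threshold"`
		IndexTests bool     `json:"indexTests"`
		Excludes   []string `json:"excludes"`
	} `json:"incremental"`
	Transitive struct {
		Enabled        bool   `json:"enabled"`
		Mode           string `json:"mode"`
		Depth          int    `json:"depth"`
		MaxRescanFiles int    `json:"maxRescanFiles"`
		MaxRescanMs    int    `json:"maxRescanMs"`
	} `json:"transitive"`
}

// loadConfig applies the documented transitive defaults, then overlays
// whatever keys are present in data.
func loadConfig(data []byte) (Config, error) {
	var cfg Config
	cfg.Transitive.Enabled = true
	cfg.Transitive.Mode = "lazy"
	cfg.Transitive.Depth = 1
	cfg.Transitive.MaxRescanFiles = 200
	cfg.Transitive.MaxRescanMs = 1500

	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, err
	}
	switch cfg.Transitive.Mode {
	case "none", "lazy", "eager", "deferred":
	default:
		return cfg, fmt.Errorf("unknown transitive mode %q", cfg.Transitive.Mode)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig([]byte(`{"transitive": {"mode": "eager", "depth": 2}}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg.Transitive)
}
```

`json.Unmarshal` only overwrites fields whose keys appear in the input, so settings that are omitted keep the defaults assigned beforehand.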

Accuracy Guarantees

Incremental indexing keeps forward references accurate but may leave reverse references stale. With v1.1, call graph accuracy improves: outgoing calls (callees) from changed files are always accurate. With v2, all queries are accurate once the rescan queue has been fully drained, which eager mode attempts immediately, subject to its budgets.

| Query Type                            | After Incremental | After Queue Drained |
|---------------------------------------|-------------------|---------------------|
| Go to definition                      | Always accurate   | Always accurate     |
| Find refs FROM changed files          | Always accurate   | Always accurate     |
| Find refs TO symbols in changed files | May be stale      | Accurate            |
| Call graph (callees)                  | Always accurate   | Always accurate     |
| Call graph (callers)                  | May be stale      | Accurate            |
| Symbol search                         | Always accurate   | Always accurate     |

Why Reverse References May Be Stale

Consider this scenario:

// utils.go (unchanged)
func Helper() { ... }

// main.go (changed - removed call to Helper)
func main() {
    // Helper()  <- removed this line
}

After incremental indexing:

  • main.go is re-indexed correctly (no longer references Helper)
  • utils.go is NOT re-indexed (unchanged)
  • CKB's stored references still show main.go → Helper from utils.go's perspective

This is the "caller-owned edges" invariant: references are owned by the FROM file, not the TO file.

Impact: When you ask "what calls Helper?", CKB might still show the deleted call from main.go until you run ckb index --force.

With v2 eager mode: If you change utils.go, files that depend on it are automatically rescanned (within the configured budgets), keeping reverse references accurate.

Index State Tracking

CKB tracks index state in the database:

Index State:
  State: partial (3 files since last full)
  Commit: abc1234
  Dirty: yes (uncommitted changes)
  Pending: 5 files queued for rescan

States:

  • full - Complete reindex, all references accurate, queue empty
  • partial - Incremental updates applied, reverse refs may be stale
  • pending - Work queued in rescan queue (v2)
  • full_dirty / partial_dirty - Uncommitted changes detected

When Full Reindex Is Required

CKB automatically triggers a full reindex when:

| Condition               | Reason                                    |
|-------------------------|-------------------------------------------|
| No previous index       | Nothing to diff against                   |
| Schema version mismatch | Database structure changed                |
| No tracked commit       | Can't compute git diff                    |
| >50% files changed      | Incremental overhead exceeds full reindex |

You'll see messages like:

Full reindex required: schema version mismatch (have 7, need 8)
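
The conditions in the table can be read as an ordered series of checks. The sketch below is one way to express them; `IndexState` and `fullReindexReason` are hypothetical names, and the order of the checks is an assumption, since the source lists the conditions without ranking them.

```go
package main

import "fmt"

// IndexState is a hypothetical summary of what the database knows
// about the previous indexing run.
type IndexState struct {
	HasIndex      bool
	SchemaVersion int
	LastCommit    string
	TotalFiles    int
}

// fullReindexReason returns a non-empty reason when an incremental
// update is not possible or not worthwhile, and "" otherwise.
func fullReindexReason(s IndexState, wantSchema, changedFiles, thresholdPct int) string {
	switch {
	case !s.HasIndex:
		return "no previous index"
	case s.SchemaVersion != wantSchema:
		return fmt.Sprintf("schema version mismatch (have %d, need %d)", s.SchemaVersion, wantSchema)
	case s.LastCommit == "":
		return "no tracked commit"
	case s.TotalFiles > 0 && changedFiles*100 > s.TotalFiles*thresholdPct:
		return fmt.Sprintf("more than %d%% of files changed", thresholdPct)
	}
	return ""
}

func main() {
	state := IndexState{HasIndex: true, SchemaVersion: 7, LastCommit: "abc1234", TotalFiles: 100}
	if reason := fullReindexReason(state, 8, 3, 50); reason != "" {
		fmt.Println("Full reindex required:", reason)
	}
}
```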

Performance Characteristics

| Scenario                    | Full Index | Incremental |
|-----------------------------|------------|-------------|
| Small project (100 files)   | ~2s        | ~0.5s       |
| Medium project (1000 files) | ~15s       | ~1-2s       |
| Large project (10000 files) | ~60s       | ~2-5s       |
| Single file change          | ~60s       | ~1s         |

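The key insight: CKB's extraction and database work scales with the number of changed files, not the total number of files, even though the scip-go run itself still covers the whole codebase.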

Transitive invalidation overhead (v2):

  • Lazy mode: negligible (~1ms to enqueue dependents)
  • Eager mode: depends on cascade size and budgets

Limitations (v2)

Current limitations:

  1. Go only - Other languages always do full reindex
  2. Reverse refs may be stale in lazy mode - Use eager mode or --force when accuracy is critical
  3. Callers may be stale - Incoming calls to changed symbols may be outdated until queue drains
  4. No partial SCIP - Still runs full scip-go, just processes less output
  5. External deps not tracked - Only internal file dependencies are tracked

Troubleshooting

"Full reindex required" every time

Check that:

  1. You're in a git repository
  2. The previous index completed successfully
  3. Schema version matches (may need --force after CKB upgrade)

Incremental seems slow

If an incremental update takes as long as a full reindex:

  1. Check how many files changed (git status)
  2. If >50% changed, CKB falls back to a full reindex automatically
  3. Large individual files still take time to process

Stale references causing issues

If you're seeing phantom references:

# Force full reindex (also clears rescan queue)
ckb index --force

This rebuilds all references from scratch.

Too many pending rescans

If the rescan queue grows large:

# Check queue status
ckb status

# Force full reindex to clear queue
ckb index --force

Or increase budgets in configuration to process more files per run.
