
Incremental Indexing

Lisa edited this page Dec 22, 2025 · 7 revisions


Incremental indexing makes the database work of a SCIP index update O(changed files) instead of O(entire repo). After you edit a file, the index updates in seconds instead of requiring a full reindex.

Availability: Go projects only (v7.4+). Other languages fall back to full reindexing.

Why Incremental Indexing?

Full SCIP indexing scans your entire codebase, which can take 30+ seconds for large projects. This creates friction:

  • During development: You edit one file but wait 30s for the index to update
  • In CI/CD: Every commit triggers a full reindex even if only one file changed
  • With watch mode: Frequent reindexes burn CPU and slow down your machine

Incremental indexing solves this by only processing changed files.

How It Works

┌──────────────────┐     ┌──────────────────┐     ┌───────────────────┐
│ Change Detection │ ──► │ SCIP Extraction  │ ──► │ Delta Application │
│ (git diff -z)    │     │ (changed only)   │     │ (delete+insert)   │
└──────────────────┘     └──────────────────┘     └───────────────────┘

1. Change Detection

CKB detects changes using git:

git diff --name-status -z <last-indexed-commit> HEAD

The -z flag makes git separate fields with NUL bytes and print paths verbatim (unquoted). As a result, paths containing spaces or special characters are parsed correctly.

Tracked change types:

  • Added - New .go files
  • Modified - Changed .go files
  • Deleted - Removed .go files
  • Renamed - Moved/renamed .go files (tracks old path for cleanup)
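The change-detection step can be pictured as a parser over that NUL-separated output. This is a minimal sketch that assumes git's documented -z layout: each entry is a status, followed by one path, or by two paths (old, then new) for renames and copies. The Change type and the parseNameStatusZ function are hypothetical names, not CKB's code. A caller would then keep only the .go paths.

```go
package main

import (
	"fmt"
	"strings"
)

// Change is a hypothetical record for one entry of
// `git diff --name-status -z` output (not CKB's actual type).
type Change struct {
	Status  byte   // first status letter: 'A', 'M', 'D', 'R', 'C', ...
	Path    string // current path (the new path for renames and copies)
	OldPath string // previous path; set only for renames and copies
}

// parseNameStatusZ splits NUL-terminated --name-status output into changes.
// Rename and copy entries carry a similarity score ("R100", "C075") and are
// followed by two paths: the old one, then the new one.
func parseNameStatusZ(out string) ([]Change, error) {
	if out == "" {
		return nil, nil // nothing changed since the last indexed commit
	}
	fields := strings.Split(strings.TrimSuffix(out, "\x00"), "\x00")
	var changes []Change
	for i := 0; i < len(fields); {
		status := fields[i]
		if status == "" {
			return nil, fmt.Errorf("empty status at field %d", i)
		}
		c := Change{Status: status[0]}
		switch c.Status {
		case 'R', 'C':
			if i+2 >= len(fields) {
				return nil, fmt.Errorf("truncated %s entry", status)
			}
			c.OldPath, c.Path = fields[i+1], fields[i+2]
			i += 3
		default:
			if i+1 >= len(fields) {
				return nil, fmt.Errorf("truncated %s entry", status)
			}
			c.Path = fields[i+1]
			i += 2
		}
		changes = append(changes, c)
	}
	return changes, nil
}

func main() {
	// The path with a space survives because NUL, not whitespace, separates fields.
	out := "M\x00cmd/main.go\x00A\x00my dir/new file.go\x00R100\x00old.go\x00new.go\x00"
	changes, err := parseNameStatusZ(out)
	if err != nil {
		panic(err)
	}
	for _, c := range changes {
		fmt.Printf("%c old=%q new=%q\n", c.Status, c.OldPath, c.Path)
	}
}
```

The parser records OldPath for renames. That lets the old path's rows be deleted later, which is the cleanup noted in the list above.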

Fallback: In non-git repositories, CKB instead compares each file's current hash against the hash stored at the last index.
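A hash-based comparison of this kind can be sketched as follows. SHA-256 is an assumption, because the page does not say which hash CKB stores. In-memory maps stand in for reading files from disk and for the stored hash table. The names hashContent and diffByHash are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashContent returns the hex SHA-256 of a file's bytes.
// SHA-256 is an assumption; the page does not name the hash CKB uses.
func hashContent(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// diffByHash compares current file contents against hashes stored at the
// last index and classifies each path as added, modified, or deleted.
// Without git, a rename shows up as one deletion plus one addition.
func diffByHash(current map[string][]byte, stored map[string]string) (added, modified, deleted []string) {
	for path, content := range current {
		old, ok := stored[path]
		switch {
		case !ok:
			added = append(added, path)
		case old != hashContent(content):
			modified = append(modified, path)
		}
	}
	for path := range stored {
		if _, ok := current[path]; !ok {
			deleted = append(deleted, path)
		}
	}
	// Map iteration order is random; sort for stable output.
	sort.Strings(added)
	sort.Strings(modified)
	sort.Strings(deleted)
	return added, modified, deleted
}

func main() {
	stored := map[string]string{
		"a.go": hashContent([]byte("package a")),
		"b.go": hashContent([]byte("package b")),
	}
	current := map[string][]byte{
		"a.go": []byte("package a // edited"),
		"c.go": []byte("package c"),
	}
	added, modified, deleted := diffByHash(current, stored)
	fmt.Println(added, modified, deleted) // [c.go] [a.go] [b.go]
}
```

Hashes alone cannot link an old path to a new one. For that reason, this sketch reports renames as a deletion plus an addition, while the git path above tracks renames directly.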

2. SCIP Extraction

CKB runs scip-go to regenerate the full SCIP index (protobuf doesn't support partial updates), but then:

  1. Loads the index into memory
  2. Iterates documents, only processing those in the changed set
  3. Extracts symbols and references for changed files only
  4. Skips unchanged documents entirely

This means even though scip-go runs on the full codebase, CKB only does the expensive database work for changed files.
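The filtering loop described above can be sketched in Go. The Index, Document and Occurrence types below are minimal local stand-ins that mirror a few fields of the SCIP protobuf messages (RelativePath, Occurrences, Symbol, SymbolRoles). The real generated types carry many more fields, and real SCIP symbol strings are much longer than the ones shown here. The extractChanged function and the fileFacts type are hypothetical.

```go
package main

import "fmt"

// roleDefinition mirrors the Definition bit of SCIP's SymbolRole enum.
const roleDefinition = 0x1

// Minimal stand-ins for SCIP's Occurrence, Document and Index messages.
type Occurrence struct {
	Symbol      string
	SymbolRoles int32
}

type Document struct {
	RelativePath string
	Occurrences  []Occurrence
}

type Index struct {
	Documents []Document
}

// fileFacts holds what would be written to the database for one file.
type fileFacts struct {
	Defs []string // symbols defined in the file
	Refs []string // symbols referenced from the file
}

// extractChanged walks every document in the full index but only extracts
// definitions and references for paths in the changed set.
func extractChanged(idx Index, changed map[string]bool) map[string]fileFacts {
	out := make(map[string]fileFacts)
	for _, doc := range idx.Documents {
		if !changed[doc.RelativePath] {
			continue // unchanged document: skipped, no database work
		}
		var f fileFacts
		for _, occ := range doc.Occurrences {
			if occ.SymbolRoles&roleDefinition != 0 {
				f.Defs = append(f.Defs, occ.Symbol)
			} else {
				f.Refs = append(f.Refs, occ.Symbol)
			}
		}
		out[doc.RelativePath] = f
	}
	return out
}

func main() {
	idx := Index{Documents: []Document{
		{RelativePath: "main.go", Occurrences: []Occurrence{
			{Symbol: "main", SymbolRoles: roleDefinition},
			{Symbol: "Helper"},
		}},
		{RelativePath: "utils.go", Occurrences: []Occurrence{
			{Symbol: "Helper", SymbolRoles: roleDefinition},
		}},
	}}
	facts := extractChanged(idx, map[string]bool{"main.go": true})
	fmt.Println(len(facts), facts["main.go"].Defs, facts["main.go"].Refs) // 1 [main] [Helper]
}
```

Deleted files do not appear in the regenerated index at all. Their rows are removed during delta application instead.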

3. Delta Application

For each changed file, CKB applies updates using delete+insert:

Modified file.go:
  1. DELETE FROM file_symbols WHERE file_path = 'file.go'
  2. DELETE FROM indexed_files WHERE path = 'file.go'
  3. INSERT new symbols and file state

Renamed old.go → new.go:
  1. DELETE using old path
  2. INSERT using new path

This approach is simple and correct for each changed file: no complex diffing logic is needed.
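The delete+insert pattern can be shown with an in-memory store. This is a sketch under stated assumptions: the store type is a hypothetical stand-in for the file_symbols and indexed_files tables, and its method names are illustrative. With a real database, running each file's statements inside one transaction would keep a failure from leaving a file half-updated. That is a design suggestion, not something this page states.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a hypothetical in-memory stand-in for the file_symbols and
// indexed_files tables, both keyed by file path.
type store struct {
	fileSymbols  map[string][]string // file_path -> symbols from that file
	indexedFiles map[string]string   // path -> stored file state (e.g. a hash)
}

// deleteFile drops every row for a path (used for deleted files).
func (s *store) deleteFile(path string) {
	delete(s.fileSymbols, path)  // DELETE FROM file_symbols WHERE file_path = ?
	delete(s.indexedFiles, path) // DELETE FROM indexed_files WHERE path = ?
}

// replaceFile applies delete+insert for an added or modified file. A map
// assignment alone would overwrite; the explicit delete mirrors the SQL.
func (s *store) replaceFile(path, state string, symbols []string) {
	s.deleteFile(path)
	s.fileSymbols[path] = symbols
	s.indexedFiles[path] = state
}

// renameFile deletes rows under the old path and inserts under the new one.
func (s *store) renameFile(oldPath, newPath, state string, symbols []string) {
	s.deleteFile(oldPath)
	s.replaceFile(newPath, state, symbols)
}

func main() {
	s := &store{
		fileSymbols:  map[string][]string{"old.go": {"Foo"}, "file.go": {"A", "B"}},
		indexedFiles: map[string]string{"old.go": "h1", "file.go": "h2"},
	}
	s.replaceFile("file.go", "h3", []string{"A", "C"})
	s.renameFile("old.go", "new.go", "h4", []string{"Foo"})

	paths := make([]string, 0, len(s.fileSymbols))
	for p := range s.fileSymbols {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths, s.fileSymbols["file.go"]) // [file.go new.go] [A C]
}
```

Because each file's rows are replaced wholesale, symbol B simply disappears from file.go. No diffing between the old and new symbol lists is required.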

Usage

Default Behavior (Go Projects)

# Incremental by default
ckb index

# Output for incremental update:
Incremental Index Complete
--------------------------
Files:   3 modified, 1 added, 0 deleted
Symbols: 15 added, 8 removed
Refs:    42 updated
Time:    1.2s
Commit:  abc1234 (+dirty)

Accuracy:
  OK  Go to definition     - accurate
  OK  Find refs (forward)  - accurate
  !!  Find refs (reverse)  - may be stale
  !!  Call graph           - may be stale

Run 'ckb index --force' for full accuracy (47 files since last full)

Force Full Reindex

# Full reindex (ignores incremental)
ckb index --force

Use --force when:

  • You need 100% accurate reverse references
  • Call graph analysis is critical
  • After major refactoring across many files
  • When incremental reports issues

Accuracy Guarantees

Incremental indexing maintains forward accuracy but may have stale reverse references.

| Query type | After incremental | Notes |
|---|---|---|
| Go to definition | Always accurate | Definitions are in the changed files |
| Find refs FROM changed files | Always accurate | We re-extracted these references |
| Find refs TO symbols in changed files | May be stale | Other files' refs weren't updated |
| Call graph (callees) | Always accurate | We know what changed files call |
| Call graph (callers) | May be stale | Other files' calls weren't updated |
| Symbol search | Always accurate | Symbol table is fully updated |

Why Reverse References May Be Stale

Consider this scenario:

// utils.go (unchanged)
func Helper() { ... }

// main.go (changed - removed call to Helper)
func main() {
    // Helper()  <- removed this line
}

After incremental indexing:

  • main.go is re-indexed correctly (no longer references Helper)
  • utils.go is NOT re-indexed (unchanged)
  • CKB's stored references still show main.go → Helper from utils.go's perspective

This is the "caller-owned edges" invariant: references are owned by the FROM file, not the TO file.

Impact: When you ask "what calls Helper?", CKB might still show the deleted call from main.go until you run ckb index --force.

When Full Reindex Is Required

CKB automatically triggers a full reindex when:

| Condition | Reason |
|---|---|
| No previous index | Nothing to diff against |
| Schema version mismatch | Database structure changed |
| No tracked commit | Can't compute git diff |
| >50% of files changed (the threshold setting) | Incremental overhead exceeds full reindex |
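The decision described by these conditions can be sketched as one function. This is a minimal sketch: the prevState fields and the needFullReindex name are hypothetical, and the order in which CKB checks the conditions is an assumption. The threshold is given as a percentage, matching the threshold setting under Configuration. Integer arithmetic is used so that exactly 50% stays incremental and anything above it forces a full run.

```go
package main

import "fmt"

// prevState is a hypothetical summary of what the last index recorded.
type prevState struct {
	exists        bool
	schemaVersion int
	commit        string // last indexed commit; "" if none was tracked
	totalFiles    int
}

// needFullReindex returns why a full reindex is required, or "" when an
// incremental update is allowed. thresholdPct mirrors the threshold setting.
func needFullReindex(prev prevState, currentSchema, changedFiles, thresholdPct int) string {
	switch {
	case !prev.exists:
		return "no previous index"
	case prev.schemaVersion != currentSchema:
		return fmt.Sprintf("schema version mismatch (have %d, need %d)", prev.schemaVersion, currentSchema)
	case prev.commit == "":
		return "no tracked commit"
	case prev.totalFiles > 0 && changedFiles*100 > thresholdPct*prev.totalFiles:
		// changed/total > threshold/100, without floating point.
		return fmt.Sprintf("more than %d%% of files changed", thresholdPct)
	}
	return ""
}

func main() {
	prev := prevState{exists: true, schemaVersion: 5, commit: "abc1234", totalFiles: 200}
	fmt.Printf("%q\n", needFullReindex(prev, 6, 3, 50))   // "schema version mismatch (have 5, need 6)"
	fmt.Printf("%q\n", needFullReindex(prev, 5, 3, 50))   // ""
	fmt.Printf("%q\n", needFullReindex(prev, 5, 120, 50)) // "more than 50% of files changed"
}
```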

You'll see messages like:

Full reindex required: schema version mismatch (have 5, need 6)

Index State Tracking

CKB tracks index state in the database:

Index State:
  State: partial (3 files since last full)
  Commit: abc1234
  Dirty: yes (uncommitted changes)

States:

  • full - Complete reindex, all references accurate
  • partial - Incremental updates applied, reverse refs may be stale
  • full_dirty / partial_dirty - Uncommitted changes detected
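These four labels combine two independent flags: whether incremental updates have been applied since the last full run, and whether uncommitted changes were present. A minimal sketch of that combination follows. The indexState type and its fields are hypothetical. Treating any output of git status --porcelain as "dirty" is an assumption about how the flag could be detected, not a description of CKB's code.

```go
package main

import (
	"fmt"
	"strings"
)

// indexState is a hypothetical mirror of the stored index state.
type indexState struct {
	partial bool // incremental updates applied since the last full run
	dirty   bool // uncommitted changes existed when the index was built
}

// name returns one of: full, partial, full_dirty, partial_dirty.
func (s indexState) name() string {
	base := "full"
	if s.partial {
		base = "partial"
	}
	if s.dirty {
		return base + "_dirty"
	}
	return base
}

// isDirty treats any output from `git status --porcelain` as uncommitted
// changes (an assumed detection method, for illustration only).
func isDirty(porcelain string) bool {
	return strings.TrimSpace(porcelain) != ""
}

func main() {
	s := indexState{partial: true, dirty: isDirty(" M main.go\n")}
	fmt.Println(s.name())            // partial_dirty
	fmt.Println(indexState{}.name()) // full
}
```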

Configuration

Incremental indexing uses these settings (in .ckb/config.json):

{
  "incremental": {
    "threshold": 50,
    "indexTests": false,
    "excludes": ["vendor", "testdata"]
  }
}

| Setting | Default | Description |
|---|---|---|
| threshold | 50 | Fall back to full reindex if >N% of files changed |
| indexTests | false | Include _test.go files in change detection |
| excludes | ["vendor"] | Paths to exclude from change detection |
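Here is one way these settings might be loaded and applied to a changed path. It is a sketch under assumptions: the Go type names and the tracked method are hypothetical, and treating each exclude as a repository-relative directory prefix is a guess at the matching rule. The defaults come from the table above. The code relies on a documented encoding/json behavior: keys absent from the JSON leave pre-set struct fields untouched, while a JSON array replaces the whole slice.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// IncrementalConfig mirrors the "incremental" object in .ckb/config.json.
type IncrementalConfig struct {
	Threshold  int      `json:"threshold"`
	IndexTests bool     `json:"indexTests"`
	Excludes   []string `json:"excludes"`
}

type Config struct {
	Incremental IncrementalConfig `json:"incremental"`
}

// loadConfig starts from the documented defaults; keys present in the JSON
// override them and absent keys keep their default values.
func loadConfig(data []byte) (Config, error) {
	cfg := Config{Incremental: IncrementalConfig{
		Threshold: 50,
		Excludes:  []string{"vendor"},
	}}
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// tracked reports whether a changed path takes part in change detection:
// only .go files, _test.go files only when IndexTests is set, and nothing
// under an excluded directory (assumed to be a repo-relative prefix).
func (c IncrementalConfig) tracked(p string) bool {
	if !strings.HasSuffix(p, ".go") {
		return false
	}
	if !c.IndexTests && strings.HasSuffix(p, "_test.go") {
		return false
	}
	for _, ex := range c.Excludes {
		if p == ex || strings.HasPrefix(p, ex+"/") {
			return false
		}
	}
	return true
}

func main() {
	cfg, err := loadConfig([]byte(`{"incremental": {"excludes": ["vendor", "testdata"]}}`))
	if err != nil {
		panic(err)
	}
	inc := cfg.Incremental
	fmt.Println(inc.Threshold, inc.IndexTests) // 50 false
	fmt.Println(inc.tracked("pkg/a.go"), inc.tracked("pkg/a_test.go"), inc.tracked("testdata/x.go")) // true false false
}
```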

Performance Characteristics

| Scenario | Full index | Incremental |
|---|---|---|
| Small project (100 files) | ~2s | ~0.5s |
| Medium project (1000 files) | ~15s | ~1-2s |
| Large project (10000 files) | ~60s | ~2-5s |
| Single file change | ~60s | ~1s |

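The key insight: the database work in an incremental run is proportional to the number of changed files, not the total number of files. scip-go itself still processes the whole repository.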

Limitations (V1)

Current limitations that may be addressed in future versions:

  1. Go only - Other languages always do full reindex
  2. Reverse refs may be stale - Use --force when accuracy is critical
  3. Call graph staleness - Callers of changed symbols may be outdated
  4. No partial SCIP - Still runs full scip-go, just processes less output

Troubleshooting

"Full reindex required" every time

Check that:

  1. You're in a git repository
  2. The previous index completed successfully
  3. Schema version matches (may need --force after CKB upgrade)

Incremental seems slow

If incremental takes as long as full reindex:

  1. Check how many files changed (git status)
  2. If more than the threshold (50% by default) of files changed, CKB falls back to a full reindex automatically
  3. Large individual files still take time to process

Stale references causing issues

If you're seeing phantom references:

# Force full reindex
ckb index --force

This rebuilds all references from scratch.
