
Add incremental indexing with content-hash change detection#2

Open
nnourr wants to merge 3 commits into main from tricky-chips

Conversation


@nnourr nnourr commented Apr 16, 2026

Summary

  • When an existing index is found, `code-index analyze` now auto-runs an incremental pipeline instead of refusing or re-indexing from scratch
  • Changed files are detected via git diff (committed) + SHA-256 content hashes (uncommitted), so repeated runs on a dirty working tree skip re-indexing correctly
  • Only nodes from changed files are re-embedded; unchanged vectors are copied from the existing FAISS index via reconstruct()
  • Falls back to full re-index when the old commit is unreachable (rebase/force-push) or no hash baseline exists yet
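
The two-layer change detection above can be sketched as follows. This is a minimal sketch with assumed helper names (`content_hash`, `committed_changes`, `changed_files`), not the PR's actual code: committed changes come from `git diff --name-only`, and uncommitted ones from comparing SHA-256 content hashes against the baseline recorded at index time.

```python
# Sketch of git-diff + content-hash change detection (assumed names).
import hashlib
import subprocess
from pathlib import Path

def content_hash(path: Path) -> str:
    """SHA-256 of a file's bytes, in the same format as the stored baseline."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def committed_changes(old_commit: str, repo: Path) -> set:
    """Files touched between the indexed commit and HEAD."""
    out = subprocess.run(
        ["git", "diff", "--name-only", old_commit, "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return set(out.stdout.split())

def changed_files(old_commit: str, baseline: dict, repo: Path) -> set:
    """Union of committed diffs and files whose current hash differs from
    what was actually indexed, so dirty-tree reruns become no-ops."""
    changed = committed_changes(old_commit, repo)
    for rel, old_hash in baseline.items():
        p = repo / rel
        if not p.exists() or content_hash(p) != old_hash:
            changed.add(rel)        # content differs from what was indexed
        else:
            changed.discard(rel)    # hash matches the baseline: skip it
    return changed
```

The `discard` branch is what makes repeated runs on a dirty tree idempotent: a file git flags as modified is still skipped if its current hash equals the hash recorded when it was last indexed.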

Performance on this repo (37 files, 306 embeddable nodes, 1 file changed):

  • Full index: ~55s → Incremental: ~3.6s (~15x faster)

Test plan

  • Full index with --force establishes hash baseline and works as before
  • Incremental run with no changes detects "No files changed" instantly
  • Incremental run after editing 1 file re-embeds only that file's nodes
  • Second incremental run (no new edits) correctly skips — hashes match what was indexed
  • Search works correctly after incremental update
  • --force flag still triggers full re-index
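
The selective re-embedding step can be sketched like this. Names are hypothetical; in the real pipeline the old vectors come from FAISS's `index.reconstruct(i)`, but here a plain list of vectors stands in so the logic is runnable without faiss installed.

```python
# Sketch of the copy-vs-re-embed merge (hypothetical names; a plain
# list stands in for FAISS reconstruct()).
def rebuild_vectors(old_vectors, old_node_files, changed, embed):
    """Return the new vector list: re-embed nodes whose source file
    changed, copy everything else from the existing index unchanged."""
    new_vectors = []
    for i, src_file in enumerate(old_node_files):
        if src_file in changed:
            new_vectors.append(embed(i))        # re-embed changed node
        else:
            new_vectors.append(old_vectors[i])  # copy (reconstruct) as-is
    return new_vectors

old = [[0.1], [0.2], [0.3]]
files = ["a.py", "b.py", "a.py"]
# Only "a.py" changed: nodes 0 and 2 are re-embedded, node 1 is copied.
merged = rebuild_vectors(old, files, {"a.py"}, embed=lambda i: [9.9])
# merged == [[9.9], [0.2], [9.9]]
```

This mirrors the ~15x speedup claim: with one file changed, `embed` runs for a handful of nodes while the other ~300 vectors are straight copies.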

🤖 Generated with Claude Code

nnourr and others added 3 commits April 16, 2026 00:19
When an existing index is found, `code-index analyze` now runs an
incremental pipeline that detects changed files via git diff + SHA-256
content hashes and only re-embeds nodes from those files. Unchanged
node vectors are copied from the existing FAISS index. Repeated runs
on a dirty working tree correctly skip re-indexing because hashes are
compared against what was actually indexed, not just git state.

Falls back to full re-index when the old commit is unreachable (rebase)
or no hash baseline exists yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
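
The fallback decision described above can be sketched as a small guard (assumed helper name, not the PR's actual code). `git cat-file -e` exits non-zero when the recorded commit no longer exists, e.g. after a rebase or force-push, in which case a full re-index is the only safe option.

```python
# Sketch of the incremental-vs-full fallback check (assumed name).
import subprocess

def can_run_incremental(old_commit, has_baseline, repo):
    """Incremental indexing needs both a hash baseline and a reachable
    old commit; otherwise the caller falls back to a full re-index."""
    if not has_baseline or not old_commit:
        return False  # no baseline yet: nothing to diff against
    probe = subprocess.run(
        ["git", "cat-file", "-e", f"{old_commit}^{{commit}}"],
        cwd=repo, capture_output=True,
    )
    return probe.returncode == 0  # non-zero: commit unreachable (rebase)
```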
These dependencies were previously pulled in transitively (via sentence-transformers, optimum, fastembed), so different venvs could resolve to a broken trio where transformers demands huggingface-hub>=1.5 but the resolver picked an older hub. Pinning the 4.x / 0.x / 0.2x lines keeps the three in lockstep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Added dependencies for ONNX, ONNX Runtime, and Optimum in `pyproject.toml`.
- Updated `config.py` to set the embedding backend to ONNX, enabling faster inference on Apple Silicon and CUDA.
- Enhanced `engine.py` to support ONNX model loading and embedding, including functions for selecting the best execution provider and ensuring model export.
- Refactored embedding logic to utilize ONNX runtime, improving performance and compatibility with specific models.

These changes enhance the embedding engine's efficiency and broaden its compatibility with various hardware setups.
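
The provider-selection step mentioned for `engine.py` can be sketched like this. The function name and preference order are assumptions, not the PR's actual helper; ONNX Runtime's real API reports what is available via `onnxruntime.get_available_providers()`, and the idea is simply to prefer a hardware-accelerated provider over the always-present CPU fallback.

```python
# Sketch of execution-provider selection (hypothetical helper; the
# PR's engine.py implementation may differ).
PREFERENCE = [
    "CUDAExecutionProvider",    # NVIDIA GPUs
    "CoreMLExecutionProvider",  # Apple Silicon
    "CPUExecutionProvider",     # always-present fallback
]

def best_provider(available):
    """Pick the first available provider in preference order."""
    for name in PREFERENCE:
        if name in available:
            return name
    return "CPUExecutionProvider"

# In the real engine this would be fed from:
#   available = onnxruntime.get_available_providers()
```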