Add incremental indexing with content-hash change detection #2
Open
When an existing index is found, `code-index analyze` now runs an incremental pipeline that detects changed files via git diff plus SHA-256 content hashes and re-embeds only the nodes from those files. Unchanged node vectors are copied from the existing FAISS index. Repeated runs on a dirty working tree correctly skip re-indexing because hashes are compared against what was actually indexed, not just git state. The pipeline falls back to a full re-index when the old commit is unreachable (for example after a rebase) or when no hash baseline exists yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
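A minimal sketch of the hash-comparison step described above, assuming the baseline is stored as a mapping from file path to SHA-256 hex digest. The names `sha256_of` and `plan_reindex` are hypothetical and are not taken from this PR's code; the git-diff step and the FAISS vector copy are left out.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_reindex(baseline, current):
    """Decide what to re-embed (hypothetical helper, not the PR's API).

    baseline: dict of path -> digest recorded at the last index, or None.
    current:  dict of path -> digest computed from the working tree now.
    Returns (mode, paths_to_embed, deleted_paths).
    """
    if baseline is None:
        # No hash baseline yet: fall back to a full re-index.
        return "full", sorted(current), []
    # A file is re-embedded if it is new or its content hash differs.
    changed = sorted(p for p, h in current.items() if baseline.get(p) != h)
    # Files that were indexed before but no longer exist.
    deleted = sorted(p for p in baseline if p not in current)
    return "incremental", changed, deleted
```

Because the comparison is against the hashes recorded at index time, a second run on the same dirty working tree yields an empty `changed` list even though git still reports the files as modified.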
These were pulled transitively (via sentence-transformers, optimum, fastembed), so different venvs could resolve to a broken trio where transformers demands huggingface-hub>=1.5 but the resolver picked an older hub. Pinning the 4.x / 0.x / 0.2x line keeps the three in lockstep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Added dependencies for ONNX, ONNX Runtime, and Optimum in `pyproject.toml`.
- Updated `config.py` to set the embedding backend to ONNX, enabling faster inference on Apple Silicon and CUDA.
- Enhanced `engine.py` to support ONNX model loading and embedding, including functions for selecting the best execution provider and ensuring model export.
- Refactored embedding logic to use ONNX Runtime, improving performance and compatibility with specific models.

These changes make the embedding engine more efficient and broaden its compatibility across hardware setups.
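Execution-provider selection of the kind mentioned above might look like the following sketch. The preference order and the function name `pick_provider` are assumptions for illustration, not the logic in `engine.py`; the provider strings are standard ONNX Runtime provider names.

```python
# Assumed preference order: GPU first, then Apple's CoreML, then CPU.
PREFERRED_PROVIDERS = (
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
)


def pick_provider(available):
    """Return the first preferred provider present in `available`.

    `available` is a list of provider names; CPU is the fallback when
    none of the preferred names is present.
    """
    for name in PREFERRED_PROVIDERS:
        if name in available:
            return name
    return "CPUExecutionProvider"
```

In practice the `available` list would come from `onnxruntime.get_available_providers()`, and the chosen name would be passed in the `providers` argument when creating an `onnxruntime.InferenceSession`.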
Summary
- `code-index analyze` now auto-runs an incremental pipeline instead of refusing or re-indexing from scratch
- Unchanged node vectors are copied from the existing FAISS index via `reconstruct()`

Performance on this repo (37 files, 306 embeddable nodes, 1 file changed):
Test plan
- `--force` establishes hash baseline and works as before
- `--force` flag still triggers full re-index

🤖 Generated with Claude Code