fix: index growth, stale entries, atomics, git HEAD perf, snapshot robustness#255
fix: index growth, stale entries, atomics, git HEAD perf, snapshot robustness#255
Conversation
…ale state Fixes #227, #246, #247, #248. Three interrelated bugs in TrigramIndex are fixed together: 1. removeFile (#246): moved path_to_id.remove() before the file_trigrams guard so the mapping is always cleaned even when file_trigrams has no entry (leftover from a partial OOM-failed indexFile). 2. id_to_path growth (#227, #247): removeFile now adds the freed doc_id to a new free_ids freelist and marks the id_to_path slot as "". getOrCreateDocId pops from free_ids first, reusing the old slot instead of appending a new one. After N re-indexes of the same file, id_to_path.items.len stays bounded by the number of unique files ever indexed. 3. PostingList sorted invariant: reused doc_ids are not max, so plain append would break the binary-search invariant. indexFile now detects whether a slot was reused (id_to_path did not grow) and uses getOrAddPosting (sorted insert) for reused doc_ids, keeping append for new files. 4. PostingList.removeDocId (#248): replaced O(n) linear scan with the same binary-search pattern used by getByDocId — O(log n) search + single orderedRemove. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e, snapshot double-open, searchContent O(n) fallback Fixes #249, #250, #251, #252, #253. index.zig (#251): AnyTrigramIndex.candidates and candidatesRegex mmap_overlay branches no longer leak the result ArrayList's backing buffer when toOwnedSlice fails under OOM — explicit deinit on the error path. nuke.zig (#249): rewriteConfigFile now writes to a {path}.tmp file first and renames atomically, preventing an empty config file if the process is killed mid-write. Callers updated to thread the allocator through. explore.zig (#252): commitParsedFileOwnedOutline adds an errdefer immediately after word_index.indexFile so that a subsequent trigram_index OOM failure rolls back word_index to the previous content, keeping the two indexes in sync. explore.zig (#250): Explorer gains a skip_trigram_files StringHashMap. Files indexed with skip_trigram=true are tracked in this set; the searchContent fallback loop now iterates only skip_trigram_files instead of all outlines, reducing the fallback from O(all files) to O(skip-trigram files). snapshot.zig (#253): extracted readSectionsFromFile(file, allocator) helper so both readSections and readSectionBytes share the header-parsing logic. readSectionBytes now opens the file once and calls the helper, eliminating the redundant second openFile call for each section read. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…obustness watcher.zig (#254): incrementalLoop now stats .git/HEAD mtime before spawning git rev-parse HEAD. If the mtime is unchanged we skip the fork+exec entirely, eliminating a 2-second-cadence subprocess that accounted for the majority of codedb's background CPU on large repos. watcher.zig: EventQueue.head/tail were std.atomic.Value(usize) even though every access (push and pop) already holds self.mu. Replaced with plain usize fields; the mutex provides all required ordering guarantees. store.zig: Store.seq was std.atomic.Value(u64) even though the only mutation site (appendVersion) holds self.mu. Changed to a plain u64; currentSeq() now also acquires the mutex so the type is correct. snapshot.zig: readSectionString limit raised from 4096 to std.math.maxInt(u16) so symbol names longer than 4 KiB are accepted. loadSnapshotFast treats a corrupt OUTLINE_STATE section as an empty map rather than propagating the error, matching the ba13aed fix on the feature/243 branch. lib.zig: export snapshot module so callers can reach it through lib. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benchmark Regression ReportThreshold: 10.00%
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bdb8784109
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| { | ||
| var root_dir = std.fs.cwd().openDir(root, .{}) catch break :blk false; | ||
| defer root_dir.close(); | ||
| const st = compat.dirStatFile(root_dir, ".git/HEAD") catch break :blk false; |
There was a problem hiding this comment.
Resolve HEAD path before mtime gating git checks
incrementalLoop now treats a failed stat of .git/HEAD as “no change”, which breaks branch-change detection in valid repos where .git is a file (e.g., Git worktrees/submodules) instead of a directory. In that setup this line returns false on every poll, git rev-parse HEAD is never called again, and the watcher never triggers the re-scan path after checkouts/switches, leaving explorer/store state stale for the active branch.
Useful? React with 👍 / 👎.
| self.trigram_index.removeFile(stable_path); | ||
| self.sparse_ngram_index.removeFile(stable_path); | ||
| try self.skip_trigram_files.put(stable_path, {}); |
There was a problem hiding this comment.
Remove skip-trigram entries when files are deleted
The new skip-trigram tracking inserts paths into skip_trigram_files, but there is no corresponding removal on file deletion (Explorer.removeFile). That means deleted paths remain in this map permanently, and searchContent’s fallback loop keeps iterating stale entries and attempting file reads for missing files. In long-running sessions with high file churn, this regresses the intended optimization and causes unbounded growth in fallback work.
Useful? React with 👍 / 👎.
Summary
Nine correctness and performance fixes across index, explorer, watcher, store, nuke, and snapshot. All 341 tests pass.
Index (#246 #247 #227)
path_to_idfirst, preventing a stale entry that caused a subsequentindexFilecall to skip re-registrationfree_ids: ArrayList(u32)and reused bygetOrCreateDocId, preventingid_to_pathfrom growing without bound across repeated remove+re-index cyclesgetOrAddPostingpattern — O(log n) instead of O(n)candidatesandcandidatesRegexnow explicitlydeinitthe result ArrayList on the error path, plugging a buffer leak on OOMExplorer (#250 #252)
trigram_index.indexFilefails afterword_index.indexFilesucceeds, word_index is rolled back to the previous content, keeping the two indexes consistentskip_trigram=truein askip_trigram_filesStringHashMap; the content-search fallback iterates only that set instead of all outlines — O(skip-trigram files) not O(all files)Nuke (#249)
{path}.tmp, syncs, then renames atomically — prevents an empty config file if the process is killed mid-writeSnapshot (#253)
readSectionsandreadSectionBytesdon't each open the file;readSectionBytesnow opens oncestd.math.maxInt(u16)so symbol names longer than 4 KiB are acceptedWatcher (#254)
.git/HEADmtime before forkinggit rev-parse HEAD; the subprocess is skipped entirely when mtime is unchanged, eliminating a 2s-cadence fork+exec on every poll cycleStore / EventQueue (cleanup)
head/tailwerestd.atomic.Value(usize)even though every access already holdsself.mu; replaced with plainusizestd.atomic.Value(u64)mutated exclusively inside the mutex; changed to plainu64;currentSeq()acquires the mutex so external readers are safeTest plan
zig build test— all 341 tests passtest "issue-246"— removeFile stale path_to_idtest "issue-247"— id_to_path does not grow on re-indextest "issue-227"— id_to_path stays bounded across many re-indexestest "issue-248"— PostingList.removeDocId sorted order preservedtest "issue-249"— nuke.removeJsonMcpServerEntry null on missing keytest "issue-250"— searchContent finds skip-trigram files🤖 Generated with Claude Code