Skip to content

fix: index growth, stale entries, atomics, git HEAD perf, snapshot robustness#255

Merged
justrach merged 4 commits intomainfrom
fix/index-id-growth-stale-entries
Apr 13, 2026
Merged

fix: index growth, stale entries, atomics, git HEAD perf, snapshot robustness#255
justrach merged 4 commits intomainfrom
fix/index-id-growth-stale-entries

Conversation

@justrach
Copy link
Copy Markdown
Owner

Summary

Nine correctness and performance fixes across index, explorer, watcher, store, nuke, and snapshot. All 341 tests pass.

Index (#246 #247 #227)

  • TrigramIndex.removeFile always removes the path from path_to_id first, preventing a stale entry that caused a subsequent indexFile call to skip re-registration
  • TrigramIndex free-list: freed doc_id slots are pushed to a free_ids: ArrayList(u32) and reused by getOrCreateDocId, preventing id_to_path from growing without bound across repeated remove+re-index cycles
  • PostingList.removeDocId: replaced linear scan with binary search matching the existing getOrAddPosting pattern — O(log n) instead of O(n)
  • AnyTrigramIndex mmap_overlay: candidates and candidatesRegex now explicitly deinit the result ArrayList on the error path, plugging a buffer leak on OOM

Explorer (#250 #252)

  • commitParsedFileOwnedOutline errdefer: if trigram_index.indexFile fails after word_index.indexFile succeeds, word_index is rolled back to the previous content, keeping the two indexes consistent
  • searchContent fallback: Explorer tracks files indexed with skip_trigram=true in a skip_trigram_files StringHashMap; the content-search fallback iterates only that set instead of all outlines — O(skip-trigram files) not O(all files)

Nuke (#249)

  • rewriteConfigFile: writes to {path}.tmp, syncs, then renames atomically — prevents an empty config file if the process is killed mid-write

Snapshot (#253)

  • readSectionsFromFile: extracted shared header-parsing helper so readSections and readSectionBytes don't each open the file; readSectionBytes now opens once
  • readSectionString limit: raised from 4096 to std.math.maxInt(u16) so symbol names longer than 4 KiB are accepted
  • loadSnapshotFast: corrupt OUTLINE_STATE section now falls back to an empty map instead of propagating the error

Watcher (#254)

  • incrementalLoop git HEAD check: stats .git/HEAD mtime before forking git rev-parse HEAD; the subprocess is skipped entirely when mtime is unchanged, eliminating a 2s-cadence fork+exec on every poll cycle

Store / EventQueue (cleanup)

  • EventQueue: head/tail were std.atomic.Value(usize) even though every access already holds self.mu; replaced with plain usize
  • Store.seq: was std.atomic.Value(u64) mutated exclusively inside the mutex; changed to plain u64; currentSeq() acquires the mutex so external readers are safe

Test plan

  • zig build test — all 341 tests pass
  • test "issue-246" — removeFile stale path_to_id
  • test "issue-247" — id_to_path does not grow on re-index
  • test "issue-227" — id_to_path stays bounded across many re-indexes
  • test "issue-248" — PostingList.removeDocId sorted order preserved
  • test "issue-249" — nuke.removeJsonMcpServerEntry null on missing key
  • test "issue-250" — searchContent finds skip-trigram files

🤖 Generated with Claude Code

justrach and others added 4 commits April 12, 2026 09:46
…ale state

Fixes #227, #246, #247, #248.

Three interrelated bugs in TrigramIndex are fixed together:

1. removeFile (#246): moved path_to_id.remove() before the file_trigrams guard so
   the mapping is always cleaned even when file_trigrams has no entry (leftover from
   a partial OOM-failed indexFile).

2. id_to_path growth (#227, #247): removeFile now adds the freed doc_id to a new
   free_ids freelist and marks the id_to_path slot as "". getOrCreateDocId pops from
   free_ids first, reusing the old slot instead of appending a new one.  After N
   re-indexes of the same file, id_to_path.items.len stays bounded by the number of
   unique files ever indexed.

3. PostingList sorted invariant: reused doc_ids are not max, so plain append would
   break the binary-search invariant.  indexFile now detects whether a slot was
   reused (id_to_path did not grow) and uses getOrAddPosting (sorted insert) for
   reused doc_ids, keeping append for new files.

4. PostingList.removeDocId (#248): replaced O(n) linear scan with the same
   binary-search pattern used by getByDocId — O(log n) search + single orderedRemove.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e, snapshot double-open, searchContent O(n) fallback

Fixes #249, #250, #251, #252, #253.

index.zig (#251): AnyTrigramIndex.candidates and candidatesRegex mmap_overlay
branches no longer leak the result ArrayList's backing buffer when
toOwnedSlice fails under OOM — explicit deinit on the error path.

nuke.zig (#249): rewriteConfigFile now writes to a {path}.tmp file first and
renames atomically, preventing an empty config file if the process is killed
mid-write.  Callers updated to thread the allocator through.

explore.zig (#252): commitParsedFileOwnedOutline adds an errdefer immediately
after word_index.indexFile so that a subsequent trigram_index OOM failure
rolls back word_index to the previous content, keeping the two indexes in sync.

explore.zig (#250): Explorer gains a skip_trigram_files StringHashMap.  Files
indexed with skip_trigram=true are tracked in this set; the searchContent
fallback loop now iterates only skip_trigram_files instead of all outlines,
reducing the fallback from O(all files) to O(skip-trigram files).

snapshot.zig (#253): extracted readSectionsFromFile(file, allocator) helper so
both readSections and readSectionBytes share the header-parsing logic.
readSectionBytes now opens the file once and calls the helper, eliminating the
redundant second openFile call for each section read.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…obustness

watcher.zig (#254): incrementalLoop now stats .git/HEAD mtime before
spawning git rev-parse HEAD.  If the mtime is unchanged we skip the
fork+exec entirely, eliminating a 2-second-cadence subprocess that
accounted for the majority of codedb's background CPU on large repos.

watcher.zig: EventQueue.head/tail were std.atomic.Value(usize) even
though every access (push and pop) already holds self.mu.  Replaced with
plain usize fields; the mutex provides all required ordering guarantees.

store.zig: Store.seq was std.atomic.Value(u64) even though the only
mutation site (appendVersion) holds self.mu.  Changed to a plain u64;
currentSeq() now also acquires the mutex so the type is correct.

snapshot.zig: readSectionString limit raised from 4096 to
std.math.maxInt(u16) so symbol names longer than 4 KiB are accepted.
loadSnapshotFast treats a corrupt OUTLINE_STATE section as an empty map
rather than propagating the error, matching the ba13aed fix on the
feature/243 branch.

lib.zig: export snapshot module so callers can reach it through lib.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Benchmark Regression Report

Threshold: 10.00%

Tool Base (ns) Head (ns) Delta Status
codedb_bundle 692893 683056 -1.42% OK
codedb_changes 117165 136244 +16.28% FAIL
codedb_deps 33004 33378 +1.13% OK
codedb_edit 20577 24426 +18.71% FAIL
codedb_find 159893 147323 -7.86% OK
codedb_hot 161538 160795 -0.46% OK
codedb_outline 478498 461208 -3.61% OK
codedb_read 156009 152591 -2.19% OK
codedb_search 295899 286993 -3.01% OK
codedb_snapshot 4280905 4276332 -0.11% OK
codedb_status 159259 159673 +0.26% OK
codedb_symbol 69438 69443 +0.01% OK
codedb_tree 99294 83436 -15.97% OK
codedb_word 103427 100677 -2.66% OK

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bdb8784109

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/watcher.zig
{
var root_dir = std.fs.cwd().openDir(root, .{}) catch break :blk false;
defer root_dir.close();
const st = compat.dirStatFile(root_dir, ".git/HEAD") catch break :blk false;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Resolve HEAD path before mtime gating git checks

incrementalLoop now treats a failed stat of .git/HEAD as “no change”, which breaks branch-change detection in valid repos where .git is a file (e.g., Git worktrees/submodules) instead of a directory. In that setup this line returns false on every poll, git rev-parse HEAD is never called again, and the watcher never triggers the re-scan path after checkouts/switches, leaving explorer/store state stale for the active branch.

Useful? React with 👍 / 👎.

Comment thread src/explore.zig
Comment on lines 293 to +295
self.trigram_index.removeFile(stable_path);
self.sparse_ngram_index.removeFile(stable_path);
try self.skip_trigram_files.put(stable_path, {});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove skip-trigram entries when files are deleted

The new skip-trigram tracking inserts paths into skip_trigram_files, but there is no corresponding removal on file deletion (Explorer.removeFile). That means deleted paths remain in this map permanently, and searchContent’s fallback loop keeps iterating stale entries and attempting file reads for missing files. In long-running sessions with high file churn, this regresses the intended optimization and causes unbounded growth in fallback work.

Useful? React with 👍 / 👎.

@justrach justrach merged commit bdb8784 into main Apr 13, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant