Problem
PostingList maintains items sorted by doc_id. getByDocId and containsDocId already use std.sort.binarySearch for O(log n) lookup. But removeDocId uses an O(n) linear scan:
// src/index.zig — PostingList.removeDocId
pub fn removeDocId(self: *PostingList, doc_id: u32) void {
var i: usize = 0;
while (i < self.items.items.len) { // ← O(n), no early break
if (self.items.items[i].doc_id == doc_id) {
_ = self.items.orderedRemove(i);
} else { i += 1; }
}
}
For a PostingList with N entries, every file removal costs O(n) per trigram × number of trigrams per file ≈ O(n × trigrams). With 15k files at ~300 trigrams each, file removal is measurably slow.
Correctness Test (passes today, serves as regression guard)
test "issue-248: PostingList.removeDocId removes target and preserves sorted order" {
const PostingList = @import("index.zig").PostingList;
var list = PostingList{};
defer list.items.deinit(testing.allocator);
var id: u32 = 0;
while (id < 100) : (id += 1) {
const p = try list.getOrAddPosting(testing.allocator, id * 2);
p.loc_mask = 0xFF;
}
list.removeDocId(50);
try testing.expectEqual(@as(usize, 99), list.items.items.len);
try testing.expect(list.getByDocId(50) == null);
for (1..list.items.items.len) |k| {
try testing.expect(list.items.items[k].doc_id > list.items.items[k - 1].doc_id);
}
}
Fix
Replace the linear scan with std.sort.binarySearch to find the index, then a single orderedRemove. Since doc_ids are unique per list, there is at most one match.
Problem
PostingListmaintainsitemssorted bydoc_id.getByDocIdandcontainsDocIdalready usestd.sort.binarySearchfor O(log n) lookup. ButremoveDocIduses an O(n) linear scan:For a
PostingListwith N entries, every file removal costs O(n) per trigram × number of trigrams per file ≈ O(n × trigrams). With 15k files at ~300 trigrams each, file removal is measurably slow.Correctness Test (passes today, serves as regression guard)
Fix
Replace the linear scan with
std.sort.binarySearchto find the index, then a singleorderedRemove. Since doc_ids are unique per list, there is at most one match.