index: TrigramIndex.removeFile leaves stale path_to_id entry on early return #246

@justrach

Problem

TrigramIndex.removeFile has two guard clauses. The first checks path_to_id; the second checks file_trigrams. When path_to_id has an entry but file_trigrams does not (left by a partial indexFile that failed after getOrCreateDocId but before file_trigrams.put), removeFile returns early without removing the path_to_id entry.

The result: after calling removeFile("ghost.zig"), the stale entry is still present in path_to_id, so every subsequent getOrCreateDocId("ghost.zig") returns the cached stale id instead of allocating a fresh one.

// src/index.zig — removeFile
const doc_id = self.path_to_id.get(path) orelse return; // ← deletes nothing on miss
const trigrams = self.file_trigrams.getPtr(path) orelse return; // ← exits here, path_to_id still dirty

Failing Test

test "issue-246: TrigramIndex.removeFile cleans stale path_to_id left by failed indexFile" {
    var idx = TrigramIndex.init(testing.allocator);
    defer idx.deinit();

    try idx.path_to_id.put("ghost.zig", 0);
    try idx.id_to_path.append(testing.allocator, "ghost.zig");
    // file_trigrams intentionally has NO entry for "ghost.zig".

    idx.removeFile("ghost.zig");

    // Currently FAILS: removeFile returns early, leaving path_to_id dirty.
    try testing.expectEqual(@as(usize, 0), idx.path_to_id.count());
}

Expected

removeFile should remove path_to_id and id_to_path entries regardless of whether file_trigrams has an entry, so the index is left in a clean state.

Fix

Split the two guard clauses: always clean path_to_id (and id_to_path) on entry, then guard on file_trigrams only for the trigram-removal loop.
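
A minimal sketch of the split-guard shape, under stated assumptions: MiniIndex is a hypothetical, reduced stand-in for TrigramIndex that keeps only the two maps the guards touch, and file_trigrams stores a plain count rather than the real trigram set. id_to_path cleanup and the trigram-removal loop are left out, since the issue does not say how those slots should be cleared.

```zig
const std = @import("std");

// Hypothetical stand-in for TrigramIndex (not the real struct from src/index.zig).
const MiniIndex = struct {
    path_to_id: std.StringHashMap(u32),
    // Assumption: the real index maps a path to its trigram set; a count stands in here.
    file_trigrams: std.StringHashMap(u32),

    fn init(allocator: std.mem.Allocator) MiniIndex {
        return .{
            .path_to_id = std.StringHashMap(u32).init(allocator),
            .file_trigrams = std.StringHashMap(u32).init(allocator),
        };
    }

    fn deinit(self: *MiniIndex) void {
        self.path_to_id.deinit();
        self.file_trigrams.deinit();
    }

    fn removeFile(self: *MiniIndex, path: []const u8) void {
        // Step 1: always drop the path mapping; bail only if the path was never known.
        if (!self.path_to_id.remove(path)) return;
        // Step 2: trigram cleanup is the only part that depends on file_trigrams.
        // The real function would guard on getPtr and walk the trigram loop here.
        _ = self.file_trigrams.remove(path);
    }
};

test "sketch: removeFile clears path_to_id even without a file_trigrams entry" {
    var idx = MiniIndex.init(std.testing.allocator);
    defer idx.deinit();

    // Simulate a partial indexFile: id assigned, trigrams never stored.
    try idx.path_to_id.put("ghost.zig", 0);

    idx.removeFile("ghost.zig");
    try std.testing.expect(!idx.path_to_id.contains("ghost.zig"));
}
```

Because the path_to_id removal happens before the file_trigrams guard, a half-indexed file is cleaned up the same way as a fully indexed one.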
