index: TrigramIndex.id_to_path grows by one slot per re-index of the same file #247

@justrach

Problem

Extends #227. When indexFile is called on a path that already exists, removeFile is called first to clear old trigrams. But removeFile deletes the path_to_id entry and leaves the id_to_path slot as a dead entry — it does not compact the array.

On the next indexFile call, getOrCreateDocId misses in path_to_id (the entry was deleted) and appends a new slot to id_to_path. Each re-index therefore grows id_to_path.items.len by 1, so dead slots accumulate in proportion to the total number of re-index calls across all files rather than the number of unique files.

This is the same root cause as #227 but manifests even for a single file re-indexed repeatedly.

Failing Test

test "issue-247: TrigramIndex.id_to_path does not grow on re-index of same file" {
    var idx = TrigramIndex.init(testing.allocator);
    defer idx.deinit();

    const src = "fn alpha() void {} fn beta() void {} const X = 1;";
    var i: usize = 0;
    while (i < 5) : (i += 1) {
        try idx.indexFile("f.zig", src);
    }

    // Currently FAILS: id_to_path.items.len == 5 (grows by 1 per re-index).
    try testing.expectEqual(@as(usize, 1), idx.id_to_path.items.len);
}

Expected

After N re-indexes of the same file, id_to_path.items.len equals the number of unique files indexed, not the number of indexFile calls.

Fix

Option A: make removeFile also remove the id_to_path slot and compact (swap-remove), updating path_to_id for the moved entry.
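A minimal sketch of Option A, not the actual TrigramIndex code. `DocTable` is a hypothetical stand-in that models only the two fields named in this issue; paths are borrowed rather than owned, and the unmanaged std containers from a recent Zig toolchain are assumed.

```zig
const std = @import("std");

/// Hypothetical stand-in for the doc-id fields of TrigramIndex.
/// Paths are borrowed (not duplicated) to keep the sketch short.
const DocTable = struct {
    path_to_id: std.StringHashMapUnmanaged(u32) = .{},
    id_to_path: std.ArrayListUnmanaged([]const u8) = .{},

    fn deinit(self: *DocTable, gpa: std.mem.Allocator) void {
        self.path_to_id.deinit(gpa);
        self.id_to_path.deinit(gpa);
    }

    fn getOrCreateDocId(self: *DocTable, gpa: std.mem.Allocator, path: []const u8) !u32 {
        if (self.path_to_id.get(path)) |id| return id;
        const id: u32 = @intCast(self.id_to_path.items.len);
        try self.id_to_path.append(gpa, path);
        // Roll back the append if the map insert fails.
        errdefer self.id_to_path.items.len -= 1;
        try self.path_to_id.put(gpa, path, id);
        return id;
    }

    /// Option A: swap-remove the slot, then re-point the entry that moved into it.
    fn removeFile(self: *DocTable, path: []const u8) void {
        const kv = self.path_to_id.fetchRemove(path) orelse return;
        const id = kv.value;
        _ = self.id_to_path.swapRemove(id);
        // If the removed slot was not the last one, the former last entry
        // now lives at `id`, so its path_to_id mapping must be updated.
        if (id < self.id_to_path.items.len) {
            const moved = self.id_to_path.items[id];
            self.path_to_id.getPtr(moved).?.* = id;
        }
    }
};
```

One caveat with this approach: swap-remove changes the id of the moved file. If the trigram posting lists store doc ids (an assumption, since they are not shown in this issue), those entries would also need to be rewritten from the old id to the new one.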

Option B: make getOrCreateDocId reuse existing slots. On a path_to_id miss, check id_to_path for a dead slot that can be reused before allocating a new id.
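A minimal sketch of Option B, under stated assumptions rather than the real implementation. `DocTable` is again a hypothetical stand-in for the two fields named in this issue, and the sketch assumes a dead id_to_path slot still holds the original path string, so a linear scan can find it. Paths are borrowed, and the unmanaged std containers from a recent Zig toolchain are assumed.

```zig
const std = @import("std");

/// Hypothetical stand-in for the doc-id fields of TrigramIndex.
const DocTable = struct {
    path_to_id: std.StringHashMapUnmanaged(u32) = .{},
    id_to_path: std.ArrayListUnmanaged([]const u8) = .{},

    fn deinit(self: *DocTable, gpa: std.mem.Allocator) void {
        self.path_to_id.deinit(gpa);
        self.id_to_path.deinit(gpa);
    }

    /// Mirrors the behavior described in this issue: the map entry is
    /// dropped and the id_to_path slot is left behind as a dead entry.
    fn removeFile(self: *DocTable, path: []const u8) void {
        _ = self.path_to_id.remove(path);
    }

    /// Option B: on a map miss, reuse a dead slot that still names this path.
    fn getOrCreateDocId(self: *DocTable, gpa: std.mem.Allocator, path: []const u8) !u32 {
        if (self.path_to_id.get(path)) |id| return id;
        // A live slot would have hit in path_to_id above, so any slot
        // matching this path here must be a dead one.
        for (self.id_to_path.items, 0..) |p, i| {
            if (std.mem.eql(u8, p, path)) {
                const id: u32 = @intCast(i);
                try self.path_to_id.put(gpa, p, id);
                return id;
            }
        }
        // No reusable slot: append a new one.
        const id: u32 = @intCast(self.id_to_path.items.len);
        try self.id_to_path.append(gpa, path);
        errdefer self.id_to_path.items.len -= 1;
        try self.path_to_id.put(gpa, path, id);
        return id;
    }
};
```

The trade-off in this sketch is that doc ids stay stable, so nothing that references them needs rewriting, but each miss costs a linear scan and slots for files that are removed and never re-indexed remain in the array. A free list of dead ids would avoid the scan, at the cost of extra bookkeeping.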
