bug: TrigramIndex.id_to_path grows unboundedly on re-index — 425 MB/min memory growth

## Summary

`TrigramIndex.id_to_path` grows unboundedly when files are re-indexed. Each call to `indexFile` for an already-indexed file appends a new doc_id to `id_to_path` without reclaiming the old slot, causing monotonic memory growth proportional to the number of re-index cycles.

On a project with active file watching, this causes **425 MB/min** private memory growth (observed on a ~22K file repo).

## Root Cause

In `src/index.zig`, the `indexFile` → `removeFile` → `getOrCreateDocId` sequence:

1. `indexFile` (line 583) calls `self.removeFile(path)` 
2. `removeFile` (line 580) removes path from `path_to_id` but does NOT touch `id_to_path` — the old slot remains allocated
3. `indexFile` (line 586) calls `getOrCreateDocId(path)`
4. `getOrCreateDocId` (line 554) checks `path_to_id`. The path was just removed, so it falls through to lines 555-557 and appends a **new** entry to `id_to_path` with a new doc_id

The old `id_to_path[old_doc_id]` slot is never cleared or reused. After K re-indexes of the same file, `id_to_path` holds K entries for it, and only the last is reachable via `path_to_id`.
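
The sequence above can be reproduced in a few lines. This is a minimal Python model of the doc_id bookkeeping only, not the Zig implementation: the class name `LeakyTrigramIndex` and the snake_case method names are hypothetical stand-ins, and trigram postings are left out entirely.

```python
class LeakyTrigramIndex:
    """Simplified model of the doc_id bookkeeping (no trigram postings)."""

    def __init__(self):
        self.path_to_id = {}  # path -> current doc_id
        self.id_to_path = []  # doc_id -> path, indexed by doc_id

    def remove_file(self, path):
        # Step 2: drop the path -> id mapping, leave id_to_path untouched.
        self.path_to_id.pop(path, None)

    def get_or_create_doc_id(self, path):
        # Step 4: a path that was just removed always misses and appends.
        if path in self.path_to_id:
            return self.path_to_id[path]
        doc_id = len(self.id_to_path)
        self.id_to_path.append(path)
        self.path_to_id[path] = doc_id
        return doc_id

    def index_file(self, path):
        # Steps 1 and 3: remove first, then ask for a doc_id.
        self.remove_file(path)
        return self.get_or_create_doc_id(path)


index = LeakyTrigramIndex()
ids = [index.index_file("src/index.zig") for _ in range(5)]
print(ids)                    # [0, 1, 2, 3, 4]
print(len(index.id_to_path))  # 5
print(index.path_to_id)       # {'src/index.zig': 4}
```

Five re-indexes of one path leave five slots in `id_to_path`, with only the last one reachable through `path_to_id`.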

### Compounding triggers

Two watcher paths amplify this:

- **git HEAD change** (`watcher.zig` line 466): branch switch re-indexes every file in the project. A single checkout on a 5000-file project appends 5000 new `id_to_path` entries and leaves the 5000 old ones stale.
- **`drainNotifyFile`** (`watcher.zig` line 693): every notified path triggers a full re-index with no dedup against current state.

Both paths call `indexFileContent` → `explorer.indexFile` → `TrigramIndex.indexFile`, hitting this same accumulation.

### Secondary effect: incorrect search results

Stale `id_to_path` entries cause `TrigramIndex.candidates()` (line 747) to yield doc_ids that are no longer in `path_to_id`, producing phantom search results for files that no longer match the query.
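
One way to tell a stale doc_id from a live one is to check that `path_to_id` maps the slot's path back to the same id. The sketch below is a Python model with a hypothetical helper `is_live`; the `candidate_ids` list is an assumed stand-in for whatever `candidates()` returns, not its real output.

```python
def is_live(doc_id, id_to_path, path_to_id):
    """True only if doc_id is the id currently mapped to its path."""
    if doc_id < 0 or doc_id >= len(id_to_path):
        return False
    return path_to_id.get(id_to_path[doc_id]) == doc_id


# State after indexing "a.zig", then "b.zig", then re-indexing "a.zig"
# under the leaky scheme: slot 0 is stale, slot 2 is the live "a.zig".
id_to_path = ["a.zig", "b.zig", "a.zig"]
path_to_id = {"b.zig": 1, "a.zig": 2}

candidate_ids = [0, 1, 2]  # hypothetical candidate set containing a stale id
live = [d for d in candidate_ids if is_live(d, id_to_path, path_to_id)]
print(live)  # [1, 2]
```

A filter like this would only mask the symptom in search results; the stale slots still occupy memory until one of the fixes below is applied.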

## Reproduction

```bash
# Start MCP server on a repo with file watching active
codedb --mcp

# In another terminal, touch files rapidly to trigger re-indexing
for i in $(seq 1 100); do
  touch src/*.zig
  sleep 2
done

# Monitor RSS — it should stay flat, but grows ~N_files * sizeof(entry) per cycle
```

Alternatively, switch git branches repeatedly on a large repo:

```bash
for i in $(seq 1 20); do
  git checkout main
  git checkout feature-branch
  sleep 3
done
```

## Measurements

| Scenario | id_to_path growth | Private bytes growth |
|----------|-------------------|---------------------|
| Startup, no re-indexing | 0 | Stable |
| 2s poll cycle, 1000 files with mtime changes | +1000 entries/cycle | ~425 MB/min observed |
| Single git branch switch, 5000 files | +5000 entries | Instant spike |
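
As a rough sanity check on the second row, the figures can be turned into an implied cost per stale entry. The calculation assumes the 425 MB/min number was measured under exactly the 2 s cycle and 1000 files per cycle listed in that row, and that MB means 10^6 bytes; the variable names are illustrative.

```python
poll_interval_s = 2        # from the table: 2s poll cycle
files_per_cycle = 1000     # from the table: +1000 entries/cycle
growth_mb_per_min = 425    # from the table: ~425 MB/min observed

cycles_per_min = 60 // poll_interval_s
stale_entries_per_min = cycles_per_min * files_per_cycle
bytes_per_stale_entry = growth_mb_per_min * 1_000_000 / stale_entries_per_min

print(stale_entries_per_min)         # 30000
print(round(bytes_per_stale_entry))  # 14167
```

Roughly 14 KB per stale entry is more than a path string alone would normally need, so the observed growth may include other per-document allocations that accumulate alongside `id_to_path`. The measurements above do not break this down.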

## Fix options

1. **Reuse doc_id**: In `removeFile`, keep the `path_to_id` mapping but clear the postings. Then `getOrCreateDocId` returns the existing id instead of appending. Requires adjusting `removeFile` to mark the slot as "cleared" rather than removing from `path_to_id`.

2. **Free-list**: When `removeFile` removes a doc_id, push it onto a free-list. `getOrCreateDocId` pops from the free-list before appending. Minimal changes, no behavioral difference.

3. **Periodic compaction**: Rebuild `id_to_path` when stale ratio exceeds a threshold. Higher latency but simplest correctness argument.

Option 2 is likely the best balance of simplicity and efficiency.
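
A minimal sketch of Option 2, again as a Python model rather than the Zig code. The class name `FreeListTrigramIndex`, the `free_ids` field, and the use of `None` to mark a freed slot are all assumptions made for illustration.

```python
class FreeListTrigramIndex:
    """Model of doc_id bookkeeping with a free-list (no trigram postings)."""

    def __init__(self):
        self.path_to_id = {}  # path -> current doc_id
        self.id_to_path = []  # doc_id -> path, None marks a freed slot
        self.free_ids = []    # doc_ids available for reuse

    def remove_file(self, path):
        doc_id = self.path_to_id.pop(path, None)
        if doc_id is None:
            return  # path was not indexed
        # A real implementation would also clear the postings for doc_id
        # here, before the id can be handed to a different file.
        self.id_to_path[doc_id] = None
        self.free_ids.append(doc_id)

    def get_or_create_doc_id(self, path):
        if path in self.path_to_id:
            return self.path_to_id[path]
        if self.free_ids:
            doc_id = self.free_ids.pop()  # reuse a freed slot first
            self.id_to_path[doc_id] = path
        else:
            doc_id = len(self.id_to_path)  # grow only when nothing is free
            self.id_to_path.append(path)
        self.path_to_id[path] = doc_id
        return doc_id

    def index_file(self, path):
        self.remove_file(path)
        return self.get_or_create_doc_id(path)


index = FreeListTrigramIndex()
index.index_file("a.zig")
index.index_file("b.zig")
for _ in range(1000):
    index.index_file("a.zig")  # repeated re-index of the same path

print(len(index.id_to_path))  # 2
print(index.path_to_id)       # {'b.zig': 1, 'a.zig': 0}
```

In this model `id_to_path` stays at two slots no matter how many times a path is re-indexed. Reuse is only safe if every posting that references the freed doc_id is removed before the id is reassigned; otherwise a new file could inherit another file's trigrams.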

## Related

- #210 — ProjectCache retains raw file contents (separate static retention bug, same symptom of high RSS)
- #128 — original MCP memory leak report (closed, primary explorer fixed, but TrigramIndex accumulation was not addressed)

